I have a long string that I have to manipulate in a specific way. The string can include other substrings, which cause problems with my code. For that reason, before doing anything to the string, I replace all the substrings (anything introduced by " and ended with a non-escaped ") with placeholders in the format $0, $1, $2, ..., $n. I know for sure that the main string itself doesn't contain the character $, but one or more of the substrings could be, for example, "$0".
Now the problem: after manipulation/formatting the main string, I need to replace all the placeholders with their actual values again.
Conveniently I have them saved in this format:
// TypeScript
let substrings: { placeholderName: string; value: string }[];
But doing:
// JavaScript
let mainString1 = "main string $0 $1";
let mainString2 = "main string $0 $1";
let substrings = [
{ placeholderName: "$0", value: "test1 $1" },
{ placeholderName: "$1", value: "test2" }
];
for (const substr of substrings) {
mainString1 = mainString1.replace(substr.placeholderName, substr.value);
mainString2 = mainString2.replaceAll(substr.placeholderName, substr.value);
}
console.log(mainString1); // actual result: "main string test1 test2 $1"
console.log(mainString2); // actual result: "main string test1 test2 test2"
// wanted result: "main string test1 $1 test2"
is not an option since the substrings could include $x which would replace the wrong thing (by .replace() and by .replaceAll()).
Getting the substrings is achieved with a regex; maybe a regex could help here too? Though I have no control over what is saved inside the substrings...
If you're sure that all placeholders will follow the $x format, I'd go with the .replace() method with a callback:
const result = mainString1.replace(
/\$\d+/g,
placeholder => substrings.find(
substring => substring.placeholderName === placeholder
)?.value ?? placeholder
);
// result is "main string test1 $1 test2"
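If the list of substrings is long, the find() call inside the callback scans the array once per match. A minimal sketch of the same single-pass idea with a Map built once up front; the names restorePlaceholders and byName are illustrative and not part of the question:

```javascript
// Hypothetical helper: restores $n placeholders in a single pass.
function restorePlaceholders(mainString, substrings) {
  // Build the lookup table once instead of searching the array per match
  const byName = new Map(substrings.map(s => [s.placeholderName, s.value]));
  // The callback's return value is inserted literally and is never rescanned,
  // so a "$1" inside a value is left alone
  return mainString.replace(/\$\d+/g, p => (byName.has(p) ? byName.get(p) : p));
}

const restored = restorePlaceholders("main string $0 $1", [
  { placeholderName: "$0", value: "test1 $1" },
  { placeholderName: "$1", value: "test2" }
]);
console.log(restored); // "main string test1 $1 test2"
```

Unknown placeholders are left untouched, which mirrors the `?? placeholder` fallback above.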
This may not be the most efficient code. But here is the function I made with comments.
Note: be careful because if you put the same placeholder inside itself it will create an infinite loop. Ex:
{ placeholderName: "$1", value: "test2 $1" }
let mainString1 = "main string $0 $1";
let mainString2 = "main string $0 $1";
let substrings = [{
placeholderName: "$0",
value: "test1 $1"
},
{
placeholderName: "$1",
value: "test2"
},
];
function replacePlaceHolders(mainString, substrings) {
let replacedString = mainString
//Find every placeholder; the following line returns an array with all of them. Ex: ['$1', '$n']
let placeholders = replacedString.match(/\$[0-9]*/gm)
//while there is some place holder to replace
while (placeholders !== null && placeholders.length > 0) {
//We will iterate for each placeholder
placeholders.forEach(placeholder => {
//extract the value to insert
let value = substrings.filter(x => x.placeholderName === placeholder)[0].value
//replace it
replacedString = replacedString.replace(placeholder, value)
})
//Finally, check whether the replacement inserted any new placeholders. If it did, the loop starts again.
placeholders = replacedString.match(/\$[0-9]*/gm)
}
return replacedString
}
console.log(replacePlaceHolders(mainString1, substrings))
console.log(replacePlaceHolders(mainString2, substrings))
EDIT:
OK... I think I understand your problem now. You didn't want the placeholder-like strings inside your values to be replaced.
This version of the code should work as expected, and you won't have to worry about infinite loops here. However, be careful with your placeholders: "$" is a reserved character in regex, and there are more characters that you would have to escape. I assume all your placeholders will look like "$1", "$2", etc. If they do not, you should edit the regexPlaceholder function, which wraps the placeholder and escapes that character.
let mainString1 = "main string $0 $1";
let mainString2 = "main string $0 $1 $2";
let substrings = [
{ placeholderName: "$0", value: "$1 test1 $2 $1" },
{ placeholderName: "$1", value: "test2 $2" },
{ placeholderName: "$2", value: "test3" },
];
function replacePlaceHolders(mainString, substrings) {
//You need to escape the $ character, and maybe others, depending on how you build your placeholders
function regexPlaceholder(p) {
return new RegExp('\\' + p, "gm")
}
let replacedString = mainString
//Find every placeholder; the following line returns an array with all of them. Ex: ['$1', '$n']
let placeholders = replacedString.match(/\$[0-9]*/gm)
//if there is any placeHolder to replace
if (placeholders !== null && placeholders.length > 0) {
//Declare some variables to track whether the values contain anything that could be
//mistaken for a placeholder.
//Every such occurrence that gets masked is stored here and restored at the end
let replacedplaceholdersInValues = []
let indexofReplacedValue = 0
placeholders.forEach(placeholder => {
//extract the value to insert
let value = substrings.filter(x => x.placeholderName === placeholder)[0].value
//check whether the value contains a possible placeholder
let placeholdersInValues = value.match(/\$[0-9]*/gm)
if (placeholdersInValues !== null && placeholdersInValues.length > 0) {
placeholdersInValues.forEach(placeholdersInValue => {
//if it does, replace each one with a temporary mark so the main replacement won't change it
value = value.replace(regexPlaceholder(placeholdersInValue), "<markToReplace" + indexofReplacedValue + ">")
//and store every change to make a rollback later
replacedplaceholdersInValues.push({
placeholderName: placeholdersInValue,
value: "<markToReplace" + indexofReplacedValue + ">"
})
})
indexofReplacedValue++
}
//replace the actual placeholders
replacedString = replacedString.replace(regexPlaceholder(placeholder), value)
})
//if there were placeholder-like strings inside the values, change them back to normal
if (replacedplaceholdersInValues.length > 0) {
replacedplaceholdersInValues.forEach(replaced => {
replacedString = replacedString.replace(replaced.value, replaced.placeholderName)
})
}
}
return replacedString
}
console.log(replacePlaceHolders(mainString1, substrings))
console.log(replacePlaceHolders(mainString2, substrings))
The key is to choose a placeholder that is impossible in both the main string and the substrings. My trick is to use non-printable characters as the placeholder. My favorite is the NUL character (0x00), because most other people would not use it: C/C++ consider it to be the end of a string. JavaScript, however, is robust enough to handle strings that contain NUL (encoded as Unicode \u0000):
let mainString1 = "main string \0-0 \0-1";
let mainString2 = "main string \0-0 \0-1";
let substrings = [
{ placeholderName: "\0-0", value: "test1 $1" },
{ placeholderName: "\0-1", value: "test2" }
];
The rest of your code does not need to change.
Note that I'm using the - character to prevent javascript from interpreting your numbers 0 and 1 as part of the octal \0.
If you have an aversion to \0 like most programmers, then you can use any other non-printing character, like \x01 (start of heading) or \x07 (the character that makes your terminal make a bell sound; in octal it is 007, also James Bond), etc.
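Two details are worth guarding against when restoring such placeholders. A minimal sketch, assuming a slightly different placeholder format than the one above (NUL, index, NUL; the helper name placeholderFor is made up for illustration): the closing NUL keeps the placeholder for 1 from matching the start of the one for 10, and a function replacer keeps sequences like $& or $$ inside a saved value from being interpreted as replacement patterns.

```javascript
// "\0" directly followed by a digit would be read as an octal escape,
// so the NUL character is kept in its own constant.
const NUL = "\0";

// Hypothetical placeholder format: NUL + index + NUL
const placeholderFor = i => `${NUL}${i}${NUL}`;

const savedValues = ['"$0"', '"price: $& and $$"'];
let text = `print ${placeholderFor(0)} then ${placeholderFor(1)}`;

savedValues.forEach((value, i) => {
  // A function replacer inserts its return value literally
  text = text.replaceAll(placeholderFor(i), () => value);
});

console.log(text); // print "$0" then "price: $& and $$"
```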
Related
I am trying to split a string so that I can separate it depending on a pattern. I'm having trouble getting the correct regex pattern to do so. I also need to insert the results into an array of objects. Perhaps by using a regex pattern, the string can be split into a resulting array object to achieve the objective. Note that the regex pattern must not discriminate between - or --. Or is there any better way to do this?
I tried using string split() method, but to no avail. I am trying to achieve the result below:
const example1 = `--filename test_layer_123.png`;
const example2 = `--code 1 --level critical -info "This is some info"`;
const result1 = [{ name: "--filename", value: "test_layer_123.png" }];
const result2 = [
{ name: "--code", value: "1" },
{ name: "--level", value: "critical" },
{ name: "-info", value: "This is some info" },
];
If you really want to use a regex to solve this, try this pattern: /((?:--|-)\w+)\s+"?([^-"]+)"?/g
Code example:
function matchAllCommands(text, pattern){
let new_array = [];
let matches = text.matchAll(pattern);
for (const match of matches){
// [^-"]+ also captures the space before the next dash, so trim the value
new_array.push({name: match.groups.name, value: match.groups.value.trim()});
}
return new_array;
}
let RegexPattern = /(?<name>(?:--|-)\w+)\s+"?(?<value>[^-"]+)"?/g;
let text = '--code 1 --level critical -info "This is some info"';
console.log(matchAllCommands(text, RegexPattern));
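Because the value class [^-"]+ stops at any hyphen, a value such as test-layer.png would be cut short. A possible variant, assuming every value is either double-quoted or a single token without whitespace (the function name parseArgs and the group names quoted and bare are made up for this sketch):

```javascript
// Hypothetical variant: a value is either "double quoted" or one
// whitespace-free token, so hyphens inside values are allowed.
function parseArgs(text) {
  const pattern = /(?<name>--?\w+)\s+(?:"(?<quoted>[^"]*)"|(?<bare>\S+))/g;
  return Array.from(text.matchAll(pattern), m => ({
    name: m.groups.name,
    // Only one of the two groups takes part in a match; the other is undefined
    value: m.groups.quoted ?? m.groups.bare
  }));
}

console.log(parseArgs('--filename test-layer_123.png'));
console.log(parseArgs('--code 1 --level critical -info "This is some info"'));
```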
Here is a solution that splits the argument string using a positive lookahead, and creates the array of key & value pairs using a map:
function getArgs(str) {
return str.split(/(?= --?\w+ )/).map(str => {
let m = str.match(/^ ?([^ ]+) (.*)$/);
return {
name: m[1],
value: m[2].replace(/^"(.*)"$/, '$1')
};
});
}
[
'--filename test_layer_123.png', // example1
'--code 1 --level critical -info "This is some info"' // example2
].forEach(str => {
var result = getArgs(str);
console.log(JSON.stringify(result, null, ' '));
});
Positive lookahead regex for split:
(?=            -- positive lookahead start
[ ]--?\w+[ ]   -- expect a space, 1 or 2 dashes, 1+ word chars, a space ([ ] stands for a literal space)
)              -- positive lookahead end
Match regex in map:
^         -- anchor at start of string
[ ]?      -- optional space ([ ] stands for a literal space)
([^ ]+)   -- capture group 1: capture everything up to the next space
[ ]       -- space
(.*)      -- capture group 2: capture everything that's left
$         -- anchor at end of string
I am trying to implement a replacement mechanism for strings, like prepared statements, that is evaluated dynamically in JavaScript. I have replacements like
[{username:"Max",age:10}]
E.g., assume we have the input string (username) is (age); a find and replace by attribute name and value is easy.
However, I want something more advanced, where parentheses are 'identified' and evaluated from the inner to the outer, e.g. for the input:
[{username:"Max",age:10,myDynamicAttribute:"1",label1:'awesome', label2:'ugly'}]
and the string
(username) is (age) and (label(myDynamicAttribute))
In the first iteration of replacements the string should become
(username) is (age) and (label1)
and in the second, Max is 10 and awesome. Is there any tool or pattern that I can use to 'understand' the inner parentheses first and then evaluate the others? I tried regexes, but I wasn't able to create one that matches the inner parentheses first and then the outer.
You could tokenise the string and use a recursive replacer that traverses the tokens in one pass. If text within parentheses does not match with an object property, they are left as they are. When parentheses occur in the string that is retrieved from the object, they are taken as literals, and no attempt is made to perform a lookup on those again.
function interpolate(encoded, lookup) {
const tokens = encoded.matchAll(/[^()]+|./g);
function* dfs(end="") {
while (true) {
const {value, done} = tokens.next();
if (value == end || done) return;
if (value != "(") yield value;
else {
const key = [...dfs(")")].join("");
yield lookup[key] ?? `(${key})`;
}
}
}
return [...dfs()].join("");
}
// Example run
const lookup = {
username: "Max",
age: 10,
myDynamicAttribute: "1",
label1019: 'awesome(really)', // These parentheses are treated as literals
really: "not", // ...so that this will not be retrieved
label2: 'ugly',
};
const str = "1) (username) is (age) (uh!) and (label(myDynamicAttribute)0(myDynamicAttribute)9)"
const res = interpolate(str, lookup);
console.log(res);
We can write a regular expression that finds a parenthesized expression which contains no internal parentheses, use the expression's internals as a key for our data object, replace the whole expression with that value, and then recur. We would stop when the string contains no such parenthesized expressions, and return the string intact.
Here's one way:
const interpolate = (data, str, parens = str .match (/\(([^\(\)]+)\)/)) =>
parens ? interpolate (data, str. replace (parens [0], data [parens [1]])) : str
const data = {username: 'Max', age: 10, myDynamicAttribute: '1', label1: 'awesome', label2: 'ugly'}
const str = `(username) is (age) and (label(myDynamicAttribute))`
console .log (interpolate (data, str))
This would lead to a sequence of recursive calls with these strings:
"(username) is (age) and (label(myDynamicAttribute))",
"Max is (age) and (label(myDynamicAttribute))",
"Max is 10 and (label(myDynamicAttribute))",
"Max is 10 and (label1)",
"Max is 10 and awesome"
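Two edge cases may matter with the recursive version: a key that is missing from the data is replaced by the text "undefined", and a value that contains its own key in parentheses would recurse without end. A hedged sketch of an iterative variant that leaves unknown keys in place and caps the number of passes (the name interpolateSafe and the maxPasses limit are assumptions for illustration):

```javascript
// Hypothetical iterative variant; maxPasses is an assumed safety limit.
function interpolateSafe(data, str, maxPasses = 100) {
  const innermost = /\(([^()]+)\)/g; // parentheses with no parentheses inside
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = str.replace(innermost, (whole, key) =>
      // Unknown keys are kept as they are instead of becoming "undefined"
      Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : whole
    );
    if (next === str) break; // nothing left to resolve
    str = next;
  }
  return str;
}

const profile = { username: 'Max', age: 10, myDynamicAttribute: '1', label1: 'awesome' };
console.log(interpolateSafe(profile, '(username) is (age) and (label(myDynamicAttribute))'));
// Max is 10 and awesome
```

Unlike the one-match-per-call recursion, each pass here resolves every innermost expression at once, so the example finishes in two replacing passes.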
I am wanting / needing to split a string by a specific character, for instance a '/' that I can reliably expect, but I need to know what the characters directly in front of that character are up to the space before those characters.
For example:
let myStr = "bob u/ used cars nr/ no resale value i/ information is attached to the vehicle tag bb/ Joe's wrecker service"
So, I can split by the '/' already using
mySplitStr = myStr.split('/');
But now mySplitStr is an array like
mySplitStr[0] = "bob u"
mySplitStr[1] = " used cars nr"
mySplitStr[2] = " no resale value i"
etc
I need, however, to know what the characters are just prior to the '/'.
u
nr
i
etc
so that I know what to do with the information following the '/'.
Any help is greatly appreciated.
You could use this regular expression argument for the split:
let parts = myStr.split(/\s*(\S+)\/\s*/);
Now you will have the special characters at every odd position in the resulting array.
let myStr = "bob u/ used cars nr/ no resale value i/ information is attached to the vehicle tag bb/ Joe's wrecker service";
let parts = myStr.split(/\s*(\S+)\/\s*/);
console.log(parts);
For a more structured result, you could use these special character combinations as keys of an object:
let myStr = "bob u/ used cars nr/ no resale value i/ information is attached to the vehicle tag bb/ Joe's wrecker service";
let obj = myStr.split(/\s*(\S+)\/\s*/).reduceRight( (acc, v) => {
if (acc.default === undefined) {
acc.default = v;
} else {
acc[v] = acc.default;
acc.default = undefined;
}
return acc;
}, {});
console.log(obj);
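If the order of the entries matters, or the same code can appear more than once, an object keyed by code would lose information. A small sketch that pairs each odd-position code with the text that follows it (the property names code and text, and the shortened input, are chosen only for this example):

```javascript
const input = "bob u/ used cars nr/ no resale value i/ information bb/ Joe's wrecker service";
const pieces = input.split(/\s*(\S+)\/\s*/);

// pieces[0] is the text before the first code; after that,
// codes (odd positions) and texts (even positions) alternate.
const leading = pieces[0];
const entries = [];
for (let i = 1; i < pieces.length; i += 2) {
  entries.push({ code: pieces[i], text: pieces[i + 1] });
}

console.log(leading); // "bob"
console.log(entries);
```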
I think, this is what you're looking for:
"bob u/ used cars nr/ no resale value i/ information is attached to the vehicle tag bb/ Joe's wrecker service"
.split('/')
.map(splitPart => {
const wordsInPart = splitPart.split(' ');
return wordsInPart[wordsInPart.length - 1];
});
// Produces: ["u", "nr", "i", "bb", "service"]
Splitting by '/' is not enough. You also need to visit every part of your split result and extract the last "word" from it.
After you split your string, you indeed get an array, where the last set of characters is the one you want to know, and you can grab it with this:
let mySplitStr = myStr.split('/');
for(let i = 0; i < mySplitStr.length; i++) {
let mySplitStrEl = mySplitStr[i].split(" "); // Split current text element
let lastCharsSet = mySplitStrEl[mySplitStrEl.length -1]; // Grab its last set of characters
let myCurrentStr = mySplitStrEl.splice(mySplitStrEl.length -1, 1); // Remove last set of characters from text element
myCurrentStr = mySplitStrEl.join(" "); // Join text element back into a string
switch(lastCharsSet) {
case "u":
// Your code here
break; // without break, execution falls through into the next case
case "nr":
// Your code here
break;
case "i":
// Your code here
break;
}
}
Inside the loop, for the first iteration:
// lastCharsSet is "u"
// myCurrentStr is "bob"
I have the following string
133. Alarm (Peep peep)
My goal is to split the string using regex into 3 parts and store it as a json object, like
{
"id": "133",
"Title": "Alarm",
"Subtitle": "Peep peep"
}
I can get the number using
function getID(text){
let numberPattern = /\d+/g;
let id = text.match(numberPattern);
if(id){
return id[0];
}
}
and the text between braces using
function getSubtitle(text){
let braces = /\((.*)\)/i;
let subtitle = text.match(braces);
if(subtitle){
return subtitle[1];
}
}
I'm wondering if I can get the three values from the string using a single regex expression (assuming that I will apply it on a long list of that string shape)
You can do this:
const data = '133. Alarm (Peep peep)'
const getInfo = data => {
let [,id, title, subtitle] = data.match(/(\d+)\.\s*(.*?)\s*\((.*?)\)/)
return { id, title, subtitle }
}
console.log(getInfo(data))
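Since the pattern is meant to run over a long list, a line that does not fit the shape would make the destructuring above throw, because match() returns null. A sketch with named groups and a null check, assuming that malformed lines should simply be skipped (the helper name parseLine and the sample lines are made up):

```javascript
// Hypothetical helper: returns null instead of throwing on a non-matching line
const linePattern = /^(?<id>\d+)\.\s*(?<title>.*?)\s*\((?<subtitle>.*)\)\s*$/;

function parseLine(line) {
  const match = line.match(linePattern);
  // Copy the named groups into a plain object
  return match ? { ...match.groups } : null;
}

const lines = ['133. Alarm (Peep peep)', 'not a match', '7. Door bell (Ding dong)'];
const parsed = lines.map(parseLine).filter(Boolean);
console.log(parsed);
```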
Something like
let partsPattern = /(\d+)\.\s*(.*\S)\s*\((.*)\)/
JavaScript regexes do not support POSIX character classes such as [:space:], so \S (any non-whitespace character) is used here to make the title end on a non-space character.
This should capture all the three parts inside the respective submatches (numbers 1, 2 and 3).
You could use one function. exec() will return null if no matches are found, else it will return the matched string, followed by the matched groups. With id && id[1] a check is performed to not access the second element of id for when a match is not found and id === null.
The second element is used id[1] instead of id[0] because the first element will be the matched string, which will contain the dots and whitespace that helped find the match.
var str = "133. Alarm (Peep peep)";
function getData(str) {
var id = (/(\d+)\./).exec(str);
var title = (/\s+(.+)\s+\(/).exec(str);
var subtitle = (/\((.+)\)/).exec(str);
return {
"id": id && id[1],
"Title": title && title[1],
"Subtitle": subtitle && subtitle[1]
};
}
console.log(getData(str));
The question is simple. I have a string that contains multiple elements which are embedded in single-quotation marks:
var str = "'alice' 'anna marie' 'benjamin' 'christin' 'david' 'muhammad ali'"
And I want to parse it so that I have all those names in an array:
result = [
'alice',
'anna marie',
'benjamin',
'christin',
'david',
'muhammad ali'
]
Currently I'm using this code to do the job:
var result = str.match(/\s*'(.*?)'\s*'(.*?)'\s*'(.*?)'\s*'(.*?)'/);
But this regular expression is too long and it's not flexible, so if I have more elements in the str string, I have to edit the regular expression.
What is the fastest and most efficient way to do this parsing? Performance and flexibility are important in our web application.
I have looked at the following question but they are not my answer:
Regular Expression For Quoted String
Regular Expression - How To Find Words and Quoted Phrases
Define the pattern once and use the global g flag.
var matches = str.match(/'[^']*'/g);
If you want the tokens without the single quotes around them, the normal approach would be to use sub-matches in the regex. However, in JavaScript, String.prototype.match() returns only the full matches, not the sub-groups, when the g flag is used. The simplest (though not necessarily most efficient) way around this would be to remove the quotes afterwards, iteratively:
if (matches)
for (var i=0, len=matches.length; i<len; i++)
matches[i] = matches[i].replace(/'/g, '');
[EDIT] - as the other answers say, you could use split() instead, but only if you can rely on there always being a space (or some common delimiter) between each token in your string.
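In newer JavaScript engines, String.prototype.matchAll() does expose the capture groups of every match, so the quotes can be dropped in one step. A short sketch (the variable names and the shortened input are only for illustration):

```javascript
const names = "'alice' 'anna marie' 'benjamin'";

// matchAll requires the g flag and yields one match object per hit;
// index 1 of each match is the first capture group, i.e. the text without quotes.
const tokens = Array.from(names.matchAll(/'([^']*)'/g), m => m[1]);

console.log(tokens); // [ 'alice', 'anna marie', 'benjamin' ]
```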
A different approach
I came here needing an approach that could parse a string into quoted and non-quoted parts, preserve their order, and then output them wrapped in specific tags for React or React Native. I wasn't sure how to make the answers here fit that need, so I did this instead.
function parseQuotes(str) {
var openQuote = false;
var parsed = [];
var quote = '';
var text = '';
for (var i = 0; i < str.length; i++) {
var item = str[i];
if (item === '"' && !openQuote) {
openQuote = true;
parsed.push({ type: 'text', value: text });
text = '';
}
else if (item === '"' && openQuote) {
openQuote = false;
parsed.push({ type: 'quote', value: quote });
quote = '';
}
else if (openQuote) quote += item;
else text += item;
}
if (openQuote) parsed.push({ type: 'text', value: '"' + quote });
else parsed.push({ type: 'text', value: text });
return parsed;
}
That when given this:
'Testing this "shhhh" if it "works!" " hahahah!'
produces that:
[
{
"type": "text",
"value": "Testing this "
},
{
"type": "quote",
"value": "shhhh"
},
{
"type": "text",
"value": " if it "
},
{
"type": "quote",
"value": "works!"
},
{
"type": "text",
"value": " "
},
{
"type": "text",
"value": "\" hahahah!"
}
]
which allows you to easily wrap tags around it depending on what it is.
https://jsfiddle.net/o6seau4e/4/
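For comparison, a more compact sketch of the same idea: split() with a capture group keeps the captured quotes in the result, so text and quoted parts alternate in the array. It segments the unterminated tail slightly differently (one text piece instead of two), and the <q> tag and the name wrapQuotes are only placeholders for whatever wrapper is needed:

```javascript
// Hypothetical helper; the <q> tag is only an illustration
function wrapQuotes(str) {
  return str
    .split(/"([^"]*)"/) // odd positions hold the content of closed quotes
    .map((part, i) => (i % 2 === 1 ? `<q>${part}</q>` : part))
    .join('');
}

console.log(wrapQuotes('Testing this "shhhh" if it "works!" " hahahah!'));
// Testing this <q>shhhh</q> if it <q>works!</q> " hahahah!
```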
When a regex object has the global flag set, you can execute it multiple times against a string to find all matches. It works by starting the next search after the last character matched in the previous run:
var buf = "'abc' 'def' 'ghi'";
var exp = /'(.*?)'/g;
for(var match=exp.exec(buf); match!=null; match=exp.exec(buf)) {
alert(match[0]);
}
Personally, I find it a really good way to parse strings.
EDIT: the expression /'(.*?)'/g matches any content between single-quote ('), the modifier *? is non-greedy and it greatly simplifies the pattern.
One way;
var str = "'alice' 'benjamin' 'christin' 'david'";
var result = {};
str.replace(/'([^']*)'/g, function(m, p1) {
result[p1] = "";
});
for (var k in result) {
alert(k);
}
If someone gets here needing more complex string parsing, with both single and double quotes and the ability to escape the quote character, this is the regex. Tested in JS and Ruby.
r = /(['"])((?:\\\1|(?!\1).)*)(\1)/g
str = "'alice' ddd vvv-12 'an\"na m\\'arie' \"hello ' world\" \"hello \\\" world\" 'david' 'muhammad ali'"
console.log(str.match(r).join("\n"))
'alice'
'an"na m\'arie'
"hello ' world"
"hello \" world"
'david'
'muhammad ali'
See that non-quoted strings were not found. If the goal is to also find non-quote words then a small fix will do:
r = /(['"])((?:\\\1|(?!\1).)*)(\1)|([^'" ]+)/g
console.log(str.match(r).join("\n"))
'alice'
ddd
vvv-12
'an"na m\'arie'
"hello ' world"
"hello \" world"
'david'
'muhammad ali'
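The matches above still include the surrounding quotes and the escaping backslashes. A sketch of one way to get the bare contents, assuming that only the quote character itself is ever escaped (the names quotedPattern, sample and contents are illustrative):

```javascript
const quotedPattern = /(['"])((?:\\\1|(?!\1).)*)\1/g;
const sample = `'alice' ddd 'an"na m\\'arie' "hello \\" world"`;

// Group 1 is the quote character, group 2 the raw content between the quotes.
// Every backslash + quote pair in the content is turned back into a plain quote.
const contents = Array.from(sample.matchAll(quotedPattern), ([, quote, body]) =>
  body.split('\\' + quote).join(quote)
);

console.log(contents);
```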