How to replace all substrings enclosed by {} in a string? - javascript

I have to address an algo problem with vanilla JS.
I have a string:
let stringExample = "my {cat} has {2} ears";
I want to replace each {...} placeholder, in order, with the items of an array of replacements:
const replaceArray = ["dog", "4"];
So that after the replacement, the stringExample should be:
"my dog has 4 ears"

One approach that can be used is as follows:
// a sample of Strings and their corresponding replacement-arrays:
let string1 = "my {cat} has {2} ears",
arr1 = ["dog", "4"],
string2 = "They call me ~Ishmael~",
arr2 = ['Susan'],
string3 = "It is a ##truth## ##universally acknowledged##...",
arr3 = ['questionable assertion', 'oft cited'];
// here we use a named Arrow function which takes three arguments:
// haystack: String, the string which features the words/characters to
// be replaced,
// needles: Array of Strings with which to replace the identified
// groups in the 'haystack',
// delimiterPair: Array of Strings, these strings represent the characters with
// which the words/characters to be replaced may be identified;
// the first String (index 0) is the character marking the beginning
// of the capture group, and the second (index 1) indicates the
// end of the captured group; this is supplied with the default
// curly-braces:
const replaceWords = (haystack, needles, delimiterPair = ['{', '}']) => {
// we use destructuring assignment, to assign the string at
// index 0 to the 'start' variable, the string at index 1 to
// the 'end' variable; if no argument is provided the default
// curly-braces are used/assigned:
const [start, end] = delimiterPair,
// here we construct a regular expression, using a template
// string which interpolates the variables within the string;
// the regular expression is composed of:
// 'start' (eg: '{')
// .+ : any character that appears one or more times,
// ? : lazy quantifier so the expression matches the
// shortest possible string,
// 'end' (eg: '}'),
// matched with the 'g' (global) flag to replace all
// matches within the supplied string.
// this gives a regular expression of: /{.+?}/g
regexp = new RegExp(`${start}.+?${end}`, 'g')
// here we compose a String using the template literal, to
// interpolate the 'haystack' variable and also a tab-character
// concatenated with the result returned by String.prototype.replace()
// we use an anonymous function to supply the replacement-string:
//
return `"${haystack}":\t` + haystack.replace(regexp, () => {
// here we remove the first element from the array of replacements
// provided to the function, and return that to the string as the
// replacement:
return needles.shift();
})
}
console.log(replaceWords(string1, arr1));
console.log(replaceWords(string2, arr2, ['~', '~']));
console.log(replaceWords(string3, arr3, ['##', '##']));
This has no sanity checks at all, nor have I explored to find any edge-cases. If at all possible I would seriously recommend using an open source – and well-tested, well-proofed – framework or templating library.
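As a rough sketch of how a couple of those missing checks might look, the variant below escapes the delimiters before building the regular expression and reads the replacements by index instead of calling shift(), so the caller's array is left intact. The names replacePlaceholders and escapeRegExp are hypothetical and not part of the answer above, and the choice to leave a placeholder untouched when the replacements run out is an assumption.

```javascript
// Escape characters that have a special meaning inside a regular expression.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: replaces each delimited placeholder, in order,
// with the next item of `needles`.
function replacePlaceholders(haystack, needles, [start, end] = ['{', '}']) {
  const pattern = new RegExp(`${escapeRegExp(start)}.+?${escapeRegExp(end)}`, 'g');
  let i = 0; // index into needles; the array itself is never modified
  // Assumption: when the replacements run out, the placeholder is left as-is.
  return haystack.replace(pattern, (match) => (i < needles.length ? needles[i++] : match));
}

console.log(replacePlaceholders("my {cat} has {2} ears", ["dog", "4"]));
// my dog has 4 ears
```

Because the delimiters are escaped, pairs such as ['(', ')'] or ['[', ']'] can be passed without breaking the pattern.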

Related

How can I check for exactly one occurrence of each provided symbol?

I am given an array of symbols, which can vary, for instance ['#']. Each symbol must occur in the string, but no more than once.
Now I do like this:
const regex = new RegExp(`^\\w+[${validatedSymbols.join()}]\\w+$`);
But this also rejects strings that contain other signs, such as '='. For example:
/^\w+[#]\w+$/.test('string#=string') // false
So, the result I expect:
'string#string' - ok
'string##string' - not ok
Using a complex regex is most likely not the best solution. I think you would be better off creating a validation function.
In this function you can find all occurrences of the provided symbols in the string. Then return false if no occurrences are found, or if the list of occurrences contains duplicate entries.
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping
const escapeRegExp = (string) => string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
function validate(string, symbols) {
if (symbols.length == 0) {
throw new Error("at least one symbol must be provided in the symbols array");
}
const symbolRegex = new RegExp(symbols.map(escapeRegExp).join("|"), "g");
const symbolsInString = string.match(symbolRegex); // <- null if no match
// string must at least contain 1 occurrence of any symbol
if (!symbolsInString) return false;
// symbols may only occur once
const hasDuplicateSymbols = symbolsInString.length != new Set(symbolsInString).size;
return !hasDuplicateSymbols;
}
const validatedSymbols = ["#", "="];
const strings = [
"string!*string", // invalid (doesn't have "#" nor "=")
"string#!string", // valid
"string#=string", // valid
"string##string", // invalid (max 1 occurrence per symbol)
];
console.log("validatedSymbols", "=", JSON.stringify(validatedSymbols));
for (const string of strings) {
const isValid = validate(string, validatedSymbols);
console.log(JSON.stringify(string), "//=>", isValid);
}
I think you are looking for the following:
const regex = new RegExp(`^\\w+[${validatedSymbols.join()}]?\\w+$`);
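The question mark means 0 or 1 of the preceding token (here, the character class).
You might also need to escape the symbols in validatedSymbols, as some symbols have a special meaning in regex. Note too that join() with no argument separates the items with commas, so pass an empty string, join(''), to keep ',' out of the character class.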
Edit:
For mandatory symbols it would be easier to add a group per symbol:
^\w+(#\w*){1}(#\w*){1}\w+$
Where the group is:
(#\w*){1}
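If every provided symbol is mandatory and may appear only once, counting occurrences directly may be simpler than generating a group per symbol. The following is a minimal sketch under that reading of the question; hasEachSymbolOnce is a hypothetical name, and the symbols are assumed to be non-empty strings.

```javascript
// Hypothetical helper: true only if every symbol occurs exactly once in the string.
// Splitting on a symbol yields (occurrences + 1) pieces, so no regex escaping is needed.
function hasEachSymbolOnce(string, symbols) {
  return symbols.every((symbol) => string.split(symbol).length - 1 === 1);
}

console.log(hasEachSymbolOnce("string#string", ["#"]));  // true
console.log(hasEachSymbolOnce("string##string", ["#"])); // false
```

Unlike the regex versions, this does not constrain the characters around the symbols; that check would have to be added separately if it is needed.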

How do you access the groups of match/matchAll like an array?

Here's what I would like to be able to do:
function convertVersionToNumber(line) {
const groups = line.matchAll(/^# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/g);
return parseInt(groups[1] + groups[2] + groups[3]);
}
convertVersionToNumber("# 1.03.00")
This doesn't work because groups is an IterableIterator<RegExpMatchArray>, not an array. Array.from doesn't seem to turn it into an array of groups either. Is there an easy way (ideally something that can fit on a single line) that can convert groups into an array?
The API of that IterableIterator<RegExpMatchArray> is a little inconvenient, and I don't know how to skip the first element in a for...of. I mean, I do know how to use both of these, it just seems like it's going to add 4+ lines so I'd like to know if there is a more concise way.
I am using typescript, so if it has any syntactic sugar to do this, I'd be happy to use that.
1) matchAll returns an iterator (a RegExp String Iterator), not an array.
Spreading it into an array with [...result] collects every match. The string contains only one match here, so the array has a single element, and that element holds the groups:
[ '# 1.03.00', '1', '03', '00', index: 0, input: '# 1.03.00', groups: undefined ]
function convertVersionToNumber(line) {
const result = line.matchAll(/^# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/g);
const groups = [...result][0];
return parseInt(groups[1] + groups[2] + groups[3]);
}
console.log(convertVersionToNumber("# 1.03.00"));
Because the regex /^# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/ is anchored with ^ and has no m flag, it can match at most once, so taking element 0 is enough.
2) If there can be multiple matches, you can spread the results into an array and then use for...of to loop over the matches:
function convertVersionToNumber(line) {
const iterator = line.matchAll(/# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/g);
const results = [...iterator];
for (let arr of results) {
const [match, g1, g2, g3] = arr;
console.log(match, g1, g2, g3);
}
}
convertVersionToNumber("# 1.03.00 # 1.03.00");
Alternate solution: you can get the same result with a plain match, which here collects every digit in the line:
function convertVersionToNumber(line) {
const result = line.match(/\d/g);
return +result.join("");
}
console.log(convertVersionToNumber("# 1.03.00"));
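For a single match, match() without the g flag already returns the full match followed by the capture groups as an array (or null), so the groups can be destructured in one line. This is only a sketch: returning NaN when nothing matches is an assumption, and the dots are escaped so they match literally.

```javascript
function convertVersionToNumber(line) {
  // Without the g flag, match() returns [fullMatch, group1, group2, ...] or null.
  const [, major, minor, patch] = line.match(/^# (\d)\.(\d{2})\.(\d{2})/) ?? [];
  // Assumption: report "no match" as NaN rather than throwing.
  return major === undefined ? NaN : parseInt(major + minor + patch, 10);
}

console.log(convertVersionToNumber("# 1.03.00")); // 10300
```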
You do not need .matchAll in this concrete case. You simply want to match a string in a specific format and re-format it by only keeping the three captured substrings.
You may do it with .replace:
function convertVersionToNumber(line) {
return parseInt(line.replace(/^# (\d)\.(\d{2})\.(\d{2})[\s\S]*/, '$1$2$3'));
}
console.log( convertVersionToNumber("# 1.03.00") );
If you need to know whether there was a match at all, compare the string before and after replacing: if they are equal, nothing matched.
Note you need to escape dots to match them as literal chars.
The ^# (\d)\.(\d{2})\.(\d{2})[\s\S]* pattern matches
^ - start of string
# - a # followed by a space
(\d) - Group 1: a digit
\. - a dot
(\d{2}) - Group 2: two digits
\. - a dot
(\d{2}) - Group 3: two digits
[\s\S]* - the rest of the string (zero or more chars, as many as possible).
The $1$2$3 replacement pattern is the concatenated Group 1, 2 and 3 values.
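The match check mentioned in this answer could be sketched as follows. The function name and the choice to return null for a non-matching line are assumptions made for illustration.

```javascript
// Hypothetical variant: returns null when the line does not match the version format.
function convertVersionToNumberOrNull(line) {
  const digits = line.replace(/^# (\d)\.(\d{2})\.(\d{2})[\s\S]*/, '$1$2$3');
  // replace() returns the input unchanged when the pattern does not match.
  return digits === line ? null : parseInt(digits, 10);
}

console.log(convertVersionToNumberOrNull("# 1.03.00"));  // 10300
console.log(convertVersionToNumberOrNull("version 1.3")); // null
```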

Wrap a regular expression so that it only matches the whole string

Related: Regex - Match whole string
I want a given regexp to match the whole string. For example, when given a regexp /abc/, it should only match the string "abc" but not "abcd". I found the question linked above, which describes a situation similar to mine. The difference is that the regexp is not written directly in my code, so I can't simply change /abc/ to /^abc$/ in the source.
Let's say, I want a function which takes two arguments: a 'regexp' (e.g. /abc/) and a string (e.g. 'abc'). The function returns matched result if and only if the given regexp matches the whole string, but returns null otherwise.
What I'm trying:
function match(regexp, string) {
var parse = /\/(.*)\/(.*)/.exec(regexp);
var reg = new RegExp('^' + parse[1] + '$', parse[2]);
return string.match(reg);
}
Is this code correct? Any better way to do so?
You transform a regex such as /a+|b+/ into ^a+|b+$, which still matches e.g. 'xbb'. A solution would be to wrap the inner regex in a non-capturing group: ^(?:a+|b+)$.
Also, you currently truncate useful regex flags such as /.../m or /.../i.
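A minimal sketch of the wrapping approach, using the regex's own source and flags properties rather than parsing its string form. matchWhole is a hypothetical name. The g flag is dropped here so that match() returns the capture groups; note that if the regex carries the m flag, ^ and $ would anchor to lines rather than to the whole string.

```javascript
// Hypothetical helper: match only if `regexp` matches the entire string.
function matchWhole(regexp, string) {
  // Wrap the pattern in a non-capturing group so alternations stay inside the anchors.
  const flags = regexp.flags.replace('g', '');
  const anchored = new RegExp('^(?:' + regexp.source + ')$', flags);
  return string.match(anchored);
}

console.log(matchWhole(/a+|b+/, 'xbb')); // null
console.log(matchWhole(/a+|b+/, 'bbb')[0]); // bbb
```

Because the anchors are part of the pattern, the engine can backtrack to find a whole-string match, for example /a|ab/ against 'ab'.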
Alternatively, you could simply use the original regex and check if the result covers the whole input string length:
function fullmatch(regex, string) {
const result = string.match(regex);
return !!result && result[0].length === string.length;
}
// Example:
console.log(fullmatch(/a+|b+/, 'xbb')); // false
console.log(fullmatch(/a+|b+/, 'bbb')); // true
// Respects lazy quantifier:
console.log(fullmatch(/b+?/, 'bbb')); // false
console.log(fullmatch(/b+/, 'bbb')); // true
// Invariant to global flag g:
console.log(fullmatch(/b+?/g, 'bbb')); // false
console.log(fullmatch(/b+/g, 'bbb')); // true

Regex matching comma delimited strings

Given any of the following strings, where operator and value are just placeholders:
"operator1(value)"
"operator1(value), operator2(value)"
"operator1(value), operator2(value), operator_n(value)"
I need to be able to match so I can get each operator and its value as follows:
[[operator1, value]]
[[operator1, value], [operator2, value]]
[[operator1, value], [operator2, value], [operator_n, value]]
Please Note: There could be n number of operators (comma delimited) in the given string.
My current attempt will match on operator1(value) but nothing with multiple operators. See regex101 for the results.
/^(.*?)\((.*)\)$/
You should be able to do this with a single regex using the global flag.
var re= /(?:,\s*)?([^(]+?)\(([^)]+)\)/g;
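// exec() returns only one match per call, so collect every match with matchAll()
var results = Array.from(str.matchAll(re), function (m) { return [m[1], m[2]]; });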
See the result at Regex 101: https://regex101.com/r/eC3uK3/2
Here's a pure regex answer to this question. It will work as long as your entries are always separated by a comma and a space, and it should handle input that spans multiple lines without much issue:
https://regex101.com/r/eC3uK3/4
([^\(]*)(\([^, ]*\))(?:, )?(?:\n)?
Matches on:
operator1(value), operator2(value), operator_n(value),
operator1(value), operator2(value)
Explanation:
So, this sets up 2 capture groups and 2 non-capture groups.
The first capture group matches an operator name up to a parenthesis (using a greedy negated set). The second capture group grabs the parentheses and the value between them. (You can drop the parentheses from the capture by escaping the outer set of parentheses rather than the inner; example here: https://regex101.com/r/eC3uK3/6.) There's an optional ", " in a non-capturing group, and an optional "\n" in another non-capturing group to handle any newline characters you may come across.
This should break your data out into:
'operator1'
'(value)'
'operator2'
'(value)'
For as many as there are.
You can do this by first splitting then using a regular expression:
[
"operator1(value)",
"operator1(value), operator2(value)",
"operator1(value), operator2(value), operator_n(value)"
].forEach((str)=>{
var results = str
.split(/[,\s]+/) // split operations
.map(s=>s.match(/(\w+)\((\w+)\)/)) // extracts parts of the operations
.filter(Boolean) // ensure there's no error (in case of impure entries)
.map(s=>s.slice(1)); // make the desired result
console.log(results);
});
The following function, check, does what you are looking for. If you want a string instead of an array of results, call .toString() on the array returned from the function.
function check(str) {
var myRe = /([^(,\s]*)\(([^)]*)\)/g;
var myArray;
var result = [];
while ((myArray = myRe.exec(str)) !== null) {
result.push(`[${myArray[1]}, ${myArray[2]}]`);
};
return result;
}
var check1 = check("operator1(value)");
console.log("check1", check1);
var check2 = check("operator1(value), operator2(value)");
console.log("check2", check2);
var check3 = check("operator1(value), operator2(value), operator_n(value)");
console.log("check3", check3);
This can also be done with a simple split and a for loop.
var data = "operator1(value), operator2(value), operator_n(value)",
ops = data.substring(0, data.length - 1), // Remove the last parenthesis from the string
arr = ops.split(/\(|\), /),
res = [], n, eN = arr.length;
for (n = 0; n < eN; n += 2) {
res.push([arr[n], arr[n + 1]]);
}
console.log(res);
The code creates a flattened array from a string, and then nests arrays of "operator"/"value" pairs to the result array. Works for older browsers too.

Tokenize a JavaScript String depending on the characters

In JavaScript, let's say I have a String like "23+var-5/422*b".
I want to split this String so that I get [23,+,var,-,5,/,422,*,b].
I want to tokenize it so that I split the string into 3 types of tokens:
Numerical literals, [0-9].
String literals, [A-z].
Operator characters, [-+*/].
So basically, go through the string, and for each "cluster of characters" that share the same class (each with 1 or more characters), convert that into a token.
I could probably use a for loop, comparing each character with each class, and manually create a token every time the current "character class" changes... it would be very tedious and use many variables and loops.
Does anyone know a more elegant (less verbose) way to get there?
A global regexp match will do this for you:
var str = "23+var-5/422*b";
var arr = str.match(/[0-9]+|[a-zA-Z]+|[-+*/]/g); // notice the creation of one token
// per operator (even if consecutive)
However, it simply ignores invalid characters instead of erroring out.
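If invalid characters should raise an error instead, one possible check is to compare the combined length of the tokens with the length of the input, since any skipped character makes the tokens fall short. This is a sketch; the tokenize name and the use of SyntaxError are assumptions.

```javascript
// Hypothetical wrapper around the global match that rejects unknown characters.
function tokenize(str) {
  const tokens = str.match(/[0-9]+|[a-zA-Z]+|[-+*/]/g) || [];
  // If any character was skipped, the tokens no longer add up to the input.
  if (tokens.join('').length !== str.length) {
    throw new SyntaxError(`Invalid character in "${str}"`);
  }
  return tokens;
}

console.log(tokenize("23+var-5/422*b"));
// [ '23', '+', 'var', '-', '5', '/', '422', '*', 'b' ]
```

Whitespace counts as invalid in this sketch, so "2 + 3" throws.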
Here's a way to do it using Regex. Obviously the code can be simplified more if you use Underscore.js or CoffeeScript. So here's a longer version using vanilla JS:
var s = "23+var-5/422*b"; // your string
var re1 = /[0-9]/; // Regex for numerals
var re2 = /[a-zA-Z]/; // Regex for roman chars
var re3 = /[-+*\/]/; // Regex you wanted for operators
// Helper function: returns true if n is non-negative
function nonNegative(n) {
return n >= 0;
}
// Helper function: adds n to the array arr if n is non-negative
function addNonNegative(n, arr) {
if (nonNegative(n)) {arr.push(n)};
}
// The main function to split string s
function split(s) {
var result = []; // The result array, initialized
// Loop while string s is non-empty.
while(s.length > 0) {
// The order of indices of regex found
var order = [];
// search for the index at which each regex first matches; if that index is non-negative, add it to the 'order' array
addNonNegative(s.search(re1), order);
addNonNegative(s.search(re2), order);
addNonNegative(s.search(re3), order);
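// sort the order array numerically (the default sort compares elements as strings, so 10 would come before 9)
order.sort(function (a, b) { return a - b; });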
// variables to slice the string s.
// start is always 0. Marks the starting index of the first matched regex
var start = order.shift();
// Marks the starting index of the second matched regex
var end = order.shift(); // end is the second result in order
result.push(s.slice(start, end)); // slice the string s from start to end
// update s so that exclude what was sliced before
s = s.slice(end);
// boundary condition: finally when end is null once all regex have been pulled, set s = ""
if (end == null) {s = ""};
}
return result;
}
