Intelligent regex to understand input - javascript

Following the question "Split string that used to be a list", I am doing this:
console.log(lines[line]);
var regex = /(-?\d{1,})/g;
var cluster = lines[line].match(regex);
console.log(cluster);
which will give me this:
((3158), (737))
["3158", "737"]
where 3158 will later be treated as the ID in my program and 737 as the associated data.
I am wondering if there is a way to handle inputs of this kind too:
((3158, 1024), (737))
where the ID will be a pair, and do something like this:
var single_regex = regex_for_single_ID;
var pair_regex = regex_for_pair_ID;
if(single_regex)
// do my logic
else if(pair_regex)
// do my other logic
else
// bad input
Is that possible?
Clarification:
What I am interested in is treating the two cases differently. For example one solution would be to have this behavior:
((3158), (737))
["3158", "737"]
and for pairs, concatenate the ID:
((3158, 1024), (737))
["31581024", "737"]

A simple way is to use .replace(/(\d+)\s*,\s*/g, '$1') to concatenate the numbers of a pair, and then apply the simple regex match that you are already using.
Example:
var v1 = "((3158), (737))"; // singular string
var v2 = "((3158, 1024), (737))"; // paired number string
var arr1 = v1.replace(/(\d+)\s*,\s*/g, '$1').match(/-?\d+/g)
//=> ["3158", "737"]
var arr2 = v2.replace(/(\d+)\s*,\s*/g, '$1').match(/-?\d+/g)
//=> ["31581024", "737"]
We use this regex in .replace:
/(\d+)\s*,\s*/
It matches and captures 1 or more digits that are followed by optional spaces, a comma, and more optional spaces.
In the replacement we use $1, the backreference to the captured number, so the comma and the spaces after the number are removed.

You may use an alternation operator to match either a pair of numbers (capturing them into separate capturing groups) or a single one:
/\((-?\d+), (-?\d+)\)|\((-?\d+)\)/g
See the regex demo
Details:
\((-?\d+), (-?\d+)\) - a (, a number (captured into Group 1), a ,, space, another number of the pair (captured into Group 2) and a )
| - or
\((-?\d+)\) - a (, then a number (captured into Group 3), and a ).
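var re = /\((-?\d+), (-?\d+)\)|\((-?\d+)\)/g;
var str = '((3158), (737)) ((3158, 1024), (737))';
var res = [];
var m; // declared explicitly so the loop does not create an implicit global
while ((m = re.exec(str)) !== null) {
  if (m[3]) {
    // Single number: Group 3 participated in the match
    res.push(m[3]);
  } else {
    // Pair: concatenate Groups 1 and 2
    res.push(m[1] + m[2]);
  }
}
console.log(res); // ["3158", "737", "31581024", "737"]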
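To treat the two cases differently, as the question asks, one option is to test a line against two anchored patterns. The sketch below is only an illustration: parseLine, SINGLE and PAIR are hypothetical names, and it assumes each line holds exactly one cluster in the shape shown in the question.

```javascript
// Hypothetical patterns: a single ID or a pair of IDs, each followed by one data value.
const SINGLE = /^\(\((-?\d+)\),\s*\((-?\d+)\)\)$/;
const PAIR = /^\(\((-?\d+),\s*(-?\d+)\),\s*\((-?\d+)\)\)$/;

// Hypothetical helper: returns a description of the line, or null for bad input.
function parseLine(line) {
  let m = SINGLE.exec(line);
  if (m) {
    return { kind: "single", id: [m[1]], data: m[2] };
  }
  m = PAIR.exec(line);
  if (m) {
    return { kind: "pair", id: [m[1], m[2]], data: m[3] };
  }
  return null; // neither shape matched
}

console.log(parseLine("((3158), (737))"));
console.log(parseLine("((3158, 1024), (737))"));
console.log(parseLine("((3158, ), (737))"));
```

Keeping the ID as an array leaves it up to the caller whether to concatenate the pair or handle its parts separately.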

Related

How to slice optional arguments in RegEx?

I currently have the following RegExp:
/^(?:(?:\,([A-Za-z]{5}))?)+$/g
The accepted input should be something like ,IGORA, but ,IGORA,GIANC,LOLLI is valid too, and in that case I would like to slice the string into 3 groups. In general, the number of groups should equal the number of items in the user input that passes the RegExp test.
I tried something like this in JavaScript, but it returns only the last value:
var str = ',GIANC,IGORA';
var arr = str.match(/^(?:(?:\,([A-Za-z]{5}))?)+$/).slice(1);
alert(arr);
So the output is 'IGORA', while I would like it to be 'GIANC', 'IGORA'.
Here is another example:
/^([A-Z]{5})(?:(?:\,([A-Za-z]{2}))?)+$/g
The tested string must contain at least one 5-character string, but it can also contain further 5-character strings separated by commas, so from the input
IGORA,CIAOA,POPOP
I would like to get the array ["IGORA","CIAOA","POPOP"].
You can capture the words in a capturing group surrounded by an optional preceding comma and an optional trailing comma.
You can test the regex here: ,?([A-Za-z]+),?
const pattern = /,?([A-Za-z]+),?/gm;
const str = `,IGORA,GIANC,LOLLI`;
let matches = [];
let m;
// Iterate until no match is found
while ((m = pattern.exec(str))) {
  // The first captured group holds the word
  matches.push(m[1]);
}
console.log(matches);
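The reason the original attempt returned only 'IGORA' is that a capturing group inside a repeated construct keeps only the text from its last iteration. If the input should also be validated as a whole, one possible approach is to check it with an anchored pattern first and then extract the items with a separate global pattern. This is a minimal sketch; splitCodes is a hypothetical name, and it assumes every item is exactly 5 letters preceded by a comma.

```javascript
// Hypothetical helper: validate the whole string, then extract every item.
function splitCodes(str) {
  // Anchored check: one or more ",XXXXX" items and nothing else.
  if (!/^(?:,[A-Za-z]{5})+$/.test(str)) {
    return null;
  }
  // A global, unanchored pattern yields one match per item;
  // Group 1 holds the item without its leading comma.
  return Array.from(str.matchAll(/,([A-Za-z]{5})/g), (m) => m[1]);
}

console.log(splitCodes(",GIANC,IGORA")); // ["GIANC", "IGORA"]
console.log(splitCodes(",GIANC,IGOR")); // null
```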
There are other ways to do this, but I found that one of the simplest is the replace method, since with a global regex it visits every match.
For example:
var regex = /,[A-Za-z]{5}/g;
var str = ',GIANC,IGORA';
var arr = [];
str.replace(regex, function(match) {
arr[arr.length] = match;
return match;
});
console.log(arr);
Also, in my code snippet you can see that there is an extra comma at the start of each string; you can solve that by changing line 5 to arr[arr.length] = match.replace(/^,/, '').
Is this what you're looking for?
Explanation:
\b word boundary (the start or end of a word)
\w a word character ([A-Za-z0-9_])
{5} exactly 5 repetitions of the preceding token
So it matches all 5-character words but not NANANANA
var str = 'IGORA,CIAOA,POPOP,NANANANA';
var arr = str.match(/\b\w{5}\b/g);
console.log(arr); //['IGORA', 'CIAOA', 'POPOP']
If you only wish to select words separated by commas and nothing else, you can test for them like so:
(?<=,\s*|^) preceded by a comma and any number of spaces, OR is the first word in the list.
(?=,\s*|$) followed by a comma and any number of spaces, OR is the last word in the list.
In the following code, POPOP and MOMMA are rejected because they are not separated by a comma, and NANANANA fails because it is not 5 characters long.
var str = 'IGORA, CIAOA, POPOP MOMMA, NANANANA, MEOWI';
var arr = str.match(/(?<=,\s*|^)\b\w{5}\b(?=,\s*|$)/g);
console.log(arr); //['IGORA', 'CIAOA', 'MEOWI']
If you can't have any trailing spaces after the comma, just leave out the \s* from both (?<=,\s*|^) and (?=,\s*|$).
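Lookbehind assertions are an ES2018 feature, so the pattern above will not work in older engines. As a rough alternative under that constraint, the same filtering can be sketched without lookarounds by splitting on commas first; fiveCharItems is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: keep only the comma-separated items that are
// exactly 5 word characters long (after trimming surrounding spaces).
function fiveCharItems(list) {
  return list
    .split(",")
    .map((item) => item.trim())
    .filter((item) => /^\w{5}$/.test(item));
}

var str = 'IGORA, CIAOA, POPOP MOMMA, NANANANA, MEOWI';
console.log(fiveCharItems(str)); // ['IGORA', 'CIAOA', 'MEOWI']
```

Here "POPOP MOMMA" stays a single item after the split and fails the test because of the space inside it.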

Matching whole words with Javascript's Regex with a few restrictions

I am trying to create a regex that can extract all words from a given string that only contain alphanumeric characters.
Yes
yes absolutely
#no
*NotThis
orThis--
Good *Bad*
1ThisIsOkay2 ButNotThis2)
Words that should have been extracted: Yes, yes, absolutely, Good, 1ThisIsOkay2
Here is the work I have done thus far:
/(?:^|\b)[a-zA-Z0-9]+(?=\b|$)/g
I found this expression, which works in Ruby (with some tweaking), but I have not been able to convert it to a JavaScript regex.
Use /(?:^|\s)\w+(?!\S)/g to match 1 or more word chars in between start of string/whitespace and another whitespace or end of string:
var s = "Yes\nyes absolutely\n#no\n*NotThis\norThis-- \nGood *Bad*\n1ThisIsOkay2 ButNotThis2)";
var re = /(?:^|\s)\w+(?!\S)/g;
var res = s.match(re).map(function(m) {
return m.trim();
});
console.log(res);
Or another variation:
var s = "Yes\nyes absolutely\n#no\n*NotThis\norThis-- \nGood *Bad*\n1ThisIsOkay2 ButNotThis2)";
var re = /(?:^|\s)(\w+)(?!\S)/g;
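var res = [];
var m; // declared explicitly so the loop does not create an implicit global
while ((m = re.exec(s)) !== null) {
  res.push(m[1]); // Group 1 holds the word without the leading whitespace
}
console.log(res);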
Pattern details:
(?:^|\s) - either start of string or whitespace (consumed, that is why trim() is necessary in Snippet 1)
\w+ - 1 or more word chars (in Snippet 2, captured into Group 1 used to populate the resulting array)
(?!\S) - negative lookahead that fails the match if the word chars are immediately followed by a non-whitespace char.
You can do this (where s is your string) to match all the words:
var m = s.split(/\s+/).filter(function(i) { return !/\W/.test(i); });
If you want to perform a replacement, you can do this:
var res = s.split(/(\s+)/).map(function(i) { return i.replace(/^\w+$/, "#");}).join('');
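Note that \w also accepts the underscore, which is not alphanumeric. If underscores must be rejected as well, a possible variation is to split on whitespace and test each token against an explicit character class. This is only a sketch; alphanumericWords is a hypothetical name.

```javascript
// Hypothetical helper: keep whitespace-separated tokens made up
// solely of ASCII letters and digits.
function alphanumericWords(s) {
  return s
    .trim()
    .split(/\s+/)
    .filter((word) => /^[A-Za-z0-9]+$/.test(word));
}

var sample = "Yes\nyes absolutely\n#no\n*NotThis\norThis-- \nGood *Bad*\n1ThisIsOkay2 ButNotThis2)";
console.log(alphanumericWords(sample));
// ["Yes", "yes", "absolutely", "Good", "1ThisIsOkay2"]
console.log(alphanumericWords("snake_case plain")); // ["plain"]
```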

What's the JS RegExp for this specific string?

I have a rather isolated situation in an inventory management program where our shelf locations have a specific format, which is always Letter: Number-Letter-Number, such as Y: 1-E-4. Most of my coworkers just type in "y1e4" and are done with it, but that obviously creates issues with inconsistent formats in the database. Are JS RegExps the ideal way to automatically detect and format these alphanumeric strings? I'm slowly wrapping my head around JavaScript's Perl-style regex syntax, but what's a simple example of formatting one of these strings?
spec: detect string format of either "W: D-W-D" or "WDWD" and return "W: D-W-D"
This function accepts either format; it returns undefined if the input doesn't match and the formatted string if it does.
function validateInventoryCode(input) {
  // Letters, optional ":" and spaces, digits, optional "-", word chars, optional "-", digits
  var regexp = /^([a-zA-Z]+)(?:\:\s*)?(\d+)-?(\w+)-?(\d+)$/;
  var r = regexp.exec(input);
  if (r != null) {
    return `${r[1]}: ${r[2]}-${r[3]}-${r[4]}`;
  }
  // No match: falls through and returns undefined
}
var possibles = ["y1e1", "y:1e1", "Y: 1r3", "y: 32e4", "1:e3e"];
possibles.forEach(function(possibility) {
  console.log(`input(${possibility}), result(${validateInventoryCode(possibility)})`);
});
I understand the question as "convert LetterNumberLetterNumber to Letter: Number-Letter-Number".
You may use
/^([a-z])(\d+)([a-z])(\d+)$/i
and replace with $1: $2-$3-$4
Details:
^ - start of string
([a-z]) - Group 1 (referenced with $1 from the replacement pattern) capturing any ASCII letter (as /i makes the pattern case-insensitive)
(\d+) - Group 2 capturing 1 or more digits
([a-z]) - Group 3, a letter
(\d+) - Group 4, a number (1 or more digits)
$ - end of string.
See the regex demo.
var re = /^([a-z])(\d+)([a-z])(\d+)$/i;
var s = 'y1e2';
var result = s.replace(re, '$1: $2-$3-$4');
console.log(result);
OR - if the letters must be turned to upper case:
var re = /^([a-z])(\d+)([a-z])(\d+)$/i;
var s = 'y1e2';
var result = s.replace(re,
(m,g1,g2,g3,g4)=>`${g1.toUpperCase()}: ${g2}-${g3.toUpperCase()}-${g4}`
);
console.log(result);
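If both the short form ("y1e4") and the already formatted form ("Y: 1-E-4") can arrive as input, the two ideas above could be combined into one normalizer. The following is a sketch under assumptions not stated in the question: normalizeShelfCode is a hypothetical name, each letter part is a single letter, and unrecognized input yields null.

```javascript
// Hypothetical helper: accept "y1e4", "y:1e4", "Y: 1-E-4" and similar,
// and return the canonical "Letter: Number-Letter-Number" form.
function normalizeShelfCode(input) {
  // Optional colon, optional hyphens, optional spaces between the four parts.
  var re = /^\s*([a-z])\s*:?\s*(\d+)\s*-?\s*([a-z])\s*-?\s*(\d+)\s*$/i;
  var m = re.exec(input);
  if (m === null) {
    return null; // not a shelf location
  }
  return m[1].toUpperCase() + ': ' + m[2] + '-' + m[3].toUpperCase() + '-' + m[4];
}

console.log(normalizeShelfCode('y1e4'));     // "Y: 1-E-4"
console.log(normalizeShelfCode('Y: 1-E-4')); // "Y: 1-E-4"
console.log(normalizeShelfCode('1:e3e'));    // null
```

Normalizing on input, before the value reaches the database, keeps stored locations in a single format.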
This function matches and replaces the pattern:
function findAndFormat(text) {
  var splittedText = text.split(' ');
  for (var i = 0, textLength = splittedText.length; i < textLength; i++) {
    // [A-Za-z] rather than [A-z], which would also match [ \ ] ^ _ and `
    var analyzed = splittedText[i].match(/^[A-Za-z]\d[A-Za-z]\d$/);
    if (analyzed) {
      var code = analyzed[0];
      var formattedString = code[0].toUpperCase() + ': ' + code[1] + '-' + code[2].toUpperCase() + '-' + code[3];
      text = text.replace(splittedText[i], formattedString);
    }
  }
  return text;
}
I think it's just as it reads:
y1e4
Letter, number, letter, number:
/([A-Za-z][0-9][A-Za-z][0-9])/g
And yes, it's fine to use regex in this case, as with form validation and the like. There are just some cases where overusing regular expressions hurts performance (intensive data processing, for example).
Example
"HelloY1E4world".replace(/([A-Za-z][0-9][A-Za-z][0-9])/g, ' ');
should return: "Hello world"
regxr.com always comes in handy

Regex matching comma delimited strings

Given any of the following strings, where operator and value are just placeholders:
"operator1(value)"
"operator1(value), operator2(value)"
"operator1(value), operator2(value), operator_n(value)"
I need to be able to match so I can get each operator and its value as follows:
[[operator1, value]]
[[operator1, value], [operator2, value]]
[[operator1, value], [operator2, value], [operator_n, value]]
Please Note: There could be n number of operators (comma delimited) in the given string.
My current attempt will match on operator1(value) but nothing with multiple operators. See regex101 for the results.
/^(.*?)\((.*)\)$/
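You should be able to do this with a single regex using the global flag, calling exec() in a loop to collect every match.
var re = /(?:,\s*)?([^(]+?)\(([^)]+)\)/g;
var results = [], m;
while ((m = re.exec(str)) !== null) results.push([m[1], m[2]]);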
See the result at Regex 101: https://regex101.com/r/eC3uK3/2
Here's a pure regex answer to this question. It will work as long as your items are always separated by a comma and a space, and it should handle input spanning several lines without much issue:
https://regex101.com/r/eC3uK3/4
([^\(]*)(\([^, ]*\))(?:, )?(?:\n)?
Matches on:
operator1(value), operator2(value), operator_n(value),
operator1(value), operator2(value)
Explanation:
So, this sets up 2 capture groups and 2 non-capture groups.
The first capture group matches an operator name up to the opening parenthesis (using a greedy negated set). The second capture group grabs the parentheses and the value between them. Note that you can drop the parentheses from the capture by escaping the outer set of parentheses rather than the inner one (example here: https://regex101.com/r/eC3uK3/6). There's an optional ", " in a non-capturing group, and an optional "\n" in another non-capturing group to handle any newline characters that you may come across.
This should break your data out into:
'operator1'
'(value)'
'operator2'
'(value)'
For as many as there are.
You can do this by first splitting then using a regular expression:
[
"operator1(value)",
"operator1(value), operator2(value)",
"operator1(value), operator2(value), operator_n(value)"
].forEach((str)=>{
var results = str
.split(/[,\s]+/) // split operations
.map(s=>s.match(/(\w+)\((\w+)\)/)) // extracts parts of the operations
.filter(Boolean) // ensure there's no error (in case of impure entries)
.map(s=>s.slice(1)); // make the desired result
console.log(results);
});
The following function, check, will achieve what you are looking for. If you want a single string instead of an array of results, simply call .toString() on the array returned from the function.
function check(str) {
var myRe = /([^(,\s]*)\(([^)]*)\)/g;
var myArray;
var result = [];
while ((myArray = myRe.exec(str)) !== null) {
result.push(`[${myArray[1]}, ${myArray[2]}]`);
};
return result;
}
var check1 = check("operator1(value)");
console.log("check1", check1);
var check2 = check("operator1(value), operator2(value)");
console.log("check2", check2);
var check3 = check("operator1(value), operator2(value), operator_n(value)");
console.log("check3", check3);
This can also be done with a simple split and a for loop.
var data = "operator1(value), operator2(value), operator_n(value)",
ops = data.substring(0, data.length - 1), // Remove the last parenth from the string
arr = ops.split(/\(|\), /),
res = [], n, eN = arr.length;
for (n = 0; n < eN; n += 2) {
res.push([arr[n], arr[n + 1]]);
}
console.log(res);
The code creates a flattened array from a string, and then nests arrays of "operator"/"value" pairs to the result array. Works for older browsers too.
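One thing the approaches above have in common is that they silently skip anything that does not look like operator(value). If malformed input should be rejected instead, a possible refinement is to validate the whole string first and only then collect the pairs. This is a sketch, not a definitive implementation: parseOperators is a hypothetical name, and it assumes operator names are word characters and values contain no parentheses.

```javascript
// Hypothetical helper: return [[operator, value], ...] or null if the
// string is not a comma-separated list of operator(value) items.
function parseOperators(str) {
  // One item: word characters followed by a parenthesized value.
  var item = "\\w+\\([^()]*\\)";
  // The whole string: one item, then any number of ", item" repetitions.
  var whole = new RegExp("^" + item + "(?:,\\s*" + item + ")*$");
  if (!whole.test(str)) {
    return null;
  }
  // Collect each operator (Group 1) and its value (Group 2).
  return Array.from(str.matchAll(/(\w+)\(([^()]*)\)/g), function (m) {
    return [m[1], m[2]];
  });
}

console.log(parseOperators("operator1(value), operator2(value)"));
// [["operator1", "value"], ["operator2", "value"]]
console.log(parseOperators("operator1(value) junk")); // null
```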

How to extract two strings from url using regex?

I've matched a string successfully, but I need to split it and add some new segments to the URL. If it is possible with regex, how can I match the URL and extract two strings as in the example below?
Current result:
["domain.com/collection/430000000000000"]
Desired result:
["domain.com/collection/", "430000000000000"]
Current code:
var reg = new RegExp('domain.com\/collection\/[0-9]+');
var str = 'http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid';
console.log(str.match(reg));
You want Regex Capture Groups.
Put the parts you want to extract into parentheses like this, each part forming a capturing group:
new RegExp('(domain.com\/collection\/)([0-9]+)')
Then after matching, you can extract each group content by index, with index 0 being the whole string match, 1 the first group, 2 the second etc. (thanks for the addendum, jcubic!).
This is done with exec() on the regex object, for example:
/\d(\d)\d/.exec("123");
// → ["123", "2"]
First comes the whole match, then the group matches in the sequence they appear in the pattern.
You can declare an array and then fill it with the required values that you can capture with parentheses (thus, making use of capturing groups):
var reg = /(domain.com\/collection)\/([0-9]+)/g;
//         ^                      ^  ^      ^
var str = 'http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid';
var arr = [];
var m; // declared explicitly so the loop does not create an implicit global
while ((m = reg.exec(str)) !== null) {
  arr.push(m[1]);
  arr.push(m[2]);
}
console.log(arr);
Output: ["domain.com/collection", "430000000000000"]
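To get exactly the desired result from the question, with the trailing slash kept in the first part, the slash can be moved inside the first group. The sketch below also escapes the dot so that it matches only a literal "."; splitCollectionUrl is a hypothetical name used for illustration.

```javascript
// Hypothetical helper: return [prefix, id] or null when the URL does not match.
function splitCollectionUrl(url) {
  // Group 1: the literal prefix including the trailing slash.
  // Group 2: the numeric collection ID.
  var m = /(domain\.com\/collection\/)(\d+)/.exec(url);
  return m ? [m[1], m[2]] : null;
}

var str = 'http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid';
console.log(splitCollectionUrl(str));
// ["domain.com/collection/", "430000000000000"]
```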
