Matching whole words with Javascript's Regex with a few restrictions - javascript

I am trying to create a regex that can extract all words from a given string that only contain alphanumeric characters.
Yes
yes absolutely
#no
*NotThis
orThis--
Good *Bad*
1ThisIsOkay2 ButNotThis2)
Words that should have been extracted: Yes, yes, absolutely, Good, 1ThisIsOkay2
Here is the work I have done thus far:
/(?:^|\b)[a-zA-Z0-9]+(?=\b|$)/g
I had found this expression that works in Ruby ( with some tweaking ) but I have not been able to convert it to Javascript regex.

Use /(?:^|\s)\w+(?!\S)/g to match 1 or more word chars in between start of string/whitespace and another whitespace or end of string:
var s = "Yes\nyes absolutely\n#no\n*NotThis\norThis-- \nGood *Bad*\n1ThisIsOkay2 ButNotThis2)";
var re = /(?:^|\s)\w+(?!\S)/g;
var res = s.match(re).map(function(m) {
return m.trim();
});
console.log(res);
Or another variation:
var s = "Yes\nyes absolutely\n#no\n*NotThis\norThis-- \nGood *Bad*\n1ThisIsOkay2 ButNotThis2)";
var re = /(?:^|\s)(\w+)(?!\S)/g;
var res = [];
while ((m=re.exec(s)) !== null) {
res.push(m[1]);
}
console.log(res);
Pattern details:
(?:^|\s) - either start of string or whitespace (consumed, that is why trim() is necessary in Snippet 1)
\w+ - 1 or more word chars (in Snippet 2, captured into Group 1 used to populate the resulting array)
(?!\S) - negative lookahead failing the match if the word chars are not followed with non-whitespace.

You can do that (where s is your string) to match all the words:
var m = s.split(/\s+/).filter(function(i) { return !/\W/.test(i); });
If you want to proceed to a replacement, you can do that:
var res = s.split(/(\s+)/).map(function(i) { return i.replace(/^\w+$/, "#");}).join('');

Related

How to slice optional arguments in RegEx?

Actually i have the following RegExp expression:
/^(?:(?:\,([A-Za-z]{5}))?)+$/g
So the accepted input should be something like ,IGORA but even ,IGORA,GIANC,LOLLI is valid and i would be able to slice the string to 3 group in this case, in other the group number should be equals to the user input that pass the RegExp test.
i was trying to do something like this in JavaScript but it return only the last value
var str = ',GIANC,IGORA';
var arr = str.match(/^(?:(?:\,([A-Za-z]{5}))?)+$/).slice(1);
alert(arr);
So the output is 'IGORA' while i would it to be 'GIANC' 'IGORA'
Here is another example
/^([A-Z]{5})(?:(?:\,([A-Za-z]{2}))?)+$/g
test of regexp may have at least 5 chart string but it also can have other 5 chart string separated with a comma so from input
IGORA,CIAOA,POPOP
I would have an array of ["IGORA","CIAOA","POPOP"]
You can capture the words in a capturing surrounded by an optional preceding comma or an optional trailing comma.
You can test the regex here: ,?([A-Za-z]+),?
const pattern = /,?([A-Za-z]+),?/gm;
const str = `,IGORA,GIANC,LOLLI`;
let matches = [];
let match;
// Iterate until no match found
while ((m = pattern.exec(str))) {
// The first captured group is the match
matches.push(m[1]);
}
console.log(matches);
There are other ways to do this, but I found that one of the simple ways is by using the replace method, as it can replace all instances that match that regex.
For example:
var regex = /^(?:(?:\,([A-Za-z]{5}))?)+$/g;
var str = ',GIANC,IGORA';
var arr = [];
str.replace(regex, function(match) {
arr[arr.length] = match;
return match;
});
console.log(arr);
Also, in my code snippet you can see that there is an extra coma in each string, you can solve that by changing line 5 to arr[arr.length] = match.replace(/^,/, '').
Is this what you're looking for?
Explanation:
\b word boundary (starting or ending a word)
\w a word ([A-z])
{5} 5 characters of previous
So it matches all 5-character words but not NANANANA
var str = 'IGORA,CIAOA,POPOP,NANANANA';
var arr = str.match(/\b\w{5}\b/g);
console.log(arr); //['IGORA', 'CIAOA', 'POPOP']
If you only wish to select words separated by commas and nothing else, you can test for them like so:
(?<=,\s*|^) preceded by , with any number of trailing space, OR is the first word in list.
(?=,\s*|$) followed by , and any number of trailing spaces OR is last word in list.
In the following code, POPOP and MOMMA are rejected because they are not separated by a comma, and NANANANA fails because it is not 5 character.
var str = 'IGORA, CIAOA, POPOP MOMMA, NANANANA, MEOWI';
var arr = str.match(/(?<=,\s*|^)\b\w{5}\b(?=,\s*|$)/g);
console.log(arr); //['IGORA', 'CIAOA', 'MEOWI']
If you can't have any trailing spaces after the comma, just leave out the \s* from both (?<=,\s*|^) and (?=,\s*|$).

Split a string according to flanking characters in javascript

Javascript lets you split a string according to regular expression. Is it possible to use this functionality to split a string only when the delimiter is flanked by certain characters?
For example, if I want to split the string 12-93 but not at-13 using the - character? Is that possible?
Using a regular expression seems promising, but doing "12-93".split(/[0-9]-[0-9]/) yields ["1", "3"] because the flanking digits are considered to be part of the delimiter.
Can I specify the above split pattern (a dash preceded and followed by a digit) without chopping the flanking digits?
Other Examples
"55,966,575-165,162,787" should yield ["55,966,575", "165,162,787"]
"55,966,575x-165,162,787" should yield ["55,966,575x-165,162,787"]
"sdf55,966,575-165,162,787" should yield ["sdf55,966,575", "165,162,787"]
Using two adjacent character sets seems to work.
See example at https://regex101.com/r/uFHMW1/1
([0-9,a-z]+?[0-9]+)-([0-9]+[0-9,a-z]+)
Try this (live here https://repl.it/EOOQ/0 ):
var strings = [
"55,966,575-165,162,787",
"55,966,575x-165,162,787",
"sdf55,966,575-165,162,787",
];
var pattern = '^([0-9,a-z]+?[0-9]+)-([0-9]+[0-9,a-z]+)$';
var regex = new RegExp(pattern, 'i');
var matched = strings.map(function (string) {
var matches = string.match( regex );
if (matches) {
return [matches[1], matches[2]];
} else {
return [string];
}
});
console.log(matched)
You can also run the above expression as split() like:
string.split(re).filter( str => str.length )
where Array.filter() is used to get rid of the leading and trailing empty strings created when the RegExp matches your input.
var strings = [
"55,966,575-165,162,787",
"55,966,575x-165,162,787",
"sdf55,966,575-165,162,787",
];
var pattern = '^([0-9,a-z]+?[0-9]+)-([0-9]+[0-9,a-z]+)$';
var regex = new RegExp(pattern, 'i');
var matched = strings.map( string => string.split(regex).filter( str => str.length ) );
console.log(matched)
Try using a non-capturing lookahead. You are using a regex that captures all of the characters found, then uses that result as the split character(s).

What's the JS RegExp for this specific string?

I have a rather isolated situation in an inventory management program where our shelf locations have a specific format, which is always Letter: Number-Letter-Number, such as Y: 1-E-4. Most of us coworkers just type in "y1e4" and are done with it, but that obviously creates issues with inconsistent formats in a database. Are JS RegExp's the ideal way to automatically detect and format these alphanumeric strings? I'm slowly wrapping my head around JavaScript's Perl syntax, but what's a simple example of formatting one of these strings?
spec: detect string format of either "W: D-W-D" or "WDWD" and return "W: D-W-D"
This function will accept any format and return undefined if it doesnt match, returns the formatted string if a match does occur.
function validateInventoryCode(input) {
var regexp = /^([a-zA-Z]+)(?:\:\s*)?(\d+)-?(\w+)-?(\d+)$/
var r = regexp.exec(input);
if(r != null) {
return `${r[1]}: ${r[2]}-${r[3]}-${r[4]}`;
}
}
var possibles = ["y1e1", "y:1e1", "Y: 1r3", "y: 32e4", "1:e3e"];
possibles.forEach(function(posssiblity) {
console.log(`input(${posssiblity}), result(${validateInventoryCode(posssiblity)})`);
})
function validateInventoryCode(input) {
var regexp = /^([a-zA-Z]+)(?:\:\s*)?(\d+)-?(\w+)-?(\d+)$/
var r = regexp.exec(input);
if (r != null) {
return `${r[1]}: ${r[2]}-${r[3]}-${r[4]}`;
}
}
I understand the question as "convert LetterNumberLetterNumber to Letter: Number-Letter-Number.
You may use
/^([a-z])(\d+)([a-z])(\d+)$/i
and replace with $1: $2-$3-$4
Details:
^ - start of string
([a-z]) - Group 1 (referenced with $1 from the replacement pattern) capturing any ASCII letter (as /i makes the pattern case-insensitive)
(\d+) - Group 2 capturing 1 or more digits
([a-z]) - Group 3, a letter
(\d+) - Group 4, a number (1 or more digits)
$ - end of string.
See the regex demo.
var re = /^([a-z])(\d+)([a-z])(\d+)$/i;
var s = 'y1e2';
var result = s.replace(re, '$1: $2-$3-$4');
console.log(result);
OR - if the letters must be turned to upper case:
var re = /^([a-z])(\d+)([a-z])(\d+)$/i;
var s = 'y1e2';
var result = s.replace(re,
(m,g1,g2,g3,g4)=>`${g1.toUpperCase()}: ${g2}-${g3.toUpperCase()}-${g4}`
);
console.log(result);
this is the function to match and replace the pattern: DEMO
function findAndFormat(text){
var splittedText=text.split(' ');
for(var i=0, textLength=splittedText.length; i<textLength; i++){
var analyzed=splittedText[i].match(/[A-z]{1}\d{1}[A-z]{1}\d{1}$/);
if(analyzed){
var formattedString=analyzed[0][0].toUpperCase()+': '+analyzed[0][1]+'-'+analyzed[0][2].toUpperCase()+'-'+analyzed[0][3];
text=text.replace(splittedText[i],formattedString);
}
}
return text;
}
i think it's just as it reads:
y1e4
Letter, number, letter, number:
/([A-z][0-9][A-z][0-9])/g
And yes, it's ok to use regex in this case, like form validations and stuff like that. it's just there are some cases on which abusing of regular expressions gives you a bad performance (into intensive data processing and the like)
Example
"HelloY1E4world".replace(/([A-z][0-9][A-z][0-9])/g, ' ');
should return: "Hello world"
regxr.com always comes in handy

Extract word between '=' and '('

I have the following string
234234=AWORDHERE('sdf.'aa')
where I need to extract AWORDHERE.
Sometimes there can be space in between.
234234= AWORDHERE('sdf.'aa')
Can I do this with a regular expression?
Or should I do it manually by finding indexes?
The datasets are huge, so it's important to do it as fast as possible.
Try this regex:
\d+=\s?(\w+)\(
Check Demo
in Javascript it would like that:
var myString = "234234=AWORDHERE('sdf.'aa')";// or 234234= AWORDHERE('sdf.'aa')
var myRegexp = /\d+=\s?(\w+)\(/g;
var match = myRegexp.exec(myString);
console.log(match[1]); // AWORDHERE
You could do this at least three ways. You need to benchmark to see what's fastest.
Substring w/ indexes
function extract(from) {
var ixEq = from.indexOf("=");
var ixParen = from.indexOf("(");
return from.substring(ixEq + 1, ixParen);
}
.
Splits
function extract(from) {
var spEq = from.split("=");
var spParen = spEq[1].split("(");
return spParen[0];
}
Regex (demo)
Here is some sample regex you could use
/[^=]+=([^(]+).*/g
This says
[^=]+ - One or more character which is not an =
= - The = itself
( - creates a matching group so you can access your match in code
[^(]+ - One or more character which is not a (
) - closes the matching group
.* - Matches the rest of the line
the /g on the end tells it to perform the match on all lines.
Using look around you can search for string preceded by = and followed by ( as following.
Regex: (?<==)[A-Z ]+(?=\()
Explanation:
(?<==) checks if [A-Z ] is preceded by an =.
[A-Z ]+ matches your pattern.
(?=\() checks if matched pattern is followed by a (.
Regex101 Demo
var str = "234234= AWORDHERE('sdf.'aa')";
var regexp = /.*=\s+(\w+)\(.*\)/g;
var match = regexp.exec(str);
alert( match[1] );
I made my solution for this just a little more general than you asked for, but I don't think it takes much more time to execute. I didn't measure. If you need greater efficiency than this provides, comment and I or someone else can help you with that.
Here's what I did, using the command prompt of node:
> var s = "234234= AWORDHERE('sdf.'aa')"
undefined
> var a = s.match(/(\w+)=\s*(\w+)\s*\(.*/)
undefined
> a
[ '234234= AWORDHERE(\'sdf.\'aa\')',
'234234',
'AWORDHERE',
index: 0,
input: '234234= AWORDHERE(\'sdf.\'aa\')' ]
>
As you can see, this matches the number before the = in a[1], and it matches the AWORDHERE name as you requested in a[2]. This will work with any number (including zero) spaces before and/or after the =.

Making Regex more safe

i'm trying to turn a bunch of regex more safe, what i mean by more safe, i want more accuracy.
So, i'm very new with RegExp, and i want know if i'm doing this right (not the Regex, but turn into more safety).
So, i'm starting now, and this is the first RegExp that i want change, i want push the 01/2011.
Past RegExp:
var text = 'INSCRIÇÃO: 60.537.263/0001-66 COMP: 01/2011 COD REC: 150';
var reg = /COMP.*?(\d\S*)/;
var match = reg.exec(text);
console.log(match[1]);
New RegExp:
var text = 'INSCRIÇÃO: 60.537.263/0001-66 COMP: 01/2011 COD REC: 150';
var reg = /COMP:\s([0-9]{0,2}\/[0-9]{0,4})/;
var match = reg.exec(text);
console.log(match[1]);
Why this? This text is just a part of a huge text, so i need accuraci.
Other question is about turn the Regex optional, so if doesn't match anything, return undefined.
Thanks.
According to your feedback:
i want specifically push the value with two numbers, one / and four numbers
You can use
/\bCOMP:\s*(\d{2}\/\d{4})(?!\d)/g
The \b is a word boundary, thus 5COMP won't be matched.
The \s* will match 0 or more whitespace (if there must be whitespace, use + quantifier instead).
The \d{2} will match exactly 2 digits.
The \d{4} will match 4 digits and no more because of the look-ahead (?!\d). This look-ahead just makes sure there is no digit after the 4 previous digits. You may use \b here as well to ensure matching a word boundary.
arr = [];
var re = /\bCOMP:\s*(\d{2}\/\d{4})(?!\d)/g;
var str = 'COMP:10/9995, COMP: 21/1234, COMP: 21/123434, REGCOMP: 21/1234';
var m;
while ((m = re.exec(str)) !== null) {
arr.push(m[1]);
}
console.log(arr);

Categories

Resources