Is there a limit to regex for javascript pattern searches? - javascript

This is as simplified as I can get my problem to reproduce. I get the same result in Safari and Chrome...
var temp = 'a b s40 x';
var div = $('#test');
div.append('<br>' + temp)
var problem = temp.replace(/(^| )(a|c|s\d{1,3}|x)( |$)/g, ' ').trim();
div.append('<br>' + problem); //=b x
var solution = temp.replace(/(^| )(a|c|s\d{1,3})( |$)/g, ' ').replace(/(^| )x( |$)/g, ' ').trim();
div.append('<br>' + solution); //=b
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id='test'>Hello...
</div>
problem uses: (a|c|s\d{1,3}|x)
solution uses: (a|c|s\d{1,3}) and a second replace to get the x...
why is solution != problem??
My fiddle: https://jsfiddle.net/zo5caun2/

The problem is that it won't replace overlapping matches. When it matches s40, the match includes the spaces before and after the word. It can't match x because there's no space before it, since that was part of the previous match. And it's not at the beginning, so ^ doesn't match before it, either.
That doesn't happen in solution because you're doing two separate replacements, so the second replace doesn't care what was matched in the first one.
Use \b to match word boundaries instead of explicit spaces.
var temp = 'a b s40 x';
var div = $('#test');
div.append('<br>' + temp)
var problem = temp.replace(/\b(a|c|s\d{1,3}|x)\b/g, ' ').trim();
div.append('<br>' + problem); //=b
var solution = temp.replace(/\b(a|c|s\d{1,3})\b/g, ' ').replace(/\bx\b/g, ' ').trim();
div.append('<br>' + solution); //=b
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id='test'>Hello...
</div>
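To make the "consumed space" explanation concrete, here is a small sketch that records what each match of the original pattern covers. The `spans` array and its field names are only for illustration; the input string and the regex are the ones from the question.

```javascript
var temp = 'a b s40 x';
var re = /(^| )(a|c|s\d{1,3}|x)( |$)/g;

var spans = [];
var m;
while ((m = re.exec(temp)) !== null) {
  // re.lastIndex is where the next search will start
  spans.push({ text: m[0], start: m.index, end: re.lastIndex });
}

console.log(spans);
// 'a '    covers indexes 0-2
// ' s40 ' covers indexes 3-8, including the space in front of x
// The search resumes at index 8, which is 'x' itself: there is no space
// left to match and it is not the start of the string, so x is skipped.
```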

If you just need the first regex to give the same result as the two-step version, this should solve your problem.
I'm not sure why you used (^| ) (start or space) and ( |$) (space or end); you can use \b instead.
If you need to ensure that certain characters come before or after the capture without making them part of the match, you can use lookbehind and lookahead assertions.
var temp = 'a b s40 x';
var problem = temp.replace(/\b(a|c|s\d{1,3}|x)\b/g,' ').trim();
console.log(problem)
var solution = temp.replace(/(^| )(a|c|s\d{1,3})( |$)/g,' ').replace(/(^| )x( |$)/g,' ').trim();
console.log(solution)
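A minimal sketch of the lookaround idea mentioned above, assuming an engine with ES2018 lookbehind support (older browsers will reject the regex literal). The function name `stripTokens` is hypothetical. Because the spaces are only asserted and never consumed, adjacent tokens can all match in one pass, and unlike \b a token next to punctuation (the x in x-ray) is left alone.

```javascript
// Remove the tokens a, c, s<1-3 digits> and x, but only when each one is
// delimited by a space or by the start/end of the string.
function stripTokens(str) {
  return str
    .replace(/(?<=^| )(a|c|s\d{1,3}|x)(?= |$)/g, '') // spaces are asserted, not consumed
    .replace(/ +/g, ' ')                             // collapse the spaces left behind
    .trim();
}

console.log(stripTokens('a b s40 x')); // b
console.log(stripTokens('x-ray a'));   // x-ray
```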

Related

RegEx Data Values Javascript white Space

I am trying to add the correct white space to data I am receiving. Currently it shows like this:
NotStarted
ReadyforPPPDReview
This is the code I am using:
.replace(/([A-Z])/g, ' $1')
"NotStarted" correctly shows as "Not Started", but "ReadyforPPPDReview" shows as "Readyfor P P P D Review" when it should look like this: "Ready for PPPD Review".
What is the best way to handle both of these using one regex or function?
You would need an NLP engine to handle this properly. Here are two approaches with simple regex, both have limitations:
1. Use list of stop words
We blindly add spaces before and after the stop words:
var str = 'NotStarted, ReadyforPPPDReview';
var wordList = 'and, for, in, on, not, review, the'; // stop words
var wordListRe = new RegExp('(' + wordList.replace(/, */g, '|') + ')', 'gi');
var result1 = str
.replace(wordListRe, ' $1 ') // add space before and after stop words
.replace(/([a-z])([A-Z])/g, '$1 $2') // add space between lower case and upper case chars
.replace(/ +/g, ' ') // remove excessive spaces
.trim(); // remove spaces at start and end
console.log('str: ' + str);
console.log('result1: ' + result1);
As you can imagine, the stop words approach has some severe limitations. For example, the words "formula input" would come out as "for mula in put".
2. Use a mapping table
The mapping table lists words that need to be spaced out (no drugs involved), as in this code snippet:
var str = 'NotStarted, ReadyforPPPDReview';
var spaceWordMap = {
NotStarted: 'Not Started',
Readyfor: 'Ready for',
PPPDReview: 'PPPD Review'
// add more as needed
};
var spaceWordMapRe = new RegExp('(' + Object.keys(spaceWordMap).join('|') + ')', 'gi');
var result2 = str
.replace(spaceWordMapRe, function(m, p1) { // m: matched snippet, p1: first group
return spaceWordMap[p1] || m; // replace key in spaceWordMap with its value; keep the match as-is if its casing is not a key (the regex is case-insensitive)
})
.replace(/([a-z])([A-Z])/g, '$1 $2') // add space between lower case and upper case chars
.replace(/ +/g, ' ') // remove excessive spaces
.trim(); // remove spaces at start and end
console.log('str: ' + str);
console.log('result2: ' + result2);
This approach is suitable if you have a deterministic list of words as input.
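For comparison, here is a sketch of how far casing alone can take you, without any word list. The function name `splitOnCase` is hypothetical. It splits lower-to-upper transitions and separates a run of capitals from a following capitalized word, which handles "PPPD Review", but it cannot find the boundary inside "Readyfor", which is why one of the two list-based approaches above is still needed for that part.

```javascript
function splitOnCase(label) {
  return label
    .replace(/([a-z])([A-Z])/g, '$1 $2')        // "tS" -> "t S", "rP" -> "r P"
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2'); // "PPPDRe" -> "PPPD Re"
}

console.log(splitOnCase('NotStarted'));         // Not Started
console.log(splitOnCase('ReadyforPPPDReview')); // Readyfor PPPD Review
```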

Remove all content after last '\' is not working

I'm trying to rename a document: I want to remove all the content after the last '\' and then give it another name.
I did it like this, but it doesn't seem to be working:
var newDocName = documentPath.replace(/\/$/, '');
var newDocName = newDocName + "\test.pdf";
The '\' doesn't get removed after the first line of code.
Any idea what I am doing wrong?
/\/$/ matches a / only when it is the last character in the string, so this code removes a trailing / and nothing else.
If you want to remove the content after the last \, you can split the string on backslashes with split, use slice to take everything but the last element, and finally use join to put the pieces back together.
var uri = 'path\\to\\my\\file.ext';
var parts = uri.split('\\');
var withoutFile = parts.slice(0, parts.length - 1);
var putItBackTogether = withoutFile.join('\\');
var voila = putItBackTogether + '\\new-file.name';
console.log(voila);
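The same result can be sketched without building an array, using lastIndexOf to locate the final backslash. The function name `replaceFileName` and the sample paths are made up for illustration; the sketch assumes backslash is the only separator in the path.

```javascript
function replaceFileName(documentPath, newName) {
  var lastSep = documentPath.lastIndexOf('\\');
  // Keep everything up to and including the last backslash.
  // If there is no backslash at all, the whole string is the file name.
  var dir = lastSep === -1 ? '' : documentPath.slice(0, lastSep + 1);
  return dir + newName;
}

console.log(replaceFileName('C:\\docs\\old.pdf', 'test.pdf')); // C:\docs\test.pdf
```

Note that "\\test.pdf" needs the doubled backslash in a string literal: a single "\t" is read as a tab character.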
That is a forward slash; use \\ instead.
Try substituting:
var newDocName = documentPath.replace(/\\$/, '');
Your regex is malformed: you need to escape your backslashes (\).
So it may be:
var newDocName = documentPath.replace(/[\\/]$/, '');
var newDocName = newDocName + "\\test.pdf";
This regular expression searches for a \ or / at the end ($) of your path. You could use regex101 to test your regular expressions.
You also should consider not using regular expressions when you don’t need them:
var newDocName = documentPath[documentPath.length - 1] == "\\" ? documentPath + "test.pdf" : documentPath + "\\test.pdf";

replace string value and boundary spaces if they exist

I have a string and I am trying to find a better way to replace a value, along with its boundary spaces if they exist, without doing multiple passes (i.e. without the separate trim_multispace and trimed_result steps).
var replaceVal = "c";
var strVals = "a b c d e f g h";
var replacedVal = strVals.replace(new RegExp("\\b"+replaceVal+"\\b",""),"");
alert(replacedVal)
var trim_multispace = replacedVal.replace(/ +(?= )/g,'');
var trimed_result = trim_multispace.replace(/^\s+|\s+$/g, '');
alert(trimed_result)
I am not sure if I got you correctly, but this regex works in the way you specified
strVals.replace(new RegExp(replaceVal, 'g'), '')
.replace(/ +/g, ' ').replace(/^\s+|\s+$/g, '');
The following does this in almost one step, first replace the value and any surrounding whitespace with a single space, then trim the result:
var replacedVal = strVals.replace(new RegExp("\\s*"+replaceVal+"\\s*", ""), " ").trim();
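A slightly more defensive sketch of the same idea, under two assumptions that are not in the question: the value should only match as a whole word, and every occurrence should be removed. The helper name `removeWord` is hypothetical. The value is escaped before it goes into new RegExp so that characters like . or + are treated literally; the \b anchors only behave as expected when the value starts and ends with a word character.

```javascript
function removeWord(str, word) {
  // Escape regex metacharacters in the value being removed
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match the word plus any whitespace around it, everywhere in the string
  var re = new RegExp('\\s*\\b' + escaped + '\\b\\s*', 'g');
  // Leave one space where the word was, then trim the ends
  return str.replace(re, ' ').trim();
}

console.log(removeWord('a b c d e f g h', 'c')); // a b d e f g h
console.log(removeWord('a b c d e f g h', 'h')); // a b c d e f g
```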

Javascript regular expression - matching multiple occurrences

I'm a little stuck on a problem here.
I'm trying to match multiple occurrences of a regular expression in a string, but I don't get all occurrences:
Sample:
s = new RegExp(';' + y[p][0] + '_' + y[p][1] + '_' + y[p][2] + '_([0-9]*);', 'g');
e = null;
e = s.exec(grArr);
while (e != null) {
alert(e[0]+'-'+e[1]+'-'+e[2]); //debugging output
r = r + e[0]; //adding results to output var
e = s.exec(grArr);
}
Sample variables:
//to be searched:
var grArr=';0_0_709711498101583267971121121179999105110111_11994876;0_0_709711498101583267971121121179999105110111_11994877;0_0_709711498101583267971121121179999105110111_11994878;0_0_709711498101583267971121121179999105110111_11994879;0_0_709711498101583268117110107101108103114252110_11994872;0_0_709711498101583268117110107101108103114252110_11994873;0_0_709711498101583268117110107101108103114252110_11994874;0_0_709711498101583268117110107101108103114252110_11994875;0_0_7097114981015832839910411997114122_11994868;0_0_7097114981015832839910411997114122_11994869;0_0_7097114981015832839910411997114122_11994870;0_0_7097114981015832839910411997114122_11994871;0_1_71114246115115101583276_11994870;0_1_71114246115115101583276_11994874;0_1_71114246115115101583276_11994878;0_1_71114246115115101583277_11994869;0_1_71114246115115101583277_11994873;0_1_71114246115115101583277_11994877;0_1_71114246115115101583283_11994868;0_1_71114246115115101583283_11994872;0_1_71114246115115101583283_11994876;0_1_7111424611511510158328876_11994871;0_1_7111424611511510158328876_11994875;0_1_7111424611511510158328876_11994879;'
//search Pattern:
y[0][0]='0';
y[0][1]='1';
y[0][2]='71114246115115101583283';
This results in 2 occurrences - not 3 as it should be.
The problem is that you're using the semicolon twice: Once at the start of the regex, once at the end.
Since in your example the three "matches" directly follow each other, the second occurrence is not found because its preceding semicolon has already been used in the previous match.
Solution: Use word boundaries ('\\b') instead of ';' in your regex.
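Here is a small sketch of the effect, using shortened, made-up data in the same ;a_b_c_id; layout as the question. The helper `collectIds` is hypothetical. It compares the original pattern, which consumes both semicolons, with two variants that leave the trailing delimiter available for the next match: a lookahead and the \b suggestion above.

```javascript
var data = ';0_1_7_100;0_1_7_101;0_1_7_102;';

// Run a global regex over the input and collect capture group 1 of each match
function collectIds(re, input) {
  var ids = [];
  var m;
  while ((m = re.exec(input)) !== null) {
    ids.push(m[1]);
  }
  return ids;
}

var consuming = collectIds(/;0_1_7_([0-9]*);/g, data);     // both semicolons consumed
var lookahead = collectIds(/;0_1_7_([0-9]*)(?=;)/g, data); // trailing ; only asserted
var boundary = collectIds(/\b0_1_7_([0-9]*)\b/g, data);    // no semicolons in the match

console.log(consuming); // [ '100', '102' ]
console.log(lookahead); // [ '100', '101', '102' ]
console.log(boundary);  // [ '100', '101', '102' ]
```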

Regular expression to parse jQuery-selector-like string

text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
regex = /(.*?)\.filter\((.*?)\)/;
matches = text.match(regex);
log(matches);
// matches[1] is '#container a'
// matches[2] is '.top'
I expect to capture
matches[1] is '#container a'
matches[2] is '.top'
matches[3] is '.bottom'
matches[4] is '.middle'
One solution would be to split the string into #container a and rest. Then take rest and execute recursive exec to get item inside ().
Update: I am posting a solution that does work. However I am looking for a better solution. Don't really like the idea of splitting the string and then processing
Here is a solution that works.
matches = [];
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var regex = /(.*?)\.filter\((.*?)\)/;
var match = regex.exec(text);
firstPart = text.substring(match.index,match[1].length);
rest = text.substring(firstPart.length); // everything after the head: the chained .filter(...) calls
matches.push(firstPart);
regex = /\.filter\((.*?)\)/g;
while ((match = regex.exec(rest)) != null) {
matches.push(match[1]);
}
log(matches);
Looking for a better solution.
This will match the single example you posted:
<html>
<body>
<script type="text/javascript">
text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
matches = text.match(/^[^.]*|\.[^.)]*(?=\))/g);
document.write(matches);
</script>
</body>
</html>
which produces:
#container a,.top,.bottom,.middle
EDIT
Here's a short explanation:
^        # match the beginning of the input
[^.]*    # match any character other than '.' and repeat it zero or more times
         #
|        # OR
         #
\.       # match the character '.'
[^.)]*   # match any character other than '.' and ')' and repeat it zero or more times
(?=      # start positive look ahead
  \)     #   match the character ')'
)        # end positive look ahead
EDIT part II
The regex looks for two types of character sequences:
a run of characters from the start of the string up to the first ., the regex: ^[^.]*
or a character sequence starting with a . followed by zero or more characters other than . and ), \.[^.)]*, which must have a ) right after it: (?=\)). This last requirement causes .filter not to match.
You have to iterate, I think.
var head, filters = [];
text.replace(/^([^.]*)(\..*)$/, function(_, h, rem) {
head = h;
rem.replace(/\.filter\(([^)]*)\)/g, function(_, f) {
filters.push(f);
});
});
console.log("head: " + head + " filters: " + filters);
The ability to use functions as the second argument to String.replace is one of my favorite things about Javascript :-)
You need to do several matches repeatedly, starting where the last match ends (see while example at https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp/exec):
If your regular expression uses the "g" flag, you can use the exec method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property. For example, assume you have this script:
var myRe = /ab*/g;
var str = "abbcdefabh";
var myArray;
while ((myArray = myRe.exec(str)) != null)
{
var msg = "Found " + myArray[0] + ". ";
msg += "Next match starts at " + myRe.lastIndex;
print(msg);
}
This script displays the following text:
Found abb. Next match starts at 3
Found ab. Next match starts at 9
However, this case would be better solved using a custom-built parser. Regular expressions are not an effective solution to this problem, if you ask me.
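As a rough idea of what such a hand-written parser could look like, here is a minimal sketch. The function name `parseSelector` and the returned shape are made up for illustration, and it assumes the filter arguments never contain a closing parenthesis and that nothing follows the last .filter(...) call.

```javascript
function parseSelector(text) {
  var marker = '.filter(';
  var pos = text.indexOf(marker);
  if (pos === -1) {
    return { head: text, filters: [] }; // no filters at all
  }
  var head = text.slice(0, pos);
  var filters = [];
  // Walk through the chained .filter(...) calls one at a time
  while (text.startsWith(marker, pos)) {
    var close = text.indexOf(')', pos);
    if (close === -1) {
      throw new Error('Unclosed .filter( at index ' + pos);
    }
    filters.push(text.slice(pos + marker.length, close));
    pos = close + 1;
  }
  if (pos !== text.length) {
    throw new Error('Unexpected text at index ' + pos);
  }
  return { head: head, filters: filters };
}

var parsed = parseSelector('#container a.filter(.top).filter(.bottom).filter(.middle)');
console.log(parsed.head);    // #container a
console.log(parsed.filters); // [ '.top', '.bottom', '.middle' ]
```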
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var result = text.split('.filter');
console.log(result[0]);
console.log(result[1]);
console.log(result[2]);
console.log(result[3]);
text.split() with regex does the trick.
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var parts = text.split(/(\.[^.()]+)/);
var matches = [parts[0]];
for (var i = 3; i < parts.length; i += 4) {
matches.push(parts[i]);
}
console.log(matches);
