JavaScript regular expression - matching multiple occurrences - javascript

I'm a little stuck on a problem here.
I'm trying to match multiple occurrences of a regular expression in a string, but I don't get all occurrences:
Sample:
s = new RegExp(';' + y[p][0] + '_' + y[p][1] + '_' + y[p][2] + '_([0-9]*);', 'g');
e = null;
e = s.exec(grArr);
while (e != null) {
alert(e[0]+'-'+e[1]+'-'+e[2]); //debugging output
r = r + e[0]; //adding results to output var
e = s.exec(grArr);
}
Sample variables:
//to be searched:
var grArr=';0_0_709711498101583267971121121179999105110111_11994876;0_0_709711498101583267971121121179999105110111_11994877;0_0_709711498101583267971121121179999105110111_11994878;0_0_709711498101583267971121121179999105110111_11994879;0_0_709711498101583268117110107101108103114252110_11994872;0_0_709711498101583268117110107101108103114252110_11994873;0_0_709711498101583268117110107101108103114252110_11994874;0_0_709711498101583268117110107101108103114252110_11994875;0_0_7097114981015832839910411997114122_11994868;0_0_7097114981015832839910411997114122_11994869;0_0_7097114981015832839910411997114122_11994870;0_0_7097114981015832839910411997114122_11994871;0_1_71114246115115101583276_11994870;0_1_71114246115115101583276_11994874;0_1_71114246115115101583276_11994878;0_1_71114246115115101583277_11994869;0_1_71114246115115101583277_11994873;0_1_71114246115115101583277_11994877;0_1_71114246115115101583283_11994868;0_1_71114246115115101583283_11994872;0_1_71114246115115101583283_11994876;0_1_7111424611511510158328876_11994871;0_1_7111424611511510158328876_11994875;0_1_7111424611511510158328876_11994879;'
//search Pattern:
y[0][0]='0';
y[0][1]='1';
y[0][2]='71114246115115101583283';
This results in 2 occurrences - not 3 as it should be.

The problem is that you're using the semicolon twice: Once at the start of the regex, once at the end.
Since in your example the three "matches" directly follow each other, the second occurrence is not found because its preceding semicolon has already been used in the previous match.
Solution: Use word boundaries ('\\b') instead of ';' in your regex.
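The overlap described above can be reproduced with a much shorter input. The following is a minimal sketch, not the asker's actual data: the string `input`, the value `prefix`, and the helper `collect` are made up for illustration. It compares the original delimiter-consuming pattern with the word-boundary version.

```javascript
// Hypothetical, shortened input: three adjacent records share the same prefix.
const input = ';0_1_7_100;0_1_7_101;0_1_7_102;0_1_8_103;';
const prefix = '0_1_7';

// Original shape: both semicolons are consumed by each match,
// so the record right after a match has lost its leading ';'.
const consuming = new RegExp(';' + prefix + '_([0-9]*);', 'g');

// Word boundaries are zero-width, so no delimiter is consumed.
const bounded = new RegExp('\\b' + prefix + '_([0-9]*)\\b', 'g');

// Hypothetical helper: run exec() in a loop and collect capture group 1.
function collect(re, text) {
  const found = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    found.push(m[1]);
  }
  return found;
}

const consumingIds = collect(consuming, input);
const boundedIds = collect(bounded, input);

console.log(consumingIds); // [ '100', '102' ]
console.log(boundedIds);   // [ '100', '101', '102' ]
```

Because `_` and digits are word characters, `\b` only matches next to the `;` delimiters here; data with other delimiters may need a different zero-width assertion.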

Related

Replace all “?” by “&” except first

I would like to replace all “?” with “&” except the first one, using JavaScript. I found some regular expressions, but they didn’t work.
I have something like:
home/?a=1
home/?a=1?b=2
home/?a=1?b=2?c=3
And I would like:
home/?a=1
home/?a=1&b=2
home/?a=1&b=2&c=3
Does anyone know how I can do it?
Thanks!
I don't think it's possible with regex, but you can split the string and then join it back together, manually restoring the first occurrence:
var split = 'home/?a=1?b=2'.split('?'); // [ 'home/', 'a=1', 'b=2' ]
var replaced = split[0] + '?' + split.slice(1).join('&'); // 'home/?a=1&b=2'
console.log(replaced);
You could match from the start of the string not a question mark using a negated character class [^?]+ followed by matching a question mark and capture that in the first capturing group. In the second capturing group capture the rest of the string.
Use replace and pass a function as the second parameter where you return the first capturing group followed by the second capturing group where all the question marks are replaced by &
let strings = [
"home/?a=1",
"home/?a=1?b=2",
"home/?a=1?b=2?c=3"
];
strings.forEach((str) => {
let result = str.replace(/(^[^?]+\?)(.*)/, function(match, group1, group2) {
return group1 + group2.replace(/\?/g, '&')
});
console.log(result);
});
You can split it by "?" and then rewrap the array:
var string = "home/?a=1?b=2";
var str = string.split('?');
var result = str[0] + '?'; // text before the first '?' plus the first '?' ("new" is a reserved word, so it cannot be a variable name)
for (var x = 1; x < str.length; x++) {
result = result + str[x];
if (x != (str.length - 1)) result = result + '&'; // avoid placing a '&' after the last part
}
You can use /([^\/])\?/ as the pattern: it matches any ? character that does not come directly after a / character.
var str = str.replace(/([^\/])\?/g, "$1&");
var str = "home/?a=1\nhome/?a=1?b=2\nhome/?a=1?b=2?c=3\n".replace(/([^\/])\?/g, "$1&");
console.log(str);
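The slash-based pattern above relies on the first "?" always following a "/". As a minimal sketch for inputs where that does not hold, the hypothetical helper `replaceAfterFirst` below locates the first "?" with `indexOf` and only rewrites what comes after it. `slashBased` wraps the pattern from the answer for comparison, and the URL without a slash is an invented test input.

```javascript
// Hypothetical helper: keep the first "?" and turn every later one into "&".
function replaceAfterFirst(url) {
  const first = url.indexOf('?');
  if (first === -1) return url; // no "?" at all, nothing to do
  const head = url.slice(0, first + 1); // up to and including the first "?"
  const tail = url.slice(first + 1);    // everything after it
  return head + tail.replace(/\?/g, '&');
}

// The pattern from the answer above, wrapped for comparison.
const slashBased = (url) => url.replace(/([^\/])\?/g, '$1&');

console.log(replaceAfterFirst('home/?a=1?b=2?c=3')); // home/?a=1&b=2&c=3
console.log(replaceAfterFirst('home?a=1?b=2'));      // home?a=1&b=2
console.log(slashBased('home?a=1?b=2'));             // home&a=1&b=2
```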

indexOf for multiple options

Let's say I get the following using var content = this.innerHTML:
w here </div>
Using indexOf (or other ways), I want to check for the first position that has either "Space", "<" or "&nbsp".
In this case, it will be 1 (after "w").
What I am confused about is how do I check for the very first position that has either one of these three choices? Do I use Do...while to check for individual "options"?
You're probably looking for a Regular Expression (Regex) and the String#search method. Regex is a bit much to learn all at once, but I'll explain this example code.
You can use square brackets to denote a set of characters, so for example [ <] says "match either a space or a less-than sign."
You can use the pipe | to separate possibilities if you want to match one pattern or another, and that's how to account for matching a non-breaking space HTML entity.
var string = 'w here </div>',
index = string.search(/[ <]|&nbsp;/)
console.log(index) //=> 1
You can use a regular expression with alternations (|), which means "match one of these things". That will also tell you what you found, if that's useful:
function check(str) {
var m = / |<|&nbsp;/.exec(str);
if (!m) {
console.log("Not found in '" + str + "'");
return;
}
console.log("'" + m[0] + "' found at index " + m.index + " in '" + str + "'");
}
check("w here </div>");
check("where </div>");
check("where</div>");
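Since the question asks about indexOf specifically, here is a minimal sketch of the same check without a regular expression. The helper name `firstIndexOfAny` and the sample strings are assumptions for illustration. It calls `indexOf` once per option and keeps the smallest index that is not -1.

```javascript
// Hypothetical helper: earliest index of any needle, or -1 if none occurs.
function firstIndexOfAny(text, needles) {
  let best = -1;
  for (const needle of needles) {
    const i = text.indexOf(needle);
    // Keep i if it is a hit and earlier than the best hit so far.
    if (i !== -1 && (best === -1 || i < best)) {
      best = i;
    }
  }
  return best;
}

const needles = [' ', '<', '&nbsp;'];

console.log(firstIndexOfAny('w here </div>', needles));     // 1
console.log(firstIndexOfAny('where&nbsp;</div>', needles)); // 5
console.log(firstIndexOfAny('where', needles));             // -1
```

No do...while loop is needed: each option is searched independently and only the minimum is kept.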

Dynamic replace in regular expression scope

I need to rewrite some require paths in JavaScript source files:
Example (foo => ../../../foo/baz):
var a = require('foo/a'); => var b = require('../../../foo/baz/a');
var a = require('foo/../b'); => var b = require('../../../foo/baz/../b');
Note: This replacement will be done on a complete js source files. So require(' and ') must be used as delimiter!
So far we have figured out to use some setup like this:
var source = '';
source += "var a = require('foo/a');\n";
source += "var b = require('foo/../b');\n";
source += "console.log(a + b);";
var options = {
'foo': '../../../foo/baz'
};
for (var key in options) {
var regex = new RegExp('require[(](\"|\')' + key, 'g');
source = source.replace(regex, "require('" + options[key]);
}
console.log(source);
Although the above code works, I am not sure it is safe, since I am just skipping the closing delimiter.
I think this does it:
str = str.replace(/require\((['"])([^'"]*)foo\/([^'"]*)(['"])/g, "require($1$2../../../foo/baz/$3$4");
Here's that regex live: http://regex101.com/r/bE5jI4
Explanation:
require matches the characters require literally (case sensitive)
\( matches the character ( literally
1st Capturing group (['"])
['"] match either ' or " literally
2nd Capturing group ([^'"]*)
[^'"]* match a single character not present in the list below
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
'" a single character in the list '" literally
foo matches the characters foo literally (case sensitive)
\/ matches the character / literally
3rd Capturing group ([^'"]*)
[^'"]* match a single character not present in the list below
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
'" a single character in the list '" literally
4th Capturing group (['"])
['"] match ' or " literally
You may have to tweak it if there's optional whitespace before the opening quotes, or if your paths may contain ' or " characters. (In that latter case, you'll need two replacements, one when the wrapper quotes are ' and the other when they're ".)
This should work:
var source = '';
source += "var a = require('foo/a');\n";
source += "var b = require('foo/../b');\n";
source += "console.log(a + b);";
var options = {
'foo': '../../../foo/baz'
};
for (var key in options) {
var regex = new RegExp('(require)\\((["\'])(' + key + ')([^"\']*)\\2\\)', 'g');
source = source.replace(regex, "$1('" + options[key] + "$4')");
}
console.log(source);
OUTPUT:
var a = require('../../../foo/baz/a');
var b = require('../../../foo/baz/../b');
console.log(a + b);
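Both answers build the pattern directly from the key, which misbehaves if a key contains regex metacharacters such as "." and, in the first answer, can also match a longer module name that merely contains the key. The following is a sketch under those assumptions, not a replacement for the answers above. `escapeRegExp` and `rewriteRequires` are hypothetical helper names, and the `foobar/c` line is an invented test input.

```javascript
// Hypothetical helper: escape characters that are special inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: rewrite require('<key>...') using a key -> path map.
function rewriteRequires(source, mappings) {
  let out = source;
  for (const [key, target] of Object.entries(mappings)) {
    // require( + quote + key + optional "/rest" + same quote + )
    const re = new RegExp(
      'require\\(([\'"])' + escapeRegExp(key) + '(/[^\'"]*)?\\1\\)',
      'g'
    );
    // A replacer function avoids "$" in the target being read as a pattern.
    out = out.replace(re, (match, quote, rest) =>
      'require(' + quote + target + (rest || '') + quote + ')');
  }
  return out;
}

const source =
  "var a = require('foo/a');\n" +
  'var b = require("foo/../b");\n' +
  "var c = require('foobar/c');";

const rewritten = rewriteRequires(source, { foo: '../../../foo/baz' });
console.log(rewritten);
```

The key must be followed by either "/" or the closing quote, so `foobar/c` is left alone, and the closing delimiter is matched with a backreference rather than skipped.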

Extract string when preceding number or combo of preceding characters is unknown

Here's an example string:
++++#foo+bar+baz++#yikes
I need to extract foo and only foo from there or a similar scenario.
The + and the # are the only characters I need to worry about.
However, regardless of what precedes foo, it needs to be stripped or ignored. Everything else after it needs to as well.
Try this:
/\++#(\w+)/
and read the result from capture group 1.
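As a short sketch of how that pattern would be applied to the sample string from the question (the variable names are made up for illustration), with a null check in case nothing matches:

```javascript
const sample = '++++#foo+bar+baz++#yikes';

// exec() returns null when there is no match, so guard before indexing.
const found = /\++#(\w+)/.exec(sample);
const token = found === null ? null : found[1];

console.log(token); // foo
```

Note that this pattern requires at least one "+" directly before the "#".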
You can simply use the match() method.
var str = "++++#foo+bar+baz++#yikes";
var res = str.match(/\w+/g);
console.log(res[0]); // foo
console.log(res); // foo,bar,baz,yikes
Or use exec
var str = "++++#foo+bar+baz++#yikes";
var match = /(\w+)/.exec(str);
alert(match[1]); // foo
Using exec with a g modifier (global) is meant to be used in a loop getting all sub matches.
var str = "++++#foo+bar+baz++#yikes";
var re = /\w+/g;
var match;
while (match = re.exec(str)) {
// In array form, match is now your next match..
}
How exactly do + and # play a role in identifying foo? If you just want any string that follows # and is terminated by + that's as simple as:
var foostring = '++++#foo+bar+baz++#yikes';
var matches = (/\#([^+]+)\+/g).exec(foostring);
if (matches !== null) { // exec returns null when nothing matches
// matches[0] is the whole match; capture groups start at index 1
alert('found ' + matches[1] + '!'); // alerts 'found foo!'
}
To help you more specifically, please provide information about the possible variations of your data and how you would go about identifying the token you want to extract even in cases of differing lengths and characters.
If you are just looking for the first segment of text preceded and followed by any combination of + and #, then use:
var foostring = '++++#foo+bar+baz++#yikes';
var result = foostring.match(/[^+#]+/);
// will be the single-element array, ['foo'], or null.
Depending on your data, using \w may be too restrictive as it is equivalent to [a-zA-Z0-9_]. Does your data have anything else such as punctuation, dashes, parentheses, or any other characters that you do want to include in the match? Using the negated character class I suggest will catch every token that does not contain a + or a #.

Regular expression to parse jQuery-selector-like string

text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
regex = /(.*?)\.filter\((.*?)\)/;
matches = text.match(regex);
log(matches);
// matches[1] is '#container a'
// matches[2] is '.top'
I expect to capture
matches[1] is '#container a'
matches[2] is '.top'
matches[3] is '.bottom'
matches[4] is '.middle'
One solution would be to split the string into #container a and rest. Then take rest and execute recursive exec to get item inside ().
Update: I am posting a solution that does work. However I am looking for a better solution. Don't really like the idea of splitting the string and then processing
Here is a solution that works.
matches = [];
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var regex = /(.*?)\.filter\((.*?)\)/;
var match = regex.exec(text);
var matchLength = match.index + match[1].length; // end of the part before the first .filter(
firstPart = text.substring(match.index, matchLength);
rest = text.substring(matchLength, text.length);
matches.push(firstPart);
regex = /\.filter\((.*?)\)/g;
while ((match = regex.exec(rest)) != null) {
matches.push(match[1]);
}
log(matches);
Looking for a better solution.
This will match the single example you posted:
<html>
<body>
<script type="text/javascript">
text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
matches = text.match(/^[^.]*|\.[^.)]*(?=\))/g);
document.write(matches);
</script>
</body>
</html>
which produces:
#container a,.top,.bottom,.middle
EDIT
Here's a short explanation:
^ # match the beginning of the input
[^.]* # match any character other than '.' and repeat it zero or more times
#
| # OR
#
\. # match the character '.'
[^.)]* # match any character other than '.' and ')' and repeat it zero or more times
(?= # start positive look ahead
\) # match the character ')'
) # end positive look ahead
EDIT part II
The regex looks for two types of character sequences:
zero or more characters starting from the start of the string up to the first ., the regex: ^[^.]*
or it matches a character sequence starting with a . followed by zero or more characters other than . and ), \.[^.)]*, but must have a ) ahead of it: (?=\)). This last requirement causes .filter not to match.
You have to iterate, I think.
var head, filters = [];
text.replace(/^([^.]*)(\..*)$/, function(_, h, rem) {
head = h;
rem.replace(/\.filter\(([^)]*)\)/g, function(_, f) {
filters.push(f);
});
});
console.log("head: " + head + " filters: " + filters);
The ability to use functions as the second argument to String.replace is one of my favorite things about JavaScript :-)
You need to do several matches repeatedly, starting where the last match ends (see while example at https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp/exec):
If your regular expression uses the "g" flag, you can use the exec method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property. For example, assume you have this script:
var myRe = /ab*/g;
var str = "abbcdefabh";
var myArray;
while ((myArray = myRe.exec(str)) != null)
{
var msg = "Found " + myArray[0] + ". ";
msg += "Next match starts at " + myRe.lastIndex;
print(msg);
}
This script displays the following text:
Found abb. Next match starts at 3
Found ab. Next match starts at 9
However, this case would be better solved using a custom-built parser. Regular expressions are not an effective solution to this problem, if you ask me.
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var result = text.split('.filter');
console.log(result[0]);
console.log(result[1]);
console.log(result[2]);
console.log(result[3]);
text.split() with regex does the trick.
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var parts = text.split(/(\.[^.()]+)/);
var matches = [parts[0]];
for (var i = 3; i < parts.length; i += 4) {
matches.push(parts[i]);
}
console.log(matches);
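For comparison, here is a minimal sketch that combines two ideas from the answers above: take everything before the first ".filter(" as the head, then collect the filter arguments with a global exec loop. The function name `parseSelector` is hypothetical, and the sketch assumes filter arguments never contain a ")".

```javascript
// Hypothetical helper: returns [head, filter1, filter2, ...].
function parseSelector(text) {
  const marker = '.filter(';
  const first = text.indexOf(marker);
  // Everything before the first ".filter(" (or the whole text if absent).
  const head = first === -1 ? text : text.slice(0, first);

  const filters = [];
  const re = /\.filter\(([^)]*)\)/g;
  let m;
  // With the g flag, each exec() call resumes at re.lastIndex.
  while ((m = re.exec(text)) !== null) {
    filters.push(m[1]);
  }
  return [head, ...filters];
}

const parsed = parseSelector(
  '#container a.filter(.top).filter(.bottom).filter(.middle)'
);
console.log(parsed); // [ '#container a', '.top', '.bottom', '.middle' ]
```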
