Unexpected behavior of regexp in JavaScript - javascript

I've encountered this weird behavior:
I'm paused at a breakpoint, so the variables don't change. In the console you can see that each time I evaluate a regexp method on the same unchanging variable "text", I get the opposite result. Is there an explanation for this?
The relevant code is here:
this.singleRe = /<\$([\s\S]*?)>/g;
while( this.singleRe.test( text ) ){
match = this.singleRe.exec( text );
result = "";
if( match ){
result = match[ 1 ].indexOf( "." ) != -1 ? eval( "obj." + match[ 1 ] ) : eval( "value." + match[ 1 ] );
}
text = text.replace( this.singleRe , result );
}

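When you use a regex with the global flag g, the regex object remembers where the last match ended (in its lastIndex property), so each call to exec() or test() continues from that position, like here: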
var re = /\w/g;
var s = 'Hello regex world!'
re.exec(s); // => ['H']
re.exec(s); // => ['e']
re.exec(s); // => ['l']
re.exec(s); // => ['l']
re.exec(s); // => ['o']
Note the g flag! It means the regex matches multiple occurrences instead of just one.
EDIT
Where possible, I suggest using string.match(regex) instead of regex.exec(string). With the g flag this yields an array of all occurrences, which is easy to inspect or iterate over.
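The alternating results from the question can be reproduced in a few lines. This is only a minimal sketch, not the asker's original code: the sample text and the `values` lookup object are made up for illustration, and the callback form of replace() stands in for the while/test/exec loop.

```javascript
// A global regex keeps its search position in `lastIndex` between calls.
const re = /<\$([\s\S]*?)>/g;
const text = "Hello <$name>!";

const first = re.test(text);  // true: lastIndex now points just past "<$name>"
const second = re.test(text); // false: the search started after the match, lastIndex resets to 0
const third = re.test(text);  // true again: the search started from 0

// One way around it: let replace() iterate over the matches itself.
const values = { name: "world" }; // hypothetical lookup table
const output = text.replace(re, (whole, key) => values[key]);
console.log(first, second, third, output);
```

Because replace() with a global regex always starts from the beginning of the string, the stale lastIndex left by earlier test() calls does not affect it.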

Related

How to use balanced parenthesis algorithm to replace words from a javascript string?

Let's consider we have a string named str, which is defined as :
var str = "I want to replace( this & ( this ) )"
Now, I did something like this :
str = str.replace(/replace\((.*?)\)/gm, function(_, a) {
console.log("Replacing : " + a)
return "it !"
})
Output :
// In Console
Replacing : this & ( this
// Returned
I want to it ! )
But, I wanted the output as :
// In Console
Replacing : this & ( this )
// In Return
I want it !
I heard about the Balanced Parenthesis Algorithm. Can it help me solve this task? If yes, how, and what if there are more brackets in the string? If not, how can this be done?
const str = "I want to replace ( this & ( this ) )"
const withoutPlaceholder = [str.split('to replace')[0], 'it!'].join('')
// output: I want it!
const withoutAllParenthesis = str.split(/\(([^)]+)\)/)
.find(strPart => !strPart.includes('(') && !strPart.includes(')')) + ' it!'
// output: I want to replace it!
const balancedParenthesis = XRegExp.matchRecursive(str, '\\(', '\\)', 'g')[0]
const withoutBalancedParenthesis = str.split(balancedParenthesis).join('it!')
// output: I want to replace (it!)
console.log(withoutPlaceholder)
console.log(withoutAllParenthesis)
console.log(withoutBalancedParenthesis)
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.js"></script>
related questions and answers:
- Regular expression to match balanced parentheses
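Without a library, the balanced-parenthesis idea can be sketched with a simple depth counter. The helper name `replaceBalanced` and its parameters are hypothetical, and the sketch only replaces the first `keyword(...)` group it finds.

```javascript
// Hypothetical helper: replaces the first `keyword( ... )` group,
// including any nested parentheses, with `replacement`.
function replaceBalanced(str, keyword, replacement) {
  const start = str.indexOf(keyword + "(");
  if (start === -1) return str; // keyword not present
  let depth = 0;
  for (let i = start + keyword.length; i < str.length; i++) {
    if (str[i] === "(") {
      depth++;
    } else if (str[i] === ")") {
      depth--;
      if (depth === 0) {
        // Found the parenthesis that closes the group opened after the keyword.
        return str.slice(0, start) + replacement + str.slice(i + 1);
      }
    }
  }
  return str; // unbalanced input: leave it untouched
}

const replaced = replaceBalanced("I want to replace( this & ( this ) )", "replace", "it !");
console.log(replaced); // I want to it !
```

The counter goes up on every opening parenthesis and down on every closing one, so the group ends exactly when the counter returns to zero, however deeply the parentheses are nested.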

How to split() string with brackets in Javascript

I just want to turn
str = "a(bcde(dw)d)e"
into
arr = ["a", "(bcde)", "(dw)", "(d)", "e"]
What regex can I use in str.split()?
PS: Explanations || helpful links welcome.
Examples:
s: "a(bcdefghijkl(mno)p)q" --> [ 'a', '(bcdefghijkl)', '(mno)', '(p)', 'q' ]
s: "abc(cba)ab(bac)c" --> [ 'abc', '(cba)', 'ab', '(bac)', 'c' ]
Go through the parentheses one by one using a counter:
var array = [], c = 0;
'abc(cba)ab(bac)c'.split(/([()])/).filter(Boolean).forEach(e =>
// Increase / decrease counter and push desired values to an array
e == '(' ? c++ : e == ')' ? c-- : c > 0 ? array.push('(' + e + ')') : array.push(e)
);
console.log(array)
Edit
str = "a(bcde(dw)d)e"
// replace any `(alpha(` by `(alpha)(`
str1 = str.replace(/\(([^)]+)\(/g, '($1)(');
// replace any `)alpha)` by `)(alpha)`
str2 = str1.replace(/\)([^(]+)\)/g, ')($1)');
// prefix any opening parenthesis with #--# (just a character string unlikely to appear in the original string)
str3 = str2.replace(/\(/g, '#--#(');
// prefix any closing parenthesis with #--#
str4 = str3.replace(/\)/g, ')#--#');
// remove any double `#--#`
str5 = str4.replace(/(#--#)+/g, '#--#');
// split by invented character string
arr = str5.split('#--#');
console.log(arr);
Old wrong answer
str = "a(bcde(dw)d)e"
console.log(str.split(/[()]/));
This looks a little bit weird, but it's like this.
str is a string, which has a split method. split can take a string or a regular expression as its argument. A string literal is delimited by " and a RegExp literal by /.
The brackets [] wrap a character class which means any one of the characters inside. Then inside we have the two parentheses () which are the two characters we are looking for.
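As a small illustration of the character class described here, the following sketch compares splitting with and without a capturing group around it. The variable names are made up for the example.

```javascript
const sample = "a(bcde(dw)d)e";

// Character class only: the parentheses are used as delimiters and dropped.
const withoutDelimiters = sample.split(/[()]/);

// Capturing group around the class: the delimiters are kept as separate items.
const withDelimiters = sample.split(/([()])/);

console.log(withoutDelimiters); // [ 'a', 'bcde', 'dw', 'd', 'e' ]
console.log(withDelimiters);    // [ 'a', '(', 'bcde', '(', 'dw', ')', 'd', ')', 'e' ]
```

Neither form produces the wrapped output from the question on its own, which is why the other answers post-process the pieces after splitting.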
I don't think the result you want is possible without modifying the values of the array after the split. But if you want to be able to split the string based on 2 symbols (in this case the brackets '(' and ')') you can do this:
var arr = str.split("(").join(")").split(")");
It returns an array with the "words" of the string.
I hope I could help.
Given that the desired output includes characters that aren't in the string, e.g., adding closing or opening parentheses to the substrings in the outer part of the nested parentheses, it will be necessary to make some changes to the individual substrings after they are extracted one way or another.
Maybe something like this:
function getGroups(str) {
var groups = str.match(/(?:^|[()])[^()]+/g)
if (!groups) return []
var parenLevel = 0
return groups.map(function(v) {
if (v[0] === "(") {
parenLevel++
} else if (v[0] === ")") {
parenLevel--
}
v = v.replace(/[()]/,"")
return parenLevel > 0 ? "(" + v + ")" : v
})
}
console.log(JSON.stringify( getGroups("a(bcde(dw)d)e") ))
console.log(JSON.stringify( getGroups("abc(cba)ab(bac)c") ))
console.log(JSON.stringify( getGroups("ab(cd)ef(gh)") ))
console.log(JSON.stringify( getGroups("ab(cd)(e(f(gh)i))") ))
console.log(JSON.stringify( getGroups("(ab(c(d))ef(gh)i)") ))

Extract all matches from given string

I have string:
=?windows-1256?B?IObH4cPM5dLJIA==?= =?windows-1256?B?x+HYyO3JIC4uLg==?= =?windows-1256?B?LiDH4djj5s3Hyg==?= =?windows-1256?B?Rlc6IOTP5skgKA==?=
I need to extract all matches between ?B? and ==?=.
As a result I need:
IObH4cPM5dLJIA
x+HYyO3JIC4uLg
LiDH4djj5s3Hyg
Rlc6IOTP5skgKA
P.S. The string is taken from a textarea, and after the function runs, the script should replace the textarea's current value with the result. I've tried everything. This:
var result = str.substring(str.indexOf('?B?')+3,str.indexOf('==?='));
Works almost the way I need, but it only finds first match. And this doesn't work:
function Doit(){
var str = $('#test').text();
var pattern = /(?B?)([\s\S]*?)(==?=)/g;
var result = str.match(pattern);
for (var i = 0; i < result.length; i++) {
$('#test').html(result);
};
}
? has a special meaning in regex: it matches the preceding element zero or one time.
To match a literal ?, escape it as \?
So the regex should be
(?:\?B\?)(.*?)(?:==\?=)
[\s\S] is not needed here: . does the same job as long as the text between the delimiters contains no newlines.
The metacharacter ? needs escaping, i.e. \? so it is treated as a literal ?.
[\s\S] is important as it matches all characters including newlines.
var m,
pattern = /\?B\?([\s\S]*?)==\?=/g;
while ( m = pattern.exec( str ) ) {
console.log( m[1] );
}
// IObH4cPM5dLJIA
// x+HYyO3JIC4uLg
// LiDH4djj5s3Hyg
// Rlc6IOTP5skgKA
Or a longer but perhaps clearer way of writing the above loop:
m = pattern.exec( str );
while ( m != null ) {
console.log( m[1] );
m = pattern.exec( str );
}
The String match method does not return capture groups when the global flag is used, but only the full match itself.
Instead, the capture group matches of a global match can be collected from multiple calls to the RegExp exec method. Index 0 of a match is the full match, and the further indices correspond to each capture group match. See MDN exec.
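In engines that support ES2020, String.prototype.matchAll offers another way to collect the capture groups of a global match. The sketch below assumes such an engine; the variable names are chosen for the example, and the input is the string from the question.

```javascript
const encoded =
  "=?windows-1256?B?IObH4cPM5dLJIA==?= =?windows-1256?B?x+HYyO3JIC4uLg==?= " +
  "=?windows-1256?B?LiDH4djj5s3Hyg==?= =?windows-1256?B?Rlc6IOTP5skgKA==?=";
const wordPattern = /\?B\?([\s\S]*?)==\?=/g;

// match() with the g flag returns only the full matches, delimiters included.
const fullMatches = encoded.match(wordPattern);

// matchAll() yields one match object per occurrence; index 1 is the capture group.
const payloads = Array.from(encoded.matchAll(wordPattern), (m) => m[1]);

console.log(fullMatches[0]);      // ?B?IObH4cPM5dLJIA==?=
console.log(payloads.join("\n")); // the four extracted values, one per line
```

matchAll throws a TypeError if the regex lacks the g flag, so the flag has to stay.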

Form Regex that finds pattern within a repeating decimal

How can I form a regular expression that matches the repeating digit pattern in a repeating decimal?
Currently my regular expression is the following.
var re = /(?:[^\.]+\.\d*)(\d+)+(?:\1)$/;
Example:
// Pass
deepEqual( func(1/111), [ "0.009009009009009009", "009" ] );
// Fails, since func(11/111) returns [ "0.099099099099099", "9" ]
deepEqual( func(11/111), [ "0.099099099099099", "099" ] );
Live demo here: http://jsfiddle.net/9dGsw/
Here's my code.
// Goal: Find the pattern within repeating decimals.
// Problem from: Ratio.js <https://github.com/LarryBattle/Ratio.js>
var func = function( val ){
var re = /(?:[^\.]+\.\d*)(\d+)+(?:\1)$/;
var match = re.exec( val );
if( !match ){
val = (val||"").toString().replace( /\d$/, '' );
match = re.exec( val );
}
return match;
};
test("find repeating decimals.", function() {
deepEqual( func(1), null );
deepEqual( func(1/10), null );
deepEqual( func(1/111), [ "0.009009009009009009", "009" ] );
// This test case fails...
deepEqual( func(11/111), [ "0.099099099099099", "099" ],
"What's wrong with re in func()?" );
deepEqual( func(100/111), [ "0.9009009009009009", "009"] );
deepEqual( func(1/3), [ "0.3333333333333333", "3"]);
});
Ok. I somewhat solved my own problem by taking Joel's advice.
The problem was that the regular expression section, (\d+)+(?:\1)$, was matching the pattern closest to the end of the string, which made it return "9", instead of "099" for the string "0.099099099099099".
The way I overcame this problem was by setting the match length to 2 or greater, like so.
(\d{2,})+(?:\1)$,
and filtering the result with /^(\d+)(?:\1)$/, in case a pattern is nested inside the matched pattern.
Here's the code that passes all my test cases.
Live Demo: http://jsfiddle.net/9dGsw/1/
var func = function( val ){
val = (val || "").toString();
var RE_PatternInRepeatDec = /(?:[^\.]+\.\d*)(\d{2,})+(?:\1)$/,
RE_RepeatingNums = /^(\d+)(?:\1)$/,
match = RE_PatternInRepeatDec.exec( val );
if( !match ){
// Try again, but take off the last digit in case of a precision error.
val = val.replace( /\d$/, '' );
match = RE_PatternInRepeatDec.exec( val );
}
if( match && 1 < match.length ){
// Reset the match[1] if there is a pattern inside the matched pattern.
match[1] = RE_RepeatingNums.test(match[1]) ? RE_RepeatingNums.exec(match[1])[1] : match[1];
}
return match;
};
Thank you to everyone who helped.
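The difference between the two patterns can be checked directly on the failing input. This is a small sketch that uses the decimal string from the question as a literal; the variable names are made up.

```javascript
const digits = "0.099099099099099";

// Original pattern: the shortest repeat at the end of the string wins.
const original = /(?:[^\.]+\.\d*)(\d+)+(?:\1)$/;
// Revised pattern: the repeated group must be at least two digits long.
const atLeastTwo = /(?:[^\.]+\.\d*)(\d{2,})+(?:\1)$/;

const shortGroup = original.exec(digits)[1];
const longGroup = atLeastTwo.exec(digits)[1];

console.log(shortGroup); // 9
console.log(longGroup);  // 099
```

The greedy \d* gives back as few digits as possible, so the original pattern is satisfied by the trailing "99", while the {2,} version has to back off until it finds "099099".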
Use: var re = /^(?:\d*)\.(\d{1,3})(?:\1)+$/
I have defined the min/max length with {min,max} of the repeating decimal because otherwise 009009009 would match in the first test case as well. Maybe it is still not the final solution, but at least a hint.

Remove all dots except the first one from a string

Given a string
'1.2.3.4.5'
I would like to get this output
'1.2345'
(In case there are no dots in the string, the string should be returned unchanged.)
I wrote this
function process( input ) {
var index = input.indexOf( '.' );
if ( index > -1 ) {
input = input.substr( 0, index + 1 ) +
input.slice( index ).replace( /\./g, '' );
}
return input;
}
Live demo: http://jsfiddle.net/EDTNK/1/
It works but I was hoping for a slightly more elegant solution...
There is a pretty short solution (assuming input is your string):
var output = input.split('.');
output = output.shift() + '.' + output.join('');
If input is "1.2.3.4", then output will be equal to "1.234".
See this jsfiddle for a proof. Of course you can enclose it in a function, if you find it necessary.
EDIT:
Taking into account your additional requirement (to not modify the output if there is no dot found), the solution could look like this:
var output = input.split('.');
output = output.shift() + (output.length ? '.' + output.join('') : '');
which will leave eg. "1234" (no dot found) unchanged. See this jsfiddle for updated code.
It would be a lot easier with a regular expression if browsers supported lookbehind.
One way with a regular expression:
function process( str ) {
return str.replace( /^([^.]*\.)(.*)$/, function ( a, b, c ) {
return b + c.replace( /\./g, '' );
});
}
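For engines that do support lookbehind assertions (they were added to the language in ES2018), the lookbehind idea can be sketched as follows. The function name is made up, and the sketch will throw a syntax error in engines without lookbehind support.

```javascript
// Removes every dot that has another dot somewhere before it,
// which leaves only the first dot in place.
function keepFirstDotLookbehind(str) {
  return str.replace(/(?<=\..*)\./g, "");
}

console.log(keepFirstDotLookbehind("1.2.3.4.5")); // 1.2345
console.log(keepFirstDotLookbehind("1234"));      // 1234
```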
You can try something like this:
str = str.replace(/\./,"#").replace(/\./g,"").replace(/#/,".");
But you have to be sure that the character # is not used in the string; or replace it accordingly.
Or this, without the above limitation:
str = str.replace(/^(.*?\.)(.*)$/, function($0, $1, $2) {
return $1 + $2.replace(/\./g,"");
});
You could also do something like this. I don't know if it is "simpler", but it uses just indexOf, replace and substr.
var str = "7.8.9.2.3";
var strBak = str;
var firstDot = str.indexOf(".");
str = str.replace(/\./g,"");
str = str.substr(0,firstDot)+"."+str.substr(firstDot);
document.write(str);
Here is another approach:
function process(input) {
var n = 0;
return input.replace(/\./g, function() { return n++ > 0 ? '' : '.'; });
}
But one could say that this is based on side effects and therefore not really elegant.
This isn't necessarily more elegant, but it's another way to skin the cat:
var process = function (input) {
var output = input;
if (typeof input === 'string' && input !== '') {
input = input.split('.');
if (input.length > 1) {
output = [input.shift(), input.join('')].join('.');
}
}
return output;
};
Not sure what is supposed to happen if "." is the first character, I'd check for -1 in indexOf, also if you use substr once might as well use it twice.
if ( index != -1 ) {
input = input.substr( 0, index + 1 ) + input.substr(index + 1).replace( /\./g, '' );
}
var i = s.indexOf(".");
var result = s.substr(0, i+1) + s.substr(i+1).replace(/\./g, "");
Somewhat tricky. Works using the fact that indexOf returns -1 if the item is not found.
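The -1 trick can be seen by wrapping the two lines in a function and calling it with and without a dot. This sketch uses slice instead of substr; the function name is made up.

```javascript
function keepFirstDotByIndex(s) {
  const i = s.indexOf(".");
  // With no dot, i is -1, so i + 1 is 0: the head is empty and the tail is the whole string.
  return s.slice(0, i + 1) + s.slice(i + 1).replace(/\./g, "");
}

console.log(keepFirstDotByIndex("1.2.3.4.5")); // 1.2345
console.log(keepFirstDotByIndex("1234"));      // 1234
```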
Trying to keep this as short and readable as possible, you can do the following:
JavaScript
var match = string.match(/^[^.]*\.|[^.]+/g);
string = match ? match.join('') : string;
Requires a second line of code, because if match() returns null, we'll get an exception trying to call join() on null. (Improvements welcome.)
Objective-J / Cappuccino (superset of JavaScript)
string = [string.match(/^[^.]*\.|[^.]+/g) componentsJoinedByString:''] || string;
Can do it in a single line, because its selectors (such as componentsJoinedByString:) simply return null when sent to a null value, rather than throwing an exception.
As for the regular expression, I'm matching all substrings consisting of either (a) the start of the string + any potential number of non-dot characters + a dot, or (b) any existing number of non-dot characters. When we join all matches back together, we have essentially removed any dot except the first.
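One possible way to fold the JavaScript version into a single expression is to fall back to an array when match() returns null. This is only a sketch, and the function name is made up.

```javascript
function keepFirstDotByMatch(string) {
  // match() returns null when nothing matches (for example, an empty string),
  // so fall back to an array holding the original string.
  return (string.match(/^[^.]*\.|[^.]+/g) || [string]).join("");
}

console.log(keepFirstDotByMatch("1.2.3.4.5")); // 1.2345
console.log(keepFirstDotByMatch("1234"));      // 1234
```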
var input = '14.1.2';
reversed = input.split("").reverse().join("");
reversed = reversed.replace(/\.(?=.*\.)/g, '');
input = reversed.split("").reverse().join("");
Based on @Tadek's answer above. This function takes other locales into consideration.
For example, some locales will use a comma for the decimal separator and a period for the thousand separator (e.g. -451.161,432e-12).
First we convert anything other than 1) numbers; 2) negative sign; 3) exponent sign into a period ("-451.161.432e-12").
Next we split by period (["-451", "161", "432e-12"]) and pop out the right-most value ("432e-12"), then join with the rest ("-451161.432e-12")
Note that I'm tossing out the thousands separators, but those could easily be added back in the join step with .join(',').
var ensureDecimalSeparatorIsPeriod = function (value) {
var numericString = value.toString();
var splitByDecimal = numericString.replace(/[^\d.e-]/g, '.').split('.');
if (splitByDecimal.length < 2) {
return numericString;
}
var rightOfDecimalPlace = splitByDecimal.pop();
return splitByDecimal.join('') + '.' + rightOfDecimalPlace;
};
let str = "12.1223....1322311..";
let finStr = str.replace(/(\d*\.)(.*)/, '$1') + str.replace(/(\d*\.)(.*)/, '$2').replace(/\./g,'');
console.log(finStr)
const [integer, ...decimals] = '233.423.3.32.23.244.14...23'.split('.');
const result = [integer, decimals.join('')].join('.')
The same solution as offered above, but using destructuring with rest syntax.
It's a matter of opinion, but I think it improves readability.
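The two-line destructuring version appends a trailing dot when the input contains no dot at all. A minimal sketch of a guarded variant follows; the function name is made up.

```javascript
function keepFirstDotByRest(input) {
  // `head` is everything before the first dot, `rest` holds the remaining pieces.
  const [head, ...rest] = input.split(".");
  // Only add a dot back if the input actually contained one.
  return rest.length ? head + "." + rest.join("") : head;
}

console.log(keepFirstDotByRest("1.2.3.4.5")); // 1.2345
console.log(keepFirstDotByRest("1234"));      // 1234
```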
