JavaScript Regex Infinite Loop on Some Patterns

I am using the exec method of a JavaScript RegExp object, and depending on the expression I can get into an infinite loop where exec never returns null.
Here is a test function I wrote to illustrate the problem. I ran it in Chrome 32. I am defining the Regex and match variables outside the loop. The max/Reached Max test is there to break out of the infinite loop.
function textExec(reg, text, max) {
    max = max || 10;
    var match = null;
    while (match = reg.exec(text)) {
        console.log(match);
        console.log(match.length + " " + match.index + "," + reg.lastIndex);
        if (--max < 0 || match.index == reg.lastIndex) {
            console.log('Reached Max');
            break;
        }
    }
}
Here is a simple test that runs as expected.
textExec(/(o[a-z])/g, "body=//soap:Body");
["od", "od", index: 1, input: "body=//soap:Body"]
2 1,3
["oa", "oa", index: 8, input: "body=//soap:Body"]
2 8,10
["od", "od", index: 13, input: "body=//soap:Body"]
2 13,15
Here is the regular expression I am trying to use. It extracts an optional variable name and a required XPath expression. This will go into an infinite loop that is only stopped by the test I added. It appears to get to the end of the input text and hang.
textExec(/(([a-zA-Z0-9_-]*)=)?(.*)/g, "body=//soap:Body");
["body=//soap:Body", "body=", "body", "//soap:Body", index: 0, input: "body=//soap:Body"]
4 0,16
["", undefined, undefined, "", index: 16, input: "body=//soap:Body"]
4 16,16
Reached Max
Here is the same test simplified. It still sends it into an infinite loop.
textExec(/.*/g, "body=//soap:Body");
["body=//soap:Body", index: 0, input: "body=//soap:Body"]
1 0,16
["", index: 16, input: "body=//soap:Body"]
1 16,16
Reached Max
If the text includes a newline (\n), the loop gets stuck at the position just before it, because . does not match a newline.
textExec(/.*/g, "//soap:Envelope\n//soap:Body");
["//soap:Envelope", index: 0, input: "//soap:Envelope\n//soap:Body"]
1 0,15
["", index: 15, input: "//soap:Envelope\n//soap:Body"]
1 15,15
Reached Max
I would appreciate any help.
Wes.

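The pattern .* can match zero characters. After the first match consumes the whole string, lastIndex is 16, and the next call to exec finds an empty match at position 16. Because that match is empty, lastIndex stays at 16, so every following call finds the same empty match again, forever. You could demonstrate this more simply by matching /.*/g against the empty string in the first place.
What you could do is quit when the match position stops changing, or move lastIndex forward by one yourself whenever a match is empty.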
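A minimal sketch of the second option, assuming the regex has the g (or y) flag so that exec honours lastIndex: whenever exec returns an empty match, step lastIndex forward by one so the next search starts past it. The helper name execAll is hypothetical. In engines that support String.prototype.matchAll (ES2020, so not Chrome 32), matchAll performs the same advance for you.

```javascript
// Hypothetical helper: collect every match of a global (g or y) regex without
// looping forever when the pattern can match the empty string.
function execAll(re, text) {
  const matches = [];
  re.lastIndex = 0; // start searching from the beginning of the text
  let match;
  while ((match = re.exec(text)) !== null) {
    matches.push(match[0]);
    // An empty match leaves lastIndex unchanged (match.index === re.lastIndex),
    // so move one position forward to avoid finding the same empty match again.
    if (match[0] === "") {
      re.lastIndex++;
    }
  }
  // When lastIndex runs past the end, exec returns null and resets lastIndex to 0.
  return matches;
}

console.log(execAll(/.*/g, "body=//soap:Body"));
// [ 'body=//soap:Body', '' ]
console.log(execAll(/.*/g, "//soap:Envelope\n//soap:Body"));
// [ '//soap:Envelope', '', '//soap:Body', '' ]
```

The trailing empty strings are real matches (the empty string at the end of each line), reported once each instead of endlessly.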

Related

A simple regex solution for digit grouping and validation in javascript

I had a requirement to group digits (adding thousands separators) on both sides of the decimal point (the whole and the fractional parts) using JavaScript. I investigated whether this would be possible with regex without too much complexity.
The solutions available online seem to be focused on grouping whole numbers or only to the left of the decimal point, so I tried to come up with my own solution, including a validation regex...

The code in the snippet only tests using the given validation regex, so if it is not sound, the test results would be invalid...
function addThousandsSeparators(n) {
return n.toString().replace(/-?\d+?(?=(?:\d{3})+(?:\D|$))/gy, '$&,').replace(/\d{3}(?=\d)/g, '$&,')
}
function validateThousandsSeparators(n) {
return /^-?\d{1,3}(?:,\d{3})*(?:\.(?:\d{3},)*\d{1,3})?$/.test(n)
}
var tests = [
0,
1,
1.1,
1.0123456789,
-10.012345678,
210.01234567,
-3210.0123456,
43210.012345,
-543210.01234,
6543210.0123,
-76543210.012,
876543210.01,
-9876543210.1,
1,
10,
-210,
-3210,
43210,
543210,
-6543210,
-76543210,
876543210,
9876543210,
0.0123456789,
.012345678,
-0.01234567,
-.0123456,
0.012345,
.01234,
-0.0123,
-.012,
0.01,
.1
]
, i
function test(n) {
n = addThousandsSeparators(n)
console.log(
(
validateThousandsSeparators(n) ?
'valid: ' : 'invalid: '
) + n
)
}
for(i of tests)
test(i)

Brief
Please note that the output is what the OP expects: the separators in the fractional part should mirror those in the whole part, so 1234.1234 becomes 1,234.123,4 and not 1,234.1,234. For the opposite effect (the 1,234.1,234 format), use the snippet directly below.
var nums = [ 0, 1, 1.1, 1.0123456789, -10.012345678, 210.01234567, -3210.0123456, 43210.012345, -543210.01234, 6543210.0123, -76543210.012, 876543210.01, -9876543210.1, 1, 10, -210, -3210, 43210, 543210, -6543210, -76543210, 876543210, 9876543210, 0.0123456789, .012345678, -0.01234567, -.0123456, 0.012345, .01234, -0.0123, -.012, 0.01, .1 ];
nums.forEach(function(n){
console.log(addSeparators(n));
});
function addSeparators(n) {
var regex = /\d{3}(?![-.]|$)/g;
var replace = `$&,`;
n = n.toString();
n = reverseNumber(n).replace(regex, replace);
return reverseNumber(n);
}
function reverseNumber(n) {
return n.split("").reverse().join("");
}
Code
The code below provides a simple regex and method for adding separators. The string is first reversed so that separators are added to the whole number part, then reversed once again so that separators are added to the fractional part.
var nums = [ 0, 1, 1.1, 1.0123456789, -10.012345678, 210.01234567, -3210.0123456, 43210.012345, -543210.01234, 6543210.0123, -76543210.012, 876543210.01, -9876543210.1, 1, 10, -210, -3210, 43210, 543210, -6543210, -76543210, 876543210, 9876543210, 0.0123456789, .012345678, -0.01234567, -.0123456, 0.012345, .01234, -0.0123, -.012, 0.01, .1 ];
nums.forEach(function(n){
console.log(addSeparators(n));
});
function addSeparators(n) {
var regex = /\d{3}(?=[^.,]+$)(?!-)/g;
var replace = `$&,`;
n = n.toString();
n = reverseNumber(n).replace(regex, replace);
n = reverseNumber(n).replace(regex, replace);
return n;
}
function reverseNumber(n) {
return n.split("").reverse().join("");
}
Explanation
\d{3} Match a digit exactly 3 times
(?=[^.,]+$) Ensure no decimal point or separator exists between here and the end of the string (this prevents duplicate separators and ensures it only manipulates the part of the string after the .)
(?!-) Ensure what follows is not the negative sign (this prevents, for example, -543210.01234 from becoming -,543,210.012,34)
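For comparison, here is an alternative sketch that avoids reversing the string (a different technique from the answer above, not the answer's own code): split on the decimal point and group each part with lookahead only. The name groupDigits is hypothetical, and it assumes String(n) produces plain decimal notation; numbers that JavaScript prints in exponent form (such as 1e+21) are not handled.

```javascript
// Hypothetical alternative: group the whole and fractional parts separately.
function groupDigits(n) {
  // Assumes plain decimal notation such as "-3210.0123456".
  const [whole, fraction] = String(n).split(".");
  // Whole part: a comma at every non-boundary position followed by a multiple of
  // three digits up to the end. \B keeps a comma away from the start and the "-".
  const groupedWhole = whole.replace(/\B(?=(\d{3})+$)/g, ",");
  if (fraction === undefined) {
    return groupedWhole;
  }
  // Fractional part: a comma after each group of three digits that is followed by
  // another digit, so groups are counted from the decimal point.
  const groupedFraction = fraction.replace(/\d{3}(?=\d)/g, "$&,");
  return groupedWhole + "." + groupedFraction;
}

console.log(groupDigits(1234.1234));     // 1,234.123,4
console.log(groupDigits(-9876543210.1)); // -9,876,543,210.1
console.log(groupDigits(0.0123456789));  // 0.012,345,678,9
```

The output follows the same mirrored format as the answer (1,234.123,4), so it can be checked with the question's validation regex.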

JavaScript - Matching alphanumeric patterns with RegExp

I'm new to RegExp and to JS in general (coming from Python), so this might be an easy question:
I'm trying to code an algebraic calculator in Javascript that receives an algebraic equation as a string, e.g.,
string = 'x^2 + 30x -12 = 4x^2 - 12x + 30';
The algorithm can already break the string into a single list, with all values on the right side multiplied by -1 so that everything equals 0. However, one of the steps to solve the equation involves creating a hashtable/dictionary with the variable as the key.
The string above results in a list eq:
eq = ['x^2', '+30x', '-12', '-4x^2', '+12x', '-30'];
I'm currently planning to iterate through this list and use RegExp to identify each variable and its multiplier, so I can build a hashTable/dictionary like this one, which will let me simplify the equation:
hashTable = {
'x^2': [1, -4],
'x': [30, 12],
' ': [-12, -30]
}
I plan on using some kind of for loop to iterate over the array and applying match to each string to get the values I need, but, quite frankly, I'm stumped.
I have already used RegExp to split the string into the individual parts of the equation and to remove any spaces, but I can't think of a way to separate -4 from x^2 in '-4x^2'.

You can try this regex:
/(-?\d+)x\^\d+/
When you call match:
var res = "-4x^2".match(/(-?\d+)x\^\d+/)
you get res as an array: [ "-4x^2", "-4" ]
Your '-4' is in res[1].
By adding another group around the second \d+, you can also retrieve the power of x:
var res = "-4x^2".match(/(-?\d+)x\^(\d+)/) //res = [ "-4x^2", "-4", "2" ]
Note that this pattern needs both an explicit coefficient and a power, so it does not match terms like 'x^2' or '+30x'.
Hope it helps

If you know that the hashtable key (the variable part) is always at the end of the term, say 'x' in '4x' or 'x^2' in '-4x^2', then you can get the coefficient by splitting on it:
var exp = '-4x^2'
exp.split('x^2')[0] // returns the string '-4'
I hope this is what you were looking for.

function splitTerm(term) {
var regex = /([+-]?)([0-9]*)?([a-z](\^[0-9]+)?)?/
var match = regex.exec(term);
return {
constant: parseInt((match[1] || '') + (match[2] || 1)),
variable: match[3]
}
}
splitTerm('x^2'); // => {constant: 1, variable: "x^2"}
splitTerm('+30x'); // => {constant: 30, variable: "x"}
splitTerm('-12'); // => {constant: -12, variable: undefined}
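To go from single terms to the hashTable the question asks for, the per-term results can be grouped by variable. This is a minimal sketch; parseTerm and buildTable are hypothetical names, and parseTerm is a stricter variant of splitTerm above (anchored with ^ and $, so unexpected input throws instead of silently matching part of the term). It assumes every term looks like [sign][digits][letter[^digits]] and, following the question, uses ' ' as the key for plain constants.

```javascript
// Hypothetical parser for terms such as "-4x^2", "+30x" or "-12".
function parseTerm(term) {
  const match = /^([+-]?)(\d*)([a-z](?:\^\d+)?)?$/.exec(term);
  if (match === null) {
    throw new Error("Unrecognised term: " + term);
  }
  // A missing coefficient ("x^2", "-x") means 1 or -1.
  const digits = match[2] === "" ? "1" : match[2];
  return {
    constant: parseInt(match[1] + digits, 10),
    variable: match[3] || " ", // ' ' is the question's key for constants
  };
}

// Hypothetical grouping step: collect the coefficients of each variable.
function buildTable(terms) {
  const table = {};
  for (const term of terms) {
    const { constant, variable } = parseTerm(term);
    if (!(variable in table)) {
      table[variable] = [];
    }
    table[variable].push(constant);
  }
  return table;
}

const eq = ['x^2', '+30x', '-12', '-4x^2', '+12x', '-30'];
console.log(buildTable(eq));
// { 'x^2': [ 1, -4 ], x: [ 30, 12 ], ' ': [ -12, -30 ] }
```

For the eq array from the question, the ' ' key collects both -12 and -30.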
Additionally, these tools may help you analyze and understand regular expressions:
https://regexper.com/
https://regex101.com/
http://rick.measham.id.au/paste/explain.pl

Javascript - Remove all char '0' that come before another char

I have many strings like this:
0001, 0002, ..., 0010, 0011, ..., 0100, 0101,...
I would like these to become like this:
1, 2, ..., 10, 11, ..., 100, 101, ...
So I would like to remove all the leading 0 characters, up to the first non-zero character.
I tried with
.replace(/0/g, '')
But of course that also removes the 0s that come later, so for example 0010 becomes 1 instead of 10. Can you please help me?

You can do
.replace(/\d+/g, function(v){ return +v })
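The callback receives each run of digits as a string; unary + turns it into a Number, which has no leading zeros, and replace converts the returned value back to a string. A minimal sketch with a hypothetical function name:

```javascript
// Replace every run of digits with the same number minus its leading zeros.
// Unary + turns "0010" into the number 10; String() turns it back into "10".
function stripLeadingZerosInText(text) {
  return text.replace(/\d+/g, digits => String(+digits));
}

console.log(stripLeadingZerosInText("0001, 0002, 0010, 0011, 0100, 0101"));
// 1, 2, 10, 11, 100, 101
```

Because each run passes through a JavaScript Number, digit runs longer than about 15 significant digits can come back changed; a purely string-based regex avoids that.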

This is the shortest solution:
"0001".replace(/^0+/,""); // => 1
...
// Tested on Win7 Chrome 44+
^ ... starting of the String
0+ ... At least one 0
P.S.: you can test regexes on pages like https://regex101.com/ or https://www.debuggex.com
Update 1:
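For one long string (the "$1" puts back the whitespace captured before the zeros; replacing with "" would also delete the spaces):
"0001, 0002, 0010, 0011, 0100, 0101".replace(/(^|\s)0+/g,"$1") // => 1, 2, 10, 11, 100, 101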
// Tested on Win7 Chrome 44+
Examples:
// short Strings
var values = ['0001', '0002','0010', '0011','0100','0101'];
for (var idx = 0; idx < values.length; idx++) {
    document.write(values[idx] + " -> " + values[idx].replace(/^0+/, "") + "<br/>");
}
// one long String
document.write("0001, 0002, 0010, 0011, 0100, 0101".replace(/(^|\s)0+/g, "$1"));

Previously answered here.
.replace(/^0+(?!$)/, '')
Functionally the same as winner_joiner's answer, with the exception that this particular regex won't return a completely empty string should the input consist entirely of zeroes.
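A small sketch of the difference (the function name is hypothetical): the negative lookahead (?!$) stops the run of zeros from reaching the end of the string, so an all-zero input keeps its last 0.

```javascript
// Remove leading zeros but keep one "0" when the whole string is zeros.
function stripLeadingZeros(s) {
  // (?!$) forbids the matched zeros from extending to the end of the string,
  // so for "0000" only the first three zeros are removed.
  return s.replace(/^0+(?!$)/, "");
}

console.log(stripLeadingZeros("0010"));        // 10
console.log(stripLeadingZeros("0000"));        // 0
console.log("0000".replace(/^0+/, "") === ""); // true: the plain version empties it
```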

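Use the regex /(^|,\s*)0+/g: it selects the 0s at the beginning of the string or after a comma and optional whitespace.
document.write('0001, 0002, ..., 0010, 0011, ..., 0100, 0101,...'.replace(/(^|,\s*)0+/g,'$1'))
Explanation:
(^|,\s*)0+
(^|,\s*) captures either the start of the string or a comma plus any whitespace, 0+ matches the run of leading zeros, and the '$1' replacement puts the captured part back so that only the zeros are removed.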

var text='00101';
var result=parseInt(text, 10); // 101, as a number; the radix 10 avoids octal parsing in older engines

Get ANSI color for character at index

I have developed the couleurs NPM package, which can be configured to add an rgb method to String.prototype:
> console.log("Hello World!".rgb(255, 0, 0)) // "Hello World!" in red
Hello World!
undefined
> "Hello World!".rgb(255, 0, 0)
'\u001b[38;5;196mHello World!\u001b[0m'
This works fine. What's the proper way to get the ANSI color/style of the character at index i?
This could probably be hacked together with some regular expressions, but I'm not sure that's really a good approach (although if a correct implementation is available, I'm not against it). I'd prefer a native way to get the color/style by accessing the character as interpreted by the tty.
> function getStyle (input, i) { /* get style at index `i` */ return style; }
> getStyle("Hello World!".rgb(255, 0, 0), 0); // Get style of the first char
{
start: "\u001b[38;5;196m",
end: "\u001b[0m",
char: "H"
}
> getStyle("Hello " + "World!".rgb(255, 0, 0), 0); // Get style of the first char
{
start: "",
end: "",
char: "H"
}
Things get complicated when we have multiple combined styles:
> console.log("Green and Italic".rgb(0, 255, 0).italic())
Green and Italic
undefined
> getStyle("Green and Italic".rgb(0, 255, 0).italic(), 0);
{
start: "\u001b[3m\u001b[38;5;46m",
end: "\u001b[0m\u001b[23m",
char: "G"
}
> getStyle(("Bold & Red".bold() + " but this one is only red").rgb(255, 0, 0), 0);
{
start: "\u001b[38;5;196m\u001b[1m",
end: "\u001b[22m\u001b[0m",
char: "B"
}
> getStyle(("Bold & Red".bold() + " but this one is only red").rgb(255, 0, 0), 11);
{
start: "\u001b[38;5;196m",
end: "\u001b[0m",
char: "b"
}
> ("Bold & Red".bold() + " but this one is only red").rgb(255, 0, 0)
'\u001b[38;5;196m\u001b[1mBold & Red\u001b[22m but this one is only red\u001b[0m'
Like I said, I'm looking for a native way (maybe using a child process).
So, how do I get the complete ANSI style for the character at index i?

There are a couple of ways to 'add' formatting to text, and this is one of them. The problem is that you are mixing text and styling in the same object: a text string. It's similar to RTF:
Here is some \b bold\b0 and {\i italic} text\par
but different from, say, the native format of Word .DOC files, which works with text runs:
(text) Here is some bold and italic text\r
(chp) 13 None
4 sprmCFBold
5 None
6 sprmCFItalic
6 None
Here the number on the left is the count of characters that share a given formatting.
The latter format is what you are looking for, since you want to index characters in the plain text: subtracting run lengths from the index until it falls inside a run tells you which formatting applies. Depending on how often you expect to ask for a formatting, you can either scan the string each time (a one-time run) or convert it once and cache the result.
A one-time run needs to inspect each element of the encoded string, incrementing the "text" index when not inside a color string, and updating the 'last seen' color string if it is. I added a compatible getCharAt function for debugging purposes.
var str = '\u001b[38;5;196m\u001b[1mBo\x1B[22mld & Red\u001b[22m but this one is only red\u001b[0m';
const map = {
    bold: ["\x1B[1m", "\x1B[22m" ]
  , italic: ["\x1B[3m", "\x1B[23m" ]
  , underline: ["\x1B[4m", "\x1B[24m" ]
  , inverse: ["\x1B[7m", "\x1B[27m" ]
  , strikethrough: ["\x1B[9m", "\x1B[29m" ]
};
// map is a plain object and has no .length, so loop over its [on, off] pairs instead
const pairs = Object.values(map);
String.prototype.getColorAt = function(index)
{
    var strindex = 0, color = [], cmatch, i, j, isOff;
    while (strindex < this.length)
    {
        cmatch = this.substr(strindex).match(/^(\u001B\[[^m]*m)/);
        if (cmatch)
        {
            // Global reset?
            if (cmatch[0] == '\x1B[0m')
            {
                color = [];
            } else
            {
                // Off code? Then remove the matching On code(s) instead of storing it.
                isOff = false;
                for (i = 0; i < pairs.length; i++)
                {
                    if (pairs[i][1] == cmatch[0])
                    {
                        isOff = true;
                        for (j = color.length - 1; j >= 0; j--)
                        {
                            if (color[j] == pairs[i][0])
                                color.splice(j, 1);
                        }
                        break;
                    }
                }
                // Anything else (a color or an On code) becomes part of the current style.
                if (!isOff)
                    color.push(cmatch[0]);
            }
            strindex += cmatch[0].length;
        } else
        {
            /* a regular character! */
            if (!index)
                break;
            strindex++;
            index--;
        }
    }
    return color.join('');
}
String.prototype.getCharAt = function(index)
{
    var strindex = 0, cmatch;
    while (strindex < this.length)
    {
        cmatch = this.substr(strindex).match(/^(\u001B\[[^m]*m)/);
        if (cmatch)
        {
            strindex += cmatch[0].length;
        } else
        {
            /* a regular character! */
            if (!index)
                return this.substr(strindex, 1);
            strindex++;
            index--;
        }
    }
    return '';
}
console.log(str);
var color = str.getColorAt(1);
var text = str.getCharAt(1);
console.log('color is ' + color + color.length + ', char is ' + text);
The returned color is still in its original escaped encoding. You can make it return a constant of some kind by adding those constants to the map object.

I can't provide you with a full solution, but here's a sketch:
maintain a stack which accumulates the current format
split the string into chunks: escape sequence | single character
iterate over this list of chunks
if it's just a char, save its index + the current state of the stack
if it's an escape, either push the respective format onto the stack, or pop the format from it
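The steps above can be sketched as follows. The names styleMap, getStyle and OFF_TO_ON are hypothetical; the table only covers the bold/italic/underline/inverse/strikethrough off codes (the same set as in the other answer), ESC[0m is assumed to be a full reset, and every other escape sequence is treated as an "on" code. It returns only the active start codes, not the question's end field, and indexes count UTF-16 code units.

```javascript
// Hypothetical table: each SGR "off" code and the "on" code it cancels.
const OFF_TO_ON = new Map([
  ["\x1b[22m", "\x1b[1m"], // bold off
  ["\x1b[23m", "\x1b[3m"], // italic off
  ["\x1b[24m", "\x1b[4m"], // underline off
  ["\x1b[27m", "\x1b[7m"], // inverse off
  ["\x1b[29m", "\x1b[9m"], // strikethrough off
]);
const RESET = "\x1b[0m";

// One entry per visible character: the character and the codes active at it.
function styleMap(input) {
  // Each chunk is either a whole escape sequence (ESC [ params m) or one character.
  const chunks = input.match(/\x1b\[[0-9;]*m|[\s\S]/g) || [];
  const stack = [];
  const result = [];
  for (const chunk of chunks) {
    if (chunk.length === 1) {
      // Plain character: save it together with the current state of the stack.
      result.push({ char: chunk, start: stack.join("") });
    } else if (chunk === RESET) {
      stack.length = 0; // clear every active style
    } else if (OFF_TO_ON.has(chunk)) {
      // Off code: pop the most recent matching on code, if there is one.
      const at = stack.lastIndexOf(OFF_TO_ON.get(chunk));
      if (at !== -1) stack.splice(at, 1);
    } else {
      stack.push(chunk); // any other escape is treated as an on code
    }
  }
  return result;
}

function getStyle(input, i) {
  return styleMap(input)[i];
}

const s = "\u001b[38;5;196m\u001b[1mBold & Red\u001b[22m but this one is only red\u001b[0m";
console.log(getStyle(s, 0));  // bold + red start codes, char "B"
console.log(getStyle(s, 11)); // red start code only, char "b"
```

If styles are queried many times, the array from styleMap can be computed once and cached instead of rebuilding it on every getStyle call.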
You can also use this algorithm to convert an escaped string into html, and use XML methods to walk the result tree.
BTW, the latter would also be nice the other way round. How about this:
console.log("<font color='red'>hi <b>there</b></font>".toANSI())

Doing an assignment in VBScript now: need to give the positions of each "e" in a string

I've done this in JavaScript but needless to say I can't just swap it over.
In JScript I used this:
var estr = tx_val;
var index = -1; // start at -1 so the first search begins at position 0
var positions = [];
while ((index = estr.indexOf("e", index + 1)) != -1)
{
    positions.push(index);
}
document.getElementById('ans6').innerHTML = "Locations of 'e' in the string are: "
    + positions;
I tried using the same logic with VBScript functions such as Join, and I also tried using InStr. I'm just not sure how to yank out that 'e'... Maybe I'll try replacing it with another character.
Here is what I tried with VBScript. I used InStr and Replace to yank out the first occurrence of 'e' in each loop and replace it with an 'x', thinking that the next pass through the loop would then give the location of the next 'e'. When I don't get a "subscript out of range: 'i'" error, I only get one location back from the script, and it's 0.
(6) show the location of each occurrence of the character "e" in the string "tx_val" in the span block with id="ans6"
countArr = array()
countArr = split(tx_val)
estr = tx_val
outhtml = ""
positions = array()
i=0
for each word in countArr
i= i+1
positions(i) = InStr(1,estr,"e",1)
estr = replace(estr,"e","x",1,1)
next
document.getElementById("ans6").innerHTML = "E is located at: " & positions
What can I do that is simpler than this and works? and thank you in advance, you all help a lot.
EDIT AGAIN: I finally got it working. I'm not 100% sure how, but I ran through the logic in my head a few dozen times before I wrote it, and after working out a few kinks it works.
local = ""
simon = tx_val
place=(InStr(1,simon,"e"))
i=(len(simon))
count = tx_val
do
local = (local & " " & (InStr((place),simon,"e")))
place = InStr((place+1),simon,"e")
count = (InStr(1,simon,"e"))
loop while place <> 0
document.getElementById("ans6").innerHTML= local

InStr has slightly different parameters from indexOf:
InStr([start, ]string1, string2[, compare])
start: The position at which to start searching
string1: The string to search
string2: The string to search for
compare: Optional; 0 (vbBinaryCompare, the default) or 1 (vbTextCompare, case-insensitive)
Also note that VBScript indexes strings beginning at 1, so all the input and return index values are 1 more than in the original JavaScript, and InStr returns 0 rather than -1 when the search string is not found.

You can try split(). For example a simple string like this:
string = "thisismystring"
Split on "s", so we have
mystring = Split(string,"s")
So in the array mystring, we have
thi i my tring
^ ^ ^ ^
[0] [1] [2] [3]
All you have to do is check the length of each array item using Len(). For example, item 0 has a length of 3 (thi), so the first "s" is at position 4 (index 3). Keep that running total for the next item: item 1 has a length of 1, so adding it, plus 1 for the "s" itself, to 4 gives 6, the position of the next "s" (index 5), and so on.
Update: here's an example using VBScript (it prints 0-based indexes):
thestring = "thisismystring"
delimiter="str"
mystring = Split(thestring,delimiter)
c=0
For i=0 To UBound(mystring)-1
c = c + Len(mystring(i)) + Len(delimiter)
' c - Len(delimiter) is the 0-based index of this occurrence of the delimiter
WScript.Echo "index of " & delimiter & " is: " & c - Len(delimiter)
Next
Trial:
C:\test> cscript //nologo test.vbs
index of str is: 8
