How can I match this pattern over multiple lines - javascript

Given the text below, I want to return an array of all the lines of text with the format 1.SSRDOCSYYHK1/////25AUG52/M//YOUNG/LANDON/KWAN- 1.1, and this should match a line even if it is actually broken across multiple lines.
RegExp:
str.match(/\d{1,2}.SSRDOCSYYHK1\/\/\/\/\/.+?\d\.\d/g)
Full Text:
var str= "A-CA25592185
A-ERNONREF/CHGFEEPLUSFAREDIF/CXL BY FLT TIME NOVALUE
TKG FAX-NOT PRICED FARE TYPE EX
FOP- 1.CA
G- 1.SSRDOCSWSHK1/////25MAY55/M//YOUNG/LANDON/KWAN - 1.
1
)>MD
2.SSRPSPTYYHK1///25AUG52/M- 1.1
3.SSRDOCSWSHK1/////25AUG52/F//YOUNG/LILY/LIMKUO - 2.1
4.SSRPSPTYYHK1///25AUG52/F- 2.1
5.SSRDOCSWSHK1/////25AUG52/F//YOUNG/ANDREA/LAUREN - 3.1
6.SSRPSPTYYHK1///25AUG52/F- 3.1
7.SSRDOCSWSHK1/////17MAR93/M//YOUNG/ETHAN/WESLEY - 4.1
8.SSRPSPTYYHK1///25AUG52/M- 4.1
9.SSRDOCSWSHK1/////23NOV96/M//YOUNG/WINSTON/JEREMY - 5.1
10.SSRPSPTYYHK1///25AUG52/M- 5.1
11.SSRDOCSYYHK1/////25MAY55/M//YOUNG/LANDON/KWAN - 1.
1
12.SSRDOCSYYHK1/////04MAR59/F//YOUNG/LILY/LIMKUO - 2.1
13.SSRDOCSYYHK1/////25AUG52/F//YOUNG/ANDREA/LAUREN - 3.1
)>MD
7.SSRDOCSWSHK1/////25AUG52/M//YOUNG/ETHAN/WESLEY - 4.1
8.SSRPSPTYYHK1///25AUG52/M- 4.1
9.SSRDOCSWSHK1/////25AUG52/M//YOUNG/WINSTON/JEREMY - 5.1
10.SSRPSPTYYHK1///25AUG52/M- 5.1
11.SSRDOCSYYHK1/////25MAY55/M//YOUNG/LANDON/KWAN - 1.
1
12.SSRDOCSYYHK1/////25AUG52/F//YOUNG/LILY/LIMKUO - 2.1
13.SSRDOCSYYHK1/////25AUG52/F//YOUNG/ANDREA/LAUREN - 3.1
14.SSRDOCSYYHK1/////25AUG52/M//YOUNG/ETHAN/WESLEY - 4.1
15.SSRDOCSYYHK1/////25AUG52/M//YOUNG/WINSTON/JEREMY - 5.1
**** ITEMS SUPPRESSED ****/DR"
I expect an array with all the matches, but the two instances of line 11 are not matched because of the line break. The break can occur in any of the ways below, none of which is currently matched:
var str="1.SSRDOCSYYHK1/////25AUG52/M//
YOUNG/LANDON/KWAN- 1.1"
var str="1.SSRDOCSYYHK1/////25AUG52/M//YOUNG/LANDON/KWAN- 1.
1"
var str="1.SSRDOCSYYHK1/////25AUG52/M//YOUNG/LANDON/KWAN- 1
.1"
var str="1.SSRDOCSYYHK1/////25AUG52/M//YOUNG/LANDON/KWAN-
1.1"
var str="1.SSRDOCSYYHK1/////25AUG52/M//YOUNG/LANDON/KWAN
- 1.1"
How can I tell this RegExp to still match in all of the above cases?
I did try str.match(/\d{1,2}.SSRDOCSYYHK1\/\/\/\/\/.+?\d\.\d/m) with no luck
Here's the array that I do get:
anubhava's answer below returns the following array. Note that slots 0 and 2 actually hold two lines that were captured as a single instance. With his example, this always happens when a line breaks like this and is followed by another matching line.

If it can be broken anywhere, not only in the DOT matching, the \s trick won't work.
I don't think there's a way to ignore line breaks in javascript regex (or any other engine, actually).
Your best option would be to remove all line breaks before matching, like so:
str = str.replace(/(\r\n|\n|\r)/gm,"");
And then call .match on the result.
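A minimal sketch of this approach, assuming a shortened two-record sample (the `sample` string and the helper name `matchAfterStrippingBreaks` are made up for illustration). One caveat: joining lines glues the end of one record to the record number of the next, so with single-digit record numbers the leading \d{1,2} may pick up a stray digit.

```javascript
// Hypothetical shortened sample: record 11 is broken across two lines.
const sample = [
  "11.SSRDOCSYYHK1/////25MAY55/M//YOUNG/LANDON/KWAN - 1.",
  "1",
  "12.SSRDOCSYYHK1/////04MAR59/F//YOUNG/LILY/LIMKUO - 2.1",
].join("\n");

// Remove every line break first, then run the single-line pattern.
function matchAfterStrippingBreaks(text) {
  const flat = text.replace(/\r\n|\n|\r/g, "");
  return flat.match(/\d{1,2}\.SSRDOCSYYHK1\/{5}.+?\d\.\d/g) || [];
}

const stripped = matchAfterStrippingBreaks(sample);
console.log(stripped);
```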

The regular expression below works for 1.SSRDOCSYYHK1/////25AUG52/M//YOUNG/LANDON/KWAN- 1.1:
[0-1]\.[A-Z]+[0-1]\/\/\/\/\/[0-9]+[A-Z]+[0-9]+\/[A-Z]\/\/[A-Z]+\/[A-Z]+\/[A-Z]+\-\s[0-1]\.[0-1]
This one works for:
1.SSRDOCSYYHK1/////25AUG52/M//YOUNG/LANDON/KWAN- 1.
1
[0-1]\.[A-Z]+[0-1]\/\/\/\/\/[0-9]+[A-Z]+[0-9]+\/[A-Z]\/\/[A-Z]+\/[A-Z]+\/[A-Z]+\-\s[0-1]\.\n[0-1]
This one works for:
1.SSRDOCSYYHK1/////25AUG52/M//YOUNG/LANDON/KWAN- 1
.1
[0-1]\.[A-Z]+[0-1]\/\/\/\/\/[0-9]+[A-Z]+[0-9]+\/[A-Z]\/\/[A-Z]+\/[A-Z]+\/[A-Z]+\-\s[0-1]\n\.[0-1]
This one works for:
1.SSRDOCSYYHK1/////25AUG52/M//YOUNG/LANDON/KWAN-
1.1
[0-1]\.[A-Z]+[0-1]\/\/\/\/\/[0-9]+[A-Z]+[0-9]+\/[A-Z]\/\/[A-Z]+\/[A-Z]+\/[A-Z]+\-\n[0-1]\.[0-1]
And this one works for:
1.SSRDOCSYYHK1/////25AUG52/M//YOUNG/LANDON/KWAN
- 1.1
[0-1]\.[A-Z]+[0-1]\/\/\/\/\/[0-9]+[A-Z]+[0-9]+\/[A-Z]\/\/[A-Z]+\/[A-Z]+\/[A-Z]+\n\-\s[0-1]\.[0-1]
Now you need to try these patterns in turn using nested if/else (conditional statements).
Good luck.

DOT in JavaScript doesn't match newlines, and before ES2018 (which added the s flag) there was no DOTALL switch in the JS regex engine.
However, as a workaround you can use [\s\S] in place of DOT to match across newlines as well.
Following regex should work for you:
var arr = str.match(/\d{1,2}\.SSRDOCSYYHK1\/{3,5}[\s\S]+?\d\.\d/g);
Live Demo: http://ideone.com/QIYCMA
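One possible way to keep a broken record from running into the next one is to also allow whitespace around the final dot, so that "1." followed by a line break and "1" can end a match. This is a sketch under that assumption, not the pattern from the answer above; the sample text and the name `findDocsRecords` are hypothetical.

```javascript
// Hypothetical shortened sample: record 11 is broken across two lines.
const text = [
  "11.SSRDOCSYYHK1/////25MAY55/M//YOUNG/LANDON/KWAN - 1.",
  "1",
  "12.SSRDOCSYYHK1/////04MAR59/F//YOUNG/LILY/LIMKUO - 2.1",
].join("\n");

function findDocsRecords(input) {
  // [\s\S]+? crosses line breaks lazily; \d\s*\.\s*\d lets the closing
  // "digit.digit" be split by a line break on either side of the dot.
  const pattern = /\d{1,2}\.SSRDOCSYYHK1\/{5}[\s\S]+?\d\s*\.\s*\d/g;
  const found = input.match(pattern) || [];
  // Drop the line breaks that were captured inside each match.
  return found.map((m) => m.replace(/\r\n|\n|\r/g, ""));
}

const records = findDocsRecords(text);
console.log(records);
```

Each broken record is then returned as its own array slot instead of being merged with the following line.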


Regex to validate Telephone number extension

I have a requirement to validate a telephone number (TN) extension (just the extension). The extension can be 3-6 digits long, and a 3-digit extension must not end in 11. In addition, the extension must not contain special characters and must not be all zeros.
For example, 911, 311, etc. are not allowed.
We have written the pattern below.
(?!0+$)[0-9](?!.*11).[0-9]*$
The issue with the above is:
For 311, 211 --> validation passes.
For 38311, 2311 --> these are 5- and 4-digit extensions, which are allowed to end in '11', but the above pattern does not allow them. How can I achieve that?
You could use:
(?!^((0+)|(\d11))$)(?=^\d{3,6}$).*
(?!^((0+)|(\d11))$) - From start to finish make sure it's not all zeros nor a digit followed by 11
(?=^\d{3,6}$) - From start to finish make sure we are dealing with 3 to 6 digits
.* - If the previous validations passed then it's safe to grab everything
https://regex101.com/r/eIVvvX/1
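A quick way to check that pattern against the cases from the question is a small harness like the one below. This is only a sketch; the helper name `isValidExtension` and the extra sample values are assumptions added for illustration.

```javascript
// Pattern copied from the answer above.
const extensionPattern = /(?!^((0+)|(\d11))$)(?=^\d{3,6}$).*/;

// Hypothetical helper: true when the extension passes all rules.
function isValidExtension(ext) {
  return extensionPattern.test(ext);
}

const extResults = {};
for (const ext of ["911", "311", "38311", "2311", "000", "12", "1234567"]) {
  extResults[ext] = isValidExtension(ext);
}
// 3-digit values ending in 11, all zeros, and wrong lengths are rejected;
// longer extensions ending in 11 are accepted.
console.log(extResults);
```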
To check for a non-zero value you can simply use the > operator, and for the rest of the rules you can use this pattern:
let data = ['911','311','38311','2311','000000','123111', '112']
data.forEach(v=>{
console.log(v, '\t' , v > 0 && /^(?:(?:(?!11$)\d){3}|\d{4,6})$/.test(v))
})
You can check for a non-zero value with the regex too, but personally I prefer the method above:
^(?!^0+$)(?:(?:(?!11$)\d){3}|\d{4,6})$

RegEx to filter out all but one decimal point [duplicate]

I need a regular expression for decimal/float numbers like 12, 12.2, 1236.32, 123.333 and +12.00 or -12.00 or ...123.123... for use in JavaScript and jQuery.
Thank you.
Optionally match a + or - at the beginning, followed by one or more decimal digits, optionally followed by a decimal point and one or more decimal digits until the end of the string:
/^[+-]?\d+(\.\d+)?$/
RegexPal
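As a rough check of how that anchored pattern behaves, here is a small sketch; the sample values and variable names are chosen here for illustration and are not from the original answer.

```javascript
const decimalPattern = /^[+-]?\d+(\.\d+)?$/;

// Map each sample string to whether the whole string is a valid number.
const decimalSamples = ["12", "12.2", "1236.32", "+12.00", "-12.00", "12.", ".5", "1.2.3"];
const decimalResults = Object.fromEntries(
  decimalSamples.map((s) => [s, decimalPattern.test(s)])
);
// The first five are accepted; "12.", ".5" and "1.2.3" are rejected.
console.log(decimalResults);
```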
The right expression should be as follows:
[+-]?([0-9]*[.])?[0-9]+
This applies to:
+1
+1.
+.1
+0.1
1
1.
.1
0.1
Here is Python example:
import re
#print if found
print(bool(re.search(r'[+-]?([0-9]*[.])?[0-9]+', '1.0')))
#print result
print(re.search(r'[+-]?([0-9]*[.])?[0-9]+', '1.0').group(0))
Output:
True
1.0
If you are using mac, you can test on command line:
python -c "import re; print(bool(re.search(r'[+-]?([0-9]*[.])?[0-9]+', '1.0')))"
python -c "import re; print(re.search(r'[+-]?([0-9]*[.])?[0-9]+', '1.0').group(0))"
You can check for text validation and also only one decimal point validation using isNaN
var val = $('#textbox').val();
var floatValues = /[+-]?([0-9]*[.])?[0-9]+/;
if (val.match(floatValues) && !isNaN(val)) {
// your function
}
This is an old post but it was the top search result for "regular expression for floating point" or something like that and doesn't quite answer _my_ question. Since I worked it out I will share my result so the next person who comes across this thread doesn't have to work it out for themselves.
All of the answers thus far accept a leading 0 on numbers with two (or more) digits on the left of the decimal point (e.g. 0123 instead of just 123). This isn't really valid and in some contexts is used to indicate the number is in octal (base-8) rather than the regular decimal (base-10) format.
Also these expressions accept a decimal with no leading zero (.14 instead of 0.14) or without a trailing fractional part (3. instead of 3.0). That is valid in some programming contexts (including JavaScript) but I want to disallow them (because for my purposes those are more likely to be an error than intentional).
Ignoring "scientific notation" like 1.234E7, here is an expression that meets my criteria:
/^((-)?(0|([1-9][0-9]*))(\.[0-9]+)?)$/
or if you really want to accept a leading +, then:
/^((\+|-)?(0|([1-9][0-9]*))(\.[0-9]+)?)$/
I believe that regular expression will perform a strict test for the typical integer or decimal-style floating point number.
When matched:
$1 contains the full number that matched
$2 contains the (possibly empty) leading sign (+/-)
$3 contains the value to the left of the decimal point
$5 contains the value to the right of the decimal point, including the leading .
By "strict" I mean that the number must be the only thing in the string you are testing.
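To see the strict pattern and its groups in action, here is a small sketch using the variant that accepts a leading +. The sample inputs and variable names are illustrative only.

```javascript
const strictFloat = /^((\+|-)?(0|([1-9][0-9]*))(\.[0-9]+)?)$/;

// A valid value: inspect the numbered groups described above.
const parts = strictFloat.exec("-3.14");
const sign = parts[2];      // leading sign
const integer = parts[3];   // value to the left of the decimal point
const fraction = parts[5];  // value to the right, including the leading "."
console.log(sign, integer, fraction);

// Values the strict pattern is meant to reject.
const rejected = ["0123", ".14", "3."].map((s) => strictFloat.test(s));
console.log(rejected);
```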
If you want to extract just the float value out of a string that contains other content use this expression:
/((\b|\+|-)(0|([1-9][0-9]*))(\.[0-9]+)?)\b/
Which will find -3.14 in "negative pi is approximately -3.14." or in "(-3.14)" etc.
The numbered groups have the same meaning as above (except that $2 is now an empty string ("") when there is no leading sign, rather than null).
But be aware that it will also try to extract whatever numbers it can find. E.g., it will extract 127.0 from 127.0.0.1.
If you want something more sophisticated than that then I think you might want to look at lexical analysis instead of regular expressions. I'm guessing one could create a look-ahead-based expression that would recognize that "Pi is 3.14." contains a floating point number but Home is 127.0.0.1. does not, but it would be complex at best. If your pattern depends on the characters that come after it in non-trivial ways you're starting to venture outside of regular expressions' sweet-spot.
Paulpro's and lbsweek's answers led me to this:
re=/^[+-]?(?:\d*\.)?\d+$/;
>> /^[+-]?(?:\d*\.)?\d+$/
re.exec("1")
>> Array [ "1" ]
re.exec("1.5")
>> Array [ "1.5" ]
re.exec("-1")
>> Array [ "-1" ]
re.exec("-1.5")
>> Array [ "-1.5" ]
re.exec(".5")
>> Array [ ".5" ]
re.exec("")
>> null
re.exec("qsdq")
>> null
For anyone new:
I made a RegExp for the E scientific notation (without spaces).
const floatR = /^([+-]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?)$/;
let str = "-2.3E23";
let m = floatR.exec(str);
parseFloat(m[1]); //=> -2.3e+23
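Note that in JavaScript \d matches only the ASCII digits 0-9, even with the Unicode flag u, so replacing [0-9] with \d does not change what the RegExp accepts.
To match other Unicode decimal digits you would need \p{Nd} together with the u flag, but parseFloat only understands ASCII digits.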
For a better understanding of the pattern see https://regexper.com/.
And for making RegExp, I can suggest https://regex101.com/.
EDIT: found another site for viewing RegExp in color: https://jex.im/regulex/.
EDIT 2: although op asks for RegExp specifically you can check a string in JS directly:
const isNum = (num)=>!Number.isNaN(Number(num));
isNum("123.12345678E+3");//=> true
isNum("80F");//=> false
converting the string to a number (or NaN) with Number()
then checking if it is NOT NaN with !Number.isNaN()
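One caveat with this kind of check: Number() converts the empty string and whitespace-only strings to 0, so they are reported as numbers. A hedged sketch that guards against this follows; the names `isNumLoose` and `isNumStrict` are made up here.

```javascript
// Same idea as the check above: anything Number() can convert counts.
const isNumLoose = (s) => !Number.isNaN(Number(s));

// Hypothetical stricter variant: also reject empty / whitespace-only input.
const isNumStrict = (s) =>
  typeof s === "string" && s.trim() !== "" && !Number.isNaN(Number(s));

console.log(isNumLoose(""), isNumStrict(""));
console.log(isNumStrict("123.12345678E+3"), isNumStrict("80F"));
```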
If you want it to work with e, use this expression:
[+-]?[0-9]+([.][0-9]+)?([eE][+-]?[0-9]+)?
Here is a JavaScript example:
var re = /^[+-]?[0-9]+([.][0-9]+)?([eE][+-]?[0-9]+)?$/;
console.log(re.test('1'));      // true
console.log(re.test('1.5'));    // true
console.log(re.test('-1'));     // true
console.log(re.test('-1.5'));   // true
console.log(re.test('1E-100')); // true
console.log(re.test('1E+100')); // true
console.log(re.test('.5'));     // false: at least one digit is required before the dot
console.log(re.test('foo'));    // false
Here is my JS method, which handles 0s at the head of the string.
1- ^0[0-9]+\.?[0-9]*$ : finds numbers that start with 0 followed by further digits before the decimal separator ("."). I use this to distinguish strings such as "0.111" from "01.111".
2- ([1-9]{1}[0-9]*\.?[0-9]*) : if the string starts with 0, only the part beginning at the first non-zero digit is taken into account. The parentheses are there because I wanted to capture only the part that conforms to the regex.
3- ([0-9]*\.?[0-9]*) : used otherwise, to capture the standard decimal form of the string.
In JavaScript, st.match(regex) returns an array whose first element contains the conforming part. I used this method in an input element's onChange event: if the user enters something that violates the regex, the violating part is not shown in the element's value at all, but any part that conforms to the regex stays in the element's value.
const floatRegexCheck = (st) => {
  const regx1 = new RegExp("^0[0-9]+\\.?[0-9]*$"); // for finding numbers starting with 0
  let regx2 = new RegExp("([1-9]{1}[0-9]*\\.?[0-9]*)"); // if regx1 matches then this will remove 0s at the head
  if (!st.match(regx1)) {
    regx2 = new RegExp("([0-9]*\\.?[0-9]*)"); // if the number has no 0 at the head then standard decimal formatting takes place
  }
  st = st.match(regx2);
  if (st?.length > 0) {
    st = st[0];
  }
  return st;
}
Here is a more rigorous answer:
^[+-]?0(?![0-9])\.[0-9]*(?![.])$|^[+-]?[1-9]{1}[0-9]*\.[0-9]*$|^[+-]?\.[0-9]+$
The following values will match (a leading + or - sign also works):
.11234
0.1143424
11.21
1.
The following values will not match:
00.1
1.0.00
12.2350.0.0.0.0.
.
....
How it works
(?!regex) is a negative lookahead, i.e. a NOT operation.
Let's break the regex down at the | operator, which works like a logical OR.
^[+-]?0(?![0-9])\.[0-9]*(?![.])$
This part handles values that start with 0.
First, match an optional + or - sign (zero or one time): ^[+-]?
Then match the leading zero: 0
The character after it must not be a digit, because we don't want to accept 00.123: (?![0-9])
Then match the dot exactly once, followed by any number of fraction digits: \.[0-9]*
Last, make sure no further dot follows the fraction part: (?![.])$
Now see the second part:
^[+-]?[1-9]{1}[0-9]*\.[0-9]*$
^[+-]? same as above
If the value starts with a non-zero digit, match that first digit exactly once, followed by any number of digits: [1-9]{1}[0-9]* (e.g. 12.3, 1.2, 105.6)
Match the dot once, followed by any number of digits: \.[0-9]*$
Now see the third part:
^[+-]?\.[0-9]+$
This handles values that start with a dot, e.g. .12, .34565
^[+-]? same as above
Match the dot once, followed by one or more digits: \.[0-9]+$

Performance of regex within jQuery data selector: dependence on certain string length

The setup: I have a div with a bunch of radio buttons, each of which has been associated with a custom attribute and value using $(element).data(attr_name,attr_value);. When an underlying data structure is changed, I iterate over the fields and set the appropriate buttons to checked:true by using the ':data' selector found here: https://stackoverflow.com/a/2895933/1214731
$($('#style-options').find(':radio').filter(':data('+key+'=='+value+')'))
.prop('checked',true).button('refresh');
This works great: it finds the appropriate elements, even with floating-point values.
Performance depends on value:
I noticed that when I clicked on certain buttons, the page took fractionally longer to respond (for most buttons there was no noticeable delay). Digging a little deeper, this seems to be occurring when certain floating point values are being searched for.
Using chrome dev tools, I logged the following:
> key='fill-opacity';
"fill-opacity"
> value=.2*2;
0.4
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 43.352ms undefined
> value=.2*3;
0.6000000000000001
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 10322.866ms undefined
The difference in speed is a factor of >200!
Next, I tried typing the number in manually (e.g. decimal place, six, 14x zeros, one) - same speed. All numbers with the same number of digits were the same speed. I then reduced the number of digits progressively:
# of digits    time (ms)
16             10300
15             5185
14             2665
13             1314
12             673
11             359
10             202
9              116
8              77
7              60
6              50
5              41
4              39
I quickly ruled out the equality check between numeric and string - no dependence on string length there.
The regex execution is strongly dependent on string length
In the linked answer above, the regex that parses the data string is this:
var matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
The string passed in is of the form [name operator value]. The length of name doesn't seem to make much difference; the length of value has a big impact on speed however.
Specific questions:
1) Why does the length of name have minimal effect on performance, while the length of value has a large effect?
2) Doubling the execution time with each additional character in value seems excessive - is this just a characteristic of the particular regex the linked solution uses, or is it a more general feature?
3) How can I improve performance without sacrificing a lot of flexibility? I'd like to still be able to pass arguments as a single string to a jQuery selector so type checking up front seems difficult, though I'm open to suggestions.
Basic test code for regex matching speeds:
matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.1111111111111)}; console.timeEnd('regex')
regex: 538.018ms
//add an extra digit - doubles duration of test
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.11111111111111)}; console.timeEnd('regex')
regex: 1078.742ms
//add a bunch to the length of 'name' - minimal effect
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('xxxxxxxxxxxxxxxxxxxx=='+.11111111111111)}; console.timeEnd('regex')
regex: 1084.367ms
By default, regexp quantifiers are greedy. If you try to match the expression a.*b against the string abcd, it happens in these steps:
the first "a" will match
the .* will match the second char, then the third, and so on until the end of the string
at the end of the string there is still a "b" to be matched, so the match fails
the regexp engine starts to backtrack
the last char is "unmatched" and the engine tries to match "b" against "d". It fails again. More backtracking
it tries to match "b" against "c". Fail. Backtrack.
it matches "b" against "b". Success. Matching ends.
Although you matched just a few chars, you iterated over the whole string. If you have more than one greedy operator, you can easily get an input string whose matching takes exponential time.
Understanding backtracking will prevent a lot of errors and performance problems. For example, 'a.*b' will match the whole string 'abbbbbbb' instead of just the first 'ab'.
The easiest way to prevent this kind of error in modern regexp engines is to use the non-greedy versions of the operators * and +. They are usually written as the same operators followed by a question mark: *? and +?.
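The difference between the greedy and non-greedy forms can be seen with the 'abbbbbbb' example; this is just a small illustrative sketch with made-up variable names.

```javascript
const subject = "abbbbbbb";

// Greedy: .* runs to the end of the string, then backtracks to the last "b".
const greedyMatch = subject.match(/a.*b/)[0];

// Non-greedy: .*? expands one character at a time and stops at the first "b".
const lazyMatch = subject.match(/a.*?b/)[0];

console.log(greedyMatch); // "abbbbbbb"
console.log(lazyMatch);   // "ab"
```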
I confess that I didn't really stop to debug the complicated regexp that you posted, but I believe that the problem lies before the '=' symbol is matched. The greedy operator is in this subexpression:
(?:(?:\\\.|[^.,])+\.?)+
I'd try to change it to a non-greedy version:
(?:(?:\\\.|[^.,])+\.?)+?
but this is really just a wild guess. I'm using pattern recognition to solve the problem :-) It makes sense because it is backtracking for each character of the "value" until it matches the operator. The name is matched linearly.
This regular expression is just too complex for my taste. I love regular expressions, but it looks like this one matches this famous quotation:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.

Cannot fathom out why Javascript regex not working

I want to validate some input to check that it is a positive currency value of the format:
0.5, 0.55, 5 (note this one), 5.5, 5.55, 55 etc etc.
The code that I'm using is:
if ($("#gross").val()>0 && !/^\d+?\.?\d?\d$/.test($("#gross").val())) {
alert($("#gross").val() + " is invalid currency");
}
It works for everything except a single digit, eg 5 (and 5.) but does work for 5.5.
What am I doing wrong?
You've forgotten to add a ? at the end, before the $. A better way of doing it would be the following:
/^\d+?\.?\d{0,2}$/
This checks that there are up to two decimal places for the number - if you'd like to check for any amount, you could use something like:
/^(?!\.$)(?:(?!0\d)(\d*)\.?(\d[0-9]*))$/
Note that it's a good idea to explicitly convert your string into a number, and also cache the value of #gross.
var grossVal = $("#gross").val();
if (+grossVal > 0 && !/^\d+?\.?\d{0,2}$/.test(grossVal)) {
alert(grossVal + " is invalid currency");
}
+? is lazy: it matches as few characters as possible, in this case 1 digit.
I think you're looking for something like:
/^\d+(\.\d{0,2})?$/
Which would be a series of digits, potentially followed by a decimal and anywhere between 0 to 2 digits.
Consider using alternation to break down a regular expression into the form a|b|c|d.
Then we can use several different forms, let:
a = 0 -- 0
b = [1-9]\d* -- n (non-zero integer), n cannot start with 0
c = 0[.]\d{1,2} -- 0.x or 0.xy
d = [1-9]\d*[.]\d{1,2} -- n.x or n.xy, n (non-zero integer)
This will allow us to reject values like 09 and 1., as they are not covered by any of the individual forms accepted.
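The four forms above can be joined into one anchored pattern. The following is a sketch of that idea; the variable names and the choice to build the pattern from strings are assumptions made here for readability.

```javascript
// The four forms from the list above, as regex source strings.
const formA = "0";                      // 0
const formB = "[1-9]\\d*";              // n
const formC = "0[.]\\d{1,2}";           // 0.x or 0.xy
const formD = "[1-9]\\d*[.]\\d{1,2}";   // n.x or n.xy

// Anchor the alternation so the whole input must match one form.
const currencyPattern = new RegExp(`^(?:${formA}|${formB}|${formC}|${formD})$`);

const accepted = ["0", "5", "55", "0.5", "0.55", "5.5", "5.55"].map((s) => currencyPattern.test(s));
const refused = ["09", "1.", "5.555", ".5"].map((s) => currencyPattern.test(s));
console.log(accepted, refused);
```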

Allowing money amounts to be entered into an input with a JavaScript Regex

I'm using plain vanilla JavaScript and need some help with my regex. Money in the following formats has to be allowed, and in these formats only (with no limit on the number of 0s (tens, hundreds, thousands, etc.) for the dollar amounts allowed):
$25,000
$25000
25,000
25000
25000.01
25,000.99
2000.99
50.00
50
1.95
1 .99
0.25
$0.25
0.2
2.3
2000.5
.75
var regex = /^\$?.?[1-9][0-9,]*(.[0-9]{0,2})?$/;
Currently, it's not allowing amounts like 0.99 to be entered.
Try this
^\$?(?:\d+|\d{1,3}(?:,\d{1,3})*)(?:\.\d{2})?$
See it here on Regexr
The only thing that is not matching is one of your examples, "1 .99", which has a space before the dot. Is that valid?
Edit:
My first solution has the limitation that it accepts numbers starting with 0, like 001. This solution uses a negative lookahead to avoid that:
^\$?(?!0\d)(?:\d+|\d{1,3}(?:,\d{1,3})*)(?:\.\d{2})?$
See it here on Regexr
Solution without lookahead
^\$?(?:0|[1-9]\d*|[1-9]\d{0,2}(?:,\d{1,3})*)(?:\.\d{2})?$
See it on Regexr
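To get a feel for what the lookahead-free pattern accepts, here is a small check against some of the amounts from the question. It is a sketch only, with made-up variable names. Note that because the pattern requires exactly two decimal digits and a digit before the dot, amounts such as 0.2 and .75 from the question's list are not accepted by it.

```javascript
const moneyPattern = /^\$?(?:0|[1-9]\d*|[1-9]\d{0,2}(?:,\d{1,3})*)(?:\.\d{2})?$/;

const moneyAccepted = ["$25,000", "25000.01", "25,000.99", "50", "0.25", "$0.25"]
  .filter((s) => moneyPattern.test(s));
const moneyRejected = ["001", "0.2", ".75", "1 .99"]
  .filter((s) => !moneyPattern.test(s));

console.log(moneyAccepted);
console.log(moneyRejected);
```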
^\$?(?:[1-9]\d?\d?(?:(?:,\d{3})*|(?:\d{3})*)|0)(?:\.\d\d?)?$
Although I wouldn't use such strict input restrictions.
