Regular expression that matches a string without a fixed length - JavaScript

I have strings like these:
01084700069811461719010010285322921DA192089940088
01084700088763891719050010BM2120
These are Data Matrix strings, and I have to split them like this:
0108470006981146 17190100 102853229 21DA192089940088
0108470008876389 17190500 10BM2120
Each block starts with a fixed code and is followed by digits or characters:
01 + 14 digits
17 + 6 digits
10 + from 1 to 20 characters
21 + from 1 to 20 characters
I tried to do this with a regular expression. The first two blocks are no problem because their length is fixed; the problem is the third (and/or the fourth) block.
I created this regexp:
/^(01\d{14})(?:(17\d{6}))*(?:(10\w*))*(?:(21\w*))*$/
For this string the result is correct:
01084700088763891719050010BM2120
Group 1. 0108470008876389 (ok)
Group 2. 17190500 (ok)
Group 3. 10BM2120 (ok)
but for the other string
01084700069811461719010010285322921DA192089940088
the regexp matches
Group 1. 0108470006981146 (ok)
Group 2. 17190100 (ok)
Group 3. 10285322921DA192089940088 (no)
I am not able to create a regexp that correctly matches the third and fourth blocks, because they do not have a fixed length and because the third block may contain the string "21", which is also the start code of the next block.
Is it possible to create a regular expression that correctly matches all the parts of the string?
Thanks to all

You may use
^(01\d{14})(17\d{6})?(10\w{1,20})?(21\w{1,20})?$
See the regex demo
Note that you do not have to wrap capturing groups in non-capturing ones in order to quantify them; you may quantify the capturing groups directly.
Also, to make a group optional, it is enough to use the ? quantifier (0 or 1 occurrences), whereas * matches 0 or more occurrences.
Pattern details
^ - start of string
(01\d{14}) - Group 1: 01 and 14 digits
(17\d{6})? - Group 2 (optional): 17 and 6 digits
(10\w{1,20})? - Group 3 (optional): 10 and 1 to 20 word chars
(21\w{1,20})? - Group 4 (optional): 21 and 1 to 20 word chars
$ - end of string.
Note that to match only alphanumeric chars, you need to replace \w with [^\W_] since \w also matches _.
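A minimal sketch of how the pattern above could be applied to the two sample strings from the question. The helper name splitDataMatrix is hypothetical, and the split of the third and fourth blocks relies entirely on the regex engine backtracking until the whole string matches:

```javascript
// Pattern from the answer above; groups 2-4 are optional.
const DATAMATRIX_RE = /^(01\d{14})(17\d{6})?(10\w{1,20})?(21\w{1,20})?$/;

// splitDataMatrix is a hypothetical helper name, not part of any library.
// Returns the blocks that were present, or null if the string does not match.
function splitDataMatrix(input) {
  const m = DATAMATRIX_RE.exec(input);
  if (m === null) return null;
  // Drop the full match (index 0) and any optional group that did not participate.
  return m.slice(1).filter((block) => block !== undefined);
}

const first = splitDataMatrix("01084700069811461719010010285322921DA192089940088");
const second = splitDataMatrix("01084700088763891719050010BM2120");
console.log(first.join(" "));  // 0108470006981146 17190100 102853229 21DA192089940088
console.log(second.join(" ")); // 0108470008876389 17190500 10BM2120
```

Because \w{1,20} is greedy, the engine gives block 10 as many characters as it can and only hands characters to block 21 when the string would not match otherwise. A string in which "21" happens to appear inside block 10 can therefore still be ambiguous.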

var inputValue = "01084700088763891719050010BM2120";
// All four groups are required here, so the trailing "2120" is taken as block 21.
var regexpr = /(01\d{14})(17\d{6})(10[A-Za-z0-9]{1,20})(21[A-Za-z0-9]{1,20})/;
inputValue.replace(regexpr, "$1 $2 $3 $4");
The output will be:
"0108470008876389 17190500 10BM 2120"
var inputValue = "01084700069811461719010010285322921DA192089940088";
var regexpr = /(01\d{14})(17\d{6})(10[A-Za-z0-9]{1,20})(21[A-Za-z0-9]{1,20})/;
inputValue.replace(regexpr, "$1 $2 $3 $4");
The output will be:
"0108470006981146 17190100 102853229 21DA192089940088"

Related

Regex must be in length range and contain letters and numbers

I want to write a regex that will find substrings of length 10-15 where all characters are [A-Z0-9] and it must contain at least 1 letter and one number (spaces are ok but not other special characters). Some examples:
ABABABABAB12345 should match
ABAB1234ABA32 should match
ABA BA BABAB12345 should match
1234567890987 should not match
ABCDEFGHIJK should not match
ABABAB%ABAB12?345 should not match
So far the best two candidates I have come up with are:
(?![A-Z]{10,15}|[0-9]{10,15})[0-9A-Z]{10,15} - this fails because if the string has 10 consecutive numbers/letters it will not match, even though the 15-character string has a mix (e.g. ABABABABAB12345).
(?=.*[0-9])(?=.*[A-Z])([A-Z0-9]+){10,15} - this fails because it will match 15 consecutive letters as long as there is a number later in the string (even though it is outside the match) and vice versa (e.g. 123456789098765 abcde will match 123456789098765).
(I need to do this in python and js)
If each string is on its own line, then you can use start/end anchors to construct the regex:
^(?=.*[0-9])(?=.*[A-Z])(?:\s*[A-Z0-9]\s*){10,15}$
^ - start of line
(?=.*[0-9]) - lookahead, must contain a number
(?=.*[A-Z]) - lookahead, must contain a letter
(?: - start a non-capturing group
\s*[A-Z0-9]\s* - contains a letter or number with optional whitespace
) - end non-capturing group
{10,15} - Pattern occurs 10 to 15 times
$ - end of line
See a live example here: https://regex101.com/r/eWX2Qo/1
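As a quick check, the anchored pattern above can be run against the six examples from the question. This is only a sketch; the constant and variable names are illustrative:

```javascript
// Anchored pattern from the answer above: the whole line must be 10-15
// letters/digits (whitespace allowed around each) with at least one of each.
const MIXED_RE = /^(?=.*[0-9])(?=.*[A-Z])(?:\s*[A-Z0-9]\s*){10,15}$/;

const samples = [
  "ABABABABAB12345",
  "ABAB1234ABA32",
  "ABA BA BABAB12345",
  "1234567890987",
  "ABCDEFGHIJK",
  "ABABAB%ABAB12?345",
];

// true for the first three samples, false for the last three.
const results = samples.map((s) => MIXED_RE.test(s));
samples.forEach((s, i) => console.log(`${s} -> ${results[i]}`));
```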
This doesn't account for ABA BA BABAB12345, but this still might help.
Based on what you're trying to match, it looks like you want there to be a mix.
What you can do is use two lookaheads: one looking for a digit in the following 15 characters, and another looking for a letter in the same span. If both match, the pattern then looks for a run of numbers and letters of length 10 to 15.
(?=.{0,14}\d)(?=.{0,14}[A-Z])[A-Z\d]{10,15}
https://regex101.com/r/qw1Q0S/1
(?=.{0,14}\d) at least one of characters 1 through 15 has to be a digit
(?=.{0,14}[A-Z]) at least one of characters 1 through 15 has to be a capital letter
[A-Z\d]{10,15} match 10 to 15 letters and numbers if the previous conditions are true
Edit with an improved answer:
To account for the spaces, you can tweak the above concept.
(?=(?:. *+){0,14}\d)(?=(?:. *+){0,14}[A-Z])(?:[A-Z\d] *){10,15}
Above, the lookaheads matched .{0,14}. Here, . has been changed to (?:. *+), a non-capturing group that matches any character followed by 0 or more spaces.
So putting it together:
Lookahead 1:
(?=(?:. *+){0,14}\d)
This matches 0 to 14 characters, each of which may or may not be followed by spaces, effectively ignoring spaces. It also uses a possessive quantifier ( *+) when matching spaces to prevent the engine from backtracking once spaces are matched. The pattern would work without the + modifier, but would take more than double the steps to match on the example.
Lookahead 2:
(?=(?:. *+){0,14}[A-Z])
Same as lookahead 1, but now testing for a capital letter instead of a digit.
If lookahead 1 and lookahead 2 both match, then the engine will be left in a place where our matches can potentially be made.
Actual match:
(?:[A-Z\d] *){10,15}
This matches the capital letters and numbers, but now also 0 or more spaces. The only drawback is that a trailing space will be included in your match, although that's easily handled in post-processing.
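JavaScript has no possessive quantifiers, so a sketch of the same structure there has to use a plain " *" in place of " *+". The version below assumes short inputs, where the extra backtracking is harmless; findCandidate is a hypothetical helper that also trims the trailing space mentioned above:

```javascript
// Same structure as the pattern above, with " *" instead of possessive " *+".
// Assumption: inputs are short, so the extra backtracking does not matter.
const SPACED_RE = /(?=(?:. *){0,14}\d)(?=(?:. *){0,14}[A-Z])(?:[A-Z\d] *){10,15}/;

// findCandidate is a hypothetical helper: returns the first match with
// trailing spaces trimmed, or null when nothing matches.
function findCandidate(text) {
  const m = SPACED_RE.exec(text);
  return m === null ? null : m[0].trimEnd();
}

console.log(findCandidate("ABA BA BABAB12345")); // ABA BA BABAB12345
console.log(findCandidate("1234567890987"));     // null
console.log(findCandidate("ABCDEFGHIJK"));       // null
```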
Edit:
All whitespace (\r, \n, \t and the space character) can be accounted for by using \s instead of a literal space.
Depending on the amount of whitespace present, the possessive quantifier is necessary to prevent catastrophic backtracking. This modification to the input using possessive quantifiers completes in 22,332 steps, while this one, using the same input but a regular quantifier, fails to match anything due to catastrophic backtracking.
It should be noted that the possessive quantifier *+ is not supported in JavaScript or in Python's built-in re module (before Python 3.11), but it is supported by Python's regex module:
>>> import regex
>>> pattern = r'(?=(?:.\s*+){0,14}\d)(?=(?:.\s*+){0,14}[A-Z])(?:[A-Z\d]\s*){10,15}'
>>> regex.search(pattern, 'AAAAAAAAAA\n2')
<regex.Match object; span=(0, 12), match='AAAAAAAAAA\n2'>
>>>
Has the right stuff
function lfunko() {
  let a = ["ABABABABAB12345","ABAB1234ABA32","ABA BA BABAB12345","1234567890987","ABCDEFGHIJK","ABABAB%ABAB12?345"];
  let o = a.map((s) => {
    // Count capital letters, digits, and everything else (spaces included).
    let ll = s.split("").filter(c => c.match(/[A-Z]/)).length;
    let ln = s.split("").filter(c => c.match(/[0-9]/)).length;
    let ot = s.split("").filter(c => c.match(/[^A-Z0-9]/)).length;
    let sum = ll + ln;
    // At least one letter and one digit, 10 to 15 characters in total, nothing else.
    return (ll > 0 && ln > 0 && sum > 9 && sum < 16 && ot == 0) ? `${s} - TRUE` : `${s} - FALSE`;
  });
  console.log(JSON.stringify(o));
}
Execution log
11:18:20 PM Notice Execution started
11:18:21 PM Info ["ABABABABAB12345 - TRUE","ABAB1234ABA32 - TRUE","ABA BA BABAB12345 - FALSE","1234567890987 - FALSE","ABCDEFGHIJK - FALSE","ABABAB%ABAB12?345 - FALSE"]
11:18:21 PM Notice Execution completed
Your requirement of [A-Z0-9] does not include spaces, so the third example should be false.
The expected results should be:
ABABABABAB12345 should match
ABAB1234ABA32 should match
ABA BA BABAB12345 should not match (has spaces)
1234567890987 should not match
ABCDEFGHIJK should not match
ABABAB%ABAB12?345 should not match

Regex to find character only if it occurs 4 times

I'm stuck on making this regex. I tried using look-ahead and look-behind together, but I couldn't use the capture group in the look-behind. I need to extract a character from a string ONLY if it occurs exactly 4 times in a row.
If I have these strings
3346AAAA44
3973BBBBBB44
9755BBBBBBAAAA44
The first one will match because it has 4 A's in a row.
The second one will NOT match because it has 6 B's in a row.
The third one will match because it still has 4 A's. What makes it even more frustrating is that it can be any char from A to Z occurring 4 times.
Positioning does not matter.
EDIT: My attempt at the regex, doesn't work.
(([A-Z])\2\2\2)(?<!\2*)(?!\2*)
If lookbehind is allowed, then after capturing the character, use a negative lookbehind for \1. (if that matched, the start of the match would be preceded by the same character as the captured first character). Then backreference the group 3 times, and add a negative lookahead for \1:
`3346AAAA44
3973BBBBBB44
9755BBBBBBAAAA44`
.split('\n')
.forEach((str) => {
console.log(str.match(/([a-z])(?<!\1.)\1{3}(?!\1)/i));
});
([a-z]) - Capture a character
(?<!\1.) - Negative lookbehind: check that the captured character is not preceded by the same character (the . stands for the captured character itself, which has just been consumed)
\1{3} - Match the same character that was captured 3 more times
(?!\1) - After the 4th match, make sure it's not followed by the same character
Another version without lookbehind (see demo). The captured sequence of 4 equal characters will be rendered in Group 2.
(?:^|(?:(?=(\w)(?!\1))).)(([A-Z])\3{3})(?:(?!\3)|$)
(?:^|(?:(?=(\w)(?!\1))).) - ensure it's the beginning of the string. Otherwise, the 2nd char must be different from the 1st one - if yes, skip the 1st char.
(([A-Z])\3{3}) Capture 4 repeated [A-Z] chars
(?:(?!\3)|$) - ensure the first char after those 4 is different. Or it's the end of the string
As it was suggested by bobble-bubble in this comment - the expression above can be simplified to (demo):
(?:^|(\w)(?!\1))(([A-Z])\3{3})(?!\3)
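A short sketch of the simplified lookbehind-free pattern applied to the sample strings from the question. runOfFour is a hypothetical helper name, and as noted above the run itself lands in group 2:

```javascript
// Lookbehind-free pattern from above; the run of four lands in group 2.
const FOUR_RE = /(?:^|(\w)(?!\1))(([A-Z])\3{3})(?!\3)/;

// runOfFour is a hypothetical helper returning the 4-character run or null.
function runOfFour(s) {
  const m = FOUR_RE.exec(s);
  return m === null ? null : m[2];
}

["3346AAAA44", "3973BBBBBB44", "9755BBBBBBAAAA44"].forEach((s) =>
  console.log(s + " --> " + runOfFour(s))
);
// 3346AAAA44 --> AAAA
// 3973BBBBBB44 --> null
// 9755BBBBBBAAAA44 --> AAAA
```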
Another variant is to capture the first char in group 1.
Then assert that the previous 2 chars on the left are not both the same as group 1, and match group 1 an additional 3 times, for a total of 4 identical chars.
Finally, assert that what is on the right is not group 1.
([A-Z])(?<!\1\1)\1{3}(?!\1)
([A-Z]) Capture group 1, match a single char A-Z
(?<!\1\1) Negative lookbehind, assert what is on the left is not 2 times group 1
\1{3} Match 3 times group 1
(?!\1) Assert what is on the right is not group 1
For example
let pattern = /([A-Z])(?<!\1\1)\1{3}(?!\1)/g;
[
"3346AAAA44",
"3973BBBBBB44",
"9755BBBBBBAAAA44",
"AAAA",
"AAAAB",
"BAAAAB"
].forEach(s =>
console.log(s + " --> " + s.match(pattern))
);

Regex for a string - length 9, Third character letter and remaining numeric

I am trying to create a RegEx to match a string with the following criterion
Length 7 or 9
If the number of characters is 7, the first character must be a letter a-z or A-Z. The remaining 6 must be numeric 0-9.
example:
a555444
B999999
If the number of characters is 9, the third character must be a letter a-z or A-Z. The remaining 8 must be numeric 0-9.
example:
12B456789
16K456745
My regex:
^[a-zA-Z][0-9]{6}$
This is what I have so far, and it handles only the first scenario. Please help me construct a regex to handle this requirement.
You may use an alternation:
^([A-Za-z][0-9]{6}|[0-9]{2}[A-Za-z][0-9]{6})$
See the regex demo
Details
^ - start of a string
( - start of a grouping construct (you may add ?: after ( to make it non-capturing) that will match either...
[A-Za-z][0-9]{6} - an ASCII letter and then 6 digits
| - or
[0-9]{2}[A-Za-z][0-9]{6} - 2 digits, 1 letter, 6 digits
) - end of the grouping construct
$ - end of string.
Try a regex like this:
^([0-9]{2})?[a-zA-Z][0-9]{6}$
? shows that [0-9]{2} group is optional.
So if you have a string with length 7, this group doesn't exist. If you have a string with length 9 this group exists.
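Both suggested patterns can be compared side by side. In this sketch the valid samples come from the question, while the invalid ones are made-up inputs added for illustration:

```javascript
// Both patterns from the answers above; the first one made non-capturing.
const ALTERNATION_RE = /^(?:[A-Za-z][0-9]{6}|[0-9]{2}[A-Za-z][0-9]{6})$/;
const OPTIONAL_PREFIX_RE = /^([0-9]{2})?[a-zA-Z][0-9]{6}$/;

// Valid samples are from the question; invalid ones are illustrative assumptions.
const valid = ["a555444", "B999999", "12B456789", "16K456745"];
const invalid = ["1B456789", "a55544", "123456789", "12B45678"];

for (const s of [...valid, ...invalid]) {
  // Prints the sample followed by the verdict of each pattern.
  console.log(s, ALTERNATION_RE.test(s), OPTIONAL_PREFIX_RE.test(s));
}
```

The two patterns accept the same strings, because the 9-character form is just the 7-character form with a 2-digit prefix.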

Regex to match all single or double digit numbers

I need to create a regex to match all occurrences of a single or double digit (surrounded with a space on each side of the digit). I am trying with this regex:
\s\d{1,2}\s
and I am running it on this string:
charge to at10d to and you 12 sdf 90 fdsf fsdf32 frere89 32fdsfdsf ball for 1 8 toyota
matches:
' 12 ', ' 90 ', ' 1 '
but it does not match the 8 when it is beside the 1.
Does anyone know how I can adjust the regex so I can include both these digits together?
You are trying to match both the leading and the trailing space around the required number, so the space between 1 and 8 is consumed by the first match and is no longer available as the leading space for 8. If you drop one of these spaces from the match (by using a lookahead, as shown below, for the trailing space), the problem goes away.
Try this regex:
\s\d{1,2}(?=\s)
Explanation:
\s - matches a white-space
\d{1,2} - matches a 1 or 2 digit number
(?=\s) - matches a position (zero-length match) that is followed by whitespace. It does not actually consume the whitespace, but makes sure that the current position is followed by it
Click for Demo
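A small sketch comparing the original pattern with the lookahead version on the string from the question. The variable names are illustrative, and the trim() call is an addition here to drop the leading space that is still part of each match:

```javascript
const str = "charge to at10d to and you 12 sdf 90 fdsf fsdf32 frere89 32fdsfdsf ball for 1 8 toyota";

// Original pattern: each match consumes the trailing space, so the "8"
// loses its leading space to the previous match " 1 ".
const consuming = str.match(/\s\d{1,2}\s/g);

// Lookahead version: the trailing space is checked but not consumed.
const withLookahead = str.match(/\s\d{1,2}(?=\s)/g).map((m) => m.trim());

console.log(consuming);     // [ ' 12 ', ' 90 ', ' 1 ' ]
console.log(withLookahead); // [ '12', '90', '1', '8' ]
```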
You can use the word boundary \b expression.
The word boundary will find numbers that are not adjacent to letters, digits or the underscore. It will catch 1 or 2 digits that are surrounded by spaces, the start or end of the string, or non-word characters (#12# will match the 12).
var str = "51 charge to at10d to and you 12 sdf 90 fdsf fsdf32 frere89 32fdsfdsf ball for 1 8 toyota 12";
var result = str.match(/\b\d{1,2}\b/g);
console.log(result);

“combine” 2 regex with a logic or?

I have two patterns for javascript:
/^[A-z0-9]{10}$/ - string of exactly length of 10 of alphanumeric symbols.
and
/^\d+$/ - any number of at least length of one.
How do I combine them into one expression that accepts either a string of 10 characters or any number?
var pattern = /^([A-z0-9]{10})|(\d+)$/;
doesn't work for some reason. It passes at least
pattern.test("123kjhkjhkj33f"); // true
which is neither a number nor a string of length 10 from A-z0-9.
Note that your ^([A-z0-9]{10})|(\d+)$ pattern matches either 10 chars from the A-z0-9 ranges at the start of the string (the ^ only applies to the ([A-z0-9]{10}) part, the first alternative branch), or (|) 1 or more digits at the end of the string with (\d+)$ (the $ only applies to the (\d+) branch).
Also note that A-z is a typo: [A-z] does not only match ASCII letters, it also matches the characters [, \, ], ^, _ and `, which lie between Z and a in ASCII.
You need to fix it as follows:
var pattern = /^(?:[A-Za-z0-9]{10}|\d+)$/;
or with the i modifier:
var pattern = /^(?:[a-z0-9]{10}|\d+)$/i;
See the regex demo.
Note that grouping is important here: the (?:...|...) makes both anchors apply to each alternative.
Details
^ - start of string
(?: - a non-capturing alternation group:
[A-Za-z0-9]{10} - 10 alphanumeric chars
| - or
\d+ - 1 or more digits
) - end of the grouping construct
$ - end of string
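The difference between the two patterns can be seen in a short sketch. The names broken and fixed are illustrative, and the inputs other than the one from the question are made up:

```javascript
// Pattern from the question: ^ binds only to the first branch,
// $ only to the second.
const broken = /^([A-z0-9]{10})|(\d+)$/;
// Pattern from the answer: the group makes both anchors apply to each branch.
const fixed = /^(?:[A-Za-z0-9]{10}|\d+)$/;

console.log(broken.test("123kjhkjhkj33f")); // true (first 10 chars match ^...)
console.log(fixed.test("123kjhkjhkj33f"));  // false
console.log(fixed.test("abcDEF1234"));      // true (exactly 10 alphanumerics)
console.log(fixed.test("42"));              // true (digits only)

// [A-z] also covers the ASCII characters between Z and a, such as "_".
console.log(/^[A-z]$/.test("_"), /^[A-Za-z]$/.test("_")); // true false
```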
