Regexp, greediness until second match - javascript

I am trying something like this
^(.*)[\s]*(?:\[[\s]*(.*)[\s]*\])?$
My idea is that the first group captures everything except an optional trailing second group, which is whatever sits inside the final []. The incoming string is already trimmed.
For instance
'aaaaa [] [ddd]' -> returns 'aaaaa []' plus 'ddd'
'[] [ddd]' -> returns '[]' plus 'ddd'
'aaaaaaaa' -> returns 'aaaaaaaa' plus NULL
'aaaaaaaa []' -> returns 'aaaaaaaa' plus ''
'aaaaaa [' -> returns 'aaaaaa [' plus NULL
'aaaa [] ddd' -> returns 'aaaa [] ddd' plus NULL
'[a] [b] [c] [d]' -> returns '[a] [b] [c]' plus 'd' instead of '' plus 'a] [b] [c] [d'
'[fff]' -> returns '' plus 'fff' <- This is a special case, since the first match can never be null
My main problems come from the first group, since both .* (swallows everything) and .*? (swallows only up to the first ] if there are several) give an undesired result
Pseudocode for algorithm would be something like:
If last char is a ']', second match will be anything inside up to the closest '[' backwards (if it exists) -> this can be null or '' if input string ends with '[]'
Rest is first match, which cannot be NULL, only ''
Any suggestion?

If there is no nesting, you can use this regex:
^(.*?)\s*(?:\[([^\]]*)\])?$
regex101 demo
Otherwise, if you can have nested [] in the main [], then the regex will have to be revised. You can make a regex for nested [] but only up to a certain level of nesting; if you have up to 2 levels of nesting, you make a regex for 2, if you have up to 5 levels of nesting, you make a more complex one for 5, etc.
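As a rough check, the non-nested pattern above can be run against the inputs from the question. This is only a sketch: the constant name and the helper `splitTrailingBracket` are made up for illustration, and the helper converts an unmatched second group (undefined in JavaScript) to null to mirror the NULL in the question.

```javascript
// Pattern from the answer above; assumes the brackets are not nested.
const TRAILING_BRACKET = /^(.*?)\s*(?:\[([^\]]*)\])?$/;

// Hypothetical helper: returns [first, inside], with inside === null
// when the string does not end in a [...] group.
function splitTrailingBracket(text) {
  const match = TRAILING_BRACKET.exec(text);
  if (match === null) return [text, null]; // e.g. input containing line breaks
  const [, first, inside] = match;
  return [first, inside === undefined ? null : inside];
}

const samples = [
  'aaaaa [] [ddd]',
  '[] [ddd]',
  'aaaaaaaa',
  'aaaaaaaa []',
  'aaaaaa [',
  'aaaa [] ddd',
  '[a] [b] [c] [d]',
  '[fff]',
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), '->', JSON.stringify(splitTrailingBracket(sample)));
}
```

The lazy first group only grows as far as it must for the rest of the pattern to reach the end of the string, which is why the last bracket pair is the one that ends up in the second group.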

I think regular expressions are not the answer here, especially because you give a simple algorithm to solve the problem. Just translate your algorithm into code.
Also, regular expressions are not the solution because you have unbalanced and nested [] as you state in your comments, which makes a regex impractical.
Try some javascript like this :
function parse(text) {
  // inside stays undefined when there is no trailing [...] group
  var first = text, inside;
  if (text.slice(-1) === ']') {
    var pos = text.lastIndexOf('[');
    if (pos !== -1) { // guard against a ']' with no '[' before it
      first = text.slice(0, pos);
      inside = text.slice(pos + 1, -1); // drop the closing ']'
    }
  }
  return [first, inside];
}

I'm not sure I understand what you want to do, but here is a try: /(.*?)\[(.*?)\]$/.
Another try, allowing the second group to remain undefined : /(.*?)(?:\[(.*?)\])?$/.
I have never used Scriptular but here is what Chrome's console says :
// result : [full match, group 1, group 2]
'abc'.match(/(.*?)(?:\[(.*?)\])?$/) // ["abc", "abc", undefined]
'[abc]'.match(/(.*?)(?:\[(.*?)\])?$/) // ["[abc]", "", "abc"]
What about this one : /(.*?)(?:\[([^\[]*?)\])?$/?
'aze[[[rty]'.match(/(.*?)(?:\[([^\[]*?)\])?$/) // ["aze[[[rty]", "aze[[", "rty"]
Last try : /(.+?)(?:\[([^\[]*?)\])?$/.
test           result
-------------------------------------------
''             null
'aze'          ["aze", "aze", undefined]
'[rty]'        ["[rty]", "[rty]", undefined]
'aze[rty]'     ["aze[rty]", "aze", "rty"]
'aze[]'        ["aze[]", "aze", ""]
'aze[][rty]'   ["aze[][rty]", "aze[]", "rty"]
'aze[[]rty]'   ["aze[[]rty]", "aze[", "]rty"]

Related

Get multiple substrings between " or '

I am trying to figure out how to get substrings when the substrings are either located between a ' (single quote) or " (double quote)
Example:
Input: The "quick" brown "fox" 'jumps' over the 'lazy dog'
Output: ['quick', 'fox', 'jumps', 'lazy dog']
I have tried doing this with a regex but fell flat.
const string = `The "quick" brown "fox" 'jumps' over the 'lazy dog'`;
const pattern = /(?:'([^']*)')|(?:"([^"]*)")/;
console.log(string.match(pattern));
But it only returns the first single-quoted or double-quoted word.
Add the global flag, g, after the last / in the pattern, and change the function from match to matchAll. So: pattern = /(?:'([^']*)')|(?:"([^"]*)")/g;. matchAll returns an iterator of match arrays, so you'll need to process it to get the flat array that you want.
const string = `The "quick" brown "fox" 'jumps' over the 'lazy dog'`; // Uses backticks since we use " and '
const pattern = /(?:'([^']*)')|(?:"([^"]*)")/g; // Pattern has the global flag "g" at the end so it allows multiple matches
const matches = [...string.matchAll(pattern)] // Convert RegExpStringIterator into array with the spread operator "..."
.map(([_, first, second]) => first ?? second); // Convert the array of arrays into something sensible.
console.log(matches);
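Without mapping, matches would look like this (groups that did not take part in a match are undefined in JavaScript and appear as null in this serialized output):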
[
[
"\"quick\"",
null,
"quick"
],
[
"\"fox\"",
null,
"fox"
],
[
"'jumps'",
"jumps",
null
],
[
"'lazy dog'",
"lazy dog",
null
]
]
So with this line:
.map(([_, first, second]) => first ?? second)
We destructure the inner array, discarding the 0th index (which is the whole match, including text matched inside the non-capturing groups (?:...), so it includes the quotes at the beginning and end), and extracting the 1st and 2nd indices. The first ?? second means that if first is not null or undefined, it returns first; otherwise it returns second.
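A possible alternative, shown here only as a sketch and not as part of the answer above, is to capture the opening quote and reuse it with a backreference, so the wanted text always lands in the same group and no ?? fallback is needed. The function name `extractQuoted` is hypothetical. One difference to keep in mind: `.` does not match line breaks, while the negated character classes in the original pattern do.

```javascript
// Hypothetical helper: returns the text between matching ' or " pairs.
function extractQuoted(text) {
  // Group 1 captures the opening quote, \1 requires the same quote to close,
  // and group 2 lazily captures everything in between.
  const quoted = /(['"])(.*?)\1/g;
  return Array.from(text.matchAll(quoted), (match) => match[2]);
}

const input = `The "quick" brown "fox" 'jumps' over the 'lazy dog'`;
console.log(extractQuoted(input));
```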

why condition is always true in javascript?

Could you please tell me why my condition is always true? I am trying to validate my value using a regex. I have a few conditions:
Name should not contain test "text"
Name should not contain three consecutive characters example "abc" , "pqr" ,"xyz"
Name should not contain the same character three times example "aaa", "ccc" ,"zzz"
I do like this
https://jsfiddle.net/aoerLqkz/2/
var val = 'ab dd'
if (/test|[^a-z]|(.)\1\1|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz/i.test(val)) {
alert( 'match')
} else {
alert( 'false')
}
I tested my code with the following string and getting an unexpected result
input string "abc" : output fine :: "match"
input string "aaa" : output fine :: "match"
input string "aa a" : output :: "match". Why does it match? There is a space between the characters.
input string "sa c" : output :: "match". Why does it match? It is a different string with a space in it.
The string sa c includes a space, the pattern [^a-z] (not a to z) matches the space.
Possibly you want to use ^ and $ so your pattern also matches the start and end of the string instead of looking for a match anywhere inside it.
there is space between them why it matched ????
Because of the [^a-z] part of your regular expression, which matches the space:
> /[^a-z]/i.test('aa a');
true
The issue is the [^a-z]. This means that any string that has a non-letter character anywhere in it will be a match. In your example, it is matching the space character.
The solution? Simply remove |[^a-z]. Without it, your regex meets all three criteria.
test checks if the value contains the word 'test'.
abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz checks if the value contains three sequential letters.
(.)\1\1 checks if any character is repeated three times.
Complete regex:
/test|(.)\1\1|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz/i
I find it helpful to use a regex tester, like https://www.regexpal.com/, when writing regular expressions.
NOTE: I am assuming that the second criterion actually means "three consecutive letters", not "three consecutive characters" as it is written. If that is not true, then your regex doesn't meet the second criterion, since it only checks for three consecutive letters.
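To see the effect of dropping [^a-z], the corrected pattern can be tried against the inputs from the question. The sketch below builds the same alternation programmatically to avoid typing all 24 runs by hand; the function name `buildForbiddenPattern` is invented for this example.

```javascript
// Hypothetical helper: builds /test|(.)\1\1|abc|bcd|...|xyz/i without [^a-z].
function buildForbiddenPattern() {
  const letters = 'abcdefghijklmnopqrstuvwxyz';
  const runs = [];
  for (let i = 0; i + 3 <= letters.length; i++) {
    runs.push(letters.slice(i, i + 3)); // "abc", "bcd", ..., "xyz"
  }
  return new RegExp(`test|(.)\\1\\1|${runs.join('|')}`, 'i');
}

const forbidden = buildForbiddenPattern();

for (const value of ['abc', 'aaa', 'aa a', 'sa c', 'ab dd', 'my test']) {
  console.log(JSON.stringify(value), forbidden.test(value) ? 'match' : 'no match');
}
```

The pattern has no g flag, so calling test repeatedly on the same RegExp object does not carry lastIndex state from one call to the next.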
I would not do this with regular expressions: the expression will keep getting more complicated, and you lose the flexibility you would have if you programmed this.
The rules you listed suggest the concept of a string derivative. The derivative of a string is the distance between each pair of successive characters. It is especially useful for password security checks and for measuring string variation in general.
const derivative = (str) => {
const result = [];
for(let i=1; i<str.length; i++){
result.push(str.charCodeAt(i) - str.charCodeAt(i-1));
}
return result;
};
//these strings have the same derivative: [0,0,0,0]
console.log(derivative('aaaaa'));
console.log(derivative('bbbbb'));
//these strings also have the same derivative: [1,1,1,1]
console.log(derivative('abcde'));
console.log(derivative('mnopq'));
//up and down: [1,-1, 1,-1, 1]
console.log(derivative('ababa'));
With this in mind you can apply each of your rules to each string.
// Rules:
// 1. Name should not contain test "text"
// 2. Name should not contain three consecutive characters example "abc" , "pqr" ,"xyz"
// 3. Name should not contain the same character three times example "aaa", "ccc" ,"zzz"
const derivative = (str) => {
const result = [];
for(let i=1; i<str.length; i++){
result.push(str.charCodeAt(i) - str.charCodeAt(i-1));
}
return result;
};
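const arrayContains = (master, sub) => // true when sub occurs as a contiguous run in master
(',' + master.join(',') + ',').includes(',' + sub.join(',') + ','); // outer commas stop 11,1 from matching 1,1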
const rule1 = (text) => !text.includes('text');
const rule2 = (text) => !arrayContains(derivative(text),[1,1]);
const rule3 = (text) => !arrayContains(derivative(text),[0,0]);
const testing = [
"smthing textual",'abc','aaa','xyz','12345',
'1111','12abb', 'goodbcd', 'weeell'
];
const results = testing.map((input)=>
[input, rule1(input), rule2(input), rule3(input)]);
console.log(results);
Based on the 3 conditions in the post, the following regex should work.
Regex: ^(?:(?!test|([a-z])\1\1|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz).)*$
Demo

Regex match entire string while grouping

I'm trying to match a currency string that may or may not be suffixed with one of K, M, or Bn, and group them into two parts
Valid matches:
500 K // Expected grouping: ["500", "K"]
900,000 // ["900,000", ""]
2.3 Bn // ["2.3", "Bn"]
800M // ["800", "M"]
PS: I know the first item in the match output array is the entire matched string; the expected grouping above is only an example.
The Regex I've got so far is this:
/\b([-\d\,\.]+)\s?([M|Bn|K]?)\b/i
When I match it with a normal string, it does OK.
"898734 K".match(/\b([-\d\,\.]+)\s?([M|Bn|K]?)\b/i)
=> ["898734 K", "898734", "K"] // output
"500,000".match(/\b([-\d\,\.]+)\s?([M|Bn|K]?)\b/i)
=> ["500,000", "500,000", ""]
Trouble is, it also matches space in there
"89 8734 K".match(/\b([-\d\,\.]+)\s?([M|Bn|K]?)\b/i)
=> ["89 ", "89", ""]
And I'm not sure why. So I thought I'd add /g option in there to match entire string, but now it doesn't group the matches.
"898734 K".match(/\b([-\d\,\.]+)\s?([M|Bn|K]?)\b/gi)
=> ["898734 K"]
What change do I need to make to get the regex behave as expected?
You could use a different regular expression, which looks for some digits, an optional comma or dot followed by more digits, some whitespace, and then the wanted letters.
var array = ['500 K', '900,000', '2.3 Bn', '800M'],
regex = /(\d+[.,]?\d*)\s*(K|Bn|M|$)/
array.forEach(function (a) {
var m = a.match(regex);
if (m) {
m.shift();
console.log(m);
}
});
You have a problem and want to use a regex to solve the problem. Now you have two problems...
Joke aside, I think you can achieve what you want to do without any regex (shown here in Python):
import itertools
number = "".join(itertools.takewhile(lambda c: c.isdigit() or c in ",.", s))
number, s[len(number):].strip()  # (numeric part, suffix)
I tried this with s="560 K", s="900,000", etc. and it seems to work.
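For reference, [M|Bn|K] in the question's pattern is a character class: it matches one character out of M, |, B, n and K, not the alternatives M, Bn or K. That, together with the pattern being free to match anywhere in the string, is why "89 8734 K" produced "89 ". A minimal sketch of one way to address both points is below; the anchors and the function name `parseAmount` are assumptions about what the asker wants, not something stated in the thread.

```javascript
// Hypothetical helper: returns [number, suffix] or null when the whole
// string is not a valid amount.
function parseAmount(text) {
  // ^ and $ force the pattern to cover the entire string;
  // (Bn|M|K) is a real alternation instead of a character class.
  const match = /^([-\d,.]+)\s?(Bn|M|K)?$/i.exec(text);
  if (match === null) return null;
  return [match[1], match[2] === undefined ? '' : match[2]];
}

for (const sample of ['500 K', '900,000', '2.3 Bn', '800M', '89 8734 K']) {
  console.log(JSON.stringify(sample), '->', JSON.stringify(parseAmount(sample)));
}
```

Without the g flag the capture groups stay available in the result, which avoids the problem the question ran into after adding g.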

What is the purpose of the 'y' sticky pattern modifier in JavaScript RegExps?

MDN introduced the 'y' sticky flag for JavaScript RegExp. Here is a documentation excerpt:
y
sticky; matches only from the index indicated by the lastIndex property of this regular expression in the target string (and does not attempt to match from any later indexes).
There's also an example:
var text = 'First line\nSecond line';
var regex = /(\S+) line\n?/y;
var match = regex.exec(text);
console.log(match[1]); // prints 'First'
console.log(regex.lastIndex); // prints '11'
var match2 = regex.exec(text);
console.log(match2[1]); // prints 'Second'
console.log(regex.lastIndex); // prints '22'
var match3 = regex.exec(text);
console.log(match3 === null); // prints 'true'
But in this case there isn't actually any difference from using the g global flag:
var text = 'First line\nSecond line';
var regex = /(\S+) line\n?/g;
var match = regex.exec(text);
console.log(match[1]); // prints 'First'
console.log(regex.lastIndex); // prints '11'
var match2 = regex.exec(text);
console.log(match2[1]); // prints 'Second'
console.log(regex.lastIndex); // prints '22'
var match3 = regex.exec(text);
console.log(match3 === null); // prints 'true'
Same output. So I guess there might be something else regarding the 'y' flag and it seems that MDN's example isn't a real use-case for this modifier, as it seems to just work as a replacement for the 'g' global modifier here.
So, what could be a real use-case for this experimental 'y' sticky flag? What's its purpose in "matching only from the RegExp.lastIndex property" and what makes it differ from 'g' when used with RegExp.prototype.exec?
Thanks for the attention.
The difference between y and g is described in Practical Modern JavaScript:
The sticky flag advances lastIndex like g but only if a match is found
starting at lastIndex, there is no forward search. The sticky flag was added to improve the performance of writing lexical analyzers using
JavaScript...
As for a real use case,
It could be used to require a regular expression match starting at position n where n
is what lastIndex is set to. In the case of a non-multiline regular
expression, a lastIndex value of 0 with the sticky flag would be in
effect the same as starting the regular expression with ^ which
requires the match to start at the beginning of the text searched.
And here is an example from that blog, where the lastIndex property is manipulated before the test method invocation, thus forcing different match results:
var searchStrings, stickyRegexp;
stickyRegexp = /foo/y;
searchStrings = [
"foo",
" foo",
"  foo",
];
searchStrings.forEach(function(text, index) {
stickyRegexp.lastIndex = 1;
console.log("found a match at", index, ":", stickyRegexp.test(text));
});
Result:
"found a match at" 0 ":" false
"found a match at" 1 ":" true
"found a match at" 2 ":" false
There is definitely a difference in behaviour, as shown below:
var text = "abc def ghi jkl"
undefined
var regexy = /\S(\S)\S/y;
undefined
var regexg = /\S(\S)\S/g;
undefined
regexg.exec(text)
Array [ "abc", "b" ]
regexg.lastIndex
3
regexg.exec(text)
Array [ "def", "e" ]
regexg.lastIndex
7
regexg.exec(text)
Array [ "ghi", "h" ]
regexg.lastIndex
11
regexg.exec(text)
Array [ "jkl", "k" ]
regexg.lastIndex
15
regexg.exec(text)
null
regexg.lastIndex
0
regexy.exec(text)
Array [ "abc", "b" ]
regexy.lastIndex
3
regexy.exec(text)
null
regexy.lastIndex
0
..but I have yet to fully understand what is going on there.
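What happens in the transcript above is that g searches forward from lastIndex until it finds a match, while y only tries at lastIndex itself: index 3 of the text is a space, so \S fails there and the sticky regex returns null. The sketch below illustrates why that is handy for a lexer. The tokenizer and its token set are invented for this example and are not taken from any of the quoted sources.

```javascript
// Hypothetical tokenizer for digits, + - * / and parentheses.
function tokenize(input) {
  // y: the match must start exactly at lastIndex, so nothing is skipped silently.
  const tokenPattern = /\s*(\d+|[-+*\/()])/y;
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    tokenPattern.lastIndex = pos;
    const match = tokenPattern.exec(input);
    if (match === null) {
      if (input.slice(pos).trim() === '') break; // only trailing whitespace left
      throw new SyntaxError(`Unexpected input near index ${pos}`);
    }
    tokens.push(match[1]);
    pos = tokenPattern.lastIndex;
  }
  return tokens;
}

console.log(tokenize('12 + (3*4)'));

// Contrast: from the same starting index, g searches forward, y does not.
const text = '12 $ 3';
const globalDigits = /\d+/g;
const stickyDigits = /\d+/y;
globalDigits.lastIndex = 2;
stickyDigits.lastIndex = 2;
console.log(globalDigits.exec(text)); // skips " $ " and finds "3" at index 5
console.log(stickyDigits.exec(text)); // null, because index 2 is not a digit
```

With g in the tokenizer, the stray "$" would be skipped without any error, which is the kind of bug the sticky flag helps avoid.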

javascript regexp split including delimiter

I want to split '9088{2}12{1}729' into [ "9088", "{2}12", "{1}729" ]
or even more useful to me: [ "9088", "2-12", "1-729" ]
tried:
'9088{2}12{1}729'.split(/\{[0-9]+\}/); => ["9088", "12", "729"]
also tried:
'9088{2}12{1}729'.match(/\{[0-9]+\}/); => ["{2}"]
I know it probably involved some other regexp string to split including delimiters.
Tried it in php, I guess you can do it in one line also.
preg_split( '/{/', preg_replace( '/}/', '-', "9088{2}12{1}729" ) )
Array ( [0] => 9088 [1] => 2-12 [2] => 1-729 )
Just have to wrap the replace function with split to get the preference order correct.
I think I like js more :)
even more useful to me: [ "9088", "2-12", "1-729" ]
It can be done using simple tricks!
"9088{2}12{1}729".replace(/\}/g,'-').split(/\{/g)
// ["9088", "2-12", "1-729"]
You can use a simple zero-width positive lookahead with /(?={)/:
'9088{2}12{1}729'.split(/(?=\{)/); // => ["9088","{2}12","{1}729"]
The "zero-width" part means that the actual matched text is the empty string so the split throws away nothing, and the lookahead means it matches just before the contained pattern, so /(?=\{)/ matches the empty strings between characters where indicated by an arrow:
9 0 8 8 { 2 } 1 2 { 1 } 7 2 9
       ↑         ↑
You can then use Array.prototype.map to convert from {1}2 form to 1-2 form.
'9088{2}12{1}729'.split(/(?=\{)/)
.map(function (x) { return x.replace('{', '').replace('}', '-'); });
yields
["9088","2-12","1-729"]
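A slightly stricter variant of the same idea, offered only as a sketch, rewrites the leading {n} of each piece with a single anchored replace, so braces elsewhere in a piece are left alone. The function name `splitWithCounts` is made up for this example.

```javascript
// Hypothetical helper: "9088{2}12{1}729" -> ["9088", "2-12", "1-729"]
function splitWithCounts(text) {
  return text
    .split(/(?=\{)/) // zero-width split just before each "{"
    .map((part) => part.replace(/^\{(\d+)\}/, '$1-')); // "{2}12" -> "2-12"
}

console.log(splitWithCounts('9088{2}12{1}729'));
```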
