Regular Expression form {m,n} does not use upper limit - javascript

My understanding was that the regexp form a{m,n} would match a at most n times. However, the following snippet does not work as I would expect (this is javascript):
/\{{2,2}/.exec ('df{{{df')
// [ '{{', index: 2, input: 'df{{{df' ]
Shouldn't it return null?

It is matching the text because there are two. That satisfies the requirements your regex specifies. If you want to prevent extras from matching use a negative lookahead: (?!\{).
(?:^|[^{])(\{{2,2}(?!\{))
Then, use the first captured group.
Edit: by the way, the ,2 in {2,2} is optional in this case, since both numbers are the same; {2} means the same thing.
Edit: Added a usage example that discards the first matched character. (JavaScript did not support lookbehind when this was written; it was added in ES2018.)
var myRegexp = /(?:^|[^{])(\{{2,2}(?!\{))/g;
var match = myRegexp.exec(myString);
alert(match[1]);
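As a minimal sketch of the approach above, the pattern can be wrapped in a small function. The name findExactlyTwoBraces is hypothetical, and {2} is used in place of {2,2} since the two are equivalent:

```javascript
// Hypothetical helper: returns the run of exactly two "{" in s, or null.
function findExactlyTwoBraces(s) {
  // (?:^|[^{]) rejects a "{" before the run; (?!\{) rejects one after it.
  const pattern = /(?:^|[^{])(\{{2}(?!\{))/;
  const match = pattern.exec(s);
  return match ? match[1] : null;
}

console.log(findExactlyTwoBraces('df{{{df')); // null
console.log(findExactlyTwoBraces('df{{df'));  // {{
console.log(findExactlyTwoBraces('{{df'));    // {{
```

Group 1 is returned rather than match[0] because match[0] may include the single non-brace character consumed before the run.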

What your expression states is find {{ anywhere in the string, which it will find. If you want to find only {{ and not {{{ then you need to specify that you want to find:
/[^{]\{{2,2}[^{]/
In English:
[Any Character Not a {] followed by [Exactly 2 {] followed by [Any Character Not a {]
This will match a{{b but not a{b and not a{{{{b
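A quick check of those cases, as a sketch. The last line is an extra case not mentioned above: because the pattern requires a non-brace character on both sides, it also rejects a {{ that sits at the very start (or end) of the string.

```javascript
const surrounded = /[^{]\{{2,2}[^{]/;

console.log(surrounded.test('a{{b'));   // true
console.log(surrounded.test('a{b'));    // false
console.log(surrounded.test('a{{{{b')); // false
console.log(surrounded.test('{{b'));    // false: no character before the braces
```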

It matches because it contains a substring with exactly 2 left braces. If you want it to fail to match, you have to specify that anything outside the 2 left braces you are looking for can't be a left brace.

That regular expression is looking for exactly two left-curly-braces ({{), which it finds in the string "df{{{df" at index 2 (immediately after the first "df"). Looks right to me.

Related

Regular expression that matches a single word in any order

After a long, unsuccessful search I am starting to wonder whether what I am looking for is possible. I would like a regular expression that requires each of the chosen letters to appear exactly once, in any order.
Example : ^[abc]{3}$
I expect it to match only these:
abc, bac, cba, acb
While I get :
acc, abb, cca, aab
Do you see where I am going with this?
You may use a regex like this with a negative lookahead of the matched character in a back-reference:
^(?:([abc])(?!.*\1)){3}$
Here is another way.
^(?!.*([abc]).*\1)[abc]{3}$
The negative lookahead
(?!.*([abc]).*\1)
asserts that no character is repeated and
[abc]{3}
together with the two anchors asserts that the string has a length of three and is composed of the characters in the character class.
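Both patterns can be checked against the strings from the question. This is only a sketch; the names perIteration, upFront, samples and accepted are made up for the example:

```javascript
// Pattern from the first answer: each captured letter must not reappear later.
const perIteration = /^(?:([abc])(?!.*\1)){3}$/;
// Pattern from the second answer: one lookahead up front rejects any repeat.
const upFront = /^(?!.*([abc]).*\1)[abc]{3}$/;

const samples = ['abc', 'bac', 'cba', 'acb', 'acc', 'abb', 'cca', 'aab'];

// Returns the samples that a given regex accepts.
const accepted = (re) => samples.filter((s) => re.test(s));

console.log(accepted(perIteration)); // [ 'abc', 'bac', 'cba', 'acb' ]
console.log(accepted(upFront));      // [ 'abc', 'bac', 'cba', 'acb' ]
```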

Regexp: excluding a word but including non-standard punctuation

I want to find strings that contain words in a particular order, allowing non-standard characters in between the words but excluding a particular word or symbol.
I'm using javascript's replace function to find all instances and put into an array.
So, I want select...from, with anything except 'from' in between the words. Or I can separate select...from from select...from (, as long as I exclude nesting. I think the answer is the same for both, i.e. how do I write: find x and not y within the same regexp?
From the internet, I feel this should work: /\bselect\b^(?!from).*\bfrom\b/gi but this finds no matches.
This works to find all select...from: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b/gi but modifying it to exclude the parenthesis "(" at the end prevents any matches: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\(/gi
Can anyone tell me how to exclude words and symbols within this regexp?
Many thanks
Emma
Edit: partial string input:
left outer join [stage].[db].[table14] o on p.Project_id = o.project_id
left outer join
(
select
different_id
,sum(costs) - ( sum(brushes) + sum(carpets) + sum(fabric) + sum(other) + sum(chairs)+ sum(apples) ) as overallNumber
from
(
select ace from [stage].db.[table18] J
Javascript:
sequel = stringInputAsAbove;
var tst = sequel.replace(/\bselect\b[\s\S]*?\bfrom\b/gi, function(a,b) { console.log('match: '+a); selects.push(b); return a; });
console.log(selects);
Console.log(selects) should print an array of numbers, where each number is the starting character of a select...from. This works for the second regexp I gave in my info, printing: [95, 251]. Your \s\S variation does the same, #stribizhev.
The first example ^(?!from).* should do likewise but returns [].
The third example \s*^\( should return 251 only but returns []. However I have just noticed that the positive expression \s*\( does give 95, so some progress! It's the negatives I'm getting wrong.
Your \bselect\b^(?!from).*\bfrom\b regex doesn't work as expected because:
^ here means the beginning of a line, not negation of the next part, so \bselect\b^ means the word select followed by the beginning of a line. After removing ^ the regex starts to match something, but it is still not correct.
In multiline text, .* does not match a newline, so the regex will only match select...from within a single line. If you change it to (.|\n)* (as a simple example) it will match across lines, but the result is still not correct.
* is a greedy quantifier, so it will match as much as possible. If you use the reluctant quantifier *?, the regex will match up to the first occurrence of the word from, and it will start to return relatively correct results.
\bselect\b(?!from) means: match the separate word select when it is not directly followed by from. That would only rule out selectfrom, which cannot occur anyway because \b requires a word boundary after select. So (?!from) has no effect here and is redundant.
In effect you will get a regex very similar to what Stribizhev gave you: \bselect\b(.|\n)*?\bfrom\b
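The points about ^, the dot, and greedy versus reluctant matching can be seen on a tiny input. This is a sketch; the sample string and variable names are invented for illustration:

```javascript
const text = 'select a\nfrom t1; select b from t2';

// ^ right after "select" is an anchor, so this can never match.
const anchored = /\bselect\b^(?!from).*\bfrom\b/i.test(text);

// "." does not cross the newline, so only the single-line statement matches.
const dotOnly = /\bselect\b.*\bfrom\b/i.exec(text);

// Greedy: runs to the last "from". Reluctant: stops at the first "from".
const greedy = /\bselect\b[\s\S]*\bfrom\b/i.exec(text);
const lazy = /\bselect\b[\s\S]*?\bfrom\b/i.exec(text);

console.log(anchored);                   // false
console.log(dotOnly.index, dotOnly[0]);  // 18 select b from
console.log(JSON.stringify(greedy[0]));  // "select a\nfrom t1; select b from"
console.log(JSON.stringify(lazy[0]));    // "select a\nfrom"
```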
In the third expression you make the same mistake: \bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\( uses ^ as (I assume) a negation, not as the beginning of a line. Remove ^ and you will again get a relatively valid result (a match from select through from to the opening parenthesis).
Your second regex works similarly to \bselect\b(.|\n)*?\bfrom\b or \bselect\b[\s\S]*?\bfrom\b.
I wrote "relatively valid result" because I also think that parsing SQL with regex can be very complicated, so I am not sure it will work in every case.
You can also use a positive lookahead to match just a position in the text, like:
(?=\bselect\b(?:.|\n)*?\bfrom\b)
In the demo, the () was added to the regex only so that the starting index of each match is reported in the groups, which makes it easier to check its validity.
Negation in regex
We use ^ as negation in a character class. For example, [^a-z] means match anything except a lowercase letter from a to z, so it matches a digit, symbol, whitespace, etc. But this negation works at the level of a single character. If you use [^from], it will prevent the regex from matching the characters f, r, o and m. Likewise, [^from]{4} will avoid matching from, but also form, morf, etc.
To exclude a whole word from matching, you need a negative lookahead such as (?!from), which fails if the word from follows the given position. To avoid matching any line that contains from you could use ^(?!.*from.*).+$.
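The difference between the two kinds of negation, as a short sketch (the test strings and variable names are made up for the example):

```javascript
// Character-class negation works per character: it rejects any of f, r, o, m.
const noFromLetters = /^[^from]{4}$/;
console.log(noFromLetters.test('from')); // false
console.log(noFromLetters.test('form')); // false, same letters in another order
console.log(noFromLetters.test('data')); // true

// A negative lookahead rejects the whole word anywhere in the line.
const lineWithoutFrom = /^(?!.*from.*).+$/;
console.log(lineWithoutFrom.test('select a from b')); // false
console.log(lineWithoutFrom.test('select a, b'));     // true
```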
However, in your case you don't need this construction, because if you replace the greedy .*\bfrom with the reluctant .*?\bfrom it will match up to the first occurrence of the word. What's more, it would cause problems. A regex that places (?![\s\S]*from[\s\S]*) right after select will not match anything, because the lookahead is not restricted by anything: it succeeds only if there is no from anywhere after select, but we also want to match from! In effect the regex tries to match and exclude from at once, and fails. So the (?!.*word.*) construction is much better suited to excluding a whole line that contains a given word.
So what to do if we don't want to match a word inside a fragment of the match? I think select\b([^f]|f(?!rom))*?\bfrom\b is a good solution. ([^f]|f(?!rom))*? matches everything between select and from, but it cannot consume the word from itself, so the closing \bfrom\b still matches.
But if you would like to match only a select...from that is not followed by (, then it is a good idea to use (?!\(). But in your regex (multiline, using (.|\n)*? or [\s\S]*?) this will cause it to match up to the next select...from part, because the reluctant quantifier will move the point where it stops to a later from so that the whole regex can match. In my opinion, a good solution would be to use again:
select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()
which will not run over into an additional select...from and will not match if there is a ( after select...from.
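A sketch comparing the two behaviours on a small nested query. The SQL string is invented for the example, and the group is written as non-capturing, which does not change what is matched:

```javascript
const sql = 'select a, b from t1 join (select c from (select d from t2) x) y';

// [\s\S]*? is free to run past the first "from" to satisfy the lookahead.
const overlapping = /\bselect\b[\s\S]*?\bfrom\b(?!\s*\()/gi;
// The middle part here can never consume the word "from".
const bounded = /select\b(?:[^f]|f(?!rom))*?\bfrom\b(?!\s*?\()/gi;

console.log(sql.match(overlapping));
// [ 'select a, b from', 'select c from (select d from' ]
console.log(sql.match(bounded));
// [ 'select a, b from', 'select d from' ]
```

The bounded version skips the middle select entirely because its from is followed by (, while the [\s\S]*? version stretches that match into the inner query.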

Regular expression to retrieve from the URL

/test-test-test/test.aspx
Hi there,
I am having a bit of difficulty retrieving the first part of the above URL.
test-test-test
I tried /[\w+|-]/g but it matches the last test.aspx as well.
Please help out.
Thanks
One way of doing it is using the Dom Parser as stated here: https://stackoverflow.com/a/13465791/970247.
Then you could access to the segments of the url using for example: myURL.segments; // = Array = ['test-test-test', 'test.aspx']
You need to use a positive lookahead assertion. A | inside a character class matches a literal | symbol; it does not act as an alternation operator, so I suggest you remove it.
[\w-]+(?=\/)
(?=\/) is a positive lookahead assertion, which asserts that the match must be followed by a forward slash. In our case only test-test-test is followed by a forward slash, so it is matched. [\w-]+ matches one or more word characters or hyphens; + repeats the previous token one or more times.
Example:
> "/test-test-test/test.aspx".match(/[\w-]+(?=\/)/g)
[ 'test-test-test' ]
[\w+|-] is wrong, should be [\w-]+. "A series of characters that are either word characters or hyphens", not "a single character that is a word character, a plus, a pipe, or a hyphen".
The g flag means global match, so naturally all matches will be found instead of just the first one. So you should remove that.
> '/test-test-test/test.aspx'.match(/[\w-]+/)
< ["test-test-test"]
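The points made in these answers can be put side by side in a short sketch (the variable names and the 'a|b+c' string are made up for illustration):

```javascript
const url = '/test-test-test/test.aspx';

// With g, every run of word characters or hyphens is returned.
const allRuns = url.match(/[\w-]+/g);
// Without g, only the first match is returned.
const firstRun = url.match(/[\w-]+/)[0];
// With the lookahead, only a run followed by "/" qualifies, even with g.
const beforeSlash = url.match(/[\w-]+(?=\/)/g);
// Inside a character class, + and | are literal characters.
const singles = 'a|b+c'.match(/[\w+|-]/g);

console.log(allRuns);     // [ 'test-test-test', 'test', 'aspx' ]
console.log(firstRun);    // test-test-test
console.log(beforeSlash); // [ 'test-test-test' ]
console.log(singles);     // [ 'a', '|', 'b', '+', 'c' ]
```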

Satisfying two condition in one regex pattern in javascript

I am not sure if I have put the question right.
I want to satisfy both the text with one regular expression.
text1 = 'foobar';
text2 = 'foobar-baz';
Expected Output of text1
$1 should be bar
$2 should be ''
Expected Output of text2
$1 should be bar
$2 should be baz
Here is what I have tried:
/foo([a-z0-9\-_=\+\/]+)(\-(.*))?/i
result for text1 is correct but for text2, $1 gets the full string foobar-baz
The problem here is due to the possible inclusion of - in the first capturing group. There are 2 cases:
There are one or more - in the string, and you want to pick the last group delimited by a hyphen. Intuitively, we think of a greedy quantifier, and a simple solution like:
input.match(/foo([a-z0-9_=+\/-]+)-(.*)/)
would work.
However, the second case, where there is no - in the string, combined with the previous case, causes a problem.
Since [a-z0-9_=+\/-]+ contains -, if you make -(.*) optional, given an input in the first case, it will just match to the end of the string and put everything in the first capturing group.
We need to control the backtracking behavior so that when there is at least one -, it must match it and match the last one, and allow the first group to gobble up everything when there is no -.
One solution which makes minimal change to your current regex is:
input.match(/foo([a-z0-9_=+\/-]+?)(?:-([a-z0-9_=+\/]*))?$/)
The lazy quantifier makes the engine try the left-most - first, and the anchor $ together with the final character class that excludes - forces the engine to split only at the last -, if any.
Note that the second capturing group will be undefined when there is no -.
Sample input output:
'foogoobarbaz'.match(/foo([a-z0-9_=+\/-]+?)(?:-([a-z0-9_=+\/]*))?$/)
> [ "foogoobarbaz", "goobarbaz", undefined ]
'foogoobar-baz'.match(/foo([a-z0-9_=+\/-]+?)(?:-([a-z0-9_=+\/]*))?$/)
> [ "foogoobar-baz", "goobar", "baz" ]
'foogoo-bar-baz'.match(/foo([a-z0-9_=+\/-]+?)(?:-([a-z0-9_=+\/]*))?$/)
> [ "foogoo-bar-baz", "goo-bar", "baz" ]
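Since the question asks for $2 to be '' rather than undefined when there is no hyphen, the pattern could be wrapped as in the sketch below. The function name splitFoo is hypothetical, and the i flag is carried over from the question's own attempt:

```javascript
// Hypothetical wrapper: returns [first, second] or null if input does not match.
function splitFoo(input) {
  const m = input.match(/foo([a-z0-9_=+\/-]+?)(?:-([a-z0-9_=+\/]*))?$/i);
  if (!m) return null;
  // An unmatched optional group is undefined; normalise it to ''.
  return [m[1], m[2] === undefined ? '' : m[2]];
}

console.log(splitFoo('foobar'));         // [ 'bar', '' ]
console.log(splitFoo('foobar-baz'));     // [ 'bar', 'baz' ]
console.log(splitFoo('foogoo-bar-baz')); // [ 'goo-bar', 'baz' ]
```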
You can use a non-capturing group:
/foo([a-z0-9\-_=\+\/]+)(?:-(.*))?/i
That solves the problem of avoiding the additional capture group. However, your pattern still has the problem of including - as a valid character for the first string. Because of that, when you execute the pattern against "foobar-baz", the entire fragment "bar-baz" will match the first group in the pattern.
You're going to have to decide what it is you want to match; your rule is currently at odds with the result you seek. If you remove the - from the first group:
/foo([a-z0-9_=\+\/]+)(?:-(.*))?/i
then you get the result you say you're looking for.

Javascript lookahead regular expression

I'm trying to write a regular expression to parse the following string out into three distinct parts. This is for a highlighting engine I'm writing:
"\nOn and available after solution."
I have a regular expression that's dynamically created for any word a user might input. In the above example, the word is "on".
The regular expression expects any amount of white space ([\s]*) followed by the search word, with no -\w following it (e.g. on-time and on-wards should not be valid results). To complicate this, there can be a -, $, < or > symbol following the word, so on-, on> or on$ are valid. This is why there is a negative lookahead after the search word in my regular expression below.
There's a complicated reason for this, but it's not relevant to the question. The last part should be the rest of the sentence. In this example, " and available after solution."
So,
p1 = "\n"
p2 = "On"
p3 = " and available after solution"
I currently have the following regular expression.
test = new RegExp('([\\s]*)(on(?!\\-\\w))([$\\-><]*?\\s(?=[.]*))',"gi")
The first part of this regular expression ([\\s]*)(on(?!\\-\\w))[$\\-><]*? works as expected. The last part does not.
In the last part, what I'm trying to do is force the regular expression engine to match whitespace before matching additional characters. If it can not match a space, then the regular expression should end. However, when I run this regular expression, I get the following results
str1 = "\nOn ly available after solution."
test.exec(str1)
["\n On ", "\n ", "On"]
So it would appear to me that the last positive look ahead is not working. Thanks for any suggestions, and if anyone needs some clarification, let me know.
EDIT:
It would appear that my regular expression was not matching because I didn't realize the following caveat:
You can use any regular expression inside the lookahead. (Note that this is not the case with lookbehind. I will explain why below.) Any valid regular expression can be used inside the lookahead. If it contains capturing parentheses, the backreferences will be saved. Note that the lookahead itself does not create a backreference. So it is not included in the count towards numbering the backreferences. If you want to store the match of the regex inside a backreference, you have to put capturing parentheses around the regex inside the lookahead, like this: (?=(regex)). The other way around will not work, because the lookahead will already have discarded the regex match by the time the backreference is to be saved.
The dot in the character class [.] means a literal dot. Change it to just . if you wish to match any character.
The lookahead (?=.*) will always match and is completely pointless. Change it to (.*) if you just want to capture that part of the string.
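Both points can be checked directly, and replacing the lookahead with a capturing (.*) gives the remainder of the sentence. The highlight pattern below is only a sketch of that suggestion, not the asker's final regex; note that the whitespace after the word lands in group 3, not in the last group:

```javascript
// [.] is a literal dot; a bare . matches any character except a line break.
console.log(/a[.]b/.test('a.b')); // true
console.log(/a[.]b/.test('axb')); // false
console.log(/a.b/.test('axb'));   // true

// (?=.*) succeeds even when nothing follows, so it constrains nothing.
console.log(/on(?=.*)/.test('on')); // true

// Assumed variant: capture the rest of the line instead of looking ahead.
const highlight = /(\s*)(on(?!-\w))([$\-><]*?\s)(.*)/i;
const parts = highlight.exec('\nOn and available after solution.');
console.log(parts.slice(1));
// [ '\n', 'On', ' ', 'and available after solution.' ]
```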
I think the problem is that your negative lookahead on(?!\-\w) only rejects an on that is followed by - and then \w. I think what you want instead is on(?!\-|\w), which matches an on that is not followed by - or \w.
