Regular expression to retrieve from the URL - javascript

/test-test-test/test.aspx
Hi there,
I am having a bit difficult to retrieve the first bit out from the the above URL.
test-test-test
I tried this /[\w+|-]/g but it match the last test.aspx as well.
Please help out.
Thanks

One way of doing it is using the Dom Parser as stated here: https://stackoverflow.com/a/13465791/970247.
Then you could access to the segments of the url using for example: myURL.segments; // = Array = ['test-test-test', 'test.aspx']

You need to use a positive lookahead assertion. | inside a character class would match a literal | symbol. It won't act like an alternation operator. So i suggest you to remove that.
[\w-]+(?=\/)
(?=\/) called positive lookahead assertion which asserts that the match must be followed by an forward slash. In our case test-test-test only followed by a forward slash, so it got matched. [\w-]+ matches one or more word character or hyphen. + repeats the previous token one or more times.
Example:
> "/test-test-test/test.aspx".match(/[\w-]+(?=\/)/g)
[ 'test-test-test' ]

[\w+|-] is wrong, should be [\w-]+. "A series of characters that are either word characters or hyphens", not "a single character that is a word character, a plus, a pipe, or a hyphen".
The g flag means global match, so naturally all matches will be found instead of just the first one. So you should remove that.
> '/test-test-test/test.aspx'.match(/[\w-]+/)
< ["test-test-test"]

Related

How to use regex ?: operator and get the right group in my case? [duplicate]

This is an example string:
123456#p654321
Currently, I am using this match to capture 123456 and 654321 in to two different groups:
([0-9].*)#p([0-9].*)
But on occasions, the #p654321 part of the string will not be there, so I will only want to capture the first group. I tried to make the second group "optional" by appending ? to it, which works, but only as long as there is a #p at the end of the remaining string.
What would be the best way to solve this problem?
You have the #p outside of the capturing group, which makes it a required piece of the result. You are also using the dot character (.) improperly. Dot (in most reg-ex variants) will match any character. Change it to:
([0-9]*)(?:#p([0-9]*))?
The (?:) syntax is how you get a non-capturing group. We then capture just the digits that you're interested in. Finally, we make the whole thing optional.
Also, most reg-ex variants have a \d character class for digits. So you could simplify even further:
(\d*)(?:#p(\d*))?
As another person has pointed out, the * operator could potentially match zero digits. To prevent this, use the + operator instead:
(\d+)(?:#p(\d+))?
Your regex will actually match no digits, because you've used * instead of +.
This is what (I think) you want:
(\d+)(?:#p(\d+))?

Ambiguity in regex in javascript

var a = 'a\na'
console.log(a.match(/.*/g)) // ['a', '', 'a', '']
Why are there two empty strings in the result?
Let's say if there are empty strings, why isn't there one at beginning and at the end of each line as well, hence 4 empty strings?
I am not looking for how to select 'a's but just want to understand the presence of the empty strings.
The best explanation I can offer for the following:
'ab\na'.match(/.*/g)
["ab", "", "a", ""]
Is that JavaScript's match function uses dot not in DOT ALL mode, meaning that dot does not match across newlines. When the .* pattern is applied to ab\na, it first matches ab, then stops at the newline. The newline generates an empty match. Then, a is matched, and then for some reason the end of the string matches another empty match.
If you just want to extract the non whitespace content from each line, then you may try the following:
print('ab\na'.match(/.+/g))
ab,a
Let's say if there are empty strings, why isn't there one at beginning
and at the end...
.* applies greediness. It swallows a complete line asap. By a line I mean everything before a line break. When it encounters end of a line, it matches again due to star quantifier.
If you want 4 you may add ? to star quantifier and make it lazy .*? but yet this regex has different result in different flavors because of the way they handle zero-length matches.
You can try .*? with both PCRE and JS engines in regex101 and see the differences.
Question:
You may ask why does engine try to find a match at the end of line while whole thing is already matched?
Answer:
It's for the reason that we have a definition for end of lines and end of strings. So not whole thing is matched. There is a left position that has a chance to be matched and we have it with star quantifier.
This left position is end of line here which is a true match for $ when m flag is on. A . doesn't match this position but a .* or .*? match because they would be a pattern for zero-length positions too as any X-STAR patterns like \d*, \D*, a* or b?
Star operator * means there can be any number of ocurrences (even 0 ocurrences). With the expression used, an empty string can be a match. Not sure what are you looking for, but maybe a + operator (1 or more ocurrences) will be better?
Want to add some more info, regex use a greedy algorithm by default (in some languages you can override this behaviour), so it will pick as much of the text as it can. In this case, it will pick the a, because it can be processed with the regex, so the "\na" is still there. "\n" does not match the ".", so the only available option is the empty string. Then, we will process the next line, and again, we can match a "a". After this, only the empty string matches the regex.
* Matches the preceding expression 0 or more times.
. matches any single character except the newline character.
That is what official doc says about . and *. So i guess the array you received is something like this:
[ the first "any character" of the first line, following "nothing", the first "any character" of the second line, following "nothing"]
And the new-line character is just ignored

Need help to find the right regex pattern to match

my RegEx is not working the way i think, it should.
[^a-zA-Z](\d+-)?OSM\d*(?![a-zA-Z])
I will use this regex in a javascript, to check if a string match with it.
Should match:
12345612-OSM34
12-OSM34
OSM56
7-OSM
OSM
Should not match:
-OSM
a-OSM
rOSMann
rOSMa
asdrOSMa
rOSM89
01-OSMann
OSMond
23OSM
45OSM678
One line, represents a string in my javascript.
https://www.regex101.com/r/xQ0zG1/3
The rules for matching:
match OSM if it stands alone
optional match if line starts with digit/s AND is followed by a -
optional match if line ends with digit/s
match all 3 above combined
no match if line starts with a character/word except OSM
no match if line end with chracter/word except OSM
I Hope someone can help.
You can use the following simplified pattern using anchors:
^(?:\d+-)?OSM\d*$
The flags needed (if matching multi-line paragraph) would be: g for global match and m for multi-line match, so that ^ and $ match the begin/end of each line.
EDIT
Changed the (\d+-) match to (?:\d+-) so that it doesn't group.
[^a-zA-Z](\d+-)?OSM\d*(?![a-zA-Z])
[^a-zA-Z] In regex, you specify what you want, not what you don't want. This piece of code says there must be one character that isn't a letter. I believe what you wanted to say is to match the start of a line. You don't need to specify that there's no letter, you're about to specify what there will be on the line anyway. The start of a regex is represented with ^ (outside of brackets). You'll have to use the m flag to make the regex multi-line.
(\d+-)? means one or more digits followed by a - character. The ? means this whole block isn't required. If you don't want foreign digits, you might want to use [0-9] instead, but it's not as important. This part of the code, you got right. However, if you don't need capture blocks, you could write (?:) instead of ().
\d*(?![a-zA-Z]) uses lookahead, but you almost never need to do that. Again, specifying what you don't want is a bad idea because then I could write OSMé and it would match because you didn't specify that é is forbidden. It's much simpler to specify what is allowed. In your case since you want to match line ends. So instead, you can write \d*$ which means zero or more digits followed by the end of the line.
/^(?:\d+-)?OSM\d*$/gm is the final result.

Regexp: excluding a word but including non-standard punctuation

I want to find strings that contain words in a particular order, allowing non-standard characters in between the words but excluding a particular word or symbol.
I'm using javascript's replace function to find all instances and put into an array.
So, I want select...from, with anything except 'from' in between the words. Or I can separate select...from from select...from (, as long as I exclude nesting. I think the answer is the same for both, i.e. how do I write: find x and not y within the same regexp?
From the internet, I feel this should work: /\bselect\b^(?!from).*\bfrom\b/gi but this finds no matches.
This works to find all select...from: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b/gi but modifying it to exclude the parenthesis "(" at the end prevents any matches: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\(/gi
Can anyone tell me how to exclude words and symbols within this regexp?
Many thanks
Emma
Edit: partial string input:
left outer join [stage].[db].[table14] o on p.Project_id = o.project_id
left outer join
(
select
different_id
,sum(costs) - ( sum(brushes) + sum(carpets) + sum(fabric) + sum(other) + sum(chairs)+ sum(apples) ) as overallNumber
from
(
select ace from [stage].db.[table18] J
Javascript:
sequel = stringInputAsAbove;
var tst = sequel.replace(/\bselect\b[\s\S]*?\bfrom\b/gi, function(a,b) { console.log('match: '+a); selects.push(b); return a; });
console.log(selects);
Console.log(selects) should print an array of numbers, where each number is the starting character of a select...from. This works for the second regexp I gave in my info, printing: [95, 251]. Your \s\S variation does the same, #stribizhev.
The first example ^(?!from).* should do likewise but returns [].
The third example \s*^\( should return 251 only but returns []. However I have just noticed that the positive expression \s*\( does give 95, so some progress! It's the negatives I'm getting wrong.
Your \bselect\b^(?!from).*\bfrom\b regex doesn't work as expected because:
^ means here beginning of a line, not negation of next part, so
the \bselect\b^ means, select word followed by beginning of a
line. After removal of ^ regex start to match something
(DEMO) but it is still invalid.
in multiline text .* without modification will not match new line,
so regex will match only select...from in single lines, but if you
change it for (.|\n)* (as a simple example) it will match
multiline, but still invalid
the * is greede quantifire, so it will match as much a possible,
but if you use reluctant quantifire *?, regex will match to first
occurance of from word, and int will start to return relativly
correct result.
\bselect\b(?!from) means match separate select word which is not
directly followed by separate from word, so it would be
selectfrom somehow composed of separate words (because
select\bfrom) so (?!from) doesn't work and it is redundant
In effect you will get regex very similar to what Stribizhev gave you: \bselect\b(.|\n)*?\bfrom\b
In third expression you meke same mistake: \bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\( using ^ as (I assume) a negation, not beginning of a line. Remove ^ and you will again get relativly valid result (match from select through from to closing parathesis ) ).
Your second regex works similar to \bselect\b(.|\n)*?\bfrom\b or \bselect\b[\s\S]*?\bfrom\b.
I wrote "relativly valid result", as I also think, that parsing SQL with regex could be very camplicated, so I am not sure if it will work in every case.
You can also try to use positive lookahead to match just position in text, like:
(?=\bselect\b(?:.|\n)*?\bfrom\b)
DEMO - the () was added to regex just to return beginning index of match in groups, so it would be easier to check it validity
Negation in regex
We use ^ as negation in character class, for example [^a-z] means match anything but not letter, so it will match number, symbol, whitespace, etc, but not letter from range a to z (Look here). But this negation is on a level of single character. I you use [^from] it will prevent regex from matching characters f,r,o and m (demo). Also the [^from]{4} will avoid matching from but also form, morf, etc.
To exlude the whole word from matching by regex, you need to use negative look ahead, like (?!from), which will fail to match, if there will be chosen word from fallowing given position. To avoid matching whole line containing from you could use ^(?!.*from.*).+$ (demo).
However in your case, you don't need to use this construction, because if you replace greedy quantifire .*\bfrom with .*?\bfrom it will match to first occurance of this word. Whats more it would couse problems. Take a look on this regex, it will not match anything because (?![\s\S]*from[\s\S]*) is not restricted by anything, so it will match only if there is no from after select, but we want to match also from! in effect this regex try to match and exclude from at once, and fail. so the (?!.*word.*) construction works much better to exclude matching line with given word.
So what to do if we don't what to match a word in a fragment of a match? I think select\b([^f]|f(?!rom))*?\bfrom\b is a good solution. With ([^f]|f(?!rom))*? it will match everything between select and from, but will not exclude from.
But if you would like to match only select...from not followed by ( then it is good idea to use (?!\() like. But in your regex (multiline, use of (.|\n)*? or [\s\S]*? it will cause to match up to next select...from part, because reluctant quantifire will chenge a plece where it need to match to make whole regex . In my opinion, good solution would be to use again:
select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()
which will not overlap additional select..from and will not match if there is \( after select...from - check it here

Removing Numbers from a String using Javascript

How do I remove numbers from a string using Javascript?
I am not very good with regex at all but I think I can use with replace to achieve the above?
It would actually be great if there was something JQuery offered already to do this?
//Something Like this??
var string = 'All23';
string.replace('REGEX', '');
I appreciate any help on this.
\d matches any number, so you want to replace them with an empty string:
string.replace(/\d+/g, '')
I've used the + modifier here so that it will match all adjacent numbers in one go, and hence require less replacing. The g at the end is a flag which means "global" and it means that it will replace ALL matches it finds, not just the first one.
Just paste this into your address bar to try it out:
javascript:alert('abc123def456ghi'.replace(/\d+/g,''))
\d indicates a character in the range 0-9, and the + indicates one or more; so \d+ matches one or more digits. The g is necessary to indicate global matching, as opposed to quitting after the first match (the default behavior).

Categories

Resources