Regular expression for text extraction

Can you please help me with a regular expression? I am new to this.
My requirement is to extract the vehicle number (i.e., 123456789) from the URL below:
mysite.com/resource?slk=121&ops=rewww&from=kld&to=aop&search=things&validVehicle=sdfdsdff-sdfdf-sddf%3AVX%3ALNCX%3A123456789%3AOPW%3ALOS
I tried the below expression:
[&?]{1}validVehicle[=]{1}[^&]*[%3A]{1}([^%&]+)
But it gives invalid results. Can you please help me with this?

A pure regex solution:
[&?]validVehicle=[^&]*(\d{9})
Or, if you are sure they appear after %3A and not followed with a digit:
[&?]validVehicle=[^&]*%3A(\d{9})(?!\d)
The value you seek is in Group 1.
Details:
[&?] - a ? or &
validVehicle= - a literal substring
[^&]* - any characters other than &, as many as possible (greedy), up to the last position where the rest of the pattern can still match
%3A - literal substring
(\d{9}) - Group 1: 9 digits
(?!\d) - not followed with a digit.
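A minimal sketch of how the second pattern could be applied in JavaScript. The helper name extractVehicleNo and the variable sampleUrl are illustrative assumptions, not part of the question:

```javascript
// Hypothetical helper: returns the 9-digit vehicle number, or null if absent.
function extractVehicleNo(url) {
  // Group 1 holds 9 digits that follow an encoded colon (%3A) inside the
  // validVehicle parameter and are not followed by another digit.
  const match = /[&?]validVehicle=[^&]*%3A(\d{9})(?!\d)/.exec(url);
  return match ? match[1] : null;
}

const sampleUrl =
  'mysite.com/resource?slk=121&ops=rewww&from=kld&to=aop&search=things' +
  '&validVehicle=sdfdsdff-sdfdf-sddf%3AVX%3ALNCX%3A123456789%3AOPW%3ALOS';

console.log(extractVehicleNo(sampleUrl)); // 123456789
```

Returning null when there is no match avoids reading index 1 of a null result.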

A "structural" approach might be to use those "%3A" encoded colons as the delimiters of the pattern, combined with non-greedy wildcards .*? (this matches the fourth field of 'validVehicle' as defined by the delimiter %3A, and assumes this structure does not change). The pattern below spells the delimiter in lowercase while the sample URL uses uppercase %3A, so it needs the case-insensitive flag i:
[&?]validVehicle=(?:.*?%3a){3}(.*?)%3a
Whether this is more useful than the \d{9} patterns already suggested really just depends on what you know for certain about the incoming data. The \d{9} patterns would also match nine digits appearing in other fields of that delimited value.
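To make the trade-off concrete, here is a small sketch comparing the two approaches. The function names and the second test value (with an extra nine-digit field) are made up for illustration:

```javascript
// Hypothetical helper: fourth %3A-delimited field of validVehicle.
// The i flag lets the lowercase %3a in the pattern match %3A in the URL.
function fourthVehicleField(url) {
  const m = /[&?]validVehicle=(?:.*?%3a){3}(.*?)%3a/i.exec(url);
  return m ? m[1] : null;
}

// Hypothetical helper: the digit-based pattern from the earlier answer.
function nineDigits(url) {
  const m = /[&?]validVehicle=[^&]*%3A(\d{9})(?!\d)/.exec(url);
  return m ? m[1] : null;
}

const original = 'x?validVehicle=sdf-sd%3AVX%3ALNCX%3A123456789%3AOPW%3ALOS';
// Invented variant: a later field also consists of nine digits.
const tricky = 'x?validVehicle=sdf-sd%3AVX%3ALNCX%3A123456789%3A987654321%3ALOS';

console.log(fourthVehicleField(original), nineDigits(original)); // 123456789 123456789
console.log(fourthVehicleField(tricky), nineDigits(tricky));     // 123456789 987654321
```

Because [^&]* is greedy, the digit-based pattern picks the last nine-digit field, while the structural pattern always picks the fourth field.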

Related

Regular Expression for entering values in javascript

I have a sample format for which I want the regular expression in javascript. The format is as below.
I-KA-BGLK-ENB-V001
I have not been able to try anything myself, as I don't know much about regex. Please let me know how to write it.
Even just the regex will do; I can handle the JavaScript part.

try this
var str = 'I-KA-BGLK-ENB-V001';
var re = /^[A-Z]-[A-Z]{2}-[A-Z]{4}-[A-Z]{3}-[A-Z]\d{3}$/;
re.test(str);// true
[A-Z] - means any uppercase letter
\d - means any digit 0-9
\d{3} - means 3 digits
[A-Z]{2} - means 2 uppercase letters
You can change it if you need digits in some places.
If you don't care about lowercase or uppercase, add the i flag (or replace [A-Z] with [A-Za-z]); note that \w would also allow digits and the underscore.
https://github.com/zeeshanu/learn-regex - lessons
Or you can google "learn regex easy"
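As a quick check, the pattern can be run against a few variations. The sample strings other than I-KA-BGLK-ENB-V001 are invented for illustration:

```javascript
const codePattern = /^[A-Z]-[A-Z]{2}-[A-Z]{4}-[A-Z]{3}-[A-Z]\d{3}$/;
// Same pattern, but case-insensitive.
const codePatternAnyCase = new RegExp(codePattern.source, 'i');

const samples = [
  'I-KA-BGLK-ENB-V001', // the format from the question
  'i-ka-bglk-enb-v001', // lowercase variant
  'I-KA-BGLK-ENB-V01',  // only two digits at the end
  'I-K1-BGLK-ENB-V001', // digit where a letter is expected
];

for (const s of samples) {
  console.log(s, codePattern.test(s), codePatternAnyCase.test(s));
}
// I-KA-BGLK-ENB-V001 true true
// i-ka-bglk-enb-v001 false true
// I-KA-BGLK-ENB-V01 false false
// I-K1-BGLK-ENB-V001 false false
```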

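Either of these regular expressions matches that format:
^\w-\w{2}-\w{4}-\w{3}-\w{4}$
^\w-\w{2}-\w{4}-\w{3}-\w\d{3}$
The first allows any four word characters in the last segment; the second requires one word character followed by three digits.
I don't wanna spoon-feed you, but the next step is to check whether a string matches a regex in JS.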

Well to match strings in the I-KA-BGLK-ENB-V001 format you can use this regex:
^[A-Z]\-[A-Z]{2}\-[A-Z]{4}\-[A-Z]{3}\-\w{4}$
You can test it in Regex101, where you can see an example of matching strings and check the meaning and the specifications for each part of it.

Regular Expression Issue Pattern not working

I have a regular expression which accepts only an email with the following pattern.
#stanford.edu.uk or word.edu.word
here it is
/(\.edu\.\w\w\w?)$/
It appears that this only works when .edu is followed by ".xx" (example: school.edu.au or college.edu.uk). I need this to also work for e-mails that end with .edu (example: school.edu or student.college.edu)
I tried this:
/(\.w+\.w+\.edu)$/
Can anyone help?

Your (\.edu\.\w\w\w?)$ pattern requires a . and 2 to 3 word chars after .edu before the end of the string, so it can't match strings with .edu at the end.
You may fix the pattern using
\.edu(?:\.\w{2,3})?$
Details
\.edu - an .edu substring
(?:\.\w{2,3})? - an optional non-capturing group matching 1 or 0 occurrences of
\. - a dot
\w{2,3} - 2 to 3 word chars
$ - end of string.
Note that \w matches letters, digits and _. You might want to make this part more precise if you only want to match letters: use [a-zA-Z] to handle only ASCII, use the ECMAScript 2018 \p{L} Unicode property class (which requires the u flag and does not work in older browsers), or build your own pattern to support all Unicode letters.
Also, consider going through How to validate an email address using a regular expression?
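A short sketch of the fixed pattern next to a letters-only variant. The sample addresses and variable names are made up for illustration:

```javascript
const eduPattern = /\.edu(?:\.\w{2,3})?$/;
// Variant restricted to ASCII letters in the optional country suffix.
const eduLettersOnly = /\.edu(?:\.[a-zA-Z]{2,3})?$/;

const addresses = [
  'student@school.edu',       // ends with .edu
  'student@college.edu.au',   // .edu plus a 2-letter suffix
  'student@example.com',      // not an .edu address
  'student@school.edu.12',    // digits in the suffix
  'student@school.education', // .edu is not at the end
];

for (const a of addresses) {
  console.log(a, eduPattern.test(a), eduLettersOnly.test(a));
}
// student@school.edu true true
// student@college.edu.au true true
// student@example.com false false
// student@school.edu.12 true false
// student@school.education false false
```

This only checks the ending of the string; it does not validate the rest of the address.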

Regex for digits and hyphen only

I am trying to understand regex, for digits of length 10 I can simply do
/^[0-9]{10}$/
for hyphen only I can do
/^[-]$/
combining the two using group expression will result in
/^([0-9]{10})|([-])$/
This expression does not work as intended: it somehow matches part of the string instead of not matching at all when the string is invalid.
How do I make the regex expression that accepts only "-" or 10 digits?

It would have worked fine to combine your two regexps exactly as you had them. In other words, just use the alternation/pipe operator to combine
/^[0-9]{10}$/
and
/^[-]$/
as is, directly into
/^[0-9]{10}$|^[-]$/
 ↑↑↑↑↑↑↑↑↑↑↑ ↑↑↑↑↑ YOUR ORIGINAL REGEXPS, COMBINED AS IS WITH |
That would have worked fine. As others have pointed out, you don't need to specify the hyphen in a character class, so
/^[0-9]{10}$|^-$/
              ↑ SIMPLIFY [-] TO JUST -
Now, we notice that each of the two alternatives has a ^ at the beginning and a $ at the end. That is a bit duplicative, and it also makes it a little harder to see immediately that the regexp always matches from beginning to end. Therefore, we can rewrite this, as explained in other answers, by taking the ^ and $ out of both sub-regexps and combining their contents using the grouping operator ():
/^([0-9]{10}|-)$/
  ↑↑↑↑↑↑↑↑↑↑↑↑↑ GROUP REGEXP CONTENTS WITH PARENS, WITH ANCHORS OUTSIDE
That would also work fine, but you could use \d instead of [0-9], so the final, simplest version is:
/^(\d{10}|-)$/
   ↑↑ USE \d FOR DIGITS
If for some reason you don't want to "capture" the group, use (?:, as in
/^(?:\d{10}|-)$/
   ↑↑ DON'T CAPTURE THE GROUP
By the way, in your original attempt to combine the two regexps, I noticed that you parenthesized them as in
/^([0-9]{10})|([-])$/
  ↑↑↑↑↑↑↑↑↑↑↑ ↑↑↑↑↑ YOU PARENTHESIZED THE SUB-REGEXPS
But actually this is not necessary, because the pipe (alternation, or "or") operator already has low precedence (in fact the lowest precedence of any regexp operator); "low precedence" means it applies only after the things on both sides have already been processed, so what you wrote here is identical to
/^[0-9]{10}|[-]$/
which, however, still won't work, because ^ then belongs only to the first alternative and $ only to the second, as mentioned in other answers.
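The difference between the capturing and non-capturing versions shows up in the match result rather than in which strings are accepted. A small sketch, with variable names chosen for illustration:

```javascript
const capturing = /^(\d{10}|-)$/;
const nonCapturing = /^(?:\d{10}|-)$/;

// Both accept exactly the same inputs.
console.log(capturing.test('1234567890'), nonCapturing.test('1234567890')); // true true
console.log(capturing.test('-'), nonCapturing.test('-'));                   // true true
console.log(capturing.test('12345'), nonCapturing.test('12345'));           // false false

// Only the capturing version adds a group to the match array.
console.log('1234567890'.match(capturing).length);    // 2 (full match + group 1)
console.log('1234567890'.match(nonCapturing).length); // 1 (full match only)
```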

How do I make the regex expression that accepts only "-" or 10 digits?
You can use:
/^([0-9]{10}|-)$/
Your regex just asserts the presence of a hyphen at the end, due to misplaced parentheses.
Here is the effective breakdown of OP's regex:
^([0-9]{10}) # matches 10 digits at start
| # OR
([-])$ # matches hyphen at end
which causes OP's regex to match any input starting with 10 digits or ending with a hyphen, so these invalid inputs are also accepted:
1234567890111
1234----
------------------
1234567890--------
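This can be checked directly in JavaScript. In the sketch below the variable names are illustrative, and the last two inputs (the valid ones) are added for comparison:

```javascript
const broken = /^([0-9]{10})|([-])$/; // OP's regex
const fixed = /^([0-9]{10}|-)$/;      // anchors outside the group

const inputs = [
  '1234567890111',
  '1234----',
  '------------------',
  '1234567890--------',
  '1234567890', // valid: exactly 10 digits
  '-',          // valid: a single hyphen
];

for (const input of inputs) {
  console.log(input, broken.test(input), fixed.test(input));
}
// The broken regex returns true for all six inputs;
// the fixed regex returns true only for the last two.
```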

To get the regex expression that accepts only "-" or 10 digits - change your regexp as shown below:
^(\d{10}|-)$

The problem with your regex is that it's looking for strings that either
start with 10 digits, i.e. ^([0-9]{10}), or
end with "-", i.e. ([-])$
You need an additional wrapping ^( .. )$ to get this to work, i.e.
/^(([0-9]{10})|([-]))$/
Better yet, use /^([0-9]{10}|-)$/ since [-] and - are the same.

Regex lookaround for a group doesn't work

Happy Saturday,
I'm wondering if Stackoverflow's users could give me a clue about one specific Regex..
(^visite\d+)(?!\D)
The above regex works well..
It says that:
visite12345 --> is a good answer (the string does match)
visite1a --> is not a good answer (the string doesn't match)
However for:
visite12345a --> It doesn't work.
Indeed, the output is visite1234, whereas I'd like to get the same answer as for visite1a (the string doesn't match)...
I use http://regexr.com/ to test my regexp.
Do you have any idea how to do so?
Thank you very much.

The regex (^visite\d+)(?!\D) matches visite at the start of the string, followed with one or more digits that should not be followed with a non-digit.
The "issue" is that the engine can backtrack within the \d+ pattern: for visite12345a it gives back the last digit and matches visite1234, because the character that follows (5) is not a non-digit.
The best way to solve it is to check the actual requirements and adjust the pattern.
If the digits are the last characters in the string, you should just replace the lookahead with the $ anchor.
A generic solution for this is making the subpattern atomic with a capturing group inside a positive lookahead and a backreference, and changing the negative lookahead to something like (?![a-zA-Z]) (fail if there is a letter):
/^visite(?=(\d+))\1(?![a-z])/i
Or, if a word boundary should follow the digits (i.e. the digits should not be followed with a letter, digit or an underscore), use \b instead of the lookahead:
/^visite\d+\b/
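The behaviour of the three patterns on the inputs from the question can be compared with a short sketch. The helper matched and the variable names are illustrative:

```javascript
const originalPattern = /(^visite\d+)(?!\D)/;        // pattern from the question
const atomicLike = /^visite(?=(\d+))\1(?![a-z])/i;   // lookahead + backreference
const wordBoundary = /^visite\d+\b/;                 // word boundary after digits

// Hypothetical helper: returns the matched text, or null if there is no match.
function matched(re, s) {
  const m = s.match(re);
  return m ? m[0] : null;
}

for (const s of ['visite12345', 'visite1a', 'visite12345a']) {
  console.log(s, matched(originalPattern, s), matched(atomicLike, s), matched(wordBoundary, s));
}
// visite12345 visite12345 visite12345 visite12345
// visite1a null null null
// visite12345a visite1234 null null
```

The lookahead captures all the digits at once and cannot be re-entered on backtracking, which is why the second pattern rejects visite12345a instead of settling for visite1234.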

Regular Expression for a REST endpoint

Can someone please help me in defining a regular expression for an endpoint.
person/^((?!-).)*$/
This regex needs to match a number of things but mainly:
person/:id
it should NOT match
person/1234-5678-9123 (it's currently not matching this which is good)
the problem I have is that it should NOT match this but it is:
person/123456789123 (it's currently matching this but shouldn't)
To be clear, If you go to: http://regex101.com and paste in:
^((?!-).)*$
You can see that it matches 123456789123 WHICH IS WRONG
How can I change the RegEx so it doesn't match 123456789123
Cheers.

Your regex ^((?!-).)*$ is the same as ^[^-]*$, that is, it matches any character other than -, zero or more times.
The reason your regex does not match person/1234-5678-9123 is that it contains a - symbol. But the string person/123456789123 has no - symbol, so it is matched.
To match a string that has - between the numbers, you could try the regex below.
^.*?\d+-\d+.*$
OR
^(?=.*?-).+$
(?=.*?-) Positive lookahead asserts that the string must contain a - symbol.
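A brief sketch of how the three patterns treat the two IDs from the question. The variable names are illustrative, and only the ID part after person/ is tested:

```javascript
const noHyphen = /^((?!-).)*$/;              // pattern from the question
const digitsAroundHyphen = /^.*?\d+-\d+.*$/; // digits, a hyphen, then digits
const containsHyphen = /^(?=.*?-).+$/;       // any string containing a hyphen

for (const id of ['123456789123', '1234-5678-9123']) {
  console.log(id, noHyphen.test(id), digitsAroundHyphen.test(id), containsHyphen.test(id));
}
// 123456789123 true false false
// 1234-5678-9123 false true true
```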
