Regular Expression Issue Pattern not working

I have a regular expression which should accept only an email with the following pattern:
@stanford.edu.uk or word.edu.word
Here it is:
/(\.edu\.\w\w\w?)$/
It appears that this only works when .edu is followed by ".xx" (example: school.edu.au or college.edu.uk). I need this to also work for e-mails that end with .edu (example: school.edu or student.college.edu)
I tried this:
/(\.w+\.w+\.edu)$/
Can anyone help?

Your (\.edu\.\w\w\w?)$ pattern requires a . followed by 2 to 3 word chars right before the end of the string, so it can't match strings that end with .edu.
You may fix the pattern using
\.edu(?:\.\w{2,3})?$
See the regex demo
Details
\.edu - an .edu substring
(?:\.\w{2,3})? - an optional non-capturing group matching 1 or 0 occurrences of
\. - a dot
\w{2,3} - 2 to 3 word chars
$ - end of string.
Note that \w matches letters, digits and _. You might want to make this part more precise if you only want to match letters: use [a-zA-Z] to handle only ASCII, use the ECMAScript 2018 \p{L} Unicode property class (requires the u flag and does not work in older browsers), or build your own pattern that covers all Unicode letters.
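A quick check of the corrected pattern against the kinds of addresses from the question. The someone@... addresses are made up for illustration, and the \p{L} letters-only variant is just one possible way to tighten \w, assuming you want letters only:

```javascript
// Pattern from the answer: ".edu", optionally followed by "." and 2-3 word chars, at the end.
const eduPattern = /\.edu(?:\.\w{2,3})?$/;

// Letters-only variant (an assumption about how you might tighten \w);
// \p{L} requires the "u" flag and an ES2018+ engine.
const eduLettersOnly = /\.edu(?:\.\p{L}{2,3})?$/u;

const samples = [
  "someone@school.edu.au",
  "someone@college.edu.uk",
  "someone@school.edu",
  "someone@student.college.edu",
  "someone@school.edu.u_k", // "_" is a word char, so only eduPattern accepts this
  "someone@school.com",
];

for (const email of samples) {
  // Neither regex has the "g" flag, so test() is stateless here.
  console.log(email, eduPattern.test(email), eduLettersOnly.test(email));
}
```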
Also, consider going through How to validate an email address using a regular expression?

Related

Extracting text from a string after 5 characters and without the last slash

I have a few strings:
some-text-123123#####abcdefg/
some-STRING-413123#####qwer123t/
some-STRING-413123#####456zxcv/
I would like to receive:
abcdefg
qwer123t
456zxcv
I have tried regexp:
/[^#####]*[^\/]/
But this is not working.

To get whatever comes after five #s and before the last /, you can use
/#####(.*)\//
and pick up the first group.
Demo:
const regex = /#####(.*)\//;
console.log('some-text-123123#####abcdefg/'.match(regex)[1]);
console.log('some-STRING-413123#####qwer123t/'.match(regex)[1]);
console.log('some-STRING-413123#####456zxcv/'.match(regex)[1]);

assumptions:
the desired part of the string sample will always:
start after 5 #'s
end before a single /
suggestion: /(?<=#{5})\w*(?=\/)/
So (?<=#{5}) is a lookbehind assertion which will check to see if any matching string has the provided assertion immediately behind it (in this case, 5 #'s).
(?=\/) is a lookahead assertion, which will check ahead of a matching string segment to see if it matches the provided assertion (in this case, a single /).
The actual text the regex returns as a match is \w*, a character class followed by a quantifier. The character class \w matches any word character (a letter, digit or underscore: [A-Za-z0-9_]). The * quantifier matches the preceding item 0 or more times.
successful matches:
'some-text-123123#####abcdefg/'
'some-STRING-413123#####qwer123t/'
'some-STRING-413123#####456zxcv/'
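A minimal run of the lookaround suggestion on the three sample strings. It assumes an engine with lookbehind support, such as a current Node.js or Chrome:

```javascript
// (?<=#{5}) requires five "#" immediately before the match;
// (?=\/) requires a "/" immediately after it. Neither is part of the match.
const re = /(?<=#{5})\w*(?=\/)/;

const inputs = [
  "some-text-123123#####abcdefg/",
  "some-STRING-413123#####qwer123t/",
  "some-STRING-413123#####456zxcv/",
];

// match()[0] is the whole match, so no capture group is needed.
const extracted = inputs.map((s) => s.match(re)[0]);
console.log(extracted); // [ 'abcdefg', 'qwer123t', '456zxcv' ]
```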
I would highly recommend learning Regular Expressions in-depth, as it's a very powerful tool when fully utilised.
MDN, as with most things web-dev, is a fantastic resource for regex. Everything from my answer here can be learned on MDN's Regular expression syntax cheatsheet.
Also, an interactive tool can be very helpful when putting together a complex regular expression. Regex 101 is what I typically use, but there are many similar web tools that a Google search will turn up.

Your pattern does not work because you are using negated character classes ([^...]).
The pattern [^#####]*[^\/] can be written as [^#]*[^\/] and matches optional chars other than # and then a single char other than /
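To see what the original attempt actually matches, here is a sketch on the first sample string: it runs up to the first # and then takes that # as its single non-/ char:

```javascript
// The asker's pattern; [^#####] is the same set as [^#].
const original = /[^#####]*[^\/]/;
const simplified = /[^#]*[^\/]/;

const s = "some-text-123123#####abcdefg/";

// [^#]* stops before the first "#", then [^\/] consumes that "#".
console.log(s.match(original)[0]);   // some-text-123123#
console.log(s.match(simplified)[0]); // some-text-123123#
```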
Here are some examples of other patterns that can give the same match.
At least 5 leading # chars and then matching 1+ word chars in a group and the / at the end of the string using an anchor $, or omit the anchor if that is not the case:
#####(\w+)\/$
Regex demo
If there should be a preceding character other than #
[^#]#####(\w+)\/$
(?<!#)#####(\w+)\/$
Regex demo
Matching at least 5 # chars and no # or / in between using a negated character class in this case:
#####([^#\/]+)\/
Or with lookarounds:
(?<=(?<!#)#####)[^#\/]+(?=\/)
Regex demo
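A sketch contrasting two of the patterns above. The second sample, with six # chars, is a made-up input: the plain pattern still finds five # inside the run of six, while the lookaround version rejects it because that ##### is preceded by another #:

```javascript
// Five "#" anywhere, then 1+ chars that are neither "#" nor "/", then "/".
const atLeastFive = /#####([^#\/]+)\//;
// Lookbehind requires exactly five "#" (no "#" before them); lookahead requires "/".
const exactlyFive = /(?<=(?<!#)#####)[^#\/]+(?=\/)/;

const samples = ["some-text-123123#####abcdefg/", "x######abc/"];

for (const s of samples) {
  const a = s.match(atLeastFive);
  const b = s.match(exactlyFive);
  // Group 1 for the first pattern, the whole match for the second (or null).
  console.log(s, a && a[1], b && b[0]);
}
```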

Regex: match underscore-wrapped words unless they start with @ / #

I'm trying to work around this bug in Tiptap (a WYSIWYG editor for Vue) by passing in a custom regex so that the regex that identifies italics notation in Markdown (_value_) would not be applied to strings that start with @ or #, e.g. #some_tag_value would not get transformed into #sometagvalue.
This is my regex so far: /(^|[^@#_\w])(?:\w?)(_([^_]+)_)/g
Edit: new regex with help from @Wiktor Stribiżew: /(^|[^@#_\w])(_([^_]+)_)/g
While it satisfies most of the common cases, it currently still fails when
underscores are mid-word, e.g. ant_farm_ should be matched (antfarm)
I have also provided some "should match" and "should not match" cases here https://regexr.com/50ibf for easier testing
Should match (between underscores)
_italic text here_
police_woman_
_fire_fighter
a thousand _words_
_brunch_ on a Sunday
Should not match
#ta_g_
__value__
#some_tag_value
#some_value_here
#some_tag_
#some_val_
#_hello_

You may use the following pattern:
(?:^|\s)[^@#\s_]*(_([^_]+)_)
See the regex demo
Details
(?:^|\s) - start of string or whitespace
[^@#\s_]* - 0 or more chars other than @, #, _ and whitespace
(_([^_]+)_) - Group 1: _, 1+ chars other than _ (captured into Group 2) and then _.
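A sketch that runs the pattern above against the question's test cases, one line at a time. It assumes the excluded characters are @ and #, and tests per line because [^_]+ can also cross newlines in a multi-line string:

```javascript
// Start of string or whitespace, non-@/#/_/space chars, then _text_ (Groups 1 and 2).
const italics = /(?:^|\s)[^@#\s_]*(_([^_]+)_)/;

const shouldMatch = [
  "_italic text here_",
  "police_woman_",
  "_fire_fighter",
  "a thousand _words_",
  "_brunch_ on a Sunday",
];
const shouldNotMatch = [
  "#ta_g_",
  "__value__",
  "#some_tag_value",
  "#some_value_here",
  "#some_tag_",
  "#some_val_",
  "#_hello_",
];

console.log(shouldMatch.map((line) => italics.test(line)));    // all true
console.log(shouldNotMatch.map((line) => italics.test(line))); // all false

// Group 2 holds the text between the underscores.
console.log("police_woman_".match(italics)[2]); // woman
```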

For science, this monstrosity works in Chrome (and Node.js).
let text = `
<strong>Should match</strong> (between underscores)
_italic text here_
police_woman_
_fire_fighter
a thousand _words_
_brunch_ on a Sunday
<strong>Should not match</strong>
#ta_g_
__value__
#some_tag_value
#some_value_here
#some_tag_
#some_val_
#_hello_
`;
let re = /(?<=(?:\s|^)(?![@#])[^_\n]*)_([^_]+)_/g;
document.querySelector('div').innerHTML = text.replace(re, '<em>$1</em>');
div { white-space: pre; }
<div/>
This captures _something_ as the full match, and something as the 1st capture group (in order to remove the underscores). You can't capture just something, because then you lose the ability to tell what is inside the underscores and what is outside (try it with (?<=(?:\s|^)(?![@#])[^_\n]*_)([^_]+)(?=_)).
There are two things that prevent it being universally applicable:
Look-behinds are not supported in all JavaScript engines
Most regexp engines do not support variable-length look-behinds
EDIT: This is a bit stronger, and should allow you to additionally match_this_and_that_ but not #match_this_and_that correctly:
/(?<=(?:\s|^)(?![@#])(?!__)\S*)_([^_]+)_/
Explanation:
_([^_]+)_ Match non-underscory bit between two underscores
(?<=...) that is preceded by
(?:\s|^) either a whitespace or a start of a line/string
(i.e. a proper word boundary, since we can't use `\b`)
\S* and then some non-space characters
(?![@#]) that don't start with `@`, `#`,
(?!__) or `__`.
regex101 demo
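A per-line sketch of the EDIT pattern doing the Markdown-style replacement. The sample lines come from the question and the answer; wrapping group 1 in <em> mirrors the snippet above. It assumes an engine with variable-length lookbehind (V8, as in Chrome and Node.js):

```javascript
// _text_ whose lookbehind is: whitespace or start of string, then non-space
// chars that don't begin with "@", "#" or "__".
const re = /(?<=(?:\s|^)(?![@#])(?!__)\S*)_([^_]+)_/g;

const lines = [
  "police_woman_",
  "_fire_fighter",
  "match_this_and_that_",
  "#match_this_and_that",
  "__value__",
  "#some_tag_value",
];

// Replace each _text_ with <em>text</em>, one line at a time.
const rendered = lines.map((line) => line.replace(re, "<em>$1</em>"));
console.log(rendered);
```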

Here's something, it's not as compact as other answers, but I think it's easier to understand what is going on. Match group \3 is what you want.
Needs the multiline flag
^([a-zA-Z\s]+|_)(([a-zA-Z\s]+)_)+?[a-zA-Z\s]*?$
^ - match the start of the line
([a-zA-Z\s]+|_) - multiple words or _
(([a-zA-Z\s]+)_)+? - multiple words followed by _ at least once, but the minimum match.
[a-zA-Z\s]*? - any final words
$ - the end of the line
In summary, it matches lines of one of these shapes:
_<words>_
<words>_<words>_
<words>_<words>_<words>
_<words>_<words>
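A sketch that applies this pattern to single lines and reads group 3. It tests line by line instead of using a multi-line string with the m flag, because \s also matches a newline and a match could otherwise run across lines:

```javascript
// Pattern from the answer; group 3 holds the last underscore-terminated segment.
const re = /^([a-zA-Z\s]+|_)(([a-zA-Z\s]+)_)+?[a-zA-Z\s]*?$/;

const lines = [
  "police_woman_",
  "_fire_fighter",
  "a thousand _words_",
  "_brunch_ on a Sunday",
  "#ta_g_",
  "__value__",
];

const results = lines.map((line) => {
  const m = line.match(re);
  return m ? m[3] : null; // null when the line does not match at all
});
console.log(results);
```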

Regular expression for text extraction

Can you please help me with this regular expression? I am new to regex.
My requirement: I want to extract the vehicle number (i.e., 123456789) from the URL below:
mysite.com/resource?slk=121&ops=rewww&from=kld&to=aop&search=things&validVehicle=sdfdsdff-sdfdf-sddf%3AVX%3ALNCX%3A123456789%3AOPW%3ALOS
I tried the below expression:
[&?]{1}validVehicle[=]{1}[^&]*[%3A]{1}([^%&]+)
But it gives invalid results. Can you please help me with this?

A pure regex solution:
[&?]validVehicle=[^&]*(\d{9})
Or, if you are sure they appear after %3A and not followed with a digit:
[&?]validVehicle=[^&]*%3A(\d{9})(?!\d)
See this regex demo and another regex demo. The value you seek is in Group 1.
Details:
[&?] - a ? or &
validVehicle= - a literal substring
[^&]* - any symbols other than &, as many as possible up to the last
%3A - literal substring
(\d{9}) - Group 1: 9 digits
(?!\d) - not followed with a digit.
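Running both patterns on the URL from the question, as a quick sanity check that group 1 holds the vehicle number:

```javascript
const url =
  "mysite.com/resource?slk=121&ops=rewww&from=kld&to=aop&search=things" +
  "&validVehicle=sdfdsdff-sdfdf-sddf%3AVX%3ALNCX%3A123456789%3AOPW%3ALOS";

// Any 9 digits after validVehicle= (greedy [^&]* backtracks to the last such run).
const any9 = /[&?]validVehicle=[^&]*(\d{9})/;
// 9 digits right after a %3A and not followed by another digit.
const after3A = /[&?]validVehicle=[^&]*%3A(\d{9})(?!\d)/;

console.log(url.match(any9)[1]);    // 123456789
console.log(url.match(after3A)[1]); // 123456789
```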

A "structural" approach might be to use those "%3A" (URL-encoded colon) delimiters in the pattern, combined with non-greedy wildcards (.*?). This matches the fourth field of validVehicle as delimited by %3A, and assumes this structure does not change:
[&?]validVehicle=(?:.*?%3A){3}(.*?)%3A
Whether this is better than the \d{9} patterns already suggested really depends on what you know for certain about the incoming data: those patterns would also match a nine-digit run in any other field of that delimited value.
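A sketch of the structural pattern on the same URL. The i flag is added here so that both %3a and %3A are accepted. For comparison, it also shows a non-regex alternative (a swapped-in technique, not from the answers) that lets the built-in URLSearchParams decode %3A to ":" and takes the fourth field:

```javascript
const url =
  "mysite.com/resource?slk=121&ops=rewww&from=kld&to=aop&search=things" +
  "&validVehicle=sdfdsdff-sdfdf-sddf%3AVX%3ALNCX%3A123456789%3AOPW%3ALOS";

// Skip three %3A-terminated fields lazily, then capture the fourth field.
const structural = /[&?]validVehicle=(?:.*?%3a){3}(.*?)%3a/i;
console.log(url.match(structural)[1]); // 123456789

// Non-regex alternative: parse the query string, which decodes %3A to ":".
const query = url.slice(url.indexOf("?") + 1);
const fields = new URLSearchParams(query).get("validVehicle").split(":");
console.log(fields[3]); // 123456789
```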

Regex for a valid hashtag

I need a regular expression for validating a hashtag. Each hashtag should start with a hash sign ("#").
Valid inputs:
1. #hashtag_abc
2. #simpleHashtag
3. #hashtag123
Invalid inputs:
1. #hashtag#
2. #hashtag#hashtag
I have been trying with this regex /#[a-zA-z0-9]/ but it is accepting invalid inputs also.
Any suggestions for how to do it?

The current accepted answer fails in a few places:
It accepts hashtags that have no letters in them (i.e. "#11111", "#___" both pass).
It will exclude hashtags that are separated by spaces ("hey there #friend" fails to match "#friend").
It doesn't allow you to place a min/max length on the hashtag.
It doesn't offer a lot of flexibility if you decide to add other symbols/characters to your valid input list.
Try the following regex:
/(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,30})(\b|\r)/g
It'll close up the above edge cases, and furthermore:
You can change {1,30} to your desired min/max
You can add other symbols to the [0-9_] and [a-zA-Z0-9_] blocks if you wish to later
Here's a link to the demo.

To answer the current question...
There are 2 issues:
[A-z] allows more than just letters: it also matches [, \, ], ^, _ and `
There is no quantifier after the character class and it only matches 1 char
Since you are validating the whole string, you also need anchors (^ and $) to ensure a full string match:
/^#\w+$/
See the regex demo.
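A quick check of /^#\w+$/ against the valid and invalid inputs from the question, plus a demonstration of the [A-z] range problem:

```javascript
// Anchored: the whole string must be "#" followed by 1+ word chars.
const hashtag = /^#\w+$/;

const valid = ["#hashtag_abc", "#simpleHashtag", "#hashtag123"];
const invalid = ["#hashtag#", "#hashtag#hashtag"];

console.log(valid.map((s) => hashtag.test(s)));   // [ true, true, true ]
console.log(invalid.map((s) => hashtag.test(s))); // [ false, false ]

// Why [A-z] is a bug: the range runs from "A" (0x41) to "z" (0x7A) and so
// includes the six ASCII symbols that sit between "Z" and "a".
const symbols = ["[", "\\", "]", "^", "_", "`"];
console.log(symbols.every((c) => /^[A-z]$/.test(c))); // true
```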
If you want to extract specific valid hashtags from longer texts...
This is a bonus section, as a lot of people want to extract (not validate) hashtags, so here are a couple of solutions. Just keep in mind that \w in JavaScript (and a lot of other regex libraries) is equal to [a-zA-Z0-9_]:
#\w{1,30}\b - a # char followed by one to thirty word chars followed by a word boundary
\B#\w{1,30}\b - a # char that is either at the start of the string or right after a non-word char, then one to thirty word (i.e. letter, digit, or underscore) chars followed by a word boundary
\B#(?![\d_]+\b)(\w{1,30})\b - a # that is either at the start of the string or right after a non-word char, then one to thirty word (i.e. letter, digit, or underscore) chars (that cannot be just digits/underscores) followed by a word boundary
And last but not least, here is a Twitter hashtag regex from https://github.com/twitter/twitter-text/tree/master/js... Sorry, too long to paste in the SO post, here it is: https://gist.github.com/stribizhev/715ee1ee2dc1439ffd464d81d22f80d1.
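A sketch of two of the extraction patterns on a made-up sentence, using matchAll (which requires the g flag). The plain #\w{1,30}\b also picks up #123 and the #b inside a#b, while the last pattern skips both:

```javascript
const text = "hey there #friend, #123 and #abc_1 but a#b";

// Simplest pattern: any "#" followed by 1-30 word chars and a word boundary.
const simple = /#\w{1,30}\b/g;
// Stricter: \B before "#", and the tag cannot be only digits/underscores.
const strict = /\B#(?![\d_]+\b)(\w{1,30})\b/g;

const simpleTags = [...text.matchAll(simple)].map((m) => m[0]);
const strictTags = [...text.matchAll(strict)].map((m) => m[1]);

console.log(simpleTags); // [ '#friend', '#123', '#abc_1', '#b' ]
console.log(strictTags); // [ 'friend', 'abc_1' ]
```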

You could try this: /#[a-zA-Z0-9_]+/
This will only include letters, numbers & underscores.

A regex code that matches any hashtag.
In this approach any character is accepted in hashtags except the main symbols !@#$%^&*()
(?<=(\s|^))#[^\s\!\@\#\$\%\^\&\*\(\)]+(?=(\s|$))
Usage Notes
Turn on "g" and "m" flags when using!
It is tested for Java and JavaScript languages via https://regex101.com and VSCode tools.
It is available on this repo.
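A sketch of this pattern on a made-up sample, excluding whitespace and the symbols !@#$%^&*(). It is written without the u flag, since identity escapes such as \! are only allowed in non-Unicode mode:

```javascript
// Lookbehind: whitespace or line start before "#"; lookahead: whitespace or line end after.
const re = /(?<=(\s|^))#[^\s\!\@\#\$\%\^\&\*\(\)]+(?=(\s|$))/gm;

// "#caf\u00e9!" ends with "!", and "#a#b" contains a second "#", so neither qualifies.
const text = "Hello #world and #caf\u00e9! #a#b\n#ok";

// With the "g" flag, match() returns the full matches only.
const tags = text.match(re);
console.log(tags); // [ '#world', '#ok' ]
```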

Unicode general categories can help with that task:
/^#[\p{L}\p{Nd}_]+$/gu
I use the \p{L} and \p{Nd} Unicode categories (which require the u flag) to match any letter or decimal digit. You can add any other category your regex needs. The complete list of categories can be found here: https://unicode.org/reports/tr18/#General_Category_Property
Regex live demo:
https://regexr.com/5tvmo
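A sketch of the Unicode-aware validation. The sample strings are illustrative. The g flag is dropped for validation because test() on a global regex updates lastIndex, which makes a repeated call on the same string fail, as the last line shows:

```javascript
// Validation pattern from the answer, without "g".
const hashtag = /^#[\p{L}\p{Nd}_]+$/u;

const inputs = [
  "#hashtag_abc",
  "#caf\u00e9",             // Latin letter with an accent
  "#\u65e5\u672c\u8a9e",    // CJK letters
  "#\u0661\u0662\u0663",    // Arabic-Indic digits (category Nd)
  "#hashtag#",
  "#hash tag",
];
console.log(inputs.map((s) => hashtag.test(s)));
// [ true, true, true, true, false, false ]

// The lastIndex pitfall: the second call starts searching at index 4.
const withG = /^#[\p{L}\p{Nd}_]+$/gu;
console.log(withG.test("#abc"), withG.test("#abc")); // true false
```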

A useful, tested regex for detecting hashtags in text:
/(^|\s)(#[a-zA-Z\d_]+)/ig
Examples of matching hashtags:
#abc
#ab_c
#ABC
#aBC
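A sketch that extracts the hashtags with matchAll and reads group 2, which leaves out the leading whitespace captured by group 1 (the sample text is made up):

```javascript
const re = /(^|\s)(#[a-zA-Z\d_]+)/gi;
const text = "#abc and #ab_c, #ABC #aBC";

// Group 1 is the start of string or a whitespace char; group 2 is the hashtag.
const tags = [...text.matchAll(re)].map((m) => m[2]);
console.log(tags); // [ '#abc', '#ab_c', '#ABC', '#aBC' ]
```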

/\B(?:#|＃)((?![\p{N}_]+(?:$|\b|\s))(?:[\p{L}\p{M}\p{N}_]{1,60}))/ug
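It allows letters from any language, optionally combined with numbers or _.
Hashtags made only of numbers, or of numbers and _, are not allowed.
It is a Unicode regex, so if you are using Python, you may need to install the third-party regex module (the built-in re module does not support \p{...} classes).
To test it: https://regex101.com/r/NLHUQh/1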
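A sketch of this Unicode pattern in JavaScript on a made-up sample that includes the full-width ＃; the pure-number tag #123 is skipped:

```javascript
// \B before the hash, then a tag that is not only numbers/underscores.
const re = /\B(?:#|＃)((?![\p{N}_]+(?:$|\b|\s))(?:[\p{L}\p{M}\p{N}_]{1,60}))/gu;

const text = "#\u65e5\u672c\u8a9e #123 #abc_1 ＃tag";

// Group 1 is the tag text without the hash sign.
const tags = [...text.matchAll(re)].map((m) => m[1]);
console.log(tags);
```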

business phone regex containing if-else expression

I am trying to write business phone number regex in javascript, my requirements are:
It should contain only digits, dashes and whitespace
It should not end with - but can end with whitespace
There should be only 1 - between two groups
It should match numbers with and without - like 1, 123, 678-78
I have tried the following regex, but it fails for 123--, which should be rejected as invalid. Can anybody please suggest something?
/^([ ]*[0-9]+[-]?[0-9 ]*?([-])[ ]*[0-9]+[ ]*|[0-9 ]*[ ]*)+$/.test('123--2')

Try this
/^[0-9]+(-[0-9\s]+)*$/
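A quick check of this pattern against examples taken from the question's requirements:

```javascript
// Digits, then zero or more groups of "-" followed by digits/whitespace.
const phone = /^[0-9]+(-[0-9\s]+)*$/;

const inputs = ["1", "123", "678-78", "123--2", "123-"];
console.log(inputs.map((s) => phone.test(s))); // [ true, true, true, false, false ]
```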

I don't know if you still need an answer to this, but this works for your requirements:
/^(?!.+-\s*$)\s*((?:\d+\s*-?\s*)+)$/
Explanation:
^ start of string
(?!.+-\s*$) disallow - (or - followed by whitespace) at the end of the string
\s* optional leading spaces
( start capturing
(?:\d+\s*-?\s*)+ one or more groups of the following:
one or more digits,
possibly followed by whitespace,
possibly followed by a single hyphen,
possibly followed by more whitespace
) stop capturing
$ end of the string
Demo
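The same kind of check for the lookahead-based pattern, including inputs with extra whitespace (the sample inputs are illustrative):

```javascript
// Negative lookahead rejects a trailing "-" (optionally followed by whitespace);
// the group allows digit runs separated by at most one "-" and optional spaces.
const phone = /^(?!.+-\s*$)\s*((?:\d+\s*-?\s*)+)$/;

const inputs = ["1", "123", "678-78", " 678 - 78 ", "123--2", "123-", "123- "];
console.log(inputs.map((s) => phone.test(s)));
// [ true, true, true, true, false, false, false ]
```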
