Excluding URLS that contain a string in Regex

I am using regex to add a survey to pages, and I want to include it on all pages except the payment and sign-in pages. I can't use lookarounds in the regex, so I am attempting to use the following, but it isn't working:
^/.*[^(credit|signin)].*
It should capture all URLs except those containing credit or signin.

[ indicates the start of a character class and the [^ negation is per-character. Thus your regular expression is "anything followed by any character not in this class followed by anything," which is very likely to match anything.
Since you are using specific strings, I don't think a regular expression is appropriate here. It would be a lot simpler to check that credit and signin don't exist in the string, such as with JavaScript:
-1 === string.indexOf("credit") && -1 === string.indexOf("signin")
Or you could check that a regular expression does not match:
false === /credit|signin/.test(string)
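As a rough sketch of both points above (the per-character behaviour of the negated class, and the plain substring check), the following can be run in Node.js or a browser console. The function name shouldShowSurvey and the sample paths are made up for illustration.

```javascript
// The attempted pattern: [^(credit|signin)] is a negated character class,
// so it matches any ONE character that is not (, ), | or a letter from those words.
const attempted = /^\/.*[^(credit|signin)].*/;

// Hypothetical helper: show the survey unless the path contains a blocked word.
function shouldShowSurvey(path) {
  return path.indexOf("credit") === -1 && path.indexOf("signin") === -1;
}

const samplePaths = ["/products/42", "/credit/checkout", "/account/signin"];
for (const p of samplePaths) {
  console.log(p, "| attempted regex:", attempted.test(p), "| substring check:", shouldShowSurvey(p));
}
```

The attempted regex accepts all three sample paths, because each contains at least one character outside the class, while the substring check rejects the last two.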

Whitelisting words in regex is generally pretty easy, and usually follows a form of:
^.*(?:option1|option2).*$
The pattern breaks down to:
^ - start of string
.* - 0 or more non-newline characters*
(?: - open non-capturing group
option1|option2 - | separated list of options to whitelist
) - close non-capturing group
.* - 0 or more non-newline characters
$ - end of string
Blacklisting words in a regex is a bit more complicated to understand, but can be done with a pattern along the lines of:
^(?:(?!option1|option2).)*$
The pattern breaks down to:
^ - start of string
(?: - open non-capturing group
(?! - open negative lookahead (the next characters in the string must not match the value contained in the negative lookahead)
option1|option2 - | separated list of options to blacklist
) - close negative lookahead
. - a single non-newline character*
) - close non-capturing group
* - repeat the group 0 or more times
$ - end of string
Basically this pattern checks that the values in the blacklist do not occur at any point in the string.
* exact characters vary depending on the language, so use caution
The final version:
/^(?:(?!credit|signin).)*$/
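A quick check of the final pattern against a few made-up paths (the paths are assumptions, not taken from the question). Note that this pattern relies on a negative lookahead, so it only helps where lookarounds are available.

```javascript
// Blacklist pattern from above: no position in the string may start "credit" or "signin".
const noPaymentOrSignin = /^(?:(?!credit|signin).)*$/;

const pathsToCheck = ["/products/42", "/help/credits", "/user/signin?next=/", "/"];
for (const path of pathsToCheck) {
  console.log(path, "->", noPaymentOrSignin.test(path));
}
```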

Related

Regular Expression Strict Test

I am trying to create a regex that requires the user to enter exactly the expected value, nothing more and nothing less.
Here is my regex:
/[a-zA-Z0-9][a-zA-Z0-9\-]*\.myshopify\.com/
When I test it with, for example, myshop.myshopify.coma, it returns true, and myshop.myshopify.com myshop123.myshopify.com also returns true.
What I want is that input such as myshop.myshopify.coma or myshop321.myshopify.com myshop123.myshopify.coma does not match.
It should only match when the entire input looks exactly like this: [anything except ()=>%$ etc].myshopify.com
What should I include in my regex to strictly test for exactly one thing?

You can use boundary-type assertions to match the beginning (^) and the end ($) of the input, so that the pattern has to match the input in full.
const pattern = /^[a-zA-Z0-9][a-zA-Z0-9\-]*\.myshopify\.com$/
console.log(pattern.test('myshop.myshopify.com')) // true
console.log(pattern.test('myshop.myshopify.coma')) // false
console.log(pattern.test('myshop.myshopify.com myshop123.myshopify.com')) // false

You'd currently allow for input like "A---", so besides the good point about start and end line anchors, you'd maybe want to reconsider your pattern. Maybe something like:
^[a-z\d]+(?:-[a-z\d]+)*\.myshopify\.com$
^ - Start line anchor.
[a-z\d]+ - 1+ any alnum character.
(?: - Open non-capture group:
-[a-z\d]+ - A literal hyphen followed by 1+ alnum chars.
)* - Close non-capture group and match it zero or more times.
\.myshopify\.com - Match a ".myshopify.com" literally.
$ - End line anchor.
A 2nd option would be to use a negative lookahead to achieve the same concept:
^(?!-|.*-[-.])[a-z\d-]+\.myshopify\.com$
^ - Start line anchor.
(?! - Negative lookahead for:
- - A leading hyphen.
| - Or:
.*-[-.] - Any character other than newline zero or more times, up to a hyphen followed by either another hyphen or a literal dot.
) - Close negative lookahead.
[a-z\d-]+ - 1+ alnum characters or hyphens.
\.myshopify\.com - Match a ".myshopify.com" literally.
$ - End line anchor.
In both cases I used the case-insensitive flag: /<pattern>/i. The global flag is best left off here, because with g a regex keeps state in lastIndex between test() calls. See a sample below:
const patt1 = /^[a-z\d]+(?:-[a-z\d]+)*\.myshopify\.com$/i
console.log(patt1.test('myshop.myshopify.com')) // true
console.log(patt1.test('myshop-.myshopify.com')) // false
const patt2 = /^(?!-|.*-[-.])[a-z\d-]+\.myshopify\.com$/i
console.log(patt2.test('myshop.myshopify.com')) // true
console.log(patt2.test('myshop-.myshopify.com')) // false
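One caveat about combining the global flag with test(): a regex with g remembers a lastIndex between calls, so calling test() twice on the same valid input can give different answers. A small sketch of that behaviour, reusing the first pattern (the variable names are only for illustration):

```javascript
const withGlobal = /^[a-z\d]+(?:-[a-z\d]+)*\.myshopify\.com$/gi;
const withoutGlobal = /^[a-z\d]+(?:-[a-z\d]+)*\.myshopify\.com$/i;
const shop = 'myshop.myshopify.com';

// With g, a successful test() moves lastIndex to the end of the match,
// so the next call starts searching from there and fails.
const globalResults = [withGlobal.test(shop), withGlobal.test(shop)];

// Without g, lastIndex is ignored and every call starts at index 0.
const plainResults = [withoutGlobal.test(shop), withoutGlobal.test(shop)];

console.log(globalResults); // [ true, false ]
console.log(plainResults);  // [ true, true ]
```

For one-off validation of a whole input, a regex without g (or a fresh regex per call) avoids the surprise.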

Regex: match underscore-wrapped words unless they start with @ / #

I'm trying to work around this bug in Tiptap (a WYSIWYG editor for Vue) by passing in a custom regex, so that the regex that identifies italics notation in Markdown (_value_) is not applied to strings that start with @ or #; e.g. #some_tag_value should not get transformed into #sometagvalue.
This is my regex so far: /(^|[^@#_\w])(?:\w?)(_([^_]+)_)/g
Edit: new regex with help from @Wiktor Stribiżew: /(^|[^@#_\w])(_([^_]+)_)/g
While it satisfies most of the common cases, it still fails when underscores are mid-word, e.g. ant_farm_ should be matched (antfarm).
I have also provided some "should match" and "should not match" cases here https://regexr.com/50ibf for easier testing
Should match (between underscores)
_italic text here_
police_woman_
_fire_fighter
a thousand _words_
_brunch_ on a Sunday
Should not match
#ta_g_
__value__
#some_tag_value
#some_value_here
#some_tag_
#some_val_
#_hello_

You may use the following pattern:
(?:^|\s)[^@#\s_]*(_([^_]+)_)
Details
(?:^|\s) - start of string or whitespace
[^@#\s_]* - 0 or more chars other than @, #, _ and whitespace
(_([^_]+)_) - Group 1: _, 1+ chars other than _ (captured into Group 2) and then _.
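To try the pattern against some of the cases from the question, here is a small sketch. It assumes the two excluded prefix characters are @ and #, and it tests each line as its own string; the helper name italicPart is invented for this example.

```javascript
const italics = /(?:^|\s)[^@#\s_]*(_([^_]+)_)/;

// Hypothetical helper: returns the text between the underscores (Group 2), or null.
function italicPart(line) {
  const m = italics.exec(line);
  return m ? m[2] : null;
}

const shouldMatch = ["_italic text here_", "police_woman_", "_fire_fighter", "a thousand _words_"];
const shouldNotMatch = ["#ta_g_", "__value__", "#some_tag_value", "#_hello_"];

for (const line of shouldMatch) console.log(line, "->", italicPart(line));
for (const line of shouldNotMatch) console.log(line, "->", italicPart(line));
```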

For science, this monstrosity works in Chrome (and Node.js).
let text = `
<strong>Should match</strong> (between underscores)
_italic text here_
police_woman_
_fire_fighter
a thousand _words_
_brunch_ on a Sunday
<strong>Should not match</strong>
#ta_g_
__value__
#some_tag_value
#some_value_here
#some_tag_
#some_val_
#_hello_
`;
let re = /(?<=(?:\s|^)(?![@#])[^_\n]*)_([^_]+)_/g;
document.querySelector('div').innerHTML = text.replace(re, '<em>$1</em>');
div { white-space: pre; }
<div/>
This captures _something_ as the full match, and something as the 1st capture group (in order to remove the underscores). You can't match just something, because then you lose the ability to tell what is inside the underscores and what is outside (try it with (?<=(?:\s|^)(?![@#])[^_\n]*_)([^_]+)(?=_)).
There are two things that prevent it being universally applicable:
Look-behinds are not supported in all JavaScript engines
Most regexp engines do not support variable-length look-behinds
EDIT: This is a bit stronger, and should allow you to additionally match_this_and_that_ but not #match_this_and_that correctly:
/(?<=(?:\s|^)(?![@#])(?!__)\S*)_([^_]+)_/
Explanation:
_([^_]+)_ Match non-underscory bit between two underscores
(?<=...) that is preceded by
(?:\s|^) either a whitespace or a start of a line/string
(i.e. a proper word boundary, since we can't use `\b`)
\S* and then some non-space characters
(?![@#]) that don't start with `@`, `#`,
(?!__) or `__`.
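The stronger pattern can also be exercised outside the browser. The sketch below assumes an engine with lookbehind support (for example a recent Node.js), adds the g flag so that every occurrence is replaced, and assumes @ and # as the excluded prefixes; the helper name emphasize is made up.

```javascript
const strongItalics = /(?<=(?:\s|^)(?![@#])(?!__)\S*)_([^_]+)_/g;

// Wrap each matched italic run in <em>, dropping the underscores.
const emphasize = (line) => line.replace(strongItalics, '<em>$1</em>');

console.log(emphasize('a thousand _words_'));   // a thousand <em>words</em>
console.log(emphasize('match_this_and_that_')); // match<em>this</em>and<em>that</em>
console.log(emphasize('#match_this_and_that')); // unchanged
console.log(emphasize('__value__'));            // unchanged
```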

Here's something, it's not as compact as other answers, but I think it's easier to understand what is going on. Match group \3 is what you want.
Needs the multiline flag
^([a-zA-Z\s]+|_)(([a-zA-Z\s]+)_)+?[a-zA-Z\s]*?$
^ - match the start of the line
([a-zA-Z\s]+|_) - multiple words or _
(([a-zA-Z\s]+)_)+? - multiple words followed by _ at least once, but the minimum match.
[a-zA-Z\s]*? - any final words
$ - the end of the line
In summary, the pattern matches a line in one of these shapes:
_<words>_
<words>_<words>_
<words>_<words>_<words>
_<words>_<words>

Regex consume a character if it matches, but not otherwise

I am trying to write a regex expression which will capture all instances of the '#' character, except when two such characters appear in succession (essentially, an escape sequence). For example:
abd#ajk: # should be matched
abd##ajk: No matches
abd###ajk: The final # should match.
abd####ajk: No matches
This almost works with the negative lookahead expression #(?!#), except that because the second # is not consumed, the last of two # symbols will still be matched. What I think I want to do is to lookahead for an # but consume the character if it is there; otherwise, do not consume it. Is this possible?
Edit: I'm using Javascript which unfortunately rules out several good approaches :(

In JavaScript, to split strings at an unescaped #, you may instead match chunks of text that consist of either ## (an escaped #) or any chars other than #:
var strs = ['abd#ajk','abd##ajk','abd###ajk','abd####ajk'];
var rx = /(?:[^#]|##)+/g;
for (var s of strs) {
  console.log(s, "=>", s.match(rx));
}
The regex is
/(?:[^#]|##)+/g
Details
(?: - start of a non-capturing group that matches either of the 2 alternatives:
[^#] - any char other than #
| - or
## - 2 #s
)+ - repeat matching 1 or more times.
The g modifier finds all matching occurrences inside the input string.
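If the goal is to locate the unescaped # characters themselves rather than to split around them, one possible alternative (not part of the answer above) is to let the regex consume escaped pairs first and keep only the single matches. The function name unescapedHashIndexes is made up for this sketch, and it assumes an environment with String.prototype.matchAll.

```javascript
// Hypothetical helper: indexes of every # that is not part of a ## pair.
function unescapedHashIndexes(text) {
  const indexes = [];
  // "##" is tried first, so an escaped pair is consumed as one match;
  // a lone "#" only matches when no pair starts at that position.
  for (const m of text.matchAll(/##|#/g)) {
    if (m[0] === "#") indexes.push(m.index);
  }
  return indexes;
}

for (const s of ["abd#ajk", "abd##ajk", "abd###ajk", "abd####ajk"]) {
  console.log(s, "=>", unescapedHashIndexes(s));
}
```

Because pairs are consumed from left to right, only the final # of abd###ajk is reported, which is the behaviour the question asks for.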

Since you didn't tag a programming language in your question, here are my 2 cents for Java:
(?<=(?<!#)(?:##){0,999})#(?!#)
Java doesn't support unbounded lookbehinds, only bounded ones, so here I explicitly cap the number of ## pairs at 999.
JavaScript
Lookbehinds are not supported in every JavaScript engine (they were added in ES2018, and older browsers lack them). If you are trying to do this in JS, then this would be your working solution:
Method 1
((?:[^#]*(?:##)+[^#]*)+)|#
(?:[^#]*(?:##)+[^#]*)+ Match ## occurrences and all its leading / trailing characters
|# Or a single #
JS Code:
str.split(/((?:[^#]*(?:##)+[^#]*)+)|#/).filter(Boolean);
Method 2 (Recommended)
Or, if you don't have a problem with using match(), this is much cleaner and of course faster:
(?:[^#]*(?:##)+[^#]*)+|[^#]+
JS Code:
console.log(
"aaaa#######bbb#aa###cccc##ddddd#".match(/(?:[^#]*(?:##)+[^#]*)+|[^#]+/g)
);

Regex Javascript, negative lookahead with white space

In my Javascript angular application I have a regex to validate usernames.
The issue I am facing, after doing much research is that utilising negative lookahead with expressions that contain white spaces is not working.
The Requirements
A username can be composed of several alphanumeric strings separated by at most one space at a time. Spaces at the edges are not allowed. The username should also be filtered against a couple of banned names.
1)
(/^[a-zA-Z\d]+([\s][a-zA-Z\d]+)+?$/).test("admin may not be used")
allows alphanumeric words to be separated by one space at a time, and disallows spaces at the edges
2)
(/^(?!(?:admin|alfred)$)[a-zA-Z\d]+$/).test("admin")
works and word admin is not allowed
3) merging both:
(/^(?!(?:admin|alfred)$)[a-zA-Z\d]+([\s][a-zA-Z\d]+)+?$/).test("admin may not be used")
This fails and allows the banned word admin to be used.
Expected Result:
Both filters are expected to work, that is, the banned words list as well as the single-space filter.
Can you please point out what is wrong with my expression?

You may consider using
/^(?!.*\b(?:admin|alfred)\b)[a-zA-Z\d]+(?:\s[a-zA-Z\d]+)+$/
or a slightly longer one that is a bit more efficient (but less readable):
/^(?!(?:admin|alfred)\b)[a-zA-Z\d]+(?:\s(?!(?:admin|alfred)\b)[a-zA-Z\d]+)+$/
If one-word usernames are allowed, replace the last + (1 or more repetitions) with a * (0 or more repetitions) quantifier; a ? would cap the username at two words.
If you use it in AngularJS, also use ng-trim="false" to make sure leading and trailing whitespaces are not allowed.
Pattern details:
^ - start of string
(?!.*\b(?:admin|alfred)\b) - after zero or more chars (.*) there can't be admin or alfred as whole words (else, the regex will return false)
[a-zA-Z\d]+ - 1 or more alphanumerics
(?: - start of a non-capturing group
\s - a whitespace
[a-zA-Z\d]+ - 1 or more alphanumerics
)+ - end of the non-capturing group that will be repeated 1 or more times
$ - end of string.
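A short sketch of how the first pattern behaves. It also shows why the merged attempt in the question let admin through: its lookahead (?!(?:admin|alfred)$) only rejects input that is exactly a banned word, because of the $ right after the group. The sample usernames are invented.

```javascript
const multiWord = /^(?!.*\b(?:admin|alfred)\b)[a-zA-Z\d]+(?:\s[a-zA-Z\d]+)+$/;
// Variant that also accepts a single word (zero or more extra words).
const anyWordCount = /^(?!.*\b(?:admin|alfred)\b)[a-zA-Z\d]+(?:\s[a-zA-Z\d]+)*$/;
// The merged attempt from the question, kept for comparison.
const questionAttempt = /^(?!(?:admin|alfred)$)[a-zA-Z\d]+([\s][a-zA-Z\d]+)+?$/;

const names = ["john smith", "admin may not be used", "john  smith", " john smith", "administrator bob", "john"];
for (const name of names) {
  console.log(JSON.stringify(name), multiWord.test(name), anyWordCount.test(name), questionAttempt.test(name));
}
```

The \b word boundaries keep a name such as administrator usable while still rejecting admin as a whole word anywhere in the input.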

business phone regex containing if-else expression

I am trying to write a business phone number regex in JavaScript. My requirements are:
It should contain only digits, dashes and whitespace
It should not end with - but can end with whitespace
There should be only 1 - between two groups
It should match numbers with and without -, like 1, 123, 678-78
I have tried the following regex, but it fails for 123--, which is an invalid one. Can anybody suggest something?
/^([ ]*[0-9]+[-]?[0-9 ]*?([-])[ ]*[0-9]+[ ]*|[0-9 ]*[ ]*)+$/.test('123--2')

Try this
/^[0-9]+(-[0-9\s]+)*$/

I don't know if you still need an answer to this, but this works for your requirements:
/^(?!.+-\s*$)\s*((?:\d+\s*-?\s*)+)$/
Explanation:
^ start of string
(?!.+-\s*$) disallow - (or - followed by whitespace) at the end of the string
\s* optional leading spaces
( start capturing
(?:\d+\s*-?\s*)+ one or more groups of the following:
one or more digits,
possibly followed by whitespace,
possibly followed by a single hyphen,
possibly followed by more whitespace
) stop capturing
$ end of the string
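The pattern can be checked against the examples from the question with a few lines of JavaScript. The sample list is partly made up (the entries with trailing whitespace are assumptions added to exercise the "can end with whitespace" rule).

```javascript
const businessPhone = /^(?!.+-\s*$)\s*((?:\d+\s*-?\s*)+)$/;

const phoneSamples = ["1", "123", "678-78", "123 ", "123--2", "123-", "123- "];
for (const sample of phoneSamples) {
  // JSON.stringify makes trailing whitespace visible in the output.
  console.log(JSON.stringify(sample), "->", businessPhone.test(sample));
}
```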
