Regular Expression Strict Test - javascript

I am trying to create a regex that matches only when the user enters exactly the expected value, nothing more and nothing less.
Here is my regex:
/[a-zA-Z0-9][a-zA-Z0-9\-]*\.myshopify\.com/
When I test it with, for example, myshop.myshopify.coma, it returns true; myshop.myshopify.com myshop123.myshopify.com also returns true.
What I want is that if the user enters myshop.myshopify.coma or myshop321.myshopify.com myshop123.myshopify.coma, it should not match.
It should only match when the entire input looks exactly like this: [anything except ()=>%$ etc].myshopify.com
What should I include in my regex to strictly test for exactly one thing?

You can use boundary-type assertions to match the beginning (^) and the end ($) of the input, which makes sure the input matches in full.
const pattern = /^[a-zA-Z0-9][a-zA-Z0-9\-]*\.myshopify\.com$/
console.log(pattern.test('myshop.myshopify.com')) // true
console.log(pattern.test('myshop.myshopify.coma')) // false
console.log(pattern.test('myshop.myshopify.com myshop123.myshopify.com')) // false

You'd currently allow for input like "A---", so besides the good point about start and end line anchors, you'd maybe want to reconsider your pattern. Maybe something like:
^[a-z\d]+(?:-[a-z\d]+)*\.myshopify\.com$
See the online demo
^ - Start line anchor.
[a-z\d]+ - 1+ any alnum character.
(?: - Open non-capture group:
-[a-z\d]+ - A literal hyphen followed by 1+ alnum chars.
)* - Close non-capture group and match it zero or more times.
\.myshopify\.com - Match ".myshopify.com" literally.
$ - End line anchor.
A 2nd option would be to use a negative lookahead to achieve the same concept:
^(?!-|.*-[-.])[a-z\d-]+\.myshopify\.com$
See the online demo
^ - Start line anchor.
(?! - Negative lookahead for:
- - A leading hyphen
| - Or:
.*-[-.] - Any character other than a newline, zero or more times, up to a hyphen followed by either another hyphen or a literal dot.
) - Close negative lookahead.
[a-z\d-]+ - 1+ alnum characters or hyphens.
\.myshopify\.com - Match ".myshopify.com" literally.
$ - End line anchor.
In both demos I used the global and case-insensitive flags: /<pattern>/gi. In JavaScript, drop g when calling test(), because a global regex keeps its lastIndex between calls; the sample below uses only i:
const patt1 = /^[a-z\d]+(?:-[a-z\d]+)*\.myshopify\.com$/i
console.log(patt1.test('myshop.myshopify.com')) // true
console.log(patt1.test('myshop-.myshopify.com')) // false
const patt2 = /^(?!-|.*-[-.])[a-z\d-]+\.myshopify\.com$/i
console.log(patt2.test('myshop.myshopify.com')) // true
console.log(patt2.test('myshop-.myshopify.com')) // false
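To see how the two patterns treat awkward hyphen placement, here is a small sketch that runs both against a handful of inputs. The sample strings and the variable names are made up for illustration and are not part of the answer.

```javascript
// Both patterns from the answer. Only the i flag is used, because test()
// on a g-flagged regex keeps state in lastIndex between calls.
const repeatedGroup = /^[a-z\d]+(?:-[a-z\d]+)*\.myshopify\.com$/i;
const withLookahead = /^(?!-|.*-[-.])[a-z\d-]+\.myshopify\.com$/i;

// Hypothetical inputs chosen to probe the hyphen rules and the anchors.
const samples = [
  'myshop.myshopify.com',    // plain name
  'my-shop.myshopify.com',   // single inner hyphen
  'A---.myshopify.com',      // trailing hyphens
  '-shop.myshopify.com',     // leading hyphen
  'my--shop.myshopify.com',  // doubled hyphen
  'myshop.myshopify.coma',   // extra character after ".com"
];

const results = samples.map((input) => ({
  input,
  repeatedGroup: repeatedGroup.test(input),
  withLookahead: withLookahead.test(input),
}));

console.table(results);
```

On these inputs both patterns accept only the first two strings, so they behave the same way here; the choice between them is mostly about readability.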

Related

Regex to allow * only after a certain position in the string

I have an input field in the UI which must satisfy the following three conditions.
The input string can be a max of 20 chars long and will be alphanumeric.
However it can have * but only after 6th character, no other special characters allowed.
Eg. Abc123* or Abc1234* or just Abc1234(until 20 chars)
The overall length of string is 20 chars but as soon as we encounter * we should not allow any further characters
I understand the 3rd condition can be a bit too much of an ask for regex and I should be able to handle that using javascript.
However I at least want to get the first 2 conditions resolved using regex.
I have tried a few things below, but they don't give the desired results:
export function formatInput(value) {
return value
.replace(/[^\w*]|_/g, '')
.replace(new RegExp('(^[\\w]{6})([\\w*]{14}$)', 'g'), '$1');
}
I tried tweaking a few things; for example, if I remove the first replace statement, it doesn't filter out any special characters at all.
Also, in the case below, it forces me to enter * as the 7th character before letting me proceed, which is also incorrect.
export function formatInput(value) {
return value
.replace(/[^\w*]|_/g, '')
.replace(new RegExp('(^[\\w]{6})[\\w]', 'g'), '$1')
.replace(/(\..*)\./g, '$1')
.replace(new RegExp('(\\*[\\w]{14}).', 'g'), '$1');
}
I am trying to improve my knowledge of regex, so my attempts above might not be correct. Any help will be highly appreciated.
I'm a bit confused with your attempt to replace, so let me just share a pattern that should tick the 3 boxes you described:
^(?!.{21}|.{1,5}\*)[A-Za-z\d]+\*?$
See the online demo
^ - Start line anchor.
(?!.{21}|.{1,5}\*) - Negative lookahead with alternation: fail if the start is followed by 21 characters, or by 1-5 characters and then an asterisk.
[A-Za-z\d]+ - 1+ Alphanumeric chars.
\*? - An optional asterisk.
$ - End line anchor.
Note that you could also use case-insensitive flag as an alternative and use ^(?!.{21}|.{1,5}\*)[A-Z\d]+\*?$.
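As a quick check of the lookahead pattern above, the sketch below runs it against a few inputs. The inputs and the helper name isValid are assumptions made for illustration.

```javascript
// Pattern copied from the answer above.
const pattern = /^(?!.{21}|.{1,5}\*)[A-Za-z\d]+\*?$/;

// Hypothetical helper name; it only wraps test().
const isValid = (value) => pattern.test(value);

console.log(isValid('Abc123*'));               // true  - asterisk after 6 characters
console.log(isValid('Abc1234'));               // true  - no asterisk at all
console.log(isValid('Abc'));                   // true  - short alphanumeric input
console.log(isValid('Abc12*'));                // false - asterisk after only 5 characters
console.log(isValid('Abc123*4'));              // false - characters after the asterisk
console.log(isValid('12345678901234567890'));  // true  - exactly 20 characters
console.log(isValid('123456789012345678901')); // false - 21 characters
```

Note that this pattern accepts inputs shorter than 6 characters as long as they contain no asterisk.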
Something like this would work:
/^(?:[a-z\d]{6,19}\*|[a-z\d]{1,20})$/gmi
^ - start line anchor
(?: - start non-capturing group
[a-z\d]{6,19}\* - we can have 6 to 19 alphanums followed by one asterisk
| - boolean or
[a-z\d]{1,20} - we can have 1 to 20 alphanums
) - end non-capturing group
$ - end line anchor
/gmi - flags: global, multiline, case-insensitive
g and m are only needed for regex101 purposes so that you can test out multiple inputs at once
https://regex101.com/r/ytMlQO/1/
The following pattern satisfies all your conditions. This pattern assumes there is no other content in value aside from the password to be checked. I also share the concern about performing manipulations a la replace in a function that is supposed to be purely for verification, so I've written the function below to simply test to ensure conformance with the prescribed pattern.
const passwordPattern = /^[A-Za-z\d]{6}(?:[A-Za-z\d]|\*(?!.+)){0,14}$/;
const checkPassword = value => passwordPattern.test(value);
console.log(checkPassword("Abc123")); // true
console.log(checkPassword("Abc123*")); // true
console.log(checkPassword("Abc12*")); // false - asterisk appears before position 6
console.log(checkPassword("Abc123*4")); // false - other characters appear after asterisk
console.log(checkPassword("123456789012345678901")); // false - >20 characters in length
Regex101
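The answers above validate the value; the question also wanted to clean the value up while the user types. A minimal sketch of that idea without a regex replace chain could look like the following. The function name sanitizeInput is hypothetical, and the choice to silently drop an asterisk typed too early is an assumption about the desired behaviour.

```javascript
// Hypothetical helper, not taken from any of the answers.
// Builds the cleaned value one character at a time.
function sanitizeInput(value) {
  let out = '';
  for (const ch of value) {
    if (out.length === 20) break;            // hard cap of 20 characters
    if (/[A-Za-z\d]/.test(ch)) {
      out += ch;                             // keep letters and digits
    } else if (ch === '*' && out.length >= 6) {
      return out + '*';                      // nothing may follow the asterisk
    }
    // Everything else, including an asterisk typed too early, is dropped.
  }
  return out;
}

console.log(sanitizeInput('Abc123*456')); // "Abc123*"
console.log(sanitizeInput('Ab*c123'));    // "Abc123"
console.log(sanitizeInput('Abc-123!'));   // "Abc123"
```

A loop keeps the three rules visible in one place, which may be easier to adjust than several chained replace calls.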

Regex: match underscore-wrapped words unless they start with # / #

I'm trying to work around this bug in Tiptap (a WYSIWYG editor for Vue) by passing in a custom regex so that the regex that identifies italics notation in Markdown (_value_) would not be applied to strings that start with # or #, e.g. #some_tag_value would not get transformed into #sometagvalue.
This is my regex so far - /(^|[^##_\w])(?:\w?)(_([^_]+)_)/g
Edit: new regex with help from # Wiktor Stribiżew /(^|[^##_\w])(_([^_]+)_)/g
While it satisfies most of the common cases, it currently still fails when
underscores are mid-word, e.g. ant_farm_ should be matched (antfarm)
I have also provided some "should match" and "should not match" cases here https://regexr.com/50ibf for easier testing
Should match (between underscores)
_italic text here_
police_woman_
_fire_fighter
a thousand _words_
_brunch_ on a Sunday
Should not match
#ta_g_
__value__
#some_tag_value
#some_value_here
#some_tag_
#some_val_
#_hello_
You may use the following pattern:
(?:^|\s)[^##\s_]*(_([^_]+)_)
See the regex demo
Details
(?:^|\s) - start of string or whitespace
[^##\s_]* - 0 or more chars other than #, #, _ and whitespace
(_([^_]+)_) - Group 1: _, 1+ chars other than _ (captured into Group 2) and then _.
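To try the pattern on individual lines, a sketch along these lines could be used. The pattern is copied exactly as printed above; the helper name italicParts is made up, and each line is tested on its own so no multiline flag is needed.

```javascript
// Pattern copied as printed in the answer; the g flag is required by matchAll.
const italic = /(?:^|\s)[^##\s_]*(_([^_]+)_)/g;

// Hypothetical helper: returns the text found between underscores (Group 2).
const italicParts = (line) => Array.from(line.matchAll(italic), (m) => m[2]);

console.log(italicParts('_italic text here_'));  // [ 'italic text here' ]
console.log(italicParts('police_woman_'));       // [ 'woman' ]
console.log(italicParts('a thousand _words_'));  // [ 'words' ]
console.log(italicParts('#some_tag_value'));     // []
console.log(italicParts('__value__'));           // []
```

Because the match itself also contains the leading whitespace and word prefix, a replacement would need to rebuild those parts rather than substitute the whole match.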
For science, this monstrosity works in Chrome (and Node.js).
let text = `
<strong>Should match</strong> (between underscores)
_italic text here_
police_woman_
_fire_fighter
a thousand _words_
_brunch_ on a Sunday
<strong>Should not match</strong>
#ta_g_
__value__
#some_tag_value
#some_value_here
#some_tag_
#some_val_
#_hello_
`;
let re = /(?<=(?:\s|^)(?![##])[^_\n]*)_([^_]+)_/g;
document.querySelector('div').innerHTML = text.replace(re, '<em>$1</em>');
div { white-space: pre; }
<div/>
This captures _something_ as full match, and something as 1st capture group (in order to remove the underscores). You can't capture just something, because then you lose the ability to tell what is inside the underscores, and what is outside (try it with (?<=(?:\s|^)(?![##])[^_\n]*_)([^_]+)(?=_)).
There are two things that prevent it being universally applicable:
Look-behinds are not supported in all JavaScript engines
Most regexp engines do not support variable-length look-behinds
EDIT: This is a bit stronger, and should allow you to additionally match_this_and_that_ but not #match_this_and_that correctly:
/(?<=(?:\s|^)(?![##])(?!__)\S*)_([^_]+)_/
Explanation:
_([^_]+)_ Match non-underscory bit between two underscores
(?<=...) that is preceded by
(?:\s|^) either a whitespace or a start of a line/string
(i.e. a proper word boundary, since we can't use `\b`)
\S* and then some non-space characters
(?![##]) that don't start with `#`, `#`,
(?!__) or `__`.
regex101 demo
Here's something, it's not as compact as other answers, but I think it's easier to understand what is going on. Match group \3 is what you want.
Needs the multiline flag
^([a-zA-Z\s]+|_)(([a-zA-Z\s]+)_)+?[a-zA-Z\s]*?$
^ - match the start of the line
([a-zA-Z\s]+|_) - multiple words or _
(([a-zA-Z\s]+)_)+? - multiple words followed by _ at least once, but the minimum match.
[a-zA-Z\s]*? - any final words
$ - the end of the line
In summary, the pattern matches one of the following shapes:
_<words>_
<words>_<words>_
<words>_<words>_<words>
_<words>_<words>

Match everything Between two Characters except when there is a Blank line

I am trying to find a regex pattern that matches everything between one or two dollar signs, \$.*\$|\${2}.*\${2}, except when there is a blank line (it's either two or one, can't be this: \$.*\$\$). Below, I provide examples of what I want to match and what I want to skip. The match should include/exclude everything.
Examples of what I want to match:
$$ \abc + ko$$
$*-ls$
Here the single dollar sign has an escape character before it, so it won't break the match.
$$
654a\$
$$
$123
a*/\
[]{}$
Examples of what I want to exclude:
$$
asd
$$
$asdasd$$
Again, I want to match everything if they are bound by one $ or two $ at each side, unless there is (are) empty line(s) in between.
So far I have figured out how to match the ones occurring on a single line, but I am struggling with how to include line breaks and exclude the match if a whole line is empty.
Here is what I have:
^\${2}.*[^\\$]\${2}$|^\$.*[^\\$]\$$
Demo
You may use
/^[^\S\r\n]{0,3}(\${1,2})(?:(?!\1|^$)[\s\S])+?\1[^\S\r\n]*$/gm
See the regex demo
Details
^ - start of a line (since m makes ^ match line start positions)
[^\S\r\n]{0,3} - zero to three occurrences of any whitespace but CR and LF
(\${1,2}) - Group 1 holding one or two $ chars
(?:(?!\1|^$)[\s\S])+? - any char ([\s\S]), 1 or more occurrences, but as few as possible (due to the lazy +? quantifier), where the char neither starts the same sequence as captured in Group 1 (\1) nor sits at the start of an empty line (^$)
\1 - the same value as in Group 1 ($ or $$)
[^\S\r\n]* - zero or more occurrences of any whitespace but CR and LF
$ - end of a line (since m makes $ match line end positions)
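The pattern can be tried in JavaScript with a sketch like the one below. The helper name findDollarBlocks and the sample strings (modelled on the examples in the question, with an explicit blank line in the excluded one) are assumptions for illustration.

```javascript
// Pattern copied from the answer above, with the g and m flags.
const dollarBlock = /^[^\S\r\n]{0,3}(\${1,2})(?:(?!\1|^$)[\s\S])+?\1[^\S\r\n]*$/gm;

// Hypothetical helper: returns every full match, or an empty array.
const findDollarBlocks = (text) => text.match(dollarBlock) ?? [];

// Single-line block delimited by $$.
console.log(findDollarBlocks('$$ \\abc + ko$$').length);       // 1
// Multi-line block containing an escaped dollar sign.
console.log(findDollarBlocks('$$\n654a\\$\n$$').length);       // 1
// A blank line between the delimiters blocks the match.
console.log(findDollarBlocks('$$\nasd\n\n$$').length);         // 0
// Mismatched delimiters ($ on the left, $$ on the right).
console.log(findDollarBlocks('$asdasd$$').length);             // 0
```

String.prototype.match with a g-flagged regex returns only the full matches, which is enough here since the delimiters are part of each match.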
For your example data, you might use
(?<!\S)(\$\$?+)[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)*(?:\r?\n(?![^\S\r\n]*$)[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)*)*\1(?!\S)
Explanation
(?<!\S) Assert a whitespace boundary on the left
(\$\$?+) Capture group 1, match $ or $$ where the second one is possessive (prevent backtracking)
[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)* Match any char except $ or newline or a $ when not directly followed by another $
(?: Non capture group
\r?\n(?![^\S\r\n]*$) Match a newline, assert not a line consisting of only spaces
[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)* Same pattern as above
)* Close the group and repeat 0+ times
\1 Backreference to what is captured in group 1
(?!\S) Assert a whitespace boundary on the right
Regex demo

Catching start number and final number

I'm trying to create a regex to capture the first number in the line and the last one, but I'm having a problem with the last one:
The lines look like this:
00005 SALARIO MENSAL 17030 36.397.291,92 36.397.291,92
00010 HORAS TRABALHADAS 0798 19.731,93 19.731,93
And this is my regex:
(^\d+).*(\d)
As you can see here: http://regexr.com/3crbt it is not working as expected. I can get the first number, but the second group captures only the last digit.
Thanks!
You can use
/^(\d+).*?(\d+(?:[,.]\d+)*)$/gm
See the regex demo
The regex matches:
^ - start of the line
(\d+) - captures into Group 1 one or more digits
.*? - matches any characters but a newline, as few as possible up to
(\d+(?:[,.]\d+)*) - one or more digits followed with zero or more sequences of , or . followed with one or more digits (Group 2)
$ - end of the line
The /g modifier ensures we get all matches and /m modifier makes the ^ and $ match start and end of a line respectively.
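Applied to the two lines from the question, the pattern can be used roughly as follows. The helper name firstAndLast and the shape of the returned objects are choices made for this sketch only.

```javascript
// Pattern copied from the answer above.
const numbers = /^(\d+).*?(\d+(?:[,.]\d+)*)$/gm;

// Hypothetical helper: one { first, last } object per matching line.
const firstAndLast = (text) =>
  Array.from(text.matchAll(numbers), (m) => ({ first: m[1], last: m[2] }));

const input = [
  '00005 SALARIO MENSAL 17030 36.397.291,92 36.397.291,92',
  '00010 HORAS TRABALHADAS 0798 19.731,93 19.731,93',
].join('\n');

console.log(firstAndLast(input));
// [ { first: '00005', last: '36.397.291,92' },
//   { first: '00010', last: '19.731,93' } ]
```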
I tried the following one:
(^(\d+))|(\d+$)
It seems to work on regexr.com, but pairing the matches up might require assuming that each line has at least two numbers.
You need to make the .* non-greedy by changing it to .*? and add + to the second digit sequence match.
^(\d+).*?(\d+)$
If you want to match the full last number, use this:
^(\d+).*?([\d\.,]+)$
Example

Excluding URLS that contain a string in Regex

I am using regex to add a survey to pages and I want to include it on all pages except payment and signin pages. I can't use look arounds for the regex so I am attempting to use the following but it isn't working.
^/.*[^(credit|signin)].*
Which should capture all urls except those containing credit or signin
[ indicates the start of a character class and the [^ negation is per-character. Thus your regular expression is "anything followed by any character not in this class followed by anything," which is very likely to match anything.
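The per-character behaviour is easy to observe with a few made-up paths. In the sketch below the slash is escaped because the pattern is written as a JavaScript regex literal; the paths are hypothetical.

```javascript
// The pattern from the question, written as a JavaScript literal.
// The class [^(credit|signin)] means "one character that is not
// any of ( c r e d i t | s g n )", not "not the word credit or signin".
const attempt = /^\/.*[^(credit|signin)].*/;

console.log(attempt.test('/products'));        // true  - "p" is outside the class
console.log(attempt.test('/account/signin'));  // true  - "a" is outside the class
console.log(attempt.test('/signin'));          // false - every letter is inside the class
console.log(attempt.test('/resting'));         // false - same reason, although unrelated
```

So the pattern lets a sign-in page through as soon as the path contains any other character, and it rejects an unrelated path that happens to use only those letters.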
Since you are using specific strings, I don't think a regular expression is appropriate here. It would be a lot simpler to check that credit and signin don't exist in the string, such as with JavaScript:
-1 === string.indexOf("credit") && -1 === string.indexOf("signin")
Or you could check that a regular expression does not match
false === /credit|signin/.test(string)
Whitelisting words in regex is generally pretty easy, and usually follows a form of:
^.*(?:option1|option2).*$
The pattern breaks down to:
^ - start of string
.* - 0 or more non-newline characters*
(?: - open non-capturing group
option1|option2 - | separated list of options to whitelist
) - close non-capturing group
.* - 0 or more non-newline characters
$ - end of string
Blacklisting words in a regex is a bit more complicated to understand, but can be done with a pattern along the lines of:
^(?:(?!option1|option2).)*$
The pattern breaks down to:
^ - start of string
(?: - open non-capturing group
(?! - open negative lookahead (the next characters in the string must not match the value contained in the negative lookahead)
option1|option2 - | separated list of options to blacklist
) - close negative lookahead
. - a single non-newline character*
) - close non-capturing group
* - repeat the group 0 or more times
$ - end of string
Basically this pattern checks that the values in the blacklist do not occur at any point in the string.
* exact characters vary depending on the language, so use caution
The final version:
/^(?:(?!credit|signin).)*$/
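For comparison, the final pattern and the plain string check from the earlier answer can be run side by side. The function names and the sample paths below are hypothetical.

```javascript
// Final pattern from the answer above.
const blacklist = /^(?:(?!credit|signin).)*$/;

// Hypothetical wrappers around the two approaches from the answers.
const allowedByRegex = (path) => blacklist.test(path);
const allowedByIndexOf = (path) =>
  path.indexOf('credit') === -1 && path.indexOf('signin') === -1;

for (const path of ['/products/shoes', '/account/signin', '/credit-card/new', '/resting']) {
  console.log(path, allowedByRegex(path), allowedByIndexOf(path));
}
// /products/shoes true true
// /account/signin false false
// /credit-card/new false false
// /resting true true
```

Both approaches agree on these paths; the indexOf version is the fallback when lookarounds are not available, as the question states.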
