Regex to find attributes without using lookbehinds - javascript

I'm trying to create a regex in JS that will match attributes with the following syntax #scope.name:
They start with an # sign
The scope and name of the attribute is separated by a period
Only letters are supported, and numbers/underscores as well as long as they are not in the beginning of the scope or name.
Escaped # signs are represented like this "##", and they should be ignored. Here are some examples of some expected behaviors I'm looking for:
String | Match 1 | Group 1 | Match 2 | Group 1
a.b | No match | - | No match | -
#a.b | #a.b | a.b | No match | -
##a.b | No match | - | No match | -
###a.b | #a.b | a.b | No match | -
#a.b##c.d###e.f | #a.b | a.b | #e.f | e.f
The current regex I have - /(?<=(?<!#)(?:##)*)#([a-zA-Z]\w*\.[a-zA-Z]\w*)/ - doesn't work on Safari because lookbehinds aren't supported yet. I need a solution that doesn't use them.
Here's the regex101 url to test your expressions and compare the results to the expected ones I have currently.

Instead of a lookbehind you can use a non-capture group with expected pattern preceding your capture group:
[
  'a.b', '#a.b', '##a.b', '###a.b', '#a.b##c.d###e.f', '#a_.b_', '#_a.b', '#a.b_'
].forEach(str => {
  let result = [...str.matchAll(/(?:^|[^#]|##)#([a-zA-Z]\w*\.[a-zA-Z]\w*)/g)];
  console.log({
    str,
    result: result.length ? result.map(r => r[1]).join(', ') : 'no match'
  });
});
Explanation of regex:
(?:^|[^#]|##) -- non-capture group: either start of string, a non # char, or two consecutive # chars
# -- literal # char
([a-zA-Z]\w*\.[a-zA-Z]\w*) -- your original capture group
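One caveat with the pattern above: it consumes the character in front of the #, so it reports a match for '####a.b' (two escaped signs, which the original lookbehind rejects) and misses the second attribute in '#a.b#c.d'. A possible alternative, sketched here under the assumption that those cases matter, is to let the regex consume every escaped ## pair first and only capture what is left. The name findAttributes is hypothetical, not part of the question.

```javascript
// Alternative sketch: the first branch swallows escaped "##" pairs,
// the second branch captures a real attribute in group 1.
const attributePattern = /##|#([a-zA-Z]\w*\.[a-zA-Z]\w*)/g;

// Hypothetical helper: returns the captured "scope.name" parts.
function findAttributes(str) {
  return [...str.matchAll(attributePattern)]
    // Matches of the "##" branch leave group 1 undefined, so drop them.
    .filter(m => m[1] !== undefined)
    .map(m => m[1]);
}

console.log(findAttributes('#a.b##c.d###e.f')); // [ 'a.b', 'e.f' ]
console.log(findAttributes('####a.b'));         // []
console.log(findAttributes('#a.b#c.d'));        // [ 'a.b', 'c.d' ]
```

Because the scan runs left to right and pairs are consumed before a single # is tried, a # only starts an attribute when an even number of # signs precedes it, which is what the lookbehind version checks.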

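I'd first replace every ## with a placeholder character (anything that is not a letter, digit, or underscore), then use a simplified regex, #(\w+\.\w+). Note that \w+ also accepts a leading digit or underscore, so keep [a-zA-Z]\w* for each part if that rule matters:
const strings = [
  'a.b',
  '#a.b',
  '##a.b',
  '###a.b',
  '#a.b##c.d###e.f'
];
// Replace each escaped "##" with a placeholder that can never be part of an attribute
const modified = strings.map(s => s.replaceAll('##', '*'));
modified.forEach(s => {
  const result = [...s.matchAll(/#(\w+\.\w+)/g)];
  console.log(s);
  if (result.length) {
    console.log('matches: ', result.map(r => r[1]).join(', '));
  } else {
    console.log('No match');
  }
});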

Related

Pass the regex test if there is only one space in the string

I am trying to get lines where there is a single space. I am currently doing it a different way because I still can't find a regex for it:
const line= "a b c";
/ {1}/.test(line)
Expected: false
Gets: true
I think this isn't syntactically good but am open to suggestions:
line.match(/ /g).length == 1
What should I look into?
Use the following regex test:
/^\S* (?=\S*$)/.test(line)
^\S* - the string starts with optional non-space chars
(?=\S*$) - positive lookahead that ensures the space is followed only by non-space chars (if any) up to the end of the string
The regular expression /^[^\s]*\s[^\s]*$/ matches a string that contains exactly one whitespace character.
Here is the code example:
const regex = /^[^\s]*\s[^\s]*$/;
console.log(regex.test("ab c")); // true
console.log(regex.test("a b c")); // false
Matching a string without newlines containing a single space
^\S* \S*$
Explanation
^ Start of string
\S* Match optional non whitespace chars
' ' Match a single space
\S* Match optional non whitespace chars
$ End of string
See a regex101 demo.
const regex = /^\S* \S*$/;
[
"a b c",
"",
" ",
" ",
"a ",
"a b"
].forEach(s =>
console.log(`'${s}' --> ${regex.test(s)}`)
);
The question is not clear for me.
The {1} on the first regex does not have any effect this way but it makes me think that you do not want to accept multiple consecutive spaces. The second regex is just fine if you are interested only in the lines that contain exactly one space.
What exactly do you need?
Do you want the line to contain exactly one space?
Or multiple spaces are allowed, just to not be consecutive?
The following code snippet shows solutions for both questions:
function test(input) {
  console.log({
    input,
    exactlyOne: (input.match(/ /g) ?? []).length === 1,
    noConsecutive1: / {2}/.test(input) === false,
    // two spaces inside the quotes: looks for consecutive spaces
    noConsecutive2: input.includes('  ') === false,
  });
}
// no consecutive spaces
test('a b c');
test('a b');
test('a');
test('a ');
test(' ');
test('');
// consecutive spaces; they all should report "exactlyOne: false, noConsecutive: false"
test('a  b c');
test('a  b');
test('aa  ');
test('  ');
The second search can be done without regexps. I cannot tell if it runs faster; for large inputs I think that the regexp is faster but I didn't check.
if (input.includes('  ')) {
  console.log('two consecutive spaces found in the input');
}
I added it to the code snippet above.
How about:
^[^\s]*\s[^\s]*$
Explanation:
Start: ^
Any number of non-spaces: [^\s]*
A single space \s
Any number of non-spaces (again): [^\s]*
End: $

Validate text with javascript RegEX

I'm trying to validate text with JavaScript but can't find out why it's not working.
I have been using : https://regex101.com/ for testing where it works but in my script it fails
var check = "test"
var pattern = new RegExp('^(?!\.)[a-zA-Z0-9._-]+$(?<!\.)','gmi');
if (!pattern.test(check)) validate_check = false;else validate_check = true;
What i'm looking for is first and last char not a dot, and string may contain [a-zA-Z0-9._-]
But the above check always fails even on the word : test
+$(?<!\.) is invalid in your RegEx
$ will match the end of the text or line (with the m flag)
Negative lookbehind → (?<!Y)X will match X, but only if Y is not before it
What about a simpler RegEx?
var checks = ["test", "1-t.e_s.t0", ".test", "test.", ".test."];
checks.forEach(check => {
var pattern = new RegExp('^[^.][a-zA-Z0-9\._-]+[^.]$','gmi');
console.log(check, pattern.test(check))
});
Your code should look like this:
var check = "test";
var pattern = new RegExp('^[^.][a-zA-Z0-9\._-]+[^.]$','gmi');
var validate_check = pattern.test(check);
console.log(validate_check);
A few notes about the pattern:
You are using the RegExp constructor with a string, so every backslash has to be doubled. With a single backslash, the string literal drops it and the pattern becomes ^(?!.)[a-zA-Z0-9._-]+$(?<!.). The negative lookahead (?!.) then fails whenever any character other than a newline follows the start of the string, which is why the pattern does not match test
If you use the /i flag for a case insensitive match, you can shorten [A-Za-z] to just one of the ranges like [a-z] or use \w to match a word character like in your character class
This part, (?<!\.), using a negative lookbehind is not invalid in your pattern, but lookbehind is not supported in every browser
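The escaping point in the first note can be checked directly. The following is a small sketch that compares the single-backslash and double-backslash strings; the variable names are made up for illustration, and the trailing lookbehind is left out so the snippet also runs on engines without lookbehind support.

```javascript
// '\.' inside a string literal is just '.', so the backslash never reaches the regex.
const singleEscaped = new RegExp('^(?!\.)[a-zA-Z0-9._-]+$');
// '\\.' keeps one backslash, so the regex sees an escaped (literal) dot.
const doubleEscaped = new RegExp('^(?!\\.)[a-zA-Z0-9._-]+$');

console.log(singleEscaped.source);        // ^(?!.)[a-zA-Z0-9._-]+$
console.log(doubleEscaped.source);        // ^(?!\.)[a-zA-Z0-9._-]+$
console.log(singleEscaped.test('test'));  // false
console.log(doubleEscaped.test('test'));  // true
console.log(doubleEscaped.test('.test')); // false
```

Using a regex literal such as /^(?!\.)[a-zA-Z0-9._-]+$/ avoids the doubling altogether.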
For your requirements, you don't have to use lookarounds. If you also want to allow a single char, you can use:
^[\w-]+(?:[\w.-]*[\w-])?$
^ Start of string
[\w-]+ Match 1+ occurrences of a word character or -
(?: Non capture group
[\w.-]*[\w-] Match optional word chars, dots or hyphens, followed by a word char or hyphen
)? Close non capture group and make it optional
$ End of string
Regex demo
const regex = /^[\w-]+(?:[\w.-]*[\w-])?$/;
["test", "abc....abc", "a", ".test", "test."]
.forEach((s) =>
console.log(`${s} --> ${regex.test(s)}`)
);

Javascript how to identify a combination of letters and strip a portion of it

I'm very new to regex. Right now I'm trying to use regex to prepare my markup string before sending it to the database.
Here is an example string:
#[admin](user:3) Testing this string #[hellotessginal](user:4) Hey!
So far I am able to identify the entire term #[admin](user:3) using /#\[(.*?)]\((.*?):(\d+)\)/g
But the next step is that I wish to remove the (user:3), leaving me with #[admin].
Hence the result of passing through the stripper function would be:
#[admin] Testing this string #[hellotessginal] Hey!
Please help!
You may use
s.replace(/(#\[[^\][]*])\([^()]*?:\d+\)/g, '$1')
See the regex demo. Details:
(#\[[^\][]*]) - Capturing group 1: #[, then 0 or more chars other than [ and ], as many as possible, and then ]
\( - a ( char
[^()]*? - 0 or more (but as few as possible) chars other than ( and )
: - a colon
\d+ - 1+ digits
\) - a ) char.
The $1 in the replacement pattern refers to the value captured in Group 1.
See the JavaScript demo:
const rx = /(#\[[^\][]*])\([^()]*?:\d+\)/g;
const remove_parens = (string, regex) => string.replace(regex, '$1');
let s = '#[admin](user:3) Testing this string #[hellotessginal](user:4) Hey!';
s = remove_parens(s, rx);
console.log(s);
Try this:
var str = "#[admin](user:3) Testing this string #[hellotessginal](user:4) Hey!";
str = str.replace(/ *\([^)]*\) */g, ' ');
console.log(str);
You can replace matches of the following regular expression with empty strings.
str.replace(/(?<=\#\[(.*?)\])\(.*?:\d+\)/g, '');
regex demo
I've assumed the strings for which "admin" and "user" are placeholders in the example cannot contain the characters in the string "()[]". If that's not the case please leave a comment and I will adjust the regex.
I've kept the first capture group on the assumption that it is needed for some unstated purpose. If it's not needed, remove it:
(?<=\#\[.*?\])\(.*?:\d+\)
There is of course no point creating a capture group for a substring that is to be replaced with an empty string.
Javascript's regex engine performs the following operations.
(?<= : begin positive lookbehind
\#\[ : match '#['
(.*?) : match 0+ chars, lazily, save to capture group 1
\] : match ']'
) : end positive lookbehind
\(.*?:\d+\) : match '(', 0+ chars, lazily, ':', 1+ digits, ')'

Regex - I want my string to end with 2 special character

I've been trying to make a regex that matches strings ending with 2 special characters, but I couldn't find a solution. Here is what I tried, but it doesn't seem to work.
/.[!##$%^&*]{2}+$/;
Thanks in advance.
Try this regex:
^.*[!##$%^&*]{2}$
Demo
const regex = /^.*[!##$%^&*]{2}$/;
const str = 'abc##$';
if (regex.test(str)) {
  console.log("matched");
} else {
  console.log("not matched");
}
The /.[!##$%^&*]{2}+$/ regex matches
. - any character but a line break char
[!##$%^&*]{2}+ - in PCRE/Boost/Java/Oniguruma and other regex engines supporting possessive quantifiers, it matches exactly 2 chars from the defined set, but in JS, it causes a "Nothing to repeat" error
$ - end of string.
To match any string ending with 2 occurrences of the chars from your defined set, you need to remove the . and + and use
console.log(/[!##$%^&*]{2}$/.test("##"))
Or, if these 2 chars cannot be preceded by a 3rd one:
console.log(/(?:^|[^!##$%^&*])[!##$%^&*]{2}$/.test("##"))
// ^^^^^^^^^^^^^^^^^
The (?:^|[^!##$%^&*]) non-capturing group matches the start of the string (^) or (|) any char other than !, #, $, %, ^, &, * ([^!##$%^&*])
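The difference between the two patterns only shows up when a third special char precedes the final pair. A short sketch, with hypothetical variable names and made-up sample strings:

```javascript
// Ends with at least two chars from the set (more may precede them).
const endsWithTwo = /[!##$%^&*]{2}$/;
// Ends with exactly two: the pair must follow the start of the string or a char outside the set.
const endsWithExactlyTwo = /(?:^|[^!##$%^&*])[!##$%^&*]{2}$/;

console.log(endsWithTwo.test('abc##'));         // true
console.log(endsWithExactlyTwo.test('abc##'));  // true
console.log(endsWithTwo.test('abc!##'));        // true
console.log(endsWithExactlyTwo.test('abc!##')); // false
console.log(endsWithTwo.test('abc#'));          // false
```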

Javascript multiple regex pattern

I'm trying to exclude some internal IP addresses and some internal IP address formats from viewing certain logos and links on the site. I have multiple ranges of IP addresses (sample given below). Is it possible to write a regex that could match all the IP addresses in the list below using JavaScript?
10.X.X.X
12.122.X.X
12.211.X.X
64.X.X.X
64.23.X.X
74.23.211.92
and 10 more
Quote the periods, replace the X's with \d+, and join them all together with pipes:
const allowedIPpatterns = [
"10.X.X.X",
"12.122.X.X",
"12.211.X.X",
"64.X.X.X",
"64.23.X.X",
"74.23.211.92" //, etc.
];
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
')$';
const allowedRegexp = new RegExp(allowedRegexStr);
Then you're all set:
'10.1.2.3'.match(allowedRegexp) // => ['10.1.2.3']
'100.1.2.3'.match(allowedRegexp) // => null
How it works:
First, we have to turn the individual IP patterns into regular expressions matching their intent. One regular expression for "all IPs of the form '12.122.X.X'" is this:
^12\.122\.\d+\.\d+$
^ means the match has to start at the beginning of the string; otherwise, 112.122.X.X IPs would also match.
12 etc: digits match themselves
\.: a period in a regex matches any character at all; we want literal periods, so we put a backslash in front.
\d: shorthand for [0-9]; matches any digit.
+: means "1 or more" - 1 or more digits, in this case.
$: similarly to ^, this means the match has to end at the end of the string.
So, we turn the IP patterns into regexes like that. For an individual pattern you could use code like this:
const regexStr = `^` + ipXpattern.
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
`$`;
Which just replaces all .s with \. and Xs with \d+ and sticks the ^ and $ on the ends.
(Note the doubled backslashes; both string parsing and regex parsing use backslashes, so wherever we want a literal one to make it past the string parser to the regular expression parser, we have to double it.)
In a regular expression, the alternation this|that matches anything that matches either this or that. So we can check for a match against all the IPs at once if we turn the list into a single regex of the form re1|re2|re3|...|relast.
Then we can do some refactoring to make the regex matcher's job easier; in this case, since all the regexes are going to have ^...$, we can move those constraints out of the individual regexes and put them on the whole thing: ^(10\.\d+\.\d+\.\d+|12\.122\.\d+\.\d+|...)$. The parentheses keep the ^ from being only part of the first pattern and $ from being only part of the last. But since plain parentheses capture as well as group, and we don't need to capture anything, I replaced them with the non-capturing version (?:...).
And in this case we can do the global search-and-replace once on the giant string instead of individually on each pattern. So the result is the code above:
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
')$';
That's still just a string; we have to turn it into an actual RegExp object to do the matching:
const allowedRegexp = new RegExp(allowedRegexStr);
As written, this doesn't filter out illegal IPs - for instance, 10.1234.5678.9012 would match the first pattern. If you want to limit the individual byte values to the decimal range 0-255, you can use a more complicated regex than \d+, like this:
(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])
That matches "any one or two digits, or '1' followed by any two digits, or '2' followed by any of '0' through '4' followed by any digit, or '25' followed by any of '0' through '5'". Replacing the \d+ with that turns the full string-munging expression into this:
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '(?:\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])') +
')$';
And makes the actual regex look much more unwieldy:
^(?:10\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])|12\.122\....
but you don't have to look at it, just match against it. :)
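As a quick check of the range-limited version, here is a minimal sketch that builds the regex from a shortened pattern list and tries a few addresses. The names patterns, octet, and limitedRegexp are chosen for this example only.

```javascript
// A shortened sample of the patterns from the question; "X" stands for any octet.
const patterns = ['10.X.X.X', '12.122.X.X', '74.23.211.92'];

// One octet limited to the decimal range 0-255, as described above.
const octet = '(?:\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])';

// Escape the dots first, then expand each X into the octet sub-pattern.
const limitedRegexp = new RegExp(
  '^(?:' + patterns.join('|').replace(/\./g, '\\.').replace(/X/g, octet) + ')$'
);

console.log(limitedRegexp.test('10.1.2.3'));          // true
console.log(limitedRegexp.test('12.122.255.0'));      // true
console.log(limitedRegexp.test('10.256.0.1'));        // false
console.log(limitedRegexp.test('10.1234.5678.9012')); // false
console.log(limitedRegexp.test('100.1.2.3'));         // false
```

Using test rather than match returns a plain boolean, which is usually all an allow-list check needs.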
You could do it in regex, but it's not going to be pretty, especially since JavaScript doesn't even support verbose regexes, which means that it has to be one humongous line of regex without any comments. Furthermore, regexes are ill-suited for matching ranges of numbers. I suspect that there are better tools for dealing with this.
Well, OK, here goes (for the samples you provided):
var myregexp = /\b(?:74\.23\.211\.92|(?:12\.(?:122|211)|64\.23)\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])|(?:10|64)\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]))\b/g;
As a verbose ("readable") regex:
\b # start of number
(?: # Either match...
74\.23\.211\.92 # an explicit address
| # or
(?: # an address that starts with
12\.(?:122|211) # 12.122 or 12.211
| # or
64\.23 # 64.23
)
\. # .
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # followed by 0..255
| # or
(?:10|64) # match 10 or 64
\. # .
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # followed by 0..255
)
\b # end of number
/^(X|\d{1,3})(\.(X|\d{1,3})){3}$/ should do it.
If you don't actually need to match the "X" character you could use this:
\b(?:\d{1,3}\.){3}\d{1,3}\b
Otherwise I would use the solution cebarrett provided.
I'm not entirely sure what you're trying to achieve here (it doesn't look like anyone else is either).
However, if it's validation, then here's a solution to validate an IP address that doesn't use RegEx. First, split the input string at the dot. Then using parseInt on the number, make sure it isn't higher than 255.
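function ipValidator(ipAddress) {
  var ipSegments = ipAddress.split('.');
  // An IPv4 address has exactly four segments
  if (ipSegments.length !== 4) {
    return 'fail';
  }
  for (var i = 0; i < ipSegments.length; i++) {
    var value = parseInt(ipSegments[i], 10);
    // Reject segments that are not numbers or fall outside 0-255
    if (isNaN(value) || value < 0 || value > 255) {
      return 'fail';
    }
  }
  return 'match';
}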
Running the following returns 'match':
document.write(ipValidator('10.255.255.125'));
Whereas this will return 'fail':
document.write(ipValidator('10.255.256.125'));
Here's a noted version in a jsfiddle with some examples, http://jsfiddle.net/VGp2p/2/
