Regex special character '{' matches in JS but not in Java - javascript

test string: abc{123
regex: \w+{\d+
This matches in JS, but when I try to match it in Java it gives me this error:
Illegal repetition near index 2
\w+{\d+
It works in Java only when I escape the { character like this: \w+\{\d+
I tried it on these two links:
JS Link : http://myregexp.com/index.html
Java Link:http://www.ocpsoft.org/tutorials/regular-expressions/java-visual-regex-tester/
Desired result: If it matches in JS, it should match in Java also.
What is the difference between the regex implementation in Java and JS? How can I make it behave in the same way in Java and in JS?

How can I make it behave in the same way in Java and in JS?
You already know the answer:
It works in Java only when I escape the { character like this: \w+\{\d+
Why? Because JavaScript here is a bit more permissive. Note that in JavaScript \w{3 will match "f{3", but not "f77"; \w{3} will match "f77" but not "f{3}". That is to say, the same character { changes meaning based on whether or not somewhere later in the string an } appears. The behaviour is thus made more unpredictable by its permissiveness, and Java just does not allow you to write regular expressions so sloppily.
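The behaviour described above can be checked directly in JavaScript. This is a minimal sketch, assuming a standard engine such as Node.js or a current browser; the variable names are made up for illustration. It also shows that the "u" flag makes JavaScript reject the stray brace, and that the escaped pattern is accepted either way.

```javascript
// Legacy (non-"u") mode: a "{" that does not start a valid quantifier
// is treated as a literal character.
const looseBrace = /\w{3/;    // \w followed by the literal text "{3"
const quantifier = /\w{3}/;   // exactly three word characters

console.log(looseBrace.test("f{3"));   // true
console.log(looseBrace.test("f77"));   // false
console.log(quantifier.test("f77"));   // true
console.log(quantifier.test("f{3}"));  // false

// With the "u" flag the unescaped "{" is a syntax error.
let strictError = null;
try {
  new RegExp("\\w+{\\d+", "u");
} catch (e) {
  strictError = e;
}
console.log(strictError instanceof SyntaxError); // true

// Escaping the brace gives a pattern that is valid with or without "u".
const portable = /\w+\{\d+/u;
console.log(portable.test("abc{123")); // true
```

Writing the escaped form everywhere is the simplest way to keep one pattern that both languages accept.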

You have to escape special characters, and since a backslash is also a special character inside a Java string literal, you have to escape it as well. The regex will look like this in Java source code: \\w+\\{\\d+. If you have problems, feel free to ask. You can generate code in several programming languages here: https://regex101.com/r/D4yz40/1. This example matches your string, and you can then generate the code for Java and JS.

You just need to escape the {. So the regex should look like this:
\w+\{\d+
Your initial regex isn't valid; JavaScript is just more forgiving in this case. { is one of the characters you need to escape in a regex, because it starts a quantifier that says how many times to repeat the preceding element. For example, [a-z]{22} would match 22 consecutive characters from a to z.

Related

regex validating if string ends with specific set of words [duplicate]

I'm creating a javascript regex to match queries in a search engine string. I am having a problem with alternation. I have the following regex:
.*baidu.com.*[/?].*wd{1}=
I want to be able to match strings that have the string 'word' or 'qw' in addition to 'wd', but everything I try is unsuccessful. I thought I would be able to do something like the following:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
but it does not seem to work.
Replace [wd|word|qw] with (wd|word|qw) or (?:wd|word|qw).
[] denotes character sets, () denotes logical groupings.
Your expression:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
does need a few changes, including [wd|word|qw] to (wd|word|qw) and getting rid of the redundant {1}, like so:
.*baidu.com.*[/?].*(wd|word|qw)=
But you also need to understand that the first part of your expression (.*baidu.com.*[/?].*) will match baidu.com hello what spelling/handle????????? or hbaidu-com/ or even something like lkas----jhdf lkja$##!3hdsfbaidugcomlaksjhdf.[($?lakshf, because the dot (.) matches any character except newlines. To match a literal dot, you have to escape it with a backslash (like \.).
There are several approaches you could take to match things in a URL, but we could help you more if you tell us what you are trying to do or accomplish - perhaps regex is not the best solution or (EDIT) only part of the best solution?
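To make the difference between [] and () concrete, here is a small sketch. The sample URLs are invented for illustration, and the pattern is only the one from the answers with the dot escaped and the leading .* dropped; it is not a full URL parser.

```javascript
// Character class: [wd|word|qw] means ONE character out of w, d, |, o, r, q.
const withClass = /baidu\.com.*[\/?].*[wd|word|qw]{1}=/;
// Group with alternation: one of the whole words wd, word or qw.
const withGroup = /baidu\.com.*[\/?].*(?:wd|word|qw)=/;

const wanted = "http://www.baidu.com/s?word=regex"; // hypothetical URL
const unwanted = "http://www.baidu.com/s?id=1";     // hypothetical URL

console.log(withGroup.test(wanted));    // true
console.log(withGroup.test(unwanted));  // false
// The class version accepts "id=" because "d" alone satisfies the class.
console.log(withClass.test(unwanted));  // true

// An unescaped dot matches any character, an escaped dot only ".".
console.log(/baidu.com/.test("baidugcom"));   // true
console.log(/baidu\.com/.test("baidugcom"));  // false
```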

How to prevent regex characters from being changed after page is rendered?

I'm stuck after searching and trying several tests, and just can't figure out how to fix the following issue.
I use the characters \x3c, \x3e and \x22 in a regex and save it in a variable in *.component.ts, but when I use the variable in the markup/HTML, they are turned into <, > and ". The result is that my pattern doesn't work as expected.
Here is one of my tests on regex101.com, and as you can see it works as it should:
^(?=.*[a-zA-Z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-])[A-Za-z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-]{8,50}$
How can I prevent this and keep the characters as they are in the original when the page is rendered? Is it a behavior of TypeScript or JavaScript browser engine or what? Any hint would be great.
First of all, you need to use double backslashes to introduce literal backslashes into the regex patterns. I.e. if you write "\x22" as a string literal, it is in fact a mere ". So, to define \x22 in a string literal, write "\\x22".
Then, you have
^(?=.*[a-zA-Z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-])[A-Za-z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-]{8,50}$
The lookahead here is redundant because it requires the same set of chars as is required by the consuming part. The lookahead can be removed, or better replaced with the one you need, (?=[^A-Z]*[A-Z]), requiring at least 1 uppercase ASCII letter:
^(?=[^A-Z]*[A-Z])[A-Za-z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-]{8,50}$
As a string literal:
"^(?=[^A-Z]*[A-Z])[A-Za-z\\d!\\x22#$%&'()*+,.:;\\x3c=\\x3e?#[\\]^_`{|}~/\\\\-]{8,50}$"
See the regex demo.
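The effect of the doubled backslashes can be seen in a short sketch. It uses plain JavaScript, whose string literals follow the same escape rules as TypeScript; the variable name and the sample passwords are assumptions for illustration.

```javascript
// In a string literal, "\x22" is already the character ", so the
// backslash escape never reaches the regex engine.
console.log("\x22" === '"');   // true
console.log("\\x22".length);   // 4 (backslash, x, 2, 2)

// The pattern from the answer above, built from a string literal.
const passwordPattern = new RegExp(
  "^(?=[^A-Z]*[A-Z])[A-Za-z\\d!\\x22#$%&'()*+,.:;\\x3c=\\x3e?#[\\]^_`{|}~/\\\\-]{8,50}$"
);

// The compiled pattern still contains the \x22 escape sequence.
console.log(passwordPattern.source.includes("\\x22")); // true

console.log(passwordPattern.test("Passw0rd<>"));  // true
console.log(passwordPattern.test("password<>"));  // false, no uppercase letter
console.log(passwordPattern.test("Abc1"));        // false, shorter than 8
```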

Regex testing for special characters

I'm trying to write a regex to test for certain special characters, but I think I am overcomplicating things. The characters I need to check for are: &<>'"
My current regex looks like such:
/&<>'"/
Another I was trying is:
/\&\<\>\'\"/
Any tips for a beginner (in regards to regex)? Thanks!
You are looking for a character class:
/[&<>'"]/
In doing so, any of the characters in the square brackets will be matched.
The expression you were originally using, /&<>'"/, wasn't working as expected because it matches the characters in that sequential order. In other words, it would match a full string such as &<>'" but not &<.
I'm assuming that you want to be able to match all of the characters you listed, at one time.
If so, you should be able to combine a character set with the g (global-matching) flag, for your regex.
Here's what it could look like:
/[<>&'"]/g
Try /(\&|\<|>|\'|\")/
It depends on which regex engine you use.
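A quick sketch of the character class approach from the answers above; the sample strings are made up. Note that test is called on a regex without the g flag here, since a global regex keeps state in lastIndex between calls.

```javascript
const hasSpecial = /[&<>'"]/;   // any ONE of the five characters
const asSequence = /&<>'"/;     // the five characters in exactly this order

console.log(hasSpecial.test("Tom & Jerry"));  // true
console.log(hasSpecial.test("plain text"));   // false
console.log(asSequence.test("&<"));           // false
console.log(asSequence.test("&<>'\""));       // true

// With the g flag, match() returns every occurrence.
const found = "a<b & c>d".match(/[&<>'"]/g);
console.log(found); // [ '<', '&', '>' ]
```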

Regex: allow everything but some selected characters [duplicate]

I would like to validate a textarea and I just don't get regex (It took me the day and a bunch of tutorials to figure it out).
Basically I would like to be able to allow everything (line breaks and carriage returns included) except the characters that could be malicious (those which would lead to a security breach).
As there are very few characters that are not allowed, I assume that it would make more sense to create a black list than a white one.
My question is: what is the standard "everything but" in Regex?
I'm using javascript and jquery.
I tried this but it doesn't work (it's awful, I know..):
var messageReg = /^[a-zA-Z0-9éèêëùüàâöïç\"\/\%\(\).'?!,#$#§-_ \n\r]+$/;
Thank you.
If you want to exclude a set of characters (some punctuation characters, for example) you would use the ^ operator at the beginning of a character set, in a regex like
/[^.?!]/
This matches any character that is not ., ?, or !.
You can use the ^ as the first character inside brackets [] to negate what's in it:
/^[^abc]*$/
This means: "from start to finish, no a, b, or c."
As Esailija mentioned, this won't do anything for real security.
The code you mentioned is almost a negated set. As murgatroid99 mentioned, the ^ goes inside the brackets, so that the regular expression matches anything that is not in that list. But it looks like you really want to strip out those characters, so your regexp doesn't need to be negated.
Your code should look like:
str.replace(/[a-zA-Z0-9éèêëùüàâöïç\"\/\%\(\).'?!,#$#-_ \n\r]/g, "");
That says, remove all the characters in my regular expression.
However, that says you don't want to keep a-zA-Z0-9. Are you sure you want to strip those out?
Also, Chrome doesn't like § in regular expressions; you have to use \x along with the hex code for the character (for example \xA7).
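A minimal sketch of the "everything but" idea. The forbidden set used here (< > & " ') is an assumption for illustration only, not a recommendation, and as noted above it does not provide real security. The last part shows a pitfall that may also be at play in the patterns above: a hyphen between two characters inside brackets forms a range.

```javascript
// Assumed blacklist for illustration only: < > & " '
const noForbiddenChars = /^[^<>&"']*$/;

// A negated class also matches line breaks.
console.log(noForbiddenChars.test("Hello,\nworld!"));  // true
console.log(noForbiddenChars.test("<script>"));        // false

// Removing the blacklisted characters instead of rejecting the input.
const stripped = "a<b>c".replace(/[<>&"']/g, "");
console.log(stripped); // "abc"

// "#-_" is a range from "#" to "_", which includes the uppercase letters.
const accidentalRange = /[#-_]/;
console.log(accidentalRange.test("A")); // true

// "\u00A7" is the character §. A range from § down to "_" is out of
// order, so the pattern cannot be compiled at all.
let rangeError = null;
try {
  new RegExp("[\u00A7-_]");
} catch (e) {
  rangeError = e;
}
console.log(rangeError instanceof SyntaxError); // true
```

Putting the hyphen last in the brackets, or escaping it as \-, avoids the accidental range.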

RegExp in JavaScript, when a quantifier is part of the pattern

I have been trying to use a regexp that matches any text that is between a caret, less than and a greater than, caret.
So it would look like: ^< THE TEXT I WANT SELECTED >^
I have tried something like this, but it isn't working: ^<(.*?)>^
I'm assuming this is possible, right? I think the reason I have been having such a tough time is because the caret serves as a quantifier. Thanks for any help I get!
Update
Just so everyone knows, the following from am not i am worked:
/\^<(.*?)>\^/
But it turned out that I was getting HTML entities, since I was getting my string by using the .innerHTML property. In other words,
&gt; ... >
&lt; ... <
To solve this, my regexp actually looks like this:
\^&lt;(.*?)((.|\n)*)&gt;\^
This includes the fact that the string in between should be any character or new line. Thanks!
You need to escape the ^ symbol since it has special meaning in a JavaScript regex.
/\^<(.*?)>\^/
In a JavaScript regex, the ^ means beginning of the string, unless the m modifier was used, in which case it means beginning of the line.
This should work:
\^<(.*?)>\^
In a regex, if you want to use a character that has a special meaning (caret, brackets, pipe, ...), you have to escape it using a backslash. For example, (\w\b)*\w\. will select a sequence of words terminated by a dot.
Careful!
If you have to pass the regex pattern as a string, i.e. there's no regex literal like in JavaScript or Perl, you may have to use a double backslash, which the programming language will reduce to a single one before it is processed by the regex engine.
Same regex in multiple languages:
Python:
import re
myRegex=re.compile(r"\^<(.*?)>\^") # The r before the string prevents backslash escaping
PHP:
$result=preg_match("/\\^<(.*?)>\\^/",$subject); // Notice the double backslashes here?
JavaScript:
var myRegex=/\^<(.*?)>\^/,
subject="^<blah example>^";
subject.match(myRegex);
If you tell us what programming language you're writing in, we'll be able to give you some finished code to work with.
Edit: Whoops, didn't even notice this was tagged as javascript. Then, you don't have to worry about double backslash at all.
Edit 2: \b represents a word boundary. Though I agree yours is what I would have used myself.
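Putting the answers in this thread together, a small JavaScript sketch. The sample strings and variable names are assumptions for illustration, and [\s\S] is used here as one common way to let the captured text span line breaks.

```javascript
// Unescaped, ^ is an anchor, so this never matches the delimiters.
const unescaped = /^<(.*?)>^/;
console.log(unescaped.test("^<blah example>^")); // false

// Escaped carets are matched literally; group 1 holds the inner text.
const delimited = /\^<(.*?)>\^/;
const single = "^<blah example>^".match(delimited);
console.log(single[1]); // "blah example"

// "." does not match line breaks by default; [\s\S] matches anything.
const multiLine = /\^<([\s\S]*?)>\^/;
const multi = "^<line one\nline two>^".match(multiLine);
console.log(multi[1] === "line one\nline two"); // true

// Built from a string, every backslash has to be doubled.
const fromStringPattern = new RegExp("\\^<(.*?)>\\^");
console.log(fromStringPattern.source === delimited.source); // true
```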
