Regex - how to exclude a specific string that contains a specific character

I need to exclude an entire string if it contains one of these characters: $ % &
The string looks like a URL. It should start with 'http(s)://', 'ftp://' or 'www.' and match everything after it except the invalid characters $ % &
------- For example:-------
Valid strings are:
www.localhost
http://www.aaaaaa.com/aaaaa5-test5
https://map:1234
www.google.com
http://map:1234
Invalid strings are:
http://www.aaaaa%a.com/test5
https://map:12$34
www.google.com&
I have written this regex (https://regex101.com/r/Gl60ls/1)
/(\b(https?:\/\/|ftp:\/\/|www\.).+?([^\$\%\&\s\n])+)/gim
But it matches the first part of the string, up to the invalid character
------- For example: -------
If I have the string http://www.aaaaa%a.com/test5, it will match http://www.aaaaa
I need to exclude the entire string instead
Any ideas? I will be so grateful!

The part .+?([^\$\%\&\s\n])+) will match as soon as at least one character is not in your "forbidden" list. It is not forbidding those characters to be matched in the .+? part.
You can use negative look-ahead for your purpose:
(^(https?:\/\/|ftp:\/\/|www\.)(?!.*?[$%&]).+)
If you have several URLs on the same line, separated by a space, then:
\b(https?:\/\/|ftp:\/\/|www\.)([^$%&\s]+)(?!\S)
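
As a quick check, here is a minimal sketch that runs both patterns against sample strings from the question. The variable names (wholeString, perToken, samples, line) are only illustrative, and the i and g flags are assumptions carried over from the asker's original regex.

```javascript
// Whole-string check: the lookahead rejects the input if $, % or & appears anywhere after the prefix.
const wholeString = /^(https?:\/\/|ftp:\/\/|www\.)(?!.*?[$%&]).+/i;

// Multi-URL check: each URL must run up to whitespace (or end of input)
// without containing $, % or &.
const perToken = /\b(https?:\/\/|ftp:\/\/|www\.)([^$%&\s]+)(?!\S)/gi;

const samples = [
  'www.localhost',
  'http://www.aaaaaa.com/aaaaa5-test5',
  'https://map:1234',
  'http://www.aaaaa%a.com/test5',
  'https://map:12$34',
  'www.google.com&',
];

// Keeps only the three valid strings.
const accepted = samples.filter((s) => wholeString.test(s));

// Several URLs on one line: the middle one contains % and is skipped entirely.
const line = 'http://map:1234 http://www.aaaaa%a.com/test5 www.google.com';
const found = line.match(perToken);

console.log(accepted);
console.log(found);
```

Note that wholeString has no g flag, so calling test() repeatedly on it is safe; a global regex keeps state in lastIndex between calls.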

Related

Why does this negative lookahead Unicode regex not work in JavaScript when direct match does?

I am trying to validate the filename or directory portions of a string that will eventually be used in a URL, and I want to reject anything that is not a Unicode letter, a number, or one of a few other allowed characters. However, the regex is returning null bytes.
Given this string as input:
զվարճ?անք9879#jhkjhkhl!kjljlkjlkjj()+======\_ew.html
/(?![\p{L}]|[\p{N}]|[\._-~])/gu
JavaScript returns correct invalid character matches, but is selecting a null byte for every character matched and not the full character.
If I run the opposite and try to match on characters that are ok instead of not ok:
/[\p{L}]|[\p{N}]|[\._-~]/gu
JavaScript returns matches and selects each valid character as expected, no null byte matches.
Each pattern has the /u flag. I don't understand the difference in behavior. Tested this in the latest Chrome (update 100 as of post date), Safari, and Firefox and they all behave the same.
Is there some flag or operator that the first regex is missing or is this a JavaScript bug / limitation?

You are not matching, only asserting. You can either match a single character right after the assertion and bundle the alternation to a single character class:
(?![\p{L}\p{N}._-~]).
Regex 101 demo
Or you can match 1 or more times the opposite using a negated character class starting with [^
[^\p{L}\p{N}._-~]+
Regex 101 demo
Note that _-~ inside the character class denotes a range, not the three characters _, - and ~.
If you want to match the - character, either escape it or place it at the start or end of the character class.
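
The difference between asserting and matching can be seen in a small sketch. The input string here is a shortened, made-up variant of the one in the question, and the hyphen is moved to the end of the class so that it is a literal character (an assumption about what the asker intended).

```javascript
const input = 'զվ?ա#9';

// Lookahead alone: zero-width matches at each position that is NOT followed
// by an allowed character (this includes the end of the string).
const assertOnly = input.match(/(?![\p{L}\p{N}._~-])/gu);

// Lookahead followed by ".": the disallowed character itself is consumed.
const assertThenMatch = input.match(/(?![\p{L}\p{N}._~-])./gu);

// Negated class: consumes runs of disallowed characters.
const negated = input.match(/[^\p{L}\p{N}._~-]+/gu);

// The range pitfall: _-~ covers code points 0x5F to 0x7E,
// which includes lowercase letters but not the hyphen (0x2D).
const rangeHitsLetter = /[_-~]/.test('a');
const rangeHitsHyphen = /[_-~]/.test('-');

console.log(assertOnly, assertThenMatch, negated, rangeHitsLetter, rangeHitsHyphen);
```

The empty strings in assertOnly are what the question describes as "null byte" matches: the pattern succeeds at a position without consuming anything.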

Javascript Regex to avoid same character twice consecutively

I need a regular expression that rejects the same character (# is the character) twice in a row but allows it in multiple places.
For example:
someword#someword is ok
someword##someword is not ok
someword#someword#someword is ok too.
So basically this is my existing regular expression /^([a-zA-Z0-9'\-\x80-\xff\*\+ ]+),([a-zA-Z0-9'\-\x80-\xff\*\+\# ]+)$/ where the first group is the last name and second group is the first name. I have introduced a magical character # in the first name group which I will replace with a space when saving. The problem is I cannot have consecutive # symbols.

Looks for any repeated characters (repeated once):
/(.)\1/.test(string) // returns true if repeated characters are found
Looks for a repeated #:
string.indexOf('##') !== -1 // returns true if ## is found

str.replace(/(#{2,})/g,'#')
works for any number of occurrences: ##, ###, #### etc.

str.replace(/##/g,'#')
finds and replaces all instances of '##' by '#'. Also works if you have more than 2 consecutive '#' signs. Doesn't replace single # signs or things that aren't # signs.
edit: if you don't have to replace but just want to test on it:
/##/.test(str)

Try this regex
/^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d#]*$/
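This regex will not allow # twice in a row. Note that the lookahead (?!.*(.)\1) rejects any doubled character, not only #, so it also rejects an input such as aa#b; use (?!.*##) instead to forbid only consecutive # characters.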

I didn't find any pure regex solution that worked for your use-case
So here is mine:
^([a-z0-9]#?)*[a-z0-9]$
 |___________||______|
       |          |-> Makes sure your string ends with an alphanumeric character
       |-> Makes sure the start of the string is a mix of single # or alphanumeric characters
           It has to start with an alphanumeric character
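
The approaches from the answers above can be compared side by side. This is a minimal sketch: the helper names (hasDoubleHash, noDoubleHash, collapse) are made up for illustration, and the character set in noDoubleHash is a simplified assumption rather than the asker's full name pattern.

```javascript
const names = ['someword#someword', 'someword##someword', 'someword#someword#someword'];

// Detection only: true when two # characters are adjacent.
const hasDoubleHash = (s) => /##/.test(s);

// Validation with a lookahead: letters, digits and #, but never "##".
// Assumption: simplified character set, not the pattern from the question.
const noDoubleHash = /^(?!.*##)[a-zA-Z\d#]+$/;

// Collapsing instead of rejecting: any run of two or more # becomes a single #.
const collapse = (s) => s.replace(/#{2,}/g, '#');

const detected = names.map(hasDoubleHash);
const validated = names.map((s) => noDoubleHash.test(s));
const collapsed = collapse('a###b##c');

// The generic backreference flags ANY doubled character, e.g. the "ll" in hello.
const genericFlagsLetters = /(.)\1/.test('hello#world');

console.log(detected, validated, collapsed, genericFlagsLetters);
```

The last line shows why /(.)\1/ is usually too broad for names: it reports ordinary doubled letters as well as ##.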

Regular expression to check contains only

EDIT: Thank you all for your input. Everything you answered was right, but I don't think I explained the problem clearly enough.
I want to check the input value while the user is typing. If the user enters any character that is not in the list, that character should be rolled back.
(I am not concerned with checking after the entire input has been entered.)
I want to validate a date input field which should contain only the characters 0-9 (digits), - (hyphen), . (dot), and / (forward slash). A date may look like 22/02/1999 or 22.02.1999 or 22-02-1999. No validation needs to be done on either occurrence or position. A plain validation is enough to check whether it has any character other than those listed above.
[I am not good at regular expressions.]
Here is what I thought should work, but it does not.
var reg = new RegExp('[0-9]./-');
Here is jsfiddle.

Your expression only tests whether anywhere in the string, a digit is followed by any character (. is a meta character) and /-. For example, 5x/- or 42%/-foobar would match.
Instead, you want to put all the characters into the character class and test whether every single character in the string is one of them:
var reg = /^[0-9.\/-]+$/
^ matches the start of the string
[...] matches if the character is contained in the group (i.e. any digit, ., / or -).
The / has to be escaped because it also denotes the end of a regex literal.
- between two characters describes a range of characters (between them, e.g. 0-9 or a-z). If - is at the beginning or end it has no special meaning though and is literally interpreted as hyphen.
+ is a quantifier and means "one or more of the preceding pattern". This allows us (together with the anchors) to test whether every character of the string is in the character class.
$ matches the end of the string
Alternatively, you can check whether there is any character that is not one of the allowed ones:
var reg = /[^0-9.\/-]/;
The ^ at the beginning of the character class negates it. Here we don't have to test every character of the string, because the existence of a single disallowed character already invalidates the string.
You can use it like so:
if (reg.test(str)) { // !reg.test(str) for the first expression
// str contains an invalid character
}
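
Since the question asks for disallowed characters to be rolled back while typing, one possible approach is to strip them from the current value on every change. The following is only a sketch under that assumption; isDateLike and stripInvalid are hypothetical helper names, not part of the original answer.

```javascript
const allowedOnly = /^[0-9.\/-]+$/; // every character must be allowed
const anyInvalid = /[^0-9.\/-]/;    // at least one disallowed character

// True when the string is non-empty and contains only digits, ., / or -.
function isDateLike(str) {
  return allowedOnly.test(str);
}

// Removes every disallowed character, leaving the allowed ones in place.
function stripInvalid(str) {
  return str.replace(/[^0-9.\/-]/g, '');
}

console.log(isDateLike('22.02.1999'));     // true
console.log(isDateLike('22/02/19a9'));     // false
console.log(anyInvalid.test('5x/-'));      // true
console.log(stripInvalid('22/0a2/19x99')); // "22/02/1999"
```

In a browser, stripInvalid could be called from an input event listener, for example by assigning input.value = stripInvalid(input.value) inside the handler.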

Try this:
([0-9]{2}[/\-.]){2}[0-9]{4}

If you are not concerned about the validity of the date, you can easily use the regex:
^[0-9]{1,2}[./-][0-9]{1,2}[./-][0-9]{4}$
The character class [./-] allows any one of the characters within the square brackets and the quantifiers allow for either 1 or 2 digit months and dates, while only 4 digit years.
You can also group the first few groups like so:
^([0-9]{1,2}[./-]){2}[0-9]{4}$
Updated your fiddle with the first regex.
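
A short sketch of what the grouped pattern accepts and rejects; the sample inputs are made up. As stated above, it checks only the shape, not whether the date is valid.

```javascript
const dateShape = /^([0-9]{1,2}[./-]){2}[0-9]{4}$/;

const inputs = ['22/02/1999', '2.2.1999', '22-02-99', '22/02.1999', '99/99/9999'];
const results = inputs.map((s) => dateShape.test(s));

// Two-digit year fails; mixed separators and impossible dates still pass.
console.log(results); // [true, true, false, true, true]
```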

regular expression incorrectly matching % and $

I have a regular expression in JavaScript to allow numeric and (,.+() -) character in phone field
my regex is [0-9-,.+() ]
It works for numeric as well as above six characters but it also allows characters like % and $ which are not in above list.

Even though you don't have to, I always make it a point to escape metacharacters (easier to read and less pain):
[0-9\-,\.+\(\) ]
But this won't work like you expect it to because it will only match one valid character while allowing other invalid ones in the string. I imagine you want to match the entire string with at least one valid character:
^[0-9\-,\.\+\(\) ]+$
Your original regex is not actually matching %. What it is doing is matching valid characters, but the problem is that it only matches one of them. So if you had the string 435%, it matches the 4, and so the regex reports that it has a match.
If you try to match it against just one invalid character, it won't match. So your original regex doesn't match the string %:
> /[0-9\-,\.\+\(\) ]/.test("%")
false
> /[0-9\-,\.\+\(\) ]/.test("44%5")
true
> "444%6".match(/[0-9\-,\.+\(\) ]/)
["4"] //notice that the 4 was matched.
Going back to the point about escaping, I find that it is easier to escape it rather than worrying about the different rules where specific metacharacters are valid in a character class. For example, - is only valid in the following cases:
When used in an actual character class with proper-order such as [a-z] (but not [z-a])
When used as the first or last character, or by itself, so [-a], [a-], or [-].
When used after a range like [0-9-,] or [a-d-j] (but keep in mind that [9-,] is invalid and [a-d-j] does not match the letters e through i).
For these reasons, I escape metacharacters to make it clear that I want to match the actual character itself and to remove ambiguities.
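
The anchoring point and the hyphen rules above can be verified with a short sketch. The phone number is a made-up sample, and throwsSyntaxError is a hypothetical helper used only to show that out-of-order ranges are rejected when the pattern is compiled.

```javascript
const unanchored = /[0-9-,.+() ]/;   // the pattern from the question
const anchored = /^[0-9-,.+() ]+$/;  // every character must be in the class

const looseResult = unanchored.test('435%');             // matches the "4"
const strictResult = anchored.test('435%');              // "%" is not allowed
const phoneResult = anchored.test('+1 (555) 123-4567');

// [a-d-j] is the range a-d, a literal hyphen, and a literal j.
const abcdHyphenJ = ['e', '-', 'j'].map((c) => /[a-d-j]/.test(c));

// Returns true when the pattern source fails to compile.
function throwsSyntaxError(source) {
  try {
    new RegExp(source);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

const reversedRange = throwsSyntaxError('[z-a]');
const nineToComma = throwsSyntaxError('[9-,]');

console.log(looseResult, strictResult, phoneResult, abcdHyphenJ, reversedRange, nineToComma);
```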

You just need to anchor your regex:
^[0-9-,.+() ]+$
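In a character class, special characters don't need to be escaped, except ], \, ^ (when it comes first) and -.
Even these don't need escaping when:
] is the first character in the char class, as in []] (this holds for PCRE-style engines; in JavaScript [] is an empty class, so ] must be escaped there)
- is at the beginning [-abc] or at the end [abc-] of the char class, or right after the end of a range [a-c-x]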

Escape characters with special meaning in your RegExp. If you're not sure and it isn't an alphabet character, it usually doesn't hurt to escape it, too.
If the whole string must match, include the start ^ and end $ of the string in your RegExp, too.
/^[\d\-,\.\+\(\) ]*$/

How to use regex to check that user input does not consist of special characters only?

How to put a validation over a field which wouldn't allow only special characters, that means AB#,A89#,##ASD is allowed but ##$^& or # is not allowed. I need the RegEx for this validation.

str.match(/^[A-Z##,]+$/)
will match a string that...
... starts ^ and ends $ with the enclosed pattern
... contains any upper case letters A-Z (will not match lower case letters)
... contains only the special chars #, #, and ,
... has at least 1 character (no empty string)
For a case-insensitive match, you can add i at the end (e.g. /pattern/i).
** UPDATE **
If you need to check whether the field contains only special characters, you can test whether the string consists solely of characters that are neither letters nor digits:
if (str.match(/^[^A-Z0-9]*$/i)) {
alert('Invalid');
} else {
alert('Valid');
}
This will match a string which contains only non-alphanumeric characters. An empty string will also yield invalid. Replace * with + to allow empty strings to be valid.

If you can use a "negative match" for your validation, i. e. the input is OK if the regex does not match, then I suggest
^\W*$
This will match a string that consists only of non-word characters (or the empty string).
If you need a positive match, then use
^\W*\w.*$
This will match if there is at least one word character in the string. Note that \w covers letters, digits and the underscore, so _ counts as valid here.

I believe what you're looking for is something like this:
!str.match(/^[##$^&]*$/)
Checks that the string does not contain only symbols, i.e., it contains at least one character that isn't a symbol.
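
The answers above can be checked against the examples from the question. This is a minimal sketch; isValidField is a hypothetical helper name, and the underscore case is added to show where the \w-based pattern differs from the A-Z0-9 one.

```javascript
// Matches strings made up entirely of non-alphanumeric characters (or empty).
const onlySpecial = /^[^A-Z0-9]*$/i;

// Valid means: at least one letter or digit somewhere in the string.
function isValidField(str) {
  return !onlySpecial.test(str);
}

// Positive-match variant based on \w (letters, digits and underscore).
const hasWordChar = /^\W*\w.*$/;

const inputs = ['AB#', 'A89#', '##ASD', '##$^&', '#', ''];
const byNegation = inputs.map(isValidField);
const byPositive = inputs.map((s) => hasWordChar.test(s));

// The two approaches disagree on a lone underscore.
const underscoreNegation = isValidField('_');
const underscorePositive = hasWordChar.test('_');

console.log(byNegation, byPositive, underscoreNegation, underscorePositive);
```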
