Javascript multiple regex pattern - javascript

I'm trying to exclude some internal IP addresses and some internal IP address formats from viewing certain logos and links in the site.I have multiple range of IP addresses(sample given below). Is it possible to write a regex that could match all the IP addresses in the list below using javascript?
10.X.X.X
12.122.X.X
12.211.X.X
64.X.X.X
64.23.X.X
74.23.211.92
and 10 more

Quote the periods, replace the X's with \d+, and join them all together with pipes:
const allowedIPpatterns = [
"10.X.X.X",
"12.122.X.X",
"12.211.X.X",
"64.X.X.X",
"64.23.X.X",
"74.23.211.92" //, etc.
];
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
')$';
const allowedRegexp = new RegExp(allowedRegexStr);
Then you're all set:
'10.1.2.3'.match(allowedRegexp) // => ['10.1.2.3']
'100.1.2.3'.match(allowedRegexp) // => null
How it works:
First, we have to turn the individual IP patterns into regular expressions matching their intent. One regular expression for "all IPs of the form '12.122.X.X'" is this:
^12\.122\.\d+\.\d+$
^ means the match has to start at the beginning of the string; otherwise, 112.122.X.X IPs would also match.
12 etc: digits match themselves
\.: a period in a regex matches any character at all; we want literal periods, so we put a backslash in front.
\d: shorthand for [0-9]; matches any digit.
+: means "1 or more" - 1 or more digits, in this case.
$: similarly to ^, this means the match has to end at the end of the string.
So, we turn the IP patterns into regexes like that. For an individual pattern you could use code like this:
const regexStr = `^` + ipXpattern.
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
`$`;
Which just replaces all .s with \. and Xs with \d+ and sticks the ^ and $ on the ends.
(Note the doubled backslashes; both string parsing and regex parsing use backslashes, so wherever we want a literal one to make it past the string parser to the regular expression parser, we have to double it.)
In a regular expression, the alternation this|that matches anything that matches either this or that. So we can check for a match against all the IP's at once if we to turn the list into a single regex of the form re1|re2|re3|...|relast.
Then we can do some refactoring to make the regex matcher's job easier; in this case, since all the regexes are going to have ^...$, we can move those constraints out of the individual regexes and put them on the whole thing: ^(10\.\d+\.\d+\.\d+|12\.122\.\d+\.\d+|...)$. The parentheses keep the ^ from being only part of the first pattern and $ from being only part of the last. But since plain parentheses capture as well as group, and we don't need to capture anything, I replaced them with the non-grouping version (?:..).
And in this case we can do the global search-and-replace once on the giant string instead of individually on each pattern. So the result is the code above:
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
')$';
That's still just a string; we have to turn it into an actual RegExp object to do the matching:
const allowedRegexp = new RegExp(allowedRegexStr);
As written, this doesn't filter out illegal IPs - for instance, 10.1234.5678.9012 would match the first pattern. If you want to limit the individual byte values to the decimal range 0-255, you can use a more complicated regex than \d+, like this:
(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])
That matches "any one or two digits, or '1' followed by any two digits, or '2' followed by any of '0' through '4' followed by any digit, or '25' followed by any of '0' through '5'". Replacing the \d with that turns the full string-munging expression into this:
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '(?:\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])') +
')$';
And makes the actual regex look much more unwieldy:
^(?:10\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5]).(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])|12\.122\....
but you don't have to look at it, just match against it. :)

You could do it in regex, but it's not going to be pretty, especially since JavaScript doesn't even support verbose regexes, which means that it has to be one humongous line of regex without any comments. Furthermore, regexes are ill-suited for matching ranges of numbers. I suspect that there are better tools for dealing with this.
Well, OK, here goes (for the samples you provided):
var myregexp = /\b(?:74\.23\.211\.92|(?:12\.(?:122|211)|64\.23)\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])|(?:10|64)\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]))\b/g;
As a verbose ("readable") regex:
\b # start of number
(?: # Either match...
74\.23\.211\.92 # an explicit address
| # or
(?: # an address that starts with
12\.(?:122|211) # 12.122 or 12.211
| # or
64\.23 # 64.23
)
\. # .
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # followed by 0..255
| # or
(?:10|64) # match 10 or 64
\. # .
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # followed by 0..255
)
\b # end of number

/^(X|\d{1,3})(\.(X|\d{1,3})){3}$/ should do it.

If you don't actually need to match the "X" character you could use this:
\b(?:\d{1,3}\.){3}\d{1,3}\b
Otherwise I would use the solution cebarrett provided.

I'm not entirely sure of what you're trying to achieve here (doesn't look anyone else is either).
However, if it's validation, then here's a solution to validate an IP address that doesn't use RegEx. First, split the input string at the dot. Then using parseInt on the number, make sure it isn't higher than 255.
function ipValidator(ipAddress) {
var ipSegments = ipAddress.split('.');
for(var i=0;i<ipSegments.length;i++)
{
if(parseInt(ipSegments[i]) > 255){
return 'fail';
}
}
return 'match';
}
Running the following returns 'match':
document.write(ipValidator('10.255.255.125'));
Whereas this will return 'fail':
document.write(ipValidator('10.255.256.125'));
Here's a noted version in a jsfiddle with some examples, http://jsfiddle.net/VGp2p/2/

Related

Regex to accept special character only in presence of alphabets or numeric value in JavaScript?

I have a Javascript regex like this:
/^[a-zA-Z0-9 !##$%^&*()-_-~.+,/\" ]+$/
which allows following conditions:
only alphabets allowed
only numeric allowed
combination of alphabets and numeric allowed
combination of alphabets, numeric and special characters are allowed
I want to modify above regex to cover two more cases as below:
only special characters are not allowed
string should not start with special characters
so basicaly my requirement is:
string = 'abc' -> Correct
string = '123' -> Correct
string = 'abc123' ->Correct
string = 'abc123!##' ->Correct
string = 'abc!##123' -> Correct
string = '123!##abc' -> Correct
string = '!##' -> Wrong
string = '!##abc' -> Wrong
string = '!##123' -> Wrong
string = '!##abc123' -> Wrong
can someone please help me with this?
You can require at least one alphanumeric:
/^(?=[^a-zA-Z0-9]*[a-zA-Z0-9])[a-zA-Z0-9 !##$%^&*()_~.+,/\" -]+$/
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Also, I think you wanted to match a literal -, so need to repeat it, just escape, change -_- to \-_, or - better - move to the end of the character class.
The (?=[^a-zA-Z0-9]*[a-zA-Z0-9]) pattern is a positive character class that requires an ASCII letter of digit after any zero or more chars other than ASCII letters or digits, immediately to the right of the current location, here, from the start of string.
Just add [a-zA-Z0-9] to the beginning of the regex:
/^[a-zA-Z0-9][a-zA-Z0-9 \^\\\-!##$%&*()_~.+,/'"]+$/gm
Note, if within a class (ie [...]) that there are four special characters that must be escaped by prefixing a backward slash (\) to it so that it is interpreted as it's literal meaning:
// If within a class (ie [...])
^ \ - ]
// If not within a class
\ ^ $ . * + ? ( ) [ ] { } |
RegEx101
If all the special characters are allowed and you consider that underscore _ is also a special character then you can always simplify your RegEx like this :
/^[^\W_].+$/
Check here for your given examples on Regex101

Regex consume a character if it matches, but not otherwise

I am trying to write a regex expression which will capture all instances of the '#' character, except when two such characters appear in succession (essentially, an escape sequence). For example:
abd#ajk: # should be matched
abd##ajk: No matches
abd###ajk: The final # should match.
abd####ajk: No matches
This almost works with the negative lookahead expression #(?!#), except that because the second # is not consumed, the last of two # symbols will still be matched. What I think I want to do is to lookahead for an # but consume the character if it is there; otherwise, do not consume it. Is this possible?
Edit: I'm using Javascript which unfortunately rules out several good approaches :(
In JavaScript, to split strings at an unescaped #, you may actually match chunks of text that is either ## (an escaped #) and any chars other than #:
var strs = ['abd#ajk','abd##ajk','abd###ajk','abd####ajk'];
var rx = /(?:[^#]|##)+/g;
for (var s of strs) {
console.log(s, "=>", s.match(rx))
}
The regex is
/(?:[^#]|##)+/g
See its demo
Details
(?: - start of a non-capturing group that matches either of the 2 alternatives:
[^#]- any char other than#`
| - or
## - 2 #s
)+ - repeat matching 1 or more times.
The g modifier finds all matching occurrences inside the input string.
Since you didn't tag a programming language to your question here is my 2 cents for Java:
(?<=(?<!#)(?:##){0,999})#(?!#)
Java doesn't support infinite lookbehinds but bounded so here I explicitly specified max of even occurrences of #: 999.
JavsScript
Lookbehinds in JavaScript are not implemented and supported by many browsers yet. If you are trying to do this in JS then this would be your working solution:
Method 1
((?:[^#]*(?:##)+[^#]*)+)|#
(?:[^#]*(?:##)+[^#]*)+ Match ## occurrences and all its leading / trailing characters
|# Or a single #
JS Code:
str.split(/((?:[^#]*(?:##)+[^#]*)+)|#/).filter(Boolean);
Method 2 (Recommended)
Or if you don't have problem with using match() this is much more cleaner and of course faster:
(?:[^#]*(?:##)+[^#]*)+|[^#]+
JS Code:
console.log(
"aaaa#######bbb#aa###cccc##ddddd#".match(/(?:[^#]*(?:##)+[^#]*)+|[^#]+/g)
);

Query on Javascript RegEx

I need a regex that allows 0-9, a-z, A-Z, hyphen, question mark and "/" slash characters alone. Also the length should be between 5 to 15 only.
I tried as follows, but it does not work:
var reg3 = /^([a-zA-Z0-9?-]){4,15}+$/;
alert(reg3.test("abcd-"));
length should be between 5 to 15 only
Is that why you have this?
{4,15}+
Just use {5,15}; it’s already a quantifier, and a + after it won’t work. Apart from that, the group isn’t necessary, but things should work.
/^[a-zA-Z0-9?/-]{5,15}$/
(I also added a slash character.)
This is what you need:
if (/^([a-z\/?-]{4,15})$/i.test(subject)) {
// Successful match
} else {
// Match attempt failed
}
REGEX EXPLANATION
^([a-z\/?-]{4,15})$
Options: Case insensitive
Assert position at the beginning of the string «^»
Match the regex below and capture its match into backreference number 1 «([a-z\/?-]{4,15})»
Match a single character present in the list below «[a-z\/?-]{4,15}»
Between 4 and 15 times, as many times as possible, giving back as needed (greedy) «{4,15}»
A character in the range between “a” and “z” (case insensitive) «a-z»
The literal character “/” «\/»
The literal character “?” «?»
The literal character “-” «-»
Assert position at the very end of the string «$»
Couple issues,
you need {5,15} instead of {4,15}+
need to include /
Your code can be rewritten as
var reg3 = new RegExp('^[a-z0-9?/-]{5,15}$', 'i'); // i flag to eliminate need of A-Z
alert(reg3.test("a1?-A7="));
Update
Let's not confuse can be with MUST be and concentrate on the actual thing I was trying to convey.
{4,15}+ part in /^([a-zA-Z0-9?-]){4,15}+$/ should be written as {5,15}, and / must be included; which will make your regexp
/^([a-zA-Z0-9?/-]){5,15}$/
which CAN be written as
/^[a-z0-9?/-]{5,15}$/i // i flag to eliminate need of A-Z
Also I hope everybody is OK with use of /i

JavaScript Regular Expression (If and only if)

I just finished all the "Easy" CoderByte challenges and am now going back to see if there are more efficient ways to answer the questions. I am trying to come up with a regular expression for "SimpleSymbol".
(Have the function SimpleSymbols(str) take the str parameter being passed and determine if it is an acceptable sequence by either
returning the string true or false. The str parameter will be composed
of + and = symbols with several letters between them (ie.
++d+===+c++==a) and for the string to be true each letter must be surrounded by a + symbol. So the string to the left would be false.
The string will not be empty and will have at least one letter.)
I originally answered the question by traversing through the whole string, when a letter is found, testing on either side to see if a "+" exists. I thought it would be easier if I could just test the string with a regular expression like,
str.match(/\+[a-zA-Z]\+/g)
This doesn't quite work. I am trying to see if the match will ONLY return true if the condition is met on ALL of the characters in the string. For instance the method will return true on the string, "++d+===+c++==a", due to '+d+' and '+c+'. However, based on the original question it should return false because of the 'a' and no surrounding '+'s.
Any ideas?
EDIT: #Marc brought up a very good point. The easiest way to do this is to search for violations using
[^+][a-zA-Z]|[a-zA-Z][^+]
or something like it. This will find all violations of the rule -- times when a letter appears next to something other than a +. If this matches, then you can return false, knowing that there exist violations. Else, return true.
Original answer:
Here's a regex -- I explain it below. Remember that you have to escape +, because it is a special character!
^([^a-zA-Z+]|(\+[a-zA-Z]\+)|[+])*$
^ // start of the string
[^a-zA-Z+] // any character except a letter or a + (1)
| // or
(\+[a-zA-Z]\+) // + (letter) + (2)
| //or
[+] // plus (3)
)*$ // repeat that pattern 0 or more times, end
The logic behind this is: skip all characters that aren't relevant in your string. (1)
If we have a + (letter) +, that's fine. capture that. (2)
if we have a + all by itself, that's fine too. (3)
A letter without surrounding + will fail.
The problem is that + is a special character in regular expressions. It's a quantifier that means 'one or more of the previous item'. You can represent a literal + character by escaping it, like this:
str.match(/\+[a-zA-Z]\+/g)
However, this will return true if any set of characters is found in the string matching that pattern. If you want to ensure that there are no other characters in the string which do not match that pattern you could do something like this:
str.match(/^([=+]*\+[a-zA-Z](?=\+))+[=+]*$/)
This will match, any number of = or + characters, followed by a literal + followed by a Latin letter, followed by a literal +, all of which may be repeated one or more times, followed by any number of = or + characters. The ^ at the beginning and the $ at the end matches the start and end of the input string, respectively. This ensures that no other characters are allowed. The (?=\+) is a look-ahead assertion, meaning that the next character must be a literal +, but is not considered part of the group, this means it can be rematched as the leading + in the next match (e.g. +a+b+).
Interesting problem!
String requirements:
String must be composed of only +, = and [A-Za-z] alpha chars.
Each and every alpha char must be preceded by a + and followed by a +.
There must be at least one alpha char.
Valid strings:
"+A+"
"++A++"
"=+A+="
"+A+A+"
"+A++A+"
Invalid strings:
+=+= # Must have at least one alpha.
+A+&+A+ # Non-valid char.
"+A" # Alpha not followed by +.
"A+" # Alpha not preceded by +.
Solution:
^[+=]*(?:\+[A-Z](?=\+)[+=]*)+$ (with i ignorecase option set)
Here's how I'd do it: (First in a tested python script with fully commented regex)
import re
def isValidSpecial(text):
if re.search(r"""
# Validate special exercise problem string format.
^ # Anchor to start of string.
[+=]* # Zero or more non-alphas {normal*).
(?: # Begin {(special normal*)+} construct.
\+[A-Z] # Alpha but only if preceded by +
(?=\+) # and followed by + {special} .
[+=]* # More non-alphas {normal*).
)+ # At least one alpha required.
$ # Anchor to end of string.
""", text, re.IGNORECASE | re.VERBOSE):
return "true"
else:
return "false"
print(isValidSpecial("+A+A+"))
Now here is the same solution in JavaScript syntax:
function isValidSpecial(text) {
var re = /^[+=]*(?:\+[A-Z](?=\+)[+=]*)+$/i;
return (re.test(text)) ? "true" : "false";
}
/^[=+]*\+[a-z](?=\+)(?:\+[a-z](?=\+)|[+=]+)*$/i.test(str)
pattern details:
^ # anchor, start of the string
[=+]* # 0 or more = and +
\+ [a-z] (?=\+) # imposed surrounded letter (at least one letter condition)
(?: # non capturing group with:
\+[a-z](?=\+) # surrounded letter
| # OR
[=+]+ # + or = characters (one or more)
)* # repeat the group 0 or more times
$ # anchor, end of the string
To allow consecutive letters like +a+a+a+a+a+, I use a lookahead assertion to check there is a + symbol after the letter without match it. (Thanks to ridgrunner for his comment)
Example:
var str= Array('+==+u+==+a', '++=++a+', '+=+=', '+a+-', '+a+a+');
for (var i=0; i<5; i++) {
console.log(/^[=+]*\+[a-z](?=\+)(?:\+[a-z](?=\+)|[+=]+)*$/i.test(str[i]));
}

Regular expression for excluding some characters with multiline matching

I want to ensure that the user input doesn't contain characters like <, > or &#, whether it is text input or textarea. My pattern:
var pattern = /^((?!&#|<|>).)*$/m;
The problem is, that it still matches multiline strings from a textarea like
this text matches
though this should not, because of this character <
EDIT:
To be more clear, I need exclude &# combination only, not & or #.
Please suggest the solution. Very grateful.
You're probably not looking for m (multiline) switch but s (DOTALL) switch in Javascript. Unfortunately s doesn't exist in Javascript.
However good news that DOTALL can be simulated using [\s\S]. Try following regex:
/^(?![\s\S]*?(&#|<|>))[\s\S]*$/
OR:
/^((?!&#|<|>)[\s\S])*$/
Live Demo
I don't think you need a lookaround assertion in this case. Simply use a negated character class:
var pattern = /^[^<>&#]*$/m;
If you're also disallowing the following characters, -, [, ], make sure to escape them or put them in proper order:
var pattern = /^[^][<>&#-]*$/m;
Alternate answer to specific question:
anubhava's solution works accurately, but is slow because it must perform a negative lookahead at each and every character position in the string. A simpler approach is to use reverse logic. i.e. Instead of verifying that: /^((?!&#|<|>)[\s\S])*$/ does match, verify that /[<>]|&#/ does NOT match. To illustrate this, lets create a function: hasSpecial() which tests if a string has one of the special chars. Here are two versions, the first uses anubhava's second regex:
function hasSpecial_1(text) {
// If regex matches, then string does NOT contain special chars.
return /^((?!&#|<|>)[\s\S])*$/.test(text) ? false : true;
}
function hasSpecial_2(text) {
// If regex matches, then string contains (at least) one special char.
return /[<>]|&#/.test(text) ? true : false;
}
These two functions are functionally equivalent, but the second one is probably quite a bit faster.
Note that when I originally read this question, I misinterpreted it to really want to exclude HTML special chars (including HTML entities). If that were the case, then the following solution will do just that.
Test if a string contains HTML special Chars:
It appears that the OP want to ensure a string does not contain any special HTML characters including: <, >, as well as decimal and hex HTML entities such as:  ,  , etc. If this is the case then the solution should probably also exclude the other (named) type of HTML entities such as: &, <, etc. The solution below excludes all three forms of HTML entities as well as the <> tag delimiters.
Here are two approaches: (Note that both approaches do allow the sequence: &# if it is not part of a valid HTML entity.)
FALSE test using positive regex:
function hasHtmlSpecial_1(text) {
/* Commented regex:
# Match string having no special HTML chars.
^ # Anchor to start of string.
[^<>&]* # Zero or more non-[<>&] (normal*).
(?: # Unroll the loop. ((special normal*)*)
& # Allow a & but only if
(?! # not an HTML entity (3 valid types).
(?: # One from 3 types of HTML entities.
[a-z\d]+ # either a named entity,
| \#\d+ # or a decimal entity,
| \#x[a-f\d]+ # or a hex entity.
) # End group of HTML entity types.
; # All entities end with ";".
) # End negative lookahead.
[^<>&]* # More (normal*).
)* # End unroll the loop.
$ # Anchor to end of string.
*/
var re = /^[^<>&]*(?:&(?!(?:[a-z\d]+|#\d+|#x[a-f\d]+);)[^<>&]*)*$/i;
// If regex matches, then string does NOT contain HTML special chars.
return re.test(text) ? false : true;
}
Note that the above regex utilizes Jeffrey Friedl's "Unrolling-the-Loop" efficiency technique and will run very quickly for both matching and non-matching cases. (See his regex masterpiece: Mastering Regular Expressions (3rd Edition))
TRUE test using negative regex:
function hasHtmlSpecial_2(text) {
/* Commented regex:
# Match string having one special HTML char.
[<>] # Either a tag delimiter
| & # or a & if start of
(?: # one of 3 types of HTML entities.
[a-z\d]+ # either a named entity,
| \#\d+ # or a decimal entity,
| \#x[a-f\d]+ # or a hex entity.
) # End group of HTML entity types.
; # All entities end with ";".
*/
var re = /[<>]|&(?:[a-z\d]+|#\d+|#x[a-f\d]+);/i;
// If regex matches, then string contains (at least) one special HTML char.
return re.test(text) ? true : false;
}
Note also that I have included a commented version of each of these (non-trivial) regexes in the form of a JavaScript comment.

Categories

Resources