JavaScript Regular Expression (If and only if) - javascript

I just finished all the "Easy" CoderByte challenges and am now going back to see if there are more efficient ways to answer the questions. I am trying to come up with a regular expression for "SimpleSymbol".
(Have the function SimpleSymbols(str) take the str parameter being passed and determine if it is an acceptable sequence by either
returning the string true or false. The str parameter will be composed
of + and = symbols with several letters between them (ie.
++d+===+c++==a) and for the string to be true each letter must be surrounded by a + symbol. So the string to the left would be false.
The string will not be empty and will have at least one letter.)
I originally answered the question by traversing through the whole string, when a letter is found, testing on either side to see if a "+" exists. I thought it would be easier if I could just test the string with a regular expression like,
str.match(/\+[a-zA-Z]\+/g)
This doesn't quite work. I am trying to see if the match will ONLY return true if the condition is met on ALL of the characters in the string. For instance the method will return true on the string, "++d+===+c++==a", due to '+d+' and '+c+'. However, based on the original question it should return false because of the 'a' and no surrounding '+'s.
Any ideas?

EDIT: #Marc brought up a very good point. The easiest way to do this is to search for violations using
[^+][a-zA-Z]|[a-zA-Z][^+]
or something like it. This will find all violations of the rule -- times when a letter appears next to something other than a +. If this matches, then you can return false, knowing that there exist violations. Else, return true.
Original answer:
Here's a regex -- I explain it below. Remember that you have to escape +, because it is a special character!
^([^a-zA-Z+]|(\+[a-zA-Z]\+)|[+])*$
^ // start of the string
[^a-zA-Z+] // any character except a letter or a + (1)
| // or
(\+[a-zA-Z]\+) // + (letter) + (2)
| //or
[+] // plus (3)
)*$ // repeat that pattern 0 or more times, end
The logic behind this is: skip all characters that aren't relevant in your string. (1)
If we have a + (letter) +, that's fine. capture that. (2)
if we have a + all by itself, that's fine too. (3)
A letter without surrounding + will fail.

The problem is that + is a special character in regular expressions. It's a quantifier that means 'one or more of the previous item'. You can represent a literal + character by escaping it, like this:
str.match(/\+[a-zA-Z]\+/g)
However, this will return true if any set of characters is found in the string matching that pattern. If you want to ensure that there are no other characters in the string which do not match that pattern you could do something like this:
str.match(/^([=+]*\+[a-zA-Z](?=\+))+[=+]*$/)
This will match, any number of = or + characters, followed by a literal + followed by a Latin letter, followed by a literal +, all of which may be repeated one or more times, followed by any number of = or + characters. The ^ at the beginning and the $ at the end matches the start and end of the input string, respectively. This ensures that no other characters are allowed. The (?=\+) is a look-ahead assertion, meaning that the next character must be a literal +, but is not considered part of the group, this means it can be rematched as the leading + in the next match (e.g. +a+b+).

Interesting problem!
String requirements:
String must be composed of only +, = and [A-Za-z] alpha chars.
Each and every alpha char must be preceded by a + and followed by a +.
There must be at least one alpha char.
Valid strings:
"+A+"
"++A++"
"=+A+="
"+A+A+"
"+A++A+"
Invalid strings:
+=+= # Must have at least one alpha.
+A+&+A+ # Non-valid char.
"+A" # Alpha not followed by +.
"A+" # Alpha not preceded by +.
Solution:
^[+=]*(?:\+[A-Z](?=\+)[+=]*)+$ (with i ignorecase option set)
Here's how I'd do it: (First in a tested python script with fully commented regex)
import re
def isValidSpecial(text):
if re.search(r"""
# Validate special exercise problem string format.
^ # Anchor to start of string.
[+=]* # Zero or more non-alphas {normal*).
(?: # Begin {(special normal*)+} construct.
\+[A-Z] # Alpha but only if preceded by +
(?=\+) # and followed by + {special} .
[+=]* # More non-alphas {normal*).
)+ # At least one alpha required.
$ # Anchor to end of string.
""", text, re.IGNORECASE | re.VERBOSE):
return "true"
else:
return "false"
print(isValidSpecial("+A+A+"))
Now here is the same solution in JavaScript syntax:
function isValidSpecial(text) {
var re = /^[+=]*(?:\+[A-Z](?=\+)[+=]*)+$/i;
return (re.test(text)) ? "true" : "false";
}

/^[=+]*\+[a-z](?=\+)(?:\+[a-z](?=\+)|[+=]+)*$/i.test(str)
pattern details:
^ # anchor, start of the string
[=+]* # 0 or more = and +
\+ [a-z] (?=\+) # imposed surrounded letter (at least one letter condition)
(?: # non capturing group with:
\+[a-z](?=\+) # surrounded letter
| # OR
[=+]+ # + or = characters (one or more)
)* # repeat the group 0 or more times
$ # anchor, end of the string
To allow consecutive letters like +a+a+a+a+a+, I use a lookahead assertion to check there is a + symbol after the letter without match it. (Thanks to ridgrunner for his comment)
Example:
var str= Array('+==+u+==+a', '++=++a+', '+=+=', '+a+-', '+a+a+');
for (var i=0; i<5; i++) {
console.log(/^[=+]*\+[a-z](?=\+)(?:\+[a-z](?=\+)|[+=]+)*$/i.test(str[i]));
}

Related

REgex for non repeating alphabets comma seperated

I have a requirement where I need a regex which
should not repeat alphabet
should only contain alphabet and comma
should not start or end with comma
can contain more than 2 alphabets
example :-
A,B --- correct
A,B,C,D,E,F --- correct
D,D,A --- wrong
,B,C --- wrong
B,C, --- wrong
A,,B,C --- wrong
Can anyone help ?
Another idea with capturing and checking by use of a lookahead:
^(?:([A-Z])(?!.*?\1),?\b)+$
You can test here at regex101 if it meets your requirements.
If you don't want to match single characters, e.g. A, change the + quantifier to {2,}.
The statement of the question is incomplete in several respects. I have made the following assumptions:
Considering that D,D,A is incorrect I assume that a letter cannot be followed by a comma followed by the same letter.
The string may contain the same letter more than once as long as #1 is satisfied.
Considering that A,,B,C is incorrect I assume a comma cannot follow a comma.
Since the examples contain only capital letters I will assume that lower-case letters are not permitted (though one need only set the case-indifferent flag (i) to permit either case).
We observe that the requirements are satisfied if and only if the string begins with a capital letter and is followed by a sequence of comma-capital letter pairs, provided that no capital letter is followed by a comma followed by the same letter. We therefore can attempt to match the following regular expression.
^(?:([A-Z]),(?!\1))*[A-Z]$
Demo
The elements of the expression are as follows.
^ # match beginning of string
(?: # begin a non-capture group
([A-Z]) # match a capital letter and save to capture group 1
, # match a comma
(?!\1) # use negative lookahead to assert next character is not equal
# to the content of capture group 1
)* # end non-capture group and execute it zero or more times
[A-Z] # match a capital letter
$ # match end of string
Here is a big ugly regex solution:
var inputs = ['A,B', 'D,D,D', ',B,C', 'B,C,', 'A,,B'];
for (var i=0; i < inputs.length; ++i) {
if (/^(?!.*?([^,]+).*,\1(?:,|$))[^,]+(?:,[^,]+)*$/.test(inputs[i])) {
console.log(inputs[i] + " => VALID");
}
else {
console.log(inputs[i] + " => INVALID");
}
}
The regex has two parts to it. It uses a negative lookahead to assert that no two CSV entries ever repeat in the input. Then, it uses a straightforward pattern to match any proper CSV delimited input. Here is an explanation:
^ from the start of the input
(?!.*?([^,]+).*,\1(?:,|$)) assert that no CSV element ever repeats
[^,]+ then match a CSV element
(?:,[^,]+)* followed by comma and another element, 0 or more times
$ end of the input
This one could suit your needs:
^(?!,)(?!.*,,)(?!.*(\b[A-Z]+\b).*\1)[A-Z,]+(?<!,)$
^: the start of the string
(?!,): should not be directly followed by a comma
(?!.*,,): should not be followed by two commas
(?!.*(\b[A-Z]+\b).*\1): should not be followed by a value found twice
[A-Z,]+: should contain letters and commas only
$: the end of the string
(?<!,): should not be directly preceded by a comma
See https://regex101.com/r/1kGVSB/1

how to exclude character if a certain character occurs earlier

I'm trying to validate US telephone numbers
and i'm trying to exclude
555)555-5555 and (555-555-5555
How do i exclude the '(' if there's no ')' after the 3rd 5
and vice versa?
my suggestion is to auto format the number while entering it, see the demo below
$(function() {
$('#us-phone-no').on('input', function() {
var value = $(this).val();
var nums = value.replace(/\D/g, '').match(/(\d{0,3})(\d{0,3})(\d{0,4})/);
var formated = !nums[2] ? nums[1] : nums[1] + '-' + nums[2] + (nums[3] ? '-' + nums[3] : '');
$(this).val(formated);
});
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="tetx" id="us-phone-no">
How about something simple:
https://regex101.com/r/g6uBF4/1
^((\(\d{3}\))|(\d{3}-))\d{3}-\d{4}
Just test for each case separately. Either 3 digits enclosed in parenthesis (\d{3}\) or | 3 digits with a dash (\d{3}-). Then the rest of the number \d{3}-\d{4}.
In general, you can't use Javascript regexes to ensure that parentheses are correctly balanced if there can be an indefinite number of nested parentheses. You'd need recursive/balanced matching for that. But your case is not that complicated.
For example, you can add a negative lookahead assertion at the start of your regex:
/^(?![^()]*[()][^()]*$)REST_OF_REGEX_HERE/
This ensures that there won't be a single opening or closing parenthesis in your input.
Explanation:
^ # Start of string
(?! # Assert that it's impossible to match...
[^()]* # any number of characters other than parentheses,
[()] # then a single parenthesis,
[^()]* # then any number of characters other than parentheses,
$ # then the end of the string.
) # End of lookahead
Of course there may be other ways to do what you need, but you didn't show us the rest of the rules you're using for matching.

regular expression, not reading entire string

I have a standard expression that is not working correctly.
This expression is supposed to catch if a string has invalid characters anywhere in the string. It works perfect on RegExr.com but not in my tests.
The exp is: /[a-zA-Z0-9'.\-]/g
It is failing on : ####
but passing with : aa####
It should fail both times, what am I doing wrong?
Also, /^[a-zA-Z0-9'.\-]$/g matches nothing...
//All Boxs
$('input[type="text"]').each(function () {
var text = $(this).prop("value")
var textTest = /[a-zA-Z0-9'.\-]/g.test(text)
if (!textTest && text != "") {
allFieldsValid = false
$(this).css("background-color", "rgba(224, 0, 0, 0.29)")
alert("Invalid characters found in " + text + " \n\n Valid characters are:\n A-Z a-z 0-9 ' . -")
}
else {
$(this).css("background-color", "#FFFFFF")
$(this).prop("value", text)
}
});
edit:added code
UPDATE AFTER QUESTION RE-TAGGING
You need to use
var textTest = /^[a-zA-Z0-9'.-]+$/.test(text)
^^
Note the absence of /g modifier and the + quantifier. There are known issues when you use /g global modifier within a regex used in RegExp#test() function.
You may shorten it a bit with the help of the /i case insensitive modifier:
var textTest = /^[A-Z0-9'.-]+$/i.test(text)
Also, as I mention below, you do not have to escape the - at the end of the character class [...], but it is advisable to keep escaped if the pattern will be modified later by less regex-savvy developers.
ORIGINAL C#-RELATED DETAILS
Ok, say, you are using Regex.IsMatch(str, #"[a-zA-Z0-9'.-]"). The Regex.IsMatch searches for partial matches inside a string. So, if the input string contains an ASCII letter, digit, ', . or -, this will pass. Thus, it is logical that aa#### passes this test, and #### does not.
If you use the second one as Regex.IsMatch(str, #"^[a-zA-Z0-9'.-]$"), only 1 character strings (with an optional newline at the end) would get matched as ^ matches at the start of the string, [a-zA-Z0-9'.-] matches 1 character from the specified ranges/sets, and $ matches the end of the string (or right before the final newline).
So, you need a quantifier (+ to match 1 or more, or * to match zero or more occurrences) and the anchors \A and \z:
Regex.IsMatch(str, #"\A[a-zA-Z0-9'.-]+\z")
^^ ^^^
\A matches the start of string (always) and \z matches the very end of the string in .NET. The [a-zA-Z0-9'.-]+ will match 1+ characters that are either ASCII letters, digits, ', . or -.
Note that - at the end of the character class does not have to be escaped (but you may keep the \- if some other developers will have to modify the pattern later).
And please be careful where you test your regexps. Regexr only supports JavaScript regex syntax. To test .NET regexps, use RegexStorm.net or RegexHero.
/^[a-zA-Z0-9'.-]+$/g
In the second case your (/[a-zA-Z0-9'.-]/g) was working because it matched on the first letter, so to make it correct you need to match the whole string (use ^ and $) and also allow more letters by adding a + or * (if you allow empty string).
Try this regex it matches any char which isn't part of the allowed charset
/[^a-zA-Z0-9'.\-]+/g
Test
>>regex = /[^a-zA-Z0-9'.\-]+/g
/[^a-zA-Z0-9'.\-]+/g
>>regex.test( "####dsfdfjsakldfj")
true
>>regex.test( "dsfdfjsakldfj")
false

With a JS Regex matching exact word but not hypenated words starting with said word

I could not find a match to this question.
I have a string like so
var s="one two one-two one-three one one_four"
and my function is as follows
function replaceMatches( str, word )
{
var pattern=new RegExp( '\\b('+word+')\\b','g' )
return str.replace( pattern, '' )
}
the problem is if I run the function like
var problem=replaceMatches( s,'one' )
it
returns two -two -three one_four"
the function replaces every "one" like it should but treats words with a hyphen as
two words replacing the "one" before the hyphen.
My question is not about the function but about the regex. What literal regex will match
only the words "one" in my string and not "one-two" or "one-\w"<--you know what I mean lol
basically
var pat=/\b(one)\b/g
"one one-two one".replace( pat, '')
I want the above ^ to return
" one-two "
only replace the exact match "one" and not the one in "one-two"
the "one" on the end is important to, the regex must work if the match is at the very end
Thank you, sorry if my question is relatively confusing. I am just trying to get my learn on, and expand my personal library.
What do you considered to be a word?
A word is a sequence of 1 or more word characters, and word boundary \b is defined based upon the definition of word character (and non-word character).
Word character as defined by \w in JavaScript RegExp is shorthand for character class [a-zA-Z0-9_].
What is your definition of a "word"? Let's say your definition is [a-zA-Z0-9_-].
Emulating word boundary
This post describes how to emulate a word boundary in languages that support look-behind and look-ahead. Too bad, JS doesn't support look-behind.
Let us assume the word to be replaced is one for simplicity.
We can limit the replacement with the following code:
inputString.replace(/([^a-zA-Z0-9_-]|^)one(?![a-zA-Z0-9_-])/g, "$1")
Note: I use the expanded form [a-zA-Z0-9_-] instead of [\w-] to avoid association with \w.
Break down the regex:
(
[^a-zA-Z0-9_-] # Negated character class of "word" character
| # OR
^ # Beginning of string
)
one # Keyword
(?! # Negative look-ahead
[a-zA-Z0-9_-] # Word character
)
I emulate the negative look-behind (which is (?<![a-zA-Z0-9_-]) if supported) by matching a character from negated character class of "word" character and ^ beginning of string. This is natural, since if we can't find a "word" character, then it must be either a non-"word" character or beginning of the string. Everything is wrapped in a capturing group so that it can be replaced back later.
Since one is only replace if there is no "word" character before or after, there is no risk of missing a match.
Putting together
Since you are removing "word"s, you must make sure your keyword contains only "word" characters.
function replaceMatches(str, keyword)
{
// The keyword must not contain non-"word" characters
if (!/^[a-zA-Z0-9_-]+$/.test(keyword)) {
throw "not a word";
}
// Customize [a-zA-Z0-9_-] and [^a-zA-Z0-9_-] with your definition of
// "word" character
var pattern = new RegExp('([^a-zA-Z0-9_-]|^)' + keyword + '(?![a-zA-Z0-9_-])', 'g')
return str.replace(pattern, '$1')
}
You need to escape meta-characters in the keyword if your definition of "word" character includes regex meta-characters.
Use this for your RegExp:
function replaceMatches( str, word ) {
var pattern = new RegExp('(^|[^-])\\b('+word+')\\b([^-]|$)', 'g');
return str.replace(pattern, '$1$3')
}
The (^|[^-]) will match either the start of the string or any character except -. The ([^-]|$) will match either a character other than - or the end of the string.
I'm not a JS pattern function expert but the function should replace all.
As for the hyphen in 'one-two' between one and - is a word boundry (ie. \b) and the
end of string is a word boundry if a \w character is there before it.
But, it sounds like you may want 'one' to be preceeded with a space or BOL.
([ ]|^)one\b in that case you want to make the replacement capture group 1, thus strippking out 'one' only.
And, I'm not sure how that function call works in JS.
Edit: after new expected output, the regex could be -
([ ]|^)one(?=[ ]|$)

Javascript multiple regex pattern

I'm trying to exclude some internal IP addresses and some internal IP address formats from viewing certain logos and links in the site.I have multiple range of IP addresses(sample given below). Is it possible to write a regex that could match all the IP addresses in the list below using javascript?
10.X.X.X
12.122.X.X
12.211.X.X
64.X.X.X
64.23.X.X
74.23.211.92
and 10 more
Quote the periods, replace the X's with \d+, and join them all together with pipes:
const allowedIPpatterns = [
"10.X.X.X",
"12.122.X.X",
"12.211.X.X",
"64.X.X.X",
"64.23.X.X",
"74.23.211.92" //, etc.
];
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
')$';
const allowedRegexp = new RegExp(allowedRegexStr);
Then you're all set:
'10.1.2.3'.match(allowedRegexp) // => ['10.1.2.3']
'100.1.2.3'.match(allowedRegexp) // => null
How it works:
First, we have to turn the individual IP patterns into regular expressions matching their intent. One regular expression for "all IPs of the form '12.122.X.X'" is this:
^12\.122\.\d+\.\d+$
^ means the match has to start at the beginning of the string; otherwise, 112.122.X.X IPs would also match.
12 etc: digits match themselves
\.: a period in a regex matches any character at all; we want literal periods, so we put a backslash in front.
\d: shorthand for [0-9]; matches any digit.
+: means "1 or more" - 1 or more digits, in this case.
$: similarly to ^, this means the match has to end at the end of the string.
So, we turn the IP patterns into regexes like that. For an individual pattern you could use code like this:
const regexStr = `^` + ipXpattern.
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
`$`;
Which just replaces all .s with \. and Xs with \d+ and sticks the ^ and $ on the ends.
(Note the doubled backslashes; both string parsing and regex parsing use backslashes, so wherever we want a literal one to make it past the string parser to the regular expression parser, we have to double it.)
In a regular expression, the alternation this|that matches anything that matches either this or that. So we can check for a match against all the IP's at once if we to turn the list into a single regex of the form re1|re2|re3|...|relast.
Then we can do some refactoring to make the regex matcher's job easier; in this case, since all the regexes are going to have ^...$, we can move those constraints out of the individual regexes and put them on the whole thing: ^(10\.\d+\.\d+\.\d+|12\.122\.\d+\.\d+|...)$. The parentheses keep the ^ from being only part of the first pattern and $ from being only part of the last. But since plain parentheses capture as well as group, and we don't need to capture anything, I replaced them with the non-grouping version (?:..).
And in this case we can do the global search-and-replace once on the giant string instead of individually on each pattern. So the result is the code above:
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
')$';
That's still just a string; we have to turn it into an actual RegExp object to do the matching:
const allowedRegexp = new RegExp(allowedRegexStr);
As written, this doesn't filter out illegal IPs - for instance, 10.1234.5678.9012 would match the first pattern. If you want to limit the individual byte values to the decimal range 0-255, you can use a more complicated regex than \d+, like this:
(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])
That matches "any one or two digits, or '1' followed by any two digits, or '2' followed by any of '0' through '4' followed by any digit, or '25' followed by any of '0' through '5'". Replacing the \d with that turns the full string-munging expression into this:
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '(?:\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])') +
')$';
And makes the actual regex look much more unwieldy:
^(?:10\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5]).(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])|12\.122\....
but you don't have to look at it, just match against it. :)
You could do it in regex, but it's not going to be pretty, especially since JavaScript doesn't even support verbose regexes, which means that it has to be one humongous line of regex without any comments. Furthermore, regexes are ill-suited for matching ranges of numbers. I suspect that there are better tools for dealing with this.
Well, OK, here goes (for the samples you provided):
var myregexp = /\b(?:74\.23\.211\.92|(?:12\.(?:122|211)|64\.23)\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])|(?:10|64)\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]))\b/g;
As a verbose ("readable") regex:
\b # start of number
(?: # Either match...
74\.23\.211\.92 # an explicit address
| # or
(?: # an address that starts with
12\.(?:122|211) # 12.122 or 12.211
| # or
64\.23 # 64.23
)
\. # .
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # followed by 0..255
| # or
(?:10|64) # match 10 or 64
\. # .
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # followed by 0..255
)
\b # end of number
/^(X|\d{1,3})(\.(X|\d{1,3})){3}$/ should do it.
If you don't actually need to match the "X" character you could use this:
\b(?:\d{1,3}\.){3}\d{1,3}\b
Otherwise I would use the solution cebarrett provided.
I'm not entirely sure of what you're trying to achieve here (doesn't look anyone else is either).
However, if it's validation, then here's a solution to validate an IP address that doesn't use RegEx. First, split the input string at the dot. Then using parseInt on the number, make sure it isn't higher than 255.
function ipValidator(ipAddress) {
var ipSegments = ipAddress.split('.');
for(var i=0;i<ipSegments.length;i++)
{
if(parseInt(ipSegments[i]) > 255){
return 'fail';
}
}
return 'match';
}
Running the following returns 'match':
document.write(ipValidator('10.255.255.125'));
Whereas this will return 'fail':
document.write(ipValidator('10.255.256.125'));
Here's a noted version in a jsfiddle with some examples, http://jsfiddle.net/VGp2p/2/

Categories

Resources