I'm using PHP's Filter Functions (FILTER_VALIDATE_REGEXP specifically) to validate the input data. I have a list of options and the $input variable can specify a number of options from the list.
The options are (case-insensitive):
all
rewards
join
promotions
stream
checkin
verified_checkin
The $input variable can have almost any combination of the values. The possible success cases are:
all (value can either be all or a comma separated list of other values but not both)
rewards,stream,join (a comma separated list of values excluding all)
join (a single value)
The Regular Expression I've been able to come up with is:
/^(?:all|(?:checkin|verified_checkin|rewards|join|promotions|stream)?(?:,(?:checkin|verified_checkin|rewards|join|promotion|stream))*)$/
So far, it works for the following example scenarios:
all (passes)
rewards,join,promotion,checkin,verified_checkin (passes)
join (passes)
However, it lets a value with a leading comma and duplicates through:
,promotion,checkin,verified_checkin (starts with a comma but also passes when it shouldn't)
Also, checking for duplicates would be a bonus, but not necessarily required.
rewards,join,promotion,checkin,join,verified_checkin (duplicate value but still passes but not as critical as a leading comma)
I've been at it for a couple of days now and having tried various implementations, this is the closest I've been able to get.
Any ideas on how to handle the leading comma false positive?
UPDATE: Edited the question to better explain that duplicate filtering isn't really a requirement, just a bonus.
Sometimes regular expressions just make things more complicated than they should be. Regular expressions are really good at matching patterns, but when you introduce external rules that have dependencies on the number of matched patterns things get complicated fast.
In this case I would just split the list on comma and check the resulting strings against the rules you just described.
$valid_choices = array('checkin','join','promotions','rewards','stream','verified_checkin');
$input_string; // string to match
$tokens = explode(',' $input_string);
$tokens = asort($tokens); // sort to tokens to make it easy to find duplicates
if($tokens[0] == 'all' && count($tokens) > 1)
return FALSE; // fail (all + other options)
if(!in_array($tokens[0], $valid_choices))
return FALSE; // fail (invalid first choice)
for($i = 1; $i < count($tokens); $i++)
{
if($tokens[$i] == $tokens[$i-1])
return FALSE; // fail (duplicates)
if(!in_array($tokens[$i], $valid_choices))
return FALSE; // fail (choice not valid)
}
EDIT
Since you edited your and specified that duplicates would be acceptable but you definitely want a regex-based solution then this one should do:
^(all|((checkin|verified_checkin|rewards|join|promotions|stream)(,(checkin|verified_checkin|rewards|join|promotion|stream))*))$
It will not fail on duplicates but it will take care or leading or trailing commas, or all + other choices combination.
Filtering out duplicates with a regex would be pretty difficult but maybe not impossible (if you use a look-ahead with a capture group placeholder)
SECOND EDIT
Although you mentioned that detecting duplicate entries is not critical I figured I'd try my hand at crafting a pattern that would also check for duplicate entries.
As you can see below, it's not very elegant, nor is it easily scalable but it does get the job done with the finite list of options you have using negative look-ahead.
^(all|(checkin|verified_checkin|rewards|join|promotions|stream)(,(?!\2)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(?!\6)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(?!\6)(?!\8)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(?!\6)(?!\8)(?!\10)(checkin|verified_checkin|rewards|join|promotions|stream))?)$
Since the final regex is so long, I'm going to break it up into parts for the sake of making it easier to follow the general idea:
^(all|
(checkin|verified_checkin|rewards|join|promotions|stream)
(,(?!\2)(checkin|verified_checkin|rewards|join|promotions|stream))?
(,(?!\2)(?!\4)(checkin|verified_checkin|rewards|join|promotions|stream))?
(,(?!\2)(?!\4)(?!\6)(checkin|verified_checkin|rewards|join|promotions|stream))?
(,(?!\2)(?!\4)(?!\6)(?!\8)(checkin|verified_checkin|rewards|join|promotions|stream))?
(,(?!\2)(?!\4)(?!\6)(?!\8)(?!\10)(checkin|verified_checkin|rewards|join|promotions|stream))?
)$/
You can see that the mechanism to form the pattern is somewhat iterative and such a pattern could be generated automatically by an algorithm if you wanted to provide a different list but the resulting pattern would get rather large, rather quickly.
Related
I'm trying to implement a username form validation in javascript where the username
can't start with numbers
can't have whitespaces
can't have any symbols but only One dot or One underscore or One dash
example of a valid username: the_user-one.123
example of invalid username: 1----- user
i've been trying to implement this for awhile but i couldn't figure out how to have only one of each allowed symbol:-
const usernameValidation = /(?=^[\w.-]+$)^\D/g
console.log(usernameValidation.test('1username')) //false
console.log(usernameValidation.test('username-One')) //true
How about using a negative lookahead at the start:
^(?!\d|.*?([_.-]).*\1)[\w.-]+$
This will check if the string
neither starts with digit
nor contains two [_.-] by use of capture and backreference
See this demo at regex101 (more explanation on the right side)
Preface: Due to my severe carelessness, I assumed the context was usage of the HTML pattern attribute instead of JavaScript input validation. I leave this answer here for posterity in case anyone really wants to do this with regex.
Although regex does have functionality to represent a pattern occuring consecutively within a certain number of times (via {<lower-bound>,<upper-bound>}), I'm not aware of regex having "elegant" functionality to enforce a set of patterns each occuring within a range of number of times but in any order and with other patterns possibly in between.
Some workarounds I can think of:
Make a regex that allows for one of each permutation of ordering of special characters (note: newlines added for readability):
^(?:
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*-?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*_?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*\.?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*_?[A-Za-z0-9]*\.?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*\.?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*-?[A-Za-z0-9]*\.?)
)[A-Za-z0-9]*$
Note that the above regex can be simplified if you don't want usernames to start with special characters either.
Friendly reminder to also make sure you use the HTML attributes to enforce a minimum and maximum input character length where appropriate.
If you feel that regex isn't well suited to your use-case, know that you can do custom validation logic using javascript, which gives you much more control and can be much more readable compared to regex, but may require more lines of code to implement. Seeing the regex above, I would personally seriously consider the custom javascript route.
Note: I find https://regex101.com/ very helpful in learning, writing, and testing regex. Make sure to set the "flavour" to "JavaScript" in your case.
I have to admit that Bobble bubble's solution is the better fit. Here ia a comparison of the different cases:
console.log("Comparison between mine and Bobble Bubble's solution:\n\nusername mine,BobbleBubble");
["valid-usrId1","1nvalidUsrId","An0therVal1d-One","inva-lid.userId","anot-her.one","test.-case"].forEach(u=>console.log(u.padEnd(20," "),chck(u)));
function chck(s){
return [!!s.match(/^[a-zA-Z][a-zA-Z0-9._-]*$/) && ( s.match(/[._-]/g) || []).length<2, // mine
!!s.match(/^(?!\d|.*?([_.-]).*\1)[\w.-]+$/)].join(","); // Bobble bulle
}
The differences can be seen in the last three test cases.
I'm working on this simple, straightforward text content filtering mechanism on our post commenting module where people are prohibited from writing foul, expletive words.
So far I'm able to compare (word-by-word, using .include()) comment contents against the blacklisted words we have in the database. But to save space, time and effort in entering database entries for each word such as 'Fucking' and 'Fuck', I want to create a mechanism where we check if a word contains a blacklisted word.
This way, we just enter 'Fuck' in the database. And when visitor's comment contains 'Fucking' or 'Motherfucker', the function will automatically detect that there is a word in the comment that contain's 'fuck' in it and then perform necessary actions.
I've been thinking of integrating .substring() but I guess that's not what I need.
Btw, I'm using React (in case you know of any built-in functions). Much as possible, I wanna deviate from using libraries for this mechanism.
Thanks a heap!
"handover".indexOf("hand")
It will return index if it exists otherwise -1
To ignore cases you can define all your blacklisted words in lower case and then use this
"HANDOVER".toLowerCase().indexOf("hand")
To detect if a string has another string inside of it you can simply use the .includes method, it does not work on a word by word basis but checks for a sequence of characters so it should meet you requirements. It returns a boolean value for if the string is inside the other string
var sentence = 'Stackoverflow';
console.log(sentence.includes("flow"));
You were on the right track with .includes()
console.log('handover'.includes('hand'));
Returns true
I am trying to write a function to calculate how likely two strings are to mean the same thing. In order to do this I am converting to lower case and removing special characters from the strings before I compare them. Currently I am removing the strings '.com' and 'the' using String.replace(substring, '') and special characters using String.replace(regex, '')
str = str.toLowerCase()
.replace('.com', '')
.replace('the', '')
.replace(/[&\/\\#,+()$~%.'":*?<>{}]/g, '');
Is there a better regex that I can use to remove the common patterns like '.com' and 'the' as well as the special characters? Or some other way to make this more efficient?
As my dataset grows I may find other common meaningless patterns that need to be removed before trying to match strings and would like to avoid the performance hit of chaining more replace functions.
Examples:
Fish & Chips? => fish chips
stackoverflow.com => stackoverflow
The Lord of the Rings => lord of rings
You can connect the replace calls to a single one with a rexexp like this:
str = str.toLowerCase().replace(/\.com|the|[&\/\\#,+()$~%.'":*?<>{}]/g, '');
The different strings to remove are inside parentheses () and separated by pipes |
This makes it easy enough to add more string to the regexp.
If you are storing the words to remove in an array, you can generate the regex using the RegExp constructor, e.g.:
var words = ["\\.com", "the"];
var rex = new RegExp(words.join("|") + "|[&\\/\\\\#,+()$~%.'\":*?<>{}]", "g");
Then reuse rex for each string:
str = str.toLowerCase().replace(rex, "");
Note the additional escaping required because instead of a regular expression literal, we're using a string, so the backslashes (in the words array and in the final bit) need to be escaped, as does the " (because I used " for the string quotes).
The problem with this question is that im sure you have a very concrete idea in your mind of what you want to do, but the solution you have arrived at (removing un-informative letters before making a is-identical comparison) may not be the best for the comparison you want to do.
I think perhaps a better idea would be to use a different method comparison and a different datastructure than a string. A very simple example would be to condense your strings to sets with set('string') and then compare set similarity/difference. Another method might be to create a Directed Acyclic Graph, or sub-string Trei. The main point is that it's probably ok to reduce the information from the original string and store/compare that - however don't underestimate the value of storing the original string, as it will help you down the road if you want to change the way you compare.
Finally, if your strings are really really really long, you might want to use a perceptual hash - which is like an MD5 hash except similar strings have similar hashes. However, you will most likely have to roll your own for short strings, and define what you think is important data, and what is superfluous.
I have the following regex for phone number validation
function validatePhonenumber(phoneNum) {
var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
return regex.test(phoneNum);
}
However, I would liek to make sure it doesn;t pass for different separators such as in
111-222.3333
Any ideas how to make sure the separators are the same always?
Just make sure beforehand that there is at most one kind of separator, then pass the string through the regex as you were doing.
function validatePhonenumber(phoneNum) {
var separators = extractSeparators(phoneNum);
if(separators.length > 1) return false;
var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{3}$/;
return regex.test(phoneNum);
}
function extractSeparators(str){
// Return an array with all the distinct chars
// that are present in the passed string
// and are not numeric (0-9)
}
You can use the following regex instead:
\d{3}([-\s\.])?\d{3}\1?\d{4}
Here is a working example:
http://regex101.com/r/nN9nT7/1
As result it will match the following result:
111-222-3333 --> ok
111.222.3333 --> ok
111 222 3333 --> ok
111-222.3333
111.222-3333
111-222 3333
111 222-3333
EDIT: after Alan Moore's suggestion:
Also matches 111-2223333. That's because you made the \1 optional,
which isn't necessary. One of JavaScript's stranger quirks is that a
backreference to a group that did not participate in the match,
succeeds anyway. So if there's no first separator, ([-\s.])? succeeds
because the ? made it optional, and \1 succeeds because it's
JavaScript. But I would have used ([-\s.]?) to capture the first
separator (which might be nothing), and \1 to match the same thing
again. This works in any flavor, including JavaScript.
We can improve the regex to:
^\d{3}([-\s\.]?)\d{3}\1\d{4}$
You'll need at least two passes to keep this maintainable and extensible.
JS' RegEx doesn't allow for creating variables for use later in the RegEx, if you want to support older browsers.
If you are only supporting modern browsers, Fede's answer is just fine...
As such, with ghetto-support, you aren't going to be able to reliably check that one separator is the same value every time, without writing a really, really, really, stupidly-long RegEx, using | to basically write out the RegEx 3 times.
A better way might be to grab all of the separators, and use a reduction or a filter to check that they all have the same value.
var userEnteredNumber = "999.231 3055";
var validNumber = numRegEx.test(userEnteredNumber);
var separators = userEnteredNumber.replace(/\d+/g, "").split("");
var firstSeparator = separators[0];
var uniformSeparators = separators.every(function (separator) { return separator === firstSeparator; });
if (!uniformSeparators) { /* also not valid */ }
You could make that a little neater, using closures and some applied functions, but that's the idea.
Alternatively, here's the big, ugly RegEx that would allow you to test exactly what the user entered.
var separatorTest = /^([0-9]{3}\.[0-9]{3}\.[0-9]{3,4})|([0-9]{3}-[0-9]{3}-[0-9]{3,4})|([0-9]{3} [0-9]{3} [0-9]{3,4})|([0-9]{9,10})$/;
Notice I had to include the exact same number-test three times, wrap each one in parens (to be treated as a single group), and then separate each group with an | to check each group, like an if, else if, else... ...and then plug in a separate special case for having no separator at all...
...not pretty.
I'm also not using \d, just because it's easy to forget that - and . are both accepted "digit"s, when trying to maintain one of these abominations.
Now, a word or two of warning:
People are liable to enter all kinds of crap; if this is for a commercial site, it's likely better to just strip separators entirely and validate the number is the right size, and conforms to some specifics (eg: doesn't start with /^555555/).
If not given any instruction about number format, people will happily use either no separator or a formal number, like (555) 555-5555 (or +1 (555) 555-5555 for the really pedantic), which is obviously going to fail hard, in this system (see point #1).
Be prepared to trim what you get, before validating.
Depending on your country/region/etc laws about data-security and consumer-vs-transaction record-keeping (again, may or may not be more important in a commercial setting), it's likely better to store both a "user-given" ugly number, and a system-usable number, which you either clean on the back-end, or submit along with the user-entered text.
From a user-interaction perspective, either forcing the number to conform, explicitly (placeholders showing them xxx-xxx-xxxx right above the input, in bold), or accepting any text, and prepping it yourself, is going to be 1000x better than accepting certain forms, but not bothering to tell the user up-front, and instead telling them what they did was wrong, after they try.
It's not cool for relationships; it's equally not cool, here.
You've got 9-digit and 10-digit numbers, so if you're trying for an international solution, be prepared to deal with all international separators (, \.\-\(\)\+) etc... again, why stripping is more useful, because THAT RegEx would be insane.
I am working a serviceNow business rule and want to compare two strings and capture the substrings that are missing from the string for example...
var str1 = "subStr1,subStr2,subStr3,subStr4"
var str2 = "subStr1,subStr3"
magicFunction(str1,str2);
and the magic function would return "subStr2,subStr4"
I'd probably have better luck turning the strings into arrays and comparing them that way which if there is some method that would be recommended I can do that, but I have to push a , separated string back to the form field for it to work right, something with how sys_id's behave seems to demand it.
Basically I have a field on a form that holds a list of sys_ids, I need if one of those sys_ids is removed from the list I can capture the sys_id and make some change on the record belonging to it
If you're not against using libraries, underscore has an easy way to do this with arrays. See http://underscorejs.org/#difference
function magicFunction(str1, str2) {
return _.difference(str1.split(","),str2.split(",")).join(",");
}
The ArrayUtil Script Include in ServiceNow has a "diff" function, once you use split(",") on your Strings to create two Arrays.
e.g.,
var myDiffArray = new ArrayUtil().diff(myArray1, myArray2);
Assuming you're list has commas separating them, you can use split(",") and join(",") to turn them in to arrays/back into comma delimited lists, and then you can find the differences pretty easily using this method of finding array differences.