Comparing Strings to find Missing Substrings - javascript

I am working on a ServiceNow business rule and want to compare two strings and capture the substrings that are missing from the second one. For example:
var str1 = "subStr1,subStr2,subStr3,subStr4"
var str2 = "subStr1,subStr3"
magicFunction(str1,str2);
and the magic function would return "subStr2,subStr4"
I'd probably have better luck turning the strings into arrays and comparing them that way, and if there is a recommended method for that I can use it. However, I have to push a comma-separated string back to the form field for it to work right; something about how sys_ids behave seems to demand it.
Basically, I have a field on a form that holds a list of sys_ids. If one of those sys_ids is removed from the list, I need to capture it and make some change on the record it belongs to.

If you're not against using libraries, underscore has an easy way to do this with arrays. See http://underscorejs.org/#difference
function magicFunction(str1, str2) {
    return _.difference(str1.split(","), str2.split(",")).join(",");
}

The ArrayUtil Script Include in ServiceNow has a "diff" function, once you use split(",") on your Strings to create two Arrays.
e.g.,
var myDiffArray = new ArrayUtil().diff(myArray1, myArray2);

Assuming your list is comma-separated, you can use split(",") to turn each string into an array and join(",") to turn the result back into a comma-delimited list. Once you have arrays, finding the differences between them is straightforward.
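For anyone who wants to avoid a library, here is a minimal sketch of the split/compare/join idea in plain ES5-style JavaScript. It is only an illustration: the name magicFunction comes from the question, the empty-string handling is my own assumption, and it has not been tried inside a ServiceNow business rule.

```javascript
// Returns the comma-separated items of str1 that do not appear in str2.
// Assumption: an empty or missing string means "no items".
function magicFunction(str1, str2) {
    var original = str1 ? str1.split(",") : [];
    var remaining = str2 ? str2.split(",") : [];
    var missing = [];
    for (var i = 0; i < original.length; i++) {
        // indexOf returns -1 when the item is not in the second list
        if (remaining.indexOf(original[i]) === -1) {
            missing.push(original[i]);
        }
    }
    return missing.join(",");
}

console.log(magicFunction("subStr1,subStr2,subStr3,subStr4", "subStr1,subStr3"));
// prints "subStr2,subStr4"
```

The result is already a comma-separated string, so it can be written straight back to a list field.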

Related

JS: Check if word "handover" contains "hand"

I'm working on this simple, straightforward text content filtering mechanism on our post commenting module where people are prohibited from writing foul, expletive words.
So far I'm able to compare comment contents word by word (using .includes()) against the blacklisted words we have in the database. But to save space, time and effort in entering database entries for each word such as 'Fucking' and 'Fuck', I want to create a mechanism where we check if a word contains a blacklisted word.
This way, we just enter 'Fuck' in the database. When a visitor's comment contains 'Fucking' or 'Motherfucker', the function will automatically detect that there is a word in the comment that contains 'fuck' and then perform the necessary actions.
I've been thinking of integrating .substring(), but I guess that's not what I need.
Btw, I'm using React (in case you know of any built-in functions). As much as possible, I want to avoid using libraries for this mechanism.
Thanks a heap!

"handover".indexOf("hand")
It returns the index of the match if one exists, otherwise -1.
To ignore case, define all your blacklisted words in lower case and then use this:
"HANDOVER".toLowerCase().indexOf("hand")

To detect whether a string has another string inside of it, you can simply use the .includes method. It does not work on a word-by-word basis but checks for a sequence of characters, so it should meet your requirements. It returns a boolean indicating whether the string is inside the other string.
var sentence = 'Stackoverflow';
console.log(sentence.includes("flow"));

You were on the right track with .includes()
console.log('handover'.includes('hand'));
Returns true
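Putting the answers above together, a case-insensitive check against a whole list might look like the sketch below. The function name findBlacklisted and the sample word list are made up for illustration; in the question the list would come from the database.

```javascript
// Returns the blacklisted entries that occur anywhere inside the comment,
// ignoring case. Both names here are hypothetical, not from the question.
function findBlacklisted(comment, blacklist) {
    var text = comment.toLowerCase();
    return blacklist.filter(function (word) {
        // includes() checks for a character sequence, not a whole word
        return text.includes(word.toLowerCase());
    });
}

console.log(findBlacklisted("Please HANDOVER the keys", ["hand", "foot"]));
// prints [ 'hand' ]
```

Because this is a plain substring test, it will also flag harmless words that happen to contain a blacklisted sequence, which is exactly the "handover" contains "hand" behaviour from the title.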

replace regex captures with values from an array (javascript)

In my current task, I'm animating the coordinates of SVG paths, so I'm trying to programmatically alter their values depending on an offset that the user changes. The coordinates are in ugly attributes whose values look something like this:
M0,383c63,0,89.3,-14.1,117.4,-65.9...
For an offset of 100, the new value might need to look something like:
M0,483c63,0,89.3,-14.1,217.4,-65.9... // without the bold (it's just there to highlight diff)
I pass in different regular expressions for different paths, which look something like, say, /long(N)and...N...ugly(N)regex/, but each path has a different amount of captures (I can pass in that quantity too). Using the answers here and a loop, I figured out how to programmatically make an array of newly changed values (regardless of the number of captures), giving me something like ['483','217.4',...].
My question is, how do I programmatically replace/inject the altered numbers back into the string (as in the line above with the bolded numbers)?
My workaround for the moment (and a fiddle):
I'm generating a "reverse" regex by switching parens around, which ends up looking like /(long)N(and...N...ugly)N(regex)/. Then I'm doing stuff by hand, e.g.:
if (previouslyCapturedCount == 3) {
d = d.replace(reverseRE, "$1" + dValuesAltered[0] + "$2" + dValuesAltered[1] + "$3" + dValuesAltered[2] + "$4");
} else if (previouslyCapturedCount == 4) {
// and so on ...
But there's gotta be a better way, no? (Maybe due to my lack of JS skillz, I simply wasn't able to extrapolate it from the answers in the above linked question.)
Caveat: There are several questions here on S.O. that seem to answer how to do multiple replacements on specific pairs, but they seem to assume global replacements, and I have no guarantee that the matched/captured numbers will be unique in the string.

Rather than building one huge, complex pattern to find and replace everything at once, I'd iterate over the items and replace them one at a time, with a regex that looks like this:
^(,?[^,]*){1},([0-9]+)
In this regex, the {1} is the number of comma-delimited values you'd like to skip over. In this case we're skipping the first value and replacing just the leading digits of the second.
Replace with the following, where _XX_ is the desired new value:
$1,_XX_
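A different option, not taken from the answer above, is to keep the asker's "reverse" regex and pass a replacer function to replace(), so the number of captures no longer has to be hard-coded. The sketch below assumes the reverse regex has no named groups and captures the text between the numbers; injectValues and the sample regex are hypothetical.

```javascript
// Interleaves the captured "in between" pieces with the new values.
// Assumes reverseRE captures everything except the numbers to replace,
// and that it uses no named groups.
function injectValues(d, reverseRE, values) {
    return d.replace(reverseRE, function () {
        // Replacer arguments are: match, group1..groupN, offset, whole string
        var groups = Array.prototype.slice.call(arguments, 1, -2);
        var out = "";
        for (var i = 0; i < groups.length; i++) {
            out += groups[i];
            if (i < values.length) {
                out += values[i];
            }
        }
        return out;
    });
}

// Illustrative reverse regex for the start of the path in the question
var reverseRE = /^(M0,)[\d.]+(c63,0,89\.3,-14\.1,)[\d.]+(,-65\.9)/;
console.log(injectValues("M0,383c63,0,89.3,-14.1,117.4,-65.9", reverseRE, ["483", "217.4"]));
// prints "M0,483c63,0,89.3,-14.1,217.4,-65.9"
```

Since the replacement is driven by position in the match rather than by the numbers' values, it does not matter whether the same number appears more than once in the string.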

Efficiently remove common patterns from a string

I am trying to write a function to calculate how likely two strings are to mean the same thing. In order to do this I am converting to lower case and removing special characters from the strings before I compare them. Currently I am removing the strings '.com' and 'the' using String.replace(substring, '') and special characters using String.replace(regex, '')
str = str.toLowerCase()
.replace('.com', '')
.replace('the', '')
.replace(/[&\/\\#,+()$~%.'":*?<>{}]/g, '');
Is there a better regex that I can use to remove the common patterns like '.com' and 'the' as well as the special characters? Or some other way to make this more efficient?
As my dataset grows I may find other common meaningless patterns that need to be removed before trying to match strings and would like to avoid the performance hit of chaining more replace functions.
Examples:
Fish & Chips? => fish chips
stackoverflow.com => stackoverflow
The Lord of the Rings => lord of rings

You can combine the replace calls into a single one with a regexp like this:
str = str.toLowerCase().replace(/\.com|the|[&\/\\#,+()$~%.'":*?<>{}]/g, '');
The different strings to remove are alternatives separated by pipes |
This makes it easy to add more strings to the regexp.
If you are storing the words to remove in an array, you can generate the regex using the RegExp constructor, e.g.:
var words = ["\\.com", "the"];
var rex = new RegExp(words.join("|") + "|[&\\/\\\\#,+()$~%.'\":*?<>{}]", "g");
Then reuse rex for each string:
str = str.toLowerCase().replace(rex, "");
Note the additional escaping required because instead of a regular expression literal, we're using a string, so the backslashes (in the words array and in the final bit) need to be escaped, as does the " (because I used " for the string quotes).
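If the list of patterns is kept as plain text rather than pre-escaped regex fragments, the escaping can be done in code. The following is only a sketch: escapeRegExp and buildCleaner are hypothetical helper names, and the final whitespace clean-up is my own addition so that the output matches the examples in the question.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the regex once and return a function that can be reused per string.
function buildCleaner(patterns) {
    var alternatives = patterns.map(escapeRegExp).join("|");
    var rex = new RegExp(alternatives + "|[&\\/\\\\#,+()$~%.'\":*?<>{}]", "g");
    return function (str) {
        // Assumption: runs of whitespace left behind should collapse to one space
        return str.toLowerCase().replace(rex, "").replace(/\s+/g, " ").trim();
    };
}

var clean = buildCleaner([".com", "the"]);
console.log(clean("Fish & Chips?"));         // prints "fish chips"
console.log(clean("stackoverflow.com"));     // prints "stackoverflow"
console.log(clean("The Lord of the Rings")); // prints "lord of rings"
```

Note that, as in the answer above, "the" is removed wherever it occurs, including inside longer words such as "other".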

The problem with this question is that I'm sure you have a very concrete idea in mind of what you want to do, but the solution you have arrived at (removing uninformative characters before making an is-identical comparison) may not be the best for the comparison you want to make.
I think a better idea might be to use a different comparison method and a different data structure than a string. A very simple example would be to condense your strings to sets with set('string') and then compare set similarity/difference. Another method might be to create a directed acyclic graph or a substring trie. The main point is that it's probably OK to reduce the information from the original string and store/compare that. However, don't underestimate the value of storing the original string, as it will help you down the road if you want to change the way you compare.
Finally, if your strings are really, really long, you might want to use a perceptual hash, which is like an MD5 hash except that similar strings have similar hashes. However, you will most likely have to roll your own for short strings, and define what you think is important data and what is superfluous.

RegEx to validate a comma separated list of options

I'm using PHP's Filter Functions (FILTER_VALIDATE_REGEXP specifically) to validate the input data. I have a list of options and the $input variable can specify a number of options from the list.
The options are (case-insensitive):
all
rewards
join
promotions
stream
checkin
verified_checkin
The $input variable can have almost any combination of the values. The possible success cases are:
all (value can either be all or a comma separated list of other values but not both)
rewards,stream,join (a comma separated list of values excluding all)
join (a single value)
The Regular Expression I've been able to come up with is:
/^(?:all|(?:checkin|verified_checkin|rewards|join|promotions|stream)?(?:,(?:checkin|verified_checkin|rewards|join|promotion|stream))*)$/
So far, it works for the following example scenarios:
all (passes)
rewards,join,promotion,checkin,verified_checkin (passes)
join (passes)
However, it lets a value with a leading comma and duplicates through:
,promotion,checkin,verified_checkin (starts with a comma but also passes when it shouldn't)
Also, checking for duplicates would be a bonus, but not necessarily required.
rewards,join,promotion,checkin,join,verified_checkin (duplicate value but still passes but not as critical as a leading comma)
I've been at it for a couple of days now and having tried various implementations, this is the closest I've been able to get.
Any ideas on how to handle the leading comma false positive?
UPDATE: Edited the question to better explain that duplicate filtering isn't really a requirement, just a bonus.

Sometimes regular expressions just make things more complicated than they should be. Regular expressions are really good at matching patterns, but when you introduce external rules that have dependencies on the number of matched patterns things get complicated fast.
In this case I would just split the list on comma and check the resulting strings against the rules you just described.
$valid_choices = array('checkin', 'join', 'promotions', 'rewards', 'stream', 'verified_checkin');

function is_valid_option_list($input_string, $valid_choices)
{
    // the options are case-insensitive, so normalize before comparing
    $tokens = explode(',', strtolower($input_string));
    if ($tokens === array('all'))
        return TRUE; // "all" on its own is valid
    sort($tokens); // sort the tokens so that duplicates end up next to each other
    for ($i = 0; $i < count($tokens); $i++)
    {
        if ($i > 0 && $tokens[$i] == $tokens[$i - 1])
            return FALSE; // fail (duplicates)
        // also rejects "all" mixed with other options and empty tokens from stray commas
        if (!in_array($tokens[$i], $valid_choices))
            return FALSE; // fail (choice not valid)
    }
    return TRUE;
}
EDIT
Since you edited your question and specified that duplicates would be acceptable but you definitely want a regex-based solution, this one should do:
^(all|((checkin|verified_checkin|rewards|join|promotions|stream)(,(checkin|verified_checkin|rewards|join|promotions|stream))*))$
It will not fail on duplicates, but it will take care of leading or trailing commas and of the all + other choices combination.
Filtering out duplicates with a regex would be pretty difficult but maybe not impossible (if you use a look-ahead with a capture group placeholder).
SECOND EDIT
Although you mentioned that detecting duplicate entries is not critical I figured I'd try my hand at crafting a pattern that would also check for duplicate entries.
As you can see below, it's not very elegant, nor is it easily scalable but it does get the job done with the finite list of options you have using negative look-ahead.
^(all|(checkin|verified_checkin|rewards|join|promotions|stream)(,(?!\2)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(?!\6)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(?!\6)(?!\8)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(?!\6)(?!\8)(?!\10)(checkin|verified_checkin|rewards|join|promotions|stream))?)$
Since the final regex is so long, I'm going to break it up into parts for the sake of making it easier to follow the general idea:
^(all|
(checkin|verified_checkin|rewards|join|promotions|stream)
(,(?!\2)(checkin|verified_checkin|rewards|join|promotions|stream))?
(,(?!\2)(?!\4)(checkin|verified_checkin|rewards|join|promotions|stream))?
(,(?!\2)(?!\4)(?!\6)(checkin|verified_checkin|rewards|join|promotions|stream))?
(,(?!\2)(?!\4)(?!\6)(?!\8)(checkin|verified_checkin|rewards|join|promotions|stream))?
(,(?!\2)(?!\4)(?!\6)(?!\8)(?!\10)(checkin|verified_checkin|rewards|join|promotions|stream))?
)$
You can see that the mechanism to form the pattern is somewhat iterative and such a pattern could be generated automatically by an algorithm if you wanted to provide a different list but the resulting pattern would get rather large, rather quickly.
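To illustrate the "generated automatically" remark, here is a rough sketch of such a generator, written in JavaScript rather than PHP. buildNoDuplicatePattern is a hypothetical name, and the sketch assumes the options are plain words (no regex metacharacters) and that no option is a prefix of another, since the negative look-aheads would otherwise reject valid input.

```javascript
// Builds the duplicate-rejecting pattern for an arbitrary list of options.
// Group 1 is the outer group and group 2 the first option; each optional
// ",option" part adds two groups, so the options sit in groups 2, 4, 6, ...
function buildNoDuplicatePattern(options) {
    var alternation = "(" + options.join("|") + ")";
    var pattern = "^(all|" + alternation;
    var optionGroups = [2];
    for (var i = 1; i < options.length; i++) {
        var lookaheads = optionGroups.map(function (n) {
            return "(?!\\" + n + ")";
        }).join("");
        pattern += "(," + lookaheads + alternation + ")?";
        optionGroups.push(optionGroups[optionGroups.length - 1] + 2);
    }
    return pattern + ")$";
}

var options = ["checkin", "verified_checkin", "rewards", "join", "promotions", "stream"];
var re = new RegExp(buildNoDuplicatePattern(options));
console.log(re.test("rewards,join"));         // prints true
console.log(re.test("rewards,join,rewards")); // prints false
console.log(re.test(",join"));                // prints false
```

The pattern grows quadratically with the number of options, which is the scaling problem mentioned above.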

Javascript Regex Match between pound signs

Quick. I have a string: #user 9#I'm alive! and I want to be able to pull out "user 9".
So far I'm doing:
if(variable.match(/\#/g)){
console.log(variable);
}
But the output is still the full line.

Use .split() to pull out your desired item.
var parts = variable.split("#");
console.log(parts[1]);
.split() turns your string into an array, using its first argument as the separator.
Of course, if you just want regex alone, you could do:
console.log(variable.match(/([^#]+)/g));
This will again give you an array of the items, but a smaller one, as it doesn't include the empty value before the first hash as an item. Further, as stated by @Stephen P, you will need to use a capture group (the parentheses) to capture the items you want.

Try something more along these lines...
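var thetext = "#user 9#I'm alive!";
// no g flag here: with g, match() returns only the full matches, not the captured groups
var result = thetext.match(/#([^#]+)#/);
console.log(result[1]); // user 9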
You want to introduce a capturing group (the part in the parentheses) to collect the text in between the pound signs.
You may want to use .exec() instead of .match() depending on what you're doing.
Also see this answer to the SO question "How do you access the matched groups in a javascript regex?"
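As a sketch of the .exec() route mentioned above: with the g flag, calling exec() in a loop gives access to the capture group of every match. The function name textBetweenPounds is made up for this example.

```javascript
// Collects the text captured between each pair of pound signs.
function textBetweenPounds(input) {
    var re = /#([^#]+)#/g;
    var found = [];
    var m;
    // exec() returns null once there are no more matches
    while ((m = re.exec(input)) !== null) {
        found.push(m[1]); // m[0] is the full match, m[1] the captured group
    }
    return found;
}

console.log(textBetweenPounds("#user 9#I'm alive!"));
// prints [ 'user 9' ]
```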
