replace regex captures with values from an array (javascript) - javascript

In my current task, I'm animating the coordinates of SVG paths, so I'm trying to programmatically alter their values depending on an offset that the user changes. The coordinates are in ugly attributes whose values look something like this:
M0,383c63,0,89.3,-14.1,117.4,-65.9...
For an offset of 100, the new value might need to look something like:
M0,483c63,0,89.3,-14.1,217.4,-65.9... // without the bold (it's just there to highlight diff)
I pass in different regular expressions for different paths, which look something like, say, /long(N)and...N...ugly(N)regex/, but each path has a different number of captures (I can pass in that quantity too). Using the answers here and a loop, I figured out how to programmatically make an array of newly changed values (regardless of the number of captures), giving me something like ['483','217.4',...].
My question is, how do I programmatically replace/inject the altered numbers back into the string (as in the line above with the bolded numbers)?
My workaround for the moment (and a fiddle):
I'm generating a "reverse" regex by switching parens around, which ends up looking like /(long)N(and...N...ugly)N(regex)/. Then I'm doing stuff by hand, e.g.:
if (previouslyCapturedCount == 3) {
d = d.replace(reverseRE, "$1" + dValuesAltered[0] + "$2" + dValuesAltered[1] + "$3" + dValuesAltered[2] + "$4");
} else if (previouslyCapturedCount == 4) {
// and so on ...
But there's gotta be a better way, no? (Maybe due to my lack of JS skillz, I simply wasn't able to extrapolate it from the answers in the above linked question.)
Caveat: There are several questions here on S.O. that seem to answer how to do multiple replacements on specific pairs, but they seem to assume global replacements, and I have no guarantee that the matched/captured numbers will be unique in the string.

Rather than building one huge, complex pattern to find and replace everything you're looking for, I'd iterate over the items and replace them one at a time, with a regex that looks like this:
^((?:,?[^,]*){1}),([0-9]+)
In this regex, the {1} is the number of comma-delimited values you'd like to skip over. In this case we're skipping the first value and replacing just the leading digits of the second. The outer group captures the whole skipped prefix, so $1 restores all of it even when the count is greater than 1 (a repeated capture group on its own would keep only its last repetition).
Replace with the following, where _XX_ is the desired new value:
$1,_XX_
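If you'd rather keep the "reverse" regex from the question, one way to avoid the hand-written if/else chain is to pass a function to replace() and interleave the captured groups with the new values. This is only a sketch: injectValues is a made-up name, the reverse regex below is a hand-written stand-in for the real one, and it assumes the reverse regex has exactly one more capture group than there are new values.

```javascript
// Hypothetical helper: rebuilds the matched text from the "reverse" regex's
// captured groups, placing one new value between each pair of groups.
function injectValues(str, reverseRE, values) {
  return str.replace(reverseRE, function () {
    // The callback receives (match, group1, ..., groupN, offset, string).
    // Assumption: the reverse regex has values.length + 1 capture groups.
    var groups = Array.prototype.slice.call(arguments, 1, values.length + 2);
    var result = groups[0];
    for (var i = 0; i < values.length; i++) {
      result += values[i] + groups[i + 1];
    }
    return result;
  });
}

// Illustrative data only: a shortened path and a stand-in reverse regex.
var d = "M0,383c63,0,89.3,-14.1,117.4,-65.9";
var reverseRE = /(M0,)[\d.]+(c63,0,89\.3,-14\.1,)[\d.]+(,-65\.9)/;
var updated = injectValues(d, reverseRE, ["483", "217.4"]);
console.log(updated);
```

Because the replacement is built positionally from the groups, it does not matter whether the same number appears elsewhere in the string.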

Related

Efficiently remove common patterns from a string

I am trying to write a function to calculate how likely two strings are to mean the same thing. In order to do this I am converting to lower case and removing special characters from the strings before I compare them. Currently I am removing the strings '.com' and 'the' using String.replace(substring, '') and special characters using String.replace(regex, '')
str = str.toLowerCase()
.replace('.com', '')
.replace('the', '')
.replace(/[&\/\\#,+()$~%.'":*?<>{}]/g, '');
Is there a better regex that I can use to remove the common patterns like '.com' and 'the' as well as the special characters? Or some other way to make this more efficient?
As my dataset grows I may find other common meaningless patterns that need to be removed before trying to match strings and would like to avoid the performance hit of chaining more replace functions.
Examples:
Fish & Chips? => fish chips
stackoverflow.com => stackoverflow
The Lord of the Rings => lord of rings
You can combine the replace calls into a single one with a regexp like this:
str = str.toLowerCase().replace(/\.com|the|[&\/\\#,+()$~%.'":*?<>{}]/g, '');
The different strings to remove are separated by pipes |, which act as alternation.
This makes it easy enough to add more strings to the regexp.
If you are storing the words to remove in an array, you can generate the regex using the RegExp constructor, e.g.:
var words = ["\\.com", "the"];
var rex = new RegExp(words.join("|") + "|[&\\/\\\\#,+()$~%.'\":*?<>{}]", "g");
Then reuse rex for each string:
str = str.toLowerCase().replace(rex, "");
Note the additional escaping required because instead of a regular expression literal, we're using a string, so the backslashes (in the words array and in the final bit) need to be escaped, as does the " (because I used " for the string quotes).
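To sidestep the manual escaping, the words can be escaped programmatically before they are joined. A minimal sketch, assuming the helper names escapeRegExp and buildCleaner (both invented here) and adding a whitespace clean-up step that is not part of the answer above, so that the output matches the examples in the question:

```javascript
// Escapes every character that has a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical factory: builds the combined regex once and returns a
// function that can be reused for every string.
function buildCleaner(words) {
  var alternatives = words.map(escapeRegExp);
  alternatives.push("[&\\/\\\\#,+()$~%.'\":*?<>{}]");
  var rex = new RegExp(alternatives.join("|"), "g");
  return function (str) {
    return str
      .toLowerCase()
      .replace(rex, "")
      // Assumption: collapse the gaps left behind by removed words.
      .replace(/\s+/g, " ")
      .trim();
  };
}

var clean = buildCleaner([".com", "the"]);
console.log(clean("Fish & Chips?"));
console.log(clean("stackoverflow.com"));
console.log(clean("The Lord of the Rings"));
```

Note that, like the original chain of replace calls, this removes "the" wherever it occurs, including inside longer words.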
The problem with this question is that I'm sure you have a very concrete idea in your mind of what you want to do, but the solution you have arrived at (removing uninformative letters before making an is-identical comparison) may not be the best for the comparison you want to do.
I think a better idea might be to use a different comparison method and a different data structure than a string. A very simple example would be to condense your strings to sets (e.g. with set('string')) and then compare set similarity/difference. Another method might be to create a directed acyclic graph, or a substring trie. The main point is that it's probably OK to reduce the information from the original string and store/compare that. However, don't underestimate the value of storing the original string, as it will help you down the road if you want to change the way you compare.
Finally, if your strings are really really really long, you might want to use a perceptual hash - which is like an MD5 hash except similar strings have similar hashes. However, you will most likely have to roll your own for short strings, and define what you think is important data, and what is superfluous.
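As a rough illustration of the set-comparison idea, here is a sketch that reduces each string to its set of characters and scores the overlap (the Jaccard index, which is my choice of measure; the answer above does not name one). The function name is made up, and it needs an environment that provides Set.

```javascript
// Hypothetical helper: similarity between the character sets of two strings,
// from 0 (nothing shared) to 1 (identical sets).
function characterSetSimilarity(a, b) {
  var setA = new Set(a.toLowerCase());
  var setB = new Set(b.toLowerCase());
  var shared = 0;
  setA.forEach(function (ch) {
    if (setB.has(ch)) {
      shared++;
    }
  });
  var unionSize = setA.size + setB.size - shared;
  // Two empty strings are treated as identical.
  return unionSize === 0 ? 1 : shared / unionSize;
}

console.log(characterSetSimilarity("Fish", "fish"));
console.log(characterSetSimilarity("ab", "bc"));
```

Character sets ignore order and repetition, so this is a very coarse measure; sets of words or of short substrings would follow the same pattern.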

regex replace on JSON is removing an Object from Array

I'm trying to improve my understanding of Regex, but this one has me quite mystified.
I started with some text defined as:
var txt = "{\"columns\":[{\"text\":\"A\",\"value\":80},{\"text\":\"B\",\"renderer\":\"gbpFormat\",\"value\":80},{\"text\":\"C\",\"value\":80}]}";
and do a replace as follows:
txt.replace(/\"renderer\"\:(.*)(?:,)/g,"\"renderer\"\:gbpFormat\,");
which results in:
"{"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}"
What I expected was for the renderer attribute value to have its quotes removed, which has happened, but the C column is also completely missing! I'd really love for someone to explain how my Regex has removed column C.
As an extra bonus, if you could explain how to remove the quotes around any value for renderer (i.e. so I don't have to hard-code the value gbpFormat in the regex) that'd be fantastic.
You are using a greedy operator while you need a lazy one. Change this:
"renderer":(.*)(?:,)
^---- add here the '?' to make it lazy
To
"renderer":(.*?)(?:,)
Your code should be:
txt.replace(/\"renderer\"\:(.*?)(?:,)/g,"\"renderer\"\:gbpFormat\,");
If you are learning regex, take a look at this documentation to learn more about greediness. A nice extract to understand this is:
Watch Out for The Greediness!
Suppose you want to use a regex to match an HTML tag. You know that
the input will be a valid HTML file, so the regular expression does
not need to exclude any invalid use of sharp brackets. If it sits
between sharp brackets, it is an HTML tag.
Most people new to regular expressions will attempt to use <.+>. They
will be surprised when they test it on a string like This is a
<EM>first</EM> test. You might expect the regex to match <EM> and when
continuing after that match, </EM>.
But it does not. The regex will match <EM>first</EM>. Obviously not
what we wanted. The reason is that the plus is greedy. That is, the
plus causes the regex engine to repeat the preceding token as often as
possible. Only if that causes the entire regex to fail, will the regex
engine backtrack. That is, it will go back to the plus, make it give
up the last iteration, and proceed with the remainder of the regex.
Like the plus, the star and the repetition using curly braces are
greedy.
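The behaviour described in that extract is easy to check in a console. A small sketch using the extract's sample string (the variable names are mine):

```javascript
var html = "This is a <EM>first</EM> test";

// Greedy: .+ runs to the end of the string, then backs up to the last ">".
var greedyMatch = html.match(/<.+>/)[0];

// Lazy: .+? stops at the first ">" it can reach.
var lazyMatch = html.match(/<.+?>/)[0];

console.log(greedyMatch); // <EM>first</EM>
console.log(lazyMatch);   // <EM>
```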
Try like this:
txt = txt.replace(/"renderer":"(.*?)"/g,'"renderer":$1');
The issue in the expression you were using was this part:
(.*)(?:,)
The * quantifier is greedy by default, which means that it gobbles up as much as it can, so it will run up to the last comma in your string. The easiest solution would be to turn it into a non-greedy quantifier by adding a question mark after the asterisk, changing that part of your expression to look like this:
(.*?)(?:,)
For the solution I proposed at the top of this answer, I also removed the part matching the comma, because I think it's easier just to match everything between quotes. As for your bonus question, to replace the matched value instead of having to hardcode gbpFormat, I used a backreference ($1), which will insert the first matched group into the replacement string.
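To see both effects side by side, here is a quick sketch run against the text from the question. The variable names are mine, and the question's pattern is written without its redundant backslashes.

```javascript
var txt = '{"columns":[{"text":"A","value":80},' +
  '{"text":"B","renderer":"gbpFormat","value":80},' +
  '{"text":"C","value":80}]}';

// The question's greedy pattern: (.*) runs up to the LAST comma in the
// string, so everything up to and including column C's "text" is replaced.
var greedy = txt.replace(/"renderer":(.*)(?:,)/g, '"renderer":gbpFormat,');

// The lazy pattern with a backreference: only the quoted value is matched,
// and $1 puts it back without the quotes.
var fixed = txt.replace(/"renderer":"(.*?)"/g, '"renderer":$1');

console.log(greedy);
console.log(fixed);
```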
Don't manipulate JSON with regexp. It's too likely that you will break it, as you have found, and more importantly there's no need to.
In addition, once you have changed
'{"columns": [..."renderer": "gbpFormat", ...]}'
into
'{"columns": [..."renderer": gbpFormat, ...]}' // remove quotes from gbpFormat
then this is no longer valid JSON. (JSON requires that property values be numbers, quoted strings, true, false, null, objects, or arrays.) So you will not be able to parse it, or send it anywhere and have it interpreted correctly.
Therefore you should parse it to start with, then manipulate the resulting actual JS object:
var object = JSON.parse(txt);
object.columns.forEach(function(column) {
column.renderer = gbpFormat;
});
If you want to replace any quoted value of the renderer property with the value itself, then you could try
column.renderer = window[column.renderer];
Assuming that the value is available in the global namespace.
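If the renderer functions are not globals, the same idea works with an explicit lookup table. A sketch under that assumption: the renderers object and the body of gbpFormat are invented for illustration, and only columns that actually name a known renderer are touched.

```javascript
// Hypothetical registry of renderer functions, keyed by the names used in the JSON.
var renderers = {
  gbpFormat: function (value) {
    return "GBP " + value.toFixed(2);
  }
};

var txt = '{"columns":[{"text":"A","value":80},' +
  '{"text":"B","renderer":"gbpFormat","value":80},' +
  '{"text":"C","value":80}]}';

var config = JSON.parse(txt);
config.columns.forEach(function (column) {
  // Swap the renderer's name for the function itself, if we know it.
  if (typeof column.renderer === "string" &&
      Object.prototype.hasOwnProperty.call(renderers, column.renderer)) {
    column.renderer = renderers[column.renderer];
  }
});

console.log(config.columns.length);
console.log(typeof config.columns[1].renderer);
```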
This question falls into the category of "I need a regexp, or I wrote one and it's not working, and I'm not really sure why it has to be a regexp, but I heard they can do all kinds of things, so that's just what I imagined I must need." People use regexps to try to do far too many complex matching, splitting, scanning, replacement, and validation tasks, including on complex languages such as HTML, or in this case JSON. There is almost always a better way.
The only time I can imagine wanting to manipulate JSON with regexps is if the JSON is broken somehow, perhaps due to a bug in server code, and it needs to be fixed up in order to be parseable.

Comparing Strings to find Missing Substrings

I am working on a ServiceNow business rule and want to compare two strings and capture the substrings that are missing from the second one. For example:
var str1 = "subStr1,subStr2,subStr3,subStr4"
var str2 = "subStr1,subStr3"
magicFunction(str1,str2);
and the magic function would return "subStr2,subStr4"
I'd probably have better luck turning the strings into arrays and comparing them that way, and if there is a recommended method for that I can use it, but I have to push a comma-separated string back to the form field for it to work right; something about how sys_ids behave seems to demand it.
Basically, I have a field on a form that holds a list of sys_ids. If one of those sys_ids is removed from the list, I need to capture it and make some change on the record it belongs to.
If you're not against using libraries, underscore has an easy way to do this with arrays. See http://underscorejs.org/#difference
function magicFunction(str1, str2) {
return _.difference(str1.split(","),str2.split(",")).join(",");
}
The ArrayUtil Script Include in ServiceNow has a "diff" function, once you use split(",") on your Strings to create two Arrays.
e.g.,
var myDiffArray = new ArrayUtil().diff(myArray1, myArray2);
Assuming your list has commas separating the items, you can use split(",") and join(",") to turn the strings into arrays and back into comma-delimited lists, and then you can find the differences pretty easily using this method of finding array differences.
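For completeness, the split/compare/join approach can also be written without any library. A minimal sketch that sticks to ES5 array methods (filter and indexOf) and assumes the items contain no commas or stray whitespace:

```javascript
// Returns the comma-separated items of str1 that do not appear in str2.
function magicFunction(str1, str2) {
  var remaining = str2.split(",");
  return str1.split(",").filter(function (item) {
    return remaining.indexOf(item) === -1;
  }).join(",");
}

var str1 = "subStr1,subStr2,subStr3,subStr4";
var str2 = "subStr1,subStr3";
console.log(magicFunction(str1, str2)); // subStr2,subStr4
```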

JavaScript + RegEx Complications- Searching Strings Not Containing SubString

I am trying to use a RegEx to search through a long string, and I am having trouble coming up with an expression. I am trying to search through some HTML for a set of tags beginning with a tag containing a certain value and ending with a different tag containing another value. The code I am currently using to attempt this is as follows:
matcher = new RegExp(".*(<[^>]+" + startText + "((?!" + endText + ").)*" + endText + ")", 'g');
data.replace(matcher, "$1");
The strangeness around the middle ( ((\\?\\!endText).)* ) is borrowed from another thread, found here, that seems to describe my problem. The issue I am facing is that the expression matches the beginning tag, but it does not find the ending tag and instead includes the remainder of the data. Also, the lookaround in the middle slowed the expression down a lot. Any suggestions as to how I can get this working?
EDIT: I understand that parsing HTML in RegEx isn't the best option (makes me feel dirty), but I'm in a time-crunch and any other alternative I can think of will take too long. It's hard to say what exactly the markup I will be parsing will look like, as I am creating it on the fly. The best I can do is to say that I am looking at a large table of data that is collected for a range of items on a range of dates. Both of these ranges can vary, and I am trying to select a certain range of dates from a single row. The approximate value of startText and endText are \\#\\#ASSET_ID\\#\\#_<YYYY_MM_DD>. The idea is to find the code that corresponds to this range of cells. (This edit could quite possibly have made this even more confusing, but I'm not sure how much more information I could really give without explaining the entire application).
EDIT: Well, this was a stupid question. Apparently, I just forgot to add .* after the last paren. Can't believe I spent so long on this! Thanks to those of you that tried to help!
First of all, why is there a .* Dot Asterisk in the beginning? If you have text like the following:
This is my Text
And you want "my Text" pulled out, you do my\sText. You don't have to do the .*.
That being said, since all you'll be matching now is what you need, you don't need the main Capture Group around "Everything". This: .*(xxx) is a huge no-no, and can almost always be replaced with this: xxx. In other words, your regex could be replaced with:
<[^>]+xxx((?!zzz).)*zzz
From there I examine what it's doing.
You are looking for an HTML opening delimiter <. You consume it.
You consume at least one character that is NOT a closing HTML delimiter, but can consume many. This is important, because if your tag is <table border=2>, then you have, at minimum, consumed <t so far, if not more.
You are now looking for the StartText. If that StartText is table, you'll never find it, because you have already consumed the t. So replace that + with a *.
The regex still succeeds as long as what follows is NOT the closing text, but it starts from the VERY END of the document, because the asterisk is being greedy. I suggest making it lazy by adding a ?.
When the backtracking fails, it will look for the closing text and gather it successfully.
The result of that logic:
<[^>]*xxx((?!zzz).)*?zzz
If you're going to use a dot anyway, which is okay for new Regex writers, but not suggested for seasoned, I'd go with this:
<[^>]*xxx.*?zzz
So for Javascript, your code would say:
matcher = new RegExp("<[^>]*" + startText + ".*?" + endText, 'gi');
I put the IgnoreCase "i" in there for good measure, but you may or may not want that.
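Two practical caveats with building the pattern from strings: startText and endText may contain regex metacharacters, and . does not match line breaks. A sketch that handles both, assuming the helper names escapeRegExp and findRange (invented here) and using made-up sample markup:

```javascript
// Escapes every character that has a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the first span that starts at a tag containing
// startText and ends at the next occurrence of endText, or null.
function findRange(data, startText, endText) {
  var matcher = new RegExp(
    "<[^>]*" + escapeRegExp(startText) + "[\\s\\S]*?" + escapeRegExp(endText),
    "i"
  );
  var found = data.match(matcher);
  return found ? found[0] : null;
}

// Illustrative markup only; the real table in the question is not shown.
var data = '<tr><td id="A_2020_01_01">1</td>\n' +
  '<td id="A_2020_01_02">2</td><td id="A_2020_01_03">3</td></tr>';
console.log(findRange(data, "A_2020_01_01", "A_2020_01_02"));
```

[\s\S] is used in place of the dot so that the match can span multiple lines.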

please extract a bit of info from this string (without regex so that i can understand it)

On my web app, I take a look at the current URL, and if the current URL has a form like this:
http://www.domain.com:11000/invite/abcde16989/root/index.html
-> All I need is to extract the ID which consists of 5 letters and 5 numbers (abcde16989) in another variable for further use.
So I need this:
var current_url = "the whole path, not just the hostname";
if (current_url has ID)
var ID = abcde16989;
You could always use split using / as the delimiter if the ID is always going to be in the same position, eg
var parts = current_url.split('/');
var id = parts[4];
Though your requirement of matching "5 letters and 5 numbers" really does suit a regex match.
var match = current_url.match(/[a-zA-Z]{5}[0-9]{5}/); // an array of matches, or null if not found
var id = match ? match[0] : null; // the matched text itself is element 0
I'm assuming you don't need the full URL, but just the pathname to get your ID. Use the following:
var current_url = window.location.pathname; // gets the pathname, e.g. "/invite/abcde16989/root/index.html"
var split_url = current_url.split('/'); // splits the path at each /
var current_id = split_url[2]; // index 0 is "" (the path starts with /), 1 is "invite", 2 is your id, 3 would be "root"
alert(current_id);
Firstly, this doesn't need jQuery; this is simple JavaScript. I'll amend your tags after I've replied to reflect this.
A regex would actually be quite an easy way to achieve this, and I don't think a simple one like this would be as difficult to understand as you think.
So I'll answer with the regex option anyway and then move on to other options:
var url = "http://www.domain.com:11000/invite/abcde16989/root/index.html";
//first method:
var id = url.match('^http://www.domain.com:11000/invite/(.+)/root/index.html$')[1];
//second method: (if you don't know exact format of the rest of the URL but you do know the format of the ID string)
var id = url.match('/([a-z]{5}[0-9]{5})/')[1];
The first method will get the string in the position you specified within the URL. It won't check the formatting; it just looks at the rest of the URL and grabs the bit of it you're asking for. This should be really easy to understand: It's basically just your URL, but with (.+) where your ID goes.
The second method looks specifically for a string in the format you asked for -- ie five letters and then five numbers. This is admittedly a bit harder to read, but should be fairly self explanatory if you look at it given those criteria.
In both cases, the regex itself will return an array of results, with array element zero being the whole string (ie in the first case, including the rest of the URL). This is where the (brackets) come in (ie the bit where we said (.+)). This tells the match function to put the contents of the brackets into another array element so we can use it. In both cases, this means that we can read the ID in array element [1].
Okay, so how about the non-regex options:
In fact, the simplest non-regex approach is the one a couple of other people have already given you: split(). It accepts either a plain string or a regex as its separator, and with a plain string such as '/' no regex is involved.
I'm going to guess that one of these answers will be good enough for you (either mine or, more likely, one of the answers using split()). However, if you'd rather not rely on the ID's position among the path segments, you can do some slightly more complex string manipulation, probably using substring() (though there are other ways to do it).
Something along the lines of this:
var prefixstring="http://www.domain.com:11000/invite/";
var prefixlen=prefixstring.length;
var idlen=10;
var id = url.substring(prefixlen,idlen+prefixlen);
This gets the length of the portion of the URL in front of the ID, and then uses substring() to snip out the required bit. But I'm sure you'll agree that the regex options are simpler? ;-)
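If you want the "5 letters and 5 numbers" check as well, still without any regex, something like the following sketch would do it. The function names are made up, and it assumes "letters" means a to z in either case.

```javascript
// True for a single character in a-z or A-Z.
function isLetter(ch) {
  var lower = ch.toLowerCase();
  return lower >= "a" && lower <= "z";
}

// True for a single character in 0-9.
function isDigit(ch) {
  return ch >= "0" && ch <= "9";
}

// Hypothetical helper: returns the first path segment made of exactly
// 5 letters followed by 5 digits, or null if there is none.
function extractId(url) {
  var parts = url.split("/");
  for (var i = 0; i < parts.length; i++) {
    var part = parts[i];
    if (part.length !== 10) {
      continue;
    }
    var looksLikeId = true;
    for (var j = 0; j < 10; j++) {
      var ch = part.charAt(j);
      if (j < 5 ? !isLetter(ch) : !isDigit(ch)) {
        looksLikeId = false;
        break;
      }
    }
    if (looksLikeId) {
      return part;
    }
  }
  return null;
}

var url = "http://www.domain.com:11000/invite/abcde16989/root/index.html";
console.log(extractId(url)); // abcde16989
```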
Hope that helps. (and I hope it helps you feel less afraid of regex!)
