Understanding regex in JavaScript .replace()

I shouldn't say I actually have a "problem" (the code seems to... work? Although one time it threw an error in the console, possibly due to environmental reasons), but I'm picking apart a piece of code and I see this:
key = key.replace(/[\[]/,"\\\[").replace(/[\]]/,"\\\]");
"key" is passed into the function containing this line. As you might expect, it's a string that's ultimately used as the needle in a haystack.
It's sanitizing something, but I can't figure out what it's sanitizing (primarily because I don't have any fluency in regex I guess). JSLint is barking about something also (Unexpected ']') but I think it's a false positive because it's not parsing the regex.
I wasn't sure whether to ask this at Stack Overflow or at Code Review, but it's not really a "review", so here it is.
Any insight from you regexy type people would be much appreciated.

If I got it right, it replaces [ with \[ and ] with \], so basically it escapes square brackets. Without the g flag, only the first occurrence of each bracket is replaced.

These all do the same thing (globally):
var key = 'a[b] [c] [][]d'.replace(/[\[]/g,"\\\[").replace(/[\]]/g,"\\\]");
console.log(key);
var key = 'a[b] [c] [][]d'.replace(/[\[]/g,'\\[').replace(/[\]]/g,'\\]');
console.log(key);
var key = 'a[b] [c] [][]d'.replace(/([\[\]])/g,"\\$1");
console.log(key);
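To make the difference concrete, here is a small sketch comparing the original (non-global) line with the global version. The final RegExp check is only an assumption about why the brackets get escaped at all (the question doesn't show what happens to key afterwards); the variable names are made up for illustration.

```javascript
var input = "a[b] [c] [][]d";

// "\\\[" is backslash + "[": the "\[" part is a redundant escape for "[".
var sameLiteral = "\\\[" === "\\[";

// No g flag: only the first "[" and the first "]" are escaped.
var firstOnly = input.replace(/[\[]/, "\\\[").replace(/[\]]/, "\\\]");

// g flag plus a capture group: every bracket is escaped in one pass.
var all = input.replace(/([\[\]])/g, "\\$1");

// Assumption: the escaped key is later fed to new RegExp(...),
// where the escaped brackets match literal brackets.
var matchesLiterally = new RegExp(all).test(input);

console.log(sameLiteral);      // true
console.log(firstOnly);        // a\[b\] [c] [][]d
console.log(all);              // a\[b\] \[c\] \[\]\[\]d
console.log(matchesLiterally); // true
```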

Related

JavaScript ignores extra spaces?

In this JavaScript quiz at WsCube Tech there was a question whether JavaScript ignores extra spaces. The correct answer was “False”.
Isn’t JavaScript white-space independent? I have read in many blogs that it is. So why is my answer wrong?

I seriously wouldn’t trust that site…
That’s the short, sufficient answer I could give, but I’d like to say two other things:
Firstly, JavaScript doesn’t ignore spaces within strings:
var str = "Hello                World";
This string has 16 spaces and they won’t get ignored just like that. However, in-between some operators, keywords and tokens, JavaScript does ignore spaces:
var test = [ 0 , 1 , 3 ] . slice ( 2 ) ;
This line is parsed as
var test=[0,1,3].slice(2);
Still, the space between var and test isn’t ignored. Not all spaces are equal. This quiz question cannot be answered in its current form — well, or two forms…
Secondly, that quiz has a lot of inconsistencies, false information, outdated information and promotes bad practice. I’ve just sent them a huge list of things wrong with the quiz…
It’s much safer to stick to a more “trusted” site like the Mozilla Developer Network.

For one thing, you can terminate statements with a new line (automatic semicolon insertion).
var x = 1 // no semicolon
console.info(x)
Also look at this (which returns undefined):
return
12
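A quick sketch of that last case, wrapped in functions so it can actually run (the function names are made up for illustration). The line break after return is significant: a semicolon is inserted right there, so the 12 is never returned.

```javascript
function withLineBreak() {
  return
  12; // never reached: the statement above already ended at the line break
}

function onOneLine() {
  return 12;
}

console.log(withLineBreak()); // undefined
console.log(onOneLine());     // 12
```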

Why does this regexp return a match?

http://jsfiddle.net/sqee98xr/
var reg = /^(?!managed).+\.coffee$/
var match = '20150212214712-test-managed.coffee'.match(reg)
console.log(match) // prints '20150212214712-test-managed.coffee'
I want the regexp to match only if the word "managed" is not present in the string. How can I do that?

Negative lookaheads are weird. You have to match more than just the word you are looking for. It's weird, I know.
var reg = /^(?!.*managed).+\.coffee$/
http://jsfiddle.net/sqee98xr/3/
EDIT: It seems I really got under some people's skin with the "weird" descriptor and lay description. It's weird because on a surface level the term "negative lookahead" implies "look ahead and make sure the stuff in these parentheses isn't up there, then come back and continue matching". As a lover of regex, I still proclaim this naming is weird, especially to first-time users of the assertion. To me it's easier to think of it as a "not" operator as opposed to something which actually crawls forward and "looks ahead". In order to get behavior that resembles an actual "look ahead", you have to match everything before the search term, hence the .*.
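Note that simply removing the start-of-string (^) assertion does not work, however tempting it is to read ?! as "not". Without the anchor, the engine may start the match at any position that is not immediately followed by "managed", so this pattern still matches the original string:
var reg = /(?!managed).+\.coffee$/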

While @RyanWheale's solution is correct, the explanation isn't. The reason, essentially, is that a string that contains the word "managed" (such as "test-managed") can count as not "managed". To get an idea of this, first let's look at the regular expression:
/^(?!managed).+\.coffee$/
// (Not "managed")(one or more characters)(".")("coffee")
So first the string cannot start with the text "managed", then we need one or more characters, then a dot, followed by the text "coffee". Here is an example that fulfills this.
"Hello.coffee" [ PASS ]
Makes sense, "Hello" certainly is not "managed". Here is another example that works from your string:
"20150212214712-test-managed.coffee" [ PASS ]
Why? Because the lookahead is only checked at the start of the string, and "20150212214712-test-managed" does not start with "managed" even though it contains it. The computer does not know that "contains" is what you mean. It treats "20150212214712-test-managed" as a string that isn't "managed" in the same way "andflaksfj" isn't "managed". So the only way it fails is if "managed" is at the start of the string:
"managed.coffee" [ FAIL ]
This fails because the lookahead is checked at the very start of the string, where the next characters are exactly "managed", and the ^ anchor stops the engine from trying a later starting position. Note that a lookahead consumes no characters: it only inspects what follows the current position.
Finally the solution to this is as suggested by the other answer:
/^(?!.*managed).+\.coffee$/
Now the string "20150212214712-test-managed.coffee" fails because the .* inside the lookahead can skip over any number of leading characters ("20150212214712-test-", for instance) to find "managed", so (?!.*managed) rejects any string that contains "managed" anywhere.
Hopefully this long explanation shows that negative lookaheads are not weird; they just take your request very literally.
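The behaviour of the two anchored patterns can be checked with a short sketch. The variable names are made up for illustration; the file names are the ones used above.

```javascript
var notAtStart = /^(?!managed).+\.coffee$/;    // only rejects names that START with "managed"
var notAnywhere = /^(?!.*managed).+\.coffee$/; // rejects names that CONTAIN "managed"

var names = [
  "Hello.coffee",
  "managed.coffee",
  "20150212214712-test-managed.coffee"
];

names.forEach(function (name) {
  console.log(name, notAtStart.test(name), notAnywhere.test(name));
});
// Hello.coffee true true
// managed.coffee false false
// 20150212214712-test-managed.coffee true false
```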

regex replace on JSON is removing an Object from Array

I'm trying to improve my understanding of Regex, but this one has me quite mystified.
I started with some text defined as:
var txt = "{\"columns\":[{\"text\":\"A\",\"value\":80},{\"text\":\"B\",\"renderer\":\"gbpFormat\",\"value\":80},{\"text\":\"C\",\"value\":80}]}";
and do a replace as follows:
txt.replace(/\"renderer\"\:(.*)(?:,)/g,"\"renderer\"\:gbpFormat\,");
which results in:
"{"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}"
What I expected was for the renderer attribute value to have its quotes removed, which has happened, but the C column is also completely missing! I'd really love for someone to explain how my regex has removed column C.
As an extra bonus, if you could explain how to remove the quotes around any value for renderer (i.e. so I don't have to hard-code the value gbpFormat in the regex) that'd be fantastic.

You are using a greedy quantifier where you need a lazy one. Change this:
"renderer":(.*)(?:,)
              ^---- add the '?' here to make it lazy
To
"renderer":(.*?)(?:,)
Your code should be:
txt.replace(/\"renderer\"\:(.*?)(?:,)/g,"\"renderer\"\:gbpFormat\,");
If you are learning regex, take a look at this documentation to learn more about greediness. A nice extract to understand this is:
Watch Out for The Greediness!
Suppose you want to use a regex to match an HTML tag. You know that
the input will be a valid HTML file, so the regular expression does
not need to exclude any invalid use of sharp brackets. If it sits
between sharp brackets, it is an HTML tag.
Most people new to regular expressions will attempt to use <.+>. They
will be surprised when they test it on a string like This is a
<EM>first</EM> test. You might expect the regex to match <EM> and when
continuing after that match, </EM>.
But it does not. The regex will match <EM>first</EM>. Obviously not
what we wanted. The reason is that the plus is greedy. That is, the
plus causes the regex engine to repeat the preceding token as often as
possible. Only if that causes the entire regex to fail, will the regex
engine backtrack. That is, it will go back to the plus, make it give
up the last iteration, and proceed with the remainder of the regex.
Like the plus, the star and the repetition using curly braces are
greedy.
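The extract's point can be tried directly. This is a small sketch using the test string from the quoted text; the variable names are made up, and the negated character class is a common alternative rather than something the extract itself proposes.

```javascript
var html = "This is a <EM>first</EM> test";

// Greedy: .+ runs to the end of the string and backtracks only to the LAST ">".
var greedy = html.match(/<.+>/)[0];

// Lazy: .+? stops at the first ">" that lets the match succeed.
var lazy = html.match(/<.+?>/)[0];

// Alternative: a negated class can never run past a ">".
var negated = html.match(/<[^>]+>/)[0];

console.log(greedy);  // <EM>first</EM>
console.log(lazy);    // <EM>
console.log(negated); // <EM>
```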

Try like this:
txt = txt.replace(/"renderer":"(.*?)"/g,'"renderer":$1');
The issue in the expression you were using was this part:
(.*)(?:,)
By default, the * quantifier is greedy, which means that it gobbles up as much as it can, so it will run up to the last comma in your string. The easiest solution would be to turn that into a non-greedy quantifier by adding a question mark after the asterisk, changing that part of your expression to look like this:
(.*?)(?:,)
For the solution I proposed at the top of this answer, I also removed the part matching the comma, because I think it's easier just to match everything between quotes. As for your bonus question, to replace the matched value instead of having to hardcode gbpFormat, I used a backreference ($1), which will insert the first matched group into the replacement string.
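Here is a runnable sketch of both behaviours on the string from the question (written with single quotes to avoid the escaped double quotes; the variable names are made up for illustration).

```javascript
var txt = '{"columns":[{"text":"A","value":80},' +
          '{"text":"B","renderer":"gbpFormat","value":80},' +
          '{"text":"C","value":80}]}';

// Greedy: (.*) runs to the LAST comma, which sits inside column C,
// so everything up to and including '"C",' is swallowed by the match.
var greedyResult = txt.replace(/"renderer":(.*)(?:,)/g, '"renderer":gbpFormat,');

// Lazy, matching only the quoted value: $1 re-inserts it without the quotes.
var lazyResult = txt.replace(/"renderer":"(.*?)"/g, '"renderer":$1');

console.log(greedyResult);
// {"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}
console.log(lazyResult);
// {"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80},{"text":"C","value":80}]}
```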

Don't manipulate JSON with regexp. It's too likely that you will break it, as you have found, and more importantly there's no need to.
In addition, once you have changed
'{"columns": [..."renderer": "gbpFormat", ...]}'
into
'{"columns": [..."renderer": gbpFormat, ...]}' // remove quotes from gbpFormat
then this is no longer valid JSON. (JSON requires that property values be numbers, quoted strings, booleans, null, objects, or arrays.) So you will not be able to parse it, or send it anywhere and have it interpreted correctly.
Therefore you should parse it to start with, then manipulate the resulting actual JS object:
var object = JSON.parse(txt);
object.columns.forEach(function(column) {
  column.renderer = gbpFormat;
});
If you want to replace any quoted value of the renderer property with the value itself, then you could try
column.renderer = window[column.renderer];
Assuming that the value is available in the global namespace.
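A minimal sketch of the parse-then-modify approach. The renderers lookup object and the body of gbpFormat are hypothetical (the question never shows them); a plain object is used here instead of window so that the lookup also works outside a browser and only touches known names.

```javascript
var txt = '{"columns":[{"text":"A","value":80},' +
          '{"text":"B","renderer":"gbpFormat","value":80},' +
          '{"text":"C","value":80}]}';

// Hypothetical registry of renderer functions, keyed by the names used in the JSON.
var renderers = {
  gbpFormat: function (value) {
    return "GBP " + value.toFixed(2);
  }
};

var config = JSON.parse(txt);

config.columns.forEach(function (column) {
  // Only swap the string for a function when the name is actually registered.
  if (typeof column.renderer === "string" &&
      Object.prototype.hasOwnProperty.call(renderers, column.renderer)) {
    column.renderer = renderers[column.renderer];
  }
});

console.log(config.columns.length);            // 3
console.log(typeof config.columns[1].renderer); // function
console.log(config.columns[1].renderer(80));   // GBP 80.00
```

Unlike the regex version, all three columns survive, and columns without a renderer are left alone.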
This question falls into the category of "I need a regexp, or I wrote one and it's not working, and I'm not really sure why it has to be a regexp, but I heard they can do all kinds of things, so that's just what I imagined I must need." People use regexps to try to do far too many complex matching, splitting, scanning, replacement, and validation tasks, including on complex languages such as HTML, or in this case JSON. There is almost always a better way.
The only time I can imagine wanting to manipulate JSON with regexps is if the JSON is broken somehow, perhaps due to a bug in server code, and it needs to be fixed up in order to be parseable.

Regex For Parsing Object Literal

I'm trying to parse out a JavaScript object literal from a script block on a page. Here's the example I have of the data:
//End Update for CASE00370003 2011/08/22
stores[0] = {
    'fullName' : 'Bobs Commons',
    'street1' : '23 Chestnut Commons Dr'
};
//Some more comments
stores[1] = {
    'fullName' : 'Gove Wood',
    'street1' : '65 Lake Rd'
};
So far I've come up with:
/^(stores\[)(\d){1,2}(])(.|\n)*};$/m
However, because the pattern ends with "};", the greedy match runs to the last occurrence of }; (the one closing stores[1]), so the entries aren't captured individually. Thanks!

Seems that what you need is 'non-greedy' matches (*? instead of *) - it captures the shortest match rather than the longest.
So the regex will look like:
/^(stores\[)(\d){1,2}(])(.|\n)*?};$/m
Having suggested that, I still have to note that in general you should not do this with regular expressions at all :). Your approach will break if a line inside the object ever ends with "};", and patterns that stop at the first closing brace break as soon as a string property contains '}' (e.g., 'fullName' : 'Bobs Comm}ons').
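As a rough sketch of the non-greedy idea, the snippet below pulls each stores[n] block out of the sample data. It uses a slightly simplified variant of the pattern ([\s\S]*? in place of (.|\n)*?, no capture groups, and the g flag so that every block is returned); the variable names are made up for illustration.

```javascript
var source = [
  "//End Update for CASE00370003 2011/08/22",
  "stores[0] = {",
  "    'fullName' : 'Bobs Commons',",
  "    'street1' : '23 Chestnut Commons Dr'",
  "};",
  "//Some more comments",
  "stores[1] = {",
  "    'fullName' : 'Gove Wood',",
  "    'street1' : '65 Lake Rd'",
  "};"
].join("\n");

// m: ^ and $ match at line boundaries; g: return every block;
// [\s\S]*? lazily consumes anything (including newlines) up to the first "};" at a line end.
var blocks = source.match(/^stores\[\d{1,2}\][\s\S]*?\};$/gm);

console.log(blocks.length); // 2
console.log(blocks[0]);     // the stores[0] block only
console.log(blocks[1]);     // the stores[1] block only
```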

In addition to @amakhrov's answer above:
you could search for successive closing curly braces until JSON.parse(literal_string); executes without throwing an exception.
UPD: I missed an important fact: for this to work, the literal needs to be valid JSON (double-quoted keys and strings, unlike the single-quoted sample in the question).
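A sketch of that idea, under the stated assumption that the literal is valid JSON. The helper name extractFirstJsonObject and the double-quoted sample are hypothetical; the sample deliberately puts a "}" inside a string value to show why a single search for the first closing brace is not enough.

```javascript
// Try each successive "}" as the end of the literal until JSON.parse accepts the slice.
function extractFirstJsonObject(text) {
  var start = text.indexOf("{");
  if (start === -1) {
    return null;
  }
  var end = text.indexOf("}", start);
  while (end !== -1) {
    try {
      return JSON.parse(text.slice(start, end + 1));
    } catch (e) {
      // Not a complete object yet: move on to the next closing brace.
      end = text.indexOf("}", end + 1);
    }
  }
  return null;
}

// Hypothetical input, double-quoted so that it is valid JSON.
var sample = 'stores[0] = {"fullName": "Bobs Comm}ons", "street1": "23 Chestnut Commons Dr"};';
var store = extractFirstJsonObject(sample);

console.log(store.fullName); // Bobs Comm}ons
console.log(store.street1);  // 23 Chestnut Commons Dr
```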

PREPARE TO HAVE YOUR MIND BLOWN.
J.S.O.N. (the thing several answers have mentioned)
--->
J ava-S cript O bject N otation.
Cool, right? Since JS' declaration syntax is so useful, it became a neutral data format. That's the format your web request is returning, and JavaScript even has special libraries for parsing it. I think you may have just been unclear on that part.

Grabbing the third fragment between square brackets

Still completely stuck with regexes and square brackets. Hopefully someone can help me out.
Say I have a string like this:
room_request[1][1][2011-08-21]
How would I grab the third fragment out of it?
I tried the following, but I'm not exactly sure what I'm doing so it's fairly hard to figure out where I'm going wrong.
.match(/\[(.*?)\]/);
But this returns the [1] fragment. (The first one, I guess).
So then, I asked here on SO and people told me to add a global flag:
.match(/\[(.*?)\]/g)[2];
In other cases that I've used this regex, this worked fine. However, in this case, I want the stuff INSIDE the square brackets. It returns:
[2011-08-21]
But I really want 2011-08-21.
How can I do this? Thanks a lot.
If anyone could recommend any decent resources about regular expressions, that'd be great as well. I'm starting to understand the very basics but most of this stuff is far too confusing atm. Thanks.

Two possible methods. To grab the third bracketed expression:
.match(/\[.*?\]\[.*?\]\[(.*?)\]/);
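Or, if you know that the expression you want is always at the end of the string, use a negated character class so the capture cannot run across an earlier closing bracket (a lazy .*? anchored with $ would start at the first [ and capture 1][1][2011-08-21):
.match(/\[([^\]]*)\]$/);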
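A short sketch comparing the end-anchored variants on the string from the question (variable names are made up). It shows why laziness alone is not enough when the pattern is anchored with $: the engine still takes the leftmost possible starting bracket and then stretches the lazy group until the anchor fits.

```javascript
var str = "room_request[1][1][2011-08-21]";

// Counting brackets from the left works with lazy groups.
var thirdByPosition = str.match(/\[.*?\]\[.*?\]\[(.*?)\]/)[1];

// Lazy group + $: the match starts at the FIRST "[" and expands to the end.
var lazyAtEnd = str.match(/\[(.*?)\]$/)[1];

// Negated class + $: the group cannot cross a "]", so only the last fragment fits.
var negatedAtEnd = str.match(/\[([^\]]*)\]$/)[1];

console.log(thirdByPosition); // 2011-08-21
console.log(lazyAtEnd);       // 1][1][2011-08-21
console.log(negatedAtEnd);    // 2011-08-21
```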

var str = "room_request[1][1][2011-08-21]"
var val = str.match(/\[[^\]]*\]\[[^\]]*\]\[([^\]]*)\]/);
alert(val[1]);

This is a little less messy I think:
var r = "room_request[1][1][2011-08-21]";
var match = r.match(/(?:\[([^\]]+)\]){3}/);
console.log(match[1]);
Basically, it matches three consecutive bracketed groups. The result array has two entries: index 0 is the whole match, [1][1][2011-08-21] (a match array always starts with the full match), and index 1 is the capture group, which keeps only its last repetition: 2011-08-21
My regex is a little rusty, but this certainly works.
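For completeness, a sketch of two ways to get at the text inside the brackets: the repeated group from above, and a plain exec loop with the global flag that collects every fragment (the variable names are made up). The loop is the closest fix for the original .match(...g)[2] attempt, since a global match only returns whole matches, not capture groups.

```javascript
var str = "room_request[1][1][2011-08-21]";

// Repeated group: index 0 is the whole match, index 1 the LAST repetition's capture.
var repeated = str.match(/(?:\[([^\]]+)\]){3}/);

// exec loop with g: each call returns the next match, including its capture group.
var re = /\[([^\]]*)\]/g;
var fragments = [];
var m;
while ((m = re.exec(str)) !== null) {
  fragments.push(m[1]);
}

console.log(repeated[0]);  // [1][1][2011-08-21]
console.log(repeated[1]);  // 2011-08-21
console.log(fragments);    // [ '1', '1', '2011-08-21' ]
console.log(fragments[2]); // 2011-08-21
```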
