Grabbing the third fragment between square brackets

Still completely stuck with regexes and square brackets. Hopefully someone can help me out.
Say I have a string like this:
room_request[1][1][2011-08-21]
How would I grab the third fragment out of it?
I tried the following, but I'm not exactly sure what I'm doing so it's fairly hard to figure out where I'm going wrong.
.match(/\[(.*?)\]/);
But this returns the [1] fragment. (The first one, I guess).
So then, I asked here on SO and people told me to add a global flag:
.match(/\[(.*?)\]/g)[2];
In other cases that I've used this regex, this worked fine. However, in this case, I want the stuff INSIDE the square brackets. It returns:
[2011-08-21]
But I really want 2011-08-21.
How can I do this? Thanks a lot.
If anyone could recommend any decent resources about regular expressions, that'd be great as well. I'm starting to understand the very basics, but most of this stuff is still far too confusing at the moment. Thanks.

Two possible methods. To grab the third bracketed expression:
.match(/\[.*?\]\[.*?\]\[(.*?)\]/);
Or, if you know that the expression you want is always at the end of the string:
.match(/\[(.*?)\]$/);

var str = "room_request[1][1][2011-08-21]"
var val = str.match(/\[[^\]]*\]\[[^\]]*\]\[([^\]]*)\]/);
alert(val[1]);

This is a little less messy I think:
var r = "room_request[1][1][2011-08-21]";
var match = r.match(/(?:\[([^\]]+)\]){3}/);
console.log(match[1]);
Basically, it picks out the third pair of square brackets containing something. The result has two entries: match[0] is the whole matched text, [1][1][2011-08-21], and match[1] is the date, 2011-08-21. The capture group sits inside the {3} repetition, so it keeps only the value from its last iteration.
My regex is a little rusty, but this certainly works.
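As a more general variant of the answers above, the sketch below collects the contents of every bracketed fragment and then picks one by position. The helper name bracketContents is made up for illustration, and String.prototype.matchAll assumes an ES2020-capable runtime. Unlike match with the g flag, matchAll keeps the capture groups of each match, which is why the brackets can be dropped.

```javascript
// Hypothetical helper: returns the text inside each [...] pair, in order.
function bracketContents(str) {
  // [^\]]* matches anything except a closing bracket, so each match stays inside one pair.
  return Array.from(str.matchAll(/\[([^\]]*)\]/g), (m) => m[1]);
}

const fragments = bracketContents("room_request[1][1][2011-08-21]");
console.log(fragments[2]); // third fragment, without the brackets
```

Indexing the returned array avoids repeating the bracket pattern once per fragment.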

Related

What regular expression would I use to grab a certain part of this link?

https://www.twitch.tv/averagepothead/clip/TiredRoughElkSquadGoals
I would like to use a regular expression to specifically grab everything after /clip/, aka the five random words that denote the clip "id". I've been looking up other examples on here, but unfortunately when I write my own expressions based on them I don't get it exactly right. If anyone would be able to point me in the right direction, that would be amazing. Thank you!

Regex? Arguably wrong tool for the job
const [dontcare, words] = url.split('clip/');
To show what I mean, here's a quick-and-dirty regex version:
const match = url.match(/[a-zA-Z0-9\/\.:]+clip\/(\w+)/);
const words = match && match[1];
That regex is pretty gnarly for such a basic task. You could make it shorter:
/.*clip\/(\w+)/
at the cost of making it even slower than it already is. Regexes are great for stuff that can't be represented simply as a quick string operation, but are more trouble than they're worth for something like this.
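In the same spirit of avoiding a regex, here is a minimal sketch that uses the standard URL class to split the path into segments and take the one after "clip". The helper name clipId is hypothetical, and the sketch assumes the link is an absolute URL (new URL throws otherwise).

```javascript
// Hypothetical helper: returns the path segment that follows "clip", or null.
function clipId(link) {
  // pathname is "/averagepothead/clip/TiredRoughElkSquadGoals" for the sample link
  const segments = new URL(link).pathname.split("/");
  const i = segments.indexOf("clip");
  return i !== -1 && segments[i + 1] ? segments[i + 1] : null;
}

console.log(clipId("https://www.twitch.tv/averagepothead/clip/TiredRoughElkSquadGoals"));
```

Working on pathname rather than the raw string means a query string or fragment after the clip id would not end up in the result.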

Match a word unless it is preceded by an equals sign?

I have the following string
class=use><em>use</em>
that when searched using us I want to transform into
class=use><em><b>us</b>e</em>
I've tried looking at related answers but I can't quite get it working the way I want it to. I'm especially interested in this answer's callback approach.
Help appreciated

This is a good exercise for writing regular expressions, and here's a possible solution.
"useclass=use><em>use</em>".replace(/([^=]|^)(us)/g, "$1<b>$2</b>");
// returns "<b>us</b>eclass=use><em><b>us</b>e</em>"
([^=]|^) ensures that the prefix of any matched us is either not an equal sign, or it's the start of the string.
As @jamiec pointed out in the comments, if you are using this to parse/modify HTML, just stop right now. It's mathematically impossible to parse a context-free grammar (CFG) with a regular grammar (even with enhanced JS regexps you will have a bad time trying to achieve that).

If you can make any assumptions about the structure of your document, you may be better off using an approach that operates on DOM elements directly rather than parsing the whole document with a regex.
Parsing HTML with a regex has certain problems that can be painful to deal with.
var element = document.querySelector('em');
element.innerHTML = element.innerHTML.replace('us', '<b>us</b>');
<div class=use><em>use</em>
</div>

I would first look for any character other than the equals sign [^=] and separate it by parentheses so that I can use it again in my replacement. Then another set of parentheses around the two characters us ought to do it:
var re = /([^=]|^)(us)/
That will give you two capture groups to work with (inside the parentheses), which you can represent with $1 and $2 in your replacement string.
str.replace( /([^=]|^)(us)/, '$1<b>$2</b>' );
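Since the question asks about a callback approach, here is a minimal sketch of the same pattern with a replacement function instead of $1/$2. The function name highlight is hypothetical, and escaping the search term is an added assumption so that terms containing regex metacharacters are treated literally.

```javascript
// Hypothetical helper: wraps each occurrence of `term` in <b>...</b>
// unless it is directly preceded by an equals sign.
function highlight(text, term) {
  // Escape regex metacharacters so the term is matched literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("(^|[^=])(" + escaped + ")", "g");
  // The callback receives the whole match followed by the two capture groups.
  return text.replace(re, (whole, prefix, hit) => prefix + "<b>" + hit + "</b>");
}

console.log(highlight("class=use><em>use</em>", "us"));
// class=use><em><b>us</b>e</em>
```

Because the prefix character is consumed by the match, two directly adjacent occurrences would not both be wrapped; that limitation is shared with the $1/$2 version.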

JavaScript RegEx match unless wrapped with [nocode][/nocode] tags

My current code is:
var user_pattern = this.settings.tag;
user_pattern = user_pattern.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&"); // escape regex
var pattern = new RegExp(user_pattern.replace(/%USERNAME%/i, "(\\S+)"), "ig");
Where this.settings.tag is a string such as "[user=%USERNAME%]" or "#%USERNAME%". The code uses pattern.exec(str) to find any username in the corresponding tag and works perfectly fine. For example, if str = "Hello, [user=test]" then pattern.exec(str) will find test.
This works fine, but I want to be able to stop it from matching if the string is wrapped in [nocode][/nocode] tags. For example, if str = "[nocode]Hello, [user=test], how are you?[/nocode]" then pattern.exec(str) should not match anything.
I'm not quite sure where to start. I tried using a (?![nocode]) before and after the pattern, but to no avail. Any help would be great.

I would just test if the string starts with [nocode] first:
/^\[nocode\]/.test('[nocode]');
Then simply do not process it.

Maybe filter out [nocode] before trying to find the username(s)?
pattern.exec(str.replace(/\[nocode\](.*?)\[\/nocode\]/g,''));
I know this isn't exactly what you asked for because now you have to use two separate regular expressions, however code readability is important too and doing it this way is definitely better in that aspect. Hope this helps 😉
JSFiddle: http://jsfiddle.net/1f485Lda/1/
It's based on this: Regular Expression to get a string between two strings in Javascript
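Putting the question's pattern-building code together with the "strip [nocode] first" idea gives something like the sketch below. The function name findUsernames is hypothetical, and the lazy [\s\S]*? is an assumption made so that several separate [nocode] blocks (possibly spanning lines) are removed one at a time rather than as a single span.

```javascript
// Hypothetical helper: returns every username matched by `tag`,
// ignoring anything inside [nocode]...[/nocode].
function findUsernames(str, tag) {
  // Escape regex metacharacters in the tag, as in the question.
  const escaped = tag.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
  const pattern = new RegExp(escaped.replace(/%USERNAME%/i, "(\\S+)"), "ig");
  // Lazy match so each [nocode] block is removed separately.
  const visible = str.replace(/\[nocode\][\s\S]*?\[\/nocode\]/gi, "");
  return Array.from(visible.matchAll(pattern), (m) => m[1]);
}

console.log(findUsernames("Hello, [user=test]", "[user=%USERNAME%]"));
console.log(findUsernames("[nocode]Hello, [user=test], how are you?[/nocode]", "[user=%USERNAME%]"));
```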

Why does this regexp return a match?

http://jsfiddle.net/sqee98xr/
var reg = /^(?!managed).+\.coffee$/
var match = '20150212214712-test-managed.coffee'.match(reg)
console.log(match) // prints '20150212214712-test-managed.coffee'
I want the regexp to match only if the word "managed" is not present in the string. How can I do that?

Negative lookaheads are weird. You have to match more than just the word you are looking for. It's weird, I know.
var reg = /^(?!.*managed).+\.coffee$/
http://jsfiddle.net/sqee98xr/3/
EDIT: It seems I really got under some people's skin with the "weird" descriptor and lay description. It's weird because on a surface level the term "negative lookahead" implies "look ahead and make sure the stuff in these parentheses isn't up there, then come back and continue matching". As a lover of regex, I still proclaim this naming is weird, especially to first-time users of the assertion. To me it's easier to think of it as a "not" operator as opposed to something which actually crawls forward and "looks ahead". In order to get behavior that resembles an actual "look ahead", you have to match everything before the search term, hence the .*.
Note that simply removing the start-of-string (^) assertion is not a fix. Without the anchor, the engine may start the match at any position where "managed" does not begin, so the following still matches the original string:
var reg = /(?!managed).+\.coffee$/

While @RyanWheale's solution is correct, the explanation isn't. The reason, essentially, is that a string that contains the word "managed" (such as "test-managed") can still count as not "managed". To get an idea of this, first let's look at the regular expression:
/^(?!managed).+\.coffee$/
// (Not "managed")(one or more characters)(".")("coffee")
So first, at the start of the string, the next characters cannot be the text "managed"; then we can have one or more characters, then a dot, followed by the text "coffee". Here is an example that fulfills this.
"Hello.coffee" [ PASS ]
Makes sense, "Hello" certainly is not "managed". Here is another example that works from your string:
"20150212214712-test-managed.coffee" [ PASS ]
Why? Because the lookahead is checked only at the position where it appears, which here is the start of the string. "20150212214712-test-managed" does not begin with "managed", even though it contains it, and the computer does not know that "contains" is what you mean. To the regex, "20150212214712-test-managed" is a string that isn't "managed" in the same way "andflaksfj" isn't "managed". So the only way it fails is if "managed" is at the start of the string:
"managed.coffee" [ FAIL ]
Here the lookahead sees "managed" immediately at position 0, so the assertion fails, and because of the ^ anchor the engine cannot retry the match from a later position.
Finally the solution to this is as suggested by the other answer:
/^(?!.*managed).+\.coffee$/
Now the string "20150212214712-test-managed.coffee" fails: the .* inside the lookahead can skip over any number of characters ("20150212214712-test-", for instance) before trying to match "managed", so the assertion rejects the string if "managed" occurs anywhere in it. A string without "managed" passes the lookahead and is then checked against the rest of the regexp ( .+\.coffee$ ) as before.
Hopefully this long explanation shows that negative lookaheads are not weird; they just take your request very literally.
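To make the difference concrete, the sketch below runs the three patterns discussed in this thread against the three sample file names. The variable names are made up for illustration; none of the patterns uses the g flag, so test() carries no state between calls.

```javascript
const startOnly = /^(?!managed).+\.coffee$/;    // lookahead is checked at position 0 only
const anywhere = /^(?!.*managed).+\.coffee$/;   // .* lets the lookahead scan the whole string
const unanchored = /(?!managed).+\.coffee$/;    // no ^, so a match may start at any position

const names = ["Hello.coffee", "20150212214712-test-managed.coffee", "managed.coffee"];
for (const name of names) {
  console.log(name, startOnly.test(name), anywhere.test(name), unanchored.test(name));
}
// Hello.coffee true true true
// 20150212214712-test-managed.coffee true false true
// managed.coffee false false true
```

Only the ^(?!.*managed) form rejects every name containing "managed"; the unanchored form accepts even "managed.coffee", because the match can begin at the "a".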

regex replace on JSON is removing an Object from Array

I'm trying to improve my understanding of Regex, but this one has me quite mystified.
I started with some text defined as:
var txt = "{\"columns\":[{\"text\":\"A\",\"value\":80},{\"text\":\"B\",\"renderer\":\"gbpFormat\",\"value\":80},{\"text\":\"C\",\"value\":80}]}";
and do a replace as follows:
txt.replace(/\"renderer\"\:(.*)(?:,)/g,"\"renderer\"\:gbpFormat\,");
which results in:
"{"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}"
What I expected was for the renderer attribute value to have its quotes removed, which has happened, but the C column is also completely missing! I'd really love for someone to explain how my regex has removed column C.
As an extra bonus, if you could explain how to remove the quotes around any value for renderer (i.e. so I don't have to hard-code the value gbpFormat in the regex) that'd be fantastic.

You are using a greedy operator while you need a lazy one. Change this:
"renderer":(.*)(?:,)
(add a '?' right after the '*' to make the quantifier lazy)
To
"renderer":(.*?)(?:,)
Your code should be:
txt.replace(/\"renderer\"\:(.*?)(?:,)/g,"\"renderer\"\:gbpFormat\,");
If you are learning regex, take a look at this documentation to learn more about greediness. A nice extract to understand this is:
Watch Out for The Greediness!
Suppose you want to use a regex to match an HTML tag. You know that
the input will be a valid HTML file, so the regular expression does
not need to exclude any invalid use of sharp brackets. If it sits
between sharp brackets, it is an HTML tag.
Most people new to regular expressions will attempt to use <.+>. They
will be surprised when they test it on a string like This is a
<EM>first</EM> test. You might expect the regex to match <EM> and when
continuing after that match, </EM>.
But it does not. The regex will match <EM>first</EM>. Obviously not
what we wanted. The reason is that the plus is greedy. That is, the
plus causes the regex engine to repeat the preceding token as often as
possible. Only if that causes the entire regex to fail, will the regex
engine backtrack. That is, it will go back to the plus, make it give
up the last iteration, and proceed with the remainder of the regex.
Like the plus, the star and the repetition using curly braces are
greedy.
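The behaviour described in the extract can be checked directly. This is a small sketch using a sample sentence with an <EM> tag pair; the variable names are illustrative only. It compares the greedy pattern with a lazy one and with a negated character class, which is a common alternative to laziness.

```javascript
const sample = "This is a <EM>first</EM> test";

const greedy = sample.match(/<.+>/)[0];     // runs to the last ">" in the string
const lazy = sample.match(/<.+?>/)[0];      // stops at the first ">" it can
const negated = sample.match(/<[^>]+>/)[0]; // cannot cross a ">" at all

console.log(greedy);  // <EM>first</EM>
console.log(lazy);    // <EM>
console.log(negated); // <EM>
```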

Try like this:
txt = txt.replace(/"renderer":"(.*?)"/g,'"renderer":$1');
The issue in the expression you were using was this part:
(.*)(?:,)
By default, the * quantifier is greedy, which means that it gobbles up as much as it can, so it will run up to the last comma in your string. The easiest solution would be to turn that into a non-greedy quantifier by adding a question mark after the asterisk, changing that part of your expression to look like this:
(.*?)(?:,)
For the solution I proposed at the top of this answer, I also removed the part matching the comma, because I think it's easier just to match everything between quotes. As for your bonus question, to replace the matched value instead of having to hardcode gbpFormat, I used a backreference ($1), which will insert the first matched group into the replacement string.
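For reference, here is a short sketch that runs the original greedy expression and the lazy backreference version side by side on the question's text, so the effect on column C is visible. The variable names greedyResult and lazyResult are illustrative only.

```javascript
const txt = '{"columns":[{"text":"A","value":80},{"text":"B","renderer":"gbpFormat","value":80},{"text":"C","value":80}]}';

// Greedy: (.*) runs to the LAST comma in the string, swallowing column C.
const greedyResult = txt.replace(/"renderer":(.*),/g, '"renderer":gbpFormat,');

// Lazy + backreference: matches only the quoted value and re-inserts it unquoted.
const lazyResult = txt.replace(/"renderer":"(.*?)"/g, '"renderer":$1');

console.log(greedyResult.includes('"text":"C"')); // false
console.log(lazyResult.includes('"text":"C"'));   // true
```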

Don't manipulate JSON with regexp. It's too likely that you will break it, as you have found, and more importantly there's no need to.
In addition, once you have changed
'{"columns": [..."renderer": "gbpFormat", ...]}'
into
'{"columns": [..."renderer": gbpFormat, ...]}' // remove quotes from gbpFormat
then this is no longer valid JSON. (JSON requires that property values be numbers, quoted strings, booleans, null, objects, or arrays.) So you will not be able to parse it, or send it anywhere and have it interpreted correctly.
Therefore you should parse it to start with, then manipulate the resulting actual JS object:
var object = JSON.parse(txt);
object.columns.forEach(function(column) {
  column.renderer = gbpFormat; // assumes gbpFormat is defined in scope
});
If you want to replace any quoted value of the renderer property with the value itself, then you could try
column.renderer = window[column.renderer];
Assuming that the value is available in the global namespace.
This question falls into the category of "I need a regexp, or I wrote one and it's not working, and I'm not really sure why it has to be a regexp, but I heard they can do all kinds of things, so that's just what I imagined I must need." People use regexps to try to do far too many complex matching, splitting, scanning, replacement, and validation tasks, including on complex languages such as HTML, or in this case JSON. There is almost always a better way.
The only time I can imagine wanting to manipulate JSON with regexps is if the JSON is broken somehow, perhaps due to a bug in server code, and it needs to be fixed up in order to be parseable.
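A variation on the parse-then-modify approach, sketched under some assumptions: instead of looking names up on window, the renderer functions live in an explicit lookup object. The renderers registry, its gbpFormat implementation, and the attachRenderers helper are all hypothetical; the original question does not show what gbpFormat actually does.

```javascript
// Hypothetical registry of renderer functions, keyed by the name used in the JSON.
const renderers = {
  gbpFormat: (value) => "GBP " + value.toFixed(2),
};

// Hypothetical helper: parses the JSON and swaps renderer names for functions.
function attachRenderers(jsonText, registry) {
  const data = JSON.parse(jsonText);
  for (const column of data.columns) {
    const name = column.renderer;
    // Only swap names the registry actually defines; other columns are left untouched.
    if (typeof name === "string" && Object.prototype.hasOwnProperty.call(registry, name)) {
      column.renderer = registry[name];
    }
  }
  return data;
}

const txt = '{"columns":[{"text":"A","value":80},{"text":"B","renderer":"gbpFormat","value":80},{"text":"C","value":80}]}';
const table = attachRenderers(txt, renderers);
console.log(table.columns[1].renderer(80)); // GBP 80.00
```

An explicit registry keeps the set of callable names under the code's control, which is safer than resolving arbitrary strings from the JSON against the global namespace.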
