Getting this regex expression to work in javascript - javascript

I have an html checkbox element with the following name:
type_config[selected_licenses][CC BY-NC-ND 3.0]
I would like to break this name apart as follows and returned as part of an array:
["type_config", "[selected_licenses]", "[CC BY-NC-ND 3.0]", "[selected_licenses][CC BY-NC-ND 3.0]"]
I thought I could do this by using a regular expression in javascript. Here is the expression that I am using:
matches = /([a-zA-Z0-9_]*)((\[[a-zA-Z0-9_\.\s]*\])+)*/.exec(element_name);
but this is the result I am getting in my matches variable:
["type_config[selected_licenses]", "type_config", "[selected_licenses]", "[selected_licenses]", index: 0, input: "type_config[selected_licenses][CC BY-ND 3.0]"]
I am half way there. What am I doing wrong in my regular expression? I guess I should also ask if it is possible to accomplish what I want with a regex?
Thanks.

The problem with this kind of goal is that there's no simple way to achieve this with regular expression, i.e. a simple match call. In short, even if you put a quantifier after a capturing group, the captured string will always be just one.
You'll have to rely on something more specific, like breaking the string with a repeated use of indexOf, or something like
name.split(/(?=\[)/);
Maybe you want to be sure that name is formally correct.

This is a very ugly problem. I don't know how repeatable this is, but I can do it:
Regex
^(\w+)(?<firstbracket>\[(?<secondbracket>[^]]*)\]\[(.*?)\])$
Replacement
["$1", "[$3]", "[$4]", "$2"]
Demo
http://regex101.com/r/eD9mH8

Related

Trouble defining a proper RegEx in Javascript

I need to create a regular expression that matches a process number that has the following pattern #######-##.####.7.09.0009 where # means numbers from 0 to 9. Here is what I came up with after some research:
var regex = new RegExp("^[0-9]{7}[\-][0-9]{2}[\.][0-9]{4}[\.7\.09\.0009]$");
I also tried:
/^[0-9]{7}\-[0-9]{2}\.[0-9]{4}\.7\.09\.0009$/
/^[0-9]{7}\\-[0-9]{2}\\.[0-9]{4}\\.7\\.09\\.0009$/
Try this:
const pattern = /\d{7}\-\d{2}\.\d{4}\.7\.09\.0009/
Regexper is a great tool that I use whenever I'm writing a regular expression, I find it really helps to visualize what the expression is actually doing. Check it out.
For reference, here is the original pattern that you posted -- it looks like the main problem is that you are defining character classes in several places using [ and ] where you really don't need them at all.

Match a word unless it is preceded by an equals sign?

I have the following string
class=use><em>use</em>
that when searched using us I want to transform into
class=use><em><b>us</b>e</em>
I've tried looking at relating answers but I can't quite get it working the way I want it to. I'm especially interested in this answer's callback approach.
Help appreciated
This is a good exercise for writing regular expressions, and here's a possible solution.
"useclass=use><em>use</em>".replace(/([^=]|^)(us)/g, "$1<b>$2</b>");
// returns "<b>us</b>eclass=use><em><b>us</b>e</em>"
([^=]|^) ensures that the prefix of any matched us is either not an equal sign, or it's the start of the string.
As #jamiec pointed out in the comments, if you are using this to parse/modify HTML, just stop right now. It's mathematically impossible to parse a CFG with a regular grammar (even with enhanced JS regexps you will have a bad time trying to achieve that.)
If you can make any assumptions about the structure of your document, you may be better off using an approach that operates on DOM elements directly rather than parsing the whole document with a regex.
Parsing HTML with a regex has certain problems that can be painful to deal with.
var element = document.querySelector('em');
element.innerHTML = element.innerHTML.replace('us', '<b>us</b>');
<div class=use><em>use</em>
</div>
I would first look for any character other than the equals sign [^=] and separate it by parentheses so that I can use it again in my replacement. Then another set of parentheses around the two characters us ought to do it:
var re = /([^=]|^)(us)/
That will give you two capture groups to work with (inside the parentheses), which you can represent with $1 and $2 in your replacement string.
str.replace( /([^=|^])(us)/, '$1<b>$2</b>' );

Regular Expression matching extra unwanted content

I'm trying to get a parameter stored in a html comment using regex. However when I execute the expression it return the widest string possible and not all the possible matches.
So I have some content that might include this string:
<!--url:/new--><!--title:My Title-->
I use the following simply expression to get the url I need:
/<!--url:(.*)-->/
The issue I have is that the result match part of the title which is of course valid but not what I was looking for
["<!--url:/new--><!--title:My Title-->", "/new--><!--title:My Title"]
There is workarounds I can use like making sure there is a line break after each parameter line but I prefer to have a solid regex and also of course understand what I missing out.
PS: Please comment if you come up with a better title.
Make the regex non-greedy:
/<!--url:(.*?)-->/
You can test this regex by clicking here:
Regex101

regex replace on JSON is removing an Object from Array

I'm trying to improve my understanding of Regex, but this one has me quite mystified.
I started with some text defined as:
var txt = "{\"columns\":[{\"text\":\"A\",\"value\":80},{\"text\":\"B\",\"renderer\":\"gbpFormat\",\"value\":80},{\"text\":\"C\",\"value\":80}]}";
and do a replace as follows:
txt.replace(/\"renderer\"\:(.*)(?:,)/g,"\"renderer\"\:gbpFormat\,");
which results in:
"{"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}"
What I expected was for the renderer attribute value to have it's quotes removed; which has happened, but also the C column is completely missing! I'd really love for someone to explain how my Regex has removed column C?
As an extra bonus, if you could explain how to remove the quotes around any value for renderer (i.e. so I don't have to hard-code the value gbpFormat in the regex) that'd be fantastic.
You are using a greedy operator while you need a lazy one. Change this:
"renderer":(.*)(?:,)
^---- add here the '?' to make it lazy
To
"renderer":(.*?)(?:,)
Working demo
Your code should be:
txt.replace(/\"renderer\"\:(.*?)(?:,)/g,"\"renderer\"\:gbpFormat\,");
If you are learning regex, take a look at this documentation to know more about greedyness. A nice extract to understand this is:
Watch Out for The Greediness!
Suppose you want to use a regex to match an HTML tag. You know that
the input will be a valid HTML file, so the regular expression does
not need to exclude any invalid use of sharp brackets. If it sits
between sharp brackets, it is an HTML tag.
Most people new to regular expressions will attempt to use <.+>. They
will be surprised when they test it on a string like This is a
first test. You might expect the regex to match and when
continuing after that match, .
But it does not. The regex will match first. Obviously not
what we wanted. The reason is that the plus is greedy. That is, the
plus causes the regex engine to repeat the preceding token as often as
possible. Only if that causes the entire regex to fail, will the regex
engine backtrack. That is, it will go back to the plus, make it give
up the last iteration, and proceed with the remainder of the regex.
Like the plus, the star and the repetition using curly braces are
greedy.
Try like this:
txt = txt.replace(/"renderer":"(.*?)"/g,'"renderer":$1');
The issue in the expression you were using was this part:
(.*)(?:,)
By default, the * quantifier is greedy by default, which means that it gobbles up as much as it can, so it will run up to the last comma in your string. The easiest solution would be to turn that in to a non-greedy quantifier, by adding a question mark after the asterisk and change that part of your expression to look like this
(.*?)(?:,)
For the solution I proposed at the top of this answer, I also removed the part matching the comma, because I think it's easier just to match everything between quotes. As for your bonus question, to replace the matched value instead of having to hardcode gbpFormat, I used a backreference ($1), which will insert the first matched group into the replacement string.
Don't manipulate JSON with regexp. It's too likely that you will break it, as you have found, and more importantly there's no need to.
In addition, once you have changed
'{"columns": [..."renderer": "gbpFormat", ...]}'
into
'{"columns": [..."renderer": gbpFormat, ...]}' // remove quotes from gbpFormat
then this is no longer valid JSON. (JSON requires that property values be numbers, quoted strings, objects, or arrays.) So you will not be able to parse it, or send it anywhere and have it interpreted correctly.
Therefore you should parse it to start with, then manipulate the resulting actual JS object:
var object = JSON.parse(txt);
object.columns.forEach(function(column) {
column.renderer = ghpFormat;
});
If you want to replace any quoted value of the renderer property with the value itself, then you could try
column.renderer = window[column.renderer];
Assuming that the value is available in the global namespace.
This question falls into the category of "I need a regexp, or I wrote one and it's not working, and I'm not really sure why it has to be a regexp, but I heard they can do all kinds of things, so that's just what I imagined I must need." People use regexps to try to do far too many complex matching, splitting, scanning, replacement, and validation tasks, including on complex languages such as HTML, or in this case JSON. There is almost always a better way.
The only time I can imagine wanting to manipulate JSON with regexps is if the JSON is broken somehow, perhaps due to a bug in server code, and it needs to be fixed up in order to be parseable.

Preparing a regular expression for javascript

I have made this regular expression which does exactly what I want when I test it in e.g. RegExr:
^https?:\/\/(www\.)?(test\.yahoo\.com|sub\.yahoo\.com)?(?!([a-z0-9]+\.)?(localhost|yahoo\.com))(.*)?
However when I test it in javascript it says that the expression is invalid. After hours of debugging I found out that this expression works in javascript:
^https?:\/\/(www\.)?(test\.yahoo\.com|sub\.yahoo\.com)?(?![a-z0-9]+\.)?(localhost|yahoo\.com)(.*)?
However this doesn't do what I want (again testing in RegExr).
Why cannot I use the first expression in javascript? And how do I fix it?
UPDATE JULY 25
Sorry for the lack of info. The way I am using the Regexp is through a jQuery extension which lets me select using regexp. The script can be seen here: http://james.padolsey.com/javascript/regex-selector-for-jquery/
The specific code I am trying to get to work is:
$('a:regex(href, ^https?:\/\/(www\.)?(test\.yahoo\.com|sub\.yahoo\.com)?(?!([a-z0-9]+\.)?(localhost|yahoo\.com))(.*)?)').live('click', function(e) {
After including the linked jQuery plugin. The text strings I am testing are:
http://yahoo.com
http://google.dk
http://subdomain.yahoo.com
http://test.yahoo.com
http://localhost.dk
http://sub.yahoo.com/lalala
Where it is supposed to match "http://google.dk", "http://test.yahoo.com" and "http://sub.yahoo.com/lalala" - which it does when using RegExr but failing (invalid expression) using the jQuery plugin.
The first regular expression is not invalid:
var regexp = /^https?:\/\/(www\.)?(test\.yahoo\.com|sub\.yahoo\.com)?(?!([a-z0-9]+\.)?(localhost|yahoo\.com))(.*)?/;
works fine.
If you want to instantiate the expression from a string, you have to double all the backslashes:
var regexp = new RegExp("^https?:\\/\\/(www\\.)?(test\\.yahoo\\.com|sub\\.yahoo\\.com)?(?!([a-z0-9]+\\.)?(localhost|yahoo\\.com))(.*)?");
When you start from a string, you have to account for the fact that the string constant itself uses backslashes as a quoting mechanism, so there will be two evaluations made: one as a string, and one as a regular expression.
edit — OK I think I see the problem. That plugin you're trying to use is simply attempting to do something that's just not going to work, given the way that Sizzle parses selectors. In other words, the problem is not with your regular expression, it's with the overall selector. It is not even getting far enough to parse the regular expression.
Specifically it seems to be nested parentheses inside the regular expression. Something as simple as
$('a:regex(href, ((abc)))')
causes an error. You can instead do something like this:
$('a').filter(function() {
return /^https?:\/\/(www\.)?(test\.yahoo\.com|sub\.yahoo\.com)?(?!([a-z0-9]+\.)?(localhost|yahoo\.com))(.*)?/.test(this.href);
}).whatever( ... );

Categories

Resources