Regular Expression matching extra unwanted content


I'm trying to extract a parameter stored in an HTML comment using a regex. However, when I execute the expression it returns the widest possible string rather than each separate match.
My content might include this string:
<!--url:/new--><!--title:My Title-->
I use the following simple expression to get the URL I need:
/<!--url:(.*)-->/
The issue is that the result also matches part of the title, which is of course valid but not what I was looking for:
["<!--url:/new--><!--title:My Title-->", "/new--><!--title:My Title"]
There are workarounds I can use, like making sure there is a line break after each parameter line, but I would prefer a solid regex and, of course, to understand what I am missing.
PS: Please comment if you come up with a better title.

Make the regex non-greedy:
/<!--url:(.*?)-->/
You can test this regex on Regex101.
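A minimal sketch of the difference, using the sample string from the question (the variable names are illustrative only):

```javascript
// Sample content from the question.
const content = '<!--url:/new--><!--title:My Title-->';

// Greedy: .* runs to the end of the string, then backtracks
// only as far as the LAST "-->".
const greedy = content.match(/<!--url:(.*)-->/);

// Lazy: .*? grows one character at a time and stops at the FIRST "-->".
const lazy = content.match(/<!--url:(.*?)-->/);

console.log(greedy[1]); // "/new--><!--title:My Title"
console.log(lazy[1]);   // "/new"
```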

Related

How do I replace string within quotes in javascript?

I have this in a JavaScript/jQuery string (the string is read from the value of an HTML element, $('#shortcode'), which can change if the user clicks some buttons):
[csvtohtml_create include_rows="1-10"
debug_mode="no" source_type="visualizer_plugin" path="map"
source_files="bundeslander_staple.csv" include cols="1,2,4" exclude cols="3"]
In a textbox (named incl_sc) I have the value:
include cols="2,4"
I want to replace include_cols="1,2,4" in the string above with the value from the textbox.
So basically:
How do I replace the include_cols value here (include_cols="2,4" instead of include_cols="1,2,4")? I'm great at many things, but regex is not one of them. I guess regex is the thing to use here?
I'm trying this:
var s = $('#shortcode').html();
//I want to replace include cols="1,2,4" exclude cols="3"
//with include_cols="1,2" exclude_cols="3" for example
s.replace('/([include="])[^]*?\1/g', incl_sc.val() );
but I don't get any replacement at all (the string s is the same as $("#shortcode").html()). Obviously I'm doing something really dumb. Please help :-)

In short, what you need is
s = s.replace(/include cols="[^"]+"/g, incl_sc.val());
There were a couple of problems with your code:
To use a regex with String.prototype.replace, you must pass a regex as the first argument, but you were actually passing a string.
This is a regex literal: /regex/, while this is not: '/actually a string/'
In the text you supplied in your question, include_cols is written as include cols (with a space).
And your regex was malformed. I recommend testing patterns in an online regex tester, where you can also learn more if you want.
The code above replaces the part include cols="1,2,4" with whatever is in the textbox, regardless of what is between the quotes (as long as it doesn't contain another quote). Note that replace returns a new string and leaves s untouched, so the result has to be assigned.
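The string-versus-regex point can be sketched without jQuery as follows. The shortcode is abridged from the question, and textboxValue is a stand-in (an assumption) for whatever incl_sc.val() returns:

```javascript
// Abridged shortcode from the question (note "include cols" with a space).
const shortcode = '[csvtohtml_create include_rows="1-10" ' +
  'source_files="bundeslander_staple.csv" include cols="1,2,4" exclude cols="3"]';

// Stand-in for incl_sc.val(); assumed to hold the textbox text.
const textboxValue = 'include cols="2,4"';

// A quoted pattern is a plain string: replace() looks for those exact
// characters, finds nothing, and returns the input unchanged.
const unchanged = shortcode.replace('/include cols="[^"]+"/g', textboxValue);

// A regex literal is matched as a pattern. Strings are immutable,
// so the return value must be kept.
const updated = shortcode.replace(/include cols="[^"]+"/g, textboxValue);

console.log(unchanged === shortcode); // true
console.log(updated);
```

One thing to keep in mind with this approach: sequences such as $& or $1 in the replacement string are interpreted specially by replace, so user-supplied text containing a dollar sign may need escaping.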

First of all, I think you need to remove the quotes around the regex and fix the pattern a little.
const r = /(include_cols=")([^"]*)(")/g;
s = s.replace(r, `$1${incl_sc.val()}$3`);
Basically, I group the first and last parts in order to include them in the replacement. [^"]* is used instead of .* so that the match stops at the first closing quote rather than running on to the last quote on the line. You can also skip the first and last groups and write them literally in the last argument of the replace function, like this:
const r = /include_cols="([^"]*)"/g;
s = s.replace(r, `include_cols="${incl_sc.val()}"`);

regex replace on JSON is removing an Object from Array

I'm trying to improve my understanding of Regex, but this one has me quite mystified.
I started with some text defined as:
var txt = "{\"columns\":[{\"text\":\"A\",\"value\":80},{\"text\":\"B\",\"renderer\":\"gbpFormat\",\"value\":80},{\"text\":\"C\",\"value\":80}]}";
and do a replace as follows:
txt.replace(/\"renderer\"\:(.*)(?:,)/g,"\"renderer\"\:gbpFormat\,");
which results in:
"{"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}"
What I expected was for the renderer attribute value to have its quotes removed, which has happened, but the C column is also completely missing! I'd really love for someone to explain how my regex removed column C.
As an extra bonus, if you could explain how to remove the quotes around any value for renderer (i.e. so I don't have to hard-code the value gbpFormat in the regex) that'd be fantastic.

You are using a greedy operator while you need a lazy one. Change this:
"renderer":(.*)(?:,)
              ^---- add a '?' here to make it lazy
To
"renderer":(.*?)(?:,)
Your code should be:
txt.replace(/\"renderer\"\:(.*?)(?:,)/g,"\"renderer\"\:gbpFormat\,");
If you are learning regex, take a look at the regular-expression documentation on greediness. A nice extract to understand this is:
Watch Out for The Greediness!
Suppose you want to use a regex to match an HTML tag. You know that
the input will be a valid HTML file, so the regular expression does
not need to exclude any invalid use of sharp brackets. If it sits
between sharp brackets, it is an HTML tag.
Most people new to regular expressions will attempt to use <.+>. They
will be surprised when they test it on a string like This is a
<EM>first</EM> test. You might expect the regex to match <EM> and when
continuing after that match, </EM>.
But it does not. The regex will match <EM>first</EM>. Obviously not
what we wanted. The reason is that the plus is greedy. That is, the
plus causes the regex engine to repeat the preceding token as often as
possible. Only if that causes the entire regex to fail, will the regex
engine backtrack. That is, it will go back to the plus, make it give
up the last iteration, and proceed with the remainder of the regex.
Like the plus, the star and the repetition using curly braces are
greedy.

Try like this:
txt = txt.replace(/"renderer":"(.*?)"/g,'"renderer":$1');
The issue in the expression you were using was this part:
(.*)(?:,)
By default, the * quantifier is greedy, which means that it gobbles up as much as it can, so it will run up to the last comma in your string. The easiest solution is to turn it into a non-greedy quantifier by adding a question mark after the asterisk, changing that part of your expression to look like this:
(.*?)(?:,)
For the solution I proposed at the top of this answer, I also removed the part matching the comma, because I think it's easier just to match everything between quotes. As for your bonus question, to replace the matched value instead of having to hardcode gbpFormat, I used a backreference ($1), which will insert the first matched group into the replacement string.
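To see both behaviours side by side, here is a small sketch run against the text from the question (the result variable names are illustrative):

```javascript
const txt = '{"columns":[{"text":"A","value":80},' +
  '{"text":"B","renderer":"gbpFormat","value":80},' +
  '{"text":"C","value":80}]}';

// Greedy: (.*) runs to the LAST comma in the string, so the match swallows
// the rest of column B and the start of column C.
const greedyResult = txt.replace(/"renderer":(.*),/g, '"renderer":gbpFormat,');

// Lazy with a backreference: match only the quoted value and re-insert it
// without the quotes via $1.
const lazyResult = txt.replace(/"renderer":"(.*?)"/g, '"renderer":$1');

console.log(greedyResult);
// {"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}
console.log(lazyResult);
// {"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80},{"text":"C","value":80}]}
```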

Don't manipulate JSON with regexp. It's too likely that you will break it, as you have found, and more importantly there's no need to.
In addition, once you have changed
'{"columns": [..."renderer": "gbpFormat", ...]}'
into
'{"columns": [..."renderer": gbpFormat, ...]}' // remove quotes from gbpFormat
then this is no longer valid JSON. (JSON requires that property values be numbers, quoted strings, booleans, null, objects, or arrays.) So you will not be able to parse it, or send it anywhere and have it interpreted correctly.
Therefore you should parse it to start with, then manipulate the resulting actual JS object:
var object = JSON.parse(txt);
object.columns.forEach(function(column) {
  column.renderer = gbpFormat;
});
If you want to replace any quoted value of the renderer property with the function it names, then you could try
column.renderer = window[column.renderer];
assuming that the function is available in the global namespace.
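As a rough sketch of the parse-then-modify approach, the following swaps the window lookup for an explicit lookup table, which also works outside the browser. The renderers object and the body of gbpFormat are assumptions made for illustration; the question does not show what gbpFormat actually does.

```javascript
const json = '{"columns":[{"text":"A","value":80},' +
  '{"text":"B","renderer":"gbpFormat","value":80},' +
  '{"text":"C","value":80}]}';

// Hypothetical renderer; its real behaviour is not shown in the question.
function gbpFormat(value) {
  return '£' + value.toFixed(2);
}

// Explicit lookup table, used here in place of window[...].
const renderers = { gbpFormat: gbpFormat };

const parsed = JSON.parse(json);
parsed.columns.forEach(function (column) {
  // Only columns that name a known renderer are touched.
  if (typeof column.renderer === 'string' &&
      Object.prototype.hasOwnProperty.call(renderers, column.renderer)) {
    column.renderer = renderers[column.renderer];
  }
});

console.log(parsed.columns[1].renderer(80)); // "£80.00"
```

Column C survives because nothing here depends on where commas happen to fall in the text.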
This question falls into the category of "I need a regexp, or I wrote one and it's not working, and I'm not really sure why it has to be a regexp, but I heard they can do all kinds of things, so that's just what I imagined I must need." People use regexps to try to do far too many complex matching, splitting, scanning, replacement, and validation tasks, including on complex languages such as HTML, or in this case JSON. There is almost always a better way.
The only time I can imagine wanting to manipulate JSON with regexps is if the JSON is broken somehow, perhaps due to a bug in server code, and it needs to be fixed up in order to be parseable.

Javascript Regex, Removing unclosed tags

I'm looking for a JavaScript regex solution to remove unclosed tags, for example:
<div></div><span>
As you can see, I want to remove the <span> element. I know it's a bad idea to use regex on markup, but it's required for my project. This is the regex pattern I made, but it didn't work:
/<([a-z]+?)>([\s\S]*?)(?!<\/\1>)/g
I'm using JavaScript's replace to replace all matches with "". What I am trying to do with my pattern is match only unclosed tags. About the pattern:
[a-z]: I know HTML tags can contain =, ", etc. I'm looking for a simple pattern that I can play with and edit, so I started with [a-z].
I used (?!...) to reject matches for closing tags.
I know my pattern isn't working. If anyone has an idea, I will be very thankful.
Edit:
I'm aware that there may be nesting. If that is the case, I want to remove the whole nested tree; I only want to keep 1 level of HTML, for example:
<div><span></span></div><p></p>
So if the next tag after the <div> is not </div>, remove it.

First of all, let's see what the OP said:
I know it's a bad idea to use regex on markup but it's required for my project.
I only want to keep 1 level of html
This can be achieved.
You were on the right track. However, you shouldn't have used the negative lookahead (?!...) to reject matches for closing tags. You want to accept them. This way the match will not accept unclosed tags, which is our goal after all.
Now, your regex will look like this.
/<([a-z]+?)>([\s\S]*?)(<\/\1>)/g
We can remove the second and third brackets as they are not necessary:
/<([a-z]+?)>[\s\S]*?<\/\1>/g
If we test this regex on the provided code, we will get the following:
"<div><span></span></div><p></p>".match(/<([a-z]+?)>[\s\S]*?<\/\1>/g)
["<div><span></span></div>", "<p></p>"]
It seems that our regex matches too many characters. We must break the match at the "<" symbol, as it denotes a new tag. [^<] means "any character but <".
"<div><span></span></div><p></p>".match(/<([a-z]+?)>[^<]*?<\/\1>/g)
["<span></span>", "<p></p>"]
Finally we can just join the matched results.
"<div><span></span></div><p></p>".match(/<([a-z]+?)>[^<]*?<\/\1>/g).join("")
"<span></span><p></p>"
Wohoooo. I will leave the first part of regex to you as it was not part of the question. I hope this was helpful. I am open for further questions.

Getting this regex expression to work in javascript

I have an html checkbox element with the following name:
type_config[selected_licenses][CC BY-NC-ND 3.0]
I would like to break this name apart as follows and have the parts returned as an array:
["type_config", "[selected_licenses]", "[CC BY-NC-ND 3.0]", "[selected_licenses][CC BY-NC-ND 3.0]"]
I thought I could do this by using a regular expression in javascript. Here is the expression that I am using:
matches = /([a-zA-Z0-9_]*)((\[[a-zA-Z0-9_\.\s]*\])+)*/.exec(element_name);
but this is the result I am getting in my matches variable:
["type_config[selected_licenses]", "type_config", "[selected_licenses]", "[selected_licenses]", index: 0, input: "type_config[selected_licenses][CC BY-ND 3.0]"]
I am halfway there. What am I doing wrong in my regular expression? I guess I should also ask whether it is possible to accomplish what I want with a regex.
Thanks.

The problem with this kind of goal is that there's no simple way to achieve it with a regular expression, i.e. with a single match call. In short, even if you put a quantifier after a capturing group, the group keeps only one captured string: its last match.
You'll have to rely on something more specific, like breaking the string with repeated use of indexOf, or something like
name.split(/(?=\[)/);
You may also want to check first that name is well formed.
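A minimal sketch of the split approach applied to the name from the question. The validation pattern is only one guess at what "formally correct" might mean here (a word-character prefix followed by one or more bracketed groups), and the variable names are illustrative:

```javascript
const elementName = 'type_config[selected_licenses][CC BY-NC-ND 3.0]';

// Optional sanity check (an assumption about the expected format):
// a \w+ prefix followed by one or more [...] groups with no nested brackets.
const wellFormed = /^\w+(\[[^\[\]]*\])+$/.test(elementName);

// Split in front of every "[" without consuming it (zero-width lookahead).
const parts = elementName.split(/(?=\[)/);
// ["type_config", "[selected_licenses]", "[CC BY-NC-ND 3.0]"]

// The fourth item asked for is the bracketed groups joined back together.
const result = parts.concat(parts.slice(1).join(''));

console.log(wellFormed); // true
console.log(result);
// ["type_config", "[selected_licenses]", "[CC BY-NC-ND 3.0]",
//  "[selected_licenses][CC BY-NC-ND 3.0]"]
```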

This is a very ugly problem. I don't know how repeatable this is, but I can do it:
Regex
^(\w+)(?<firstbracket>\[(?<secondbracket>[^\]]*)\]\[(.*?)\])$
Replacement
["$1", "[$3]", "[$4]", "$2"]
Demo
http://regex101.com/r/eD9mH8

javascript regex replace some words with links, but not within existing links

I'm trying to replace certain words in HTML pages with the same word as a link to the relevant resource.
For example, replace the word 'MySQL' with a link whose text is MySQL.
I'm using the JS replace function with a regex, and it's doing the replacing just fine.
BUT it's also replacing words that are already part of URLs... which is the problem.
For the MySQL example, it's replacing BOTH the "MySQL" text that's already linked AND the URL leading to mysql.com, so it breaks the already existing link.
Is there a way to update the inline regex (in the .replace call) to NOT do replacing in existing links, i.e. <a> elements?
Here's the replace code:
var NewHTML = OriginalHTML
.replace(/\bJavaScript\b/gi, "$&")
.replace(/\bMySQL\b/gi, "$&")
;
Here's the full sample code (tried to paste it inline but wasn't looking right with the backticks):
http://pastie.org/private/v4l2s2c42aqduqlopurpw
I went through the JS regexp reference and tried various other permutations in the regex matching, like the following, but all that does is make it not match ANY words on the page...
.replace(/\b(\<a\>*!\>)JavaScript\b/i,xxxxx
The following regex DOES prevent the match from happening wherever the word is literally touching a slash or a dash... but that's not the solution (and it does not fix the mysql example above):
.replace(/\b(?!\>)(?!\-)(?!\/)MySQL\b(?!\-)(?!\/)/gi, "$&")
I've read through the related threads on stackoverflow and elsewhere, but can't seem to find this particular scenario, not in JavaScript anyway.
Any help would be greatly appreciated. :-)
Thanks!

You could change your regex to exclude keywords that precede the end anchor tag, </a>:
.replace(/\bMySQL\b(?![^<]*?<\/a>)/gi, "$&")
See jsfiddle for example.
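Here is a small sketch of how that lookahead behaves. The sample markup and both URLs are made up for illustration, since the original page content is not shown:

```javascript
// Hypothetical markup: one bare "MySQL" and one that is already linked.
const originalHtml =
  'We use MySQL. See <a href="https://www.mysql.com/">MySQL docs</a>.';

// Skip any match that is followed by "</a>" with no "<" in between,
// i.e. a match sitting inside an existing anchor (its href or its text).
const linked = originalHtml.replace(
  /\bMySQL\b(?![^<]*?<\/a>)/gi,
  '<a href="https://example.com/mysql">$&</a>'
);

console.log(linked);
// We use <a href="https://example.com/mysql">MySQL</a>. See <a href="https://www.mysql.com/">MySQL docs</a>.
```

This relies on there being no other tag between the keyword and the closing </a>. A link such as <a href="..."><b>MySQL</b></a> would still be rewritten, so for arbitrary markup walking the DOM text nodes is likely the safer route.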

A negative lookahead should be sufficient:
.replace(/\bMySQL(?!\.com)\b/gi, "$&")
