Regex to match only when certain characters follow a string - javascript

I need to find a string that contains "script" with any number of characters before or after it, enclosed in < and >. I can do this with: <*script.*>
I also want to match only when that string is NOT followed by a <.
The closest I've come, so far, is with this: (<*script.*>)([^=?<*]*)$
However, that will fail for something like <script></script> because the last > isn't followed by a < (so it doesn't match).
How can I check whether only the first > is followed by a < or not?
For example,
<script> abc () ; </script> MATCH
<< ScriPT >abc (”XXX”);//<</ ScriPT > MATCH
<script></script> DON'T MATCH
And, a case that I still am working on:
<script/script> DON'T MATCH
Thanks!

You were close with your Regex. You just needed to make your first query non-greedy using a ? after the second *. Try this out:
(?i)<*\s*script.*?>[^<]+<*[^>]+>
There is an app called Expresso that really helps with designing Regex strings. Give it a shot.
Explanation: Without the ? (non-greedy) modifier, your second * (the .* before the first >) is greedy: it runs to the end of the string and then backtracks only as far as the last >, so the rest of your pattern is tested against whatever follows the final > instead of the first one.
EDIT: Added (?i) at the beginning for case-insensitivity. JavaScript does not accept a leading inline (?i), so in JavaScript you would use the i flag instead:
/<*\s*script.*?>[^<]+<*[^>]+>/i
I noticed you have parentheses in your regex to make groups, but you didn't specifically say you were trying to capture groups. Do you want to capture what's between the <script> and </script>? If so, that would be:
/<*\s*script.*?>([^<]+)<*[^>]+>/i
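A quick way to check the i-flag version of that regex against the four examples from the question is to run it with test(). This is only a sketch: the variable names are illustrative, and the expected booleans simply restate the question's MATCH / DON'T MATCH list.

```javascript
// The answer's JavaScript regex, using the i flag for case-insensitivity.
const scriptRe = /<*\s*script.*?>[^<]+<*[^>]+>/i;

// Examples from the question, paired with whether they should match.
const examples = [
  ["<script> abc () ; </script>", true],
  ["<< ScriPT >abc (”XXX”);//<</ ScriPT >", true],
  ["<script></script>", false],
  ["<script/script>", false],
];

for (const [text, expected] of examples) {
  // No g flag, so test() keeps no lastIndex state between calls.
  const actual = scriptRe.test(text);
  console.log(actual === expected ? "ok  " : "FAIL", text);
}
```

The lazy .*? stops at the first >, and [^<]+ then requires at least one non-< character right after it, which is what rules out <script></script>.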

If I understand what you are looking for, give this a try:
regex = r"<\s*script\s*>([^<]+)<"
Here is an example in Python:
import re

textlist = ["<script>show this</script>", "<script></script>"]
# Raw string so Python leaves the \s escapes for the regex engine.
regex = r"<\s*script\s*>([^<]+)<"
for text in textlist:
    thematch = re.search(regex, text, re.IGNORECASE)
    if thematch:
        print("match found:")
        print(thematch.group(1))
    else:
        print("no match sir!")
Explanation:
Start with <, then optional spaces, the word script, optional spaces, and a >;
then capture one or more characters that are not <, and make sure they are followed by a <.
Hope that helps!

This would be better solved by using the substring() and/or indexOf() JavaScript methods.
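A rough sketch of that idea, assuming ASCII input; the helper name scriptTagHasContent is hypothetical. It finds the first > after "script" with indexOf() and uses substring() to check that a character follows it and that this character is not <.

```javascript
// Hypothetical helper: returns true when the first ">" after the word "script"
// is followed by at least one character and that character is not "<".
// Assumes ASCII input, so toLowerCase() does not shift string indexes.
function scriptTagHasContent(text) {
  const start = text.toLowerCase().indexOf("script"); // case-insensitive search
  if (start === -1) return false;                    // no "script" at all
  const close = text.indexOf(">", start);            // first ">" after "script"
  if (close === -1 || close + 1 >= text.length) return false; // nothing follows it
  return text.substring(close + 1, close + 2) !== "<";       // next char must not be "<"
}

console.log(scriptTagHasContent("<script> abc () ; </script>")); // true
console.log(scriptTagHasContent("<script></script>"));           // false
console.log(scriptTagHasContent("<script/script>"));             // false
```

This handles <script/script> without any extra work: the only > is the last character, so nothing follows it.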

Related

javascript regex insert new element into expression

I am passing a URL to a block of code in which I need to insert a new element into the regex. I'm pretty sure the regex is valid and the code seems right, but no matter what I do, the match never succeeds!
//** Incoming URLs
//** url e.g. api/223344
//** api/11aa/page/2017
//** Need to match to the following
//** dir/api/12ab/page/1999
//** Hence the need to add dir at the front
var url = req.url;
//** pass in: /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var re = myregex.toString();
//** Insert dir into regex: /^dir\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var regVar = re.substr(0, 2) + 'dir' + re.substr(2);
var matchedData = url.match(regVar);
matchedData === null ? console.log('NO') : console.log('Yay');
I hope I am just missing the obvious, but can anyone see why it never matches and always logs NO?
Thanks
Let's break down your regex:
^\/api\/ matches the beginning of the string followed by exactly the string "/api/".
([a-zA-Z0-9-_~ %]+) is a capturing group: it captures one or more characters from the set inside the brackets (the + means "one or more"), so, for example, this section will match abAB25-_ %
(?:\/page\/([a-zA-Z0-9-_~ %]+)) also groups multiple tokens together, but it does not create a capturing group like the one above (the ?: makes it non-capturing). It first matches exactly "/page/" and then a group like the one described in the previous paragraph (matching a-z, A-Z, 0-9, etc.).
?$ is at the end: the ? makes the preceding group optional (it matches 0 or 1 times), and the $ matches the end of the string.
This regex will match this string, for example: /api/abAB25-_ %/page/abAB25-_ %
You may be able to take advantage of capturing groups, however, and use something like this instead to get similar results: ^\/api\/([a-zA-Z0-9-_~ %]+)\/page\/\1?$. Here, we are using \1 to reference that first capturing group and match exactly the same text it matched. EDIT: actually, this probably won't work, since the text after /api/ and the text after /page/ will most likely be different, carrying on...
Afterwards, you are adding "dir" to the beginning of your search, so you can now match something like this: dir/api/abAB25-_ %/page/abAB25-_ %
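You have also converted the regex to a string, so, as Crayon Violent pointed out in their comment, this breaks your expected functionality: toString() includes the surrounding slashes, and when match() is given a string it builds a new RegExp from it in which those slashes are literal characters. You can fix this by using .source on your original regex, which returns the pattern without the slashes, and building a new RegExp from it: var src = myregex.source; var matchedData = url.match(new RegExp(src.substr(0, 1) + 'dir' + src.substr(1))); https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/source
Now you can properly match a string like dir/api/11aa/page/2017; see this example: https://repl.it/Mj8h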
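As Crayon Violent mentioned in the comments, it seems you're passing a string rather than a regular expression to the .match() function. Maybe try the following:
url.match(new RegExp(regVar, "i"));
to convert the string to a regular expression. For this to work, build regVar from myregex.source instead of myregex.toString() (and insert 'dir' after the first character, the ^, rather than after the first two), because toString() adds the surrounding slashes and new RegExp() would treat them as literal characters. The "i" is for ignore case; I don't know if that's what you want. Learn more here: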
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
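The key point in both answers is that the pattern text should come from .source (which has no slashes) and be turned back into a real RegExp. A minimal sketch of that, using a hypothetical helper prefixRegex: it escapes the prefix, keeps a leading ^ anchor in front of it, and copies the original flags.

```javascript
// Hypothetical helper: build a new RegExp whose pattern is `prefix` inserted
// right after a leading ^ anchor (or at the very start if there is no anchor).
function prefixRegex(re, prefix) {
  const src = re.source; // pattern text only, without the surrounding slashes
  // Escape regex metacharacters so the prefix is matched literally.
  const literal = prefix.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
  const body = src.startsWith("^") ? "^" + literal + src.slice(1) : literal + src;
  return new RegExp(body, re.flags); // keep the original flags (i, g, ...)
}

// The pattern from the question.
const apiRe = /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/;
const dirRe = prefixRegex(apiRe, "dir");

const m = "dir/api/12ab/page/1999".match(dirRe);
console.log(m[1], m[2]);                // 12ab 1999
console.log("api/223344".match(dirRe)); // null
```

Because the anchor stays first, the new pattern only matches URLs that actually start with dir/api/.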

Javascript Regex only replacing first match occurrence

I am using regular expressions to do some basic converting of wiki markup code into copy-pastable plain text, and I'm using javascript to do the work.
However, JavaScript's regex engine behaves very differently from the ones I've used previously, as well as from the regex in Notepad++ that I use on a daily basis.
For example- given a test string:
==Section Header==
===Subsection 1===
# Content begins here.
## Content continues here.
I want to end up with:
Section Header
Subsection 1
# Content begins here.
## Content continues here.
Simply remove all equals signs.
I began with the regex setup of:
var reg_titles = /(^)(=+)(.+)(=+)/
This regex searches for lines that begin with one or more equals signs followed later by another run of one or more equals signs. Rubular shows that it matches my lines accurately and does not catch equals signs in the middle of content. http://www.rubular.com/r/46PrkPx8OB
The code to replace the string based on the regex:
var lines = $('.tb_in').val().split('\n'); //use jquery to grab text in a textarea, and split into an array of lines based on the \n
for(var i = 0;i < lines.length;i++){
line_temp = lines[i].replace(reg_titles, "");
lines[i] = line_temp; //replace line with temp
}
$('.tb_out').val(lines.join("\n")); //rejoin and print result
My result is unfortunately:
Section Header==
Subsection 1===
# Content begins here.
## Content continues here.
I cannot figure out why the regex replace function, when it finds multiple matches, seems to only replace the first instance it finds, not all instances.
Even when my regex is updated to:
var reg_titles = /(={2,})/
"Find any two or more equals", the output is still identical. It makes a single replacement and ignores all other matches.
No other regex engine I have used behaves this way. Running the same replace multiple times has no effect.
Any advice on how to get my string replace function to replace ALL instances of the matched regex instead of just the first one?
^=+|=+$
You can use this. Do not forget to add the g and m flags, and replace with an empty string (''). See the demo:
http://regex101.com/r/nA6hN9/28
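A small sketch of why the g flag matters, run on the whole textarea value at once rather than line by line (the sample text is the one from the question). Without g, replace() stops after the first match, which reproduces the "Section Header==" symptom; with g and m, every leading and trailing run of = on every line is removed.

```javascript
// Sample wiki markup from the question.
const wiki = [
  "==Section Header==",
  "===Subsection 1===",
  "# Content begins here.",
  "## Content continues here.",
].join("\n");

// Without g, replace() stops after the first match it finds.
const once = wiki.replace(/^=+|=+$/m, "");
// With g, every match is replaced; m makes ^ and $ match at each line break.
const all = wiki.replace(/^=+|=+$/gm, "");

console.log(once.split("\n")[0]); // Section Header==
console.log(all);
```

Processing the whole string with m also removes the need to split and rejoin the lines.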
Add the g modifier to do a global search:
var reg_titles = /^(=+)(.+?)(=+)/g
Your regex is needlessly complex, and yet doesn't actually accomplish what you set out to do. :) You might try something like this instead:
var reg_titles = /^=+(.+?)=+$/;
var lines = $('.tb_in').val().split('\n');
lines.forEach(function(v, i, a) {
a[i] = v.replace(reg_titles, '$1');
});
$('.tb_out').val(lines.join("\n"));

Match characters prior to word?

I’ve been at it for many hours now and finally decided to give up and ask.
I need a JavaScript Regex to match against things like this:
asdfURL
123URL
##URL
Basically anything before the word URL except < and >.
I was able to handle characters after the word (below), but not prior. And I need both before and after!
/^(?=\bURL)[^<> ]+$/i
So essentially $B#5t4rg3b4URLDFSGre4r should match and FGWEG$R$G$?>URL<9TGSG should not.
You should use capturing groups:
var myRegexp =/([^<> ]*)URL([^<> ]*)/ig;
var match = myRegexp.exec(input);
alert(match[1]);//before
alert(match[2]);//after
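Note that the unanchored pattern above also finds a match inside FGWEG$R$G$?>URL<9TGSG (with both groups empty), because nothing forces the groups to reach the ends of the string. A sketch that anchors the same groups with ^ and $, so the whole input must be free of <, > and spaces; the helper name splitAroundUrl is hypothetical.

```javascript
// ^ and $ force the two groups to cover the whole input, so any <, > or
// space anywhere in the string makes the match fail.
const urlPattern = /^([^<> ]*)URL([^<> ]*)$/i;

// Hypothetical helper: returns the text before and after "URL", or null.
function splitAroundUrl(input) {
  const match = urlPattern.exec(input); // no g flag, so exec() always starts at index 0
  return match ? { before: match[1], after: match[2] } : null;
}

console.log(splitAroundUrl("$B#5t4rg3b4URLDFSGre4r"));
console.log(splitAroundUrl("FGWEG$R$G$?>URL<9TGSG")); // null
```

If the input contains URL more than once, the greedy first group makes before extend up to the last occurrence.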

Simple Regexp Pattern matching with escape characters

Hopefully a simple one!
I've been trying to get this to work for several hours now but am having no luck. As I'm fairly new to regexp, I may be missing something very obvious here and was hoping someone could point me in the right direction. The pattern I want to match is as follows:
At least 1 or more numbers + "##" + at least 1 or more numbers + "##" + at least 1 or more numbers
So a few examples of valid combinations would be:
1##2##3
123##123##123
0##0##0
A few invalid combinations would be:
a##b##c
1## ##1
I've got the following regexp:
[\d+]/#/#[\d+]/#/#[\d+]
And I am using it like so (note the double backslashes, as it's inside a string):
var patt = new RegExp("[\\d+]/#/#[\\d+]/#/#[\\d+]");
if(newFieldValue!=patt){newFieldValue=="no match"}
I also tried these, but still nothing:
if(!patt.text(newFieldValue)){newFieldValue==""}
if(patt.text(newFieldValue)){}else{newFieldValue==""}
But nothing I try is matching, where am I going wrong here?
Any pointers gratefully received, cheers!
1) I can't see any reason to use the RegExp constructor over a RegExp literal in your case. (The former is used primarily where the pattern needs to be dynamic, i.e. when it is built from variables.)
2) You don't need a character class if there's only one type of character in it (so \d+, not [\d+]).
3) You are not actually checking the pattern against the input. You don't apply a RegExp by creating an instance of it and using ==; you need to use test() or match() to see if a match is made (the former if you only want to check, not capture).
4) You have == where you mean to assign (=).
if (!/\d+##\d+##\d+/.test(newFieldValue)) newFieldValue = "no match";
You put + inside the brackets, so you're matching a single character that's either a digit or +, not a sequence of digits. I also don't understand why you have / before each #; your description doesn't mention anything about this character.
Use:
var patt = /\d+##\d+##\d+/;
You should use the test method of the patt regex:
if (!patt.test(newFieldValue)) { newFieldValue = "no match"; }
once you have a valid regular expression.
Try this regex :
^(?:\d+##){2}\d+$
Demo: http://regex101.com/r/mE8aG7
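To see why the ^ and $ anchors in that regex matter, here is a small comparison against an unanchored version. The sample values are valid and invalid inputs in the question's format, plus one made-up value (x1##2##3x) chosen to show the difference.

```javascript
// Anchored: the entire value must be digits##digits##digits.
const anchored = /^(?:\d+##){2}\d+$/;
// Unanchored: succeeds if that shape appears anywhere inside the value.
const unanchored = /\d+##\d+##\d+/;

const samples = ["1##2##3", "0##0##0", "a##b##c", "1## ##1", "x1##2##3x"];
for (const value of samples) {
  // Neither regex has the g flag, so repeated test() calls keep no state.
  console.log(value, anchored.test(value), unanchored.test(value));
}
```

Only x1##2##3x separates the two: the unanchored pattern accepts it because 1##2##3 appears inside it, while the anchored one rejects it.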
With the following regex
[\d+]/#/#[\d+]/#/#[\d+]
You would only match things like:
+/#/#5/#/#+
+/#/#+/#/#+
0/#/#0/#/#0
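because the regex engine reads it as: one character that is either a digit or a +, then the literal text /#/#, then another digit-or-+ character, then /#/#, then a final digit-or-+ character.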
Something like:
((-\s)?\d+##)+\d+

Javascript Regular expression to remove unwanted <br>, &nbsp;

I have a JS string like this:
<div id="grouplogo_nav"><br> <ul><br> <li><a class="group_hlfppt" target="_blank" href="http://www.hlfppt.org/">&nbsp;</a></li><br> </ul><br> </div>
I need to remove all <br> and &nbsp; that are only between > and <. I tried to write a regular expression, but didn't get it right. Does anybody have a solution?
EDIT :
Please note I want to remove only the tags between > and <.
Avoid using regex on html!
Try creating a temporary div from the string, and using the DOM to remove any br tags from it. This is much more robust than parsing html with regex, which can be harmful to your health:
var tempDiv = document.createElement('div');
tempDiv.innerHTML = mystringwithBRin;
// querySelectorAll finds <br> elements at any depth, not only direct children
var brs = tempDiv.querySelectorAll('br');
for (var i = 0; i < brs.length; i++) {
  brs[i].parentNode.removeChild(brs[i]);
}
var newStr = tempDiv.innerHTML;
Note that querySelectorAll returns a static NodeList, so removing elements while looping over it is safe. Looping over tempDiv.childNodes instead would miss nested <br> elements, and in an HTML document tagName is upper-case ("BR"), so comparing it with 'br' never succeeds.
http://jsfiddle.net/fxfrt/
myString = myString.replace(/^(&nbsp;|<br>)+/, '');
... where /.../ denotes a regular expression, ^ denotes the start of the string, (&nbsp;|<br>) denotes "&nbsp; or <br>", and + denotes "one or more occurrences of the previous expression". Then simply replace that full match with an empty string.
s.replace(/(>)(?:&nbsp;|<br>)+(\s?<)/g,'$1$2');
Don't use this in production. See the answer from Phil H.
Edit: I'll try to explain it a bit, and I hope my English is good enough.
Basically, we have two different kinds of parentheses here. The first and third pairs () are normal parentheses: they remember the characters matched by the enclosed pattern and group them together. For the second pair, we don't need to remember the characters for later use, so we disable the "remember" functionality with the form (?:) and only group the characters so that the + works as expected. The + quantifier means "one or more occurrences", so &nbsp; or <br> must appear one or more times. The last part (\s?<) matches a whitespace character (\s), which may be missing or occur once (?), followed by the character <. $1 and $2 act like variables that are replaced by the remembered characters of the first and third parentheses.
MDN provides a nice table, which explains all the special characters.
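A variation on that regex, sketched as a hypothetical helper stripBetweenTags and run on the string from the question: it also accepts <br/> and <br />, and writes the > back in the replacement instead of capturing it. It shares the caveat above: regex-based HTML editing breaks on things like > inside attribute values.

```javascript
// Hypothetical helper: removes runs of &nbsp;, <br>, <br/> or <br /> that sit
// directly after a ">" and before a "<" (allowing one whitespace character
// before that "<", which is kept).
function stripBetweenTags(html) {
  return html.replace(/>(?:&nbsp;|<br\s*\/?>)+(\s?)</g, ">$1<");
}

const input =
  '<div id="grouplogo_nav"><br> <ul><br> <li><a class="group_hlfppt" target="_blank" ' +
  'href="http://www.hlfppt.org/">&nbsp;</a></li><br> </ul><br> </div>';

console.log(stripBetweenTags(input));
```

A <br> or &nbsp; that sits next to ordinary text (for example text<br>more) is left alone, because it is not preceded by >.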
You need to replace globally. Also, don't forget that the <br> can appear self-closed, as <br />. Try this:
myString = myString.replace(/(&nbsp;|<br>|<br \/>)/g, '');
This worked for me. Please note the m (multi-line) flag, although it only affects ^ and $, which this pattern does not use:
myString = myString.replace(/(&nbsp;|<br>|<br \/>)/gm, '');
myString = myString.replace(/^(&nbsp;|<br>)+/, '');
hope this helps
