Javascript regex not containing keyword with backslashes - javascript

I'm having a problem with a JavaScript regex that has to comment out all tags inside a script tag. But it must not comment out the special first script tag with the id "ignorescript".
Here is a sample string to regex:
<script id="ignorescript">
var test = '<script>test<\/script>';
var xxxx = 'x';
</script>
The script tag inside ignorescript has an extra backslash because it is JSON encoded (from PHP).
And here is the final result I have to get:
<script id="ignorescript">
var test = '<!--ignore <script>test<\/script> ignore-->';
var xxxx = 'x';
</script>
Following example works:
content = content.replace(/(<script>.*<\\\/script>)/g,
"<!--ignore $1 ignore-->");
But I need to check that it does not contain the keyword "ignorescript". If that keyword comes up, I do not want to replace anything; otherwise, ignore comments should be added around the whole script tag. So far I have gotten this far:
content = content.replace(/(<script.((?!ignorescript).)*<\/script>)/g,
"<!--ignore $1 ignore-->");
It kind of works, but not the way it is supposed to. I also have one more backslash in the ending tag, so I changed it to:
content = content.replace(/(<script.((?!ignorescript).)*<\\\/script>)/g,
"<!--ignore $1 ignore-->");
Now it does not find anything at all.

Got it finally working.
Here is the working regex:
/(<script(?!\sid="ignorescript").*?<\\\/script>)/g
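As a quick check of the pattern above, the following sketch runs it against the sample string from the question. The names ignorePattern, sampleLines and commented are made up for illustration, and the inner tag is written with a doubled backslash so that the JavaScript string contains the literal <\/script> that the JSON encoding produces.

```javascript
// The working pattern from above: skip "<script" when it is directly
// followed by ' id="ignorescript"', then match lazily up to "<\/script>".
const ignorePattern = /(<script(?!\sid="ignorescript").*?<\\\/script>)/g;

// Sample input from the question; "\\/" yields a literal backslash + slash.
const sampleLines = [
  '<script id="ignorescript">',
  "var test = '<script>test<\\/script>';",
  "var xxxx = 'x';",
  '</script>',
];

const commented = sampleLines
  .join('\n')
  .replace(ignorePattern, '<!--ignore $1 ignore-->');

console.log(commented);
```

Only the inner tag is wrapped: the outer opening tag is rejected by the negative lookahead, and the outer closing tag has no backslash, so the pattern never reaches it. Note that "." does not match line breaks, so this only covers an inner tag that sits on a single line.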

Related

removing specific html tags in a string - javascript

I'm trying to clean up a string of text on the server side, taken from the output generated by a WYSIWYG editor. While I can fix it on the client side, it's best to also fix this on the server side.
var string = "<p>firstline</p><p>secondline</p><p>thirdline</p><p>iframe</p><p>a</p><p>df</p><p>dsf </p><p><br></p><p>sd</p><p>f</p><p>sdf</p><p><br></p>"
var x = string.replace("<p><br></p>", "");
https://jsfiddle.net/8c0yh9r7/
The code should but doesn't get rid of the break within the paragraphs.
Why is that?
Use a regex with a global flag, like:
string.replace(/<p><br><\/p>/g, "");
https://jsfiddle.net/Lu2r3820/1/
When using a string only the first occurrence will be replaced.
See replace() documentation
doesn't get rid of the break within the paragraphs
Yes, it does… but only once. You have more than one paragraph containing a line break in your code.
If you want to replace it more than once, you need to use a regex and mark it as global with g.
var x = string.replace(/<p><br><\/p>/g, "");
It does replace, but only the first occurrence. If you run this afterwards, you can see the second occurrence disappearing.
var x = x.replace("<p><br></p>", "");
Refer to this to replace all occurrences:
How to replace all occurrences of a string in JavaScript?
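The difference described in the answers above can be seen side by side in a small sketch. The shortened input string and the variable names are illustrative only; replaceAll() is a standard string method in ES2021 and later, so it is assumed here that the runtime is recent enough.

```javascript
// Shortened version of the question's input, with two empty paragraphs.
const html = '<p>a</p><p><br></p><p>b</p><p><br></p>';

// A string pattern replaces only the first occurrence.
const firstOnly = html.replace('<p><br></p>', '');

// A regex with the g flag replaces every occurrence.
const allViaRegex = html.replace(/<p><br><\/p>/g, '');

// replaceAll() does the same with a plain string (ES2021+).
const allViaReplaceAll = html.replaceAll('<p><br></p>', '');

console.log(firstOnly);        // <p>a</p><p>b</p><p><br></p>
console.log(allViaRegex);      // <p>a</p><p>b</p>
console.log(allViaReplaceAll); // <p>a</p><p>b</p>
```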

How do you remove from /> back to < and everything in between? (Javascript)

I'm having an issue with some XML when processing it with my Javascript, because the Node modules (libxslt & libxmljs) don't know how to handle a self closing tag. Through some different testing I have narrowed the problem down to XML elements that self close, like the center element in the example below:
var string =
`<head>
<body>
<example />
</body>
</head>`
Simply put, I need a way of removing
<example />
entirely; without knowing the position prior, because there are multiple in a document, and without addressing the tag name directly, because the self closing tags vary from document to document.
If .replace() obtains the location ID of the parameter, it could be used with a function as the second parameter. Something like this:
string.replace('/>', function(match){
//search from match back for the closest '<' and remove that substring.
})
Thanks all for the advice, particularly to @Tonioyoyo, whose answer led to solving my question. Solution below:
//Xml with random element tags
var xml = "<head><body><example1 /><example2 /><example3 /></body></head>"
//Convert to string
xml = xml.toString();
//Create pattern variable to match self-closing elements
var myRegexp = /.*?(\<\w+\s*\/\>).*/
//Removing all problem elements
var match = myRegexp.exec(xml);
while (match != null && match[1] != null) {
xml = xml.replace(match[1], '')
match = myRegexp.exec(xml);
}
//Log result
console.log(xml);
However, the real problem turned out to be a comma getting added, like so:
<opti,ons/>
This happened when porting from SQL to Node.js using the node package 'mssql' (the comma was not in the source SQL), and it produced the mismatching tags error. Using:
xml = xml.toString();
xml = xml.replace('<opti,ons/>', ''); //Fixes the mismatch tags error.
This means that @Quentin is correct: the Node modules libxslt & libxmljs do know how to deal with self-closing tags, as the added comma was the problem, not the tags.
You can write your own regular expression to capture either self closing tags or code between classic tags.
For instance, if you do:
var string =
`<head>
<body>
<example />
</body>
</head>`
var pattern = /<(.*) \/>/;
var result = string.replace(pattern, '');
You will end up with your string value equal to:
<head>
<body>
</body>
</head>
And if you want to test your regular expression online, you may want to visit https://regex101.com/ (you can test for Javascript language)
Hope this helps :)
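The pattern above has no g flag, so it removes only the first self-closing tag, and its greedy .* can swallow several tags when they share a line. A minimal sketch of a variant, assuming that attribute values never contain "<" or ">", uses a negated character class plus the g flag. The sample document and the names xmlInput, selfClosingTag and cleanedXml are hypothetical.

```javascript
// Hypothetical input with several self-closing tags on one line.
const xmlInput =
  '<head><body><example1 /><example2/><example3 attr="1" /></body></head>';

// "<", then any characters except angle brackets, then "/>".
// Opening tags (<body>) and closing tags (</body>) do not end in "/>",
// so they are left alone.
const selfClosingTag = /<[^<>]*\/>/g;

const cleanedXml = xmlInput.replace(selfClosingTag, '');

console.log(cleanedXml); // <head><body></body></head>
```

This is still a text-level workaround rather than real XML parsing; comments, CDATA sections or a ">" inside an attribute value would need a proper parser.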

getElementsByTagName exclude script tag

As the title suggests, I want to exclude the script tag.
The reason is that while using a regex (at least I think that's the right name :P)
I get to a point where the pattern declaration itself,
var wdc = /something/g;
is counted by
var foundwdc = words.match(wdc).length;
So when I alert foundwdc, it gives 3 "somethings" instead of the desired two inside the body
var words = document.getElementsByTagName('body')[0].innerHTML;
Hope this is clear enough :D and hope the title is right :P
Use replace() to remove the script tag from the string:
var words = document.getElementsByTagName('body')[0].innerHTML.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '');
console.log(words);
hi hello
<script>
</script>
Regex explanation here.
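To make the effect on the count visible without a browser, the sketch below applies the same replace() to a plain string standing in for innerHTML. The string bodyHtml and the helper countMatches are assumptions for illustration; the helper also guards against match() returning null when nothing is found.

```javascript
// Stand-in for document.body.innerHTML: two visible "something"s plus
// one more inside the script source itself.
const bodyHtml =
  'something here <script>var wdc = /something/g;</script> and something there';

// Count matches, treating "no match" (null) as zero.
function countMatches(text, pattern) {
  return (text.match(pattern) || []).length;
}

// Same script-stripping regex as in the answer above.
const withoutScripts = bodyHtml.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '');

console.log(countMatches(bodyHtml, /something/g));       // 3
console.log(countMatches(withoutScripts, /something/g)); // 2
```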

regex to match subexpression at end of string

I'm trying to test whether the ending pattern in a string is an html closing tag (assuming trailing spaces are trimmed).
var str1 = "<em>I</em> am <strong>dummy</strong> <em>text.</em>"; //ends with html close tag
var str2 = "<em>I</em> am <strong>dummy</strong> <strong>text.</strong>"; //ends with html close tag
var str3 = "<em>I</em> am <strong>dummy</strong> text"; //does not end with html close tag
Using str1 above, I would like to get the position of the ending tag, which is an </em>. Here are my attempts:
var rgx1 = /(<\/em>)$/g; // works. basic scenario. matches closing </em> tags at the end of the string.
var rgx2 = /<\s*\/\s*\w\s*.*?>/g; //matches html closing tags.
var rgx3 = /<\s*\/\s*\w\s*.*?>$/g; //doesn't work. supposed to match closing html tag at the end of the string
console.log(str1.search(rgx1))
While rgx1 correctly returns the position of the ending tag, and rgx2 correctly returns the position of a closing html tag in general, I'm trying to get a generalized regex that will return the position of any html closing tag that ends the string. Why doesn't rgx3 work?
You should just use a negated character class to match anything that's not a closing >:
var rgx = /<\/[^>]+>$/g;
As to why rgx3 didn't work: your pattern isn't really good, but it should technically match. If it didn't work with the $ on the end, then the string you are matching probably isn't trimmed as you suppose it to be (or there is something other than a closing html tag at the end).
Seems like there might be an issue with rgx2 and rgx3: an extra .*? before the > and a missing * after \w. Here's how I would write the regexes. The fact that rgx2 was working at all was because of the match-all (.*?).
var rgx2 = /<\s*\/\s*\w*\s*>/g;
var rgx3 = /<\s*\/\s*\w*\s*>$/g;
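The suggestions above can be compared on the three strings from the question. In this sketch the names endsWithClosingTag and originalRgx3 are illustrative. One observation, offered as a likely explanation rather than a confirmed one: the original rgx3 does match str1, but its lazy .*? is allowed to run across ">" characters, so search() reports the first closing tag (index 5) and the match extends from there to the end of the string.

```javascript
const str1 = "<em>I</em> am <strong>dummy</strong> <em>text.</em>";
const str2 = "<em>I</em> am <strong>dummy</strong> <strong>text.</strong>";
const str3 = "<em>I</em> am <strong>dummy</strong> text";

// Negated character class: the tag body cannot contain ">", so the match
// has to be the single closing tag that touches the end of the string.
const endsWithClosingTag = /<\/[^>]+>$/;

console.log(str1.search(endsWithClosingTag) === str1.length - "</em>".length);     // true
console.log(str2.search(endsWithClosingTag) === str2.length - "</strong>".length); // true
console.log(str3.search(endsWithClosingTag)); // -1

// The question's rgx3 (without the g flag, which search() ignores anyway).
const originalRgx3 = /<\s*\/\s*\w\s*.*?>$/;
console.log(str1.search(originalRgx3)); // 5, the first </em>, not the last one
```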

How to get regex to match multiple script tags?

I'm trying to return the contents of any <script> tags in a body of text. I'm currently using the following expression, but it only captures the contents of the first tag and ignores any others after that.
Here's a sample of the html:
<script type="text/javascript">
alert('1');
</script>
<div>Test</div>
<script type="text/javascript">
alert('2');
</script>
My regex looks like this:
//scripttext contains the sample
re = /<script\b[^>]*>([\s\S]*?)<\/script>/gm;
var scripts = re.exec(scripttext);
When I run this on IE6, it returns 2 matches. The first containing the full tag, the 2nd containing alert('1').
When I run it on http://www.pagecolumn.com/tool/regtest.htm it gives me 2 results, each containing the script tags only.
The "problem" here is in how exec works. It matches only the first occurrence, but stores the current index (i.e. the position where the next search starts) in the lastIndex property of the regex. To get all matches, simply apply the regex to the string until it fails to match (this is a pretty common way to do it):
var scripttext = ' <script type="text/javascript">\nalert(\'1\');\n</script>\n\n<div>Test</div>\n\n<script type="text/javascript">\nalert(\'2\');\n</script>';
var re = /<script\b[^>]*>([\s\S]*?)<\/script>/gm;
var match;
while (match = re.exec(scripttext)) {
// full match is in match[0], whereas captured groups are in ...[1], ...[2], etc.
console.log(match[1]);
}
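On newer runtimes (ES2020 and later), the same iteration can be written with String.prototype.matchAll(), which requires the g flag and yields one match object per occurrence. This is an alternative sketch, not what the answer above used; the names sampleHtml, scriptPattern and scriptContents are made up for the example.

```javascript
// Same sample as above, rebuilt line by line for readability.
const sampleHtml = [
  '<script type="text/javascript">',
  "alert('1');",
  '</script>',
  '<div>Test</div>',
  '<script type="text/javascript">',
  "alert('2');",
  '</script>',
].join('\n');

const scriptPattern = /<script\b[^>]*>([\s\S]*?)<\/script>/g;

// Each match object has the full match in [0] and the first group in [1].
const scriptContents = Array.from(
  sampleHtml.matchAll(scriptPattern),
  (match) => match[1].trim()
);

console.log(scriptContents); // [ "alert('1');", "alert('2');" ]
```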
Don't use regular expressions for parsing HTML. HTML is not a regular language. Use the power of the DOM. This is much easier, because it is the right tool.
var scripts = document.getElementsByTagName('script');
Try using the global flag:
document.body.innerHTML.match(/<script.*?>([\s\S]*?)<\/script>/gmi)
Edit: added multiple line and case insensitive flags (for obvious reasons).
The first group contains the content of the tags.
Edit: Don't you have to surround the regex statement with quotes? Like:
re = "/<script\b[^>]*>([\s\S]*?)<\/script>/gm";
In .Net, there's a submatch method, and in PHP, preg_match_all, which would solve your problem. In Javascript there isn't such a method, but you can write one yourself.
Test in
http://www.pagecolumn.com/tool/regtest.htm
Selecting the $1elements method will return what you want.
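Try this:
var scriptElements = document.getElementsByTagName('script');
for (var i = 0; i < scriptElements.length; i++) {
var x = scriptElements[i];
if (x && x.innerHTML) {
// assumption: ".*?" (any characters) was intended rather than "\.*"
var yourRegex = /http:\/\/.*?\.com/g;
var matches = yourRegex.exec(x.innerHTML);
if (matches) {
// your code
}
}
}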
