As the title suggests, I want to exclude the script tag.
While using a regex (at least I think that's the right name :P), I get to a point where "something"
var wdc = /something/g;
is also matched inside the <script> tag:
var foundwdc = words.match(wdc).length;
So when I alert foundwdc, it gives 3 "somethings" instead of the desired two inside the body, where words is:
var words = document.getElementsByTagName('body')[0].innerHTML;
Hope this is clear enough :D and hope the title is right :P
Use replace() to remove the script tags from the string before matching:
var words = document.getElementsByTagName('body')[0].innerHTML.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '');
console.log(words);
Example body for the snippet:
hi hello
<script>
</script>
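A standalone sketch of the same approach that also runs outside a browser. countOutsideScripts is a hypothetical helper name and the sample body is made up. It strips the script blocks first and then counts. It returns 0 instead of throwing when match() finds nothing and returns null. The pattern passed in is assumed to carry the g flag, so that match() returns every occurrence.

```javascript
// Hypothetical helper: drop every <script>...</script> block, then count matches.
function countOutsideScripts(html, pattern) {
  // Same pattern as above: the lazy [\s\S]*? stops at the first closing tag
  const withoutScripts = html.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "");
  // match() returns null when there is no match, so guard before reading .length
  const found = withoutScripts.match(pattern);
  return found ? found.length : 0;
}

// Made-up body: two "something"s in the markup and a third inside the script
const body = "something <p>something</p>\n<script>var wdc = /something/g;</script>";
console.log(countOutsideScripts(body, /something/g)); // 2
```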
What I want to do is find the tag that has the string "test string" even when that tag is nested inside other tags.
HTML example:
<section class="test-class1"><div><p class="test-class2">something else....test string</p></div></section>
Regex:
/.*<([a-zA-Z]*).*>.*?test string/g
Output:
p
I'm using https://regex101.com/#javascript for testing.
This regex works well when the HTML is small, but when the HTML gets larger, it times out.
Is there a way to improve the performance of the regex?
< *(\w+)[^<>]*>[^<]*(?:<[^>]*)*test string
This matches p in the first capturing group ($1). It is not possible to speed it up much; you'd be better off using plain JS functions.
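One way to follow that advice is sketched below. It is a swapped-in technique, not taken from either answer: a linear scan over the tags that keeps a stack of open elements. It returns the innermost element still open where the needle first appears. enclosingTagName and VOID_TAGS are hypothetical names. The sketch assumes reasonably well-formed markup with no comments, and it treats only the listed tags as void.

```javascript
// Elements that never get a closing tag, so they are not pushed on the stack
const VOID_TAGS = new Set(["br", "hr", "img", "input", "meta", "link"]);

// Hypothetical helper: name of the innermost element that is still open where
// `needle` first appears, or null if the needle is missing or at top level.
function enclosingTagName(html, needle) {
  const pos = html.indexOf(needle);
  if (pos === -1) return null;
  const open = [];
  // One simple pattern per tag, with no nested quantifiers
  const tagRe = /<(\/?)(\w+)[^>]*>/g;
  let m;
  while ((m = tagRe.exec(html)) !== null && m.index < pos) {
    const name = m[2].toLowerCase();
    if (m[1] === "/") {
      // Closing tag: pop back to the most recent matching opener, if any
      const i = open.lastIndexOf(name);
      if (i !== -1) open.length = i;
    } else if (!VOID_TAGS.has(name) && !m[0].endsWith("/>")) {
      open.push(name);
    }
  }
  return open.length ? open[open.length - 1] : null;
}

const data = '<section class="test-class1"><div><p class="test-class2">something else....test string</p></div></section>';
console.log(enclosingTagName(data, "test string")); // p
```

Each tag is examined once, so the work grows roughly linearly with the input. A pattern such as .*<([a-zA-Z]*).*>.*? has to retry many overlapping splits of the same text.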
Try this: <(\w+)[^>]+>[^>]+test string
var data = '<section class="test-class1"><div><p class="test-class2">something else....test string</p></div></section>';
var regex = /<(\w+)[^>]+>[^>]+test string/;
var output = regex.exec(data);
// exec() returns null when nothing matches, so check before reading the group
if (output) alert(output[1]); // "p"
I have strings where some HTML tags could be present, like
this is a nice day for bowling <b>bbbb</b>
How can I replace, with a RegExp, all b characters with, for example, :blablabla:, but ONLY outside HTML tags?
So in that case the resulting string should become
this is a nice day for :blablabla:owling <b>bbbb</b>
EDIT: I would like to be more specific, based on the answers I have received. First of all, I have just a string, not a DOM element or anything else. The string may or may not contain tags (opening and closing). The main idea is to be able to replace anywhere in the text except inside tags. For example, if I have a string like
not feeling well today :/ check out this link http://example.com
the regexp should replace only the first :/ with a real smiley image, but should not replace the second and third, because they are inside (and part of) a tag. Here's an example snippet using the regexp from one of the answers.
var s = 'not feeling well today :/ check out this link http://example.com';
var replaced = s.replace(/(?:<[^\/]*?.*?<\/.*?>)|(:\/)/g, "smiley_image_here");
document.querySelector("pre").textContent = replaced;
<pre></pre>
Strangely, the demo shows that it captures the correct group, but the same regexp in the replace function does not seem to work.
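A likely reason for the difference: the demo highlights group 1, but replace() substitutes the whole match. The tag pairs matched by the first alternative get overwritten too. A common fix is a replacement callback that returns tag-pair matches unchanged and replaces only when group 1 matched. Below is a minimal sketch with the question's regex. replaceSmileys is a hypothetical name, and the sample string with an <a> tag is made up for illustration.

```javascript
// Same alternation as in the question: a tag pair OR a bare ":/" (group 1).
// replace() swaps out the *whole* match, so a plain replacement string also
// overwrites tag pairs; a callback can hand those matches back unchanged.
function replaceSmileys(s, replacement) {
  return s.replace(/(?:<[^\/]*?.*?<\/.*?>)|(:\/)/g, (match, smiley) =>
    // smiley is undefined when the tag-pair alternative matched
    smiley !== undefined ? replacement : match
  );
}

const sample = 'today :/ see <a href="http://example.com">http://example.com</a>';
console.log(replaceSmileys(sample, "smiley_image_here"));
// today smiley_image_here see <a href="http://example.com">http://example.com</a>
```

The regex itself is unchanged. Only the decision about what to put back moves into the callback.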
The regex itself to replace all b's with :blablabla: is not that hard:
.replace(/b/g, ":blablabla:")
It is a bit tricky to get the text nodes where we need to perform search and replace.
Here is a DOM-based example:
function replaceTextOutsideTags(input) {
var doc = document.createDocumentFragment();
// Wrap the string in a throwaway custom element so the browser parses it
var wrapper = document.createElement('myelt');
wrapper.innerHTML = input;
doc.appendChild(wrapper);
return textNodesUnder(doc);
}
function textNodesUnder(el) {
var n, walk = document.createTreeWalker(el, NodeFilter.SHOW_TEXT, null);
while ((n = walk.nextNode())) {
// Only text nodes directly under the wrapper lie outside every tag
if (n.parentNode.nodeName.toLowerCase() === 'myelt') {
// Replace ":/" unless it is the start of "://" (as in a URL)
n.nodeValue = n.nodeValue.replace(/:\/(?!\/)/g, "smiley_here");
}
}
return el.firstChild.innerHTML;
}
var s = 'not feeling well today :/ check out this link http://example.com';
console.log(replaceTextOutsideTags(s));
Here, we only modify the text nodes that are direct children of the custom-created element named myelt.
Result:
not feeling well today smiley_here check out this link http://example.com
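The asker mentions having only a string. For a context with no DOM at all, such as Node, the same "only text outside every element" rule can be sketched on the string directly. Split on tags, track the nesting depth, and rewrite only the text pieces at depth 0. replaceTopLevelText and VOID_TAG are hypothetical names. The sketch assumes well-formed markup without comments or a ">" inside attribute values.

```javascript
// Void elements never close, so they must not change the nesting depth
const VOID_TAG = /^<(br|hr|img|input|meta|link)\b/i;

// Hypothetical helper: apply replace() only to text that is not inside any element.
function replaceTopLevelText(html, pattern, replacement) {
  let depth = 0;
  // The capturing group makes split() keep the tags in the resulting array
  return html
    .split(/(<[^>]*>)/)
    .map((part) => {
      if (part.startsWith("<")) {
        if (part.startsWith("</")) depth = Math.max(0, depth - 1);
        else if (!VOID_TAG.test(part) && !part.endsWith("/>")) depth++;
        return part; // tags themselves are never modified
      }
      return depth === 0 ? part.replace(pattern, replacement) : part;
    })
    .join("");
}

console.log(replaceTopLevelText("this is a nice day for bowling <b>bbbb</b>", /b/g, ":blablabla:"));
// this is a nice day for :blablabla:owling <b>bbbb</b>
```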
You can do this:
var input = "this is a nice day for bowling <b>bbbb</b>";
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
return a
+ b.replace(/b/g, ':blablabla:') // you may do something else here
+ c;
});
document.querySelector("pre").textContent = result;
<pre></pre>
Note that in most (not all, but most) real, complex use cases, it's much more convenient to manipulate a parsed DOM rather than just a string. If you're starting with an HTML page, you might use a library (some, like mine, accept regexes to do so).
I think you can use a regex like this (only for simple data, not nested tags):
/<[^\/]*?b.*?<\/.*?>|(b)/ig
If you want to use a regex, I suggest the regex below to remove tags repeatedly until no tags are left:
/<[^\/][^<]*>[^<]*<\/.*?>/g
Then use replace() to find any b.
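The "repeat until all tags are removed" step can be sketched as a loop. Each pass deletes the innermost tag pairs, and the loop stops when a pass changes nothing. stripTagPairs is a hypothetical name. The tags and their contents are discarded, so this only fits when the markup is not needed afterwards.

```javascript
// Hypothetical helper: keep deleting innermost "<tag ...>text</tag>" pairs until
// a pass changes nothing, so nested tags are peeled off from the inside out.
function stripTagPairs(s) {
  const pairRe = /<[^\/][^<]*>[^<]*<\/.*?>/g;
  let prev;
  do {
    prev = s;
    s = s.replace(pairRe, "");
  } while (s !== prev);
  return s;
}

// <b>...</b> goes in the first pass, the now-empty <i></i> in the second
const text = stripTagPairs("this is a nice day for bowling <i><b>bbbb</b></i>");
console.log(text.replace(/b/g, ":blablabla:"));
```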
I'm trying to match #(\w+) in a div's content and remove it.
Here's what I've tried: http://jsfiddle.net/mxgde6m7/1/
#(\w+) works, but it doesn't replace the match with a space.
var content = document.getElementById('contentbox');
var find = '#(\w+)';
var reg = new RegExp(find, 'g');
var result = content.innerHTML.replace(reg, ' ');
alert(result);
<div id="contentbox">#d test</div>
What I want: <div id="contentbox">test</div>
Thanks in advance.
EDIT
Okay, one problem solved, another one came up.
My script http://jsfiddle.net/mxgde6m7/9/ works perfectly there, but when I try it on my website, only half of it works. The last part, where it should replace #(\w+) with a space, doesn't work at all. If I copy/paste the CONTENT of the function into the console (Chrome), it works, but if I paste the function and call it, it doesn't work.
Please help! I'm stuck.
Using a RegExp constructor, you need two backslashes \\ in place of each backslash \.
var find = '#(\\w+)';
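A quick way to see why the single backslash fails. In an ordinary string literal, \w is not a recognised escape, so the backslash is silently dropped before RegExp ever sees the pattern. String.raw (ES2015) is another way to keep the backslash as typed. The variable names are only for illustration.

```javascript
// "\w" is not a string escape, so the backslash is dropped: RegExp gets "#(w+)"
const single = "#(\w+)";
// "\\" is an escaped backslash, so the pattern text really is #(\w+)
const doubled = "#(\\w+)";
// String.raw keeps backslashes exactly as written in the template
const raw = String.raw`#(\w+)`;

console.log(single);  // #(w+)
console.log(doubled); // #(\w+)
console.log("#d test".replace(new RegExp(single, "g"), " "));  // unchanged: #d test
console.log("#d test".replace(new RegExp(doubled, "g"), " ")); // "#d" replaced by a space
```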
hwnd is correct that you need to double escape \w in your regular expression.
var find = '#(\\w+)';
But you could also make this code much cleaner by using a regex literal, like so:
var content = document.getElementById('contentbox');
var result = content.innerHTML.replace(/#(\w+)/g, ' ');
alert(result);
Doing it this way doesn't require double escaping, as it's not a string.
I'm having a problem with a JavaScript regex that has to comment out all script tags inside a script tag. But it must not comment out the special first script tag with id "ignorescript".
Here is a sample string to regex:
<script id="ignorescript">
var test = '<script>test<\/script>';
var xxxx = 'x';
</script>
The script tag inside ignorescript has an extra backslash because it is JSON-encoded (from PHP).
And here is the final result I have to get:
<script id="ignorescript">
var test = '<!--ignore <script>test<\/script> ignore-->';
var xxxx = 'x';
</script>
The following example works:
content = content.replace(/(<script>.*<\\\/script>)/g,
"<!--ignore $1 ignore-->");
But I need to check that it does not contain the keyword "ignorescript". If that keyword comes up, I do not want to replace anything; otherwise, add ignore comments around the whole script tag. This is as far as I have gotten:
content = content.replace(/(<script.((?!ignorescript).)*<\/script>)/g,
"<!--ignore $1 ignore-->");
It kinda works, but not the way it's supposed to. I also have one more backslash in the ending tag, so I changed it to:
content = content.replace(/(<script.((?!ignorescript).)*<\\\/script>)/g,
"<!--ignore $1 ignore-->");
Now it does not find anything at all.
Got it finally working.
Here is the working regex:
/(<script(?!\sid="ignorescript").*?<\\\/script>)/g
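Here is the working regex applied to the sample, as a quick check. The sample string is rebuilt in JS, with the closing quote after <\/script> that the JS line presumably needs. The negative lookahead only rejects an opening tag followed by exactly one whitespace character and id="ignorescript". Because . does not cross newlines, an inner script must sit on a single line to be matched.

```javascript
// Sample from the question. In this JS source, "<\\/script>" is the literal
// text <\/script>, i.e. the JSON-escaped closing tag emitted by PHP.
const content = [
  '<script id="ignorescript">',
  "var test = '<script>test<\\/script>';",
  "var xxxx = 'x';",
  "</script>",
].join("\n");

// (?!\sid="ignorescript") rejects the special outer tag; the inner <script>
// passes, and the lazy .*? stops at the first escaped <\/script> on that line.
const commented = content.replace(
  /(<script(?!\sid="ignorescript").*?<\\\/script>)/g,
  "<!--ignore $1 ignore-->"
);
console.log(commented);
```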
I'm trying to return the contents of any script tags in a body of text. I'm currently using the following expression, but it only captures the contents of the first tag and ignores any others after that.
Here's a sample of the html:
<script type="text/javascript">
alert('1');
</script>
<div>Test</div>
<script type="text/javascript">
alert('2');
</script>
My regex looks like this:
//scripttext contains the sample
re = /<script\b[^>]*>([\s\S]*?)<\/script>/gm;
var scripts = re.exec(scripttext);
When I run this in IE6, it returns 2 matches: the first containing the full tag, the second containing alert('1').
When I run it on http://www.pagecolumn.com/tool/regtest.htm it gives me 2 results, each containing the script tags only.
The "problem" here is in how exec works. It matches only the first occurrence, but stores the current index (i.e. the caret position) in the regex's lastIndex property. To get all matches, simply apply the regex to the string repeatedly until it fails to match (this is a pretty common way to do it; it relies on the g flag, since without it lastIndex is never advanced and the loop would not end):
var scripttext = ' <script type="text/javascript">\nalert(\'1\');\n</script>\n\n<div>Test</div>\n\n<script type="text/javascript">\nalert(\'2\');\n</script>';
var re = /<script\b[^>]*>([\s\S]*?)<\/script>/gm;
var match;
while (match = re.exec(scripttext)) {
// full match is in match[0], whereas captured groups are in ...[1], ...[2], etc.
console.log(match[1]);
}
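In engines that support ES2020, String.prototype.matchAll wraps that loop. It requires the g flag (it throws a TypeError otherwise) and yields one match array per occurrence, with the groups included. The m flag only changes how ^ and $ behave, so it is left out here. The variable names below are only for illustration.

```javascript
const scripttext = ' <script type="text/javascript">\nalert(\'1\');\n</script>\n\n<div>Test</div>\n\n<script type="text/javascript">\nalert(\'2\');\n</script>';

// matchAll needs the g flag; each item is a match array, so m[1] is the tag body
const re = /<script\b[^>]*>([\s\S]*?)<\/script>/g;
const bodies = Array.from(scripttext.matchAll(re), (m) => m[1].trim());
console.log(bodies);
```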
Don't use regular expressions for parsing HTML. HTML is not a regular language. Use the power of the DOM. This is much easier, because it is the right tool.
var scripts = document.getElementsByTagName('script');
Try using the global flag:
document.body.innerHTML.match(/<script.*?>([\s\S]*?)<\/script>/gmi)
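Edit: added the multiline and case-insensitive flags (for obvious reasons).
The first group contains the content of the tags. Note, though, that with the g flag, match() returns only the full matches (tags included), not the groups.
Edit: Don't you have to surround the regex statement with quotes? Like: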
re = "/<script\b[^>]*>([\s\S]*?)<\/script>/gm";
In .NET there's a submatch method, and in PHP there's preg_match_all, which should solve your problem. In JavaScript there isn't such a method, but you can make one yourself.
Test it in
http://www.pagecolumn.com/tool/regtest.htm
Selecting the $1elements method will return what you want.
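A home-made helper along the lines of preg_match_all is sketched below. It returns one capture group from every match. matchAllGroups is a hypothetical name, and its behaviour is only loosely modelled on PHP. It works on a global copy of the regex so the caller's lastIndex is never disturbed. It also steps past zero-length matches so the exec() loop always terminates.

```javascript
// Hypothetical helper: the value of capture group `group` for every match of `re`.
function matchAllGroups(str, re, group = 1) {
  // Work on a global copy so the caller's regex (and its lastIndex) is untouched
  const flags = re.flags.includes("g") ? re.flags : re.flags + "g";
  const globalRe = new RegExp(re.source, flags);
  const results = [];
  let m;
  while ((m = globalRe.exec(str)) !== null) {
    results.push(m[group]);
    // A zero-length match would leave lastIndex in place forever, so step past it
    if (m[0] === "") globalRe.lastIndex++;
  }
  return results;
}

const html = "<script>a</script><div></div><script>b</script>";
console.log(matchAllGroups(html, /<script\b[^>]*>([\s\S]*?)<\/script>/));
```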
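try this:
var scripts = document.getElementsByTagName('script');
for (var i = 0; i < scripts.length; i++) {
var x = scripts[i];
if (x && x.innerHTML) {
// .*? matches any characters lazily, up to the first ".com"
var yourRegex = /http:\/\/.*?\.com/g;
// match() with the g flag returns every match in this script, or null
var matches = x.innerHTML.match(yourRegex);
if (matches) {
// your code here
}
}
}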