Search and Replace with JS: RegExp that does not include html code - javascript

I'm looking for a regexp that matches strings in the HTML body but does not affect strings that appear inside tags (such as title attributes). For example, I have:
words = new Array("Android","iOS");
change = new Array ("http://www.google.com","http://www.apple.com");
obj = document.getElementsByTagName("body")[0];
// search and replace
for (i in words) {
re = new RegExp("\\b("+words[i]+")\\b", "ig");
// wrap each matched word in a link to the corresponding URL
str = obj.innerHTML.replace(re, "<a href='" + change[i] + "'>$1</a>");
document.getElementsByTagName("body")[0].innerHTML = str;
}
So I have a list of words, and the JS replaces these words in the HTML body (e.g. replacing 'iOS' with <a href='http://www.apple.com'>iOS</a>). But it also replaces HTML code such as <title = 'iOS'>, which becomes <title='a href='http://www.apple.com'>iOS</a>'. How can the regexp be changed so that <title='...'> and similar markup are not changed?
Adam

Use a look-ahead to ensure the target is not within a tag (i.e. the next angle bracket after the match is a <):
re = new RegExp("\\b(" + words[i] + ")\\b(?=[^>]*<)", "ig");
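A minimal sketch of how that look-ahead behaves, applied to a plain string instead of document.body.innerHTML so it can be run anywhere. The function name linkWords and the sample markup are assumptions made for illustration; the pattern is the one from the answer above.

```javascript
// Word list and link targets from the question.
var words = ["Android", "iOS"];
var links = ["http://www.google.com", "http://www.apple.com"];

// Hypothetical helper: links every listed word that sits in text content.
function linkWords(html) {
  for (var i = 0; i < words.length; i++) {
    // (?=[^>]*<) succeeds only if a "<" is reached before any ">",
    // which is not the case for a word inside a tag's attributes.
    var re = new RegExp("\\b(" + words[i] + ")\\b(?=[^>]*<)", "ig");
    html = html.replace(re, "<a href='" + links[i] + "'>$1</a>");
  }
  return html;
}

var sample = "<p title='iOS'>I use iOS and Android daily</p>";
var linked = linkWords(sample);
console.log(linked);
```

The title attribute is left alone while both words in the paragraph text are linked. Note that the look-ahead needs a following <, so a word in text after the last tag of the string would not be matched, and text inside script or style elements is treated like any other text.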

Related

Regex word search with apostrophe

highlightStr: function (body, searchString){
console.log(searchString);
var regex = new RegExp('(' + searchString + ')', 'gi');
console.log(regex)
return body.replace(regex, "<span class='text-highlight'>$1</span>");
}
Above is the code I'm using. I want to find and replace the searchString, which could be anything. It works fine for most words, but fails when finding words with apostrophes.
How can I modify the regex to include special characters like the apostrophe?
var body = "<br>I like that Apple’s.<br>";
var searchString = "Apple's";
Thank you
You should escape the search string to make sure the regex works even if the search string contains special regex metacharacters.
Besides, there is no need to wrap the whole pattern in a capturing group; you can always reference the whole match with the $& placeholder in the replacement pattern.
Here is some example code:
var s = "I like that Apple's color";
var searchString = "Apple's";
var regex = new RegExp(searchString.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), "gi");
document.body.innerHTML = s.replace(regex, '<b>$&</b>');
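The same idea can be folded back into the highlight function from the question. The sketch below is an assumption-laden variant: the helper names escapeRegExp and highlight are made up here, and it additionally assumes that the page text may contain a typographic apostrophe (U+2019) where the search string has a straight one, which is what the sample body in the question suggests.

```javascript
// Escape regex metacharacters so the search string is matched literally.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical variant of highlightStr that returns the highlighted HTML.
function highlight(body, searchString) {
  // Assumption: treat straight and typographic apostrophes as equivalent.
  var source = escapeRegExp(searchString).replace(/['\u2019]/g, "['\u2019]");
  var regex = new RegExp(source, 'gi');
  // $& inserts the whole match, so no capturing group is needed.
  return body.replace(regex, "<span class='text-highlight'>$&</span>");
}

var highlighted = highlight("<br>I like that Apple\u2019s.<br>", "Apple's");
console.log(highlighted);
```

If the apostrophe equivalence is not wanted, dropping the second replace call leaves a plain escaped, case-insensitive search.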

Alternative to regexp $1 for replace

I am trying to modify a substring with .replace() in javascript.
Basically I want to put arbitrary text before and after the match.
var pattern = new RegExp("<div"+"(.*?)>", "g");
var text = "<div><div class='someClass'>"
text = text.replace(pattern, "<pre>"+"<div>"+ "$1" + ">" +"</pre>")
The code above changes text to:
"<pre><div>></pre><pre><div> class='someClass'></pre>"
Apart from the extra ">" in each replacement this is what I want, but the replace call is ugly.
How can I change my code so that:
I don't have to use $1, because it is not fully supported according to this
the replace call becomes something simpler, like
text = text.replace(pattern, "<pre>"+ "THING_THAT_MATCHED" +"</pre>")
Use the following code:
var pattern = new RegExp("<div"+"(.*?)>", "g");
var text = "<div><div class='someClass'>"
text = text.replace(pattern, function(match, first_match) {
return "<pre>"+"<div>"+ first_match + ">" +"</pre>"
})
Also note that you could make your original code much neater, like so:
var pattern = new RegExp("<div"+"(.*?)>", "g");
var text = "<div><div class='someClass'>"
text = text.replace(pattern, "<pre><div>$1></pre>")
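For the THING_THAT_MATCHED form the question asks about, the whole match is available as $& in a replacement string and as the first argument of a replacement function. A short sketch, assuming the goal is simply to wrap each opening div tag in pre tags (the variable names are illustrative):

```javascript
// No capturing group is needed when only the whole match is reused.
var pattern = /<div.*?>/g;
var text = "<div><div class='someClass'>";

// $& stands for the entire matched substring.
var viaPlaceholder = text.replace(pattern, "<pre>$&</pre>");

// The first callback argument is also the entire matched substring.
var viaCallback = text.replace(pattern, function (match) {
  return "<pre>" + match + "</pre>";
});

console.log(viaPlaceholder);
console.log(viaCallback);
```

Both calls produce the same string, and neither emits the stray ">" seen in the original output.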

How to find all regex match in a string

This might be too simple to find on the web, but I have trouble finding the answer.
I get a string as HTTP response text that contains substrings (relative URLs) which I want to grab one by one for further processing.
for example:
var string = "div classimage a hrefstring1.png img idEMIC00001 he19.56mm wi69.85mm srcstring1.png separated by some html div classimage a hrefstring2.png srcstring2.png div separated by some html many such relative urls";
var re = new RegExp("[a-z]{5,10}[0-9].png");
var match = re.exec(string)
WScript.Echo (match);
This gives only the first match. I want to get all matches one by one. I am using JScript and I am new to JavaScript.
After the answer I tried this:
var string = "div classimage a hrefstring1.png img idEMIC00001 he19.56mm wi69.85mm srcstring1.png separated by some html div classimage a hrefstring2.png srcstring2.png div separated by some html many such relative urls";
var re = new RegExp("[a-z]{5,10}[0-9].png", "g");
var match = re.exec(string)
WScript.Echo (match);
But no luck.
Use 'g' for a global search, and match to get all matches:
var string = "div classimage a hrefstring1.png img idEMIC00001 he19.56mm wi69.85mm srcstring1.png separated by some html div classimage a hrefstring2.png srcstring2.png div separated by some html many such relative urls";
var re = new RegExp("[a-z]{5,10}[0-9].png", 'g');
// match() returns null when nothing matches, so fall back to an empty array
var matches = string.match(re) || [];
for(var i = 0; i < matches.length; i++){
console.log(matches[i]);
}
This should fix your problem:
var re = new RegExp("[a-z]{5,10}[0-9].png", "g");
The "g" stands for global; it will match all occurrences in your string.
Just use
var match = string.match(re)
instead of
var match = re.exec(string);
The rest of the code seems to be fine.
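If the matches really have to be processed one at a time, exec can also be kept: with the g flag each call resumes from the regex's lastIndex, so it can be called in a loop until it returns null. The sketch below assumes a shortened input string and escapes the dot so that it only matches a literal "."; both are changes from the code in the question.

```javascript
// Shortened sample input (assumption for illustration).
var input = "a hrefstring1.png srcstring1.png a hrefstring2.png srcstring2.png";

// The g flag makes exec() continue after the previous match.
var re = /[a-z]{5,10}[0-9]\.png/g;

var found = [];
var m;
while ((m = re.exec(input)) !== null) {
  // m[0] is the matched text, m.index its position in the input.
  found.push({ text: m[0], index: m.index });
}

for (var i = 0; i < found.length; i++) {
  console.log(found[i].index + ": " + found[i].text);
}
```

Under Windows Script Host, console.log would be replaced by WScript.Echo.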

Select Random Words From Tag, Wrap In Italic

I have a bunch of dynamically generated H1 tags.
I want to randomly select 1 word within the tag, and wrap it in italic tags.
This is what I have so far, the problem is, it is taking the first h1's dynamically generated content, and duplicating it to every h1 on the page.
Other than that, it works.
Any ideas?
var words = $('h1').text().split(' ');
// with help from http://stackoverflow.com/questions/5915096/get-random-item-from-array-with-jquery
var randomWord = words[Math.floor(Math.random()*words.length)];
// with more help from http://stackoverflow.com/questions/2214794/wrap-some-specified-words-with-span-in-jquery
$('h1').html($('h1').html().replace(new RegExp( randomWord, 'g' ),'<i>'+randomWord+'</i>'));
My ultimate goal
<h1>This is a <i>title</i></h1>
<h1><i>This</i> is another one</h1>
<h1>This <i>is</i> the last one</h1>
All of the titles will be dynamically generated.
http://codepen.io/anon/pen/uskfl
The problem is that $('h1') creates a collection of all of the h1 tags in the page.
You can pass a callback function to the html() method; it is called for every h1 and treats each one as a separate instance:
$('h1').html(function(index, existingHtml) {
var words = existingHtml.split(' ');
var randomWord = words[Math.floor(Math.random() * words.length)];
return existingHtml.replace(new RegExp(randomWord, 'g'), '<i>' + randomWord + '</i>');
});
See the html() docs (scroll about halfway down the page; the function argument was not available in earlier versions).
You can use jQuery's .each() to iterate through the h1s.
$('h1').each(function(){
var words = $(this).text().split(' ');
var randomWord = words[Math.floor(Math.random()*words.length)];
$(this).html(
$(this).html().replace(new RegExp( randomWord, 'g'),'<i>'+randomWord+'</i>')
);
});
Demo: http://jsfiddle.net/RT25S/1/
Edit: I just noticed a bug in my answer that is also in your question and probably in the other answers.
In a title like "this is another one", "is" is italicised both as the word "is" and inside "this". scrowler commented that when the selected word appears in the title multiple times all occurrences will be italicised, but I doubt you intended for partial words to be italicised.
The fix is relatively simple. Just check for spaces before and after the word. You also have to allow for words at the beginning and end of the title using the ^ and $ metacharacters.
Ideally we could use \b, which is a "word boundary", instead, but it doesn't seem to work when words end with non-alphanumeric characters.
You should also probably escape the randomly selected word before including it in a regex in case it contains any special characters. I added the escaping regex from "Is there a RegExp.escape function in Javascript?".
The updated code:
$('h1').each(function(){
var words = $(this).text().split(' ');
var randomWord = words[Math.floor(Math.random()*words.length)];
// Escape the word before including it in a regex in case it has any special chars
var randomWordEscaped = randomWord.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
$(this).html(
$(this).html().replace(
//new RegExp( '\\b' + randomWordEscaped + '\\b', 'g' ),
new RegExp( '(^| )' + randomWordEscaped + '( |$)', 'g' ),
'<i> ' + randomWord + ' </i>'
)
);
});
And the updated JSFiddle: http://jsfiddle.net/RT25S/3/
Note that I added spaces after and before the <i> tags because the regex now captures them. (This still works for words at the beginning/ends of titles because HTML ignores that whitespace.)
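As a possible variation on the whole-word fix above, the leading space can be captured and put back outside the <i> tag, and the trailing space can be checked with a look-ahead instead of being consumed, so adjacent occurrences of the same word are both matched. This is only a sketch on plain strings; the helper names escapeRegExp and italiciseWord are assumptions and not part of the answer's code.

```javascript
// Escape regex metacharacters in the chosen word.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: italicise whole-word occurrences of `word` in `title`.
function italiciseWord(title, word) {
  // (^| ) captures the leading space (or start of string);
  // (?= |$) requires a trailing space or end of string without consuming it.
  var re = new RegExp('(^| )' + escapeRegExp(word) + '(?= |$)', 'g');
  return title.replace(re, function (match, lead) {
    // Re-emit the captured lead so whitespace stays outside the tag.
    return lead + '<i>' + word + '</i>';
  });
}

console.log(italiciseWord("this is another one", "is"));
```

With jQuery this could be used inside the html() callback shown earlier, returning italiciseWord(existingHtml, randomWord).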

JavaScript Replace Text with HTML Between it

I want to replace some text in a webpage, and only the text, but when I do the replacement via document.body.innerHTML the markup gets in the way, like so:
HTML:
<p>test test </p>
<p>test2 test2</p>
<p>test3 test3</p>
Js:
var param = "test test test2 test2 test3";
var text = document.body.innerHTML;
document.body.innerHTML = text.replace(param, '*' + param + '*');
I would like to get:
*test test
test2 test2
test3* test3
HTML of 'desired' outcome:
<p>*test test </p>
<p>test2 test2</p>
<p>test3* test3</p>
So if I want to do that with the parameter above ("test test test2 test2 test3"), the <p></p> tags would not be taken into account and nothing would match, so I end up in the else section.
How can I replace the text without regard to the HTML markup that may sit between the words?
Thanks in advance.
Edit (for @Sonesh Dabhi):
Basically I need to replace text in a webpage, but when I scan the
webpage with the html in it the replace won't work, I need to scan and
replace based on text only
Edit 2:
'Raw' JavaScript Please (no jQuery)
This will do what you want: it builds a regular expression that finds the text even when tags sit between the words, and replaces it in place. Give it a shot.
http://jsfiddle.net/WZYG9/5/
The magic is
(\s*(?:<\/?\w+>)*\s*)*
which matches any number of tags and the whitespace between them. In the code below the backslashes are doubled to escape them within the string.
The regex first looks for any number of whitespace characters (\s*). The inner group (?:<\/?\w+>)* matches any number of start or end tags. ?: tells JavaScript not to count the group in the replacement string and not to remember the matches it finds. < is a literal less-than character. The forward slash (which begins an HTML end tag) needs to be escaped, and the question mark means 0 or 1 occurrences. This is followed by any number of whitespace characters.
Every space within the "text to search" gets replaced with this regular expression, allowing it to match any amount of whitespace and tags between the words in the text and to remember them in the numbered variables $1, $2, etc. The replacement string is built to put those remembered variables back in.
function wrapTextIn(text, character) {
if (!character) character = "*"; // default to asterisk
// trim the text
text = text.replace(/(^\s+)|(\s+$)/g, "");
//split into words
var words = text.split(" ");
// return if there are no words
if (words.length == 0)
return;
// build the regex
var regex = new RegExp(text.replace(/\s+/g, "(\\s*(?:<\\/?\\w+>)*\\s*)*"), "g");
//start with wrapping character
var replace = character;
//for each word, put it and the matching "tags" in the replacement string
for (var i = 0; i < words.length; i++) {
replace += words[i];
if (i != words.length - 1 && words.length > 1)
replace += "$" + (i + 1);
}
// end with the wrapping character
replace += character;
// replace the html
document.body.innerHTML = document.body.innerHTML.replace(regex, replace);
}
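One caveat with the function above: a capturing group that is repeated with * only remembers its last repetition, so when a gap spans several tags separated by whitespace (such as " </p>", a newline, then "<p>"), part of the gap can be lost from $1, $2, and so on. A possible variant that avoids rebuilding the string is sketched below. It works on a plain HTML string, uses a non-capturing gap pattern, and wraps the entire match. The names escapeRegExp and wrapAcrossTags are assumptions, and unlike the original pattern the gap here also accepts tags with attributes.

```javascript
// Escape regex metacharacters in each search word.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: wrap the first-to-last word span in `marker`,
// allowing whitespace and tags between the words.
function wrapAcrossTags(html, text, marker) {
  marker = marker || '*';
  var words = text.replace(/^\s+|\s+$/g, '').split(/\s+/);
  if (words.length === 1 && words[0] === '') return html; // nothing to search

  // A gap is one or more whitespace characters or tags, in any order.
  var gap = '(?:\\s|<\\/?\\w+[^>]*>)+';
  var regex = new RegExp(words.map(escapeRegExp).join(gap), 'g');

  // The match already contains the original tags, so wrap it unchanged.
  return html.replace(regex, function (match) {
    return marker + match + marker;
  });
}

var page = "<p>test test </p>\n<p>test2 test2</p>\n<p>test3 test3</p>";
var wrapped = wrapAcrossTags(page, "test test test2 test2 test3");
console.log(wrapped);
```

In a browser the result could be assigned back with document.body.innerHTML = wrapAcrossTags(document.body.innerHTML, param), with the usual caveat that resetting innerHTML discards event handlers attached to the replaced elements.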
Working demo
Use that function to get the text; no jQuery required.
First remove the tags, i.e. you can try document.body.textContent / document.body.innerText or use this example:
var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
Then find and replace (to replace all occurrences, add the "g" flag to the search regex):
String.prototype.trim=function(){return this.replace(/^\s\s*/, '').replace(/\s\s*$/, '');};
var param = "test test test2 test2 test3";
var text = (document.body.textContent || document.body.innerText).trim();
var replaced = text.search(param) >= 0;
if(replaced) {
var re = new RegExp(param, 'g');
document.body.innerHTML = text.replace(re , '*' + param + '*');
} else {
//param was not replaced
//What to do here?
}
See here
Note: by stripping you will lose the tags.
