JavaScript: Replace Text with HTML Between It

I want to replace some text in a webpage (only the text), but when I replace it via document.body.innerHTML I get stuck, like so:
HTML:
<p>test test </p>
<p>test2 test2</p>
<p>test3 test3</p>
Js:
var param = "test test test2 test2 test3";
var text = document.body.innerHTML;
document.body.innerHTML = text.replace(param, '*' + param + '*');
I would like to get:
*test test
test2 test2
test3* test3
HTML of the desired outcome:
<p>*test test </p>
<p>test2 test2</p>
<p>test3* test3</p>
So if I try that with the parameter above ("test test test2 test2 test3"), the </p> and <p> tags between the words are part of innerHTML, so the search string is never found and the code ends up in the else branch.
How can I replace the text without any consideration of the HTML markup that could be between it?
Thanks in advance.
Edit (for @Sonesh Dabhi):
Basically, I need to replace text in a webpage, but when I scan the webpage with the HTML in it, the replace won't work. I need to scan and replace based on the text only.
Edit 2:
'Raw' JavaScript, please (no jQuery).

This will do what you want: it builds a regular expression that finds the text even when there are tags between the words, and replaces it in place. Give it a shot.
http://jsfiddle.net/WZYG9/5/
The magic is
(\s*(?:<\/?\w+>)*\s*)*
It matches any number of tags and whitespace between two words. In the code below, the backslashes are doubled to escape them within the string.
\s* matches any number of whitespace characters. The inner group (?:<\/?\w+>)* matches any number of start or end tags: ?: tells JavaScript not to count the group in the replacement string and not to remember the matches it finds. < is a literal less-than character. The forward slash (which begins an end tag) needs to be escaped, and the question mark after it means 0 or 1 occurrence. \w+ matches the tag name and > closes the tag. This is followed by any number of whitespace characters. Note that only bare tags such as <p> or </p> are matched; a tag with attributes, such as <p class="x">, is not.
Every space within the text to search gets replaced with this regular expression, allowing it to match any amount of whitespace and tags between the words, and to remember them in the numbered variables $1, $2, etc. The replacement string is then built to put those remembered values back in.
function wrapTextIn(text, character) {
    if (!character) character = "*"; // default to an asterisk
    // trim the text
    text = text.replace(/(^\s+)|(\s+$)/g, "");
    // return if there is nothing to search for
    if (text.length === 0)
        return;
    // split into words on any run of whitespace
    var words = text.split(/\s+/);
    // escape regex metacharacters so each word is matched literally
    var escaped = words.map(function (word) {
        return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    });
    // build the regex; the capturing group wraps the whole repetition so that
    // $1, $2, ... hold the full run of tags and whitespace, not just the last
    // repetition of it
    var regex = new RegExp(escaped.join("((?:\\s*(?:<\\/?\\w+>)*\\s*)*)"), "g");
    // start with the wrapping character
    var replace = character;
    // for each word, put it and the matching "tags" in the replacement string
    for (var i = 0; i < words.length; i++) {
        replace += words[i];
        if (i < words.length - 1)
            replace += "$" + (i + 1);
    }
    // end with the wrapping character
    replace += character;
    // replace the html
    document.body.innerHTML = document.body.innerHTML.replace(regex, replace);
}
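To try the approach outside a browser, here is a minimal sketch that applies the same idea to an HTML string and returns the result instead of assigning document.body.innerHTML. The name wrapTextInHtml is hypothetical; it assumes the text to find is made of plain words and that only bare tags (no attributes) sit between them. It wraps the capture around the whole repeated gap and uses a replacer function, so each gap is copied back exactly as found.

```javascript
// Hypothetical pure-string variant of wrapTextIn (runs in Node or a browser).
function wrapTextInHtml(html, text, character = "*") {
  const words = text.trim().split(/\s+/).filter(Boolean);
  if (words.length === 0) return html;
  // escape regex metacharacters so every word is matched literally
  const escaped = words.map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  // gap between words: one capture around the whole run of whitespace/bare tags
  const gap = "((?:\\s*(?:<\\/?\\w+>)*\\s*)*)";
  const regex = new RegExp(escaped.join(gap), "g");
  return html.replace(regex, (...args) => {
    // args = [match, gap1, ..., gapN, offset, string]
    const gaps = args.slice(1, words.length);
    let out = character;
    words.forEach((w, i) => {
      out += w;
      if (i < gaps.length) out += gaps[i]; // put the original markup back
    });
    return out + character;
  });
}

const html = "<p>test test </p>\n<p>test2 test2</p>\n<p>test3 test3</p>";
console.log(wrapTextInHtml(html, "test test test2 test2 test3"));
```

For the question's markup this yields the desired "<p>*test test </p>\n<p>test2 test2</p>\n<p>test3* test3</p>", with the newlines between the paragraphs preserved.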

Use that function to get the text; no jQuery is required.

First remove the tags, i.e. use document.body.textContent / document.body.innerText, or strip them from an HTML string like this:
var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
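Then find and replace (to replace every occurrence, add the g flag to the regex):
var param = "test test test2 test2 test3";
// textContent is standard; innerText is the fallback for older Internet Explorer
var text = (document.body.textContent || document.body.innerText).trim();
// the text keeps the line breaks between the <p> elements, so collapse
// whitespace runs into single spaces before searching
text = text.replace(/\s+/g, " ");
// indexOf does a literal search (search() would treat param as a regex)
var found = text.indexOf(param) >= 0;
if (found) {
    // escape regex metacharacters, then replace every occurrence ("g")
    var re = new RegExp(param.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), 'g');
    document.body.innerHTML = text.replace(re, '*' + param + '*');
} else {
    //param was not found
    //What to do here?
}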
Note: with this stripping approach, you will lose the tags.
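The strip-then-replace idea can be checked on a plain string. This is a sketch with a hypothetical helper name, starStripped, assuming the HTML is available as a string; it returns null in the not-found case, and its output shows the caveat above: the <p> tags are gone.

```javascript
// Hypothetical helper: strip tags, collapse whitespace, then wrap param.
function starStripped(html, param) {
  // same tag-stripping regex as above, then collapse whitespace runs
  const text = html.replace(/(<([^>]+)>)/ig, "").replace(/\s+/g, " ").trim();
  if (!text.includes(param)) return null; // the "else" case: not found
  // split/join replaces every literal occurrence without building a regex
  return text.split(param).join("*" + param + "*");
}

console.log(starStripped("<p>test test </p>\n<p>test2 test2</p>\n<p>test3 test3</p>",
                         "test test test2 test2 test3"));
```

Without the whitespace collapse, the " \n" left between "test test" and "test2" by the stripped </p><p> would stop the search string from matching.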

Related

Replace a specific character from a string with HTML tags

Given a text input, if there is a specific character, it must be converted to a tag. For example, if the special character is *, the text between two special characters must appear in italics.
For example:
This is *my* wonderful *text*
must be converted to:
This is <i>my</i> wonderful <i>text</i>
So I've tried something like:
const arr = "This is *my* wonderful *text*";
if (arr.includes('*')) {
arr[index] = arr.replace('*', '<i>');
}
It replaces the first star with <i>, but it doesn't work when there is more than one special character.
Any ideas?
You can use a regular expression to detect any word that is surrounded by * and replace it with any tag; in your example it's the <i> tag, so see the following:
Example
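let str = "This is *my* wonderful *text*";
// lookbehind (?<=...) requires an ES2018+ engine
let regex = /(?<=\*)(.*?)(?=\*)/;
while (str.includes('*')) {
    let matched = regex.exec(str);
    // stop if a lone * has no closing partner (exec returned null)
    if (!matched) break;
    let wrap = "<i>" + matched[1] + "</i>";
    str = str.replace(`*${matched[1]}*`, wrap);
}
console.log(str);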
Here you go, my friend:
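var arr = "This is *my* wonderful *text*";
// match() with the g flag returns every match, or null if there are none
const matched = arr.match(/\*(?:.*?)\*/g) || [];
for (let i = 0; i < matched.length; i++) {
    // replaceAll requires an ES2021+ engine
    arr = arr.replace(matched[i], `<i>${matched[i].replaceAll("*", "")}</i>`);
}
console.log(arr);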
Explanation: first of all, we match the regex globally by setting /g. Note that match() with the global flag returns an array of all matches.
Secondly, we look for any characters that lie between two asterisks; the asterisks are escaped because * is a metacharacter.
.*? matches lazily (non-greedy), so each match stops at the nearest asterisk; a greedy .* would instead match *my* wonderful *text* in one go.
?: makes the group non-capturing. Then we replace every element we've matched with its own text minus the asterisks, wrapped in <i>.
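For this simple case a single replace() call with a capture group is also enough: the lazy \*(.*?)\* pairs each asterisk with the next one, and $1 puts the inner text back inside <i>. A minimal sketch; the function name italicize is hypothetical, and an unpaired asterisk is simply left alone.

```javascript
// Hypothetical helper: wrap every *...* pair in <i> with one replace() call.
function italicize(s) {
  // \*(.*?)\* : lazy match between a pair of asterisks; $1 is the inner text
  return s.replace(/\*(.*?)\*/g, "<i>$1</i>");
}

console.log(italicize("This is *my* wonderful *text*"));
```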

How to find string in between special characters in a string and replace it as superscript

I have a title in the following format: test ^TM^ title test. I want to convert this text so that the word enclosed between '^' characters becomes superscript without changing its position, as shown below:
test <sup>TM</sup> title test
Please help me.
A regex can do this for you, see the example below.
const
input = 'test ^TM^ title test',
// This regex will match any text between two "^" characters. The text
// between the "^" will be placed inside a capture group.
regex = /\^(.*?)\^/g,
// This replaces the match with the text from the capture group, wrapped in a sup tag.
htmlString = input.replace(regex, `<sup>$1</sup>`);
console.log(htmlString);
document.getElementById('output').innerHTML = htmlString;
<div id="output"></div>
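Using JavaScript:
var text = "test ^TM^ title test";
// '$1'.sup() evaluates to "<sup>$1</sup>"; String.prototype.sup() is
// deprecated, so writing "<sup>$1</sup>" directly is the safer choice
var result = text.replace(/\^(.+?)\^/g, '$1'.sup());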
I think you need to open your files in an HTML or PHP editor and use the find-and-replace option.
For example, find: ^TM^
Replace with: <sup>TM</sup>
let string = "hello^TM^ hi";
let isOdd = true; // true when the next ^ opens a <sup>
let newString = "";
for (let i = 0; i < string.length; i++) {
    if (string[i] == '^') {
        // alternate between the opening and the closing tag
        if (isOdd) {
            newString += "<sup>";
        } else {
            newString += "</sup>";
        }
        isOdd = !isOdd;
    } else {
        newString += string[i];
    }
}
//newString will have "hello<sup>TM</sup> hi"

Splitting a string by white space and a period when not surrounded by quotes

I know that similar questions have been asked many times, but my regular expression knowledge is pretty bad and I can't get it to work for my case.
So here is what I am trying to do:
I have a text and I want to separate the sentences. Each sentence ends with some white space and a period (there can be one or many spaces before the period, but there is always at least one).
At the beginning I used /\s+\./ and it worked great for separating the sentences, but then I noticed that there are cases such as this one:
"some text . some text".
Now, I don't want to separate the text in quotes. I searched and found a lot of solutions that work great for spaces (for example: /(".*?"|[^"\s]+)+(?=\s*|\s*$)/), but I was not able to modify them to separate by white space and a period.
Here is the code that I am using at the moment.
while (true) { // excerpt: this runs inside a larger loop
    var regex = /\s+\./;
    var result = regex.exec(fullText);
    if (result == null) {
        break;
    }
    var length = result[0].length;
    var startingPoint = result.index;
    var currentSentence = fullText.substring(0, startingPoint).trim();
    fullText = fullText.substring(startingPoint + length);
}
I am separating the sentences one by one and removing them from the full text.
The length variable is the size of the portion that needs to be removed, and startingPoint is the position where that portion starts. The code is part of a larger while loop.
Instead of splitting, you may try to match all sentences between delimiters. That way it is easier to skip delimiters inside quotes. The respective regex is:
(.*?(?:".*?".*?)?|.*?)(?: \.|$)
Demo: https://regex101.com/r/iS9fN6/1
The sentences then may be retrieved in this loop:
while (match = regex.exec(input)) {
console.log(match[1]); // each next sentence is in match[1]
}
BUT! Used with the g flag (which the loop needs), this particular expression makes the loop run forever: at the end of the input, the $ alternative matches an empty string, and exec() does not advance lastIndex past an empty match, so it keeps returning that same match.
So I can only suggest a workaround: remove the $ from the expression. The regex will then miss the last part, which can be extracted afterwards as a trailer not matched by the regex:
var input = "some text . some text . \"some text . some text\" some text . some text";
//var regex = /(.*?(?:".*?".*?)?|.*?)(?: \.|$)/g;
var regex = /(.*?(?:".*?".*?)?|.*?) \./g;
var trailerPos = 0;
var match;
while ((match = regex.exec(input)) !== null) {
    console.log(match[1]); // each next sentence is in match[1]
    trailerPos = match.index + match[0].length;
}
if (trailerPos < input.length) {
    // the last sentence is the trailer the regex did not match
    console.log(input.substring(trailerPos));
}
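Another way to keep the original $ version of the regex is to step over the empty match yourself. This is a sketch (sentencesOf is a hypothetical name): it leaves the loop when exec() returns a zero-length match at the end of the input and otherwise advances lastIndex by one. It also trims each sentence, which the loop above does not do.

```javascript
// Sketch: guard a global exec() loop against zero-length matches.
function sentencesOf(input) {
  // the original pattern, including the "$" alternative, with the g flag
  const regex = /(.*?(?:".*?".*?)?|.*?)(?: \.|$)/g;
  const sentences = [];
  let match;
  while ((match = regex.exec(input)) !== null) {
    if (match[0].length === 0) {
      // empty match: exec() would return it again at the same lastIndex
      if (regex.lastIndex >= input.length) break;
      regex.lastIndex++;
      continue;
    }
    sentences.push(match[1].trim());
  }
  return sentences;
}

console.log(sentencesOf('some text . some text . "some text . some text" some text . some text'));
```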
Update:
If sentences span multiple lines, the regex won't work, since the . pattern does not match newline characters. In that case, just use [\s\S] instead of .:
var input = "some \ntext . some text . \"some\n text . some text\" some text . so\nm\ne text";
var regex = /([\s\S]*?(?:"[\s\S]*?"[\s\S]*?)?|[\s\S]*?) \./g;
var trailerPos = 0;
var sentences = [];
var match;
while ((match = regex.exec(input)) !== null) {
    sentences.push(match[1]);
    trailerPos = match.index + match[0].length;
}
if (trailerPos < input.length) {
    sentences.push(input.substring(trailerPos));
}
sentences.forEach(function (s) {
    console.log("Sentence: -->%s<--", s);
});
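In engines that support ES2018, the s (dotAll) flag is an alternative to rewriting every . as [\s\S]: with it, . also matches line terminators. A sketch assuming such an engine; splitSentences is a hypothetical name, and the logic is the same trailer approach as above.

```javascript
// Sketch (ES2018+): the "s" flag makes "." match newlines as well.
function splitSentences(input) {
  const regex = /(.*?(?:".*?".*?)?|.*?) \./gs;
  const sentences = [];
  let trailerPos = 0;
  let match;
  while ((match = regex.exec(input)) !== null) {
    sentences.push(match[1]);
    trailerPos = match.index + match[0].length;
  }
  // whatever follows the last " ." is the final sentence
  if (trailerPos < input.length) sentences.push(input.substring(trailerPos));
  return sentences;
}

console.log(splitSentences("some \ntext . some text . \"some\n text . some text\" some text . so\nm\ne text"));
```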
Use JavaScript's encoding and decoding functions when sending and receiving.

Search and Replace with JS: RegExp that does not include html code

I'm looking for a regexp that matches strings in the HTML body but does not affect strings that appear inside tags, such as title attributes. For example, I have:
var words = ["Android", "iOS"];
var change = ["http://www.google.com", "http://www.apple.com"];
var obj = document.getElementsByTagName("body")[0];
// search and replace
for (var i = 0; i < words.length; i++) {
    var re = new RegExp("\\b(" + words[i] + ")\\b", "ig");
    obj.innerHTML = obj.innerHTML.replace(re, "<a href='" + change[i] + "'>$1</a>");
}
So I have a list of words, and the JS replaces these words in the HTML body (e.g. it replaces iOS with <a href='http://www.apple.com'>iOS</a>). But it also replaces text inside HTML code, like <title = 'iOS'>, which becomes <title='a href='http://www.apple.com'>iOS</a>'. How can the regexp be changed so that <title='...'> and similar markup are not changed?
Adam
Use a lookahead to ensure the target is not within a tag (i.e. the next angle bracket after it is a <, not a >):
re = new RegExp("\\b(" + words[i] + ")\\b(?=[^>]*<)", "ig");
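Here is a sketch of the question's loop with that lookahead applied, working on a string rather than on document.body (linkify is a hypothetical name). Like the question's code, it assumes the words contain no regex metacharacters, and like the lookahead itself it assumes attribute values contain no < or > characters.

```javascript
// Sketch: link each word, but skip occurrences that sit inside a tag.
function linkify(html, words, change) {
  for (let i = 0; i < words.length; i++) {
    // (?=[^>]*<): the next angle bracket after the word must be "<", so a
    // word inside a tag (e.g. in a title attribute) is left unchanged
    const re = new RegExp("\\b(" + words[i] + ")\\b(?=[^>]*<)", "ig");
    html = html.replace(re, "<a href='" + change[i] + "'>$1</a>");
  }
  return html;
}

console.log(linkify('<p title="iOS">Android and iOS</p>',
                    ["Android", "iOS"],
                    ["http://www.google.com", "http://www.apple.com"]));
```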

Match all whitespace in an HTML string with JavaScript

Let's say you have an HTML string like this:
<div id="loco" class="hey" >lorem ipsum pendus <em>hey</em>moder <hr /></div>
And you need to place <br/> elements after every space character, which I was doing with:
HTMLtext.replace(/\s{1,}/g, ' <br/>');
However, the problem is that this also inserts breaks after space characters inside tags (between tag attributes), and of course I'd like to do this for the text content only. I've always been really bad with regular expressions; could anyone help out?
So basically: do my original whitespace match, but only if it's not between < and >?
Regex is not a good tool for this. You should be working with the DOM, not with the raw HTML string.
For a quick-and-dirty solution that presupposes there are no < or > characters in your string except those delimiting tags, you can try this, though:
result = subject.replace(/\s+(?=[^<>]*<)/g, "$&<br/>");
This inserts a <br/> after whitespace only if the next angle bracket is an opening angle bracket.
Explanation:
\s+ # Match one or more whitespace characters (including newlines!)
(?= # but only if (positive lookahead assertion) it's possible to match...
[^<>]* # any number of non-angle brackets
< # followed by an opening angle bracket
) # ...from this position in the string onwards.
Replace that with $& (which contains the matched characters) plus <br/>.
This regex does not check whether there is a > further behind, as that would require a positive lookbehind assertion, which JavaScript did not support when this answer was written (lookbehind was added in ES2018). Without it you can't check for that, but if you control the HTML and are sure that the conditions I mentioned above are met, that shouldn't be a problem.
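Applying that replace to the sample string from the question shows which spaces get a <br/> (the ones after lorem, ipsum, pendus and moder) and which are left alone (those inside the div and hr tags):

```javascript
// The lookahead replace from above, applied to the question's sample string.
const subject = '<div id="loco" class="hey" >lorem ipsum pendus <em>hey</em>moder <hr /></div>';
// insert <br/> after whitespace only when the next angle bracket is "<"
const result = subject.replace(/\s+(?=[^<>]*<)/g, "$&<br/>");
console.log(result);
```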
See this answer for iterating over the DOM and replacing whitespace with <br /> elements. The adapted code would be:
(function iterate_node(node) {
if (node.nodeType === 3) { // Node.TEXT_NODE
var text = node.data,
words = text.split(/\s/);
if (words.length > 1) {
node.data = words[0];
var next = node.nextSibling,
parent = node.parentNode;
for (var i=1; i<words.length; i++) {
var tnode = document.createTextNode(words[i]),
br = document.createElement("br");
parent.insertBefore(br, next);
parent.insertBefore(tnode, next);
}
}
} else if (node.nodeType === 1) { // Node.ELEMENT_NODE
for (var i=node.childNodes.length-1; i>=0; i--) {
iterate_node(node.childNodes[i]); // run recursive on DOM
}
}
})(content); // any dom node
Okay, so you don't want to match spaces inside HTML tags. Regular expressions alone aren't sufficient for this, so I'll use a lexer to do the job.
var lexer = new Lexer;
var result = "";
lexer.addRule(/</, function (c) { // start of a tag
this.state = 2; // go to state 2 - exclusive tag state
result += c; // copy to output
});
lexer.addRule(/>/, function (c) { // end of a tag
this.state = 0; // go back to state 0 - initial state
result += c; // copy to output
}, [2]); // only apply this rule when in state 2
lexer.addRule(/.|\n/, function (c) { // match any character
result += c; // copy to output
}, [2]); // only apply this rule when in state 2
lexer.addRule(/\s+/, function () { // match one or more spaces
result += "<br/>"; // replace with "<br/>"
});
lexer.addRule(/.|\n/, function (c) { // match any character
result += c; // copy to output
}); // everything else
lexer.input = '<div id="loco" class="hey" >lorem ipsum pendus <em>hey</em>moder <hr /></div>';
lexer.lex();
Of course, a lexer is a very powerful tool: you could also extend it to skip angle brackets inside attribute values. However, I'll leave that for you to implement. Good luck.
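The same two-state idea can also be written without a lexer library. This is a hand-rolled sketch (brInText is a hypothetical name), not the Lexer API used above: outside a tag it replaces each whitespace run with <br/>, and inside a tag it copies characters unchanged until the closing >. Like the lexer version, it does not handle > inside attribute values.

```javascript
// Hypothetical two-state scanner: "text" state vs. "tag" state.
function brInText(html) {
  let result = "";
  let inTag = false;
  let i = 0;
  while (i < html.length) {
    const c = html[i];
    if (inTag) {
      result += c; // copy tag characters unchanged
      if (c === ">") inTag = false; // end of the tag
      i++;
    } else if (c === "<") {
      inTag = true; // start of a tag
      result += c;
      i++;
    } else if (/\s/.test(c)) {
      // collapse the whole whitespace run into a single "<br/>"
      while (i < html.length && /\s/.test(html[i])) i++;
      result += "<br/>";
    } else {
      result += c; // ordinary text character
      i++;
    }
  }
  return result;
}

console.log(brInText('<div id="loco" class="hey" >lorem ipsum pendus <em>hey</em>moder <hr /></div>'));
```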
