Modify tag position with regex

Suppose I have the following string:
var text = "<p>Some text <ins>Text1</p><p>Text2 </ins><ins>Some other text </ins>and another text<ins>Text3</p><p>Text4 </ins></p>"
I need to clean up the above string into:
var text = "<p>Some text Text1</p><p><ins>Text2 </ins><ins>Some other text </ins>and another text Text3</p><p><ins>Text4 </ins></p>"
Assume Text1, Text2, Text3 and Text4 are arbitrary strings.
I tried the following, but it just messes the string up:
text.replace(/<ins>(.*?)<\/p><p>/g, '</p><p><ins>');
Thanks
ADDITIONAL EXPLANATION
Take a look at this:
<ins>Text1</p><p>Text2 </ins>
Above is wrong. It should be:
Text1</p><p><ins>Text2 </ins>

Please try the following regex:
function posChange() {
var text = "<p>Some text <ins>Text1</p><p>Text2 </ins><ins>Some other text </ins>and another text<ins>Text3</p><p>Text4 </ins></p>";
var textnew = text.replace(/(<ins>)([^<]+)(<\/p><p>)([^<]+)/g, '$2$3$1$4');
alert(textnew);
}
posChange()
REGEX EXPLANATION:
(<ins>)      1st capturing group (e.g. <ins>)     -> $1
([^<]+)      2nd capturing group (e.g. Text1)     -> $2
(<\/p><p>)   3rd capturing group (e.g. </p><p>)   -> $3
([^<]+)      4th capturing group (e.g. Text2 )    -> $4
/g           match all occurrences
Based on the requirements, for each match:
Original String: $1 $2 $3 $4
should be replaced with
New String: $2 $3 $1 $4
In this way, the position of each capturing group gets shifted with the help of regex.
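The same reordering can also be written with a replacer function, which names the groups instead of relying on $2$3$1$4. This is only a minimal sketch, assuming the same input shape as in the question; the function name moveInsIntoNextParagraph is made up for illustration.

```javascript
// Hypothetical helper: moves an <ins> that was opened before a </p><p>
// boundary so that it opens right after the boundary instead.
function moveInsIntoNextParagraph(html) {
  return html.replace(
    /(<ins>)([^<]+)(<\/p><p>)([^<]+)/g,
    function (match, insTag, before, boundary, after) {
      // Same order as the replacement string '$2$3$1$4'
      return before + boundary + insTag + after;
    }
  );
}

var sample = "<p>Some text <ins>Text1</p><p>Text2 </ins></p>";
console.log(moveInsIntoNextParagraph(sample));
// => <p>Some text Text1</p><p><ins>Text2 </ins></p>
```

Note that [^<]+ requires at least one character of plain text on both sides of </p><p>, so input with any other shape is left untouched.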

You can remove all <ins>:
text = text.replace(/<ins>/g, '');
Then prepend <ins> to every run of tag-free text that ends with </ins>:
var matches = text.match(/[^<>]+<\/ins>/g) || []; // match() returns null when nothing matches
for (var i = 0; i < matches.length; i++) {
    text = text.replace(matches[i], '<ins>' + matches[i]);
}
result:
<p>Some text Text1</p><p><ins>Text2 </ins><ins>Some other text </ins>and another textText3</p><p><ins>Text4 </ins></p>
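A possible variation, which is not part of the answer above: the two steps can be chained, with the second step using $& (the whole match) in the replacement string instead of a loop. A string pattern passed to replace() only touches the first occurrence, so this also sidesteps trouble when the same text appears more than once. The function name reopenIns is hypothetical.

```javascript
// Hypothetical helper: drop every <ins>, then reopen one in front of each
// tag-free run of text that ends with </ins>.
function reopenIns(html) {
  return html
    .replace(/<ins>/g, "")
    .replace(/[^<>]+<\/ins>/g, "<ins>$&");
}

var input = "<p>Some text <ins>Text1</p><p>Text2 </ins></p>";
console.log(reopenIns(input));
// => <p>Some text Text1</p><p><ins>Text2 </ins></p>
```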

Related

regex replace first element

I need to replace each <br> in an HTML string with two. What I can't get right is the case where one tag directly follows another:
(<br\s*\/?>)
will match all the tags in this text:
var text = 'text<BR><BR>text text<BR>text;'
and with the replace I get:
text = text.replace(/(<br\s*\/?>)/gi, "<BR\/><BR\/>")
console.log(text); // text<BR/><BR/><BR/><BR/>text text<BR/><BR/>text;
Is there a way to add only one tag per run of tags with the regex, and get this instead?
console.log(text); // text<BR/><BR/><BR/>text text<BR/><BR/>text;
Or can I only achieve this with a loop?

You may use either
var text = 'text<BR><BR>text text<BR>text;'
text = text.replace(/(<br\s*\/?>)+/gi, "$&$1");
console.log(text); // => text<BR><BR><BR>text text<BR><BR>text;
Here, /(<br\s*\/?>)+/gi matches 1 or more consecutive <br>s in a case-insensitive way while capturing each tag on its way (keeping only the last value in the group buffer after the last iteration), and "$&$1" replaces the match with the whole match ($&) followed by the last <br> ($1).
Or
var text = 'text<BR><BR>text text<BR>text;'
text = text.replace(/(?:<br\s*\/?>)+/gi, function ($0) {
return $0.replace(/<br\s*\/?>/gi, "<BR/>") + "<BR/>";
})
console.log(text); // => text<BR/><BR/><BR/>text text<BR/><BR/>text;
Here, the (?:<br\s*\/?>)+ will also match 1 or more <br>s but without capturing each occurrence, and inside the callback, all <br>s will get normalized as <BR/> and a <BR/> will get appended to the result.
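To illustrate the point about the first variant keeping only the last captured tag: when a run mixes different spellings of the tag, $1 holds whichever one came last. A small sketch with a made-up input string:

```javascript
// A quantified capturing group remembers only its last repetition,
// so $1 is "<BR/>" here, not "<br>".
var mixed = "a<br><BR/>b";
var result = mixed.replace(/(<br\s*\/?>)+/gi, "$&$1");
console.log(result); // => a<br><BR/><BR/>b
```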

You can use a negative lookahead, (<br\s*\/?>)(?!<br\s*\/?>), to double only the last tag of any consecutive run:
var text = 'text<BR><BR>text text<BR>text;'
text = text.replace(/(<br\s*\/?>)(?!<br\s*\/?>)/gi, "<BR\/><BR\/>")
console.log(text); // => text<BR><BR/><BR/>text text<BR/><BR/>text;
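If the original spelling of the tags should be preserved rather than normalized to <BR/>, the same lookahead can be combined with "$1$1" as the replacement. This is a sketch of that variation, not something stated in the answer; addOneBr is a hypothetical name.

```javascript
// Hypothetical helper: duplicates the last <br> of every consecutive run,
// reusing the tag exactly as it was written.
function addOneBr(html) {
  return html.replace(/(<br\s*\/?>)(?!<br\s*\/?>)/gi, "$1$1");
}

console.log(addOneBr("text<BR><BR>text text<BR>text;"));
// => text<BR><BR><BR>text text<BR><BR>text;
```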

Replace last digit occurrence in square brackets

I have a variable like:
var text = 'researchOrganisationTrait.keywords[0].freeKeyword[1].texts[en_GB]';
in which I want to replace the index in the last numeric pair of square brackets (the content is added dynamically).
I have tried code like:
var text = 'researchOrganisationTrait.keywords[0].freeKeyword[1].texts[en_GB]';
text = text.replace(/\[\d*](?!.*\[)/, '[newIndex]');
alert(text);
But this does not replace freeKeyword[1] with freeKeyword[newIndex].
How do I match the last occurrence of digits in square brackets?
JSFiddle: http://jsfiddle.net/4eALF/

Append \d to the lookahead, so that only a later bracket that starts with a digit blocks the match:
text = text.replace(/\[\d+](?!.*\[\d)/, '[newIndex]')
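Wrapped in a function, this might look as follows. It is a minimal sketch; the name replaceLastIndex and its parameters are assumptions made for illustration.

```javascript
// Hypothetical helper: replaces the last [digits] group in a path.
// The lookahead rejects a match if another "[" followed by a digit
// appears later in the string, so brackets like [en_GB] are ignored.
function replaceLastIndex(path, newIndex) {
  return path.replace(/\[\d+](?!.*\[\d)/, "[" + newIndex + "]");
}

var path = "researchOrganisationTrait.keywords[0].freeKeyword[1].texts[en_GB]";
console.log(replaceLastIndex(path, 7));
// => researchOrganisationTrait.keywords[0].freeKeyword[7].texts[en_GB]
```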

Using Regex to remove html elements and leave the content

Let's say I have the following HTML:
<b>Item 1</b> Text <br>
<b>Item 2</b> Text <br>
<b>Item 3</b> Text <br>
<p><font color="#000000" face="Arial, Helvetica, sans-serif"><b>Item 4:</b></font></p>
<p><font color="#000000" face="Arial, Helvetica, sans-serif">Detailed Description</font></p>
and am using the following regex to capture data, (Item 1:.*?<br>)/gi, which returns <b>Item 1</b> Text <br>.
How do I drop or remove the <b>, </b> and <br>
to be left with
Item 1 Text
I've been trying to make sense of this code, <(\w+)[^>]*>.*<\/\1>, but so far no luck. All the examples I have seen on here seem to require an id or class, which my HTML does not have, so I'm a bit stuck getting those examples to fit my problem.

Try this regex: <[^>]*>
Replacing every match with an empty string removes all the HTML tags, with or without attributes, including closing tags.

This should do the trick:
var matches = stringToTest.match(/(Item \d+.*?<br\/?>)/gi);
for (var i = 0; i < matches.length; i++) {
matches[i] = matches[i].replace(/<[^>]+>/g, '');
}
alert(matches);
If you have jQuery:
alert(
$.map(stringToTest.match(/(Item \d+.*?<br\/?>)/gi), function(v) { return v.replace(/<[^>]+>/g, '') })
);

This regex will match b and br tags:
</?br?\s*/?>
To use it in Javascript you write something like this:
result = subject.replace(/<\/?br?\s*\/?>/img, "");
All the matched tags will be replaced with an empty string.
In my experience it is better to replace br tags with a space and replace normal inline tags with empty string. If that is what you want to do, this next regex matches only b tags:
</?b\s*/?>
and this one matches only br tags:
</?br\s*/?>
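Putting the two patterns together, one way to follow that advice is to turn br tags into spaces first and then drop the b tags. The sketch below is only an illustration; the function name stripInlineTags and the final whitespace cleanup are additions, not part of the answer.

```javascript
// Hypothetical helper: br tags become a space, b tags are removed,
// then runs of whitespace are collapsed and the ends trimmed.
function stripInlineTags(html) {
  return html
    .replace(/<\/?br\s*\/?>/gi, " ")
    .replace(/<\/?b\s*\/?>/gi, "")
    .replace(/\s+/g, " ")
    .trim();
}

console.log(stripInlineTags("<b>Item 1</b> Text <br>"));
// => Item 1 Text
```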

In a regex, whatever is between () forms a capture group that can later be accessed as a variable (\1, \2, \3, etc., or sometimes $1, $2, $3). So simply use them to capture the text you want.
I think this regex would work for you:
<b>(Item \d+)</b>(.*?)<br>
In detail, the expression means:
(Item \d+): any string formatted as "Item [at least 1 digit]"
(.*?): any sequence of characters; the ? makes the match lazy, so it takes as few characters as possible.
So now in <b>Item 5434</b>hel34lo 0345 345<br>, with regex above your captured groups are:
\1 = Item 5434
\2 = hel34lo 0345 345
I've never programmed in JavaScript, but something like this should work (note that the / in </b> must be escaped inside a regex literal):
var myString = "<b>Item 5434</b>hel34lo 0345 345<br>";
var myRegexp = /<b>(Item \d+)<\/b>(.*?)<br>/g;
var match = myRegexp.exec(myString);
alert(match[1]); // Item 5434
alert(match[2]); // hel34lo 0345 345
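Because the regex has the g flag, exec can be called in a loop to collect every item rather than only the first. A minimal sketch under that assumption; the function name extractItems and the shape of the returned objects are made up for illustration.

```javascript
// Hypothetical helper: collects every "Item N" label and the text after it.
function extractItems(html) {
  var re = /<b>(Item \d+)<\/b>(.*?)<br\s*\/?>/gi;
  var items = [];
  var match;
  // With the g flag, each exec() call continues after the previous match.
  while ((match = re.exec(html)) !== null) {
    items.push({ label: match[1], text: match[2].trim() });
  }
  return items;
}

var html = "<b>Item 1</b> Text <br>\n<b>Item 2</b> More <br>";
console.log(extractItems(html));
```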

Regular expression does not work

I am using the following regular expression in Javascript:
comment_body_content = comment_body_content.replace(
/(<span id="sc_start_commenttext-.*<\/span>)((.|\s)*)(<span id="sc_end_commenttext-.*<\/span>)/,
"$1$4"
);
I want to find in my HTML code the tag <span id="sc_start_commenttext-330"></span> (the number is always different) and the tag <span id="sc_end_commenttext-330"></span>. The text and HTML code between those tags should then be deleted and the result returned.
Example before replacing:
Some text and code
<span id="sc_start_commenttext-330"></span>Some text and code<span id="sc_end_commenttext-330"></span>
Some Text and code
Example after replacing:
Some text and code
<span id="sc_start_commenttext-330"></span><span id="sc_end_commenttext-330"></span>
Some text and code
Sometimes my regular expression works and it replaces the text correctly, sometimes not - is there a mistake? Thank you for help!
Alex

You should use a pattern that matches the start with its corresponding end, for example:
/(<span id="sc_start_commenttext-(\d+)"><\/span>)[^]*?(<span id="sc_end_commenttext-\2"><\/span>)/
Here \2 in the end tag refers to the string matched by (\d+), which matches the digits 330 in the start tag. In JavaScript, [^] is a short way to match any character, including line breaks.
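Used in a replace call, the pattern might be applied as sketched below. Since the digits occupy group 2, the two tags are groups 1 and 3, so the replacement is "$1$3" rather than the "$1$4" from the question. The g flag and the function name clearCommentText are additions made for this illustration.

```javascript
// Hypothetical helper: empties everything between a start marker and the
// end marker that carries the same number.
function clearCommentText(html) {
  var re = /(<span id="sc_start_commenttext-(\d+)"><\/span>)[^]*?(<span id="sc_end_commenttext-\2"><\/span>)/g;
  return html.replace(re, "$1$3");
}

var before = 'A<span id="sc_start_commenttext-330"></span>Some text\nand code<span id="sc_end_commenttext-330"></span>B';
console.log(clearCommentText(before));
// => A<span id="sc_start_commenttext-330"></span><span id="sc_end_commenttext-330"></span>B
```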

Using DOM.
var $spans = document.getElementsByTagName("span");
var str = "";
for(var i = 0, $span, $sibling; i < $spans.length; ++i) {
$span = $spans[i];
if(/^sc_start_commenttext/i.test($span.id)) {
while($sibling = $span.nextSibling) {
if(/^sc_end_commenttext/i.test($sibling.id)) {
break;
}
str += $sibling.textContent; // textContent works for both text and element nodes
$span.parentNode.removeChild($sibling);
}
}
}
console.log("The enclosed string was: ", str);
Here you have it.

I would start by replacing .* with [0-9]+"> if I understand your intention correctly.

I agree that it's normally a bad idea to use regexes to parse HTML, but they can be used effectively on non-nested HTML.
Using RegExp:
var str = 'First text and code<span id="sc_start_commenttext-330"></span>Remove text<span id="sc_end_commenttext-330"></span>Last Text and code';
var re = /(.*<span id="sc_start_commenttext-\d+"><\/span>).*(<span id="sc_end_commenttext-\d+"><\/span>.*)/;
str = str.replace(re, "$1$2"); // replace() returns a new string
Result:
First text and code<span id="sc_start_commenttext-330"></span><span id="sc_end_commenttext-330"></span>Last Text and code

JavaScript Replace Text with HTML Between it

I want to replace some text in a webpage, only the text, but when I replace via the document.body.innerHTML I could get stuck, like so:
HTML:
<p>test test </p>
<p>test2 test2</p>
<p>test3 test3</p>
Js:
var param = "test test test2 test2 test3";
var text = document.body.innerHTML;
document.body.innerHTML = text.replace(param, '*' + param + '*');
I would like to get:
*test test
test2 test2
test3* test3
HTML of 'desired' outcome:
<p>*test test </p>
<p>test2 test2</p>
<p>test3* test3</p>
So If I want to do that with the parameter above ("test test test2 test2 test3") the <p></p> would not be taken into account - resulting into the else section.
How can I replace the text with no "consideration" to the html markup that could be between it?
Thanks in advance.
Edit (for #Sonesh Dabhi):
Basically I need to replace text in a webpage, but when I scan the
webpage with the html in it the replace won't work, I need to scan and
replace based on text only
Edit 2:
'Raw' JavaScript Please (no jQuery)

This will do what you want: it builds a regular expression that finds the text between tags and replaces it there. Give it a shot.
http://jsfiddle.net/WZYG9/5/
The magic is
(\s*(?:<\/?\w+>)*\s*)*
Which, in the code below has double backslashes to escape them within the string.
The regex itself looks for any number of whitespace characters (\s). The inner group (?:<\/?\w+>)* matches any number of start or end tags. ?: tells JavaScript not to capture the group, so it is not numbered and its matches are not remembered. < is a literal less-than character. The forward slash (which begins an HTML end tag) needs to be escaped, and the question mark means 0 or 1 occurrence. This is followed by any number of whitespace characters.
Every space within the "text to search" gets replaced with this regular expression, which matches any number of tags and whitespace between the words in the text and remembers them in the numbered variables $1, $2, etc. The replacement string gets built to put those remembered variables back in.
function wrapTextIn(text, character) {
if (!character) character = "*"; // default to asterisk
// trim the text
text = text.replace(/(^\s+)|(\s+$)/g, "");
//split into words
var words = text.split(" ");
// return if there are no words
if (words.length == 0)
return;
// build the regex
var regex = new RegExp(text.replace(/\s+/g, "(\\s*(?:<\\/?\\w+>)*\\s*)*"), "g");
//start with wrapping character
var replace = character;
//for each word, put it and the matching "tags" in the replacement string
for (var i = 0; i < words.length; i++) {
replace += words[i];
if (i != words.length - 1 && words.length > 1)
replace += "$" + (i + 1);
}
// end with the wrapping character
replace += character;
// replace the html
document.body.innerHTML = document.body.innerHTML.replace(regex, replace);
}
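One caveat with the approach above: a quantified capturing group only remembers its last repetition, so with (\s*(?:<\/?\w+>)*\s*)* a gap such as " </p>" followed by a line break and "<p>" can end up captured only in part. The sketch below is a variation, not the answer's code. It reinserts the whole match instead of rebuilding it from $1, $2, etc., escapes regex metacharacters in the search text, and works on an HTML string so it can be run outside a browser. The names escapeRegExp and wrapTextInHtml are hypothetical.

```javascript
// Escape regex metacharacters so the words are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical variant of wrapTextIn that takes and returns an HTML string.
function wrapTextInHtml(html, text, character) {
  character = character || "*"; // default to asterisk
  var words = text.trim().split(/\s+/).map(escapeRegExp);
  if (words[0] === "") return html; // nothing to search for
  // Between two words, allow one run of whitespace and simple tags.
  var gap = "(?:\\s|<\\/?\\w+>)+";
  var regex = new RegExp(words.join(gap), "g");
  // Reinsert the whole match, so everything between the words is kept.
  return html.replace(regex, function (match) {
    return character + match + character;
  });
}

var page = "<p>test test </p>\n<p>test2 test2</p>\n<p>test3 test3</p>";
console.log(wrapTextInHtml(page, "test test test2 test2 test3"));
```

In a browser, the result could be assigned back with document.body.innerHTML = wrapTextInHtml(document.body.innerHTML, param). Like the original, this only recognizes simple tags without attributes.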

WORKING DEMO
USE THAT FUNCTION TO GET TEXT.. no jquery required

First remove the tags, i.e. you can use document.body.textContent / document.body.innerText, or use this example:
var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
Then find and replace (to replace all occurrences, add the "g" flag to the search regex):
String.prototype.trim=function(){return this.replace(/^\s\s*/, '').replace(/\s\s*$/, '');};
var param = "test test test2 test2 test3";
var text = (document.body.textContent || document.body.innerText).trim();
var replaced = text.search(param) >= 0;
if(replaced) {
var re = new RegExp(param, 'g');
document.body.innerHTML = text.replace(re , '*' + param + '*');
} else {
//param was not replaced
//What to do here?
}
See here
Note: By stripping you will lose the tags.
