I have an HTML element with a title inside, like this: <details>Name of page</details>
How can I write a regex that finds the <details> element but returns only the text inside it, "Name of page"?
You should never use regex to parse HTML. Especially not when the environment you use provides a DOM parser at your fingertips. Just use it:
var docpart = document.createElement("div"),
details, text = '';
docpart.innerHTML = "your <details>…HTML string…</details> here";
details = docpart.getElementsByTagName("details");
if (details.length > 0) {
text = details[0].textContent;
}
alert(text); // "…HTML string…"
Since you mentioned jQuery in your comment, things get simpler. Here is the jQuery equivalent of the above:
var inputHTML = "your <details>…HTML string…</details> here";
var details = $("<div>", {html: inputHTML}).find("details").text();
Try this regex:
/<details>(.*?)<\/details>/
The first capturing group ($1 in a replacement string, or match[1] in the result of match() or exec()) will contain the name.
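In modern JavaScript the captured text is read from the match result rather than from the legacy `RegExp.$1` global. A minimal sketch using `matchAll`, assuming `<details>` has no attributes, no nesting and no line breaks inside; `extractDetailsText` is a hypothetical helper name:

```javascript
// Hypothetical helper: returns the text inside every <details>...</details> pair.
// Assumes no attributes on <details>, no nesting and no line breaks inside.
function extractDetailsText(html) {
  // The g flag is required by matchAll
  const regex = /<details>(.*?)<\/details>/g;
  // match[1] is the first capture group, i.e. what $1 refers to
  return Array.from(html.matchAll(regex), (match) => match[1]);
}

console.log(extractDetailsText("<details>Name of page</details>")); // [ 'Name of page' ]
```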
Related
I have a problem. I'm looking for a way to remove all HTML elements from a string, with two conditions:
The content of the elements should be kept
Special elements with a defined class should not be removed
I've already tried lots of things and looked at plenty of questions and answers on SO, but I can't really figure out any of them; this is well beyond my abilities. Still, I would like to understand how something like this works.
Questions/answers I've tried:
How to strip HTML tags from string in JavaScript?
Strip HTML from Text JavaScript
For example, given a string like this:
You have to pay <div class="keep-this">$200</div> per <span class="date">month</span> for your <span class="vehicle">car</span>
It should look like this after stripping:
You have to pay <div class="keep-this">$200</div> per month for your car
I've tried the following:
jQuery(document).ready(function ($) {
let string = 'You have to pay <div class="keep-this">$200</div> per <span class="date">month</span> for your <span class="vehicle">car</span>';
console.log(string);
function removeHTMLfromString(string) {
let tmp = document.createElement("DIV");
tmp.innerHTML = string;
return tmp.textContent || tmp.innerText || "";
}
console.log(removeHTMLfromString(string));
console.log(string.replace(/<[^>]*>?/gm, ''));
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
I've also tried a regex tool to see what gets removed, but I'm not making much progress there either:
https://www.regexr.com/50qar
I'd appreciate it if someone could help me with this task. Thanks a lot!
Update
Is there a way to do it with just a regex? If so, how can I exclude my elements with a special class when using this regex: /<\/?[^>]+(>|$)/g?
The code is a bit long, but I think it may help you.
let str = 'You have to pay <div class="keep-this">$200</div> per <span class="date">month</span> for your <span class="vehicle">car</span> <div class="keep-this">$500</div> also';
const el = document.createElement("div");
el.innerHTML = str;
// Get all the elements to keep
const keep = el.querySelectorAll(".keep-this");
// Replace the keeping element from the original string
// With special pattern and index so that we can replace
// the pattern with original keeping element
keep.forEach((v, i) => {
const keepStr = v.outerHTML;
str = str.replace(keepStr, `_k${i}_`);
});
// Replace created element's innerHTML by patternised string.
el.innerHTML = str;
// Get the text only
let stringify = el.innerText;
// Replace patterns from the text string by keeping element
keep.forEach((v,i) => {
const keepStr = v.outerHTML;
stringify = stringify.replace(`_k${i}_`, () => keepStr); // a function avoids special $ patterns in keepStr
});
console.log(stringify);
Leave me a comment if anything is unclear.
Update: Regular Expression approach
The same task can be done with regular expressions. The approach is:
Find all the keepable elements with a regex and store them.
Replace each keepable element in the input string with a unique placeholder pattern.
Remove all the HTML tags from the string.
Replace the placeholders with the stored keepable elements.
let htmlString = 'You have to pay <div class="keep-this">$200</div> per <span class="date">month</span> for your <span class="vehicle">car</span> Another <div class="keep-this">$400</div> here';
// RegExp for keep elements
const keepRegex = /<([a-z1-6]+)\s+(class=[\'\"](keep-this\s*.*?)[\'\"])[^>]*>.*?<\/\1>/ig;
// RegExp for opening tag
const openRegex = /<([a-z1-6]+)\b[^>]*>/ig;
// RegExp for closing tag
const closeRegex = /<\/[a-z1-6]+>/ig;
// Find all the matches for the keeping elements
const matches = [...htmlString.matchAll(keepRegex)];
// Replace the input string with any pattern so that it could be replaced later
matches.forEach((match, i) => {
htmlString = htmlString.replace(match[0], `_k${i}_`);
});
// Remove opening tags from the input string
htmlString = htmlString.replace(openRegex, '');
// Remove closing tags from the input string
htmlString = htmlString.replace(closeRegex, '');
// Replace the previously created pattern by keeping element
matches.forEach((match, index) => {
// a function replacement avoids special $ patterns in the kept HTML
htmlString = htmlString.replace(`_k${index}_`, () => match[0]);
});
console.log(htmlString);
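On the Update's question of excluding the keep-this elements in a single regex: one option, sketched here as an alternative to the placeholder round-trip above, is an alternation whose first branch captures a whole keep element and whose second branch matches any other tag; a replacer function then returns branch one unchanged and deletes branch two. `stripTagsExceptKeep` is a hypothetical name, and the sketch assumes a keep element never contains a nested element with the same tag name:

```javascript
// Hypothetical helper: strips every tag except whole elements whose class
// attribute contains "keep-this". Assumes keep elements are not nested
// inside another element with the same tag name.
function stripTagsExceptKeep(html) {
  const pattern =
    /(<([a-z][a-z0-9]*)\b[^>]*\bclass=["'][^"']*\bkeep-this\b[^"']*["'][^>]*>[\s\S]*?<\/\2>)|<\/?[a-z][a-z0-9]*\b[^>]*>/gi;
  // Group 1 is set only when a keep element matched: return it unchanged.
  // Any other opening or closing tag is replaced with an empty string.
  return html.replace(pattern, (tag, keep) => keep || "");
}

console.log(
  stripTagsExceptKeep(
    'You have to pay <div class="keep-this">$200</div> per <span class="date">month</span> for your <span class="vehicle">car</span>'
  )
);
```

Because the kept element is returned from a function rather than used as a replacement string, any `$` in it (such as `$200`) is never interpreted as a special pattern.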
If the date and vehicle spans (and their classes) come from another function, you should just remove them there instead.
I need to remove two sets of words from a string: the title of the web page. However, I can only get one set of words removed.
The title is generated by WordPress, which adds words at the start and end that I don't want displayed (I'm calling the title with PHP to dynamically create a few bits of information on the page, which are subject to change).
The code I have so far is:
<script>
var str = document.title.replace(/ - CompanyName/i, '');
document.write(str);
</script>
However, I need something which is basically:
var str = document.title.replace(/ - CompanyName/i, '') && document.title.replace(/The /i, '');
This is because the title comes out like "The PAGETITLE - CompanyName".
Any ideas on how to remove two sections of the same string?
Keep the title in a separate variable, re-assign the variable with the result of each replace, set the document title:
var title = "The PAGETITLE - CompanyName";
title = title.replace("The ", "");
title = title.replace(" - CompanyName", "");
document.title = title;
Or, if you like one-liners:
document.title = document.title.replace("The ", "").replace(" - CompanyName", "");
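Note that an unanchored `replace("The ", "")` removes the first "The " anywhere in the title, not only at the start. A sketch that anchors both patterns, assuming the suffix is always exactly " - CompanyName"; `cleanTitle` is a hypothetical name:

```javascript
// Hypothetical helper: removes a leading "The " and a trailing " - CompanyName".
// The ^ and $ anchors leave a "The" in the middle of the title untouched.
function cleanTitle(title) {
  return title.replace(/^The /i, "").replace(/ - CompanyName$/i, "");
}

console.log(cleanTitle("The PAGETITLE - CompanyName")); // PAGETITLE
console.log(cleanTitle("Meet The Team - CompanyName")); // Meet The Team
```

In the page itself this would be used as `document.title = cleanTitle(document.title);`.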
You can also do it directly with a single regex:
var title = "The PAGETITLE - CompanyName";
title = title.replace(/The (.*?) - [^-]*/, '$1'); // "PAGETITLE"
var str = document.title.replace(/ - CompanyName/i, '').replace(/The /i, '');
replace returns a new string, which you then run through replace again.
You can chain the calls like this (this is plain JavaScript; jQuery isn't needed):
var str = document.title.replace(/ - CompanyName/i, '').replace(/The /i, '');
Say I have text like this:
This should also be extracted, <strong>text</strong>
I need only the text from the entire string. I have tried this:
r = r.replace(/<strong[\s\S]*?>[\s\S]*?<\/strong>/g, "$1");
but it failed (strong is still there). Is there a proper way to do this?
Expected Result
This should also be extracted, text
Solution:
To target a specific tag, I used this:
r = r.replace(/<strong\b[^>]*>([^<>]*)<\/strong>/i, "**$1**")
To parse HTML, you need an HTML parser. See this answer for why.
If you just want to remove <strong> and </strong> from the text, you don't need parsing, but of course simplistic solutions tend to fail, which is why you need an HTML parser to parse HTML. Here's a simplistic solution that removes <strong> and </strong>:
str = str.replace(/<\/?strong>/g, "")
var yourString = "This should also be extracted, <strong>text</strong>";
yourString = yourString.replace(/<\/?strong>/g, "")
display(yourString);
function display(msg) {
// Show a message, making sure any HTML tags show
// as text
var p = document.createElement('p');
p.innerHTML = msg.replace(/&/g, "&amp;").replace(/</g, "&lt;");
document.body.appendChild(p);
}
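The `/<\/?strong>/g` pattern only matches bare tags, so something like `<strong class="x">` would survive. A sketch of a slightly more tolerant variant, still a regex with the same caveats; `unwrapTag` is a hypothetical helper, and it assumes the tag name contains only letters and digits so it needs no regex escaping:

```javascript
// Hypothetical helper: removes opening and closing tags of one element type
// (attributes allowed) while keeping the element's content.
// Assumes tagName contains only letters and digits.
function unwrapTag(html, tagName) {
  // \b stops "strong" from also matching tags such as <strongest>
  const tagPattern = new RegExp(`<\\/?${tagName}\\b[^>]*>`, "gi");
  return html.replace(tagPattern, "");
}

console.log(unwrapTag("This should also be extracted, <strong>text</strong>", "strong"));
// This should also be extracted, text
```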
Back to parsing: in your case, you can easily do it with the browser's parser, if you're in a browser:
var yourString = "This should also be extracted, <strong>text</strong>";
var div = document.createElement('div');
div.innerHTML = yourString;
display(div.innerText || div.textContent);
function display(msg) {
// Show a message, making sure any HTML tags show
// as text
var p = document.createElement('p');
p.innerHTML = msg.replace(/&/g, "&amp;").replace(/</g, "&lt;");
document.body.appendChild(p);
}
Older versions of Firefox didn't support innerText and only provided textContent, which is why there's that || there; current browsers support both.
In a non-browser environment, you'll want some kind of DOM library (there are lots of them).
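The `display` helper in the snippets escapes `&` before `<` on purpose: escaping `<` first would produce `&lt;`, whose `&` would then be escaped a second time. A standalone sketch of the same idea for use outside the snippet; `escapeHtml` is a hypothetical name, and unlike `display` it also escapes `>`:

```javascript
// Hypothetical helper: makes a string safe to assign to innerHTML as plain text.
// & must be replaced first, otherwise the & in "&lt;" would be escaped again.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(escapeHtml("<strong>a & b</strong>")); // &lt;strong&gt;a &amp; b&lt;/strong&gt;
```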
You can do this:
var r = "This should also be extracted, <strong>text</strong>";
r = r.replace(/<(.+?)>([^<]+)<\/\1>/,"$2");
console.log(r);
That is a fairly strict regex. If you want a more relaxed version, you can simply remove every tag:
r = r.replace(/<.+?>/g,"");
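The strict regex above has no `g` flag, so it only unwraps the first matching element, while the relaxed one uses `g`. A small comparison on a made-up sample with two elements:

```javascript
const input = "Keep <b>bold</b> and <i>italic</i> text";

// Strict regex without g: only the first element is unwrapped
const strictFirst = input.replace(/<(.+?)>([^<]+)<\/\1>/, "$2");
// Same strict regex with g: every element with a matching closing tag is unwrapped
const strictAll = input.replace(/<(.+?)>([^<]+)<\/\1>/g, "$2");
// Relaxed regex: deletes anything that looks like a tag
const relaxed = input.replace(/<.+?>/g, "");

console.log(strictFirst); // Keep bold and <i>italic</i> text
console.log(strictAll);   // Keep bold and italic text
console.log(relaxed);     // Keep bold and italic text
```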
For example, I have this text:
"!?vake 7EnEebjP8jXf JFyd5hpIVa6B !?vake".
The starting and ending keyword is "!?vake", and I want to retrieve the encrypted content between the keywords, decrypt it, and replace it.
The HTML before replacing would be:
<span class="messageBody" data-ft="{"type":3}">!?vake 7EnEebjP8jXf JFyd5hpIVa6B
Fu63LH23dAiB !?vake</span>
and after decrypting:
<span class="messageBody" data-ft="{"type":3}">i am the decrypted text</span>
The replacement should work across the whole HTML document, without knowing which element contains the encrypted text.
You can achieve this with a RegExp. See a demo here: http://jsfiddle.net/diode/KVqJS/4/
var body = $("body").html();
var matched;
// [\s\S]*? is lazy and also matches line breaks; both regexes must agree
while ((matched = body.match(/!\?vake ([\s\S]*?) !\?vake/))) {
if (matched.length > 1) {
body = body.replace(/!\?vake ([\s\S]*?) !\?vake/, decode(matched[1]));
}
}
$("body").html(body);
function decode(encoded) {
return "decoded " + encoded;
}
You can use regex to find (and replace) the data.
Here is a quick and dirty JSFiddle: http://jsfiddle.net/z8rVc/1/
I'm sure someone will come up with a more elegant method, however.
You can use the following regex search and replace code in JS:
var a = "!?vake 7EnEebjP8jXf JFyd5hpIVa6B !?vake";
var b = a.replace(/!\?vake (.*?) !\?vake/g, '$1');
// b contains the string between the keywords (without the surrounding spaces), which you can decrypt as required.
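A greedy `(.*)` spans from the first opening marker to the last closing one when a page contains several encrypted blocks, and `.` does not match the line break inside the example message. A sketch that handles each block separately with a replacer function; `replaceBetweenMarkers` and `fakeDecode` are hypothetical, and the decoder is only a stand-in for the real decryption:

```javascript
// Hypothetical helper: replaces every "!?vake ... !?vake" block with decode(payload).
// [\s\S]*? is lazy and also matches line breaks, so each block is handled
// separately even when the payload spans several lines.
function replaceBetweenMarkers(text, decode) {
  return text.replace(/!\?vake ([\s\S]*?) !\?vake/g, (whole, payload) => decode(payload));
}

// Stand-in decoder for illustration only: it just reverses the payload.
const fakeDecode = (payload) => payload.split("").reverse().join("");

console.log(replaceBetweenMarkers("!?vake cba !?vake and !?vake fed !?vake", fakeDecode)); // abc and def
```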
Page contents:
aa<b>1;2'3</b>hh<b>aaa</b>..
.<b>bbb</b>
blabla..
I want to get this result:
1;2'3aaabbb
The tags to match are <b> and </b>.
How do I write this regex in JavaScript?
Thanks!
Lazyanno,
If and only if:
you have read SLaks's post (as well as the previous article he links to), and
you fully understand the numerous and wondrous ways in which extracting information from HTML using regular expressions can break, and
you are confident that none of the concerns apply in your case (e.g. you can guarantee that your input will never contain nested, mismatched etc. <b>/</b> tags or occurrences of <b> or </b> within <script>...</script> or comment <!-- .. --> tags, etc.)
you absolutely and positively want to proceed with regular expression extraction
...then use:
var str = "aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";
var match, result = "", regex = /<b>(.*?)<\/b>/ig;
while (match = regex.exec(str)) { result += match[1]; }
alert(result);
Produces:
1;2'3aaabbb
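On engines that support `String.prototype.matchAll` (ES2020), the same extraction can be written without the manual `exec` loop. A sketch with the same caveats as above; `concatBoldText` is a hypothetical name, and `[\s\S]*?` additionally allows line breaks inside `<b>`:

```javascript
// Hypothetical helper: concatenates the text inside every <b>...</b> pair.
// Same caveats as above: no nesting, no <b> inside scripts or comments.
function concatBoldText(html) {
  const regex = /<b>([\s\S]*?)<\/b>/gi; // matchAll requires the g flag
  return Array.from(html.matchAll(regex), (match) => match[1]).join("");
}

console.log(concatBoldText("aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..")); // 1;2'3aaabbb
```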
You cannot parse HTML using regular expressions.
Instead, you should use the DOM.
For example (using jQuery):
var text = "";
$('<div>' + htmlSource + '</div>')
.find('b')
.each(function() { text += $(this).text(); });
I wrap the HTML in a <div> tag to find both nested and non-nested <b> elements.
Here is an example without a jQuery dependency:
// get all elements with a certain tag name
var b = document.getElementsByTagName("B");
// HTMLCollection has no map() method, so borrow Array.prototype.map, which
// executes a function on each member and builds a new array from the results...
var text = Array.prototype.map.call(b, function(element) {
// ...in this case we are interested in the element text
if (typeof element.textContent != "undefined")
return element.textContent; // standards compliant browsers
else
return element.innerText; // IE
});
// now that we have an array of strings, we can join it
var result = text.join('');
var regex = /(<([^>]+)>)/ig;
var bdy="aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";
var result = bdy.replace(regex, "");
alert(result); // removes all tags; text outside <b> (e.g. "aa", "hh") remains
See: http://jsfiddle.net/abdennour/gJ64g/
If you want to use regular expressions, just add a '?' after the quantifier in the pattern for your inner text, which makes it lazy (non-greedy).
For example:
".*" to "(.*?)"