I have this text:
<body>
<span class="Forum"><div align="center"></div></span><br />
<span class="Topic">Text</span><br />
<hr />
<b>Text</b> Text<br />
<hr width=95% class="sep"/>
TextText
<hr />
<b>Text</b> -Text<br />
<hr width=95% class="sep"/>
**Text what i need.**
<hr />
and my regex for extracting "Text what I need" is /"sep"(.*)hr/m.
It doesn't work. Why?
Don't use a regular expression; use DOM methods instead:
var elems = document.getElementsByTagName("hr");
var text;
for (var i = 0; i < elems.length; ++i) {
    var elem = elems[i];
    // Match "sep" as a whole class name, then require a text node right after the <hr>
    if (/(?:^|\s+)sep(?:\s|$)/.test(elem.className) &&
        elem.nextSibling && elem.nextSibling.nodeType === Node.TEXT_NODE) {
        text = elem.nextSibling.nodeValue;
        break;
    }
}
This loops over all HR elements, checks whether each one has the class sep, and grabs the next sibling node if it is a text node.
. doesn't match newlines in JavaScript regular expressions. Try:
/"sep"([\s\S]*)hr/m
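As a quick check of the point about newlines, the sketch below runs both patterns against a shortened copy of the markup from the question. The html string and the third, narrower pattern are illustrative assumptions, not part of the original answer. Note that the greedy [\s\S]* runs from the first "sep" to the last hr, so a narrower pattern is needed to isolate each text block.

```javascript
// Shortened copy of the question's markup, joined with real line breaks.
const html = [
  '<hr width=95% class="sep"/>',
  'TextText',
  '<hr />',
  '<b>Text</b> -Text<br />',
  '<hr width=95% class="sep"/>',
  'Text what i need.',
  '<hr />'
].join('\n');

// "." does not cross line breaks (without the s flag), so this finds nothing.
const dotMatch = html.match(/"sep"(.*)hr/m);

// [\s\S] matches any character, newlines included, but greedily:
// the capture runs from the first "sep" to the last "hr".
const greedyMatch = html.match(/"sep"([\s\S]*)hr/m);

// Assumed narrower pattern: capture only the tag-free text between a
// class="sep" rule and the next <hr.
const blocks = [...html.matchAll(/class="sep"\/>\s*([^<]*?)\s*<hr/g)]
  .map((m) => m[1]);

console.log(dotMatch); // null
console.log(blocks);   // [ 'TextText', 'Text what i need.' ]
```

The last entry of blocks is the text the question asks for. This still treats HTML as a plain string, so it only holds while the markup keeps this exact shape.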
IMO, you're much better off with a different approach: regex isn't ideal for extracting data from HTML. A better method is to create a div, set its innerHTML property to the HTML string you have, then use DOM traversal to find the text node you need.
Here's an example of what I mean: http://www.jsfiddle.net/W33n6/. It uses the following code to get the text:
var div = document.createElement("div");
div.innerHTML = html;
var hrs = div.getElementsByTagName("hr");
for (var i = 0; i < hrs.length; i++) {
if (hrs[i].className == "sep") {
document.body.innerHTML = hrs[i].nextSibling.nodeValue;
break;
}
}
EDIT: Gumbo's version is a little stricter than mine, checking for the "sep" class among other classes and ensuring the node following is a text node.
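The difference between the two class checks can be seen on plain strings. This is a minimal sketch; hasClass is a hypothetical helper written for the comparison, and it assumes the class name contains no regex metacharacters.

```javascript
// Whole-name class test, in the spirit of the regex-based check above.
// Assumes `name` is a plain identifier with no regex metacharacters.
function hasClass(className, name) {
  return new RegExp('(?:^|\\s)' + name + '(?:\\s|$)').test(className);
}

const samples = ['sep', 'sep wide', 'wide sep', 'separator', ''];

// Compare strict equality with the whole-name regex for each sample value.
const table = samples.map((c) => ({
  className: c,
  strictEquality: c === 'sep',
  wholeNameRegex: hasClass(c, 'sep')
}));

console.table(table);
```

Strict equality only accepts an element whose class attribute is exactly "sep", while the regex also accepts "sep wide" and "wide sep" and still rejects "separator". In browsers that support it, elem.classList.contains("sep") performs the same whole-name check.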
Related
I have a string of text here that will be dynamically generated to be one of the following:
<h1 id="headline">"Get your FREE toothbrush!"</h1>
OR
<h1 id="headline">"FREE floss set and dentures!"</h1>
Since this will be dynamically generated I won't be able to wrap a <span> around the word "FREE" so I want to specifically target the word "FREE" using Javascript so that I can style it with a different font-family and font-color than whatever styling the <h1> is set to. What methods do I use to go about doing this?
You can search and replace the substring 'FREE' with styled HTML. If 'FREE' occurs more than once in the string, you need a regex with the g flag (or replaceAll, if you don't need to support Internet Explorer). See How to replace all occurrences of a string?
In your case:
let str = '<h1 id="headline">"FREE floss set and dentures!"</h1>';
// <span> has no color attribute, so set the color through an inline style
str = str.replace(/FREE/g, '<span style="color: red">FREE</span>');
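If the keyword is not known in advance, the same idea can be wrapped in a small function. This is only a sketch: wrapKeyword and its parameters are hypothetical names, and it assumes the input is plain text rather than markup whose attributes might contain the keyword.

```javascript
// Hypothetical helper: wrap every occurrence of `keyword` in a styled span.
function wrapKeyword(text, keyword, style) {
  // Escape regex metacharacters so the keyword is matched literally.
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const open = '<span style="' + style + '">';
  // A replacer function keeps "$" in the match from being read as a pattern.
  return text.replace(new RegExp(escaped, 'g'), (match) => open + match + '</span>');
}

const result = wrapKeyword(
  'Get your FREE toothbrush! FREE floss set!',
  'FREE',
  'color: red'
);

console.log(result);
```

In a page, the result could then be assigned to the headline's innerHTML, for example after reading the original text from its textContent.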
The property you are looking for is innerHTML. Look at the following example:
var word = document.getElementById('word');
function changeWord(){
word.innerHTML = "another";
word.style.backgroundColor = 'black';
word.style.color = 'white';
}
<h1 id="headline">
<span id="word">some</span> base title
</h1>
<button onClick="changeWord()">
change
</button>
Here is a working example using slice and some classic concatenation.
EDIT: Code for the second string is also included now.
//get headline by id
var headline = document.getElementById("headline");
//declare your possible strings in vars
var string1 = "Get your FREE toothbrush!"
var string2 = "FREE floss set and dentures!"
//declare formatted span with "FREE" in var
var formattedFree = "<span style='color: blue; font-style: italic;'>FREE</span>"
//target positions for the rest of your string
var string1Position = 13
var string2Position = 4
//concat your vars into expected positions for each string
var newString1 = string1.slice(0, 9) + formattedFree + string1.slice(string1Position);
var newString2 = formattedFree + string2.slice(string2Position)
//check if strings exist in html, if they do then append the new strings with formatted span
if (headline.innerHTML.includes(string1)) {
headline.innerHTML = newString1
}
else if (headline.innerHTML.includes(string2)) {
headline.innerHTML = newString2
}
<!-- As you can see the original string does not have "FREE" formatted -->
<!-- Change this to your other string "FREE floss set and dentures!" to see it work there as well -->
<h1 id="headline">Get your FREE toothbrush!</h1>
You can split the text node and wrap the keyword "FREE" in a span element, which you can then style. This method is safe because it does not alter any non-text HTML element.
var keyword = "FREE";
var headline = document.getElementById("headline");
var highlight, index;
headline.childNodes.forEach(child => {
if (child.nodeType == Node.TEXT_NODE) {
while ((index = child.textContent.indexOf(keyword)) != -1) {
highlight = child.splitText(index);
child = highlight.splitText(keyword.length);
// The with statement is not allowed in strict mode, so use an explicit variable
var span = document.createElement("span");
span.className = 'highlight';
headline.insertBefore(span, highlight);
span.appendChild(highlight);
}
}
});
.highlight {
/* style your keyword */
background-color: yellow;
}
<div id="FREE">
<h1 id="headline">"Get your FREE toothbrush! FREE floss set and dentures!"</h1>
</div>
Let's say I have this html
<strong>Link</strong>
and I want to replace this with something else programmatically. I select this with the mouse and call
var sel = window.getSelection()
The content of sel, however, is a text node, and its parentNode is the link node a (and that node's parentNode is the <strong> element I was looking for).
Can I get semantic elements like e.g. strong, b, em in a selection?
Use case: I want to select some text in a wysiwyg editor (html) and replace it with a link.
You can use jQuery's selector features to search the selected HTML string for a particular tag:
$('#btn').on('click', function() {
var sel = getSelectionHtml();
alert(sel.toString())
var anchor = $(sel).find('a');
var id = anchor.attr('id');
// here DOM manipulation starts
$('#' + id).html('Click here');
});
function getSelectionHtml() {
var html = "";
if (typeof window.getSelection != "undefined") {
var sel = window.getSelection();
if (sel.rangeCount) {
var container = document.createElement("div");
for (var i = 0, len = sel.rangeCount; i < len; ++i) {
container.appendChild(sel.getRangeAt(i).cloneContents());
}
html = container.innerHTML;
}
}
else if (typeof document.selection != "undefined") {
if (document.selection.type == "Text") {
html = document.selection.createRange().htmlText;
}
}
return html;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<span>This is sample text</span>
<br/>
<br/>
<strong>Link</strong>
<span>This is another text</span>
<br/>
<br/>
<span>Select two above lines and click the button below</span>
<br/>
<br/>
<input type="button" id="btn" value="Click to see selected HTML" />
The function used to obtain HTML selection was reused from this page
Are you sure that there wouldn't be other <strong> tags on the page? This isn't unique at all, so you would be running a risk of selecting/replacing the wrong element. You could do something where you're finding the first or closest, but this may still be risky (could even grab this in a text-based ad that you're running).
That said, if you feel comfortable with the approach, you could try something like this instead:
document.getElementsByTagName("strong")[0].innerHTML = "<a href='url'>Some words</a>";
That would grab the first <strong> tag on the page and inject the link for you.
I'm trying to scrape text from an HTML string by using container.innerText || container.textContent where container is the element from which I want to extract text.
Usually, the text I want to extract is located in <p> tags. So for the HTML below as an example:
<div id="container">
<p>This is the first sentence.</p>
<p>This is the second sentence.</p>
</div>
Using
var container = document.getElementById("container");
var text = container.innerText || container.textContent; // the text I want
will return This is the first sentence.This is the second sentence. without a space between the first period and the start of the second sentence.
My overall goal is to parse text using the Stanford CoreNLP, but its parser cannot detect that these are 2 sentences because they are not separated by a space. Is there a better way of extracting text from HTML such that the sentences are separated by a space character?
The HTML I'm parsing will have the text I want mostly in <p> tags, but the HTML may also contain <img>, <a>, and other tags embedded between <p> tags.
As a dirty hack, try using this:
container.innerHTML.replace(/<.*?>/g," ").replace(/ +/g," ");
This will replace all tags with a space, then collapse multiple spaces into a single one.
Note that if there is a > inside an attribute value, this will mess you up. Avoiding this problem will require more elaborate parsing, such as looping through all text nodes and putting them together.
Longer but more robust method:
function recurse(result, node) {
    var c = node.childNodes, l = c.length, i;
    for (i = 0; i < l; i++) {
        // nodeType 3 is a text node, nodeType 1 is an element
        if (c[i].nodeType == 3) result += c[i].nodeValue + " ";
        if (c[i].nodeType == 1) result = recurse(result, c[i]);
    }
    return result;
}
var text = recurse("", container);
Assuming I haven't made a stupid mistake, this will perform a depth-first search for text nodes, appending their contents to the result as it goes.
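To make the depth-first idea easy to try outside a browser, here is a sketch that runs the same kind of walk over plain objects standing in for DOM nodes. The textNode and elementNode helpers and the collectText name are hypothetical; with a real page you would pass document.getElementById("container") instead of the mock tree.

```javascript
// Node type constants as defined by the DOM (Node.ELEMENT_NODE, Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Tiny stand-ins for DOM nodes so the sketch runs outside a browser.
const textNode = (value) => ({ nodeType: TEXT_NODE, nodeValue: value, childNodes: [] });
const elementNode = (...children) => ({ nodeType: ELEMENT_NODE, nodeValue: null, childNodes: children });

// Depth-first walk that collects the trimmed content of non-empty text nodes.
function collectText(node, parts = []) {
  for (const child of node.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      const value = child.nodeValue.trim();
      if (value) parts.push(value);
    } else if (child.nodeType === ELEMENT_NODE) {
      collectText(child, parts);
    }
  }
  return parts;
}

// Mirrors the container from the question, with whitespace between the tags.
const container = elementNode(
  textNode('\n  '),
  elementNode(textNode('This is the first sentence.')),
  textNode('\n  '),
  elementNode(textNode('This is the second sentence.')),
  textNode('\n')
);

const text = collectText(container).join(' ');
console.log(text); // This is the first sentence. This is the second sentence.
```

Collecting the pieces in an array and joining them once avoids the trailing space that string concatenation leaves behind, and trimming drops the whitespace-only text nodes between tags.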
jQuery has the method text() that does what you want. Will this work for you?
I'm not sure if it fits everything that's in your container, but it works in my example. It will also take the text of an <a> tag and append it to the text.
Update 20.12.2020
If you're not using jQuery, you could implement something similar with vanilla JS like this:
const nodes = Array.from(document.querySelectorAll("#container > *"));
const text = nodes
    .filter((node) => !!node.textContent)
    .map((node) => node.textContent)
    .join(" ");
querySelectorAll("#container > *") gets every child element of the container (selecting "#container" alone would return only the container itself, whose textContent has no separators). Array.from lets us work with Array methods like filter, map and join.
Finally, generate the text by filtering out elements without textContent, then use map to get each element's text and join to add a space separator between the texts.
$(function() {
var textToParse = $('#container').text();
$('#output').html(textToParse);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="container">
<p>This is the first sentence.</p>
<p>This is the second sentence.</p>
<img src="http://placehold.it/200x200" alt="Nice picture" />
<p>Third sentence.</p>
</div>
<h2>output:</h2>
<div id="output"></div>
You can use the following function to extract and process the text as shown. It basically goes through all the children nodes of the target element and the child nodes of the child nodes and so on ... adding spaces at appropriate points:
function getInnerText( sel ) {
var txt = '';
$( sel ).contents().each(function() {
var children = $(this).children();
txt += ' ' + (this.nodeType === 3 ? this.nodeValue : children.length ? getInnerText( this ) : $(this).text());
});
return txt;
}
function getInnerText( sel ) {
var txt = '';
$( sel ).contents().each(function() {
var children = $(this).children();
txt += ' ' + (this.nodeType === 3 ?
this.nodeValue : children.length ?
getInnerText( this ) : $(this).text());
});
return txt;
}
alert( getInnerText( '#container' ) );
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<div id="container">
Some other sentence
<p>This is the first sentence.</p>
<p>This is the second sentence.</p>
</div>
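One detail worth checking in a function like getInnerText is operator precedence when a string is concatenated with a conditional expression: + binds more tightly than === and ?:. The following standalone sketch (the variable names are illustrative only) shows the two groupings side by side.

```javascript
const nodeType = 3;
const nodeValue = 'text';

// Parsed as ((' ' + nodeType) === 3) ? nodeValue : 'fallback'.
// ' 3' === 3 is false, so the text-node branch is never taken.
const unparenthesized = ' ' + nodeType === 3 ? nodeValue : 'fallback';

// Parentheses make the whole conditional the right-hand operand of +.
const parenthesized = ' ' + (nodeType === 3 ? nodeValue : 'fallback');

console.log(JSON.stringify(unparenthesized)); // "fallback"
console.log(JSON.stringify(parenthesized));   // " text"
```

Without the parentheses the leading space is also lost, because it becomes part of the comparison instead of part of the result.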
You may use jQuery to traverse down the elements.
Here is the code:
$(document).ready(function()
{
var children = $("#container").find("*");
var text = "";
while (children.html() != undefined)
{
text += children.html()+"\n";
children = children.next();
}
alert(text);
});
Here is the fiddle: http://jsfiddle.net/69wezyc5/
I have the following html elements from which I have to get some specific texts,
example "John Doe"
I'm a newbie in javascript but have been playing with getElementById etc but I can't seem to get this one right.
<div id="name">
<p><span id="nameheading">name: </span> John Doe</p>
</div>
Below is what I have tried:
function askInformation()
{
var nameHeading = document.getElementById("nameheading");
var paragraph = document.getElementsByTagName("p").item(0).innerHTML ;
var name = paragraph[4];
console.log(name); // prints letter (n)
}
I need help please
If you want to get the text following the span in the following:
<div id="name">
<p><span id="nameheading">name: </span> John Doe</p>
</div>
You can use something like:
// Get a reference to the span
var span = document.getElementById('nameheading');
// Get the following text
var text = span.nextSibling.data;
However, that is highly dependent on the internal structure; it may be more robust to loop over the text node children and collect the content of all of them. You may also want to trim leading and trailing white space.
You could also get a reference to the parent element (here the P inside the DIV) and use a function like the following that collects the text children and ignores child elements:
// Return the text of the child text nodes of an element,
// but not descendant element text nodes
function getChildText(element) {
var children = element.childNodes;
var text = '';
for (var i=0, iLen=children.length; i<iLen; i++) {
if (children[i].nodeType === 3) { // 3 is Node.TEXT_NODE
text += children[i].data;
}
}
return text;
}
var text = getChildText(document.getElementById('name').getElementsByTagName('p')[0]);
or more concisely for hosts that support the querySelector interface:
var text = getChildText(document.querySelector('#name p'));
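To illustrate how collecting only direct text children differs from reading all descendant text, here is a sketch that runs the same logic against plain objects standing in for DOM nodes. The text and element helpers are hypothetical and exist only so the example runs outside a browser.

```javascript
// Minimal stand-ins for DOM nodes; a real page would pass an actual element.
const text = (data) => ({ nodeType: 3, data: data, childNodes: [] });
const element = (...children) => ({ nodeType: 1, childNodes: children });

// Same logic as getChildText above: direct text children only.
function getChildText(el) {
  let result = '';
  for (const child of el.childNodes) {
    if (child.nodeType === 3) {
      result += child.data;
    }
  }
  return result;
}

// Mirrors <p><span id="nameheading">name: </span> John Doe</p>
const paragraph = element(element(text('name: ')), text(' John Doe'));

const name = getChildText(paragraph).trim();
console.log(name); // John Doe
```

The "name: " label lives inside the span, one level down, so it is skipped, and trim() removes the space that follows the span.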
var paragraph = document.getElementsByTagName("p").item(0).innerHTML ;
var name = paragraph.replace('<span id="nameheading">name: </span>','').trim(); // John Doe
I have the following HTML:
<html>
<body>
<div>
<span> $12.95 </span>
</div>
</body>
</html>
And the following Javascript:
var all = document.body.getElementsByTagName("*");
for (var i=0, max=all.length; i < max; i++) {
console.log(all[i].nodeValue);
}
I see null in the console when it gets to the element. How can I get just the text of all the elements in a page? I know that if I use innerHTML I would get the text, but then I would get the text repeated: for the <div> I would get <span> $12.95 </span> and then for the <span> I would get $12.95
If you want to use nodeValue to get the contents then you have to traverse down to the text node that is contained within the span.
http://jsfiddle.net/xLJMb/
var all = document.body.getElementsByTagName("*");
for (var i=0, max=all.length; i < max; i++) {
console.log(all[i].nodeValue);
for(var j = 0, max2 = all[i].childNodes.length; j < max2; j++) {
console.log(all[i].childNodes[j].nodeValue);
}
}
Text Nodes are not elements, so they are not returned directly by getElementsByTagName().
Why not use this HTML:
<div>
<span id="span">$12.95 </span>
</div>
and this Script:
console.log($('#span').html());
As an addendum to the answer above: in modern browsers, if you want to iterate over text nodes only, you can use the TreeWalker API:
var treeWalker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_TEXT,
// ES6 arrow function that skips all "empty" (whitespace-only) text nodes.
// A node filter is expected to return one of the NodeFilter constants.
node => node.nodeValue.trim() ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_SKIP
);
while(treeWalker.nextNode())
console.log(treeWalker.currentNode.nodeValue);