Use a loop to find HTML elements' values in JavaScript

I want to use vanilla JS to loop through a string of HTML text and get its values. With jQuery I can do something like this:
var str1="<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>";
$.each($(str1).find('h2'), function(index, value) {
console.log($(value).text());
});
As I understand it, $(str1) parses the string into elements, and .text() then gets the text of each element (h2).
But I want to do this in my Node app on the back end rather than on the client side, because it might be more efficient, and it would be nice not to rely on jQuery.
Some context: I'm working on a blogging app, and I want to build a table of contents as an object on the server side.

Here is another way that also uses .innerHTML, combined with Array.from, which works on any iterable (such as a NodeList).
Here are the operations we'll need, their types, and the function that provides each one:
Create an HTML element from a string of HTML
String -> HTMLElement – provided by set Element#innerHTML
Get the text content of an HTML element
HTMLElement -> String – provided by get Node#textContent
Find nodes matching a query selector
(HTMLElement, String) -> NodeList – provided by Element#querySelectorAll
Transform a list of nodes to a list of text
(NodeList, HTMLElement -> String) -> [String] – provided by Array.from
// html2elem :: String -> HTMLElement
const html2elem = html =>
{
const elem = document.createElement ('div')
elem.innerHTML = html
return elem.firstElementChild // first element, skipping any leading whitespace text node
}
// findText :: (String, String) -> [String]
const findText = (html, selector) =>
Array.from (html2elem(html).querySelectorAll(selector), e => e.textContent)
// str :: String
const str =
"<div><h1>MAIN HEADING</h1><h2>This is a heading1</h2><h2>This is a heading2</h2></div>";
console.log (findText (str, 'h2'))
// [
// "This is a heading1",
// "This is a heading2"
// ]
// :: [String]
console.log (findText (str, 'h1'))
// [
// "MAIN HEADING"
// ]
// :: [String]
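The `(NodeList, HTMLElement -> String) -> [String]` step relies on the second argument of `Array.from`, which maps each item while copying it into a new array. Node.js has no `document`, so the sketch below uses a hypothetical plain array-like object as a stand-in for the NodeList; the same call works on a real NodeList in the browser.

```javascript
// Hypothetical stand-in for a NodeList: an object with a length and indexed items.
const fakeNodeList = {
  length: 2,
  0: { textContent: "This is a heading1" },
  1: { textContent: "This is a heading2" },
};

// Array.from(arrayLike, mapFn) builds a new array, applying mapFn to each item.
const texts = Array.from(fakeNodeList, node => node.textContent);
console.log(texts);
// → ['This is a heading1', 'This is a heading2']
```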

The best way to parse HTML is to use the DOM. But if all you have is a string of HTML, then, as a Stack Overflow member suggests, you can create a "dummy" DOM element, add the string to it, and manipulate the result through the DOM, as follows:
var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>aTitle</title></head>" +
  "<body><div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>" +
  "</body></html>";
Now you have a couple of ways to access the data using the DOM, as follows:
// one way
var h2s = el.getElementsByTagName("h2");
for(var i = 0, max = h2s.length; i < max; i++){
console.log(h2s[i].textContent);
if (i == max -1) console.log("\n");
}
// and another
var elementList = el.querySelectorAll("h2");
for (i = 0, max = elementList.length; i < max; i++) {
console.log(elementList[i].textContent);
}
You may also use a regular expression, as follows:
var str = '<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>';
var re = /<h2>([^<]*?)<\/h2>/g;
var match;
var m = [];
while ( match = re.exec(str) ) {
m.push(match.pop());
}
console.log(m);
The regex matches an opening h2 tag, then [^<]*?, which matches zero or more characters other than "<" (lazily, as few as possible), and then a closing h2 tag. The parentheses capture the heading text.
Per Ryan of Stackoverflow:
exec with a global regular expression is meant to be used in a loop,
as it will still retrieve all matched subexpressions.
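The critical part of the regex is the "g" flag (see MDN). It allows the exec() method to find successive matches in a given string: each call resumes the search where the previous match ended, and returns null once there are no more matches, which ends the loop. In each iteration, match is an array whose first element is the whole match and whose second element is the captured text. pop() removes and returns that last element, so the array m ultimately contains all the captured text values.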
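Tying this back to the question's goal (a table-of-contents object built in Node.js, where there is no `document`), the same regex idea can be sketched as follows. `buildToc`, the `{ level, text, id }` shape, and the slug rule are assumptions made for illustration, and the regex only copes with simple headings that have no attributes or nested tags; for arbitrary HTML a real parser such as JSDOM is the safer choice.

```javascript
// Hypothetical helper: collect <h2>/<h3> headings into a table-of-contents array.
// Assumes each heading is written as <h2>Text</h2> (no attributes, no nested tags).
function buildToc(html) {
  const headingRe = /<h([23])>([^<]*)<\/h\1>/gi;
  // matchAll runs the exec() loop internally and yields every match array
  return Array.from(html.matchAll(headingRe), ([, level, text]) => {
    const title = text.trim();
    return {
      level: Number(level),
      text: title,
      // Hypothetical slug rule for anchor ids: lowercase, non-alphanumeric runs -> "-"
      id: title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, ""),
    };
  });
}

const str1 = "<div><h2>This is a heading1</h2><h2>This is a heading2</h2></div>";
console.log(buildToc(str1));
// → [{ level: 2, text: 'This is a heading1', id: 'this-is-a-heading1' },
//    { level: 2, text: 'This is a heading2', id: 'this-is-a-heading2' }]
```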

Related

Split a string in javascript [duplicate]

This question already has an answer here:
Check if an HTML string only has element children (or whitespace between elements) and no element is unknown
I need to split a string as follows:
const strin = 'test <br><span>test</span> <div>aa</div>8'.split(/<\ *>/i)
console.log(strin)
The expected output is:
['test','<br>', '<span>test</span>', '<div>aa</div>', '8']
As @sebastian-simon mentioned, you can't reliably "split" HTML with regular expressions alone. The best solution is to use a real HTML parser: one already ships with your browser, and in Node.js you can use JSDOM.
var str = 'test <br><span>test</span> <fake></fake> <div><p>aa</p></div>8';
var container = document.createElement("div");
container.innerHTML = str; // use a HTML element to parse HTML
// If you need to work with nested tags, you have to traverse childNodes (and their childNodes) yourself
// childNodes includes text nodes; children does not.
// [...container.childNodes] convert container.childNodes to a normal array
// so we can .map over it
var elmList = [...container.childNodes];
var tags = elmList
// if elm is a TextNode, elm.outerHTML is undefined
// then we use elm.textContent instead
.map(elm => elm.outerHTML ?? elm.textContent)
.map(elm => elm.trim()) // remove whitespaces
.filter(elm => elm); // remove empty items
console.log(tags)
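If adding JSDOM isn't an option, simple, well-formed markup like the question's sample can still be split without a DOM by counting tag depth, which is more than a single regex can do. This is a sketch under loud assumptions (every non-void tag is closed; no comments, scripts, or ">" inside attribute values); `splitTopLevel` and its void-tag list are illustrative, not a library API.

```javascript
// Elements that never have a closing tag in HTML.
const VOID_TAGS = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr"]);

// Split an HTML string into its top-level text runs and elements.
// Assumes well-formed markup: every non-void tag is closed, and there are
// no comments, scripts, or ">" characters inside attribute values.
function splitTopLevel(html) {
  const tagRe = /<\/?([a-zA-Z][\w-]*)[^>]*>/g;
  const parts = [];
  let depth = 0; // current nesting level
  let start = 0; // where the current top-level element began
  let last = 0;  // end of the previous tag
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    if (depth === 0) {
      // Text between top-level elements becomes its own part
      if (m.index > last) parts.push(html.slice(last, m.index));
      start = m.index;
    }
    const isClose = m[0].startsWith("</");
    const isVoid = VOID_TAGS.has(m[1].toLowerCase()) || m[0].endsWith("/>");
    if (isClose) depth = Math.max(0, depth - 1);
    else if (!isVoid) depth++;
    last = tagRe.lastIndex;
    // Back at the top level: the element from start to last is complete
    if (depth === 0) parts.push(html.slice(start, last));
  }
  parts.push(html.slice(last)); // trailing text
  return parts.map(p => p.trim()).filter(p => p); // drop whitespace-only parts
}

console.log(splitTopLevel('test <br><span>test</span> <div>aa</div>8'));
// → ['test', '<br>', '<span>test</span>', '<div>aa</div>', '8']
```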

Using getElementsByTagName to find all hrefs in a variable

In a variable I'm holding HTML source code that I obtained from a DB. I'd like to search through this content for all the href attributes of a elements and list them in a table.
I've found how to search the DOM (like below), but how can I search within a variable?
var links = document.getElementsByTagName("a").getElementsByAttribute("href");
This is what I have currently. It searches with a regex, but it doesn't work very well:
matches_temp = result_content.match(/\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’&quote]))/ig);
In result_content I'm holding that HTML Source.
getElementsByTagName returns an HTMLCollection, which has no getElementsByAttribute method, and getElementsByTagName itself is only available when you have DOM access.
Without a DOM (for example, in Node.js):
const hrefRe = /href="(.*?)"/g;
const urlRe = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’&quote]))/ig;
const stringFromDB = `000
Something something 001 something`
stringFromDB.match(hrefRe).forEach(
(href) => console.log(href.match(urlRe)[0] )
);
// oldschool:
// stringFromDB.match(hrefRe).forEach(function(href) { console.log(href.match(urlRe)[0] ) });
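Here is a self-contained sketch of the Node.js route with hypothetical sample HTML. Capturing the attribute value directly means no second URL regex is needed. It assumes quoted href attributes and simple markup, and `extractHrefs` is an illustrative name.

```javascript
// Hypothetical sample markup standing in for the HTML loaded from the DB.
const stringFromDB =
  '<p><a href="https://example.com/a">000</a> Something something ' +
  "<a class='x' href='/relative/path'>001</a> <a name=\"top\">no link</a></p>";

// Return the value of every quoted href attribute, in document order.
// Assumes values are quoted with " or '; note it would also match data-href.
function extractHrefs(html) {
  const hrefRe = /\bhref\s*=\s*(["'])(.*?)\1/gi;
  // matchAll yields one match array per hit; m[2] is the text between the quotes
  return Array.from(html.matchAll(hrefRe), m => m[2]);
}

console.log(extractHrefs(stringFromDB));
// → ['https://example.com/a', '/relative/path']
```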
With DOM access, the code below first creates a DOM snippet.
It only selects anchors that have an href to begin with.
Note the use of getAttribute, so the browser does not try to interpret (resolve) the URL.
With the regex, if you only want to match specific types of href:
const re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’&quote]))/ig;
const stringFromDB = `000
001`
let doc = document.createElement("div");
doc.innerHTML = stringFromDB
doc.querySelectorAll("a[href]").forEach(
(x) => console.log(x.getAttribute("href").match(re)[0])
);
Without the regex
const stringFromDB = `000
001`
let doc = document.createElement("div");
doc.innerHTML = stringFromDB
doc.querySelectorAll("a[href]").forEach(
(x) => console.log(x.getAttribute("href"))
);
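In Node.js there is no anchor element whose resolved href property you could read, but if you do want absolute URLs from the raw getAttribute-style values, the standard URL class can resolve them. A sketch with hypothetical hrefs and an assumed base URL:

```javascript
// Hypothetical raw values, as getAttribute("href") would return them.
const rawHrefs = ["https://example.com/a", "/relative/path", "page.html?x=1"];
// Assumed URL of the page the HTML came from.
const baseUrl = "https://example.com/docs/index.html";

// new URL(input, base) resolves a relative reference against base,
// much as a browser does when you read an anchor's href property.
const absolute = rawHrefs.map(href => new URL(href, baseUrl).href);
console.log(absolute);
// absolute holds: https://example.com/a, https://example.com/relative/path,
// https://example.com/docs/page.html?x=1
```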
First, you shouldn't use regular expressions to parse HTML. This answer explains why.
Second, you're using getElementsByAttribute incorrectly: it can't be called on the collection that getElementsByTagName returns. Instead, use querySelectorAll to select the anchors whose href contains "http", and then map out the hrefs:
var hrefs = document.querySelectorAll("a[href*=http]");
var test = Array.prototype.slice.call(hrefs).map(e => e.href);
console.log(test);

Google apps script - retain links when copying footnote content

Background
I have a Google Apps Script that we use to replace each footnote's superscript number with the footnote's content, wrapped in double parentheses. The intended result is:
Before Script
This is my footie index.¹
¹ This is my footie content with a link and emphasis.
After Script
This is my footie index. (( This is my footie content with a link and emphasis.))
Problem
Everything works, except that when I insert the footnotes in double parentheses, they lose all their links and formatting:
This is my footie index. (( This is my footie content with a link and emphasis.))
If anyone can assist me with fixing the code below I would really appreciate the help :)
SOLUTION:
function convertFootNotes () {
var doc = DocumentApp.getActiveDocument()
var copy = generateCopy(doc) // make a copy to avoid damaging the original
var openCopy = doc; //DocumentApp.openById(copy.getId()) // you have to use the App API to copy, but the Doc API to manipulate
performConversion(openCopy); // perform formatting on the copy
}
function performConversion (docu) {
var footnotes = docu.getFootnotes(); // get the footnotes
footnotes.forEach(function (note) {
// Traverse the child elements to get to the `Text` object
// and make a deep copy
var paragraph = note.getParent(); // get the paragraph
var noteIndex = paragraph.getChildIndex(note); // get the footnote's "child index"
insertFootnote(note.getFootnoteContents(),true, paragraph, noteIndex);
note.removeFromParent();
})
}
function insertFootnote(note, recurse, paragraph, noteIndex){
var numC = note.getNumChildren(); //find the # of children
paragraph.insertText(noteIndex," ((");
noteIndex++;
for (var i=0; i<numC; i++){
var C = note.getChild(i).getChild(0).copy();
if (i==0){
// remove a leading space from the first paragraph's text
if (C.getText()[0]==" "){
C = C.deleteText(0,0);
}
}
if (i>0){
paragraph.insertText(noteIndex,"\n");
noteIndex++;
}
paragraph.insertText(noteIndex,C);
noteIndex++;
} //end of looping through children
paragraph.insertText(noteIndex,"))");
}
function generateCopy (doc) {
var name = doc.getName() + ' #PARSED_COPY' // rename copy for easy visibility in Drive
var id = doc.getId()
return DriveApp.getFileById(id).makeCopy(name)
}
Were there any changes to the code, other than adding the )), that made it stop working? When I tested it, removing the (( and )) still did not keep the formatting: getText() returns the element's contents as a String, not as a rich-text Text element that carries the formatting information.
To get to the Text object:
getFootnoteContents().getChild(0) returns the FootnoteSection Paragraph
getChild(0).getChild(0) returns the Text object of that paragraph
copy() returns a detached deep copy of the text object to work with
Note: If there are other child elements in the FootnoteSection or in its Paragraph child, you'll want to add some type or index checking to get the correct one. However, for basic footnotes like the example above, this is the correct path.
function performConversion (docu) {
var footnotes = docu.getFootnotes() // get the footnotes
var noteText = footnotes.map(function (note) {
// Traverse the child elements to get to the `Text` object
// and make a deep copy
var note_text_obj = note.getFootnoteContents().getChild(0).getChild(0).copy();
// Add the `((` & `))` to the start and end of the text object
note_text_obj.insertText(0, " ((");
note_text_obj.appendText(")) ");
return note_text_obj // reformat text with parens and save in array
})
...
}

Assigning javascript array elements class or id for css styling

I'm trying to assign class and id to items in an array I created in js and input into my html. I'm doing this so I can style them in my stylesheet. Each item will not be styled the same way.
I'm a beginner so trying to keep it to code I can understand and make it as clean as possible, i.e. not making each of these items an element in the html.
This part works fine:
var pool =['A','B','3','J','R','1','Q','F','5','T','0','K','N','C','R','U']
var letters = pool.join('');
document.getElementById('key').innerHTML = letters;
This part not so much:
var char1 = letters[1];
char1.classList.add('hoverRed');
There is a similar question here, but its solution didn't work for me; it just showed [object][object][object] when I ran it.
Your code attempts to apply a style to an array element, but CSS only applies to HTML. If you wish to style one character in a string, that character must be wrapped in an HTML element (a <span> is the best choice for wrapping an inline value).
This code shows how to accomplish this:
var pool =['A','B','3','J','R','1','Q','F','5','T','0','K','N','C','R','U']
var letters = pool.join('');
// Replace a specific character with the same character, but wrapped in a <span>
// so it can be styled
letters = letters.replace(letters[1], "<span>" + letters[1] + "</span>");
// Get a reference to the div
var theDiv = document.getElementById('key');
// Inject the string into the div
theDiv.innerHTML = letters;
// Get a reference to the span:
var theSpan = theDiv.querySelector("span");
// Add the style to the <span> that wraps the character, not the character itself
theSpan.classList.add('hoverRed');
.hoverRed {
color:red;
}
<div id="key"></div>
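One caveat with letters.replace(letters[1], ...): a string pattern replaces only the first occurrence of that character. That happens to be index 1 for "B", but not, for example, for the "R" at index 14, because an earlier "R" sits at index 4. A sketch of an index-based alternative; `wrapAt` is a hypothetical helper name:

```javascript
// Wrap the character at a specific index in a <span> with the given class.
function wrapAt(text, index, className) {
  return text.slice(0, index) +
    '<span class="' + className + '">' + text[index] + '</span>' +
    text.slice(index + 1);
}

var pool = ['A','B','3','J','R','1','Q','F','5','T','0','K','N','C','R','U'];
var letters = pool.join('');

// letters.replace(letters[14], ...) would wrap the "R" at index 4 instead.
console.log(wrapAt(letters, 14, 'hoverRed'));
// → AB3JR1QF5T0KNC<span class="hoverRed">R</span>U
```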
And, this snippet shows how you could apply CSS to any letter:
var pool =['A','B','3','J','R','1','Q','F','5','T','0','K','N','C','R','U'];
// Leave the original array alone so that it can be manipulated any way needed
// in the future, but create a new array that wraps each array element within
// a <span>. This can be accomplished in several ways, but the map() array method
// is the most straight-forward.
var charSpanArray = pool.map(function(char){
return "<span>" + char + "</span>";
});
// Decide which character(s) need CSS applied to them. This data can come from anywhere
// Here, we'll just say that the 2nd and 5th ones should.
// Loop through the new array and on the 2nd and 5th elements, apply the CSS class
charSpanArray.forEach(function(element, index, array){
// Check for the particular array elements in question
if(index === 1 || index === 4){
// Update those strings to include the CSS
array[index] = element.replace("<span>","<span class='hoverRed'>");
}
});
// Now, turn the new array into a string
var letters = charSpanArray.join('');
// For diagnostics, print the string to the console just to see what we've got
console.log(letters);
// Get a reference to the div container
var theDiv = document.getElementById('key');
// Inject the string into the div
theDiv.innerHTML = letters;
.hoverRed {
color:red;
}
<div id="key"></div>
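The map() and forEach() passes in the snippet above can also be folded into a single map() that picks each span's markup from the character's index. A sketch; `toSpans` is a hypothetical helper, and the resulting string would still be injected with theDiv.innerHTML as before:

```javascript
var pool = ['A','B','3','J','R','1','Q','F','5','T','0','K','N','C','R','U'];

// Indices of the characters that should get the class (the 2nd and 5th).
var styledIndices = new Set([1, 4]);

// One map() pass chooses each span's markup from the character's index,
// so no second pass or string replace is needed.
function toSpans(chars, indices, className) {
  return chars.map(function (char, index) {
    return indices.has(index)
      ? "<span class='" + className + "'>" + char + "</span>"
      : "<span>" + char + "</span>";
  }).join('');
}

var letters = toSpans(pool, styledIndices, 'hoverRed');
console.log(toSpans(['A', 'B', '3'], new Set([1]), 'hoverRed'));
// → <span>A</span><span class='hoverRed'>B</span><span>3</span>
```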
You're on the right track, but missed one key thing.
In your example, pool contains characters. When you combine them using join, you get a string. Setting that string as the innerHTML of an element doesn't give the string super powers, it's still just a string.
In order to get a classList, you need to change your letters into elements and work with them.
Below is an ES6 example (and a working plunker) showing how to get the functionality you want.
let pool = ['A','B','3','J','R','1','Q','F','5','T','0','K','N','C','R','U']
const letterToElement = function(char) {
//Create the element
let e = document.createElement("SPAN");
//Create the text node
let t = document.createTextNode(char);
//Put the text node on the element
e.appendChild(t);
//Add the class name you want
e.classList.add("hoverRed");
return e;
};
//create your elements from your pool and append them to the "key" element
window.onload = function() {
let container = document.getElementById("key");
pool.map(l => letterToElement(l))
.forEach(e => container.appendChild(e));
}
https://plnkr.co/edit/mBhA60aUCEGSs0t0MDGu

How to replace text in js?

Assuming I have the following:
var s = "This is a test of the battle system."
and I had an array:
var array = [
"is <b>a test</b>",
"of the <div style=\"color:red\">battle</div> system"
]
Is there some function or technique I could use to process the string s so that the output would be:
var p = "This is <b>a test</b> of the <div style=\"color:red\">battle</div> system."
Based on the arbitrary elements in the array?
Note that the array elements should be processed in sequence: look at the first element in the array and find the correct place to "replace" in string s; then look at the second element and find the correct place to "replace" in s.
Note that the string could contain numbers, brackets, and other characters like dashes (no <> though)
Update: after Colin DeClue's remark I think you want to do something different than I originally thought.
Here is how you can accomplish that
//your array
var array = [
"is <b>a test</b>",
"of the <div style=\"color:red\">battle</div> system"
];
//create a sample span element, this is to use the built in ability to get texts for tags
var cElem = document.createElement("span");
//create a clean version of the array, without the HTML, map might need to be shimmed for older browsers with a for loop;
var cleanArray = array.map(function(elem){
cElem.innerHTML = elem;
return cElem.textContent;
});
//the string you want to replace on
var s = "This is a test of the battle system."
//for each element in the array, look for elements that are the same as in the clean array, and replace them with the HTML versions
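for(var i=0;i<array.length;i++){
var idx = 0;//the index to start from, reset for each element; moving it past each insertion avoids infinite loops (see discussion with 6502)
while((idx = s.indexOf(cleanArray[i],idx)) > -1){
// splice the HTML version in at the position found (replace() would always change the first occurrence)
s = s.slice(0, idx) + array[i] + s.slice(idx + cleanArray[i].length);
idx += array[i].length;//continue searching after the inserted HTML
}
}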
//write result
document.write(s);
Working example: http://jsbin.com/opudah/9/edit
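Per the question, the plain text never contains < or >, so where there is no DOM (for example, in Node.js) the tags can be stripped with a simple regex instead of a temporary element. A sketch under that assumption; `stripTags` and `applyMarkup` are hypothetical names. Each fragment is spliced in at the exact position found, and the search resumes after it, so nothing gets wrapped twice.

```javascript
// Strip tags with a regex. Only safe because, per the question, the
// plain text itself never contains "<" or ">".
const stripTags = html => html.replace(/<[^>]*>/g, "");

// Apply each HTML fragment, in order, everywhere its plain text occurs.
function applyMarkup(s, fragments) {
  for (const fragment of fragments) {
    const target = stripTags(fragment);
    if (!target) continue; // a fragment with no text has nothing to match
    let idx = s.indexOf(target);
    while (idx > -1) {
      // Splice the fragment in at the exact position that was found
      s = s.slice(0, idx) + fragment + s.slice(idx + target.length);
      // Resume searching after the inserted fragment
      idx = s.indexOf(target, idx + fragment.length);
    }
  }
  return s;
}

const s = "This is a test of the battle system.";
const array = [
  "is <b>a test</b>",
  "of the <div style=\"color:red\">battle</div> system",
];
console.log(applyMarkup(s, array));
// → This is <b>a test</b> of the <div style="color:red">battle</div> system.
```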
Original answer, in case this is what you meant after all
Yes, using join:
var s = array.join(" ");
Here is a working example in codepen
I suppose you have an array of original --> replacement pairs.
To extract the text from HTML, a trick that may work for you is to create a DOM node and then read its text content.
Once you have the text, you can use the replace method with a regular expression.
One annoying thing is that searching for an exact string is not trivial, because JavaScript has long lacked a predefined function that escapes a string for use in a regular expression:
function textOf(html) {
var n = document.createElement("div");
n.innerHTML = html;
return n.textContent;
}
var subs = ["is <b>a test</b>",
"of the <div style=\"color:red\">battle</div> system"];
var s = "This is a test of the battle system"
for (var i=0; i<subs.length; i++) {
var target = textOf(subs[i]);
var replacement = subs[i];
var re = new RegExp(target.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "g"); // escape all RegExp metacharacters, including "." and "?"
s = s.replace(re, replacement);
}
alert(s);
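For reference, a standalone escaping helper. Its character class covers every RegExp metacharacter, including "." and "?", which are easy to miss; `escapeRegExp` is an illustrative name.

```javascript
// Escape every character that has a special meaning in a RegExp pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const target = "1.5 (approx.)";
const exact = new RegExp(escapeRegExp(target)); // no "g" flag, so test() keeps no state
console.log(exact.test("about 1.5 (approx.) units")); // true
console.log(exact.test("about 1x5 (approx.) units")); // false: "." is now literal
```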
