ES6 - Parse HTML string to Array - javascript

I have an HTML formatted string:
let dataString = '<p>Lorem ipsum</p> <figure><img src="" alt=""></figure> <p>Lorem ipsum 2</p> <figure><img src="" alt=""></figure>';
How can I parse this string to get an array of tags as below?
let dataArray = [
'<p>Lorem ipsum</p>',
'<figure><img src="" alt=""></figure>',
'<p>Lorem ipsum 2</p>',
'<figure><img src="" alt=""></figure>',
];

Turn it into a document with DOMParser, then take the children of the body and .map their .outerHTML:
const str = '<p>Lorem ipsum</p> <figure><img src="" alt=""></figure> <p>Lorem ipsum 2</p> <figure><img src="" alt=""></figure>';
const doc = new DOMParser().parseFromString(str, 'text/html');
const arr = [...doc.body.children].map(child => child.outerHTML);
console.log(arr);
(You can also create an element, set its innerHTML to the string, and then iterate over its children, but that could allow arbitrary code execution if the input string isn't trustworthy.)

DOM parsing is recommended.
Here is a vanilla JS version that uses a detached element instead of the DOMParser from the other answer:
let dataString = `<p>Lorem ipsum</p> <figure><img src="" alt=""></figure> <p>Lorem ipsum 2</p> <figure><img src="" alt=""></figure>`;
let domFragment = document.createElement("div");
domFragment.innerHTML = dataString;
const arr = [...domFragment.querySelectorAll("div>p,div>figure")].map(el => el.outerHTML)
console.log(arr)
If you cannot use the DOM, then your SPECIFIC string can be split like this, after fixing your nested quotes.
Note that any change, for example adding a space after the <img ...> tag, will break such a script:
let dataString = `<p>Lorem ipsum</p> <figure><img src="" alt=""></figure> <p>Lorem ipsum 2</p> <figure><img src="" alt=""></figure>`;
dataString = dataString.replace(/> /g,">|").split("|")
console.log(dataString)
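As a variation on the split above, a lookaround-based regular expression avoids the temporary "|" separator, which would misfire if the text itself contained a pipe. This is only a sketch under the same assumption as before: whitespace appears between top-level elements and nowhere else between a ">" and a "<". The function name splitTopLevelTags is made up for illustration.

```javascript
// Sketch: split a flat HTML string at whitespace that sits between ">" and "<".
// Assumes whitespace between tags only ever separates top-level elements.
function splitTopLevelTags(html) {
  // (?<=>) requires a preceding ">", (?=<) requires a following "<";
  // neither character is consumed, so both stay in the resulting pieces.
  return html.trim().split(/(?<=>)\s+(?=<)/);
}

const sample = '<p>Lorem ipsum</p> <figure><img src="" alt=""></figure> <p>Lorem ipsum 2</p> <figure><img src="" alt=""></figure>';
const parts = splitTopLevelTags(sample);
console.log(parts.length); // 4

// The fragility mentioned above: a space after <img ...> splits the figure in two.
const fragile = splitTopLevelTags('<figure><img src="" alt=""> </figure>');
console.log(fragile.length); // 2
```

The second call shows why this approach is only suitable when the exact shape of the string is known in advance.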

Your question is not entirely clear to me. Is that an arbitrary string or an HTML string? Is the rule to slice the original string into its HTML element parts?
If so, I think we can handle it with a dummy element.
For convenience, I use jQuery:
let stringToSplit = `<p>Lorem ipsum</p> <figure><img src="" alt=""></figure> <p>Lorem ipsum 2</p> <figure><img src="" alt=""></figure>`
let $dummy = $("<div/>"); // create a detached dummy container
$dummy.html(stringToSplit);
let dataArray = [];
let dummyChildren = $dummy.children();
for (let i = 0; i < dummyChildren.length; i++) {
  dataArray[i] = dummyChildren[i].outerHTML;
}
$dummy = null; // drop the reference so the dummy can be garbage-collected
console.log(dataArray);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

Related

Generate table of contents dynamically with jQuery

How do I generate a table of contents dynamically? My code only works for one heading tag, h3; I need it to work for all headings.
Here is a sample of the post format:
<html>
<head></head>
<body>
<div id="tableofcontent"></div>
<div class="entry-content">
<h1 id="Test1">Main Heading</h1>
<p>Lorem Ipsum Lorem IpsumLorem IpsumLorem Ipsum</p>
<h2 id="Test2"> Sub Heading</h2>
<p>Lorem Ipsum Lorem IpsumLorem IpsumLorem Ipsum</p>
<h3 id ="Test3">Sub Sub Heading</h3>
<p>Lorem Ipsum Lorem IpsumLorem IpsumLorem Ipsum</p>
<h4>Sub Sub Heading</h4>
<p>Lorem Ipsum Lorem IpsumLorem IpsumLorem Ipsum</p>
</div>
</body>
</html>
Here is my code, which only handles the h3 heading tag:
jQuery(document).ready(function($) {
  var $aWithId = $('.entry-content h3[id]');
  if ($aWithId.length > 0) {
    // Insert the empty table-of-contents container once.
    $('#tableofcontent').prepend('<nav class="toc"><h3 class="widget-title">Table of Contents</h3><ol></ol></nav>');
    // Add one linked list item per heading that has an id.
    $aWithId.each(function() {
      var $item = $(this);
      var id = $item.attr('id');
      var li = $('<li/>');
      var a = $('<a/>', {
        text: $item.text(),
        href: '#' + id,
        title: $item.text()
      });
      a.appendTo(li);
      $('#tableofcontent .toc ol').append(li);
    });
  }
});
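One possible way to extend this to every heading level is to build a selector for h1 through h6 instead of hard-coding h3. The sketch below is not from the original post: headingSelector is a hypothetical helper, and it only produces the selector string that would be passed to $() in place of '.entry-content h3[id]'. Note that the h4 in the sample markup has no id, so it would still be skipped.

```javascript
// Hypothetical helper: build a selector that matches every heading level
// (h1..maxLevel) carrying an id attribute inside the given container.
function headingSelector(container, maxLevel = 6) {
  const selectors = [];
  for (let level = 1; level <= maxLevel; level++) {
    selectors.push(`${container} h${level}[id]`);
  }
  return selectors.join(', ');
}

console.log(headingSelector('.entry-content', 3));
// .entry-content h1[id], .entry-content h2[id], .entry-content h3[id]
```

With jQuery loaded, var $aWithId = $(headingSelector('.entry-content')); should select the matching headings in document order, and the rest of the loop can stay as it is.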

Cheerio unmatched selector error while selecting plain text

I'm scraping a web page with cheerio's .map method. The page's html code looks like this:
<div class="foo">
<h1>Lorem</h1>
<p>Lorem ipsum dolor sit amet.</p>
TEXT WITHOUT TAG
<p>Lorem ipsum dolor sit amet.</p>
</div>
Here is what I do:
let $ = cheerio.load(body);
let contentHtml = $('foo').html();
$(contentHtml).map((index, element) => {
console.log(element);
});
When .map sees the 'TEXT WITHOUT TAG', it throws an error like this:
Unmatched selector: ...
That is expected, because the plain text is not a selector. I want to wrap that plain text in <p> tags, but I couldn't figure out how.
Your element has the class foo, but your selector is not a class selector (note the leading dot):
let contentHtml = $('.foo').html();

Get all headers and recursively create a tree

I want to create a tree using headers.
Example:
<h1>Wow</h1>
<h2>Blablablub</h2>
<p>Lorem Ipsum...</p>
<h1>Lalalala</h1>
<p>Lorem Ipsum...</p>
<h1>Ble</h1>
<h2>Test</h2>
<h3>Third</h3>
<p>Lorem Ipsum...</p>
This list should be created:
<ul>
<li>
<a>Wow</a>
<ul>
<li>
<a>Blablablub</a>
</li>
</ul>
</li>
<li>
<a>Lalalala</a>
</li>
<li>
<a>Ble</a>
<ul>
<li>
<a>Test</a>
<ul>
<li>
<a>Third</a>
</li>
</ul>
</li>
</ul>
</li>
</ul>
The a tags should have a custom id, but that isn't important for this question. I tried to do this but couldn't figure it out. Here's what I tried:
function find_titles(find_num, element, prefix=""){
temp_list = $("<ul></ul>");
element.find(`${prefix}h${find_num}`).each(function(i, object){
let text = $(object).text();
let id = text.replace(/[^0-9a-zA-Z]/gi, "") + random_chars();
$(object).attr("id", id);
if ($(object).next().prop("tagName").toLowerCase() == `h${find_num + 1}`){
console.log($(object));
next_titles = find_titles(find_num + 1, $(object), "+ ")[0].innerHTML;
} else {
next_titles = "";
}
$(`<li>${text}${next_titles}</li>`).appendTo(temp_list);
});
return temp_list;
}
EDIT
This:
<h1>First</h1>
<h2>Second</h2>
<p>Lorem Ipsum</p>
<h3>Third</h3>
Should be normally converted into this:
<ul>
<li>
<a>First</a>
<ul>
<li>
<a>Second</a>
</li>
</ul>
</li>
<li>
<a>Third</a>
</li>
</ul>
I don't care whether the first is an h1, h2 or h3. In the text it's only important for styling, but in the tree it isn't important.
You can first clean your data so that it contains only the heading nodes, with their level number and text. After that, you can loop over the data and build the tree structure by level, using an array indexed by level number.
function tree(data) {
data = Array.from(data).reduce((r, e) => {
const number = e.nodeName.match(/\d+?/g);
if(number) r.push({ text: e.textContent, level: +number })
return r;
}, [])
const result = $("<ul>")
const levels = [
[], result
]
data.forEach(({ level, text }) => {
const li = $("<li>")
const a = $("<a>", { text, href: text })
levels[level + 1] = $('<ul>')
li.append(a)
li.append(levels[level + 1]);
levels[level].append(li)
})
return result;
}
const result = tree($("body > *"));
$("body").html(result)
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<h1>Wow</h1>
<h2>Blablablub</h2>
<p>Lorem Ipsum...</p>
<h1>Lalalala</h1>
<p>Lorem Ipsum...</p>
<h1>Ble</h1>
<h2>Test</h2>
<h3>Third</h3>
<p>Lorem Ipsum...</p>
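The same idea can be sketched without jQuery on plain data, which makes the level bookkeeping easier to follow. This is a stack-based variant rather than the exact array-index approach above, and it also tolerates skipped levels (an h3 directly after an h1). The names buildHeadingTree and renderTree and the { text, children } node shape are assumptions made for illustration.

```javascript
// Sketch: build a nested tree from a flat list of { level, text } headings.
function buildHeadingTree(headings) {
  const root = { text: null, children: [] };
  // The stack holds the currently open headings; the root (level 0) is never popped.
  const stack = [{ level: 0, node: root }];
  for (const { level, text } of headings) {
    const node = { text, children: [] };
    // Close every open heading at the same or a deeper level.
    while (stack[stack.length - 1].level >= level) stack.pop();
    stack[stack.length - 1].node.children.push(node);
    stack.push({ level, node });
  }
  return root.children;
}

// Render the tree as nested <ul> markup. Assumes the text is already HTML-safe.
function renderTree(nodes) {
  if (nodes.length === 0) return '';
  const items = nodes.map(n => `<li><a>${n.text}</a>${renderTree(n.children)}</li>`);
  return `<ul>${items.join('')}</ul>`;
}

const exampleHeadings = [
  { level: 1, text: 'Wow' },
  { level: 2, text: 'Blablablub' },
  { level: 1, text: 'Lalalala' },
  { level: 1, text: 'Ble' },
  { level: 2, text: 'Test' },
  { level: 3, text: 'Third' },
];
console.log(renderTree(buildHeadingTree(exampleHeadings)));
```

In a browser, the input list could be produced from the heading elements, for example by mapping each one to its numeric level and textContent.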
You could also do this in a single reduce call and add to the tree only if the element is a heading.
function tree(data) {
  const result = $("<ul>")
  // levels[n] is the <ul> that receives headings of level n; levels[1] is the root list
  const levels = [
    [], result
  ]
  Array.from(data).reduce((r, { textContent: text, nodeName }) => {
    const number = nodeName.match(/\d+?/g);
    const level = number ? +number : null;
    if(level) {
      const li = $('<li>').append($("<a>", { text, href: text }))
      // open a fresh child list for the next level (no extra entries are pushed)
      r[level + 1] = $('<ul>')
      r[level].append(li.append(r[level + 1]))
    }
    return r;
  }, levels)
  return result;
}
const result = tree($("body > *"));
$("body").html(result)
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<h1>Wow</h1>
<h2>Blablablub</h2>
<p>Lorem Ipsum...</p>
<h1>Lalalala</h1>
<p>Lorem Ipsum...</p>
<h1>Ble</h1>
<h2>Test</h2>
<h3>Third</h3>
<p>Lorem Ipsum...</p>
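You can iterate through all the H1 elements and, for each one, walk through the header elements that follow it (all except H1). Note that .next() only looks at the immediately following sibling, so the walk stops at the first element that is not a heading. Here is an example: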
const elements = $('h1').map(function() {
let container = $('<li>');
const ret = container;
container.append($('<a>').text($(this).text()));
let next = $(this).next('h2, h3, h4, h5');
while (next.length) {
const tmp = $('<li>');
tmp.append($('<a>').text(next.text()));
container.append(tmp);
container = tmp;
next = next.next('h2, h3, h4, h5');
}
return ret;
}).get();
const parent = $('<ul>');
parent.append(elements);
console.log(parent[0].innerHTML);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<h1>Wow</h1>
<h2>Blablablub</h2>
<p>Lorem Ipsum...</p>
<h1>Lalalala</h1>
<p>Lorem Ipsum...</p>
<h1>Ble</h1>
<h2>Test</h2>
<h3>Third</h3>
<p>Lorem Ipsum...</p>
Using the :header selector and the tagName property:
let $sub, $ul = $('<ul/>')
$(':header').each(function() {
let $this = $(this),
$prev = $this.prev(':header'),
$parent = $prev.length && $prev.prop('tagName') < $this.prop('tagName') ? $sub : $ul
$parent.append('<li><a>' + $this.text() + '</a></li>')
$sub = $('<ul/>').appendTo($parent.find('li:last'))
})
$('body').html($ul)
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<h1>Wow</h1>
<h2>Blablablub</h2>
<p>Lorem Ipsum...</p>
<h1>Lalalala</h1>
<p>Lorem Ipsum...</p>
<h1>Ble</h1>
<h2>Test</h2>
<h3>Third</h3>
<h3>Third</h3>
<p>Lorem Ipsum...</p>
<h1>First</h1>
<h2>Second</h2>
<p>Lorem Ipsum</p>
<h3>Third</h3>
<h1>First</h1>
<h2>Second</h2>
<p>Lorem Ipsum</p>
<h3>Third</h3>

Regex for capturing repeated groups Javascript

I have some test data in the following format -
"lorem ipsum <img src='some_url' class='some_class' /> lorem ipsum <img src='some_url' class='some_class' /> ipsum <img src='some_url' class='some_class' />"
Now, my goal is to identify all the image tags along with their respective source urls and css classes and store them together with the remaining text in an ordered array like -
["lorem ipsum", {imageObject1}, "lorem ipsum", {imageObject2}, "ipsum", {imageObject3}]
Now for this I tried to create a sample regex
var regex = /(.*(<img\s+src=['"](.+)['"]\s+(class=['"].+['"])?\s+\/>)+?.*)+/ig
Now when I try this regex with the sample text, I am getting:
regex.exec(sample_text) => [0:"lorem ipsum <img src='some_url1' class='some_class1' /> lorem ipsum <img src='some_url2' class='some_class2' /> ipsum <img src='some_url3' class='some_class3' />"
1:"lorem ipsum <img src='some_url1' class='some_class1' /> lorem ipsum <img src='some_url2' class='some_class2' /> ipsum <img src='some_url3' class='some_class3' />"
2:"<img src='some_url3' class='some_class3' />"
3:"some_url3"
4:"class='some_class3'"]
How can I transform the sample HTML text in JavaScript
into an array of tagged HTML objects with their attributes?
Do not use regular expressions to parse HTML. Use a DOMParser to parse the string and then CSS queries to get the images from the DOM, it will be much more reliable and easier to read.
var html = "lorem ipsum <img src='some_url' class='some_class' /> lorem ipsum <img src='some_url' class='some_class' /> ipsum <img src='some_url' class='some_class' />"
var nodes = new DOMParser().parseFromString(html, "text/html").body.childNodes
That will get you almost what you wanted (just some empty Text nodes you can filter out).
Or do something a little bit more accurate like this in case you don't have just images and text in the HTML:
var images = new DOMParser().parseFromString(html, "text/html").querySelectorAll("img")
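// [preceding text, <img>] pairs; an array keeps entries that a Map keyed by text would overwrite
var array = [...images].map(img => [img.previousSibling && img.previousSibling.nodeValue, img])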
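To get from the parsed nodes to the ordered array described in the question, the child nodes can be walked once, keeping trimmed text and turning each image into a plain object. This is a minimal sketch: toOrderedArray and the { src, className } object shape are illustrative assumptions, and the function only relies on the standard nodeType, nodeName, nodeValue and getAttribute members, so it accepts body.childNodes from the DOMParser result above.

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

// Sketch: turn a list of DOM nodes into ["text", {image}, "text", ...].
function toOrderedArray(nodes) {
  const result = [];
  for (const node of Array.from(nodes)) {
    if (node.nodeType === TEXT_NODE) {
      const text = node.nodeValue.trim();
      if (text) result.push(text); // skip whitespace-only text nodes
    } else if (node.nodeType === ELEMENT_NODE && node.nodeName.toLowerCase() === 'img') {
      result.push({
        src: node.getAttribute('src'),
        className: node.getAttribute('class'),
      });
    }
    // Any other element is ignored in this sketch.
  }
  return result;
}
```

In a browser this would be called as toOrderedArray(new DOMParser().parseFromString(html, "text/html").body.childNodes).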

Copy and inserting HTML elements into new pop-up block

I want to copy the whole element
<div id="articleFull"> ... </div>
(including the div itself) with all of its content into a new pop-up block
<div id="newPopUp"> ... </div>
<div id="articleFull">
<p>lorem ipsum</p>
<img src="1.png" />
<p>lorem ipsum</p>
<p>lorem ipsum</p>
<h3>Test title</h3>
<img src="1.png" />
<p>lorem ipsum</p>
</div>
I tried to do this simple method:
http://jsfiddle.net/ApBSN/3/
articleFull = document.getElementById('articleFull');
function copyHtml(){
div = document.createElement('div')
div.id = 'newPopUp';
document.body.appendChild(div);
var t = document.getElementById('articleFull');
div.appendChild(t);
}
It works... BUT the function does not copy the markup; it moves it from one place to another, effectively removing it from its original location. I just want to duplicate the block. Yes, I understand that a page cannot contain two elements with the same ID, but I'll take care of that myself.
Ideas?
If you are open to jQuery, you can try .clone() (see http://api.jquery.com/clone/). It duplicates the HTML rather than moving it, as appending the original element does.
I have updated your fiddle at http://jsfiddle.net/ApBSN/9/, but you still need to work on the CSS.
var t1 = document.getElementById('newPopUp');
var t = document.getElementById('articleFull');
$(t).clone().appendTo(t1);
If I understood correctly this should do it:
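function copyHtml(){
  var div = document.createElement('div');
  div.id = 'newPopUp';
  document.body.appendChild(div);
  // cloneNode(true) deep-copies the element, so the original stays in place
  var t = document.getElementById('articleFull').cloneNode(true);
  t.id = "articleFull2"; // avoid a duplicate id
  div.appendChild(t);
}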
