Convert innerHTML into a custom json with javascript


This is an example of the innerHTML I get from a text editor on a web page, where the user can write text and add images, videos and audio.
<p>This is a<br>test</p>
<p><iframe width="560" height="315" src="https://www.youtube.com/embed/12345" frameborder="0" allowfullscreen=""></iframe></p>
<p><audio controls><source src="https://www.test.com/123/456/example.mp3"/></audio></p>
<p>end of test</p>
I save the innerHTML so I can reload the contents written by the user inside the editor, but I also need to format that information as a JSON structure like the following:
{
  "page1": {
    "contents": [
      {"text":"This is a test"},
      {"video":"https://www.youtube.com/embed/12345"},
      {"audio":"https://www.test.com/123/456/example.mp3"},
      {"text":"end of test"}
    ]
  }
}
This JSON should be sent to the backend and saved, so a mobile app can request the information and display it in a customized way. Maintaining the elements' order is crucial.
So, how can I obtain the above structure from innerHTML in JavaScript? I'm going mad over it.

Hope this gives you a basic idea:
1) Choose different keys for the opening and closing text, such as start_text and end_text.
2) Create a detached DOM element and assign your innerHTML string to its innerHTML. This gives you access to DOM methods, so you can query the contents.
Ex:
var content = '(innerHTML content)';
var d = document.createElement("DIV");
d.innerHTML = content;
var p_tags = d.querySelectorAll("p");
3) Create your preferred object structure.
Ex:
var final_content = {};
final_content["page_1"] = {};
final_content["page_1"]["content"] = [];
final_content["page_1"]["content"].push({"start_text":""});
4) Finally, convert the object to a JSON string with JSON.stringify(final_content).
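Steps 2 to 4 can be sketched roughly as follows. This is only an illustration, assuming each <p> holds either plain text, one iframe (treated as a video), or one audio element with a source child, as in the question's sample. The helper names textOf and paragraphsToContents are made up for the example, and it uses a single text key rather than separate start_text and end_text keys.

```javascript
// Hypothetical helper: text of a <p>, treating each <br> as a space.
function textOf(p) {
  var parts = [];
  Array.prototype.forEach.call(p.childNodes, function (node) {
    parts.push(node.nodeName === 'BR' ? ' ' : node.textContent);
  });
  return parts.join('').replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: one entry per <p>, kept in document order.
function paragraphsToContents(paragraphs) {
  var contents = [];
  Array.prototype.forEach.call(paragraphs, function (p) {
    var iframe = p.querySelector('iframe');
    var source = p.querySelector('audio source');
    if (iframe) {
      contents.push({ video: iframe.getAttribute('src') });
    } else if (source) {
      contents.push({ audio: source.getAttribute('src') });
    } else {
      contents.push({ text: textOf(p) });
    }
  });
  return contents;
}

// Browser-only usage; skipped where there is no DOM.
if (typeof document !== 'undefined') {
  var d = document.createElement('div');
  d.innerHTML = '<p>This is a<br>test</p><p>end of test</p>';
  var payload = { page1: { contents: paragraphsToContents(d.querySelectorAll('p')) } };
  console.log(JSON.stringify(payload));
}
```

Because the entries are pushed into an array while walking the paragraphs in order, the order of the editor content is preserved in the resulting JSON.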

If the format never changes, you can take the innerHTML string and split it by </p>. For the sample above this yields the four paragraphs plus an empty trailing piece. Loop through each element. For plain text, the remaining tags can be stripped with string.replace(/(<p>|<br>)/g, " "). For the more complicated iframe and audio tags, use the regex "(https.*?)". Its first capture group holds the src URL. Then create your objects with those values.
Here's some quick pseudo code:
var aHtml = element.innerHTML.split('</p>');
var result = [];
aHtml.forEach(function (item) {
  // skip the empty piece left after the last </p>
  if (!item.trim()) { return; }
  // run regex against it, grab the first quoted https URL
  var match = item.match(/"(https.*?)"/);
  if (match) {
    var url = match[1]; // the url
    // look for the tag in the item itself, not in the matched URL
    if (item.indexOf('<audio') > -1) {
      result.push({audio: url});
    } else {
      result.push({video: url});
    }
  } else {
    // replace <p> and <br> with spaces, then tidy the whitespace
    var str = item.replace(/(<p>|<br>)/g, " ").replace(/\s+/g, " ").trim();
    result.push({text: str});
  }
});
console.log(result);
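To get from an array like result to the structure asked for in the question, the array still has to be wrapped under a page key and serialized. A minimal sketch, assuming the key is page1 and the list is called contents as in the question; buildPayload is a made-up helper name:

```javascript
// Hypothetical helper: wrap an ordered list of entries under a page key
// and serialize it for sending to the backend.
function buildPayload(pageKey, contents) {
  var payload = {};
  payload[pageKey] = { contents: contents };
  return JSON.stringify(payload);
}

var json = buildPayload('page1', [
  { text: 'This is a test' },
  { video: 'https://www.youtube.com/embed/12345' },
  { audio: 'https://www.test.com/123/456/example.mp3' },
  { text: 'end of test' }
]);
console.log(json);
```

JSON arrays keep their element order, so the mobile app receives the entries in the same sequence as they appear in the editor.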

Related

How to find image tags in a string in node js

I have a string with image tags like below
var str = '<img src="www.client.jpg><img src="www.custums.png">';
I have to find the image tags and their src values, i.e. I need to push the image tags into an array, but I cannot use jsdom right now because of a version problem on my server. Can anyone suggest a way to do this without jsdom? Thanks.

Just split the string and then filter to the URLs, like so:
var str = '<img src="www.client.jpg"><img src="www.custums.png">';
console.log(str.split("\"").filter(t => t.startsWith("www.")));
Your example was missing a ", which would keep this from parsing correctly, but assuming the HTML actually has that form without errors, this will give you just the URLs.
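The split above only works while every URL starts with www. and nothing else in the markup does. If both the tags and their src values are needed, a regex-based variant that runs in plain Node without jsdom might look like the sketch below. It assumes well-formed tags with quoted src attributes and no > inside attribute values; findImages is a made-up name, and this is pattern matching rather than a real HTML parser.

```javascript
// Hypothetical helper: returns [{ tag, src }] for each <img ...> in the string.
function findImages(html) {
  var images = [];
  var tagPattern = /<img\b[^>]*>/gi;
  // src must be preceded by whitespace so that e.g. data-src is not picked up
  var srcPattern = /\ssrc\s*=\s*(["'])(.*?)\1/i;
  var tagMatch;
  while ((tagMatch = tagPattern.exec(html)) !== null) {
    var srcMatch = srcPattern.exec(tagMatch[0]);
    images.push({ tag: tagMatch[0], src: srcMatch ? srcMatch[2] : null });
  }
  return images;
}

var str = '<img src="www.client.jpg"><img src="www.custums.png">';
console.log(findImages(str));
```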

You can use XPath to extract them; the path would be //img/@src. Alternatively, you can use an XML-to-JSON parser, something like fast-xml-parser.

Split the string into an array of image tags. Convert the strings into HTML DOM objects as shown below. Then you can easily get the src of each image tag.
var str = "<img src='www.client.jpg'><img src='www.custums.png'>";
var newstring = str.replace(/></gi, ">,<"); // Replace '><' with '>,<' so that you can split with ','
var imgs = newstring.split(",");
for (var i = 0; i < imgs.length; i++) {
// Create a temporary div, assign img string as its innerHTML, then gets its content
var d = document.createElement('div');
d.innerHTML = imgs[i];
// Reassign the imgs array with HTML DOM object instead of string
imgs[i] = d.firstChild;
console.log(imgs[i].src);
}

If I write something into innerHTML, how do I get exact copy back?

I have a string that I want to write into the innerHTML property and later compare against what innerHTML returns. Basically I want something like this:
var myString = "<div>my message with accented &aacute;&eacute; characters</div>"
var output = document.getElementById("first")
output.innerHTML = myString
Then do the comparison:
if (output.innerHTML == myString) {
...
}
Running this code gives false for the string above, because the sequence &aacute;&eacute; gets translated to áé and the strings no longer match.
The closest I got to the desired result was a little hack: create a temporary element, write to it, and then compare with the data read back from it. It works fine, but it doesn't feel right to do it this way.
var tmp = document.createElement('div');
tmp.innerHTML = myString
if (output.innerHTML == tmp.innerHTML) {
...
}
So my question is:
Is there any way to get back an exact copy of what I wrote into innerHTML?
Or is there a simple way to transform the data so that they match?
Or is my hack just the standard way to do this kind of thing?
To put it into context: I generate data, then transform it into JSON. This JSON output is then read in JavaScript by JSON.parse() and the result is inserted into various tags. I check for differences first, because when the new data differs from the old data, other actions take place too.
I have also tried things like unescape(), but it doesn't work on this kind of escaping. And yes, there is still the option to store the raw strings in a Map and compare those.
Here is a slightly more complete demonstration, which I also put on JSFiddle:
<div id="first"> </div>
<div id="resultFirst"></div>
<br />
<div id="second"> </div>
<div id="resultSecond"></div>
<script>
var myString = "<div>my message with accented &aacute;&eacute; characters</div>"
var output = document.getElementById("first")
var result = document.getElementById("resultFirst")
output.innerHTML = myString
if (output.innerHTML == myString) {
result.innerText = "TRUE " + myString
} else {
result.innerText = "FALSE " + myString
}
var output2 = document.getElementById("second")
var result2 = document.getElementById("resultSecond")
var tmp = document.createElement('div');
tmp.innerHTML = myString
output2.innerHTML = myString
if (tmp.innerHTML == output2.innerHTML) {
result2.innerText = "TRUE " + tmp.innerHTML
} else {
result2.innerText = "FALSE " + tmp.innerHTML
}
</script>

The short answer is no. HTML is just a display layer, and you shouldn't rely on it for storing data, only for showing it.
The browser will do whatever it needs to the innerHTML to make it renderable. The string is just a representation of the DOM structure inside that element. Once you've passed your string to innerHTML, the browser converts it to actual DOM elements. Reading innerHTML asks the browser to do the opposite: serialize those nodes back to a string however it sees fit. In short, innerHTML is not a plain variable; it behaves like a pair of functions (a setter that parses and a getter that serializes), and it isn't actually "storing" anything.
If you want to store an exact copy of something, don't use HTML; it's strictly for display. Instead, store the values in JavaScript and associate the id of each element with them to do the comparisons. You can then look up those elements by id and apply classes to them to trigger animations.
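Keeping the raw strings in JavaScript (the Map option mentioned in the question) could look something like the sketch below. The names lastWritten and updateIfChanged and the write callback are assumptions made for the example, not an existing API.

```javascript
// Remembers the raw string last written for each element id.
var lastWritten = new Map();

// `write` is whatever applies the value to the page, for example:
//   function (id, html) { document.getElementById(id).innerHTML = html; }
function updateIfChanged(id, newValue, write) {
  if (lastWritten.get(id) === newValue) {
    return false; // unchanged: nothing to do
  }
  lastWritten.set(id, newValue);
  write(id, newValue);
  return true; // changed: the caller can run its extra actions
}
```

The comparison is made between the strings exactly as they were generated, so whatever the browser does when it serializes innerHTML no longer matters.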

Find all the links and replace them with their href value

I have a string like:
<a href="site/project/109#">href</a> text, text, test
And I need to output:
site/project/109# text, text, test
I can find all links
var txt = msg.match(/\<a\shref=\"(.*)\"\s(.*)[\<\/a\>]/gmi);
And in loop make replace. But I would like to shorten the code, do everything through single replace, like this :
var txt = msg.replace(/\<a\shref=\"(.*)\"\s(.*)[\<\/a\>]/gmi, $1);
But in this case I get: [object HTMLHeadElement]

Never use regex to parse HTML; it's better to create an element with the content and do the rest on the element.
var str = '<a href="site/project/109#">href</a> text, text, test';
// create an element
var temp = document.createElement('div');
// set the content with the string
temp.innerHTML = str;
// get all `a` tags and convert into array
// for older browsers use `[].slice.call()`
// for converting into array
Array.from(temp.querySelectorAll('a')).forEach(function(ele) {
// create a text node with the attribute value
var text = document.createTextNode(ele.getAttribute('href'));
// replace the `a` tag with the text node
ele.replaceWith(text);
});
// get the updated html content
console.log(temp.innerHTML)
Why not regex? See: RegEx match open tags except XHTML self-contained tags
UPDATE: The msg variable is an element object, not a string, which is why it's getting converted to [object HTMLHeadElement] (HTMLHeadElement refers to the HEAD tag; I think something is wrong in your code, so check that as well). So do the same as above, replacing temp with msg. If you want to keep the original element's content, generate the temp element as above and set its content with temp.innerHTML = msg.innerHTML.

If you're using jQuery (which is great and does all things) then you can get the href quite easily:
var string = '<a href="site/project/109#">href</a> text, text, test';
var href = $(string).attr('href');
which means that setting the text of the anchor tag is trivial:
$(string).text($(string).attr('href'));

Search the HTML document's text for certain strings (and replace those)

I'm writing a Firefox extension. I want to go through the entire plaintext, so not Javascript or image sources, and replace certain strings. I currently have this:
var text = document.documentElement.innerHTML;
var anyRemaining = true;
do {
var index = text.indexOf("search");
if (index != -1) {
// This does not just replace the string with something else,
// there's complicated processing going on here. I can't use
// string.replace().
} else {
anyRemaining = false;
}
} while (anyRemaining);
This works, but it will also go through non-text elements and HTML such as Javascript, and I only want it to do the visible text. How can I do this?
I'm currently thinking of detecting an open bracket and continuing at the next closing bracket, but there might be better ways to do this.

You can use xpath to get all the text nodes on the page and then do your search/replace on those nodes:
function replace(search,replacement){
var xpathResult = document.evaluate(
"//*/text()",
document,
null,
XPathResult.ORDERED_NODE_ITERATOR_TYPE,
null
);
var results = [];
// We store the result in an array because if the DOM mutates
// during iteration, the iteration becomes invalid.
var res;
while ((res = xpathResult.iterateNext())) {
results.push(res);
}
results.forEach(function(res){
res.textContent = res.textContent.replace(search,replacement);
})
}
replace(/Hello/g,'Goodbye');
<div class="Hello">Hello world!</div>
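Note that "//*/text()" also matches the text inside script and style elements, which the question wanted to leave alone. One way around that, sketched here under the assumption that skipping SCRIPT, STYLE and NOSCRIPT is enough, is to walk childNodes by hand. collectTextNodes and replaceInText are made-up names.

```javascript
// Elements whose text is not visible page text (an assumption for this sketch).
var SKIPPED = { SCRIPT: true, STYLE: true, NOSCRIPT: true };

// Hypothetical helper: gather text nodes under `node`, skipping the elements above.
function collectTextNodes(node, out) {
  out = out || [];
  Array.prototype.forEach.call(node.childNodes, function (child) {
    if (child.nodeType === 3) {
      out.push(child); // text node
    } else if (child.nodeType === 1 && !SKIPPED[child.nodeName.toUpperCase()]) {
      collectTextNodes(child, out); // element: recurse
    }
  });
  return out;
}

// Collect first, then modify, so the walk never sees a changing tree.
function replaceInText(root, search, replacement) {
  collectTextNodes(root).forEach(function (textNode) {
    textNode.textContent = textNode.textContent.replace(search, replacement);
  });
}

// Browser-only usage; skipped where there is no DOM.
if (typeof document !== 'undefined') {
  replaceInText(document.body, /Hello/g, 'Goodbye');
}
```

Since String.prototype.replace accepts a function as its second argument, the more complicated processing from the question can be passed in as replacement.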

You can either use a regex to strip the HTML tags or, probably easier, use a JavaScript function that returns the text without the HTML. See this for more details:
How can get the text of a div tag using only javascript (no jQuery)

How can I manipulate the DOM from a string of HTML in JavaScript?

I'm developing a Windows 8 Metro App using JavaScript. I need to manipulate a string of HTML to select elements like DOM.
How can I do that?
Example:
var html = data.responseText; // data.responseText is a string of HTML received from the xhr function.
// Now I need to extract an element from the string like document.getElementById("some_element")...
Thanks!
UPDATE:
Solved it:
var parser = new DOMParser();
var xml = parser.parseFromString(data.responseText, "text/html"); // the MIME type argument is required

I think your approach to the problem isn't the best; the server could return JSON or XML instead. But if you need to do it that way:
To my knowledge you won't be able to use document.getElementById without first inserting the new element into the document (in the example below, that would mean inserting the div, for example with document.body.appendChild(div)), but you could do this:
var div = document.createElement("div");
div.innerHTML = '<span id="rawr"></span>'; //here you would put data.responseText
var elements = div.getElementsByTagName("span"); // [<span id="rawr"></span>], there you could ask elements[0].id === "rawr" or whatever you like
