meta property is undefined - javascript

I have some code that fetches an HTML page using Ajax and then retrieves its meta tags.
if (request.readyState == 4) {
    var html_text = request.responseText;
    var parent = document.createElement('div');
    parent.innerHTML = html_text;
    var metas = parent.getElementsByTagName('meta');
    var meta;
    for (var i = 0; i < metas.length; i++) {
        meta = metas[i];
        alert(meta.property);
        alert(meta.content);
    }
}
The html_text does contain meta tags with property and content attributes, and the content does show. But why is meta.property showing as undefined? Can anyone help me with this?

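Either look at meta.name, or use meta.getAttribute("property"). property is not one of the standard attributes that the DOM exposes as a JavaScript property on meta elements, so meta.property is undefined, while name and content are exposed.
By the way: you are assigning the variable html_code to innerHTML, but you stored the HTML content in html_text.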

You could try using getAttribute to get the property attribute:
alert(meta.getAttribute('property'));
I'm not sure why it wouldn't work your way though.
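A minimal sketch of the getAttribute approach, in case it helps. The helper names readMeta and collectMetas are made up for illustration, and container is assumed to be the detached div from the question. getAttribute reads the attribute as written in the markup, so it works for non-standard attributes such as property, and it returns null when the attribute is missing.

```javascript
// Hypothetical helper: read both attributes through getAttribute, which
// works for any attribute, whether or not the DOM mirrors it as a property.
function readMeta(meta) {
  return {
    property: meta.getAttribute('property'), // null when the attribute is absent
    content: meta.getAttribute('content')
  };
}

// Hypothetical helper: collect the values for every <meta> inside a container.
function collectMetas(container) {
  var metas = container.getElementsByTagName('meta');
  var result = [];
  for (var i = 0; i < metas.length; i++) {
    result.push(readMeta(metas[i]));
  }
  return result;
}
```

In the question's loop, alert(meta.getAttribute('property')) is the equivalent one-line change.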

What you are attempting here amounts to creating a new document, and it will not work this way, at least in IE.
Put this line
alert(parent.innerHTML);
right after:
parent.innerHTML = html_text;
and you will see that you only get the contents of the body; everything else has been omitted.
If the response is valid XML, request.responseXML should be available, and you can inspect it directly (it is already a document).

Related

Javascript pulling content from commented html

I'm a bit of a JS newbie. I have a tracking script that reads the metadata of the page and places the right scripts on that page using this:
var element = document.querySelector('meta[name="tracking-title"]');
var content = element && element.getAttribute("content");
console.log(content);
This logs the correct tag to the console, so I can make sure it's working, and it does work in a test situation. However, on the actual website the metadata I'm targeting is produced on the page by a Java application and is beyond my control. The problem is that it sits inside a commented-out area, and this script cannot read inside comments. For example:
<!-- your tracking meta is here
<meta name="tracking-title" content="this-is-the-first-page">
Tracking finished -->
Any ideas appreciated.
You can use this code:
var html = document.querySelector('html');
var content;
function traverse(node) {
    if (node.nodeType == 8) { // comment
        var text = node.textContent.replace(/<!--|-->/g, '');
        var frag = document.createDocumentFragment();
        var div = document.createElement('div');
        frag.appendChild(div);
        div.innerHTML = text;
        var element = div.querySelector('meta[name="tracking-title"]');
        if (element) {
            content = element.getAttribute("content");
        }
    }
    var children = node.childNodes;
    if (children.length) {
        for (var i = 0; i < children.length; i++) {
            traverse(children[i]);
        }
    }
}
traverse(html);
One way is to use a NodeIterator and get comment nodes. Quick example below. You will still need to parse the returned value for the data you want but I am sure you can extend this here to do what you want.
Fiddle: http://jsfiddle.net/AtheistP3ace/gfu791c5/
var commentedOutHTml = [];
// The third argument is an optional filter; null accepts every comment node.
var iterator = document.createNodeIterator(document.body, NodeFilter.SHOW_COMMENT, null);
var currentNode;
while ((currentNode = iterator.nextNode())) {
    commentedOutHTml.push(currentNode.nodeValue);
}
alert(commentedOutHTml.toString());
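The comment text collected above is still a plain string. Here is a rough sketch of one way to pull the value out without touching the DOM, assuming the tag is written with name before content and with quoted values, as in the question's sample. extractMetaContent is a hypothetical helper name.

```javascript
// Hypothetical helper: find <meta name="..." content="..."> inside the raw
// text of a comment node and return the content value, or null if not found.
function extractMetaContent(commentText, metaName) {
  // Escape regex metacharacters so the name is matched literally.
  var escaped = metaName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var pattern = new RegExp(
    '<meta\\s+name=["\']' + escaped + '["\']\\s+content=["\']([^"\']*)["\']',
    'i'
  );
  var match = pattern.exec(commentText);
  return match ? match[1] : null;
}
```

Each string pushed into commentedOutHTml could be passed through this helper until one returns a non-null value. If the attribute order or quoting varies, parsing the text into a detached element, as in the previous answer, is the more forgiving option.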
You can try this. This will require you to use jQuery however.
$(function() {
    $("*").contents().filter(function() {
        return this.nodeType == 8;
    }).each(function(i, e) {
        alert(e.nodeValue);
    });
});

HTML JavaScript delay downloading img src until node in DOM

Hi, I have markup sent to me from a server, and I set it as the innerHTML of a div element so that I can traverse the tree, find the image nodes, and change their src values. Is there a way to prevent the original src values from being downloaded?
Here is what I am doing:
function replaceImageSrcsInMarkup(markup) {
    var div = document.createElement('div');
    div.innerHTML = markup;
    var images = div.getElementsByTagName('img');
    // getElementsByTagName returns an HTMLCollection, which has no forEach of its own
    Array.prototype.forEach.call(images, replaceSrc);
    return div.innerHTML;
}
The problem is that in browsers, as soon as you do:
var img = document.createElement('img'); img.src = 'someurl.com';
the browser fires off a request to someurl.com. Is there a way to prevent this without resorting to parsing the markup myself? If there is no other way, does anyone know a good way of parsing the markup with as little code as possible to accomplish my goal?
I know you are already happy with your solution, but I think it would be worth sharing a safe method for future users.
You can now simply use the DOMParser object to generate an external document from your HTML string, instead of using a div created by your current document as container.
DOMParser specifically avoids the pitfalls mentioned in the question, along with other threats: no img src download and no JavaScript execution, even from element attributes.
So in your case you can safely do:
function replaceImageSrcsInMarkup(markup) {
    var parser = new DOMParser(),
        doc = parser.parseFromString(markup, "text/html");
    // Manipulate `doc` as a regular document
    var images = doc.getElementsByTagName('img');
    for (var i = 0; i < images.length; i += 1) {
        replaceSrc(images[i]);
    }
    return doc.body.innerHTML;
}
Demo: http://jsfiddle.net/94b7gyg9/1/
Note: with your current code, browsers will still try to download the resource initially specified in each img node's src attribute, even if you change it before the end of JS execution. Trace the network transactions in this demo: http://jsfiddle.net/94b7gyg9/
Rather than appending the new markup to the DOM before you change the img sources, create an element, set its innerHTML, change the source of the images, and then finally append the changed markup to the page.
Here's a fully-worked sample.
<!DOCTYPE html>
<html>
<head>
<script>
"use strict";
function byId(id,parent){return (parent == undefined ? document : parent).getElementById(id);}
//function allByClass(className,parent){return (parent == undefined ? document : parent).getElementsByClassName(className);}
function allByTag(tagName,parent){return (parent == undefined ? document : parent).getElementsByTagName(tagName);}
function newEl(tag){return document.createElement(tag);}
//function newTxt(txt){return document.createTextNode(txt);}
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
window.addEventListener('load', onDocLoaded, false);
function onDocLoaded()
{
    byId('goBtn').addEventListener('click', onGoBtnClick, false);
}
var dummyString = "<img src='img/girl.png'/><img src='img/gfx07.jpg'/>";
function onGoBtnClick(evt)
{
    var div = newEl('div');
    div.innerHTML = dummyString;
    var mImgs = allByTag('img', div);
    for (var i=0, n=mImgs.length; i<n; i++)
    {
        mImgs[i].src = "img/murderface.jpg";
    }
    document.body.appendChild(div);
}
</script>
<style>
</style>
</head>
<body>
<button id='goBtn'>GO!</button>
</body>
</html>
You could work directly on the markup string, using a regex to find all the img src URLs and then replacing each of them with the new URL.
var regex = /<img[^>]+src="?([^"\s]+)"?\s*\/?>/g;
var imgUrls = [];
var m;
while ((m = regex.exec(markup))) {
    imgUrls.push(m[1]);
}
imgUrls.forEach(function(url) {
    markup = markup.replace(url, 'new-url');
});
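For reference, the same idea can be done in a single pass with String.prototype.replace and a callback, which avoids collecting the URLs first. This is only a sketch: rewriteImgSrc and toNewUrl are hypothetical names, only quoted src values are handled, and a regex will never be as robust as a real HTML parser.

```javascript
// Hypothetical helper: rewrite the src of every <img> tag in a markup string.
// toNewUrl receives the old URL and returns the replacement.
function rewriteImgSrc(markup, toNewUrl) {
  // Group 1: everything from "<img" up to and including "src=".
  // Group 2: the opening quote. Group 3: the old URL.
  // \s before src keeps attributes such as data-src from matching.
  var pattern = /(<img\b[^>]*?\ssrc\s*=\s*)(["'])(.*?)\2/gi;
  return markup.replace(pattern, function (whole, prefix, quote, oldUrl) {
    return prefix + quote + toNewUrl(oldUrl) + quote;
  });
}

// Usage:
rewriteImgSrc('<img src="a.png">', function (url) { return 'cdn/' + url; });
// → '<img src="cdn/a.png">'
```

Because the replacement is produced by a function, special characters in the new URL are inserted literally.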
Another solution, if you have control over the markup, is to set every img src to an empty string and put the URL in a data-src attribute, so that your markup string looks something like this:
markup = '<img src="" data-src="original-url.jpg">'; // illustrative markup with a placeholder URL
Then setting this markup as your div.innerHTML won't trigger any download from the browser, and you can still work with it using regular DOM selectors.
div.innerHTML = markup;
var images = div.getElementsByTagName('img');
// HTMLCollection has no forEach of its own, so borrow the one from Array
Array.prototype.forEach.call(images, function(img) {
    var oldSrc = img.getAttribute('data-src'); // the original URL, if you need it
    img.setAttribute('src', 'new-url');
});

Get the text from an external HTML document

My goal is to get the text from an HTML document which does not call any functions from my .jsp file.
I've looked around and thought I had found the answer to my problem, but it doesn't seem to be working, and other answers use jQuery (which I am both unfamiliar with and not allowed to use).
This is my code so far:
function getText(divID) {
    var w = window.open("test.html");
    var body = w.document.body;
    var div = document.getElementById(divID);
    var textContent = body.textContent || body.innerText;
    console.log(textContent);
    //div.appendChild(document.createTextNode(textContent));
}
So as you can see, I'm trying to get the body of one HTML document and have it appear in another. Am I on the right track?
EDIT: OK, so I seem to have made my problem quite confusing. I call the function in an HTML document called html.html, but I want to get the text from test.html and then have it appear in html.html. It has to be like this because I can't assume that the HTML document I want to read from will include my .jsp file in its head.
At the moment I am getting the following error:
Uncaught TypeError: Cannot read property 'body' of undefined
document.body in the other window is undefined because the other window has not yet loaded and rendered the document.
One solution would be to wait for the onload event.
function getText(divID) {
    var w = window.open("test.html");
    w.addEventListener("load", function() {
        var body = w.document.body;
        var div = document.getElementById(divID);
        var textContent = body.textContent || body.innerText;
        console.log(textContent);
    });
}
Make sure you run the getText function in response to a user event such as a click; otherwise the browser's popup blocker is likely to stop window.open.
If all you want to do is get the contents of the other window, using AJAX would probably be a better option.
function getText(divID) {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function() {
        // readyState 4 means the request is done; also check that it succeeded
        if (xhr.readyState == 4 && xhr.status == 200) {
            var body = xhr.response.body;
            var div = document.getElementById(divID);
            var textContent = body.textContent || body.innerText;
            console.log(textContent);
        }
    };
    xhr.open("GET", "test.html", true);
    xhr.responseType = "document"; // have the browser parse the response into a document
    xhr.send();
}

Don't load scripts appended with innerHTML?

I'm appending a whole HTML page to a div (to scrape it). How do I stop it from requesting script and CSS files? I tried immediately removing those nodes, but they still get requested.
It's for a browser add-on; I'm scraping with JS.
As @adeneo wrote, you don't have to add the HTML to a page in order to scrape information from it. You can turn it into a DOM tree that is disconnected from the page DOM and process it there.
In jQuery it is simply $("html text here"). Then you can scrape it using the API,
e.g.
function scrape_html(html_string) {
    var $dom = $(html_string);
    var name = $dom.find('.name').text();
    return name;
}
without jQuery:
function scrape_html(html_string) {
    var container = document.createElement('div');
    container.innerHTML = html_string;
    var name = container.getElementsByClassName('name')[0].innerText;
    return name;
}
Setting the innerHTML of a temporary HTML element that has not been added to the document will not execute scripts, and since the element is not attached to your document, the styles will not be applied either. Images in the markup may still be requested, though.
This gives you an opportunity to strip out any unwanted elements before copying the innerHTML to your own document.
Example:
var temp = document.createElement('div');
temp.innerHTML = html; // the HTML of the 'other' page.
function removeElements(element, tagName)
{
    // Search inside the element that was passed in, not the outer temp variable.
    var elements = element.getElementsByTagName(tagName);
    while (elements.length > 0)
    {
        elements[0].parentNode.removeChild(elements[0]);
    }
}
removeElements(temp, 'script');
removeElements(temp, 'style');
removeElements(temp, 'link');
// container is assumed to be an element that already exists in your own document
container.innerHTML = temp.innerHTML;

Block existing scripts

I am making an add-on for Firefox. I want to extract a video from an HTML page and display it on a black background. Here is what I've got.
//main.js
var pageMod = require("page-mod");
var data = require("self").data; // assumed: the SDK's self module, needed for data.url() below
pageMod.add(new pageMod.PageMod({
    include: "http://myserver.fr/*",
    contentStyleFile: data.url("modify.css"),
    contentScriptFile: data.url('hack.js'),
    contentScriptWhen: 'start'
}));
//hack.js
var video = document.body.innerHTML;
document.body.innerHTML = '';
video = video.substring(video.lastIndexOf("<object"), video.lastIndexOf("</object>"));
video = "<div id='fond'></div><div id='mavideo'>" + video + "</div>";
document.body.innerHTML = video;
document.body.style.backgroundColor = "black";
document.body.style.margin = "0";
My code works, but the problem is that I have to wait "for hours" while the page's other JavaScript is being executed. I've tried to use contentScriptWhen: 'start', but it doesn't change a thing.
Any idea how to block the page's other scripts?
Blocking scripts from loading is going to be a little difficult, but we can quickly remove them instead.
// remove all scripts
var scripts = document.getElementsByTagName("script");
var parent = null;
// scripts is a live collection, so it shrinks as each tag is removed;
// keep removing the first item until none are left
while (scripts.length > 0) {
    parent = scripts.item(0).parentNode;
    parent.removeChild(scripts.item(0));
}
This code tries to remove all the script tags from your page, which would stop those scripts from loading and allow you to run the rest of your code. Put this at the top of your hack.js script.
I don't know the page you're looking at, so it's hard to know what else is going on, but your use of substring and lastIndexOf isn't going to be very fast either. We can get rid of those and see if you get any noticeable speed increase.
Try using query selectors and Document Fragments instead. Here's an example:
//hack.js
var fragment = document.createDocumentFragment();
var objects = document.getElementsByTagName("object");
// create your div objects and append to the fragment
var fond = document.createElement("div");
fond.setAttribute("id", "fond");
fragment.appendChild(fond);
var mavideo = document.createElement("div");
mavideo.setAttribute("id", "mavideo");
fragment.appendChild(mavideo);
// append all <object> tags to your video div
// objects is a live collection: each element leaves it once it is moved out of the document
while (objects.length > 0) {
    mavideo.appendChild(objects.item(0));
}
// clear and append it all in
document.body.innerHTML = '';
document.body.appendChild(fragment);
That code throws all the objects into a document fragment so you can wipe the whole page away and then append it all back in; no libraries required. Your divs fond & mavideo are in there as well.
I didn't really test all this code out so hopefully it works as expected.
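getElementsByTagName returns a live collection, so it changes while nodes are being removed or moved. Here is a small sketch of one way to sidestep that, assuming it is acceptable to copy the collection first: take a snapshot into a plain array and loop over the copy. snapshot and removeAll are hypothetical helper names, not part of any library.

```javascript
// Copy any array-like object (HTMLCollection, NodeList, arguments) into a
// real array, so later DOM changes no longer affect the list being looped.
function snapshot(list) {
  return Array.prototype.slice.call(list);
}

// Hypothetical helper: detach every node in the list from its parent.
function removeAll(list) {
  snapshot(list).forEach(function (node) {
    if (node.parentNode) {
      node.parentNode.removeChild(node);
    }
  });
}
```

With this in place, removing every script tag becomes removeAll(document.getElementsByTagName("script")).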
