How can I get the text inside an object element?

In the object element with the ID x a text file is loaded and displayed correctly. How can I get this text with JavaScript?
I set
y.data = "prova.txt"
then tried
y.innerHTML;
y.text;
y.value;
None of these work.
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<object id="x" data="foo.txt"></object>
<script>
var y = document.getElementById("x")
</script>
</body>
</html>

I'm afraid this isn't going to be as easy as you'd like it to be.
According to your comments, you tried AJAX first, but came across CORS problems. This happens when you try to include data from files on a different domain name.
Since that didn't work, you tried to include the file inside an object tag. This works a bit like an iframe - the data will be displayed on the webpage, but for the same reasons as above, you cannot access the data through JavaScript if the file is under a different domain name. This is a security feature. That explains the error you were getting most recently:
Uncaught SecurityError: Failed to read the 'contentDocument' property from 'HTMLObjectElement'
Now, there are a few ways you might be able to get around this.
Firstly, if this is a program exclusively for your own use, you can start your browser with web-security disabled (though this is dangerous for browsing the web generally). In Chrome, for example, you can do this by launching Chrome with the --disable-web-security flag. More details here.
Secondly, you can try to arrange that your document and the file do belong under the same domain. You will probably only be able to do this if you have control of the file.
Your error message (specifically a frame with origin "null") makes me think that you are running the files directly in the web-browser rather than through a server. It might make things work better if you go through an actual server.
If you've got Python installed (it's included on most Linux and Mac systems), the easiest way to do that is to open up the terminal and browse to your code's directory. Then launch a simple Python server (SimpleHTTPServer is the Python 2 module; on Python 3 the equivalent command is python3 -m http.server):
cd /home/your_user_name/your_directory
python -m SimpleHTTPServer
That will start up a web server which you can access in your browser by navigating to http://localhost:8000/your_file.html.
If you are on Windows and haven't got Python installed, you could also use the built-in IIS server, or WAMP (or just install Python).
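To make the origin rule above concrete, here is a minimal sketch that compares the origin of a page with the origin of the file it wants to read. The helper name isSameOrigin is hypothetical, and the check only approximates what browsers do (their handling of file:// pages in particular varies), so treat it as an illustration rather than a definitive test.

```javascript
// Hypothetical helper (not from the original answer): reports whether a
// resource would count as same-origin with the page that references it.
function isSameOrigin(pageUrl, resourceUrl) {
  var page = new URL(pageUrl);
  // Relative paths such as "prova.txt" are resolved against the page URL.
  var resource = new URL(resourceUrl, pageUrl);
  // Pages opened straight from disk (file://) get an opaque origin that
  // serializes to the string "null" and never matches any other origin.
  if (page.origin === "null" || resource.origin === "null") {
    return false;
  }
  return page.origin === resource.origin;
}

// Served through the local server: scheme, host and port all match.
console.log(isSameOrigin("http://localhost:8000/your_file.html", "prova.txt"));
// Opened directly from disk: the origin is "null".
console.log(isSameOrigin("file:///home/your_user_name/your_directory/your_file.html", "prova.txt"));
// File hosted on a different domain.
console.log(isSameOrigin("http://localhost:8000/your_file.html", "http://example.com/prova.txt"));
```

The first call reports true and the other two report false, which matches the advice above: serve the page and the text file from the same server.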

y.innerHTML = 'Hello World';
will replace everything in the 'x' element with the text 'Hello World', but it looks like you've already loaded another HTML document into the 'x' element. So the question is...
Where exactly in the 'x' element do you want to insert the text? for example 'x' -> html -> body?

The object element is loading the text file asynchronously, so if you try to get its data by querying the element, you'll get undefined.
However, you can use the onload attribute in <object> elements.
In your HTML, add an onload that calls a function in your script to catch when the text file has fully loaded.
<object id="x" onload="getData()" data="readme.txt"></object>
In the script, you can get the object's data with contentDocument.
function getData() {
  var textFile = document.getElementById('x').contentDocument;
  /* The <object> element renders a whole
     HTML structure in which the data is loaded.
     The plain text representation in the DOM is surrounded by <pre>
     so we need to target <pre> in the <object>'s DOM tree */
  // getElementsByTagName returns an array-like HTMLCollection of matches.
  var textObject = textFile.getElementsByTagName('pre')[0];
  // I'm sure there are far better ways to select the containing element!
  /* We retrieve the inner HTML from the object */
  var text = textObject.innerHTML;
  alert(text); // use the content!
}
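A slightly more defensive variant of the function above might look like the sketch below. It reads textContent instead of innerHTML, because innerHTML returns characters such as < and & escaped as HTML entities while textContent returns the raw text. The helper name readObjectText is hypothetical. The fallback to the document body assumes that some browsers may not wrap plain text in a <pre> element, which is a guess and not something the answer confirms.

```javascript
// Hypothetical helper (not part of the original answer).
// Returns the text loaded into an <object> element, or null if it
// cannot be read (not loaded yet, or blocked by the same-origin policy).
function readObjectText(objectElement) {
  var doc;
  try {
    doc = objectElement.contentDocument;
  } catch (e) {
    // Some browsers throw a SecurityError for cross-origin content.
    return null;
  }
  if (!doc) {
    return null;
  }
  // Prefer the <pre> wrapper described above; fall back to the body.
  var pre = doc.getElementsByTagName('pre')[0];
  var source = pre || doc.body;
  return source ? source.textContent : null;
}

// Browser usage, mirroring the onload approach above:
// <object id="x" onload="alert(readObjectText(this))" data="readme.txt"></object>
```

Returning null instead of throwing lets the caller decide how to report a file that could not be read.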

Related

Using external JavaScript files in a WinForms/WPF WebBrowser control

I found a cool feature on a website, implemented in JavaScript, I'd like to use it as is in my desktop application (for personal use).
During my experiments I managed to generate custom HTML on the fly, feed it to the browser using webBrowser1.DocumentText = [my generated HTML]
I've managed to put some inline JavaScript into the HTML, and hook it up via a ScriptManager so that I can call the JavaScript from my C# code, pass a value to it, and get a return value.
But the feature I'm trying to use is a bit more complicated: it's no less than 10 JavaScript files. 2 of them are referenced directly in the web page the usual way <script src="/js/script1.js" type="text/javascript"></script>
The other 8 are loaded in one of the scripts:
var elem = document.createElement("script");
elem.type = "text/javascript";
elem.src = "/js/" + filename;
document.body.appendChild(elem);
These 8 files are in fact data files, even though the data is represented in JavaScript. They're pretty large, over 1MB each. Stuffing it all into the HTML file seems quite stupid. Also, the script that loads the data creates a "file map" and further refers to the data based on which file it's in:
var fileMap = [
[/[\u0020-\u00ff]/, 'file1.js'],
[/[\u3000-\u30ff]/, 'file2.js'],
[/[\u4e00-\u5dff]/, 'file3.js'],
...
I don't want to resort to modifying the JavaScript, because it's not exactly my strong point. So the browser needs to "see" the js files in order to be able to use them. I thought of creating the file structure locally, and navigating the browser there. But I don't want any loose files in my solution. I'd like to have everything embedded if possible. And I doubt I can get the browser to navigate to an embedded resource, and see other embedded resources as files. Any idea how I could get around this?
EDIT:
I've tried to do it with local files. No luck. I get the HTML to load properly, but when I try to invoke a JavaScript call, nothing happens. I tried pointing the browser to those js files, to make sure they're there. They are. I tried an element with src attribute pointing to an image in the same subfolder as the script files. It gets rendered. It's as if external js files refuse to load.

I had a similar need as your scenario and I addressed it using two key points embedded in two other Stack Overflow answers. As noted by SLaks' answer here the first key is using the syntax file:/// as the prefix for an absolute path to external files. The second is using .Replace("\\", "/") for an absolute file path as listed in Adam Plocher's answer and one of his follow-up comments here.
In short, the final output for each external file in an HTML page will look something like:
<link href="file:///c:/users/david/myApp/styles/site.css" rel="stylesheet" type="text/css">
or
<script src="file:///c:/users/david/myApp/scripts/JavaScript1.js"></script>
Using the format in the samples above in my HTML file resulted in the WebBrowser control loading external CSS, image or script files.
The details and solving the scenario in the question
In womd's answer in the first Stack Overflow thread referenced above, he used the method System.IO.File.ReadAllText() to load script files and embedded the text of the script files into the <head> tag. As you indicated in your question, loading script files directly into the HTML page is not what you're looking to do.
The solution below uses the same System.IO.File.ReadAllText() method but loads the text of the HTML page instead. The premise works similarly to the Razor View Engine in ASP.NET.
The main idea in the solution below involves adding a temporary string in an HTML page that will be loaded into the WebBrowser control and then replacing this temporary string in a C# method in my app just before the HTML page is set to be loaded into the WebBrowser control.
Here are the basic steps to my solution:
1. Add a temporary string for each external reference in the HTML file.
2. Declare a variable for the absolute path in a script tag within the HTML file. This step is not necessary unless you're going to use the absolute path elsewhere within your JavaScript code. Your scenario involves delay loading external script files via JavaScript code, so this step was necessary.
3. Modify the src property in the JavaScript code that delay loads the other script files so that it uses the absolute path variable.
4. Add a method in your app that loads the HTML page file as a text string and then replaces all temporary string instances with an absolute path containing the prefix 'file:///'. The absolute path should have forward slashes.
5. Set the 'DocumentText' property on the WebBrowser control to the updated HTML.
6. Set the 'Copy to Output Directory' of each external file in your project to 'Copy always' or 'Copy if newer'. This step may not be necessary if you have a fixed location for your external files and that location is not within the build or publish directory used by Visual Studio.
The following are the details for each step. I added a lot of detail that you can skip. I was verbose to reduce any confusion since the steps make changes to several places in the project.
1. Using a temporary string
I used the string "/ReplaceWithAbsolutePath/" but you can use any distinct text. Each reference to an external file in the HTML page looks like:
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
<link href="/ReplaceWithAbsolutePath/styles/site.css" rel="stylesheet" type="text/css">
<script type="text/javascript">
var absolutePath = "/ReplaceWithAbsolutePath/";
</script>
</head>
<body>
<p>My web page</p>
<script src="/ReplaceWithAbsolutePath/scripts/JavaScript1.js"></script>
</body>
</html>
2. Declare absolute path variable
Note that in the above HTML page I listed a <script> tag with the declared variable 'absolutePath' set to the temporary string. (In the HTML page above the variable is added as a global variable, which is not necessarily best practice. You can declare the variable within a namespace instead of declaring it in the global namespace.)
3. Modify the delay load script to include absolute path variable
Add the 'absolutePath' variable to your JavaScript file that delay loads other JavaScript files containing your data.
elem.src = absolutePath + "/js/" + filename;
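One detail to watch: the temporary string ends with a slash, so after replacement 'absolutePath' ends with a slash too, and concatenating it with "/js/" yields a double slash in the final URL. That often still resolves, but a small helper can normalize it. The sketch below is one way to do so; the name buildScriptUrl is hypothetical and the "js" folder is taken from the question's loader code.

```javascript
// Hypothetical helper: joins the absolute path and a data file name
// without producing "//" between the two parts.
function buildScriptUrl(absolutePath, filename) {
  // Drop any trailing slashes from the base path.
  var base = absolutePath.replace(/\/+$/, "");
  // Drop any leading slashes from the file name.
  var name = filename.replace(/^\/+/, "");
  return base + "/js/" + name;
}

console.log(buildScriptUrl("file:///c:/users/david/myApp/", "file1.js"));
// prints "file:///c:/users/david/myApp/js/file1.js"

// In the delay-loading script this would be used as:
// elem.src = buildScriptUrl(absolutePath, filename);
```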
4. C# method to replace all temporary string instances
Within your project add the following line to your form load event handler or place this line somewhere in your initialization of the WebBrowser control.
webBrowser1.DocumentText = GetUpdatedHtmlWithAbsolutePaths("/ReplaceWithAbsolutePath/", "HTMLPage1.html");
Add the following method to your code. Update the call to the method in the line above with the name of the class instance where the following method is placed.
// The result of this method will look like the following example:
// <script src="file:///c:/users/david/documents/myApp/scripts/JavaScript1.js"></script>
public string GetUpdatedHtmlWithAbsolutePaths(string tempPathString, string htmlFilename)
{
    // Get the application's directory
    // stackoverflow.com/questions/674857/should-i-use-appdomain-currentdomain-basedirectory-or-system-environment-current
    // Note that the 'BaseDirectory' property returns a string with a trailing backslash ('\')
    string appDirectory = AppDomain.CurrentDomain.BaseDirectory;
    // Replace every backslash ('\') with a forward slash ('/') in the appDirectory string
    appDirectory = appDirectory.Replace("\\", "/");
    // Read all of the HTML text from the HTML page file
    string html = System.IO.File.ReadAllText(System.IO.Path.Combine(appDirectory, htmlFilename));
    // Replace all '/ReplaceWithAbsolutePath/' strings within the HTML text with
    // the absolute path on the local machine
    html = html.Replace(tempPathString, "file:///" + appDirectory);
    return html;
}
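To check what the replacement should produce without building the C# project, the same string transformation can be sketched in JavaScript. This is only an illustration of the idea, not part of the solution itself: the function names are hypothetical, and the sketch assumes the directory path contains no characters (such as spaces or '#') that would need percent-encoding in a URL.

```javascript
// Hypothetical helpers that mirror the C# method above.

// Turns a Windows directory path into a file:/// URL prefix.
function toFileUrlBase(directory) {
  return "file:///" + directory.replace(/\\/g, "/");
}

// Replaces every occurrence of the placeholder with the file:/// prefix.
function replacePlaceholder(html, placeholder, directory) {
  return html.split(placeholder).join(toFileUrlBase(directory));
}

var page = '<script src="/ReplaceWithAbsolutePath/scripts/JavaScript1.js"></script>';
console.log(replacePlaceholder(page, "/ReplaceWithAbsolutePath/", "c:\\users\\david\\myApp\\"));
// prints <script src="file:///c:/users/david/myApp/scripts/JavaScript1.js"></script>
```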
5. Set the DocumentText property of the WebBrowser control
I added the initialization of the WebBrowser control in the form load event handler but you can, of course, add the line that sets the DocumentText property wherever you initialize your WebBrowser control.
private void Form1_Load(object sender, EventArgs e)
{
    // Set the document text of the web browser control with the updated HTML
    webBrowser1.DocumentText = GetUpdatedHtmlWithAbsolutePaths("/ReplaceWithAbsolutePath/", "HTMLPage1.html");
}
6. Set the 'Copy to Output Directory' of each external file
Take a look at the answer posted by Matthew Watson in this Stack Overflow question if you want your external files included in your solution/project file structure.
You can add files to your project and select their properties: "Build
Action" as "Content" and "Copy to output directory" as "Copy Always"
or Copy if Newer (the latter is preferable because otherwise the
project rebuilds fully every time you build it).
Then those files will be copied to your output folder.
This is better than using a post build step because Visual Studio will
know that the files are part of the project. (That affects things like
ClickOnce applications which need to know what files to add to the
clickonce data.)
In short, add the external file to your project. You can add the external file to any subfolder in your project. (In Visual Studio 2013 or 2015 -- I don't have VS2012) Right-click on the external file in the Solution Explorer and select Properties from the context menu. The Properties pane will be displayed. In the Properties pane change the setting for 'Copy to Output Directory' to 'Copy always' or 'Copy if newer'.
Use View Source to verify absolute path strings
Run your project and it should load your external files in the WebBrowser control. Assuming you have not set the property wbChartContainer.IsWebBrowserContextMenuEnabled = false; in code or in the Properties pane for WebBrowser control you can right-click on the WebBrowser control when your form is running. Click 'View Source' from the context menu and check the paths to your external resources in the View Source window.

can firefox extension modify DOM of HTML document then save as HTML?

I am creating a firefox extension that lets the operator perform various actions that modify the content of the HTML document. The operator does not edit HTML, they take other actions and my extension modifies the document by inserting elements, adding attributes, and so forth.
When the operator is finished, they need to be able to save the HTML document as a file (or have my extension send it to an internet destination, but this is not required since they can email the saved file).
I thought maybe the changes made by the javascript code in my extension would be reflected in the HTML document, but when I ask the firefox browser to "view source" after making modifications, it displays the original HTML text.
My questions are:
#1: What is the easiest way for the operator to save the HTML document with all the changes my extension has made?
#2: What is the easiest way for the javascript code in my extension to process the HTML document contents and write to an HTML file on the local disk?
#3: Is any valid HTML content incapable of accurate representation in the saved file?
#4: Is the TreeWalker part of the solution (see below)?
A couple observations from my research so far:
I've read about the TreeWalker object, which seems to provide a fairly painless way for an extension to walk through everything (?or almost everything?) in the HTML document. But does it expose everything so everything in the original (and my modifications) can be saved without losing anything of importance?
Does the TreeWalker walk through the HTML document in the "correct order" --- the order necessary for my extension to generate the original and/or modified HTML document?
Anything obscure or tricky about these problems?

Ok, so I am assuming here that you have access to the page DOM. What you need to do is basically make changes to the DOM, then get all the DOM code and save it as a file. Here is how you can download the page's HTML code. This will create an a tag which the user needs to click for the file to download.
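var a = document.createElement('a'), code = document.querySelectorAll('html')[0].innerHTML;
a.setAttribute('download', 'filename.html');
// Percent-encode the markup so characters such as '#' or '%' do not break the data URI
a.setAttribute('href', 'data:text/html;charset=utf-8,' + encodeURIComponent(code));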
Now you can insert this a tag anywhere in the DOM and the file will download when the user clicks it.
Note: This is sort of a hack, this injects entire html of the file in the a tag, it should in theory work in any up to date browser (except, surprise, IE). There are more stable and less hacky ways of doing it like storing it in a file system API file and then downloading that file instead.
Edit: The document.querySelectorAll line accesses the page DOM. For it to work the document must be accessible. You say you are modifying DOM so that should already be there. Make sure you are adding the code on the page and not your extension code. This code will be at the same place as your DOM modification code, not your extension pages that can't access the DOM.
And as for the a tag, it will be inserted in the page. I skipped the steps since I assumed you already know how to manipulate DOM and also because I don't know where you would like to add the link. And you can skip the user action of clicking the link too, but it's a hack and only works in modern browsers. You can insert the a tag somewhere in the original page where user won't see it and then call the a.click() function to simulate a click event on the link. But this is not a legit way and I personally only use it on my practice projects to call click event listeners.
I can only test this on chrome not on FF but try this code, this will not require you to even add the a link to DOM. You need to add this next to the DOM manipulation code. This will work if luck is on your side :)
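var a = document.createElement('a'), code = document.querySelectorAll('html')[0].innerHTML;
a.setAttribute('download', 'filename.html');
// Percent-encode the markup so characters such as '#' or '%' do not break the data URI
a.setAttribute('href', 'data:text/html;charset=utf-8,' + encodeURIComponent(code));
a.click();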

There is no easy way to do this with the web API only, at least when you want a result that does not omit stuff like the doctype or comments. You could still write a serializer yourself that goes through document.childNodes and serializes each node according to its type (Element.outerHTML, Comment.data and so on).
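A hand-written serializer of that kind could be sketched as follows. The function names are hypothetical, and the sketch only handles the three node types that normally appear directly under the document (doctype, comments and the root element), so treat it as a starting point and not a complete implementation.

```javascript
// Hypothetical helpers: serialize the top-level nodes of a document.

// Rebuilds the <!DOCTYPE ...> declaration from a DocumentType node.
function serializeDoctype(doctype) {
  var text = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    text += ' PUBLIC "' + doctype.publicId + '"';
  } else if (doctype.systemId) {
    text += " SYSTEM";
  }
  if (doctype.systemId) {
    text += ' "' + doctype.systemId + '"';
  }
  return text + ">";
}

// Walks a childNodes list and serializes each node by its nodeType.
function serializeNodes(childNodes) {
  var parts = [];
  Array.from(childNodes).forEach(function (node) {
    if (node.nodeType === 1) {          // Element
      parts.push(node.outerHTML);
    } else if (node.nodeType === 8) {   // Comment
      parts.push("<!--" + node.data + "-->");
    } else if (node.nodeType === 10) {  // DocumentType
      parts.push(serializeDoctype(node));
    }
  });
  return parts.join("\n");
}

// Browser usage: var html = serializeNodes(document.childNodes);
```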
Luckily, you're writing a Firefox add-on, so you have access to a lot more (powerful) stuff.
While still not 100% perfect, the nsIDocumentEncoder implementations will produce pretty decent results, that should only differ in some whitespace and explicit charset declaration at most (everything else is a bug).
Here is an example on how one might use this component:
function serializeDocument(document) {
  const {
    classes: Cc,
    interfaces: Ci,
    utils: Cu
  } = Components;
  let encoder = Cc['@mozilla.org/layout/documentEncoder;1?type=text/html'].createInstance(Ci.nsIDocumentEncoder);
  encoder.init(document, 'text/html', Ci.nsIDocumentEncoder.OutputLFLineBreak | Ci.nsIDocumentEncoder.OutputRaw);
  encoder.setCharset("utf-8");
  return encoder.encodeToString();
}
If you're writing an SDK add-on, stuff gets more complicated as the SDK abstracts some important stuff away. You'll need to go through the chrome module, and also figure out the active window and tab yourself. Something like Services.wm.getMostRecentWindow("navigator:browser").content.document (Services.jsm) should do the trick.
In XUL overlay add-ons, content.document should suffice to get the document of the currently active tab, and you have Components access already.
Still, you need to let the user choose a file destination, usually through nsIFilePicker and then actually write the file, by using something like a file stream or the fully async OS.File API.

Looks like I get to answer my own question, thanks to someone in mozilla #extdev IRC.
I got totally faked out by "view source". When I didn't see my modifications in the window displayed by "view source", I assumed the browser would not provide the information.
However, guess what? When I "file" ===>> "save page as...", then examine the page contents with a plain text editor... sure enough, that contained the modifications made by my firefox extension! Surprise!

A browser has no direct write access to the local filesystem. The only read access it has is when it is explicitly given a file:// URL (see note 1 below).
In your case, we are explicitly talking about javascript - which can read and write cookies and local storage. It can also send stuff back to the server and retrieve it, e.g. using AJAX.
Stuff you put in local storage/cookies is effectively not accessible to other programs (such as email clients).
It is possible to create very long mailto: URLs (see note 2), but that only handles inline content in the email, and you're going to run into all sorts of encoding issues that you're not ready to deal with.
Hence I'd recommend pursuing storage serverside via AJAX - and look at local storage once you've got this sorted/working.
Note 1: this is not strictly true. a trusted, signed javascript has access to additional functions which may include direct file access.
Note 2: the limit depends on the browser and the email client (Lotus Notes truncates the content rather a lot).

Creating a document from url javascript

This is what I've tried:
function createDocumentz() {
var doc = document.implementation.createHTMLDocument('http://www.moviemeter.nl/film/270',null,'html');
return doc;
}
Even though a document gets created, if I run this with Firebug it says that the body node has no childnodes, any idea why?

Looks like you assume that you can use createHTMLDocument() to download and parse an HTML file from the URL you've passed as the first parameter. That is not the case; createHTMLDocument() always creates an empty document.
Also, the parameters you've passed to the function are those of createDocument(). createHTMLDocument() takes only one parameter, the document title. But even if you used createDocument(), the first parameter is the URI of the namespace, not the source document.
Unfortunately there's no way to download and manipulate external web site's HTML using JavaScript alone. The closest you can get is displaying the document in an iframe.

No, you cannot get the content from another website, this way.
If you could, it would lead to cross-site scripting.
All you would get is an empty document, due to the browser's policy, which of course has an empty body.
You can use an Iframe & set the source to the same...

DOM parsing in JavaScript

Some background:
I'm developing a web-based mobile application using JavaScript. HTML rendering is Safari based. Cross-domain policy is disabled, so I can make calls to other domains using XMLHttpRequest. The idea is to parse external HTML and get the text content of a specific element.
In the past I was parsing the text line by line, finding the line I need. Then get the content of the tag which is a substring of that line. This is very troublesome and requires a lot of maintenance each time the target html changes.
So now I want to parse the html text into DOM and run css or xpath queries on it.
It works well:
$('<div></div>').append(htmlBody).find('#theElementToFind').text()
The only problem is that when I use the browser to load html text into DOM element, it will try to load all external resources (images, js files, etc.). Although it isn't causing any serious problem, I would like to avoid that.
Now the question:
How can I parse HTML text into a DOM without the browser loading external resources or running JS scripts?
Some ideas I've been thinking about:
creating new document object using createDocument call (document.implementation.createDocument()), but I'm not sure it will skip the loading of external resources.
use third party DOM parser in JS - the only one I've tried was very bad with handling errors
use iframe to create new document, so that external resources with relative path will not throw an error in console

It seems that the following piece of code works great:
var doc = document.implementation.createHTMLDocument("");
doc.documentElement.innerHTML = htmlBody;
var text = $(doc).find('#theElementToFind').text();
External resources aren't loaded and scripts aren't evaluated.
Found it here:
https://stackoverflow.com/a/9251106/95624
Origin:
https://developer.mozilla.org/en/DOMParser#DOMParser_HTML_extension_for_other_browsers

You can construct a jQuery object from any HTML string, without appending it to the DOM:
$(htmlBody).find('#theElementToFind').text();

How to load JavaScript intermixed with html

So I need to pull some JavaScript out of a remote page that has (worthless) HTML combined with (useful) JavaScript. The page, call it, http://remote.com/data.html, looks something like this (crazy I know):
<html>
<body>
<img src="/images/a.gif" />
<div>blah blah blah</div><br/><br/>
var data = { date: "2009-03-15", data: "Some Data Here" };
</body>
</html>
so, I need to load this data variable in my local page and use it.
I'd prefer to do so with completely client-side code. I figured, if I could get the HTML of this page into a local JavaScript variable, I could parse out the JavaScript code, run eval on it and be good to use the data. So I thought load the remote page in an iframe, but I can't seem to find the iframe in the DOM. Why not?:
<script>
alert(window.parent.frames.length);
alert(document.getElementById('my_frame'));
</script>
<iframe name="my_frame" id='my_frame' style='height:1px; width:1px;' frameBorder=0 src='http://remote.com/data.html'></iframe>
The first alert shows 0, the second null, which makes no sense. How can I get around this problem?

Have you tried switching the order - i.e. iframe first, script next? The script runs before the iframe is inserted into the DOM.
Also, this worked for me in a similar situation: give the iframe an onload handler:
<iframe src="http://example.com/blah" onload="do_some_stuff_with_the_iframe()"></iframe>
Last but not least, pay attention to the cross-site scripting issues - the iframe may be loaded, but your JS may not be allowed to access it.

One option is to use XMLHttpRequest to retrieve the page, although it is apparently only currently being implemented for cross-site requests.
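Once the page text is available as a string (however it was retrieved), the data literal can be pulled out with a regular expression. The sketch below assumes the page contains exactly the pattern shown in the question, var data = { ... };, and the function name extractDataLiteral is hypothetical. Because the literal uses unquoted keys, JSON.parse would reject it, so the sketch evaluates it with the Function constructor. That is eval in disguise and should only be used on content you trust.

```javascript
// Hypothetical helper: finds "var data = { ... };" in a page's text
// and returns the object it describes, or null if nothing matches.
function extractDataLiteral(html) {
  var match = /var\s+data\s*=\s*(\{[\s\S]*?\})\s*;/.exec(html);
  if (!match) {
    return null;
  }
  // Evaluates the literal as JavaScript. Only safe for trusted content.
  return new Function("return (" + match[1] + ");")();
}

var sample = '<html><body><div>blah blah blah</div><br/><br/>\n' +
  'var data = { date: "2009-03-15", data: "Some Data Here" };\n' +
  '</body></html>';
var data = extractDataLiteral(sample);
console.log(data.date); // prints "2009-03-15"
```

The non-greedy match stops at the first closing brace followed by a semicolon, so a literal containing nested objects would need a more careful parser.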
I understand that you might want to make a tool that used the client's internet connection to retrieve the html page (for security or legal reasons), so it is a legitimate hope.
If you do end up needing to do it server-side, then perhaps a simple php page that takes a url as a query and returns a json chunk containing the script in a string. That way if you do find you need to filter out certain websites, you need only do this in one place.
The inevitable problem is that some of the users will be hostile, and they then have a license to abuse what is effectively a javascript proxy. As a result, the safest option may be to do all the processing on the server, and not allow certain javascript function calls (eval, http requests, etc).
