Parse Greasemonkey metadata and/or grab comments from within a function

function blah(_x)
{
    console.info(_x.toSource().match(/\/\/\s*@version\s+(.*)\s*\n/i));
}
function foobar()
{
    // ==UserScript==
    // @version 1.2.3.4
    // ==/UserScript==
    blah(arguments.callee);
}
foobar();
Is there any way to do this using JavaScript? I want to detect the version number / other attributes in a Greasemonkey script but as I understand it, .toSource() and .toString() strip out comments [1].
I don't want to wrap the header block in <><![CDATA[ ]]></> if I can avoid it, and I want to avoid having to duplicate the header block outside of the comments if possible.
Is this possible? Are there alternatives to toSource() / .toString() that would make this possible?
[1] - http://isc.sans.edu/diary.html?storyid=3231

There is currently no really good way for a Greasemonkey script to know its own metadata (or its comments). That is why every "autoupdate" script (like this one) requires you to set extra variables so that the script knows its current version.
As aularon said, the only way to get the comments from a JS function is to parse the source HTML of the <script> tag or of the file.
However, there is a trick that might work for you. You can read in your own GM script as a resource and then parse that source.
For example:
Suppose your script was named MyTotallyKickassScript.user.js.
Now add a resource directive to your script's metadata block like so:
// @resource MeMyself MyTotallyKickassScript.user.js
Notice that there is no path information for the file. GM will use a relative path to copy the resource, one time, when the script is first installed.
Then you can access the script's code using GM_getResourceText(), like so:
var ThisFileSource = GM_getResourceText ("MeMyself");
//Optional for Firebug users: console.log (ThisFileSource);
You can parse ThisFileSource to get the comments you want.
A script that parses Greasemonkey metadata from a source file is here. You should be able to adapt it with little effort.
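As a rough illustration of that last step, the source text returned by GM_getResourceText() could be parsed along these lines. This is only a minimal sketch: the function name parseUserScriptMetadata is made up for the example, and it assumes the usual "// @key value" layout between the ==UserScript== markers.

```javascript
// Parse the ==UserScript== block out of a script's source text.
// Returns an object mapping each key to an array of values, because keys
// such as @include or @resource may appear more than once.
// (parseUserScriptMetadata is a hypothetical helper name, not a GM API.)
function parseUserScriptMetadata(source) {
  var block = /\/\/\s*==UserScript==([\s\S]*?)\/\/\s*==\/UserScript==/.exec(source);
  var meta = {};
  if (!block) return meta; // no metadata block found

  // One "// @key value" entry per line; the value may be empty.
  var lineRe = /^\s*\/\/\s*@(\S+)[ \t]*(.*?)\s*$/gm;
  var m;
  while ((m = lineRe.exec(block[1])) !== null) {
    (meta[m[1]] = meta[m[1]] || []).push(m[2]);
  }
  return meta;
}

// Inside the userscript this would be fed from the resource, e.g.:
//   var meta = parseUserScriptMetadata(GM_getResourceText("MeMyself"));
//   var version = meta.version && meta.version[0];
```

Returning arrays keeps repeated keys intact; a script that only cares about the version can simply read the first element.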

The JavaScript engine ignores comments. The only way to get at them is to string-process the <script> element's innerHTML or, if the script is an external file, to string-process the result of an AJAX request that fetches the .js file.

Related

Client-side javascript to extract patterns from online PDF document

I am trying to extract patterns from online PDFs using a client-side script (Tampermonkey / Greasemonkey, in Firefox or Chrome). The implementation can be browser-specific; I would like to get it working in either one.
I am able to use JS to extract the content and match on it manually in Firefox (which loads pdf.js automatically). E.g. on a PDF URL:
var matchList = document.body.innerText.match(/my_regex/gi);
I am now trying to port this into Greasemonkey for a user-script:
// ==UserScript==
// @name MyExtractor
// @version 1
// @grant none
// @include *.pdf
// ==/UserScript==
console.log("User script");
console.log(document.body.innerText); // this JS executed manually logs the PDF to text, but
alert("HI");
The script doesn't load - is it possible to get a Gm script to execute on a PDF url in Firefox?
In Chrome, the PDF document seems to be embedded, so even with direct console JS I can't seem to get access to the content, e.g.
> document.getElementsByTagName("embed")[0]
<embed name="some_id" style="position:absolute; left: 0; top: 0;" width="100%" height="100%" src="about:blank" type="application/pdf" internalid="some_id">
This is about as far as I have been able to get with Chrome - is there a way to get the PDF object based on the above element and extract text from it?
With regard to the JS, I do not necessarily need it to run directly on the PDF URL. I can also get it to identify a page that has a PDF anchor href on it, and then fetch and parse the PDF via a request, if there is some way to fetch it and process it with a PDF library.
References used so far:
Execute a Greasemonkey script on every page, regardless of page-type (like foo.com/image.jpg)? - do i need to build an extension for this?
Extract text from pdf file using javascript (and followed some of the links) - specifically, i have tried to follow this: How to extract text from PDF in JavaSript - but have not been able to create a reference to the PDF source / add the library to GM and execute as expected - is this a good path to follow and try solve the problems I am running into?

js "scripts" folder name prepended to XMLHttpRequest calls

I have a javascript file named myscripts.js in the "scripts" folder of my webserver. It could be accessed with this:
http://www.example.com/scripts/myscripts.js
Within myscripts.js is a javascript function which makes a XMLHttpRequest call to somemethod.html of my website. Here is the calling code:
xmlhttp.open("GET","somemethod.html",false);
99% of the time everything works fine. But I am finding some browsers are prepending "scripts/" to the call. So the result is a call like this:
http://www.example.com/scripts/somemethod.html
when it should be this:
http://www.example.com/somemethod.html
This is a custom built webserver (i.e. I basically handle ALL requests).
Should my webserver be able to handle this? Or is this just some fluky browser that I should not worry about?
Should I not be using "relative" paths in the JavaScript, and use absolute URLs instead? E.g. instead of "somemethod.html" it would be coded like this:
xmlhttp.open("GET","http://www.example.com/somemethod.html",false);

It's absolutely fine (and overwhelmingly the standard of practice) to use relative paths in the JavaScript, just be aware of what they're relative to: The document in which you've included the JavaScript (not the JavaScript file.) You seem clear on this, but just emphasizing.
I've never seen a browser get this wrong. It's possible the requests you're seeing are from a poorly-written web crawler looking at the source of the JavaScript rather than doing something intelligent like figuring out where/how it's run.
Just for clarity, though, about the relative thing (more for lurkers than for you):
Given this structure:
foo.html
index.html
js/
    script.js
In that structure, if you include script.js in index.html:
<script src="js/script.js"></script>
...then use code in that script file to do an XHR call, the call will be relative to index.html, not script.js, on a correctly-functioning browser.
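The difference between the two base URLs can be made concrete with the standard URL constructor. This is only an illustration of the resolution rule, using the example.com host from the question as a stand-in; it does not issue any request.

```javascript
// Resolve a relative request path against two different base URLs.
// A correctly functioning browser uses the document URL for XHR calls,
// not the URL of the script file that contains the code.
var documentUrl = "http://www.example.com/index.html";
var scriptUrl = "http://www.example.com/js/script.js";

var resolvedAgainstDocument = new URL("somemethod.html", documentUrl).href;
var resolvedAgainstScript = new URL("somemethod.html", scriptUrl).href;

console.log(resolvedAgainstDocument); // http://www.example.com/somemethod.html
console.log(resolvedAgainstScript);   // http://www.example.com/js/somemethod.html
```

The second result has the same shape as the stray requests described in the question, which fits the theory that whatever sent them resolved the path against the script file.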

I never use relative requests. I built my own URL-handling JS code to build up and pass URLs around, with a 'toString' method to give me exactly the URL I need.
Also, as an aside, try not to use synchronous XHR calls anymore. Ideally you should use async and callbacks; it's a pain, but it's for the best.
client . open(method, url [, async = true [, username = null [, password = null]]])
Sets the request method, request URL, and synchronous flag.
Throws a "SyntaxError" exception if either method is not a valid HTTP method or url cannot be parsed.
Throws a "SecurityError" exception if method is a case-insensitive match for `CONNECT`, `TRACE` or `TRACK`.
Throws an "InvalidAccessError" exception if async is false, the JavaScript global environment is a document environment, and either the timeout attribute is not zero, the withCredentials attribute is true, or the responseType attribute is not the empty string.
source: http://xhr.spec.whatwg.org/#the-open%28%29-method
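For comparison with the synchronous call in the question, an asynchronous version might look roughly like the sketch below. The helper name getAsync and its optional XhrCtor parameter are assumptions made for this example (the parameter only exists so the function can be exercised outside a browser); the XMLHttpRequest members used are open, onload, onerror, status, responseText and send.

```javascript
// Asynchronous GET: open() is called with async = true and the result is
// delivered to a callback instead of blocking the page.
// getAsync and XhrCtor are hypothetical names for this sketch.
function getAsync(url, onSuccess, onError, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  var xhr = new Ctor();
  xhr.open("GET", url, true); // third argument: async
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      onSuccess(xhr.responseText);
    } else {
      onError(new Error("HTTP " + xhr.status));
    }
  };
  xhr.onerror = function () {
    onError(new Error("Network error"));
  };
  xhr.send();
}

// In a page:
//   getAsync("somemethod.html",
//            function (text) { console.log(text); },
//            function (err) { console.error(err); });
```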

can firefox extension modify DOM of HTML document then save as HTML?

I am creating a firefox extension that lets the operator perform various actions that modify the content of the HTML document. The operator does not edit HTML, they take other actions and my extension modifies the document by inserting elements, adding attributes, and so forth.
When the operator is finished, they need to be able to save the HTML document as a file (or have my extension send it to an internet destination, but this is not required since they can email the saved file).
I thought maybe the changes made by the javascript code in my extension would be reflected in the HTML document, but when I ask the firefox browser to "view source" after making modifications, it displays the original HTML text.
My questions are:
#1: What is the easiest way for the operator to save the HTML document with all the changes my extension has made?
#2: What is the easiest way for the javascript code in my extension to process the HTML document contents and write to an HTML file on the local disk?
#3: Is any valid HTML content incapable of accurate representation in the saved file?
#4: Is the TreeWalker part of the solution (see below)?
A couple observations from my research so far:
I've read about the TreeWalker object, which seems to provide a fairly painless way for an extension to walk through everything (?or almost everything?) in the HTML document. But does it expose everything so everything in the original (and my modifications) can be saved without losing anything of importance?
Does the TreeWalker walk through the HTML document in the "correct order" --- the order necessary for my extension to generate the original and/or modified HTML document?
Anything obscure or tricky about these problems?

OK, so I am assuming here that you have access to the page DOM. What you need to do is basically make changes to the DOM, then get all the DOM code and save it as a file. Here is how you can download the page's HTML code. This will create an a tag which the user needs to click for the file to download.
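var a = document.createElement('a'), code = document.querySelectorAll('html')[0].innerHTML;
a.setAttribute('download', 'filename.html');
// Encode the markup so characters such as # and % do not corrupt the data URI.
a.setAttribute('href', 'data:text/html;charset=utf-8,' + encodeURIComponent(code));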
Now you can insert this a tag anywhere in the DOM and the file will download when the user clicks it.
Note: This is sort of a hack, this injects entire html of the file in the a tag, it should in theory work in any up to date browser (except, surprise, IE). There are more stable and less hacky ways of doing it like storing it in a file system API file and then downloading that file instead.
Edit: The document.querySelectorAll line accesses the page DOM. For it to work the document must be accessible. You say you are modifying DOM so that should already be there. Make sure you are adding the code on the page and not your extension code. This code will be at the same place as your DOM modification code, not your extension pages that can't access the DOM.
And as for the a tag, it will be inserted in the page. I skipped the steps since I assumed you already know how to manipulate DOM and also because I don't know where you would like to add the link. And you can skip the user action of clicking the link too, but it's a hack and only works in modern browsers. You can insert the a tag somewhere in the original page where user won't see it and then call the a.click() function to simulate a click event on the link. But this is not a legit way and I personally only use it on my practice projects to call click event listeners.
I can only test this on Chrome, not on Firefox, but try this code; it does not even require you to add the a link to the DOM. You need to add it next to the DOM manipulation code. This will work if luck is on your side :)
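var a = document.createElement('a'), code = document.querySelectorAll('html')[0].innerHTML;
a.setAttribute('download', 'filename.html');
// Encode the markup so characters such as # and % do not corrupt the data URI.
a.setAttribute('href', 'data:text/html;charset=utf-8,' + encodeURIComponent(code));
a.click();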

There is no easy way to do this with the web API only, at least when you want a result that does not omit stuff like the doctype or comments. You could still write a serializer yourself that goes through document.childNodes and serializes each node according to its type (Element.outerHTML, Comment.data and so on).
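A hand-written serializer of that kind could be sketched as follows. The function names are made up for the example, only the three node types most likely to appear at the top level of an HTML document are handled, and joining the parts with newlines is a simplification that may not match the original whitespace.

```javascript
// Serialize every top-level node of a document, including the doctype and
// comments that the root element's outerHTML alone would omit.
// (serializeTopLevelNodes / serializeDoctype are hypothetical helper names.)
function serializeTopLevelNodes(doc) {
  var parts = [];
  for (var i = 0; i < doc.childNodes.length; i++) {
    var node = doc.childNodes[i];
    switch (node.nodeType) {
      case 1: // Node.ELEMENT_NODE
        parts.push(node.outerHTML);
        break;
      case 8: // Node.COMMENT_NODE
        parts.push("<!--" + node.data + "-->");
        break;
      case 10: // Node.DOCUMENT_TYPE_NODE
        parts.push(serializeDoctype(node));
        break;
      // Other node types are ignored in this sketch.
    }
  }
  return parts.join("\n");
}

// Rebuild the <!DOCTYPE ...> declaration from a DocumentType node.
function serializeDoctype(doctype) {
  var text = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    text += ' PUBLIC "' + doctype.publicId + '"';
    if (doctype.systemId) text += ' "' + doctype.systemId + '"';
  } else if (doctype.systemId) {
    text += ' SYSTEM "' + doctype.systemId + '"';
  }
  return text + ">";
}

// In a page: var html = serializeTopLevelNodes(document);
```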
Luckily, you're writing a Firefox add-on, so you have access to a lot more (powerful) stuff.
While still not 100% perfect, the nsIDocumentEncoder implementations will produce pretty decent results, that should only differ in some whitespace and explicit charset declaration at most (everything else is a bug).
Here is an example on how one might use this component:
function serializeDocument(document) {
    const {
        classes: Cc,
        interfaces: Ci,
        utils: Cu
    } = Components;
    let encoder = Cc['@mozilla.org/layout/documentEncoder;1?type=text/html'].createInstance(Ci.nsIDocumentEncoder);
    encoder.init(document, 'text/html', Ci.nsIDocumentEncoder.OutputLFLineBreak | Ci.nsIDocumentEncoder.OutputRaw);
    encoder.setCharset("utf-8");
    return encoder.encodeToString();
}
If you're writing an SDK add-on, stuff gets more complicated as the SDK abstracts some important stuff away. You'll need to go through the chrome module, and also figure out the active window and tab yourself. Something like Services.wm.getMostRecentWindow("navigator:browser").content.document (Services.jsm) should do the trick.
In XUL overlay add-ons, content.document should suffice to get the document of the currently active tab, and you have Components access already.
Still, you need to let the user choose a file destination, usually through nsIFilePicker and then actually write the file, by using something like a file stream or the fully async OS.File API.

Looks like I get to answer my own question, thanks to someone in mozilla #extdev IRC.
I got totally faked out by "view source". When I didn't see my modifications in the window displayed by "view source", I assumed the browser would not provide the information.
However, guess what? When I "file" ===>> "save page as...", then examine the page contents with a plain text editor... sure enough, that contained the modifications made by my firefox extension! Surprise!

A browser has no direct write access to the local filesystem. The only read access it has is when it is explicitly given a file:// URL (see note 1 below).
In your case, we are explicitly talking about javascript - which can read and write cookies and local storage. It can also send stuff back to the server and retrieve it, e.g. using AJAX.
Stuff you put in local storage/cookies is effectively not accessible to other programs (such as email clients).
It is possible to create very long mailto: URLs (see note 2), but that only handles inline content in the email, and you're going to run into all sorts of encoding issues that you're not ready to deal with.
Hence I'd recommend pursuing storage serverside via AJAX - and look at local storage once you've got this sorted/working.
Note 1: This is not strictly true. Trusted, signed JavaScript has access to additional functions, which may include direct file access.
Note 2: The limit depends on the browser and the email client (Lotus Notes truncates the content rather a lot).

Call a function in one Javascript file from another Javascript file?

I need to call a function in an external ".js" file from another ".js" file, without referencing the external file in the <head> tag.
I know that it is possible to dynamically add an external ".js" file to the <head>, which allows access to that file. I can do that like so...
var AppFile = "test/testApp_1.js";
var NewScript=document.createElement('script');
var headID = document.getElementsByTagName("head")[0];
NewScript.src = AppFile;
headID.appendChild(NewScript);
However...
this is no use to me as the external files need to be stand-alone files that run start-up procedures on...
$(document).ready(function()
{...}
so adding the full file dynamically has an unwanted effect. Also, I cannot pre-reference the external file in the <head> tag as it needs to be dynamic.
So, this external file "test/testApp_1.js" contains a function that returns a string variable...
function setAppLogo(){
var LogoFile = "test/TestApp_1_Logo.png";
return LogoFile;
}
I need access to either this function, or I could store the string as a global var in the external file... either way is fine, I just need access to the value in LogoFile without loading the whole external file.
This one has had me stumped for a few hours now so any ideas would be greatly appreciated.

You might benefit from having some sort of app.js file that contains global variables/values that you will want to use from lots of places. You should include this .js file on every page (and maybe minify it/concatenate it with other js if you want to be clever and improve performance). Generally these globals should be attached to some object you create such as var APPNAME = { }; with variables/functions on it that will be used from many places.
Once you have this, then the external '.js' file that you want to load, and the one you are currently in, can both access the global APPNAME variable and all its attributes/functions and use them as desired. This may be a better approach for making your javascript more modular and separatable. Hope this helps.
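A bare-bones version of that idea might look like the sketch below. APPNAME comes from the answer; the registerLogo / getLogo helpers and the "testApp_1" id are invented for the example, and the external file still has to be loaded for its registration line to run.

```javascript
// app.js: included on every page before any other script.
// Reuse the object if another file already created it.
var APPNAME = APPNAME || {};

// Hypothetical registry: each app file stores its values here instead of
// exposing loose global functions.
APPNAME.logos = APPNAME.logos || {};

APPNAME.registerLogo = function (appId, logoFile) {
  APPNAME.logos[appId] = logoFile;
};

APPNAME.getLogo = function (appId) {
  return APPNAME.logos[appId];
};

// In test/testApp_1.js (or any other file loaded after app.js):
APPNAME.registerLogo("testApp_1", "test/TestApp_1_Logo.png");

// In the calling file:
console.log(APPNAME.getLogo("testApp_1"));
```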

You want to load the file once jQuery has loaded using ajax, and then run the related script in the successful ajax function.
See jQuery's getScript function: http://api.jquery.com/jQuery.getScript/
$(document).ready(function(){
$.getScript("http://domain.com/ajax/test.js", function(data, textStatus, jqxhr) {
console.log(data); //data returned
console.log(textStatus); //success
console.log(jqxhr.status); //200
console.log('Load was performed.');
//run your second script executable code here
});
});

It is possible to load the whole script through XHR (e.g. $.get in jQuery) and then parse it, perhaps using a regular expression, to extract the needed part:
$.get('pathtoscript.js', function(scriptBody) {
var regex = /function\s+setUpLogo\(\)\s*\{[^}]+}/g;
alert(scriptBody.match(regex)[0]); // supposed to output a function called
// 'setUpLogo' from the script, if the
// function does not have {} blocks inside
});
Nevertheless, note that such an approach is highly likely to cause maintenance problems. Regular expressions are not the best tool for parsing JavaScript code; the example above, for instance, will not parse functions with nested {} blocks, which may well exist in the code in question.
It might be recommended to find a server-side solution to the problem, e.g. adding necessary script path or its part before the page is sent to browser.
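If the client-side route is taken anyway, the nested-braces limitation can be worked around by counting braces instead of relying on a single regular expression. The following is only a sketch under stated assumptions: extractFunctionSource is a made-up helper, the name passed in must be a plain identifier without regex metacharacters, and braces inside strings, regex literals or comments will still confuse it.

```javascript
// Extract the source text of a named function declaration by counting braces,
// so nested {} blocks are handled. Braces inside strings, regex literals or
// comments are NOT handled; this is a sketch, not a JavaScript parser.
function extractFunctionSource(scriptBody, name) {
  var header = new RegExp("function\\s+" + name + "\\s*\\([^)]*\\)\\s*\\{");
  var match = header.exec(scriptBody);
  if (!match) return null;

  var depth = 1; // the opening brace matched by the header
  var i = match.index + match[0].length;
  while (i < scriptBody.length && depth > 0) {
    var ch = scriptBody.charAt(i);
    if (ch === "{") depth++;
    else if (ch === "}") depth--;
    i++;
  }
  // depth is 0 only if the closing brace of the function was found.
  return depth === 0 ? scriptBody.slice(match.index, i) : null;
}

// With jQuery, as in the example above:
//   $.get('pathtoscript.js', function (scriptBody) {
//     alert(extractFunctionSource(scriptBody, 'setAppLogo'));
//   }, 'text');
```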

I'm not sure this is a good idea but you can create an iframe and eval the file inside its 'window' object to avoid most of the undesired side effects (assuming it does not try to access its parent). Then you can access whatever function/variable you want via the iframe's window object.
Example:
function loadSomeJsInAFrame(url,cb) {
    jQuery.get(url,function(res) {
        // Declare iframe locally so it does not leak into the global scope.
        var iframe = jQuery('<iframe></iframe>').hide().appendTo(document.body);
        iframe[0].contentWindow.eval(res);
        if(cb) cb(iframe[0].contentWindow);
    },'text');
}
loadSomeJsInAFrame('test/testApp_1.js',function(frameWindow) {
console.log(frameWindow.setAppLogo());
jQuery(frameWindow.frameElement).remove();
});
This does not guarantee that the script in the file cannot mess with your document, but that is unlikely if it comes from a trusted source.
Also, don't forget to remove your iframe after you get what you need from it.

OK, thanks everybody for all the input, but I think that what I was trying to do is currently not possible, i.e. accessing a function from another file without loading that file.
I have, however, found a solution to my problem. I now query my server for a list of apps that are available, and I then use this list to dynamically build the apps in a UI. When an app is selected, I can then call that file and the functions within. It's a bit more complex, but it's dynamic, has good performance and it works. Thanks again for the brainstorming! ;)

It may be possible with the help of Web Workers. You would be able to run your script you've wanted to inject in kinda isolated environment, so it won't mess up your current page.
As you said, it is possible for setAppLogo to be global within "test/testApp_1.js", so I will rely on this statement.
In your original script you should create a worker that references a worker script file, and listen for messages coming from the worker:
var worker = new Worker('worker.js');
worker.onmessage = function (e) {
// ....
};
Then, in the worker (worker.js), you can use the special function importScripts, which loads external scripts into the worker; the worker can also see the global variables of these scripts. There is also a postMessage function available in the worker to send custom messages back to the original script, which in turn is listening for these messages (worker.onmessage). Code for worker.js:
importScripts('test/testApp_1.js');
// "setAppLogo" is now available to worker as it is global in 'test/testApp_1.js'
// use Worker API to send message back to original script
postMessage(setAppLogo());
When it runs, you'll get the result of setAppLogo in your listener:
worker.onmessage = function (e) {
console.log(e.data); // "test/TestApp_1_Logo.png"
};
This example is very basic though, you should read more about Web Workers API and possible pitfalls.

dynamically create greasemonkey script

I'm trying to create a dynamic GM script. Here's what I thought would do it
win = window.open('myScript.user.js');
win.document.writeln('// ==UserScript==');
win.document.writeln('// @name sample script');
win.document.writeln('// @description alerts hi');
win.document.writeln('// @include http://www.google.com/*');
win.document.writeln('// ==/UserScript==');
win.document.writeln('');
win.document.writeln('(function(){alert("hi");})()');
win.document.close();
Well it doesn't. Anyone have any ideas how to go about doing this?

You cannot dynamically create Greasemonkey scripts with Greasemonkey (alone).
A GM script is not part of the HTML page, so writing GM code to a page will never work. The script needs to be installed into GM's script management system.
A GM script cannot write to the file system, nor access sufficient browser chrome to install a script add-on.
You might be able to write a GM script that posts other scripts to a server, and then sends the browser to that server. GM would then prompt the user to install the new script.
You might be able to write a browser add-on that could write GM scripts, but I suspect that this approach will be difficult.
You probably could write a Python (or C, VB, etc.) program that generates GM scripts for installation. With extra work, such a program could probably automatically install the script, too.
Why do you want to dynamically create Greasemonkey scripts, anyway? There may be a simpler method to accomplish the true goal.
Update for OP comment/clarification:
Re: "I want to be able to have a user select an element to get blocked and then create a script that sets that element's display to none on all sites from that domain"...
One way to do that:
Store domain and selector pairs using GM_setValue().
The script would, first thing, check to see if it had a value stored for the current page's domain or URL (using GM_getValue() or GM_listValues()).
If a match was found, hide the element(s) as specified in the selector.
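The three steps above could be sketched roughly as follows. The storage key, the helper names and the choice to keep everything in one JSON string are assumptions made for this example; GM_getValue and GM_setValue are the Greasemonkey functions named above and need matching @grant lines in the metadata block.

```javascript
// Requires "@grant GM_getValue" and "@grant GM_setValue" in the metadata block.
var STORAGE_KEY = "blockedSelectors"; // hypothetical key name

// Load the { domain: [selector, ...] } map, stored as a JSON string.
function loadBlockList() {
  return JSON.parse(GM_getValue(STORAGE_KEY, "{}"));
}

// Remember that `selector` should be hidden on `domain`.
function addBlockedSelector(domain, selector) {
  var list = loadBlockList();
  var selectors = list[domain] || (list[domain] = []);
  if (selectors.indexOf(selector) === -1) selectors.push(selector);
  GM_setValue(STORAGE_KEY, JSON.stringify(list));
}

// Hide every stored element for the given domain in the given document.
function hideBlockedElements(doc, domain) {
  var selectors = loadBlockList()[domain] || [];
  selectors.forEach(function (selector) {
    Array.prototype.forEach.call(doc.querySelectorAll(selector), function (el) {
      el.style.display = "none";
    });
  });
}

// In the userscript itself, first thing:
//   hideBlockedElements(document, location.hostname);
```

Keeping one JSON value avoids having to enumerate keys, at the cost of rewriting the whole map on every change; storing one value per domain and using GM_listValues() would work as well.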
Note that, depending on the element, the excellent Adblock Plus extension may be able to block the element much more elegantly (saves bandwidth/DL-time too).
