dynamically create greasemonkey script - javascript

I'm trying to create a dynamic GM script. Here's what I thought would do it
win = window.open('myScript.user.js');
win.document.writeln('// ==UserScript==');
win.document.writeln('// #name sample script');
win.document.writeln('// #description alerts hi');
win.document.writeln('// #include http://www.google.com/*');
win.document.writeln('// ==/UserScript==');
win.document.writeln('');
win.document.writeln('(function(){alert("hi");})()');
win.document.close();
Well it doesn't. Anyone have any ideas how to go about doing this?

You cannot dynamically create Greasemonkey scripts with Greasemonkey (alone).
A GM script is not part of the HTML page, so writing GM code to a page will never work. The script needs to be installed into GM's script management system.
A GM script cannot write to the file system, nor access sufficient browser chrome to install a script add-on.
You might be able to write a GM script that posts other scripts to a server, and then sends the browser to that server. GM would then prompt the user to install the new script.
You might be able to write a browser add-on that could write GM scripts, but I suspect that this approach will be difficult.
You probably could write a Python (or C, VB, etc.) program that generates GM scripts for installation. With extra work, such a program could probably automatically install the script, too.
Why do you want to dynamically create Greasemonkey scripts, anyway? There may be a simpler method to accomplish the true goal.?.
Update for OP comment/clarification:
Re: "I want to be able to have a user select an element to get blocked and then create a script that sets that element's display to none on all sites from that domain"...
One way to do that:
Store domain and selector pairs using GM_setValue().
The script would, first thing, check to see if it had a value stored for the current page's domain or URL (using GM_getValue() or GM_listValues()).
If a match was found, hide the element(s) as specified in the selector.
Note that, depending on the element, the excellent Adblock Plus extension may be able to block the element much more elegantly (saves bandwidth/DL-time too).

Related

Client-side javascript to extract patterns from online PDF document

I am trying to extract patterns from online PDFs using a client side script (tampermonkey / greasemonkey - Firefox or Chrome). The implementation can be browser specific, would like to try get it working in either 1.
I am able to use JS to extract the content and match on it manually in Firefox (which loads pdf.js automatically). E.g. on a PDF URL:
var matchList = document.body.innerText.match(/my_regex/gi);
I am now trying to port this into Greasemonkey for a user-script:
// ==UserScript==
// #name MyExtractor
// #version 1
// #grant none
// #include *.pdf
// ==/UserScript==
console.log("User script");
console.log(document.body.innerText); // this JS executed manually logs the PDF to text, but
alert("HI");
The script doesn't load - is it possible to get a Gm script to execute on a PDF url in Firefox?
In Chrome, the PDF document seems to be embedded - so even with direct console JS, i can't seem to get access to the content. e.g.
> document.getElementsByTagName("embed")[0]
<embed name=​"some_id" style=​"position:​absolute;​ left:​ 0;​ top:​ 0;​" width=​"100%" height=​"100%" src=​"about:​blank" type=​"application/​pdf" internalid=​"some_id">​
This is about as far as I have been able to get with Chrome - is there a way to get the PDF object based on the above element and extract text from it?
With regards to the JS, i do not necessarily need to have it run directly on the PDF url, I can also get it to identify a page that has a PDF anchor href on it, and then fetch and parse it based on a request if possible - if there is a way to fetch and process with a PDf library some how?
References used so far:
Execute a Greasemonkey script on every page, regardless of page-type (like foo.com/image.jpg)? - do i need to build an extension for this?
Extract text from pdf file using javascript (and followed some of the links) - specifically, i have tried to follow this: How to extract text from PDF in JavaSript - but have not been able to create a reference to the PDF source / add the library to GM and execute as expected - is this a good path to follow and try solve the problems I am running into?

How the get the whole source html code of a webpage that is generated by Javascript using Java / Webdriver?

I am a newbie in programming and I have a task here I need to solve. I am trying to get the html source code of a webpage using Java / Webdriver method getPageSource(). Problem is, that page is somehow generated, probably by javascript, so the result I get is html code containing just page skeleton - a table that is empty, not filled by data. But, there is tag like <script type="text/javascript" src="/x/js/main.c0e805a3.js"></script> in the very bottom of that html code.
The question is, how can I force Webdriver to run that Javascript and give me the result - the whole source html with data. I already tried to use this (js.executeScript("window.location = '/x/js/main.c0e805a3.js'");) before calling getPageSource() but not successful.
Any help will be appreciated, thanks!
There are quite a few setups, now, that can run the Java-Script on a web-page. The most well known, I think, is likely Selenium since I think it has been around for a while. Others include karate, Puppeteer, and even an old tool called Rhino. Puppeteer is a Google, Inc. project that uses Java-Script (server-side Java-Script, called Node.js. They don't like us comparing, contrasting libraries here.
I haven't had the time to engage Selenium, yet, but I write HTML parser, search and update code all the time. If your only goal is to load a page whose contents are dynamically "filled in by AJAX calls" - and what I mean by that, you only want the contents of an HTML that would normally see when you visit the sites web-page, and you are not concerned with button presses then the one I have been using for that is called Splash This tool does have the ability to let you invoke Java-Script, but if all you want to do is see the JS on a page dynamically load the table, then, literally, all you have to do is start-the tool, and add one line to your program.
On Google Cloud Platform, these 2 lines will start a Splash Proxy Server. If you are writing your code on AWS (Amazon) or Azure (Microsoft), it would likely be similar. If you are running your code in an office on the local machine, you would have to research how to start it.
Install Docker. Make sure Docker version >= 17 is installed.
Pull the image:
$ sudo docker pull scrapinghub/splash
Start the container:
$ sudo docker run -it -p 8050:8050 --rm scrapinghub/splash
Then, in your code, all you have to do is the following:
// If your original code looked like this:
URL url = new URL("https://en.wikipedia.org/wiki/Christopher_Columbus");
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setRequestMethod("GET");
con.setRequestProperty("User-Agent", USER_AGENT);
return new BufferedReader(new InputStreamReader(con.getInputStream()));
Change the first line of code in this example to this, and (theoretically), and dynamically loaded HTML tables that are completed with the onload page events will be automatically loaded before returning the HTML page.
// Add this line to your methods
String splashProxy = "http://localhost:8050/render.html?url=";
URL url = new URL(splashProxy + "https://en.wikipedia.org/wiki/Christopher_Columbus");
For most web-sites, any initial tables that are filled by JS/jQuery/AJAX will be filled in. If you are willing to learn teh Lua Programming Language, you can also start invoking the methods there. It has been pretty convenient for my purposes, since I am not writing web-page testing code (code that simulates user button presses). If that is what you are doing, Selenium is likely worth spending time learning / studying the A.P.I.

Reaching and Pulling Environment Variables in TamperMonkey

Currently my Office is running a AHK script to pull environment variables. These Env Variables are then used as a specific outputted data when closing tickets as my Office has a ticket closing environment. This works for the time being however I am looking into automating this process and starting off just trying to auto close the tickets when a specific key is pressed. I have been able to perform this task but I have to basically have static variables in the TamperMonkey script for each user. Everyone using this ticket site has the specific environment variables already due to the AHK script and want to try and implement this into the Tampermonkey script without having to change the site completely.
I have locally hosted the site and used Node to do this and I am successful in doing this but it does not work on the Tampermonkey route. I have been using process.env.ENV_VARIABLE on the node side but I am trying to refrain from completely implementing this on the site itself. I have added some basic variable examples in a Autohotkey Script already being used.
GetGreeting() {
global greeting
return greeting
}
GetSalutation() {
global salutation
return salutation
}
GetUserName() {
Envget, e_Ticketname, Ticketuser
return e_Ticketname
}
When a specific Key is pressed it should write the specific message and include said specific Env Variables. Currently I don't think I have it where Tampermonkey can actually understand the Environment Variables as it keeps giving a undefined error. Any Ideas.
So upon further investigation it does not appear to be a way to interact with the OS inside the browser. I will be looking into another way to do what I am looking for. Thank you!
You are able to access and run local files as JavaScript code in Tampermonkey using \\ #require
So if you're able to have a local file with the content in this format:
variables = {
var1: "hello there"
}
Then in the script, add this line and add the path to the file.
// #require file://Path\to\file
Since all the file has is an assignment to a variable, then you can access that in the script
console.log(variables.var1) // logs "hello there"
For this you need to give the extension access to file URLs:
go to chrome://extensions
Tampermonkey > Details
Allow access to file URLs
You'll still need a way to generate the file in that format in the user machine though, either manually, or some code running locally.
A way to generate the file, could be using Node locally, but if you're running that locally, at that point another way to get local data is to serve it using a simple http server (like Node server), then you could make a request from Tampermonkey using fetch or GM_xmlhttpRequest
As a side note, a hack I use to run code locally triggered with Tampermonkey, is to use the localexplorer extension. The extension allows you to open files and folder from the browser using "localexplorer://" urls, so then with javascript you can do window.open(local_url) and it will run or open that file/folder. The file can also be a .bat file, and you can run anything from it (including Node code).
There are some security considerations for using this though if you're worried other websites might be able to open files in your system. but the extension prompts you every time you try to open something with localexplorer
If you're still interested on this, a way I use for it to work without the prompt with less risk is this:
The prompt also lets you click the checkbox of Always open links of this type in the associated app for each domain. So then what you can do, is have a specific domain you choose for this, to always use that domain to open localexplorer links, and use a format of your choosing, like secretdomain.com/?C:\\path\\to\\file, and grant access to always open the links on that domain. Then use Tampermonkey to run some code on that domain so that when it detects that specific url format, to redirect the page to a localexplorer url, like this
location.href.replace(/htt.*:\/\/secretdomain.com\/\?/,'localexplorer:')

How can i href a local file using tampermonkey?

I want to replace a site's CSS file URL through href, i know it's possible to reference external links containing css but what about a local file instead?
document.querySelector("head > link:nth-child(7)").href = "http://example.com/style.css"
One option would be for the local file to be a script, perhaps one that assigns the desired CSS text to a window property, which can then be retrieved inside your userscript. For example:
// ==UserScript==
// #name local
// #match https://example.com
// #require file:///C:/local.js
// ==/UserScript==
document.head.appendChild(document.createElement('style'))
.textContent = window.cssTextFromLocal;
and
// local.js
window.cssTextFromLocal = `
body {
background-color: green;
}
`;
Make sure to permit local file access.
Of course, if you want to do this unconditionally (on every page load, regardless), there's no need for inter-script communication, and you can insert the <style> into the DOM inside local.js.
Like all Chrome extensions, Tampermonkey by default is subject to a number of restrictions designed to prevent developers from acting maliciously. If Tampermonkey had unfettered access to your filesystem, userscript authors could easily abuse the permission to steal or modify your data without your knowledge.
The closest thing you could do would be to host a server on your local machine (e.g. with Node.js). This would allow you to provide Tampermonkey with a URL like localhost/style.css which would be served locally from your computer. This would only work when you had the server running, though.
Alternatively, you could create your own developer-mode Chrome extension which contains the stylesheet within its web_accessible_resources and injects it based on a DeclarativeContent rule. This would be bypassing Tampermonkey entirely, though, so it doesn't exactly answer the question.

can firefox extension modify DOM of HTML document then save as HTML?

I am creating a firefox extension that lets the operator perform various actions that modify the content of the HTML document. The operator does not edit HTML, they take other actions and my extension modifies the document by inserting elements, adding attributes, and so forth.
When the operator is finished, they need to be able to save the HTML document as a file (or have my extension send it to an internet destination, but this is not required since they can email the saved file).
I thought maybe the changes made by the javascript code in my extension would be reflected in the HTML document, but when I ask the firefox browser to "view source" after making modifications, it displays the original HTML text.
My questions are:
#1: What is the easiest way for the operator to save the HTML document with all the changes my extension has made?
#2: What is the easiest way for the javascript code in my extension to process the HTML document contents and write to an HTML file on the local disk?
#3: Is any valid HTML content incapable of accurate representation in the saved file?
#4: Is the TreeWalker part of the solution (see below)?
A couple observations from my research so far:
I've read about the TreeWalker object, which seems to provide a fairly painless way for an extension to walk through everything (?or almost everything?) in the HTML document. But does it expose everything so everything in the original (and my modifications) can be saved without losing anything of importance?
Does the TreeWalker walk through the HTML document in the "correct order" --- the order necessary for my extension to generate the original and/or modified HTML document?
Anything obscure or tricky about these problems?
Ok so I am assuming here you have access to page DOM. What you need to do it basically make changes to the dom and then get all the dom code and save it as a file. Here is how you can download the page's html code. This will create an a tag which the user needs to click for the file to download.
var a = document.createElement('a'), code = document.querySelectorAll('html')[0].innerHTML;
a.setAttribute('download', 'filename.html');
a.setAttribute('href', 'data:text/html,' + code);
Now you can insert this a tag anywhere in the DOM and the file will download when the user clicks it.
Note: This is sort of a hack, this injects entire html of the file in the a tag, it should in theory work in any up to date browser (except, surprise, IE). There are more stable and less hacky ways of doing it like storing it in a file system API file and then downloading that file instead.
Edit: The document.querySelectorAll line accesses the page DOM. For it to work the document must be accessible. You say you are modifying DOM so that should already be there. Make sure you are adding the code on the page and not your extension code. This code will be at the same place as your DOM modification code, not your extension pages that can't access the DOM.
And as for the a tag, it will be inserted in the page. I skipped the steps since I assumed you already know how to manipulate DOM and also because I don't know where you would like to add the link. And you can skip the user action of clicking the link too, but it's a hack and only works in modern browsers. You can insert the a tag somewhere in the original page where user won't see it and then call the a.click() function to simulate a click event on the link. But this is not a legit way and I personally only use it on my practice projects to call click event listeners.
I can only test this on chrome not on FF but try this code, this will not require you to even add the a link to DOM. You need to add this next to the DOM manipulation code. This will work if luck is on your side :)
var a = document.createElement('a'), code = document.querySelectorAll('html')[0].innerHTML;
a.setAttribute('download', 'filename.html');
a.setAttribute('href', 'data:text/html,' + code);
a.click();
There is no easy way to do this with the web API only, at least when you want a result that does not omit stuff like the doctype or comments. You could still write a serializer yourself that goes through document.childNodes and serialized according to the node type (Element.outerHTML, Comment.data and so on).
Luckily, you're writing a Firefox add-on, so you have access to a lot more (powerful) stuff.
While still not 100% perfect, the nsIDocumentEncoder implementations will produce pretty decent results, that should only differ in some whitespace and explicit charset declaration at most (everything else is a bug).
Here is an example on how one might use this component:
function serializeDocument(document) {
const {
classes: Cc,
interfaces: Ci,
utils: Cu
} = Components;
let encoder = Cc['#mozilla.org/layout/documentEncoder;1?type=text/html'].createInstance(Ci.nsIDocumentEncoder);
encoder.init(document, 'text/html', Ci.nsIDocumentEncoder.OutputLFLineBreak | Ci.nsIDocumentEncoder.OutputRaw);
encoder.setCharset("utf-8");
return encoder.encodeToString();
}
If you're writing an SDK add-on, stuff gets more complicated as the SDK abstracts some important stuff away. You'll need to go through the chrome module, and also figure out the active window and tab yourself. Something like Services.wm.getMostRecentWindow("navigator:browser").content.document (Services.jsm) should do the trick.
In XUL overlay add-ons, content.document should suffice to get the document of the currently active tab, and you have Components access already.
Still, you need to let the user choose a file destination, usually through nsIFilePicker and then actually write the file, by using something like a file stream or the fully async OS.File API.
Looks like I get to answer my own question, thanks to someone in mozilla #extdev IRC.
I got totally faked out by "view source". When I didn't see my modifications in the window displayed by "view source", I assumed the browser would not provide the information.
However, guess what? When I "file" ===>> "save page as...", then examine the page contents with a plain text editor... sure enough, that contained the modifications made by my firefox extension! Surprise!
A browser has no direct write access to the local filesystem. The only read access it has is when explicitly provide a file:// URL (see note 1 below)
In your case, we are explicitly talking about javascript - which can read and write cookies and local storage. It can also send stuff back to the server and retrieve it, e.g. using AJAX.
Stuff you put in local storage/cookies is effectively not accessible to other programs (such as email clients).
It is possible to create very long mailto: URLs (see note 2) but only handles inline content in the email and you're going to run into all sorts of encoding issues that you're not ready to deal with.
Hence I'd recommend pursuing storage serverside via AJAX - and look at local storage once you've got this sorted/working.
Note 1: this is not strictly true. a trusted, signed javascript has access to additional functions which may include direct file access.
Note 2: (the limit depends on the browser and the email client - Lotus Notes truncaets the content rather a lot)

Categories

Resources