chrome.webRequest.onBeforeRequest.addListener() URLs list not functioning as expected - javascript

I am having some issues with an extension that I am trying to make. The extension's job is to check URLs before the request is made and block the ones that appear in the provided list. The manifest.json file loads flagged.js, which declares and initializes the array, and then loads the file containing the function that checks URLs:
// To detect and filter globally flagged websites
chrome.webRequest.onBeforeRequest.addListener(
    // What to do when a flagged website is detected
    function(info) {
        return {redirectUrl: 'chrome-extension://' +
            chrome.i18n.getMessage('@@extension_id') +
            '/blocked.html?flag_type=globally&url=' + escape(info.url)};
    },
    // Websites to watch out for
    {
        urls: flagged,
        types: []
    },
    // Action to be performed
    ["blocking"]);
My issue starts with the urls: flagged part; it seems the list must be hard-wired in order to be accepted. What I mean is that I must specify the list items like this:
var flagged = [
"*://*.domain_1.com/*",
"*://*.domain_2.com/*",
"*://*.domain_3.com/*",
"*://*.domain_4.com/*",
];
If I attempt to go to any of the above domains, the extension redirects them to the blocked.html page without a problem. However, I want that list to be updated automatically or periodically, so I introduced this part:
// Get the list from the text file we have stored
function getFlagged(url) {
    var list = [];
    var file = new XMLHttpRequest();
    file.open("GET", url, true);
    file.onreadystatechange = function() {
        if (file.readyState === 4) {
            if (file.status === 200) {
                list = file.responseText.split('|');
                for (var i = 0; i < list.length; i++)
                    flagged.push(list[i]);
            }
        }
    };
    file.send(null);
}
I have tried changing and adding things that may not make sense or could be done differently; I was just desperate. The list does get populated with the new items, say *domain_5* and *domain_6* for example. This should happen as soon as the extension loads, or so I think. However, when I try to access *domain_5* or *domain_6*, the extension does NOT block them, despite the fact that they are in the flagged list.
Any help with this issue would be very appreciated!
Thank you
EDIT: I am not an expert on JS or the Chrome APIs; this is my first attempt at a Chrome extension.

Yeah, your listener is registering before the XMLHttpRequest has finished. The urls filter is read once, when addListener is called, so items pushed into flagged afterwards are never seen by the filter.
Register the listener inside your onreadystatechange handler and you'll get it to work.
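A minimal sketch of that fix, reusing the flagged array and the addListener call from the question (registerBlocker is a hypothetical wrapper around it):

// Fetch the list first, then register the blocker once the list has arrived.
function getFlagged(url) {
    var file = new XMLHttpRequest();
    file.open("GET", url, true);
    file.onreadystatechange = function() {
        if (file.readyState === 4 && file.status === 200) {
            flagged = flagged.concat(file.responseText.split('|'));
            registerBlocker(); // the filter now sees the complete list
        }
    };
    file.send(null);
}

// Hypothetical wrapper around the question's addListener call.
function registerBlocker() {
    chrome.webRequest.onBeforeRequest.addListener(
        function(info) {
            return {redirectUrl: 'chrome-extension://' +
                chrome.i18n.getMessage('@@extension_id') +
                '/blocked.html?flag_type=globally&url=' + escape(info.url)};
        },
        {urls: flagged},
        ["blocking"]);
}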

Related

Whitelist / Blocking with chrome.webRequest

So I've been trying to add a whitelist ability to a Chrome extension, where that list can be updated, or the whitelist can be disabled entirely.
After checking the documentation and all possible answers here, I am still facing one weird issue; any light on it would be appreciated.
So here is the code snippet:
// This list can be updated by the user or another trigger
var allowed = ["example.com", "cnn.com", "domain.com"];

// This is the main callback, so it can be removed from the listener when needed
whiteMode = function (details) {
    // Check the URL against the array
    var even = function(element) {
        return details.url.indexOf(element) == -1;
    };
    if (allowed.some(even) == true) {
        return {cancel: true};
    } else {
        return {cancel: false};
    }
};

// Set up the listener
chrome.webRequest.onBeforeRequest.addListener(
    whiteMode,
    {urls: ["<all_urls>"]},
    ["blocking"]
);
That works fine by itself; if the user wants to disable the mode, I just call
chrome.webRequest.onBeforeRequest.removeListener(whiteMode);
Then, if I would like to update the allowed list, I first call removeListener, then register the listener again with the new values. It does launch; however, the whiteMode function now keeps triggering twice. By checking with console.log, I see that my new URL is missing from the array on the first trigger; then the listener immediately fires again, this time with the correct new allowed array, but since the request was already blocked by the first trigger, that second pass does nothing.
The question: why does the listener keep firing twice or more (if I add more items, say), even though it was removed before being added back?
Is there any way to clear all listeners? (There is nothing about that in the docs.) I've been struggling with this for quite some time...
I also tried onHandlerBehaviorChanged, but it's not helping.
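A hedged sketch of one way around the remove/re-add dance, assuming the same allowed array and whiteMode callback as above: because whiteMode closes over allowed, mutating the array in place takes effect on the very next request without touching the listener at all, and hasListener() can guard against accidental double registration:

// Update the whitelist in place; whiteMode reads `allowed` on every
// request, so no remove/re-add is needed for updates.
function updateAllowed(newList) {
    allowed.length = 0;                       // empty the array in place
    Array.prototype.push.apply(allowed, newList);
}

// If the mode was disabled and needs re-enabling, guard against duplicates:
function enableWhiteMode() {
    if (!chrome.webRequest.onBeforeRequest.hasListener(whiteMode)) {
        chrome.webRequest.onBeforeRequest.addListener(
            whiteMode,
            {urls: ["<all_urls>"]},
            ["blocking"]);
    }
}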

REST with CORS not working with WebExtension content-script

I am working on a WebExtension in Firefox for internal use at work. The purpose of the extension is to insert relevant information from our ServiceNow instance into the Nagios host/service pages.
I am currently trying to insert the state of tickets into the history tab of Nagios. My script looks like this:
var table = document.getElementById('id_historytab_table');
table = table.getElementsByTagName('tbody')[1];
var len = table.children.length;
const url = "https://[domain].service-now.com/api/now/table/task?sysparm_limit=1&number=";

for (var i = 1; i <= len; i++) {
    var col = table.rows[i].cells[2];
    if (col.textContent.startsWith("TKT")) {
        var tkt = col.textContent;
        var invocation = new XMLHttpRequest();
        invocation.open("GET", url + tkt, true);
        invocation.withCredentials = true;
        invocation.onreadystatechange = function() {
            if (this.readyState == this.DONE) {
                // Use `this` rather than `invocation`: the var-scoped loop
                // variable points at the last request by the time these
                // callbacks fire.
                console.log(this.responseText);
            }
        };
        invocation.send();
    }
}
This successfully gets the ticket number from each row of the history tab and makes a GET request. I can see the requests in my ServiceNow REST log and everything looks good there. However, the response is never received.
If I copy and paste the above from my content-script.js directly into my console, I am able to iterate through the rows, get the ticket numbers, and successfully receive responses from ServiceNow. So it works there, but not in the WebExtension for some reason. I am about at the end of my knowledge of extensions and JavaScript, though, and am not sure what else to try.
I figured out the problem. In order for the WebExtension to receive the response, the request's URL needs to be listed under permissions in the manifest.json. Adding a host permission for the API's origin:
"permissions": [ "https://[domain].service-now.com/*" ],
resolved the issue, and I immediately began seeing the response bodies I was expecting.
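For reference, a minimal sketch of how that line sits in the manifest (the extension name is hypothetical, and the [domain] placeholder matches the one in the script above):

{
  "manifest_version": 2,
  "name": "servicenow-nagios-helper",
  "version": "1.0",
  "permissions": [
    "https://[domain].service-now.com/*"
  ]
}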

Unable to determine the results of a Google Groups search using Javascript in a Chrome extension

I'm writing a Chrome extension that searches for posts in a Google Group whose subject lines contain a given character string. From the browser, using the search query "subject:", I get the search results: either zero results or more than zero, and I take different actions depending on which comes up. The wrinkle is that if I simply fetch the page data for the results page using, say,
try {
    var request = new XMLHttpRequest();
    request.open("GET", url, false);
    request.send(null);
} catch (e) {
    console.log(e);
    return;
}
if (request.status == 200) {
    var tmp = request.responseText;
}
I just get obfuscated data and can't read it. If I could get a Document object back, then I could search for a certain class name, with something like doc.getElementsByClassName, that exists if and only if there are non-zero results from the search.
Here's how to turn the responseText into a DOM...
var page = document.implementation.createHTMLDocument("");
page.documentElement.innerHTML = request.responseText;
// Now you can find things
var thing = page.documentElement.querySelector('#find');
...but this won't always be enough for some pages, as they are AJAX-driven, and the new Google Groups certainly is.
So this page is just the frame for all the other stuff the JS is going to fetch once the page has loaded.
Sometimes you can figure out how to replicate the AJAX request the page makes by looking at the Network panel in the Web Inspector and watching what happens when you press the search button.
But Google Groups 2 is doing some funky stuff and I wouldn't have a clue ;)
There are other things you can do, like overriding XMLHttpRequest to monitor what calls it and what it does when the ready state changes to 4, or monitoring the onload, and using that info to try to figure out which function processes the responseText; sometimes you can find what you need that way. But I can't find my code for that at the moment and don't really feel inclined to redo it for this, as I know it's not going to be pretty ;)
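That monitoring idea might look something like the following rough sketch (not the lost code the answer mentions; it just logs every request the page's own scripts make):

(function() {
    var origOpen = XMLHttpRequest.prototype.open;
    XMLHttpRequest.prototype.open = function(method, url) {
        console.log('XHR opened: ' + method + ' ' + url);
        // Watch this request's lifecycle to see which responses matter.
        this.addEventListener('readystatechange', function() {
            if (this.readyState === 4) {
                console.log('XHR done: ' + url + ' (' +
                    this.responseText.length + ' chars)');
            }
        });
        return origOpen.apply(this, arguments);
    };
})();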
Good Luck.

Error: The page has been destroyed and can no longer be used

I'm developing an add-on for the first time. It puts a little widget in the status bar that displays the number of unread Google Reader items. To accommodate this, the add-on process queries the Google Reader API every minute and passes the response to the widget. When I run cfx test I get this error:
Error: The page has been destroyed and can no longer be used.
I made sure to catch the widget's detach event and stop the refresh timer in response, but I'm still seeing the error. What am I doing wrong? Here's the relevant code:
// main.js - Main entry point
const tabs = require('tabs');
const widgets = require('widget');
const data = require('self').data;
const timers = require("timers");
const Request = require("request").Request;

function refreshUnreadCount() {
    // Make the Google Reader API request
    Request({
        url: "https://www.google.com/reader/api/0/unread-count?output=json",
        onComplete: function(response) {
            // Ignore the response if we encountered a 404 (e.g. the user
            // isn't logged in) or a different HTTP error.
            // TODO: Can I make this work when third-party cookies are disabled?
            if (response.status == 200) {
                monitorWidget.postMessage(response.json);
            } else {
                monitorWidget.postMessage(null);
            }
        }
    }).get();
}

var monitorWidget = widgets.Widget({
    // Mandatory widget ID string
    id: "greader-monitor",
    // A required string description of the widget, used for
    // accessibility, title bars, and error reporting.
    label: "GReader Monitor",
    contentURL: data.url("widget.html"),
    contentScriptFile: [data.url("jquery-1.7.2.min.js"), data.url("widget.js")],
    onClick: function() {
        // Open Google Reader when the widget is clicked.
        tabs.open("https://www.google.com/reader/view/");
    },
    onAttach: function(worker) {
        // If the widget's inner width changes, reflect that in the GUI
        worker.port.on("widthReported", function(newWidth) {
            worker.width = newWidth;
        });
        var refreshTimer = timers.setInterval(refreshUnreadCount, 60000);
        // If the monitor widget is destroyed, make sure the timer gets cancelled.
        worker.on("detach", function() {
            timers.clearInterval(refreshTimer);
        });
        refreshUnreadCount();
    }
});
// widget.js - Status bar widget script
// Every so often, we'll receive the updated item feed. It's our job
// to parse it.
self.on("message", function(json) {
    if (json == null) {
        $("span#counter").attr("class", "");
        $("span#counter").text("N/A");
    } else {
        var newTotal = 0;
        for (var item in json.unreadcounts) {
            newTotal += json.unreadcounts[item].count;
        }
        // Since the cumulative reading-list count is a separate part of the
        // unread count info, we have to divide the total by 2.
        newTotal /= 2;
        $("span#counter").text(newTotal);
        // Update the style
        if (newTotal > 0)
            $("span#counter").attr("class", "newitems");
        else
            $("span#counter").attr("class", "");
    }
    // Report the current width of the widget
    self.port.emit("widthReported", $("div#widget").width());
});
Edit: I've uploaded the project in its entirety to this GitHub repository.
I think if you use the method monitorWidget.port.emit("widthReported", response.json); you can fire the event. It is the second way to communicate between the add-on script and the content script.
Reference for the port communication
Reference for the communication with postMessage
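A short sketch of that port-based exchange on both sides, assuming the names from the question (the "unreadCount" event name is made up for illustration):

// main.js - emit through the widget's port instead of postMessage
monitorWidget.port.emit("unreadCount", response.json);

// widget.js - listen for the matching port event
self.port.on("unreadCount", function(json) {
    // ...same parsing as the "message" handler above
});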
I guess that this message comes up when you call monitorWidget.postMessage() in refreshUnreadCount(). The obvious cause would be: while you make sure to call refreshUnreadCount() only when the worker is still active, this function makes an asynchronous request which might take a while, so by the time the request completes the worker might already be destroyed.
One solution would be to pass the worker as a parameter to refreshUnreadCount(). It could then add its own detach listener (remove it when the request is done) and ignore the response if the worker was detached while the request was performed.
function refreshUnreadCount(worker) {
    var detached = false;
    function onDetach() {
        detached = true;
    }
    worker.on("detach", onDetach);
    Request({
        ...
        onComplete: function(response) {
            worker.removeListener("detach", onDetach);
            if (detached)
                return; // The worker is gone, nothing left to update with our data
            ...
        }
    }).get();
}
Then again, using try..catch to detect this situation and suppress the error would probably be simpler, though not exactly a clean solution.
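That try..catch variant might look like this sketch (built on the question's refreshUnreadCount; it swallows the error rather than preventing it):

function refreshUnreadCount() {
    Request({
        url: "https://www.google.com/reader/api/0/unread-count?output=json",
        onComplete: function(response) {
            try {
                monitorWidget.postMessage(response.status == 200 ? response.json : null);
            } catch (e) {
                // The widget was destroyed while the request was in
                // flight; there is nothing left to update, so ignore it.
            }
        }
    }).get();
}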
I've just seen your message on IRC; thanks for reporting your issues.
You are facing an internal bug in the SDK. I've opened a bug about that here.
You should definitely keep the first version of your code, where you send messages to the widget, i.e. widget.postMessage (instead of worker.postMessage). Then we will have to fix the bug I linked to in order to make your code just work!
I also suggest you move the setInterval to the top level; otherwise you will start multiple intervals and requests, one per window, because the attach event is fired for each new Firefox window.
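Moving the timer to the top level might look like this sketch (one interval for the whole add-on rather than one per widget worker):

// main.js - a single top-level timer, regardless of how many
// windows (and therefore widget workers) exist.
var refreshTimer = timers.setInterval(refreshUnreadCount, 60000);
refreshUnreadCount();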

Navigating / scraping hashbang links with javascript (phantomjs)

I'm trying to download the HTML of a website that is almost entirely generated by JavaScript, so I need to simulate browser access and have been playing around with PhantomJS. The problem is, the site uses hashbang URLs and I can't seem to get PhantomJS to process the hashbang; it just keeps calling up the homepage.
The site is http://www.regulations.gov. The default takes you to #!home. I've tried using the following code (from here) to process different hashbangs.
if (phantom.state.length === 0) {
    if (phantom.args.length === 0) {
        console.log('Usage: loadreg_1.js <some hash>');
        phantom.exit();
    }
    var address = 'http://www.regulations.gov/';
    console.log(address);
    phantom.state = Date.now().toString();
    phantom.open(address);
} else {
    var hash = phantom.args[0];
    document.location = hash;
    console.log(document.location.hash);
    var elapsed = Date.now() - new Date().setTime(phantom.state);
    if (phantom.loadStatus === 'success') {
        if (!first_time) {
            var first_time = true;
            if (!document.addEventListener) {
                console.log('Not SUPPORTED!');
            }
            phantom.render('result.png');
            var markup = document.documentElement.innerHTML;
            console.log(markup);
            phantom.exit();
        }
    } else {
        console.log('FAIL to load the address');
        phantom.exit();
    }
}
This code produces the correct hashbang (for instance, I can set the hash to '#!contactus'), but it doesn't dynamically generate any different HTML; it just shows the default page. It does, however, correctly output that hash when I call document.location.hash.
I've also tried setting the initial address to the hashbang URL, but then the script just hangs and does nothing. For example, if I set the URL to http://www.regulations.gov/#!searchResults;rpp=10;po=0 the script hangs after printing the address to the terminal, and nothing ever happens.
The issue here is that the content of the page loads asynchronously, but you're expecting it to be available as soon as the page is loaded.
In order to scrape a page that loads content asynchronously, you need to wait to scrape until the content you're interested in has been loaded. Depending on the page, there might be different ways of checking, but the easiest is just to check at regular intervals for something you expect to see, until you find it.
The trick here is figuring out what to look for: you need something that won't be present on the page until your desired content has been loaded. In this case, the easiest option I found for top-level pages is to manually enter the H1 text you expect to see on each page, keyed by hash:
var titleMap = {
    '#!contactUs': 'Contact Us',
    '#!aboutUs': 'About Us'
    // etc. for the other pages
};
Then, in your success block, you can set a recurring interval to look for the expected title in an h1 tag. When it shows up, you know you can render the page:
if (phantom.loadStatus === 'success') {
    // check every 300 milliseconds
    var timeoutId = window.setInterval(function () {
        // look for the title element we expect to see
        var h1s = document.querySelectorAll('h1');
        if (h1s.length) {
            // h1s is a NodeList, not an array, hence the
            // weird syntax here
            Array.prototype.forEach.call(h1s, function(h1) {
                if (h1.textContent.trim() === titleMap[hash]) {
                    // we found it!
                    console.log('Found H1: ' + h1.textContent.trim());
                    phantom.render('result.png');
                    console.log("Rendered image.");
                    // stop the cycle
                    window.clearInterval(timeoutId);
                    phantom.exit();
                }
            });
            console.log('Found H1 tags, but not ' + titleMap[hash]);
        } else {
            console.log('No H1 tags found.');
        }
    }, 300);
}
The above code works for me, but it won't work if you need to scrape search results; for those you'll need to figure out an identifying element or bit of text that you can look for without having to know the title ahead of time.
Edit: Also, it looks like the newest version of PhantomJS now triggers an onResourceReceived event when it gets new data. I haven't looked into this, but you might be able to bind a listener to this event to achieve the same effect.
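That listener idea might look like the following sketch, using PhantomJS's webpage module (the hash URL is just an example; untested against regulations.gov):

var page = require('webpage').create();

// Log every resource the page pulls in; the AJAX responses that build
// the page show up here as they finish.
page.onResourceReceived = function(response) {
    if (response.stage === 'end') {
        console.log('Received: ' + response.url);
    }
};

page.open('http://www.regulations.gov/#!contactUs', function(status) {
    console.log('Page load: ' + status);
});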
