Scrape values generated by script from active tab chrome extension content script - javascript

I'm trying to scrape values from a page that are generated by a JS script. When I inspect the page, I see the values there, but my selectors return null/undefined.
The purpose of the extension is to let people, at the click of a button, scrape their personalised data from a page that requires login WITHOUT having to provide any login details to the extension.
In the Chrome console, the static "title" values are returned, so I'm pretty sure my selectors are fine and that accessing the document simply doesn't pick up what the executed scripts produce.
From reading, I might need to use something like Puppeteer or Selenium, but it seems they fire up their own browser instance (bad, as I'd need to take user login details to mock the sign-in process), or I'd need to modify how the Chrome browser starts with --remote-debugging-port=A_PORT_NUMBER, which I want to avoid.
From the Chrome console and my extension, I can retrieve the values highlighted green (so it is not an issue with iframes, as some posts suggest) but can't retrieve the values highlighted red.
HTML structure in image
From popup.html
document.addEventListener("DOMContentLoaded", function () {
    ...
    document.querySelector('button[id="scrape"]').addEventListener("click", function onclick() {
        chrome.tabs.query({ currentWindow: true, active: true },
            function (activeTab) {
                chrome.tabs.sendMessage(activeTab[0].id, { action: "putSource_scrapeSalePage", index: activeTab[0].index })
            }
        )
    })
    ...
}, false)
From content.js
// Need to import Puppeteer/Selenium here? How else to use it for the active tab?
chrome.runtime.onMessage.addListener(
    function (request, sender, sendResponse) {
        ...
        else if (request.action === "putSource_scrapeSalePage") {
            let htmlvar = $(document)
            console.log(htmlvar);
            let test = $('td[desc= "transactionType"]').text().trim() //returns fine
            let tableData1raw = $('table.tableDataOne tbody tr').find("tbody").find("tr")
            let tableData1raw_almost = $(tableData1raw).each(function (i, element) {
                console.log(element)
                const $element = $(element).find("td")
                console.log($element)
                ...
The Question:
If there is no better way to do this, how can I do it from a content script with something like Puppeteer?

In the end I was able to take the value I knew I COULD get (the "transaction type" title cell) and traverse from it to its adjacent sibling element (+), retrieving whatever div was there instead of trying to target the div's class directly.
$('td[desc= "transactionType"] + td').find("div").text();
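If the missing values are written into the page some time after the content script first runs, one option that stays inside the content script is to retry the lookup until it returns something. The following is only a minimal sketch under that assumption; pollFor, read, onDone, intervalMs, maxTries and schedule are hypothetical names, not part of any Chrome or jQuery API.

```javascript
// Retry `read` until it returns a truthy value or the attempts run out.
// `onDone` receives the value (or null if nothing appeared) and the try count.
function pollFor(read, onDone, options = {}) {
  const intervalMs = options.intervalMs ?? 250;     // wait between attempts
  const maxTries = options.maxTries ?? 20;          // give up after this many reads
  const schedule = options.schedule || setTimeout;  // injectable, e.g. for tests
  let tries = 0;

  function attempt() {
    tries += 1;
    const value = read();
    if (value) {
      onDone(value, tries);
    } else if (tries >= maxTries) {
      onDone(null, tries);
    } else {
      schedule(attempt, intervalMs);
    }
  }
  attempt();
}

// Possible use in content.js (selector taken from the answer above):
// pollFor(
//   () => $('td[desc= "transactionType"] + td').find("div").text().trim(),
//   (text) => console.log(text === null ? "value never appeared" : text)
// );
```

Polling is the bluntest option; it is shown here because it needs nothing beyond a timer and the selector that already works.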

Related

How to get a javascript (emberjs) rendered HTML source in javascript

Problem: I am working on an extension in javascript which needs to be able to view the source HTML of a page after everything is rendered.
The problem is that no matter what method I use, I can only seem to retrieve the pre-rendered source. The website is using emberjs for generating the content of the page.
Example:
Site: https://www.playstation.com/en-us/explore/games/ps4-games/?console=ps4
When I right-click and view source, I get the page before the content is loaded. When I right-click and inspect element, I see the source after the content has loaded, and that is the version I want to get.
What I've tried:
background.js
var acceptedURLPattern = "playstation.com";
tabUpdatedCallback = function(tabID, changeInfo, tab) {
    if(tab.url.indexOf(acceptedURLPattern) == -1) return;
    var eventJsonScript = {
        code: "console.log(\"Script Injected\"); window.addEventListener(\"load\", (event) => { " + browserString + ".runtime.sendMessage({ \"html\": document.documentElement.outerHTML });});"
    };
    browser.tabs.executeScript(tabID, eventJsonScript);
}
handleHTMLMessage = function(request, sender, sendResponse) {
    console.log(request);
}
browser.tabs.onUpdated.addListener(tabUpdatedCallback);
browser.runtime.onMessage.addListener(handleHTMLMessage);
The above script injects an event listener into the page I want to grab the source of; once the "load" event fires, it sends a message containing that source back to background.js.
I've tried switching between documentElement.innerHTML and documentElement.outerHTML, as well as changing the event listener to document.addEventListener("DOMContentLoaded"), but none of these changes had any effect.
I've also tried using these: Get javascript rendered html source using phantomjs and get a browser rendered html+javascript but they are using phantomjs to load and execute the page, then return the html. In my solution, I need to be able to grab the already rendered page.
Thanks for the help in advance!
Edit #1:
I took a look at MutationObserver as mentioned by @wOxxOm and changed the eventJsonScript variable to look like this:
var eventJsonScript = {
code: "console.log(\"Script Injected\"); var mutationObserver = new MutationObserver( (mutations) => { mutations.forEach((mutation) => {if( JSON.stringify(mutation).indexOf(\"Yakuza\") != -1) { console.log(mutation); } });}); mutationObserver.observe(document.documentElement, {attributes: true, characterData: true, childList: true, subtree: true, attributeOldValue: true, characterDataOldValue: true}); mutationObserver.takeRecords()"
};
However, despite the site clearly having a section for Yakuza 6, the event doesn't get fired. I did remove the if condition in the injected script to verify that events get fired normally; they just don't seem to contain the information that I'm looking for.
The good news is that someone has already written the code to do this in Ember; you can find it here:
https://github.com/emberjs/ember-test-helpers/blob/031969d016fb0201fd8504ac275526f3a0ab2ecd/addon-test-support/%40ember/test-helpers/settled.js
This is the code Ember tests use to wait until everything is rendered and complete, or "settled".
The bad news is it is a nontrivial task to extract it correctly for your extension.
Basically, you will want to:
1. Wait till the page is loaded (window.load event).
2. setTimeout at least 200 ms to ensure the Ember app has booted.
3. Wait until settled, using the code linked above.
4. Wait until the browser is idle (requestIdleCallback in latest Chrome, or get a polyfill).
Hope this helps get you started.
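The sequence above could be chained roughly as follows. This is a sketch, not the Ember helper itself: whenRendered, bootDelayMs, settled and env are hypothetical names, and the settled hook is only a placeholder where code extracted from the linked settled.js would go (by default it does nothing). If requestIdleCallback is unavailable, the sketch falls back to a zero-delay timer rather than a real polyfill.

```javascript
// Run `done` after: page load -> boot delay -> "settled" hook -> browser idle.
function whenRendered(done, options = {}) {
  const env = options.env || globalThis;           // window in a content script
  const bootDelayMs = options.bootDelayMs ?? 200;  // time for the app to boot
  // Placeholder for Ember's settled logic; takes a callback to continue.
  const settled = options.settled || ((next) => next());

  // Prefer requestIdleCallback, otherwise approximate with a 0 ms timer.
  const idle = typeof env.requestIdleCallback === "function"
    ? (cb) => env.requestIdleCallback(cb)
    : (cb) => env.setTimeout(cb, 0);

  const afterLoad = () =>
    env.setTimeout(() => settled(() => idle(() => done())), bootDelayMs);

  const doc = env.document;
  if (doc && doc.readyState !== "complete") {
    env.addEventListener("load", afterLoad, { once: true });
  } else {
    afterLoad(); // already loaded (or no document at all)
  }
}

// Possible use in the injected script:
// whenRendered(() => {
//   browser.runtime.sendMessage({ html: document.documentElement.outerHTML });
// });
```

Keeping settled as a pluggable callback means the load, delay and idle steps can be tried first, with the Ember-specific wait added only if the HTML still arrives incomplete.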

(Javascript , Chrome) query tab id and then access its elements

I would like to get elements from a Chrome tab with a certain URL; it does not have to be active. So far I have:
Test()
function Test() {
    chrome.tabs.query({url: "https://www.somewebsite.com/*"}, function(results) {
        chrome.tabs.executeScript(results[0].id,{code: 'El = document.getElementsByClassName("someclass")'});
        console.log(El);
    })
}
Maybe it has to be done through a content script file?
I have this code placed in my background.js file. Given the proper URL and Class this function will not return the Element. Why?
Thanks for any suggestions!
The background script is executed only once when chrome launches. Instead of writing this inside a function which you call in the same file, you have to put it in some kind of event listener. This code should stay in the background file, but has to be triggered by something, not just executed on extension load.
EDIT: Oh, I see, you're trying to pass the element to the background script. You can't.
Background scripts have access to the whole chrome APIs but not to the page content. The script you write in executeScript runs on the tab you specified, so the variables are available to other code within that tab, not within your background script. Extension code that can run on the page and edit its content is either sent using executeScript or put in content scripts.
If you want to share information between various layers of an extension, you need to use messages. They work like events and listeners. Read the docs here. You'll be able to pass data like numbers and strings, but not the actual HTMLElement. So any code that manipulates the DOM has to run on the tab itself.
If your code is small and simple, you could just write it in the executeScript call instead.
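Since only plain data survives the trip between the tab and the background script, the code running in the tab can reduce each element to the fields it actually needs before sending anything. A minimal sketch, assuming tag name, id, class and text are enough; describeElements and the shape of the returned objects are hypothetical choices, not an extension API.

```javascript
// Reduce DOM elements (or any array-like of them) to plain, JSON-friendly data.
function describeElements(elements) {
  return Array.from(elements, (el) => ({
    tag: String(el.tagName || "").toLowerCase(),
    id: el.id || "",
    // className is not a string on SVG elements, so guard against that
    className: typeof el.className === "string" ? el.className : "",
    text: (el.textContent || "").trim(),
  }));
}

// Possible use inside the tab (content script or injected file),
// with the class name from the question:
// chrome.runtime.sendMessage({
//   elements: describeElements(document.getElementsByClassName("someclass")),
// });
```

The background script then receives an array of ordinary objects; anything that needs the live element still has to run in the tab.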
You are correct, there does not appear to be a way to manipulate elements in another tab from the background.js;
Test();
function Test() {
    chrome.tabs.query({
        url: "https://www.google.com/*"
    }, function(results) {
        if (results.length < 1) { //if there is no tab open with google.com
            console.log("Tab not found, creating new tab");
            chrome.tabs.create({
                "url": "https://www.google.com/",
                "selected": true
            }, null);
            chrome.tabs.query({
                url: "https://www.google.com/*"
            }, function(results) {
                if (results.length >= 1) {
                    console.log("Found google.com in the new tab");
                }
                chrome.tabs.executeScript(results[0].id, {
                    code: "foo = document.getElementsByTagName('input'); console.log(foo,' This is sent from the active tab'); chrome.storage.local.set({'foo': foo[0]});"
                }, function(foo) {
                    chrome.storage.local.get("foo", function(foo) {
                        console.log(foo, 'This has retrieved foo right after saving it to storage.');
                    });
                });
            });
        }
    });
}
search()
function search(result) {
    chrome.storage.local.get("foo", function(result) {
        console.log(result, 'This has retrieved foo from storage in a separate function within background.js');
    });
}
The first console log shows the HTMLCollection of inputs from google.com in the google.com tab's console; the other two show blank objects in the background.js console. Thanks, I will find another solution.

passing data from executed script to background in chrome extension

I have this idea for passing data from an injected script (getDOM.js) to my background.js
background.js
chrome.contextMenus.onClicked.addListener(function(info, tab){
    chrome.tabs.executeScript(tab.id, {file: "getDOM.js"})
});
chrome.contextMenus.onClicked.addListener(function(info, tab){
    chrome.tabs.query({active: true, currentWindow: true}, function(tabs) {
        chrome.tabs.sendMessage(tabs[0].id, {greeting: "GetURL"}, function(response) {
            alert(response.navURL);
        });
    });
});
getDOM.js
chrome.runtime.onMessage.addListener(
    function(request, sender, sendResponse) {
        if (request.greeting === "GetURL")
            sendResponse({navURL:'test'});
    });
As you can see, I used the messaging functions to pass data, but there is a problem: I can't get the data at the right time, so background.js receives the previous data.
The content must be dynamic (not a fixed "test"), yet every time it alerts the previous data. Imagine getDOM.js passes the selected text: with this code it passes the previously selected text. How can I fix this?
My example of dynamic data:
function getHTMLOfSelection () {
    var range;
    if (document.selection && document.selection.createRange) {
        range = document.selection.createRange();
        return range.htmlText;
    }
    else if (window.getSelection) {
        var selection = window.getSelection();
        if (selection.rangeCount > 0) {
            range = selection.getRangeAt(0);
            var clonedSelection = range.cloneContents();
            var div = document.createElement('div');
            div.appendChild(clonedSelection);
            return div.innerHTML;
        }
        else {
            return '';
        }
    }
    else {
        return '';
    }
}
var dom = getHTMLOfSelection();
chrome.runtime.onMessage.addListener(
    function(request, sender, sendResponse) {
        if (request.greeting === "getDom")
            sendResponse({DOM:dom});
    });
It is meant to pass the selected DOM to background.js.
Lots of problems here. Let's see.
1. It does not make sense to execute getHTMLOfSelection() at the point when the script is injected. You should probably put that inside the message handler, so the selection is read when it is asked for.
2. A much bigger problem is that you inject the script every time the context menu is clicked. This, together with 1, leads to all kinds of fun. Let's look in more detail!
   - The user triggers the context menu.
   - Your first contextMenus.onClicked handler runs. The script injection is scheduled. It's asynchronous, so you don't know when it will finish unless you use a callback. Which you don't.
   - Your second contextMenus.onClicked handler runs. It sends a message to a script which potentially hasn't finished executing. If you're lucky, you get a response.
   - The user triggers the context menu again on the same page.
   - Your first contextMenus.onClicked handler runs, again. The script is going to be injected again, creating a second listener for the message that will compete with the first. It's again asynchronous, so maybe by the time your message arrives dom is up to date. Maybe not.
   - Your second contextMenus.onClicked handler runs, again. This time there sure is a message listener (maybe two!) that returns maybe up-to-date data.
   And so on. You see the problem?
3. Furthermore, you can't pass a DOM object with sendResponse. The object needs to be JSON-serializable, and DOM objects contain circular references, which is a no-no. You need to extract the data you need on the content script side, and pass only that.
So, let's try to tackle those problems.
There are 2 ways of dealing with this. I'll present both, pick the one you prefer. Both will take care of problems 1 and 2.
First way is to ensure your message handler is only added once by including some kind of guard variable:
// getDOM.js
// Reading an undeclared variable would throw a ReferenceError,
// so the guard is kept as a property of window instead
if (!window.getDOM_ready) { // This will be undefined the first time around
    window.getDOM_ready = true;
    chrome.runtime.onMessage.addListener(
        function(request, sender, sendResponse) {
            // If only you knew how I hate this "greeting" example copied over
            if (request.command === "GetURL") {
                var dom = getHTMLOfSelection();
                sendResponse({DOM: dom});
            }
        }
    );
}
function getHTMLOfSelection() {
    /* ... */
}
Then on the background side, we need to be sure we send a message only after the script finishes executing:
chrome.contextMenus.onClicked.addListener(function(info, tab){
    chrome.tabs.executeScript(tab.id, {file: "getDOM.js"}, function() {
        // This only executes after the content script runs
        // Oh, and you most certainly don't need to query for tabs,
        // you already have a tab and its id in `onClicked`
        chrome.tabs.sendMessage(tab.id, {command: "GetURL"}, function(response) {
            // The content script replies with {DOM: ...}
            alert(response.DOM);
        });
    });
});
The second way is to drop Messaging altogether. executeScript actually returns the last value the content script evaluated. That makes the content script trivial and does not leave a message listener behind:
// getDOM.js
function getHTMLOfSelection() {
/* ... */
}
getHTMLOfSelection(); // Yes, that's it
On the background side, you need to adapt the listener:
chrome.contextMenus.onClicked.addListener(function(info, tab){
    chrome.tabs.executeScript(tab.id, {file: "getDOM.js"}, function(results) {
        // results is an array, because it can be executed in more than one frame
        alert(results[0]);
    });
});
The code is much simpler here, AND it does not leave an active event listener behind.
As for problem 3, you need to extract the info you need (say, a link) from the selection and pass only that instead of the object.
Finally, this is not a complete solution. The problem is that your selection can be inside an iframe, and this solution only injects code into the top frame. The solution to that is left as an exercise to the reader; using allFrames: true in the executeScript options (all_frames in a manifest content script entry) will inject into all frames, and one of them will have a non-empty selection. You just need to see which.
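One way to "see which" frame holds the selection is to scan the per-frame results for the first non-empty string. This is a sketch of that step only; pickSelection is a hypothetical helper name, and it assumes getDOM.js ends by evaluating getHTMLOfSelection() as in the second approach above.

```javascript
// Given one result per frame, return the first non-empty string, or "" if none.
function pickSelection(frameResults) {
  const hit = (frameResults || []).find(
    (value) => typeof value === "string" && value.trim() !== ""
  );
  return hit === undefined ? "" : hit;
}

// Possible use in background.js:
// chrome.contextMenus.onClicked.addListener(function (info, tab) {
//   chrome.tabs.executeScript(tab.id, { file: "getDOM.js", allFrames: true }, function (results) {
//     alert(pickSelection(results));
//   });
// });
```

Frames without a selection return an empty string from getHTMLOfSelection(), so they are skipped.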

Get all elements of the current window

I need to modify the DOM of a page but can not find how to get the elements of the current window.
window.onload = function() {
    var bot = document.getElementById('bot');
    bot.style.cursor = "pointer";
    bot.addEventListener('click',function(){
        chrome.tabs.query({
            currentWindow: true,
            active: true
        }, function(tab) {
            // tab[0] <----- I need to get the elements of the current window to modify
            // for example document.getElementById('element')
        });
    });
}
I assume the code you showed runs in a popup.
To access DOM of a page open in an existing tab, you need Content Scripts.
See this Architecture Overview as the first step.
For a concrete example, given a tab ID, you can inject a script and get a single value back (as a simplest solution) like this:
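// With no tab ID given, the script is injected into the active tab of the current window
chrome.tabs.executeScript({file: "content.js"}, function(result){
    // result is an array with one entry per frame the script ran in
    console.log(result[0]);
});
And the content script:
// Simplest possible DOM operation
// The value of the last evaluated expression is what gets passed back
// (a top-level `return` would be a syntax error here)
document.getElementById('element').value;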
For more advanced usage, see the documentation link above and Messaging documentation.

Chrome extension persistent popup best practices

I've understood from the docs that closing chrome extension popups when losing focus has been a design choice.
I'm working on an extension where the user chooses to save elements from a webpage. As he interacts with the main webpage I would like the popup to get updated but that's obviously not possible.
What's the proper way of handling this situation? (this is my first chrome extension)
You can have a content script detect the "save" action. Let's suppose it's a specific DOM element that you know for sure is going to be in the main page, or one that you create yourself.
content.js
//content script
document.onreadystatechange = function () {
    if (document.readyState == "complete") {
        // Grab the UI from the main page you want to append the save functionality to
        var someElementsYouWantToAppendASaveButtonTo = document.getElementsByTagName("...");
        var len = someElementsYouWantToAppendASaveButtonTo.length;
        for (var i = 0; i < len; i++) {
            // Create a UI save button to provide the functionality
            var theSaveButton = document.createElement("button");
            // textContent sets the visible label; value is not displayed on a <button>
            theSaveButton.textContent = "Save to Chrome Extension";
            // Send data to extension when clicked
            theSaveButton.addEventListener("click", function() {
                var dataToSentToExtension = {...} // Retrieve from the clicked element, or whatever you want to save
                // chrome.runtime.sendMessage replaces the deprecated chrome.extension.sendMessage
                chrome.runtime.sendMessage(dataToSentToExtension, function(response) {
                    if(response.success) console.log("Saved successfully");
                    else console.log("There was an error while saving")
                });
            }, false);
            someElementsYouWantToAppendASaveButtonTo[i].appendChild(theSaveButton)
        }
    }
}
Then, in the background page, you handle the message, reply to it, and set up the popup as you wish.
background.js
// chrome.runtime.onMessage replaces the deprecated chrome.extension.onMessage
chrome.runtime.onMessage.addListener(function(request, sender, sendResponse) {
    var responseFromExtension;
    if(request.dataToSave) {
        chrome.storage.local.set(request.dataToSave, function() {...});
        // You can then set up the proper popup for the next click or even switch to it
        switch(request.popupToDisplay) {
            case "awesomeDisplay":
                chrome.browserAction.setPopup({...})
                break;
        }
        responseFromExtension = {success: true}
    } else {
        responseFromExtension = {error: true}
    }
    // Reply so the content script's callback receives the result
    sendResponse(responseFromExtension);
});
It seems you want to modify/update your popup.html page in accordance with changes in a web page. If so, use content scripts to send single messages to the background page (since the popup closes every time it loses focus) and update popup.html indirectly.
References:
Content Scripts
Background Page
Message Passing
Reference for communication between popup and background page
Apart from these, there are a bunch of questions on these topics; they will get you started.
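Because the popup is destroyed whenever it loses focus, updating it "indirectly" in practice means the popup rebuilds its view from stored state each time it opens. A minimal sketch of that read step, assuming the background page keeps the saved entries under a storage key named savedItems; that key and the loadSavedItems helper are hypothetical and do not appear in the answers above.

```javascript
// Read previously saved items and hand them to `render`.
// `storageArea` is expected to behave like chrome.storage.local.
function loadSavedItems(storageArea, render) {
  // Passing an object to get() supplies defaults for missing keys
  storageArea.get({ savedItems: [] }, function (stored) {
    render(Array.isArray(stored.savedItems) ? stored.savedItems : []);
  });
}

// Possible use in popup.js:
// document.addEventListener("DOMContentLoaded", function () {
//   loadSavedItems(chrome.storage.local, function (items) {
//     document.body.textContent = items.length + " item(s) saved";
//   });
// });
```

Taking the storage area as a parameter keeps the function independent of the popup, so the same helper could be reused from an options page.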
