I want to get data from an HTML page, but the page has onload functions that aren't executed when I use the get method of requests.Session().
from requests import Session

with Session() as s:
    response = s.get('https://o2.amdm.pro/amdm/S/S/S/insure/Portfolio#/entityHandle=01%7CPD%7C00000000000001384776%7C0001%7C0001', stream=True)
My question is, how to execute those functions as if I was in a browser to get the missing data in order to fill the main div ? Or at least, load the page in a browser and get the html from this page fully loaded ?
My question is, how to execute those functions as if I was in a browser to get the missing data in order to fill the main div ?
You need a tool with JavaScript support for that. If you want one similar in usage to python-requests, I suggest giving Requests-HTML a try.
load the page in a browser and get the html from this page fully loaded
For that you need a web automation tool; here I suggest trying Selenium.
Related
I have 5 HTML pages and a JavaScript function DoInitialConfiguration() in a JavaScript file. The user can open any of the five pages, and whichever page is opened first, I want to call this function on that first page access. I also want to remember that the function has already been called, so that it is not called again on other page loads. I only have these 5 HTML pages and the JavaScript file that contains the function. I own the JavaScript file, but I can make only limited changes to the HTML pages (which I don't own), such as loading the JavaScript file and calling DoInitialConfiguration().
Since the JavaScript file will remain in browser cache, is there a way to remember the function has been called once by using any variable in the JS file. It is OK to call DoInitialConfiguration() again if the page is reloaded after clearing browser cache.
How can this functionality be achieved?
If your 5 pages are hosted under same site (which probably would be the case), you can use localStorage to add a key to check if your script was called first time or not.
if (localStorage.getItem("firstRun") != null) {
    // second run+ code goes here
} else {
    localStorage.setItem("firstRun", "ohyes");
    // first run code goes here
}
You can possibly use localStorage for this. Once your code executes, set a localStorage entry, i.e. localStorage.setItem(<key>, <value>), and in the function check whether that entry has been set, i.e. localStorage.getItem(<key>). If it's set, do not execute the code.
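As a rough sketch of that guard (not a definitive implementation): the helper name runOnce and the key "initialConfigDone" are made up for illustration, and the storage object is passed in so the same function works with window.localStorage in a browser or with a stand-in object elsewhere.

```javascript
// Sketch: run a function only on the first page access, remembering the
// fact in a Storage-like object (window.localStorage in a browser).
// runOnce and the key name used below are hypothetical.
function runOnce(storage, key, fn) {
  if (storage.getItem(key) !== null) {
    return false; // already ran on an earlier page load
  }
  fn();
  // Set the flag only after fn() returns, so a thrown error allows a retry.
  storage.setItem(key, "1");
  return true;
}

// In the shared JavaScript file, each page's call could then become:
// runOnce(localStorage, "initialConfigDone", DoInitialConfiguration);
```

Because localStorage is scoped per origin, this only works if all five pages are served from the same site. Clearing site data removes the flag, so the function would run again afterwards.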
It would be good to understand your setup and use case better.
If I understand you correctly, you have 5 separate HTML pages (and you are not running a Single Page Application [SPA]), in which case what you want to do is impossible through the browser cache and memory alone. If you want to remember settings, you need to save them using localStorage or cookies (as some of the other answers have suggested). But since these are 5 different HTML pages, what does the JS do that makes you not want to re-run it on a second page load?
I am trying to analyze some JavaScript code for which I make use of function rewriting so that calls to a JavaScript library go through my JavaScript code. My JavaScript code is part of a Chrome Extension. From a Chrome extension content script, I install/inject the code into the target page's DOM.
This works fine for functions that are invoked after the page has loaded: the library calls go through my function. But there is JavaScript code that runs while the page is still loading (probably while the DOM is being rendered), which happens before my custom script is injected. The function calls made before the custom script is injected are therefore lost to me; those JavaScript calls do not go through my function.
I make use of Content Script to actually inject other JavaScript by appending to the DOM as mentioned in the following Stack Exchange question:
Insert code into the page context using a content script
I know I can cause the loading time of Content Script to be at the start/end of the DOM but this is another script file that I append to the DOM of the target page. I do not seem to understand how to control it.
The problem explained in "Is it possible to run a script in context of a webpage, before any of the webpage's scripts run, using a chrome extension?" is exactly the same, but the solution does not seem to work. My intention is to make the injected script execute before any JavaScript code from the webpage executes. By specifying document_start in manifest.json, the content script can be made to run before the webpage, but not the script that I inject through the content script (injecting the script as explained in the first link). This injected script does not run in any predictable order with respect to the webpage.
Manifest.json:
The manifest file has the content script content.js added at document_start, so content.js runs before the target webpage (the underlying page) runs.
"content_scripts":[
    {
        "matches":["<all_urls>"],
        "js":["content.js"],
        "run_at":"document_start",
        "all_frames":false
    }
],
content.js:
content.js has the code below, with which I add main.js to the DOM so that I can actually interact with the JavaScript in the target page's environment. I do this from a separate file attached to the DOM because a content script cannot interact with the target page's JavaScript directly: the two run in isolated environments and do not interfere with each other.
To explain further, main.js has some JavaScript that intercepts JavaScript calls during the execution of JavaScript in target page. JavaScript in target page makes calls to a library and I intend just to write a wrapper on those library functions.
var u = document.createElement('script');
u.src = chrome.extension.getURL('main.js');
(document.head||document.documentElement).appendChild(u);
u.onload = function() {
    u.parentNode.removeChild(u);
};
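For illustration only, the kind of wrapper that main.js installs around a library function could look roughly like the sketch below. The names wrapMethod, lib and getData are hypothetical stand-ins, not the real library.

```javascript
// Sketch: replace obj[name] with a wrapper that reports each call and then
// delegates to the original function. `lib` and `getData` are hypothetical.
function wrapMethod(obj, name, onCall) {
  const original = obj[name];
  obj[name] = function (...args) {
    onCall(name, args);
    return original.apply(this, args); // preserve `this` and the return value
  };
  return original; // returned so the caller can restore it later
}

// Stand-in for the page's library, used only to demonstrate the wrapper.
const lib = { getData(id) { return "data-" + id; } };
const seen = [];
wrapMethod(lib, "getData", (name, args) => seen.push([name, args]));
lib.getData(7); // the wrapper records the call, then the original runs
```

A wrapper like this only catches calls made after it is installed, which is why the timing of the injection matters.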
I expect that main.js is available in the target page's domain and any of the scripts in the target page, since I inject it through the content script that is run at document_start.
Assume I have a call to some JavaScript function like this in my target page HTML, someJSCall() is defined by the target page's domain.
<html onLoad="someJSCall( )">
In this scenario, main.js (code injected through my Chrome extension) is already available. So calls to the JavaScript library from someJSCall() function go through main.js wrapper functions.
This works fine.
The problem arises when there are IIFEs (immediately invoked function expressions) defined in the target page's JavaScript. If these IIFEs make library calls, the calls do not go through my main.js interceptions. If I look at the files loaded in the browser through Chrome Dev Tools, I see that main.js has still not loaded while the IIFEs are executing.
I hope I have explained the problem in detail.
Based on the additional information you added to the question about 2.5 weeks after I answered, you are adding code to the page context by including a "main.js", which is a separate file in your extension, using a <script> that looks something like:
<script src="URL_to_file_in_extension/main.js"></script>
However, when you do that you introduce an asynchronous delay between when the <script> is inserted into the page and when the "main.js" is fetched from the extension and executed in the page context. You will not be able to control how long this delay is and it may, or may not, result in your code running prior to any particular code in the page. It will probably run prior to code that has to be fetched from external URLs, but may not.
In order to guarantee that your code runs synchronously, you must insert it in a <script> tag as actual code, not using the src attribute to pull in another file. That means the code which you want to execute in the page must exist within the content script file you are loading into the page.
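A minimal sketch of that inline approach follows. The helper name injectInline and the function name pageMain are hypothetical, and this is separate from the executeInPage() function mentioned in this answer. It assumes the content script runs at document_start and that the page's Content Security Policy does not block inline scripts.

```javascript
// Sketch: run a function in the page context by inlining its source text
// in a <script> element instead of pointing `src` at a separate file.
// `doc` is the content script's `document`; injectInline is a hypothetical name.
function injectInline(doc, fn) {
  const script = doc.createElement("script");
  script.textContent = "(" + fn.toString() + ")();";
  // At document_start, document.head may not exist yet.
  (doc.head || doc.documentElement).appendChild(script);
  script.remove(); // the code has already executed; the element is not needed
  return script.textContent;
}

// In content.js (run_at: document_start) this might be used as:
// injectInline(document, function pageMain() { /* wrappers go here */ });
```

Since the function is converted to text, it cannot close over variables from the content script; anything it needs has to be defined inside it.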
Needing to execute code in the page context is a fairly common requirement. I've needed to do so in browser extensions (e.g. Chrome, Firefox, Edge, etc.) and in userscripts. I've also wanted to be able to pass data to such code, so I wrote a function called executeInPage(), which will take a function defined in the current context, convert it to text, insert it into the page context and execute it while passing any arguments you have for it (of most types). If interested, you can find executeInPage() in my answer to Calling webpage JavaScript methods from browser extension and my answer to How to use cloneInto in a Firefox web extension?
The following is my original answer based on the original version of the question, which did not show when the content script was being executed, or explain that the code being added to the page was in a separate file, not in the actual content script.
You state in your question that you "can handle the loading time of Content Script to be at the start/end of the DOM", but you don't make clear why you are unable to resolve your issue by executing your content script at document_start.
You can have your script injected prior to the page you are injecting into being built by specifying document_start for the run_at property in your manifest.json content_scripts entry, or for the runAt option passed to chrome.tabs.executeScript(). If you do this, then your script will start running when document.head and document.body are both null. You can then control what gets added to the page.
For chrome.tabs.executeScript() exactly when your script runs depends on when you execute chrome.tabs.executeScript() in relation to the process of loading the page. Due to the asynchronous nature of the processing (your background script is usually running in a different process), it is difficult to get your script consistently injected when document.head and document.body are both null. The best I've accomplished is to have the script injected sometimes when that is the case, and sometimes after the page is populated, but prior to any other resources being fetched. This timing will work for most things, but if you really need to have your script run prior to the page existing, then you should use a manifest.json content_scripts entry.
With your content script running prior to the existence of the head and body, you can control what gets inserted first. Thus, you can insert your <script> prior to anything else on the page. This should make your script execute prior to any other script in the page context.
I want to grab data from a webpage and display it in my Android app. The problem is, the elements I want from the HTML must be first created by an ajax call.
Because the data is loaded via Javascript my approach is to use a Webview to return the HTML. I use the method outlined by jluckyiv here : How do I get the web page contents from a WebView?
However, I realized this doesn't work because the ajax calls have not yet returned by the time the page's JavaScript has finished running.
Are there any solutions? I don't have the access to modify the code on the webpage.
Do you use setJavaScriptEnabled(true) ?
I have a script that sends emails (subscriptions). Processing takes long enough. The script outputs a log (of what is successfully sent to the moment) to browser.
I would also like it to show a progress bar. How do I do that? There are no AJAX calls; the page loads synchronously. I thought maybe I should just output <script>...</script> tags every X emails sent to move the progress bar, but I'm not sure that is cross-browser compliant. Is it standard behaviour for a browser to execute JavaScript as soon as it encounters it in the page body?
Yes, your idea is used very often and should work in most browsers. The standard behaviour is that a script is executed synchronously as soon as the parser encounters it, unless told otherwise by the async or defer attribute.
So you can just put in a script tag every now and again to update the status bar. Gmail uses the same technique, as far as I know.
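A rough sketch of what each emitted fragment could look like. The question does not say what language the mailing script is written in, so the generator is shown in JavaScript purely for illustration; progressChunk, updateProgress and the element id "bar" are hypothetical names. It also assumes the server flushes its output after each fragment.

```javascript
// Sketch: build the <script> fragment a server could emit after every batch.
// `updateProgress` is a hypothetical function defined earlier in the page.
function progressChunk(sent, total) {
  const percent = total > 0 ? Math.min(100, Math.floor((sent / total) * 100)) : 100;
  return "<script>updateProgress(" + percent + ");</script>\n";
}

// Page-side counterpart, emitted once near the top of the output:
// function updateProgress(p) {
//   document.getElementById("bar").style.width = p + "%";
// }
```

If the server or a proxy buffers the whole response, the fragments arrive together at the end and the bar jumps straight to its final value.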
I'm loading user control through jQuery in my asp.net page.
The user control contains JavaScript files. When the user control loads, all of my JS files load at the same time; they depend on each other, so they tend to raise errors when loaded together. I want my JavaScript files to load synchronously, one by one: once one file has completely loaded, the next file should start loading.
Is there any way to set a synchronous mode in JavaScript, or any JavaScript that does this? Any pointer or suggestion would be really helpful.
You should use jQuery.load() to load only an HTML fragment, not a full page with its scripts. jQuery uses the DOM structure of the loaded document to modify the corresponding part of your page (the controls).
In general you can use jQuery.ajax to load a script, but I recommend the simplified form jQuery.getScript() instead. jQuery.getScript() loads a JavaScript file from the server using a GET HTTP request and then executes it. With the success callback you can perform some action after the script has loaded.
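One way to chain such loads, sketched under the assumption that each dependent file should be requested only after the previous one has executed. The helper name loadInOrder and the file names in the usage comment are hypothetical; the loader is passed in as a parameter so that jQuery.getScript(url, success) can be plugged in.

```javascript
// Sketch: load scripts strictly one after another.
// loadOne(url, callback) must call `callback` once the script has executed;
// in a page with jQuery it could be (url, cb) => jQuery.getScript(url, cb).
function loadInOrder(urls, loadOne, done) {
  let index = 0;
  function next() {
    if (index >= urls.length) {
      if (done) done();
      return;
    }
    const url = urls[index++];
    loadOne(url, next); // request the next file only after this one finishes
  }
  next();
}

// Possible usage in the page (file names are placeholders):
// loadInOrder(["first.js", "second.js"],
//             (url, cb) => jQuery.getScript(url, cb),
//             () => console.log("all scripts loaded"));
```

Note that the success callback is not called when a request fails, so with this sketch a failed load simply stops the chain.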
JS files are loaded in the order you put them in your HTML code.
For example,
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.4/jquery-ui.min.js"></script>
You always need to load jQuery before jQuery UI (otherwise UI will not be recognized, since it uses the jQuery $ shortcut), so you must put the line with jQuery before the one with jQuery UI in your HTML.
Your own code then starts from window.onload, from $(document).ready(function(){}); with jQuery, or from the first top-level statement the browser encounters.
JavaScript files included with plain <script> tags are executed synchronously, in the order they appear in the document. JavaScript itself also runs on a single thread, so one script finishes before the next one starts.
My guess is that you need to work out which order to include the files so that it runs properly. You can use the window.onload event to run script once all of the JavaScript and images have been loaded.