Reading a dynamic JavaScript page using WebRequest - javascript

I have a VB.NET (2010) forms project, and I want to read a webpage that has a JavaScript on it, generating log output.
But when I make a request to the webpage, I only get the static part of the page.
When I open the URL in a browser, it displays dynamic content and is updated regularly.
What's the best way to execute JavaScript remotely from VB.NET?

Related

Call JavaScript PostBack via another Script while parsing Page

I am writing a plugin for Google Chrome which extracts some data from a webpage and saves it to a local DB. I have covered all the parsing of the pages, but some info is on a different tab and for now requires me to navigate manually, as the link is not a basic URL but a JavaScript call. My question is: is there a way to script calling a script on the page I am parsing to force the postback, and if so, how would I go about it? If not, is there another way to solve this?
Below are some links I would like to call from my script on the webpage:
Tax
OC19064888

ASP.NET: on the server side, create PDF copy of HTML, CSS and Javascript webpage, exactly as it appears in Google Chrome

Is it possible to use C# to render an ASP.NET view on the server side and save it as a PDF, preserving all the visual elements that involve CSS and Javascript, exactly as it renders in Chrome? The Javascript includes the latest versions of the standard Bootstrap and d3 libraries, as well as code using d3 to draw SVG charts. The page's CSS heavily uses Bootstrap.
I've tried a few things including IronPdf, but it completely destroys the formatting no matter what options I have tried. The only good results I've been able to get are by actually viewing the web page in Chrome, trying to print it, and saving it as a PDF that way. I'm trying to basically get exactly the same results using backend C# code to generate the PDF, without any user interaction needed. Can this be done? If it's impossible to render perfectly as a PDF I would also be open to other visual file formats that preserve the appearance of the web page.
You can run any install of Chrome in headless mode and send it a command to print:
chrome --headless --disable-gpu --print-to-pdf=file1.pdf https://www.google.co.in/
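As a rough sketch of how a backend process might issue the command above without user interaction, the following Python builds the same argument list and launches Chrome as a child process. The flags are the ones quoted above; the function names, the chrome_path parameter, and the timeout are assumptions made for illustration. Python is used only as a neutral stand-in, and a C# backend would start the same process through its own process API.

```python
import subprocess
from typing import List


def build_print_command(chrome_path: str, url: str, pdf_path: str) -> List[str]:
    """Build the argument list for the headless print command quoted above."""
    return [
        chrome_path,                    # assumed: name or full path of the Chrome binary
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf_path}",
        url,
    ]


def print_to_pdf(chrome_path: str, url: str, pdf_path: str,
                 timeout_s: float = 60.0) -> None:
    """Run Chrome once; raises if it exits non-zero or exceeds the timeout."""
    subprocess.run(
        build_print_command(chrome_path, url, pdf_path),
        check=True,
        timeout=timeout_s,
    )
```

Assuming a Chrome binary named chrome is on the PATH, print_to_pdf("chrome", "https://www.google.co.in/", "file1.pdf") would run the same command as the line above. Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.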
I did this a couple of years ago. I created a microservice that took a URI, a test JavaScript, and a massage script, then returned a PDF.
Test Script:
The test script is injected into the page and is called repeatedly until it returns true. The script should verify that components are in a properly loaded state. (This could be skipped by simply using a long delay prior to printing the pdf)
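The "call repeatedly until it returns true" idea can be sketched as a generic polling loop. This is not the original microservice code: wait_until, is_ready, and the timeout and interval values are hypothetical names and defaults chosen for illustration, and in a real setup is_ready would evaluate the injected test script inside the page.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool],
               timeout_s: float = 30.0,
               interval_s: float = 0.25) -> bool:
    """Call is_ready repeatedly until it returns True or the timeout expires.

    Returns True if the check succeeded, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)  # pause between checks instead of spinning
```

A bounded timeout is a design choice here: a page that never reaches the ready state would otherwise block the print job forever.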
Massage Script:
The massage script is not required. It is injected into the page to alter the javascript or HTML prior to printing the pdf.
I used this heavily to load the entire user JOM including all Angular data stores (NGRX) since the user context was not present in the server-side Chrome instance.
Delayed Printing:
Since delayed printing is not a feature supported by Chrome, I made an endpoint on my server that would hold a GET connection open indefinitely. A script referencing the endpoint was injected into each page to be printed. When the Test Script returned ready, the code would cancel the script request by changing the script tag src to an empty script file that would return. This would conclude the last item that Chrome was waiting on, and the documentready event would fire, thus triggering the Chrome print.
In this way, I was able to control Chrome printing on complex authenticated pages at my server.

C# Collecting data from website after scripts are loaded

I want to download some code from an HTML website, but the data that I need appears only after the JavaScript loads (as far as I know). I tried WebClient, but this gets only the HTML code without any JS changes, and as far as I know there is nothing more I can do with it. Now I'm trying WebBrowser in WPF and Forms. I have a WebBrowser control and I'm navigating to my URL, but I'm getting JS errors and the scripts are still not loading.
webBrowser1.Navigate(new Uri("http://www.polskieszlaki.pl/atrakcje/woj-slaskie/"));
How can I get the webpage fully loaded with all scripts?
By the way, I don't need a web browser. I just need to collect some data, so the HTML code after the scripts are loaded is enough for me.

Is there any way to run asp.net web browser control on server?

ASP.NET. VB.NET 3.5
In order to scrape image URLs automatically from some of our clients' websites, we want to inspect the DOM after JavaScript has completed running as often the rendered HTML changes because of onload() JavaScript. The article:
Get the final generated html source using c# or vb.net
shows how to do that with a form with a web browser control on the client but is there a way to do it all on the server (since our process is called in a background thread anyway when the client navigates off a certain aspx page)?
Tia
See my comment for other links for a response.

How to extract the dynamically generated HTML from a website

Is it possible to extract the HTML of a page as it shows in the HTML panel of Firebug or the Chrome DevTools?
I have to crawl a lot of websites, but sometimes the information is not in the static source code: a JavaScript runs after the page is loaded and creates some new HTML content dynamically. If I then extract the source code, this content is not there.
I have a web crawler built in Java to do this, but it's using a lot of old libraries. Therefore, I want to move to a Rails/Ruby solution for learning purposes. I already played a bit with Nokogiri and Mechanize.
If the crawler is able to execute JavaScript, you can simply get the dynamically created HTML structure using document.firstElementChild.outerHTML.
Nokogiri and Mechanize are currently not able to parse JavaScript. See "Ruby Nokogiri Javascript Parsing" and "How do I use Mechanize to process JavaScript?" for this.
You will need another tool like WATIR or Selenium. Those drive a real web browser, and can thus handle any JavaScript.
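To illustrate why a parser on its own misses script-generated content, here is a minimal sketch using Python's standard html.parser module as a stand-in for Nokogiri. The sample page, the class name, and summarize are all hypothetical. The parser sees the script element as text but never runs it, so the element that the script would fill stays empty.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Hypothetical page: the <div> is filled in only when a browser runs the script.
STATIC_SOURCE = """<html><body>
<div id="log"></div>
<script>document.getElementById("log").textContent = "line 1";</script>
</body></html>"""


class VisibleTextParser(HTMLParser):
    """Collects text outside <script> elements and counts the scripts seen."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.script_count = 0
        self.visible_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.script_count += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as plain text; they are never executed.
        if not self._in_script and data.strip():
            self.visible_text.append(data.strip())


def summarize(html: str) -> Tuple[int, List[str]]:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.script_count, parser.visible_text


print(summarize(STATIC_SOURCE))  # (1, [])
```

The result shows one script found and no visible text, which is the same gap a static crawler hits. A tool that drives a real browser runs the script first, so the generated "line 1" would then be present in the extracted HTML.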
You can't fetch the records coming from the database side. You can only fetch the HTML code, which is static.
The JavaScript must be requesting the records from the database using a query request, which can't be fetched by the crawler.
