Get HTML page after JavaScript has been loaded - javascript

I need to get the HTML of a page after its JavaScript has run.
I use C++ and Qt:
...
QNetworkAccessManager *manager = new QNetworkAccessManager(this);
connect(manager, &QNetworkAccessManager::finished, this, &MainWidget::onFinished);
manager->get(QNetworkRequest(QUrl("http://website.com")));
...
void MainWidget::onFinished(QNetworkReply *reply)
{
qDebug() << QString(reply->readAll());
}
but I am only able to get something like this:
<script>
loadMap("mapContainer");
var baseStopId;
function imReady()
{
getMovie().addListener("MARKER_SHOW", "loadSurrounding");
...
</script>
and I need the results.
Not the JavaScript itself, but the final HTML page after the JavaScript has been processed. Any advice? I have found something for Python, but I would like to know whether there are any C++ options.

Related

How to trigger jQuery script on site by Java parser

I'm trying to parse vacancies from https://www.epam.com/careers/job-listings?query=java&department=all&city=Kyiv&country=Ukraine
But I don't get anything except plain text like "Job Listings Global/English Deutschland/Deutsch Россия/Русский"
The problem is that when the page loads, the browser runs a script that loads the vacancies, and as far as I understand, Jsoup can't "simulate" a browser and run that script. I tried HtmlUnit, but it didn't help either.
Question: What should I do? Am I doing something wrong with HtmlUnit?
Jsoup
Document page = Jsoup.connect("https://www.epam.com/careers/job-listings?sort=best_match&query=java&department=all&city=all&country=Poland").get();
HtmlUnit
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52)) {
page = webClient.getPage("https://www.epam.com/careers/job-listings?query=java&department=all&city=Kyiv&country=Ukraine");
}
I think I need to manually run some script with
result = page.executeJavaScript("function aa()");
But which one?
You just need to wait a little as hinted here.
You can use:
try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
String url = "https://www.epam.com/careers/job-listings?query=java&department=all&city=Kyiv&country=Ukraine";
HtmlPage page = webClient.getPage(url);
Thread.sleep(3_000);
System.out.println(page.asXml());
}
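A fixed sleep can turn out too short on a slow connection and wasteful on a fast one. As a rough sketch of an alternative, the helper below polls a condition until it holds or a timeout expires. It uses only the Java standard library; PollUntil and waitUntil are hypothetical names, not part of HtmlUnit, and the condition in main is just a stand-in for a check such as "the page XML now contains the job list".

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not HtmlUnit API): polls a condition until it holds or a timeout expires.
final class PollUntil {

    /**
     * Returns true as soon as the condition is true. Returns false if timeoutMillis
     * elapses first, or if the waiting thread is interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 200 ms after start.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 3_000, 50);
        System.out.println("ready = " + ready);
    }
}
```

With HtmlUnit the condition could be something like checking page.asXml() for an expected element. HtmlUnit's WebClient also has waitForBackgroundJavaScript(timeoutMillis), which may be a better fit than either sleeping or hand-rolled polling.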

load webpage completely in C# (contains page-load scripts)

I'm trying to load a webpage in the background of my application. The following code shows how I am loading a page:
request = (HttpWebRequest)WebRequest.Create("http://example.com");
request.CookieContainer = cookieContainer;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
Stream st = response.GetResponseStream();
StreamReader sr = new StreamReader(st);
string responseString = sr.ReadToEnd();
sr.Close();
st.Close();
}
As you know, the server responds with HTML and some JavaScript, but much of the page content is added later by JavaScript functions, so I have to interpret or execute the first HTTP response.
I tried to use the System.Windows.Forms.WebBrowser object to load the webpage completely, but its engine is too weak for this.
So I tried CEFSharp (embedded Chromium browser). It's great and works fine, but I have some trouble with it. The following is how I use CEFSharp to load a webpage:
ChromiumWebBrowser MainBrowser = new ChromiumWebBrowser("http://Example/");
MainBrowser.FrameLoadEnd += MainBrowser_FrameLoadEnd; // MainBrowser_FrameLoadEnd: handler method (name assumed)
panel1.Controls.Add(MainBrowser);
MainBrowser.LoadHtml(responseString,"http://example.com");
It works fine when I use this code in Form1.cs and add MainBrowser to a panel. But I want to use it in another class: ChromiumWebBrowser is actually part of a custom object that works in the background, and 10 or 20 of these custom objects may be working at the same time. In this situation ChromiumWebBrowser doesn't work any more!
The second problem is threading. When I call MainBrowser.LoadHtml(responseString,"http://example.com");
it doesn't return any result, so I have to pause code execution with a Semaphore and wait for the result in this event: MainBrowser.FrameLoadEnd
So I wish my code could be something like this:
request = (HttpWebRequest)WebRequest.Create("http://example.com");
request.CookieContainer = cookieContainer;
string responseString="";
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
Stream st = response.GetResponseStream();
StreamReader sr = new StreamReader(st);
responseString = sr.ReadToEnd();
sr.Close();
st.Close();
}
string FullPageContent = SomeBrowserEngine.LoadHtml(responseString);
//Do stuffs
Can you please show me how to do this? Do you know any other web browser engines that work the way I want?
Please tell me if I'm doing anything wrong with CEFSharp or with the other concepts.
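The "start the load, then block until the load-finished event fires" pattern described above is not specific to any one engine. Here is a minimal sketch of it in Java rather than C#, using CompletableFuture instead of a semaphore. FakeBrowser, loadHtml and the onLoadEnd callback are hypothetical stand-ins for illustration only; they are not CEFSharp API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class BlockingLoad {

    /** Hypothetical stand-in for an engine that reports the rendered HTML through a callback. */
    static final class FakeBrowser {
        void loadHtml(String html, Consumer<String> onLoadEnd) {
            // Simulates asynchronous rendering on another thread.
            new Thread(() -> onLoadEnd.accept(html + "<!-- rendered -->")).start();
        }
    }

    /** Starts the load, then blocks the caller until the callback fires or the timeout expires. */
    static String loadAndWait(FakeBrowser browser, String html, long timeoutMillis) {
        CompletableFuture<String> done = new CompletableFuture<>();
        browser.loadHtml(html, done::complete); // the callback completes the future
        try {
            return done.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            // Covers timeout, interruption and failures reported through the future.
            throw new IllegalStateException("page did not finish loading", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAndWait(new FakeBrowser(), "<p>hi</p>", 2_000));
    }
}
```

Each call creates its own future, so several such loads can wait independently at the same time. Whether a real embedded browser can be driven this way from background threads depends on that engine's threading rules.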

Invoking javascript file with dom access

I have a javascript file that has code with DOM access
var a=document.getElementById("abc").value
I have html file, that contains all DOM information
<html>...<input id="abc" ...></html>
Is there any way to have C# invoke the JavaScript file and return the value of a back to the C# program?
In reality the JavaScript can be much more complex, and I need to channel those "values of interest" back to C#, but let's just consider the simple example mentioned here.
Possible directions I can think of are https://jint.codeplex.com/ or the WebBrowser control. The challenge is that this involves not only the JavaScript but also the HTML file.
What I want to know is:
Is there a way to channel a variable's value from JavaScript back to C#?
How can I get JavaScript to evaluate DOM elements from an HTML file?
Using WebBrowser and Jint:
using (WebBrowser browser = new WebBrowser())
{
browser.ScriptErrorsSuppressed = true;
browser.DocumentText = @"<html><head/><body><input id=""abc"" value=""this is the value in input""></body></html>";
// Wait for control to load page
while (browser.ReadyState != WebBrowserReadyState.Complete)
Application.DoEvents();
dynamic d = (dynamic)browser.Document.DomDocument; // get the ActiveX DOM
var jengine = new Jint.Engine();
jengine.SetValue("document", d);
try
{
string val=jengine.Execute(@"var a=document.getElementById('abc').value;").GetValue("a").ToString();
Console.WriteLine(val);
}
catch (Jint.Runtime.JavaScriptException je)
{
Console.WriteLine(je);
}
}

Call a Java Function using Browser's Client Side JavaScript

Good morning!
I have been working on a client side browser based app using JavaScript that (all of a sudden) needs the capability to save and load files locally.
The saved files are plain text (.txt) files.
I have managed to get JavaScript to read existing text files. However, I am unable to find reliable information on how to create and edit the contents of these files.
Based on what I see online, I am under the impression that you can't do this with JavaScript alone.
I found out from another source that the best way to do this is outsource the file writing/editing to a Java file and let Java do the work.
I found a code snippet and tweaked it around a bit, but it is not working and I seem to be at a loss:
JAVASCRIPT
<!Doctype html>
<html>
<OBJECT ID="Test" height=0 width=0
CLASSID="CLSID:18F79884-E141-49E4-AB97-99FF47F71C9E" CODEBASE="JavaApplication2/src/TestJava.java" VIEWASTEXT>
</OBJECT>
<script language="Javascript">
var Installed;
Installed = false;
try
{
if (Test==null)
Installed = false;
else
Installed = true;
}
catch (e)
{
Installed = false;
}
alert ("Installed :- " + Installed);
TestStr = Test.SendStr("Basil");
alert (TestStr);
</script>
</html>
JAVA
import javax.swing.*;
public class TestJava {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
}
public String SendStr(String lStr)
{
return lStr + "!!!";
}
}
If someone could point me in the right direction or even just explain why this isn't working, I would appreciate it.
I believe the sandbox issue prevents all browsers from performing any and all local file writing, without an enormous amount of working around the access restrictions. It is easier to write files remotely on the server than to write them locally to the client. This is true across all browsers.
So while it may be possible to perform the load function, you cannot perform the 'save' function on the local machine.

PDF hostContainer callback

Following this SO solution here to notify clients of a click event in a PDF document, how is it possible to notify the client when the PDF gets submitted by the client using the this.myPDF.submitForm("localhost/Handler.ashx?r=2") function?
The PDF File is created inside a user control then rendered into a HTML object:
string container = ("<object data='/myfile.pdf' type='application/pdf'></object>");
The JS file attached to the PDF is done like this:
var webClient = new WebClient();
string htmlContent = webClient.DownloadString(fileurl + "pdf_script.js");
PdfAction action = PdfAction.JavaScript(htmlContent, pdfstamper.Writer);
pdfstamper.Writer.SetOpenAction(action);
And the content of the js file:
this.disclosed = true;
if (this.external && this.hostContainer) {
function onMessageFunc(stringArray) {
try {
this.myPDF.submitForm("http://localhost/Handler.ashx?EmpNo=12345" + "#FDF", false);
}
catch (e) {
}
}
function onErrorFunc(e) {
console.show();
console.println(e.toString());
}
try {
if (!this.hostContainer.messageHandler)
this.hostContainer.messageHandler = new Object();
this.hostContainer.messageHandler.myPDF = this;
this.hostContainer.messageHandler.onMessage = onMessageFunc;
this.hostContainer.messageHandler.onError = onErrorFunc;
this.hostContainer.messageHandler.onDisclose = function () { return true; };
}
catch (e) {
onErrorFunc(e);
}
}
When the submitForm call is made the PDF contents (form fields) get saved successfully and an alert is displayed in the PDF by doing this:
message = @"%FDF-1.2
1 0 obj
<<
/FDF
<<
/Status (""Success!"")
>>
>>
endobj
trailer
<</Root 1 0 R>>
%%EOF";
return message;
What I'm trying to do is to get the PDF to callback the client after the form submit call sent from this client, a way to acknowledge the client that the form has been submitted, not in a form of an alert, but rather, a way to trigger a function in the host (the container, an iframe, object...etc).
The FDF response you used was unknown to me, so I've learned something new from your question. I've studied the AcroJS Reference and the FDF specification in the PDF Reference, and now I have a better understanding of what your code does. Thank you for that.
I assume that you already know how to trigger a JavaScript message in an HTML file using a JavaScript call from a PDF. See the createMessageHandler() in the JavaScript Communication between HTML and PDF article.
I interpret your question as: "How do I invoke this method after a successful submission of the data?"
If there's a solution to this question, it will involve JavaScript. I see that one can add JavaScript in an FDF file, but I'm not sure if that JavaScript can 'talk to' HTML. I'm not sure if you can call a JavaScript function in your initial PDF from the FDF response. If it's possible, you should add a JavaScript entry to your PDF similar to the /Status entry.
The value of this entry is a dictionary, something like:
<<
/Before (app.alert\("before!"\))
/After (app.alert\("after"\))
/Doc [/MyDocScript1 (myFunc1\(\))
/MyDocScript2 (myFunc2\(\))]
>>
In your case, I would remove the /Before and /Doc keys. I don't think you need them, I'd reduce the dictionary to:
<<
/After (talkToHtml\(\))
>>
Where talkToHtml() is a method already present in the PDF:
function talkToHtml() {
var names = new Array();
names[0] = "Success!";
try{
this.hostContainer.postMessage(names);
}
catch(e){
app.alert(e.message);
}
}
I don't know if this will work. I've never tried it myself. I'm basing my answer on the specs.
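To make the idea concrete, here is a rough sketch of how the server side could assemble such an FDF response. It is written in Java rather than the C# handler from the question, and FdfResponse, build and escapePdfString are hypothetical names. It only builds the text; whether the viewer actually runs the /After script is exactly the part I have not verified.

```java
// Hypothetical helper: assembles an FDF response body as plain text.
final class FdfResponse {

    /** Escapes backslashes and parentheses so the text is safe inside a PDF literal string. */
    static String escapePdfString(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /** Builds an FDF body with a /Status message and a /JavaScript dictionary holding one /After script. */
    static String build(String status, String afterScript) {
        return String.join("\n",
            "%FDF-1.2",
            "1 0 obj",
            "<<",
            "/FDF",
            "<<",
            "/Status (" + escapePdfString(status) + ")",
            "/JavaScript << /After (" + escapePdfString(afterScript) + ") >>",
            ">>",
            ">>",
            "endobj",
            "trailer",
            "<</Root 1 0 R>>",
            "%%EOF");
    }

    public static void main(String[] args) {
        // talkToHtml() is assumed to be defined in the PDF already, as described above.
        System.out.println(build("Success!", "talkToHtml()"));
    }
}
```

The handler would return this text as the response to the submitForm call, in place of the /Status-only message.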
I don't know if you really need to use FDF. Have you tried adding JavaScript to your submitForm() method? Something like:
this.myPDF.submitForm({
cURL: "http://localhost/Handler.ashx?EmpNo=12345",
cSubmitAs: "FDF",
oJavaScript: {
Before: 'app.alert("before!")',
After: 'app.alert("after")',
Doc: ["MyDocScript1", "myFunc1()",
"MyDocScript2", "myFunc2()" ]
}
});
This will only work if you submit as FDF. I don't think there's a solution if you submit an HTML query string.
In case you're wondering what MyDocScript1 and MyDocScript2 are:
Doc defines an array defining additional JavaScript scripts to be
added to those defined in the JavaScript entry of the document’s name
dictionary. The array contains an even number of elements, organized
in pairs. The first element of each pair is a name and the second
is a text string or text stream defining the script corresponding
to that name. Each of the defined scripts is added to those already
defined in the name dictionary and then executed before the script
defined in the Before entry is executed. (ISO-32000-1 Table 245)
I'm not sure if all of this will work in practice. Please let me know either way.
