I need to change a web page's source in the GeckoFX web browser, including HTML, CSS and JS.
This is my code:
geckoWebBrowser1.Navigate("http://example.com/");
geckoWebBrowser1.DocumentCompleted += GeckoWebBrowser1_DocumentCompleted;
private void GeckoWebBrowser1_DocumentCompleted(object sender, Gecko.Events.GeckoDocumentCompletedEventArgs e)
{
WebClient w = new WebClient();
string s = (w.DownloadString("http://example.com/"));
//after do changes on (s)
geckoWebBrowser1.LoadHtml(s, "http://example.com/");
}
But it's not working on javascript, can anyone help me?
The problem is that geckoWebBrowser1.LoadHtml also triggers GeckoWebBrowser1_DocumentCompleted(). So you will loop endlessly.
Move the LoadHtml to another function, or change the content live as below.
Also, are you using WebClient to download the same page? There is no need; the source is already available.
GeckoHtmlElement element = null;
var geckoDomElement = e.Window.Document.DocumentElement;
if (geckoDomElement is GeckoHtmlElement)
{
element = (GeckoHtmlElement)geckoDomElement;
element.InnerHtml = element.InnerHtml.Replace("Google", "Göggel");
}
JavaScript is most easily executed using the following:
using (AutoJSContext context = new AutoJSContext(ActiveBrowser.Window.DomWindow))
{
var result = context.EvaluateScript("testFunction();");
}
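One caveat with replacing text through InnerHtml is that it also rewrites any tag, attribute, or inline script that happens to contain the string. As a hedged alternative, the sketch below walks only the text nodes. It is plain JavaScript that could be passed to EvaluateScript, assuming the script runs in the page's context; replaceText is a hypothetical helper name, not part of GeckoFX.

```javascript
// Replace every occurrence of `from` with `to` in text nodes only,
// skipping <script> and <style> so code and CSS are left untouched.
// Returns the number of text nodes that were changed.
function replaceText(node, from, to) {
  let changed = 0;
  if (node.nodeType === 3) { // 3 = text node
    if (node.nodeValue.includes(from)) {
      node.nodeValue = node.nodeValue.split(from).join(to);
      changed += 1;
    }
  } else if (node.nodeType === 1 && // 1 = element node
             node.nodeName !== 'SCRIPT' && node.nodeName !== 'STYLE') {
    for (const child of Array.from(node.childNodes)) {
      changed += replaceText(child, from, to);
    }
  }
  return changed;
}
```

In the page it would be called as replaceText(document.body, 'Google', 'Göggel').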
I just managed to implement a small webserver on my Raspberry Pi.
The webserver is created as a UWP headless app.
It can use JavaScript, which is pretty helpful.
I have only just started with HTML and JS, so I'm a big noob at this and need some help.
I already managed to show the same data I show on the webpage in a headed app on the same device.
Now I want to be able to manipulate the data from the webpage.
But I don't know how I'm supposed to do that.
I parse the HTML / JS as a complete string so I can't use variables I defined in code. I would need another way to do this.
My code for the webserver is currently this:
public sealed class StartupTask : IBackgroundTask
{
private static BackgroundTaskDeferral _deferral = null;
public async void Run(IBackgroundTaskInstance taskInstance)
{
_deferral = taskInstance.GetDeferral();
var webServer = new MyWebServer();
await ThreadPool.RunAsync(workItem => { webServer.Start(); });
}
}
class MyWebServer
{
private const uint BufferSize = 8192;
public async void Start()
{
var listener = new StreamSocketListener();
await listener.BindServiceNameAsync("8081");
listener.ConnectionReceived += async (sender, args) =>
{
var request = new StringBuilder();
using (var input = args.Socket.InputStream)
{
var data = new byte[BufferSize];
IBuffer buffer = data.AsBuffer();
var dataRead = BufferSize;
while (dataRead == BufferSize)
{
await input.ReadAsync(buffer, BufferSize, InputStreamOptions.Partial);
// Append only the bytes actually read, not the whole buffer.
dataRead = buffer.Length;
request.Append(Encoding.UTF8.GetString(data, 0, (int)dataRead));
}
}
string query = GetQuery(request);
using (var output = args.Socket.OutputStream)
{
using (var response = output.AsStreamForWrite())
{
string htmlContent = "<html>";
htmlContent += "<head>";
htmlContent += "<script>";
htmlContent += "function myFunction() {document.getElementById('demo').innerHTML = 'Paragraph changed.'}";
htmlContent += "</script>";
htmlContent += "</head>";
htmlContent += "<body>";
htmlContent += "<h2>JavaScript in Head</h2>";
htmlContent += "<p id='demo'>A paragraph.</p>";
htmlContent += "<button type='button' onclick='myFunction()'>Try it!</button>";
htmlContent += "</body>";
htmlContent += "</html>";
var html = Encoding.UTF8.GetBytes(htmlContent);
using (var bodyStream = new MemoryStream(html))
{
var header =
$"HTTP/1.1 200 OK\r\nContent-Length: {bodyStream.Length}\r\nConnection: close\r\n\r\n";
var headerArray = Encoding.UTF8.GetBytes(header);
await response.WriteAsync(headerArray, 0, headerArray.Length);
await bodyStream.CopyToAsync(response);
await response.FlushAsync();
}
}
}
};
}
public static string GetQuery(StringBuilder request)
{
var requestLines = request.ToString().Split(' ');
var url = requestLines.Length > 1
? requestLines[1]
: string.Empty;
var uri = new Uri("http://localhost" + url);
var query = uri.Query;
return query;
}
}
Your question is a bit vague, so I have to guess what you're trying to do. Do you mean that a browser (or another app with a Web view) will connect to your Pi server, grab some data off it, and then manipulate the data to format them / display them in a particular way on the page? If so, then first you need to decide how you get the data. You seem to imply the data will just be a stream of HTML, though it's not clear how you'll be passing that string to the browser. Traditional ways of grabbing the data might be with Ajax and possibly JSON, but it's also possible to use an old-fashioned iframe (maybe a hidden one) -- though if starting from scratch, Ajax would be better.
The basic issue is to know: what page will access the data on the server, and in what format? Is it a local page served from the client app's filestore that will then open a connection to the server, grab the data and display them in a <div> or an <iframe>? Or is it a page on your server that comes with the data incorporated in one part of the DOM, which you want to transform and display in another element?
Let's now assume your client app has received the data in an element like <div id="myData">data</div>. A script on the client page can grab those data as a string with document.getElementById('myData').innerHTML (see getElementById). You can then transform the data as necessary with JavaScript methods. Then there are various DOM techniques for inserting the transformed data either back into the same element or into a different one.
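A minimal sketch of the read, transform and insert steps, assuming the page contains the myData element from the example plus a second, hypothetical element with the id output. The transformation itself (trimming and upper-casing) is only a placeholder.

```javascript
// Pure transformation step. Placeholder logic: replace it with whatever
// formatting the data actually needs.
function transformData(raw) {
  return raw.trim().toUpperCase();
}

// Read the string from one element, transform it, and write it to another.
// 'output' is a hypothetical element id; 'myData' comes from the example.
function refresh(doc) {
  const source = doc.getElementById('myData');
  const target = doc.getElementById('output');
  if (!source || !target) {
    return false; // one of the elements is missing, nothing to do
  }
  // textContent inserts the value as plain text, so it is not parsed as HTML.
  target.textContent = transformData(source.innerHTML);
  return true;
}

// In a browser, run once the DOM is ready.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', () => refresh(document));
}
```

Keeping the transformation in its own function makes it easy to try out without a page.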
Instead, let's assume you have received the data via XMLHttpRequest. Then you'll need to identify just the data you want from the received object (that might involve turning the object into a string and using a regular expression, or more likely, use DOM selection methods on the object till you have the part of the data you want). When you've extracted the data / node / element, you can insert it into a <div> on your page as above.
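As a rough sketch of the XMLHttpRequest route, assuming the server is changed to return a small JSON document: the { "temperature": ... } payload, the function names and the URL passed in are all made up for illustration, and the server code shown in the question would need a matching endpoint.

```javascript
// Extract the wanted value from a JSON response body.
// The "temperature" field is a hypothetical example payload.
function extractTemperature(responseText) {
  const payload = JSON.parse(responseText);
  return typeof payload.temperature === 'number' ? payload.temperature : null;
}

// Request the data and hand the extracted value to a callback.
function loadTemperature(url, onValue) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = () => {
    if (xhr.status === 200) {
      onValue(extractTemperature(xhr.responseText));
    }
  };
  xhr.send();
}
```

On the page, the callback could then write the value into an element, for example loadTemperature('/data', v => { document.getElementById('myData').textContent = v; }).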
Sorry if this is all a bit vague and abstract, but hopefully it can point you in the right direction to look up further things as needed. https://www.w3schools.com/ is a great resource for beginners.
I'm trying to parse vacancies from https://www.epam.com/careers/job-listings?query=java&department=all&city=Kyiv&country=Ukraine
But I don't get anything except plain text like "Job Listings Global/English Deutschland/Deutsch Россия/Русский"
The problem is that when you load the page, the browser runs a script that loads the vacancies, and as far as I understand, Jsoup can't "simulate" a browser and run a script. I tried HtmlUnit, but it also did nothing.
Question: What should I do? Am I doing something wrong with HtmlUnit?
Jsoup
Element page = Jsoup.connect("https://www.epam.com/careers/job-listings?sort=best_match&query=java&department=all&city=all&country=Poland").get();
HtmlUnit
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52)) {
page = webClient.getPage("https://www.epam.com/careers/job-listings?query=java&department=all&city=Kyiv&country=Ukraine");
}
I think I need to manually run some script with
result = page.executeJavaScript("function aa()");
But which one?
You just need to wait a little as hinted here.
You can use:
try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
String url = "https://www.epam.com/careers/job-listings?query=java&department=all&city=Kyiv&country=Ukraine";
HtmlPage page = webClient.getPage(url);
Thread.sleep(3_000);
System.out.println(page.asXml());
}
I'm trying to program my first website scraper, and my first step is to save the HTML to a string. However, from what I can tell, the data that I need to get is not in the HTML code per se, but rather is added after JavaScript executes some stuff.
My current code is this:
let myURLString = "Example URL"
let myURL = URL(string: myURLString)
var myHTMLString = ""
do {
myHTMLString = try String(contentsOf: myURL!)
} catch let error {
print("Error: \(error)")
}
But this doesn't seem to execute the JavaScript and instead just gives me the 'unprocessed' HTML.
I read this answer here, but it's written in Swift 2.0, and since I, to be honest, didn't really understand what was going on (I don't have much programming experience), I couldn't get it to work in Swift 3.
So, is there a way to take the HTML from a website, run the JavaScript and then save that as a String in Swift 3? And if so, how do you do it?
Thanks!
After some digging I got something that worked:
import Cocoa
import WebKit
class ViewController: NSViewController, WebFrameLoadDelegate {
@IBOutlet var myWebView: WebView!
override func viewDidLoad() {
super.viewDidLoad()
// Do any additional setup after loading the view.
self.myWebView.frameLoadDelegate = self
let urlString = "YOUR HTTPS URL"
self.myWebView.mainFrame.load(NSURLRequest(url: NSURL(string: urlString)! as URL) as URLRequest!)
}
override var representedObject: Any? {
didSet {
// Update the view, if already loaded.
}
}
func webView(_ sender: WebView!, didFinishLoadFor frame: WebFrame!) {
let doc = myWebView.stringByEvaluatingJavaScript(from: "document.documentElement.outerHTML")! //get it as html
//doc now has the 'processed HTML'
}
}
I'm trying to build a universal rss application for Windows 10 that could be able to download the content of the full article's page for offline consultation.
So after spending a lot of time on stackoverflow I've found some code:
HttpClientHandler handler = new HttpClientHandler { UseDefaultCredentials = true, AllowAutoRedirect = true };
HttpClient client = new HttpClient(handler);
HttpResponseMessage response = await client.GetAsync(ni.Url);
response.EnsureSuccessStatusCode();
string html = await response.Content.ReadAsStringAsync();
However this solution doesn't work on some web page where the content is dynamically called.
So the alternative that remains seems to be that one: load the web page into the Webview control of WinRT and somehow copy and paste the rendered text.
BUT, the Webview doesn't implement any copy/paste method or similar so there is no way to do it easily.
And finally I found this post on stackoverflow (Copying the content from a WebView under WinRT) that seems to deal with exactly the same problem as mine, with the following solution:
Use the InvokeScript method from the webview to copy and paste the content through a javascript function.
It says: "First, this javascript function must exist in the HTML loaded in the webview."
function select_body() {
var range = document.body.createTextRange();
range.select();
}
and then "use the following code:"
// call the select_body function to select the body of our document
MyWebView.InvokeScript("select_body", null);
// capture a DataPackage object
DataPackage p = await MyWebView.CaptureSelectedContentToDataPackageAsync();
// extract the RTF content from the DataPackage
string RTF = await p.GetView().GetRtfAsync();
// SetText of the RichEditBox to our RTF string
MyRichEditBox.Document.SetText(Windows.UI.Text.TextSetOptions.FormatRtf, RTF);
But what it doesn't say is how to inject the JavaScript function if it doesn't exist in the page I'm loading.
If you have a WebView like this:
<WebView Source="http://kiewic.com" LoadCompleted="WebView_LoadCompleted"></WebView>
Use InvokeScriptAsync in combination with eval() to get the document content:
private async void WebView_LoadCompleted(object sender, NavigationEventArgs e)
{
WebView webView = sender as WebView;
string html = await webView.InvokeScriptAsync(
"eval",
new string[] { "document.documentElement.outerHTML;" });
// TODO: Do something with the html ...
System.Diagnostics.Debug.WriteLine(html);
}
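The same eval approach can presumably be used to run the selection logic from the question even when the page does not define select_body. Below is a hedged sketch of such a function. The name selectBody and the fallback to the standard Range/Selection API are additions for illustration, and whether the WebView accepts the resulting string through eval is an assumption.

```javascript
// Select the whole body of a document. The document and window are passed
// in as parameters so the function can be exercised outside a browser.
function selectBody(doc, win) {
  if (typeof doc.body.createTextRange === 'function') {
    // Legacy Internet Explorer API, as used in the quoted snippet.
    doc.body.createTextRange().select();
    return 'textRange';
  }
  // Standard DOM Range/Selection API.
  const range = doc.createRange();
  range.selectNodeContents(doc.body);
  const selection = win.getSelection();
  selection.removeAllRanges();
  selection.addRange(range);
  return 'selection';
}

// Source text that defines the function and calls it immediately.
// This is the string that would be handed to eval in the page.
const injectedScript = '(' + selectBody.toString() + ')(document, window);';
```

Because the script defines and calls the function in one expression, nothing has to exist in the loaded page beforehand.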
So I’m trying to interact with Flash variables using jQuery. The original author of the Flash-based program has not gotten back to me yet, so I thought I'd ask here. I'm not that strong in AS3, so forgive me.
Within the original ActionScript, I added a new import statement:
import flash.external.*;
There's a function called ini that initializes the program, and I added this towards the bottom:
//MODS===========
ExternalInterface.addCallback("gotoLastPage", gotoLastPage);
//===============
For all intents and purposes, just know that there is an existing and working function called gotoLastPage. It is declared as private void and works in the default application. All seemed fine there; I got no errors when I recompiled the swf file.
Now the swf object is initialized like this
var flashvars = {};
flashvars.pages = "reader_fl/pages.xml";
flashvars.settings = "reader_fl/settings.xml";
var params = {};
params.quality = "high";
params.scale = "noscale";
params.wmode = "transparent";
var attributes = {};
attributes.align = "middle";
attributes.allowFullscreen = "true";
swffit.showScrollV();
swfobject.embedSWF("reader_fl/PageFlip_v6.swf", "Reader_Window_player", "100%", "100%",
"10.0.0", false, flashvars, params, attributes);
As a note, I'm using swfobject. The reader comes up fine and is wrapping around a div called Reader_Window_player.
Now when I go to jQuery, I tried:
$("#Floating_CtrlStart").click(function(){
var Reader = $('#Reader_Window_player')[0];
Reader.gotoLastPage();
})
However, I still can’t seem to access the gotoLastPage. Console says that gotoLastPage is not defined.
Any help here?
Are you opening the html page from the file system and not served from a web server? If so, that would explain why it's not working.
Calls to ExternalInterface fail if the content (html and swf) is in the local-with-networking or local-with-filesystem sandbox (source: http://help.adobe.com/en_US/ActionScript/3.0_ProgrammingAS3/WS5b3ccc516d4fbf351e63e3d118a9b90204-7c9b.html).
I love jQuery, but I usually do that the old-fashioned way:
var getSwf = function (swfName) {
var isIE = navigator.appName.indexOf("Microsoft") != -1;
return (isIE) ? window[swfName] : document[swfName];
}
getSwf("Reader_Window_player").gotoLastPage();
Also make sure you have the following in your JS:
attributes.id = "Reader_Window_player";
attributes.name = "Reader_Window_player";
and as @Cherniv stated in the comments:
params.allowScriptAccess = "always";
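A slightly more defensive variant of the getSwf helper above, as a sketch only. Callbacks registered with addCallback are typically not available until the SWF has finished loading, so checking before calling avoids the "not defined" error at the call site. The names findSwf and callSwf are hypothetical, and the typeof check is an assumption about how the browser exposes the callback.

```javascript
// Look up the embedded SWF object by its id/name, checking both lookup
// locations used by the getSwf helper.
function findSwf(name, win, doc) {
  return win[name] || doc[name] || null;
}

// Call an ExternalInterface callback only if the SWF exposes it.
// Returns true when the call was made, false otherwise.
function callSwf(name, method, win, doc) {
  const swf = findSwf(name, win, doc);
  if (!swf || typeof swf[method] !== 'function') {
    return false;
  }
  swf[method]();
  return true;
}
```

In the click handler this would be used as callSwf("Reader_Window_player", "gotoLastPage", window, document), with a false result meaning the SWF or its callback is not ready yet.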