I have run into a problem while trying to scrape data from the website niche.com.
I know that the website uses JavaScript to render its pages.
I am using Java and Selenium (I tried Firefox and Chrome; both give the same result).
I tried to download the page https://www.niche.com/k12/search/best-schools/c/santa-clara-county-ca/
and save the rendered page to a text file on my laptop.
The code:
private WebDriver driver = new ChromeDriver();

public final String load(String url) {
    String html = "";
    try {
        driver.navigate().to(url);
        html = gotHtml();
    } catch (Exception ex) {
        logger.warn(ex);
    }
    return html;
}

public String gotHtml() {
    try {
        return ((JavascriptExecutor) driver).executeScript("return document.documentElement.outerHTML;").toString();
    } catch (Exception ex) {
        logger.error(ex);
    }
    return driver.getPageSource();
}
On all other websites, the script "return document.documentElement.outerHTML;" returned the rendered page, but for this one it returns the unrendered page (the page with its JavaScript code, but without the rendered HTML).
This div should contain the rendered HTML, but I get it empty:
<div class="platform__wrapper" id="app"><!-- react-empty: 1 --></div>
Any ideas?
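One thing worth ruling out (an assumption, since the post does not establish the cause) is that outerHTML is read before React has mounted anything into #app. The sketch below is plain Java with no Selenium dependency, and RenderWait and waitUntil are hypothetical names. It polls a condition until it holds or a timeout expires.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class RenderWait {
    private RenderWait() {}

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException ex) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition: pretends the page is "rendered" on the third poll.
        int[] polls = {0};
        boolean rendered = waitUntil(() -> ++polls[0] >= 3,
                Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("rendered=" + rendered + " after " + polls[0] + " polls");
    }
}
```

In load(), the condition could be a lambda that runs a script such as "return document.getElementById('app').children.length > 0;" through the existing JavascriptExecutor and checks the result against Boolean.TRUE, with gotHtml() called only once the wait succeeds.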
I am getting an HTML block from a server and want to push it into an HTML page in an Android application.
A sample of the HTML coming from the server (this is the exact output, including CRLF etc., as printed in logcat):
<ul>
<li>217</li>
<li>214</li>
</ul>
This is how I pass the output to the application html page:
runJavaScript("setHTML('"+ html +"')");
public void runJavaScript(final String code) {
    webview.post(new Runnable() {
        @Override
        public void run() {
            if (android.os.Build.VERSION.SDK_INT >= android.os.Build.VERSION_CODES.KITKAT) {
                webview.evaluateJavascript(code, null);
            } else {
                webview.loadUrl("javascript:" + code);
            }
        }
    });
}
And this is the JavaScript setHTML function inside the HTML page, which is called from Android:
function setHTML(html){
$("#result").html(html);
}
This is the error:
"Uncaught SyntaxError: Invalid or unexpected token", source:
file:///android_asset/html/en/result.html (1)
I tried debugging by replacing the HTML block with a single word, and that works. I think characters in the HTML such as quotation marks or CRLF may cause the error, but I don't want to escape them (I don't want to set plain text but real HTML). How should I change the current code? Thanks.
I guess it's too late, and this doesn't answer your question exactly, since you don't want to escape any characters (I'm not sure why, though), but maybe it will help somebody.
In my case, before running
runJavaScript("setHTML('"+ html +"')");
I had to escape all single quotation marks, and that was enough.
html = html.replace("'", "\\'");
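Escaping only single quotes may not be enough for the HTML shown in the question, because it also contains CR/LF, and a raw line break inside a quoted JavaScript string literal is a syntax error. Below is a minimal sketch of a broader escape. JsStringEscaper and escapeForJs are hypothetical names, not Android APIs.

```java
final class JsStringEscaper {
    private JsStringEscaper() {}

    /** Escapes text so it can be placed inside a quoted JavaScript string literal. */
    static String escapeForJs(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;   // backslash first, so escapes stay intact
                case '\'': out.append("\\'"); break;
                case '"':  out.append("\\\""); break;
                case '\r': out.append("\\r"); break;    // raw CR/LF would end the literal
                case '\n': out.append("\\n"); break;
                case 0x2028: out.append("\\u2028"); break; // line separator, a line break in older engines
                case 0x2029: out.append("\\u2029"); break; // paragraph separator, same concern
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<ul>\r\n<li>217</li>\r\n<li>214</li>\r\n</ul>";
        // The resulting call is a single line of JavaScript.
        System.out.println("setHTML('" + escapeForJs(html) + "')");
    }
}
```

With this, the call from the post would become runJavaScript("setHTML('" + JsStringEscaper.escapeForJs(html) + "')"). The page still receives real HTML, because the JavaScript parser turns the escape sequences back into the original characters before setHTML runs.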
I'm working on a piece of code that must load an HTML file into my custom WebView. I'm moving my files out of assets to a local folder, and from there I need to load the HTML into the WebView. I need to move the files out of the assets folder, so please don't suggest anything about assets.
This is how I move the files:
File f = new File(context.getFilesDir()+"/TMP/");
try {
String[] assets = {"editor.html", "normalize.css", "rich_editor.js", "style.css"};
for(String a : assets)
{
InputStream is = context.getAssets().open(a);
int size = is.available();
byte[] buffer = new byte[size];
is.read(buffer);
is.close();
FileOutputStream fos = new FileOutputStream(f + a);
fos.write(buffer);
fos.close();
}
} catch (FileNotFoundException e) { }
catch (IOException ex)
{
ex.printStackTrace();
}
This is how I'm trying to load it:
loadUrl("file://"+ new File(context.getFilesDir()+"/TMP/editor.html").getAbsolutePath());
I don't get an error or anything, but I get a white screen and nothing else. I thought that it might not be loading the JavaScript or CSS, but JavaScript is enabled, so that should work. Is any additional editing of the HTML file required because it was moved from one folder to another?
Update:
I know that the file is copied successfully because I have checked it from inside the code and it's all good. I believe the problem is in the WebView method or the linking.
I have also tried loadData() and loadDataWithBaseURL(). Both behave the same: blank screen.
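One thing to check, offered as an assumption and not a confirmed diagnosis: java.io.File drops a trailing separator, so f + a in the copy loop yields a path like .../TMPeditor.html instead of .../TMP/editor.html, and the TMP directory itself is never created. The copy could then succeed while loadUrl points at a file that does not exist. Below is a minimal plain-Java sketch of a copy that joins the path with new File(dir, name). AssetCopySketch and copyTo are hypothetical names. On Android the stream would come from context.getAssets().open(a) and the directory from context.getFilesDir().

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class AssetCopySketch {
    private AssetCopySketch() {}

    /** Copies the stream into dir/name, creating dir if needed, and returns the target file. */
    static File copyTo(InputStream in, File dir, String name) throws IOException {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create directory " + dir);
        }
        File target = new File(dir, name); // inserts the separator between dir and name
        try (InputStream src = in; OutputStream out = new FileOutputStream(target)) {
            // Read in a loop: a single read() is not guaranteed to fill the buffer.
            byte[] buffer = new byte[8192];
            int read;
            while ((read = src.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "TMP/");
        // File normalizes away the trailing separator, so concatenation glues the names together.
        System.out.println("concatenated: " + dir + "editor.html");
        File copied = copyTo(new ByteArrayInputStream("<html></html>".getBytes("UTF-8")),
                dir, "editor.html");
        System.out.println("joined:       " + copied.getPath() + " exists=" + copied.exists());
    }
}
```

The relative links inside editor.html (normalize.css, style.css, rich_editor.js) would then resolve against the same TMP directory, provided all four files are copied the same way.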
I'm trying to build a universal RSS application for Windows 10 that can download the content of an article's full page for offline reading.
After spending a lot of time on Stack Overflow, I found some code:
HttpClientHandler handler = new HttpClientHandler { UseDefaultCredentials = true, AllowAutoRedirect = true };
HttpClient client = new HttpClient(handler);
HttpResponseMessage response = await client.GetAsync(ni.Url);
response.EnsureSuccessStatusCode();
string html = await response.Content.ReadAsStringAsync();
However, this solution doesn't work on some web pages where the content is loaded dynamically.
So the remaining alternative seems to be this: load the web page into the WinRT WebView control and somehow copy and paste the rendered text.
But the WebView doesn't implement any copy/paste method or similar, so there is no easy way to do it.
Finally, I found this post on Stack Overflow (Copying the content from a WebView under WinRT), which seems to deal with exactly the same problem as mine and offers the following solution:
Use the InvokeScript method of the WebView to copy and paste the content through a JavaScript function.
It says: "First, this javascript function must exist in the HTML loaded in the webview."
function select_body() {
var range = document.body.createTextRange();
range.select();
}
and then "use the following code:"
// call the select_body function to select the body of our document
MyWebView.InvokeScript("select_body", null);
// capture a DataPackage object
DataPackage p = await MyWebView.CaptureSelectedContentToDataPackageAsync();
// extract the RTF content from the DataPackage
string RTF = await p.GetView().GetRtfAsync();
// SetText of the RichEditBox to our RTF string
MyRichEditBox.Document.SetText(Windows.UI.Text.TextSetOptions.FormatRtf, RTF);
But what it doesn't say is how to inject the JavaScript function if it doesn't exist in the page I'm loading.
If you have a WebView like this:
<WebView Source="http://kiewic.com" LoadCompleted="WebView_LoadCompleted"></WebView>
Use InvokeScriptAsync in combination with eval() to get the document content:
private async void WebView_LoadCompleted(object sender, NavigationEventArgs e)
{
    WebView webView = sender as WebView;
    string html = await webView.InvokeScriptAsync(
        "eval",
        new string[] { "document.documentElement.outerHTML;" });
    // TODO: Do something with the html ...
    System.Diagnostics.Debug.WriteLine(html);
}
I am trying to get the full HTML source of the web page loaded into the WebView. My code works fine, but it gives me null for one URL (//mobile.twitter.com),
while on other Twitter pages (like //mobile.twitter.com/account) it works fine.
It fails only for that one URL.
My code:
twitter_WebView.addJavascriptInterface(new LoadListener(), "HTMLOUT");
twitter_WebView.loadUrl("javascript:window.HTMLOUT.processHTML(document.documentElement.outerHTML); ");
class LoadListener {
    @JavascriptInterface
    public void processHTML(String html) throws IOException
    {
        pageHTML = html; // html is null here
    }
}
I've got CKEditor embedded on a UserControl in our web application. All works fine with loading the default templates for CKEditor, on my local machine and our server.
I'm fetching templates from a database table, transforming the results to the appropriate JSON format, and then writing that to a JavaScript file to add to CKEDITOR.template_files.
An example of the js content I'm generating in the js file:
CKEDITOR.addTemplates('templateskey',{imagesPath:'',templates:[{title:'LCH - Complaint', image:'', description:'Message Template - Complaints', html:'HtmlContent'}]});
Now my problem is that on our server my dynamically created js file seems to get blocked since it's supposed to load over HTTPS. Either this or my file can't be found.
[blocked] The page at 'https://...' was loaded over HTTPS, but ran
insecure content from 'http://...' (page not found url): this content
should also be loaded over HTTPS.
After this CKEDITOR.config tries to load the "templatesKey" template and fails to do so with:
Uncaught TypeError: Cannot read property 'imagesPath' of undefined
I've downloaded the ASP.Net version of CKEditor and included the project in my solution. I'm setting myCKEditor.TemplatesFiles and myCKEditor.Templates in the code behind:
myCKEditor.TemplatesFiles = "['" + relativePath + "']";
myCKEditor.Templates = "templateskey";
Is the problem that I'm generating the js file dynamically? Or is the problem that the templates plugin is loading content over HTTP rather than HTTPS? Is there a better way to dynamically add templates to CKEditor?
I talked to a friend who has expertise in SSH and HTTPS. This might be a limitation of HTTPS: since I'm generating the content dynamically, it may treat the content as a possible threat and as insecure.
CkEditor - Template loaded from AJAX is a good solution to the problem.
If you're working with ASP.NET, you can build a handler, call it with Ajax, and pass JSON back.
e.g. The handler:
//Implement IRequiresSessionState if you want anything from the session state
public class TemplateHandler : IHttpHandler, IRequiresSessionState
{
/// <summary>
/// You will need to configure this handler in the Web.config file of your
/// web and register it with IIS before being able to use it. For more information
/// see the following link: http://go.microsoft.com/?linkid=8101007
/// </summary>
#region IHttpHandler Members
public bool IsReusable
{
// Return false in case your Managed Handler cannot be reused for another request.
// Usually this would be false in case you have some state information preserved per request.
get { return true; }
}
public void ProcessRequest(HttpContext context)
{
try
{
//write your handler implementation here.
string ID = Convert.ToString(context.Session["ID"]);
DataSet dsTemplates = ExecuteStoredProc("uspTemplateRead");
if (!dsTemplates.ContainsData())
return;
List<Template> templates = new List<Template>();
foreach (DataRow row in dsTemplates.Tables[0].Rows)
{
Template template = new Template();
template.Title = row["Title"].ToString();
template.Image = "template.gif";
template.Description = row["Descr"].ToString();
template.Html = row["Temp"].ToString();
templates.Add(template);
}
byte[] b;
DataContractJsonSerializer jsonSerializer = new DataContractJsonSerializer(typeof(List<Template>));
using (MemoryStream stream = new MemoryStream())
{
jsonSerializer.WriteObject(stream, templates);
b = stream.ToArray();
}
context.Response.Clear();
context.Response.ContentType = "application/json";
context.Response.AddHeader("Content-Length", b.Length.ToString());
context.Response.BinaryWrite(b);
context.Response.Flush();
}
catch (Exception)
{
// Rethrow with "throw;" so the original stack trace is preserved.
throw;
}
}
}
And then the Ajax call:
CKEDITOR.on("instanceReady", function () {
    try {
        var httpRequest = new XMLHttpRequest();
        httpRequest.onreadystatechange = function () {
            // This handler fires on every state change; wait for the complete response (readyState 4).
            if (this.readyState !== 4)
                return;
            var json;
            if (this.responseText == "")
                json = "";
            else
                json = JSON.parse(this.responseText);
            var template = {
                imagesPath: CKEDITOR.getUrl(CKEDITOR.plugins.getPath("templates") + "templates/images/"),
                templates: json
            };
            CKEDITOR.addTemplates('myTemplates', template);
        };
        httpRequest.open('GET', '/handlers/templatehandler.ashx');
        httpRequest.send();
    } catch (ex) {
        console.log(ex);
    }
});