Parse HTML contents of Webview - Android - javascript

I'd like to detect when my WebView loads a certain page, for example, an incorrect-login page. I've tried using onLoadResource and shouldOverrideUrlLoading, but I can't get either to work, and I'm thinking a better way would be to parse the HTML whenever the WebView starts loading a page, and if a certain string is found within the HTML, then do whatever.
Is there a method to do this? I've tried using TagSoup, but I have no idea how to hook it into my WebView. Here's what my code looks like now:
// Register the client before starting the load so no callbacks are missed
mWebview.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url) {
        // url is the page that just finished loading (getUrl() can return null)
        if (url != null && url.contains("/loginf")) {
            view.stopLoading();
            MainActivity.this.setContentView(R.layout.preweb);
        }
    }
});
String fullpost = "pass=" + passwordt + "&user=" + usernamet + "&uuid=" + UUID;
String url = "mydomain.com";
// postUrl() expects the raw form body as bytes; "BASE64" is not a character set
mWebview.postUrl(url, fullpost.getBytes(StandardCharsets.UTF_8));
Basically, the postUrl call is triggered when the user clicks a button in a layout; that starts the WebView, and then I call setContentView with the layout that contains the WebView.
From there, if the login info is correct, the page goes to XXX, and if it's incorrect, it goes to YYY. So I want to detect immediately (and on every page load from then on) whether YYY is loaded, and if so, //domagic. Hope that makes sense. Because the redirect from url to XXX or YYY is automatic and not initiated by the user, shouldOverrideUrlLoading doesn't work, and I can't figure out how to use onLoadResource, so I'm completely lost.
My current thought is to load everything in a separate thread and then use the WebView only to display the content (that way I can parse the HTML), but I'm not sure how that would work or even how to do it.
Anyone have any ideas or suggestions?

I think I've read about a way to get a WebView's content as a text string. Then you could use jsoup to parse it. [Nah, you don't even need jsoup; just an indexOf string check.]
I'd suggest you consider handling the login with an HTTP client. It gives you flexibility and seems the more proper way to go. I've been using the loopj library for HTTP GET and POST requests; it allows for simpler code, at least for me, a relative Android newbie. Here's some code from my project to get you thinking. I've left out things like the progress bar and cookie management.
private void loginFool() {
    String urlString = "http://www.example.com/login/";
    // username & password
    RequestParams params = new RequestParams();
    params.put("username", username.getText().toString());
    params.put("password", password.getText().toString());
    // send the request
    loopjClient.post(urlString, params, new TextHttpResponseHandler() {
        @Override
        public void onStart() {
            // called before request is started
            //System.err.println("Starting...");
        }
        @Override
        public void onSuccess(int statusCode, Header[] headers, String responseString) {
            // called when response HTTP status is "200 OK"
            // see if 'success' was a failed login...
            int idx = responseString.indexOf("Please try again!");
            if (idx > -1) {
                makeMyToast("Sorry, login failed!");
            }
            // or actual success-ful login
            else {
                // manage cookies here
                // put extractData in separate thread
                final String responseStr = responseString;
                new Thread(new Runnable() {
                    public void run() {
                        extractData(responseStr);
                        selectData(defaultPrefs.getInt("xyz_display_section", 0));
                        // start the next activity
                        Intent intent = new Intent(MainActivity.this, PageViewActivity.class);
                        startActivity(intent);
                        finish();
                    }
                }).start();
            }
        }
        @Override
        public void onFailure(int statusCode, Header[] headers, String responseString, Throwable throwable) {
            // called when response HTTP status is "4XX" (eg. 401, 403, 404)
            makeMyToast("Whoops, network error!");
        }
        @Override
        public void onFinish() {
            // done
        }
    });
}
You can see that, in the response handler's onSuccess callback, I can test for a string, to see if the login failed, and, in the onFailure callback, I give a network error message.
I'm not experienced enough to know on what percentage of web servers this type of POST login works.
The loopj client receives and manages cookies. If you'll be accessing pages from the site via a WebView, you need to copy the cookies from the loopj client over to the WebView. I cobbled together code from a few online posts to do that:
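The failure check in onSuccess() can be pulled out into a small helper that is easy to test without a network. A minimal plain-Java sketch, assuming the failure page always contains some fixed marker text; the class name LoginResponseCheck and the list-of-markers parameter are hypothetical, and "Please try again!" is just the marker from the code above:

```java
import java.util.List;

/**
 * Hypothetical helper mirroring the onSuccess() check: a "200 OK" response
 * can still be a failed login, so look for marker text that only the
 * failure page contains.
 */
final class LoginResponseCheck {

    enum Outcome { FAILED, SUCCEEDED }

    private LoginResponseCheck() {}

    /** Returns FAILED if any failure marker occurs in the response body. */
    static Outcome classify(String responseBody, List<String> failureMarkers) {
        if (responseBody == null) {
            return Outcome.FAILED; // design choice: treat a missing body as a failure
        }
        for (String marker : failureMarkers) {
            // indexOf(...) > -1 is the same test as contains(...)
            if (responseBody.indexOf(marker) > -1) {
                return Outcome.FAILED;
            }
        }
        return Outcome.SUCCEEDED;
    }
}
```

Inside onSuccess() this would read something like `if (LoginResponseCheck.classify(responseString, List.of("Please try again!")) == LoginResponseCheck.Outcome.FAILED) { ... }`.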
// get cookies from the generic http session, and copy them to the webview
// (CookieSyncManager is deprecated since API 21; there, CookieManager.flush() replaces sync())
CookieSyncManager.createInstance(getApplicationContext());
CookieManager.getInstance().removeAllCookie();
CookieManager cookieManager = CookieManager.getInstance();
List<Cookie> cookies = xyzCookieStore.getCookies();
for (Cookie eachCookie : cookies) {
    String cookieString = eachCookie.getName() + "=" + eachCookie.getValue();
    cookieManager.setCookie("http://www.example.com", cookieString);
    //System.err.println(">>>>> " + "cookie: " + cookieString);
}
CookieSyncManager.getInstance().sync();
// holy crap, it worked; I am automatically logged in, in the webview
EDIT: And I should have included the class variable definitions and initializations:
private AsyncHttpClient loopjClient = new AsyncHttpClient();
private PersistentCookieStore xyzCookieStore;
// e.g. in onCreate(), once the Activity can be used as a Context:
xyzCookieStore = new PersistentCookieStore(this);
loopjClient.setCookieStore(xyzCookieStore);
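Copying only name=value drops the cookie's domain and path. Android's CookieManager.setCookie(url, value) takes the value in the format of a "Set-Cookie" header, so those attributes can be passed along too. A minimal plain-Java sketch, with java.net.HttpCookie standing in for the loopj/Apache Cookie type (an assumption; the accessor names differ there) and WebViewCookieStrings as a hypothetical name:

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: turns cookies captured by an HTTP client into
 * "Set-Cookie"-style strings, one per cookie, for CookieManager.setCookie().
 */
final class WebViewCookieStrings {

    private WebViewCookieStrings() {}

    static List<String> toSetCookieValues(List<HttpCookie> cookies) {
        List<String> values = new ArrayList<>();
        for (HttpCookie cookie : cookies) {
            StringBuilder sb = new StringBuilder();
            sb.append(cookie.getName()).append('=').append(cookie.getValue());
            // Keep the scope the server gave the cookie, if the client recorded it
            if (cookie.getDomain() != null) {
                sb.append("; Domain=").append(cookie.getDomain());
            }
            if (cookie.getPath() != null) {
                sb.append("; Path=").append(cookie.getPath());
            }
            if (cookie.getSecure()) {
                sb.append("; Secure");
            }
            values.add(sb.toString());
        }
        return values;
    }
}
```

Each resulting string would then be passed to cookieManager.setCookie("http://www.example.com", value) in the loop above, in place of the bare name=value string.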

Related

Parsing web javascript content to string using android

I would like to read the content of a website into a string.
I started by using jsoup as follows:
private void getWebsite() {
    new Thread(new Runnable() {
        @Override
        public void run() {
            final StringBuilder builder = new StringBuilder();
            try {
                String query = "https://merhav.nli.org.il/primo-explore/search?tab=default_tab&search_scope=Local&vid=NLI&lang=iw_IL&query=any,contains,הארי פוטר";
                Document doc = Jsoup.connect(query).get();
                String title = doc.title();
                Elements links = doc.select("div");
                builder.append(title).append("\n");
                for (Element link : links) {
                    builder.append("\n").append("Link : ").append(link.attr("href"))
                            .append("\n").append("Text : ").append(link.text());
                }
            } catch (IOException e) {
                builder.append("Error : ").append(e.getMessage()).append("\n");
            }
            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    tv_result.setText(builder.toString());
                }
            });
        }
    }).start();
}
However, the problem is that when I open this site in a web browser such as Chrome, one of its lines says:
window.appPerformance.timeStamps['index.html']= Date.now();</script><primo-explore><noscript>JavaScript must be enabled to use the system</noscript><style>.init-message {
I've read that jsoup doesn't have a good solution for this case, since it doesn't execute JavaScript.
Is there any good way to get the elements of this page even though it uses JavaScript?
EDIT:
After trying the suggestions below, I used a WebView to load the URL and then parsed it using jsoup as follows:
wb_result.getSettings().setJavaScriptEnabled(true);
MyJavaScriptInterface jInterface = new MyJavaScriptInterface();
wb_result.addJavascriptInterface(jInterface, "HtmlViewer");
wb_result.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url) {
        wb_result.loadUrl("javascript:window.HtmlViewer.showHTML ('<head>'+document.getElementsByTagName('html')[0].innerHTML+'</head>');");
    }
});
It did the job and indeed showed me the elements. However, unlike a browser, it still shows some lines as expressions rather than their results. For example:
ng-href="{{::$ctrl.getDeepLinkPath()}}"
Is there a way to parse and display the result like in the browser?
Thank you
I'd suggest opening the Network tab in Chrome DevTools and then loading the URL; you'll see a lot of requests going back and forth.
Two that seem to contain relevant content are:
https://merhav.nli.org.il/primo_library/libweb/webservices/rest/primo-explore/v1/pnxs?blendFacetsSeparately=false&getMore=0&inst=NNL&lang=iw_IL&limit=10&newspapersActive=false&newspapersSearch=false&offset=0&pcAvailability=true&q=any,contains,%D7%94%D7%90%D7%A8%D7%99+%D7%A4%D7%95%D7%98%D7%A8&qExclude=&qInclude=&refEntryActive=false&rtaLinks=true&scope=Local&skipDelivery=Y&sort=rank&tab=default_tab&vid=NLI
which requires an access token, which comes from:
https://merhav.nli.org.il/primo_library/libweb/webservices/rest/v1/guestJwt/NNL?isGuest=true&lang=iw_IL&targetUrl=https%253A%252F%252Fmerhav.nli.org.il%252Fprimo-explore%252Fsearch%253Ftab%253Ddefault_tab%2526search_scope%253DLocal%2526vid%253DNLI%2526lang%253Diw_IL%2526query%253Dany%252Ccontains%252C%2525D7%252594%2525D7%252590%2525D7%2525A8%2525D7%252599%252520%2525D7%2525A4%2525D7%252595%2525D7%252598%2525D7%2525A8&viewId=NLI
.. which likely requires the JSESSIONID, which comes from:
https://merhav.nli.org.il/primo_library/libweb/webservices/rest/v1/configuration/NLI
.. so, to replicate the chain of calls, you could use jsoup to make these (and any other relevant) HTTP GET requests and pull out the relevant HTTP headers (typically the session, Referer, Accept and possibly some other cookie values).
It's not going to be straightforward, but you're essentially looking for a URL on the page in one of the JSON responses from one of the network requests.
Once you know which request you want to recreate, you just have to work back up the list of requests and try to recreate them.
This one is not an easy one and would take a lot of time to recreate. My advice, if you're going to attempt it: forget trying to parse HTML, and instead rebuild the chain of three or so HTTP requests to the back end to get the relevant JSON, then parse that. You can often pick a website apart, but this one's a big job.
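Once you've identified a request in the Network tab, rebuilding its URL is the easy part. A minimal plain-Java sketch (Java 10+ for URLEncoder.encode(String, Charset)); QueryUrl is a hypothetical name, and the token and session headers from the earlier calls still have to be attached separately. Passing a LinkedHashMap keeps the parameters in insertion order:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper for recreating a back-end call: builds a URL from a
 * base endpoint and query parameters, percent-encoding keys and values as
 * UTF-8 so non-ASCII search terms survive.
 */
final class QueryUrl {

    private QueryUrl() {}

    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        // No parameters: return the endpoint unchanged
        return query.length() == 0 ? baseUrl : baseUrl + "?" + query;
    }
}
```

Encoding the search term from the question this way yields %D7%94%D7%90%D7%A8%D7%99+%D7%A4%D7%95%D7%98%D7%A8, the same value that appears in the pnxs URL above (URLEncoder also encodes the commas as %2C, which servers normally treat the same).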

Pass variable from plugin back to cordova

The Cordova plugin I'm using plays a VR video file, and is called via GoogleVRPlayer.playVideo(videoUrl, fallbackVideoUrl).
Somewhere in the .java files of the plugin, there's:
@Override
public void onLoadError(String errorMessage) {
    // I want to know if this function is executed
    Log.e(TAG, "Error loading video: " + errorMessage);
}
Basically, when the video fails to load, I want to set a variable in my javascript in cordova to "error", for example var video_status = "error", so I can use this information later on in my app.
I've found some answers that would possibly solve my problem, but I can't seem to integrate it the right way. I have very little experience with native plugins and Java.
Can anyone help me with this?
Since the Cordova plugin GoogleVRPlayer launches a new Activity (VrVideoActivity), I would use a singleton class as an intermediate data bridge between the two Activities to hold the error message.
This is because, on launching the video player activity, your app (the Cordova activity) is paused in the background and will only resume execution once the video player activity is closed.
By using the intermediate class, both the Cordova plugin and the video activity are able to share data.
So I would do something like this:
Create a new file called CordovaBridge.java in cordova-vr-player/src/android/java/neotrino/
package com.neotrino;
public class CordovaBridge {
private String errorMsg = null;
public String getErrorMsg() {return errorMsg;}
public void setErrorMsg(String errorMsg) {this.errorMsg = errorMsg;}
private static final CordovaBridge holder = new CordovaBridge();
public static CordovaBridge getInstance() {return holder;}
}
Modify GoogleVRPlayer.java as follows:
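import com.neotrino.CordovaBridge;
import org.json.JSONObject;
@Override
public void onResume(boolean multitasking) {
    cordova.getActivity().runOnUiThread(new Runnable() {
        @Override
        public void run() {
            String errorMsg = CordovaBridge.getInstance().getErrorMsg();
            if (errorMsg != null) {
                // clear it so the same error isn't reported again on every resume
                CordovaBridge.getInstance().setErrorMsg(null);
                // JSONObject.quote() turns the message into a valid JS string literal
                webView.loadUrl("javascript:window.video_status = " + JSONObject.quote(errorMsg) + ";");
            }
        }
    });
}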
Modify VrVideoActivity.java as follows:
import com.neotrino.CordovaBridge;
@Override
public void onLoadError(String errorMessage) {
    // I want to know if this function is executed
    Log.e(TAG, "Error loading video: " + errorMessage);
    CordovaBridge.getInstance().setErrorMsg(errorMessage);
}
I haven't tested the above code, but as an approach it should give you a starting point on which to base your solution.
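One refinement to the bridge idea, sketched in plain Java (VideoErrorBridge, takeErrorMsg and statusAssignment are hypothetical names, not part of the plugin): taking the message with AtomicReference.getAndSet(null) reads and clears it in one step, so an error is pushed to JavaScript only once, and escaping the message keeps an apostrophe in it from breaking the injected script.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Variation on the CordovaBridge singleton: the video activity stores the
 * error, and the plugin takes it exactly once.
 */
final class VideoErrorBridge {

    private static final VideoErrorBridge INSTANCE = new VideoErrorBridge();
    private final AtomicReference<String> errorMsg = new AtomicReference<>();

    private VideoErrorBridge() {}

    static VideoErrorBridge getInstance() { return INSTANCE; }

    /** Called from the video activity's onLoadError(). */
    void setErrorMsg(String msg) { errorMsg.set(msg); }

    /** Returns the pending error (or null) and clears it atomically. */
    String takeErrorMsg() { return errorMsg.getAndSet(null); }

    /** Builds a JS statement assigning msg as a single-quoted string literal. */
    static String statusAssignment(String msg) {
        StringBuilder js = new StringBuilder("window.video_status = '");
        for (char c : msg.toCharArray()) {
            switch (c) {
                case '\\': js.append("\\\\"); break;
                case '\'': js.append("\\'"); break;
                case '\n': js.append("\\n"); break;
                case '\r': js.append("\\r"); break;
                default:   js.append(c);
            }
        }
        return js.append("';").toString();
    }
}
```

In GoogleVRPlayer.onResume() the call would then be something like `webView.loadUrl("javascript:" + VideoErrorBridge.statusAssignment(msg));` for a non-null msg returned by takeErrorMsg(). Since loadUrl() may percent-decode javascript: URLs, a message containing % is one more edge case to keep in mind.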

Stop Android WebView from trying to load/capture resources like CSS on loadData()

Background
This may look like a duplicate of many other questions. Trust me, it isn't.
I'm trying to load HTML data into a WebView while capturing the user's hyperlink clicks. In the process I found this answer, which does exactly what I want, except that it also captures other requests, for things like CSS files and images:
// you tell the webclient you want to catch when a url is about to load
@Override
public boolean shouldOverrideUrlLoading(WebView view, WebResourceRequest request) {
    return true;
}
// here you execute an action when the URL you want is about to load
@Override
public void onLoadResource(WebView view, String url) {
    if (url.equals("http://cnn.com")) {
        // do whatever you want
    }
}
I've turned off automatic image loading, network loads, and JavaScript execution:
settings.setBlockNetworkLoads(true);
settings.setBlockNetworkImage(true);
settings.setJavaScriptEnabled(false);
But these settings do nothing to stop those requests from being captured.
Maybe there's a different procedure to capturing the link click, but it was either this or to stop the loading of external resources.
Question
How do I prevent WebView from capturing (or attempting to load) resource requests like CSS, JS, or images?
Otherwise if I can't prevent capturing or attempting to load, how can I differentiate between links clicked and web resources?
Thanks ahead!
You could override WebViewClient's shouldInterceptRequest and return some non-null response instead of the CSS, JS, images, etc. being fetched.
Example:
@Override
public WebResourceResponse shouldInterceptRequest(WebView view, String url) {
    Log.d(TAG, "shouldInterceptRequest: " + url);
    if (url.contains(".css")
            || url.contains(".js")
            || url.contains(".ico")) { // add other specific resources..
        return new WebResourceResponse(
                "text/css",
                "UTF-8",
                getActivity().getResources().openRawResource(R.raw.some_css));
    } else {
        return super.shouldInterceptRequest(view, url);
    }
}
where R.raw.some_css is:
body {
font-family: sans-serif;
}
Note:
I'm not sure what pages you're loading, but this approach may ruin the look of the page.
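One caveat with url.contains(".js") is that it also matches URLs such as data.json or ?file=a.js, and every stub above is served as text/css. A minimal plain-Java sketch that looks only at the extension of the URL path and picks a matching MIME type for the stub; the extension list and the name SubresourceFilter are assumptions to adapt:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper for shouldInterceptRequest(): decides from the URL path
 * (ignoring ?query and #fragment) whether a request is a sub-resource to stub,
 * and which MIME type the stub response should carry.
 */
final class SubresourceFilter {

    // Extension -> MIME type for the replacement response
    private static final Map<String, String> STUB_TYPES = Map.of(
            "css", "text/css",
            "js", "application/javascript",
            "ico", "image/x-icon",
            "png", "image/png",
            "jpg", "image/jpeg",
            "gif", "image/gif");

    private SubresourceFilter() {}

    /** Returns the MIME type to stub with, or null to let the request through. */
    static String stubMimeType(String url) {
        String path;
        try {
            path = new URI(url).getPath();
        } catch (URISyntaxException e) {
            return null; // not a URL we can parse; don't interfere
        }
        if (path == null) {
            return null; // opaque URIs such as data: have no path
        }
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot < path.lastIndexOf('/')) {
            return null; // no extension in the last path segment
        }
        return STUB_TYPES.get(path.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

In shouldInterceptRequest() a non-null result could then be returned as, for example, `new WebResourceResponse(mime, "UTF-8", new ByteArrayInputStream(new byte[0]))` (an empty body), while a null result falls through to super.shouldInterceptRequest(view, url).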
I've found a way to ignore automated WebView resource requests.
By ignoring requests in the first second of WebView initialization, I am able to isolate user based clicks from the rest:
final long startTime = System.currentTimeMillis();
//load up a WebView, define a WebViewClient for capturing link clicking
WebView webview = new WebView(this);
WebViewClient webviewClient = new WebViewClient() {
    @Override
    public boolean shouldOverrideUrlLoading(WebView view, WebResourceRequest request) {
        return true;
    }
    @Override
    public void onLoadResource(WebView view, String url) {
        // ignore everything requested during the first second (1000 ms)
        if (System.currentTimeMillis() - startTime > 1000) {
            //do stuff here
        }
    }
};
webview.setWebViewClient(webviewClient);
I have not tested this solution without blocking JavaScript execution and automatic image loading, but it should work regardless:
WebSettings settings = webview.getSettings();
settings.setBlockNetworkLoads(true);
settings.setBlockNetworkImage(true);
settings.setJavaScriptEnabled(false);
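A caveat on the timing check: if both timestamps are truncated to whole seconds, currentTime - time > 1 can become true anywhere from just over 1 second to just under 3 seconds after the start. A minimal plain-Java sketch of the same heuristic in milliseconds, with an injectable clock so it can be tested; GracePeriod is a hypothetical name, and restarting it on each new page load is an assumption about the behaviour you'd want:

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper for the "ignore the first second" idea: remembers when
 * loading started and reports whether a grace period has passed since then.
 */
final class GracePeriod {

    private final LongSupplier clockMillis;
    private final long graceMillis;
    private long startMillis;

    GracePeriod(long graceMillis, LongSupplier clockMillis) {
        this.graceMillis = graceMillis;
        this.clockMillis = clockMillis;
        restart();
    }

    /** Call again whenever a new page load starts. */
    void restart() {
        startMillis = clockMillis.getAsLong();
    }

    /** True once at least graceMillis have passed since the last restart(). */
    boolean hasElapsed() {
        return clockMillis.getAsLong() - startMillis >= graceMillis;
    }
}
```

On Android it might be created as `new GracePeriod(1000, SystemClock::elapsedRealtime)` and checked inside onLoadResource(). It remains a heuristic: a slow page can still be requesting resources after the grace period ends.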
Short answer is, you can't.
A longer answer: you won't be able to do that, because it is designed as "capture all or capture nothing". Web requests are a general concept, not tied to a particular kind of resource like images or CSS; in fact, the mechanism has no idea what those are. That's why you won't find anything.
Do this instead: in shouldOverrideUrlLoading, rather than always returning true, return true only for the URLs you want to handle yourself. For all other cases, like CSS and so forth, return false so the WebView takes care of them for you.
For example:
@Override
public boolean shouldOverrideUrlLoading(WebView view, String url) {
    // Ignore css and js
    if (url.endsWith(".css") || url.endsWith(".js")) {
        return false;
    }
    return true;
}

Could someone once and for all please explain Cannot call determinedVisibility() - never saw a connection for the pid

I'm currently working on graphing data via d3 into a webview. Naturally, things are breaking as soon as I try to reload the graph and feed it new data. This lovely line keeps popping up: W/cr_BindingManager: Cannot call determinedVisibility() - never saw a connection for the pid.
I've scoured SO for an explanation, but there doesn't seem to be anything conclusive. People just suggest turning on DOM storage in the WebView settings (which obviously doesn't fix the issue). I suspect there is a race condition between reloading the graph and feeding it new data. I've overridden onPageFinished() in my WebViewClient to call the listener that loads the data into the chart, thinking that would resolve the race condition, but to no avail.
Can someone please explain to me what W/cr_BindingManager: Cannot call determinedVisibility() - never saw a connection for the pid means? Am I off in my assessment? How can I debug it?
Any tips are appreciated.
EDIT: I've solved the original issue, but I would still love to learn what that line means. Bounty up.
Consecutive calls to loadUrl cause a race condition. The problem is that loadUrl("file://..") doesn't complete immediately, and so when you call loadUrl("javascript:..") it will sometimes execute before the page has loaded.
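One way to remove that race is to queue any JavaScript that arrives before onPageFinished() and flush the queue from there. A minimal plain-Java sketch; PendingScripts is a hypothetical name, the Consumer stands in for something like js -> wv.loadUrl("javascript:" + js), and it assumes every call happens on the UI thread, so there is no locking:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: scripts requested before the page has finished loading
 * are queued and then run, in order, from onPageFinished().
 */
final class PendingScripts {

    private final Consumer<String> evaluator;
    private final List<String> queue = new ArrayList<>();
    private boolean pageReady = false;

    PendingScripts(Consumer<String> evaluator) {
        this.evaluator = evaluator;
    }

    /** Call right before loading (or reloading) the page itself. */
    void pageLoadStarted() {
        pageReady = false;
    }

    /** Call from WebViewClient.onPageFinished(); runs everything queued so far. */
    void pageFinished() {
        pageReady = true;
        List<String> toRun = new ArrayList<>(queue);
        queue.clear();
        for (String js : toRun) {
            evaluator.accept(js);
        }
    }

    /** Runs the script now if the page is ready, otherwise waits for pageFinished(). */
    void run(String js) {
        if (pageReady) {
            evaluator.accept(js);
        } else {
            queue.add(js);
        }
    }
}
```

Data updates then go through run(...) instead of calling loadUrl("javascript:...") directly, so they can no longer overtake the loadUrl("file://...") that is still in progress.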
This is how I set up my WebView:
wv = (CustomWebView) this.findViewById(R.id.webView1);
WebSettings wv_settings = wv.getSettings();
//this is where you fixed your code I guess
//and also by setting a WebChromeClient to catch JavaScript's console messages:
wv.setWebChromeClient(new WebChromeClient() {
    @Override
    public boolean onConsoleMessage(ConsoleMessage cm) {
        Log.d(TAG, cm.message() + " -- From line "
                + cm.lineNumber() + " of "
                + cm.sourceId());
        return true;
    }
});
wv_settings.setDomStorageEnabled(true);
wv.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url) {
        super.onPageFinished(view, url);
        setTitle(view.getTitle());
        //do your stuff ...
    }
    @Override
    public boolean shouldOverrideUrlLoading(WebView view, String url) {
        if (url.startsWith("file")) {
            // Keep local assets in this WebView.
            return false;
        }
        // Every path must return a value; defer to the default for other URLs
        return super.shouldOverrideUrlLoading(view, url);
    }
});
//wv.setWebViewClient(new HelpClient(this));//
wv.clearCache(true);
wv.clearHistory();
wv_settings.setJavaScriptEnabled(true);//XSS vulnerable
wv_settings.setJavaScriptCanOpenWindowsAutomatically(true);
wv.loadUrl("file:///android_asset/connect.php.html");
Note the wv.setWebChromeClient(new WebChromeClient() { ... }) call.
In API level 19 (Android 4.4 KitKat), WebView switched from Android's WebKit engine to Chromium, with almost all of the original WebView APIs wrapped around their Chromium counterparts.
This is the method that gives the error (BindingManagerImpl.java), from Chromium source:
@Override
public void determinedVisibility(int pid) {
    ManagedConnection managedConnection;
    synchronized (mManagedConnections) {
        managedConnection = mManagedConnections.get(pid);
    }
    if (managedConnection == null) {
        Log.w(TAG, "Cannot call determinedVisibility() - never saw a connection for the pid: "
                + "%d", pid);
        return;
    }
    // ... (rest of the method omitted)
}
It's a rendering-related warning from Chromium's content layer.
You could dig around in that GitHub source code forever; it might be worth finding where the method determinedVisibility (in BindingManagerImpl.java; the “Impl” suffix stands for Implementation) is called from.
Hope this helps ;O)
This usually pops up when you are overriding the method shouldOverrideUrlLoading().
In the apps where I've used WebViews, it was caused by content being rendered in the WebView that was caught by the method above and then ignored.
I see this a lot when the websites that I load attempt to load scripts outside of the allowed domain.

Running Javascript in Android WebView - onPageFinished Loop

I am having a bit of trouble getting my application to correctly run some JS on a page using the onPageFinished method.
The code below is contained within a class I've created that extends AsyncTask to fetch and parse a JSON file held elsewhere.
I am able to fetch the JSON file correctly and parse the data, and the URL for the WebView is obtained and set. Everything loads as it should until I attempt to run some JS in the onPageFinished method.
//onPostExecute method runs when the doInBackground method is completed
@SuppressLint("SetJavaScriptEnabled")
@Override
protected void onPostExecute(Boolean aBoolean) {
    super.onPostExecute(aBoolean);
    //Casting to WebView, as findViewById doesn't explicitly return a WebView.
    webView = (WebView) findViewById(R.id.webView);
    //Obtaining the WebSettings of the webView
    WebSettings webViewSettings = webView.getSettings();
    //Setting JavaScript enabled
    webViewSettings.setJavaScriptEnabled(true);
    webView.setWebViewClient(new WebViewClient() {
        @Override
        public void onPageFinished(WebView view, String url) {
            super.onPageFinished(view, url);
            webView.loadUrl("document.getElementById('field_133').value = 'Test';");
            Log.d("onPageFinished", "The Page has finished loading");
        }
    });
    //Obtaining the first item in the cellRef List Array - From here we will access the Url data for the train operator.
    parsedUrl = cellRef.get(0).getUrl();
    //load the page we parsed from online file
    webView.loadUrl(parsedUrl);
    Log.d("loadUrl", "Now load the parsed Url");
}
All I am looking to do at the moment is test that the JS can correctly populate a textbox once the page has loaded with the value of "Test" - However, the WebView appears to be stuck in a loop of loading & refreshing (seeing repeated logcat prints of "The page has finished loading") when trying to run:
webView.loadUrl("document.getElementById('field_133').value = 'Test';");
Is this the correct way of trying to inject some JS into the WebView in Android? Apologies if there is something obvious missing, the majority of my experience lies in Swift.
Any help would be appreciated.
Thanks
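Try prefixing the code with "javascript:". Without it, loadUrl() most likely treats the string as an ordinary URL to load, so each call starts another page load, which fires onPageFinished() again, hence the loop.
I use this, and it works perfectly: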
loadUrl("javascript:(function() { document.getElementsByTagName('video')[0].play(); })()");
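Applied to the question's code, the same pattern can be produced by a small helper that also escapes the values, so a quote in the text doesn't break the script. A plain-Java sketch; FieldScript is a hypothetical name, and the statement is wrapped in a function, as in the line above, so the URL's result is undefined (in browsers, a javascript: URL that evaluates to a string can replace the page content):

```java
/**
 * Hypothetical helper: builds a javascript: URL that sets a form field's
 * value inside an immediately-invoked function. The element id and the
 * value are escaped as single-quoted JavaScript string literals.
 */
final class FieldScript {

    private FieldScript() {}

    static String setFieldValueUrl(String elementId, String value) {
        return "javascript:(function() { document.getElementById('"
                + escape(elementId) + "').value = '" + escape(value) + "'; })()";
    }

    /** Minimal escaping for a single-quoted JavaScript string literal. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '\'': out.append("\\'"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

In onPageFinished() the failing line then becomes `webView.loadUrl(FieldScript.setFieldValueUrl("field_133", "Test"));`.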
