I have searched and tried a lot to build an application that uses the content of a website. I just saw the StackExchange app, which looks the way I want my application to look. The difference between the website and the app is shown here:
Browser: [screenshot not available]
App: [screenshot not available]
As you can see, there are some differences between the browser and the app.
I hope somebody knows how to create an app like that, because after hours of searching I only found two approaches: a plain WebView (which just shows the page 1:1 as in the browser), or running JavaScript in the app to remove some content (which is rather buggy).
To repeat: I want to get the content of a website (when the app starts) and put it inside my application.
Cheers.
What you want to do is scrape the websites in question: fetch their HTML and extract the parts you need with some form of logic (I recommend XPath for this). Then you can display this data in a nice native interface.
However, be aware that the data you get is not always formatted the way you expect, so your parsing logic has to be very flexible.
The process can be split into these steps:
Retrieve the data from the website (DefaultHttpClient and AsyncTask)
Analyse it and extract the relevant data (your own parsing logic)
Show the data to the user (your interface implementation)
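The second step (analysing the page and extracting the relevant data) is where XPath does the work. Below is a minimal sketch using only the JDK's built-in XML and XPath APIs. It assumes the page has already been downloaded and is well-formed XHTML; real-world HTML usually has to be cleaned first, for example with HtmlCleaner. The class `XPathExtract` and its methods are hypothetical names, and `render` only stands in for the third step. Attribute tests in XPath use `@`, as in `//*[@id='container']/ul/li`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathExtract {

    // Parses well-formed XHTML and returns the trimmed text of every node
    // matched by the XPath expression, in document order.
    static List<String> extract(String xhtml, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            // Malformed markup or an invalid expression: report it to the caller.
            throw new IllegalArgumentException("Could not extract " + xpathExpr, e);
        }
    }

    // Stand-in for the UI step: formats the extracted items as lines of text.
    static String render(List<String> items) {
        return String.join("\n", items);
    }
}
```

On Android, the list returned by `extract` would typically be passed to `onPostExecute` and shown in a `ListView` or `RecyclerView` rather than joined into a string.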
UPDATE
Below is some example code that fetches data from a website. It uses the HtmlCleaner library, which you will need to add to your project.
class GetStationsClass extends AsyncTask<String, String, String> {
@Override
protected String doInBackground(String... params) {
HttpClient httpclient = new DefaultHttpClient();
httpclient.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION, HttpVersion.HTTP_1_1);
httpclient.getParams().setParameter(CoreProtocolPNames.HTTP_ELEMENT_CHARSET, "iso-8859-1");
HttpPost httppost = new HttpPost("http://ntlive.dk/rt/route?id=786");
httppost.setHeader("Accept-Charset", "iso-8859-1, unicode-1-1;q=0.8");
try {
// Add your data
List<NameValuePair> nameValuePairs = new ArrayList<NameValuePair>(3);
httppost.setEntity(new UrlEncodedFormEntity(nameValuePairs, "utf-8"));
// Execute HTTP Post Request
HttpResponse response = httpclient.execute(httppost);
int status = response.getStatusLine().getStatusCode();
String data = "";
if (status != HttpStatus.SC_OK) {
ByteArrayOutputStream ostream = new ByteArrayOutputStream();
response.getEntity().writeTo(ostream);
data = ostream.toString();
} else {
BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(),
"iso-8859-1"));
String line = null;
while ((line = reader.readLine()) != null) {
data += line;
}
XPath xpath = XPathFactory.newInstance().newXPath();
try {
Document document = readDocument(data);
NodeList nodes = (NodeList) xpath.evaluate("//*[@id=\"container\"]/ul/li", document,
XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
Node thisNode = nodes.item(i);
Log.v("GetStations", thisNode.getTextContent().trim());
}
} catch (XPathExpressionException e) {
e.printStackTrace();
}
}
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
@Override
protected void onPostExecute(String result) {
super.onPostExecute(result);
//update user interface here
}
}
private Document readDocument(String content) {
TagNode tagNode = new HtmlCleaner().clean(content);
Document doc = null;
try {
doc = new DomSerializer(new CleanerProperties()).createDOM(tagNode);
return doc;
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
return doc;
}
To run the code above, use:
new GetStationsClass().execute();
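The read loop above builds the result with `data += line`, which drops the line breaks and copies the whole string on every iteration. Also, `DefaultHttpClient` was deprecated in API 22 and removed from the Android SDK in API 23 (Android 6.0). Below is a minimal sketch of the download step with `HttpURLConnection` and a `StringBuilder`. It assumes ISO-8859-1 as in the answer and a plain GET (the answer sends an empty POST). `PageReader` is a hypothetical name, and `fetch` must still run off the main thread, for example in `doInBackground`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;

public class PageReader {

    // Reads a whole stream as text in the given charset, keeping the line breaks.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Downloads a page with a GET request. Blocking: call it from a background thread.
    static String fetch(String pageUrl, Charset charset) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        try {
            int status = conn.getResponseCode();
            // Error responses expose their body through getErrorStream(), which may be null.
            InputStream body = status < 400 ? conn.getInputStream() : conn.getErrorStream();
            if (body == null) {
                throw new IOException("HTTP " + status + " with no body");
            }
            return readAll(body, charset);
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, `PageReader.fetch("http://ntlive.dk/rt/route?id=786", StandardCharsets.ISO_8859_1)` would return the page text, which can then be passed to `readDocument`.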
I am using WebViewClient's shouldInterceptRequest method to filter images in my Android WebView.
The problem
I'm trying to block Google's base64 data-URL images (like data:image/jpeg;base64,...) using the regex data:image/(jpg|png|jpeg);base64, but without success.
I did some research and found that these base64 background images are injected with JavaScript, but I don't know how to continue.
My code:
@Override
public WebResourceResponse shouldInterceptRequest(WebView view, WebResourceRequest request) {
String url = request.getUrl().toString();
if (isImage(url, res))
{
return new WebResourceResponse(
BrowserUnit.MIME_TYPE_TEXT_PLAIN,
BrowserUnit.URL_ENCODING,
new ByteArrayInputStream("".getBytes())
);
}
return super.shouldInterceptRequest(view, request);
}
private static boolean isImage(String url, Resources res) {
BufferedReader imageUrlRegex = openRawFile(res, R.raw.image_regex);
try {
String line = imageUrlRegex.readLine();
while (line != null) {
Pattern pattern = Pattern.compile(line);
if (pattern.matcher(url).find())
return true;
line = imageUrlRegex.readLine();
}
} catch (IOException e) {
e.printStackTrace();
return true;
}
return false;
}
Contents of R.raw.image_regex (one pattern per line):
data:image/(jpg|png|jpeg|svg|ico|webp|tif|tiff|bmp|eps|apng|avif|jfif|pjpeg|pjp|cur);base64,
Note: I also tried blocking with return url.contains("data:image"); but it still blocks nothing.
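Separately from why the data: URLs are not caught, `isImage` re-opens `R.raw.image_regex` and recompiles every pattern on each intercepted request. Below is a sketch that compiles the rules once. `UrlBlocklist` is a hypothetical name, and the rules are passed in as strings instead of being read from the raw resource. It also exposes a pitfall in the pattern list: the MIME types for SVG and icons are `image/svg+xml` and `image/x-icon` (or `image/vnd.microsoft.icon`), so alternatives like `svg` or `ico` never match. A broader rule such as `^data:image/[^;,]+;base64,` covers any image subtype.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UrlBlocklist {
    private final List<Pattern> patterns = new ArrayList<>();

    // Compiles each non-empty rule once, instead of once per intercepted request.
    public UrlBlocklist(List<String> rules) {
        for (String rule : rules) {
            String trimmed = rule.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
    }

    // True if any rule matches somewhere in the URL (same find() semantics as isImage).
    public boolean matches(String url) {
        for (Pattern p : patterns) {
            if (p.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }
}
```

The WebViewClient would build one `UrlBlocklist` up front (for example from the lines of the raw file) and call `blocklist.matches(url)` inside `shouldInterceptRequest`.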
I am building an Android app. I have built a request using OkHttp, and I get the response as a string composed of HTML, CSS and JS content. This response is actually a form that the user must submit to allow the app to communicate with a given website.
Now I want the user to be able to see that response as an HTML page and click a button to allow the communication. The only problem is that I don't know how to display that response as HTML in a WebView or in the web browser.
From the MainActivity:
Authenticate myAouth = new Authenticate("myCostumerKey","mySecretKey");
try {
myResponse=myAouth.run("myUrlHere");
//System.out.println( myResponse);
} catch (Exception e) {
e.printStackTrace();
}
The Authenticate class:
public class Authenticate {
private final OkHttpClient client;
String[] myResponse =new String[2];
public Authenticate( final String consumerKey, final String consumerSecret) {
client = new OkHttpClient.Builder()
.authenticator(new Authenticator() {
@Override public Request authenticate(Route route, Response response) throws IOException {
if (response.request().header("Authorization") != null) {
return null; // Give up, we've already attempted to authenticate.
}
System.out.println("Authenticating for response: " + response);
System.out.println("Challenges: " + response.challenges());
String credential = Credentials.basic(consumerKey, consumerSecret);
Request myRequest =response.request().newBuilder()
.header("Authorization", credential)
.build();
HttpUrl myURL = myRequest.url();
myResponse[0]= String.valueOf(myURL);
return myRequest;
}
})
.build();
}
@RequiresApi(api = Build.VERSION_CODES.KITKAT)
public String[] run(String url) throws Exception {
Request request = new Request.Builder()
.url(url)
.build();
try (Response response = client.newCall(request).execute()) {
if (!response.isSuccessful()) throw new IOException("Unexpected code " + response);
myResponse[1]=response.body().string();
System.out.println(" URL is "+myResponse[0]+" my response body is "+myResponse[1]);
}
return myResponse;
}}
Any help would be appreciated.
Kind Regards
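Authenticate.run() already returns the response body as a String (myResponse[1]), so it does not need converting; you can display it in a WebView with loadDataWithBaseURL:
String html = myResponse[1]; // response body returned by Authenticate.run()
String mime = "text/html";
String encoding = "utf-8";
// The form contains JS, so JavaScript has to be enabled in the WebView
myWebView.getSettings().setJavaScriptEnabled(true);
// With a null base URL, relative links in the form cannot be resolved;
// pass the page's URL here instead if the form uses them
myWebView.loadDataWithBaseURL(null, html, mime, encoding, null);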
}
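The Authenticator in the question retries the request with the value from `Credentials.basic(consumerKey, consumerSecret)`. To check what the server actually receives, the same header can be computed with the JDK alone. This sketch assumes OkHttp's default ISO-8859-1 encoding for the credentials; `BasicAuth` is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuth {

    // HTTP Basic "Authorization" value: "Basic " + base64(user + ":" + password),
    // encoded as ISO-8859-1 by default.
    static String header(String user, String password) {
        return header(user, password, StandardCharsets.ISO_8859_1);
    }

    // Same, with an explicit charset for non-Latin-1 credentials.
    static String header(String user, String password, Charset charset) {
        byte[] raw = (user + ":" + password).getBytes(charset);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }
}
```

Comparing `BasicAuth.header("myCostumerKey", "mySecretKey")` with the header the server logs is a quick way to rule out an encoding mismatch.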
I am currently developing a .NET C# application that shows a web browser. Since the Visual Studio WebBrowser control still uses IE7 and does not support quite a lot of things, I plan to use CefSharp, which is based on Chromium. Has anyone tried getting JSON data from a localhost server using CefSharp? I have tried two ways to get it, but both failed.
In C#, I start the Chromium browser like this:
var test = new CefSharp.WinForms.ChromiumWebBrowser(AppDomain.CurrentDomain.BaseDirectory + "html\\index.html")
{
Dock = DockStyle.Fill,
};
this.Controls.Add(test);
Then index.html needs to get data from localhost port 1000 after it has loaded. I have tried two ways in JavaScript:
First, using XMLHttpRequest:
var xmlhttp = new XMLHttpRequest();
var url = "http://localhost:1000/api/data1";
var services;
xmlhttp.onreadystatechange = function () {
if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
services = jQuery.parseJSON(xmlhttp.responseText);
}
}
xmlhttp.open("GET", url, true);
xmlhttp.send();
Second, using jQuery's $.get():
$.get("http://localhost:1000/api/data1", function (data) {
var services = data;
});
Neither way returns the data. If I open index.html in a normal browser like Chrome or Firefox, I can get the data.
Is something missing in my code? Any ideas what's wrong?
I am using the Chromium web browser and making a GET request to localhost for JSON. Alongside it, I am running a web server that keeps listening and returns JSON.
Webserver:
public class WebServer
{
public WebServer()
{
}
void Process(object o)
{
HttpListenerContext context = o as HttpListenerContext;
HttpListenerResponse response = context.Response;
try
{
string json;
string url = context.Request.Url.ToString();
if (url.Contains("http://localhost:8888/json"))
{
List<SampleObject> list = new List<SampleObject>();
json = JsonConvert.SerializeObject(new
{
results = list
});
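byte[] decryptedbytes = System.Text.Encoding.UTF8.GetBytes(json);
response.ContentType = "application/json";
// index.html is loaded from a file:// path, so this request is cross-origin;
// the browser only lets the page read the response if this header is present
response.AddHeader("Access-Control-Allow-Origin", "*");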
response.ContentLength64 = decryptedbytes.Length;
System.IO.Stream output = response.OutputStream;
try
{
output.Write(decryptedbytes, 0, decryptedbytes.Length);
output.Close();
}
catch (Exception e)
{
response.StatusCode = 500;
response.StatusDescription = "Server Internal Error";
response.Close();
Console.WriteLine(e.Message);
}
}
}
catch (Exception ex)
{
response.StatusCode = 500;
response.StatusDescription = "Server Internal Error";
response.Close();
Console.WriteLine(ex);
}
}
static byte[] GetBytes(string str)
{
byte[] bytes = new byte[str.Length * sizeof(char)];
System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
return bytes;
}
public void Start()
{
HttpListener server = new HttpListener();
server.Prefixes.Add("http://localhost:8888/"); // HttpListener prefixes must end with "/"; Process() checks for /json
server.Start();
while (true)
{
ThreadPool.QueueUserWorkItem(Process, server.GetContext());
}
}
}
public class SampleObject
{
// Json.NET serializes public properties by default
public string param1 { get; set; }
public string param2 { get; set; }
public string param3 { get; set; }
}
To Start Webserver:
Thread thread = new Thread(() => new WebServer().Start());
thread.Start();
Index.html
<!DOCTYPE html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
$(document).ready(function(){
$.get("http://localhost:8888/json", function (data) {
var jsonData= data;
});
});
</script>
</head>
<body>
<p>Example JSON Request.</p>
</body>
</html>
Start the web server before loading Index.html in the Chromium web browser, so that it is listening for requests. After the document's ready event, the page makes an AJAX call to the web server, which returns JSON. You can also test it in Chrome: start the web server and enter the URL (http://localhost:8888/json) in the address bar, and you will see the returned JSON (it also shows up in the Developer Tools).
Note: this code is not tested. I hope it works.
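For comparison, here is a minimal sketch of the same /json endpoint using the JDK's built-in `com.sun.net.httpserver.HttpServer` rather than the C# `HttpListener`. Like the C# version, it returns a fixed, empty `results` list. It assumes the page is loaded from a file:// path, so the request to localhost is cross-origin and the response carries `Access-Control-Allow-Origin: *`. `JsonServer` and `start` are hypothetical names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class JsonServer {

    // Starts a server bound to 127.0.0.1; pass port 0 to let the OS pick a free port.
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext("/json", exchange -> {
                // Same payload shape as the C# version: an empty "results" list.
                byte[] body = "{\"results\":[]}".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
                // Lets a page from another origin (such as file://) read the response.
                exchange.getResponseHeaders().set("Access-Control-Allow-Origin", "*");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            // With no executor set, requests are handled on the server's own background thread.
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`JsonServer.start(8888)` would serve http://127.0.0.1:8888/json until `stop(0)` is called on the returned server. Because `start()` does not block, no separate thread is needed to launch it.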
I am trying to extract data for a class project from a webpage (a page that shows search results). Specifically, it's this page:
http://www.target.com/c/xbox-one-games-video/-/N-55krw#navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to
I just want to extract the titles of the products.
I'm using the following code:
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage(itemPageURL);
int tries = 20; // Amount of tries to avoid infinite loop
while (tries > 0) {
tries--;
synchronized(page) {
page.wait(2000); // How often to check
}
}
int numThreads = webClient.waitForBackgroundJavaScript(1000000l);
PrintWriter pw = new PrintWriter("test-target-search.txt");
pw.println(page.asXml());
pw.close();
The resulting page does not contain the product information shown in the web browser. I imagine the AJAX calls haven't completed (not sure, though).
Any help would greatly be appreciated. Thanks!
You can use plain GET requests for this task. Control the page through the "pageCount" and "offset" arguments in the URL. After retrieving the page (the example below does this for one page), you can extract the titles with a regex or with a parser for whatever format the content is in (JSON?).
public static void main(String[] args)
{
try
{
WebClient webClient = new WebClient();
URL url = new URL(
"http://tws.target.com/searchservice/item/search_results/v1/by_keyword?callback=getPlpResponse&navigation=true&category=55krw&searchTerm=&view_type=medium&sort_by=bestselling&faceted_value=&offset=60&pageCount=60&response_group=Items&isLeaf=true&parent_category_id=55kug&custom_price=false&min_price=from&max_price=to");
WebRequest requestSettings = new WebRequest(url, HttpMethod.GET);
requestSettings.setAdditionalHeader("Accept", "*/*");
requestSettings.setAdditionalHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
requestSettings.setAdditionalHeader("Referer", "http://www.target.com/c/xbox-one-games-video/-/N-55krw");
requestSettings.setAdditionalHeader("Accept-Language", "en-US,en;q=0.8");
requestSettings.setAdditionalHeader("Accept-Encoding", "gzip,deflate,sdch");
requestSettings.setAdditionalHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
Page page = webClient.getPage(requestSettings);
System.out.println(page.getWebResponse().getContentAsString());
}
catch (Exception e)
{
e.printStackTrace();
}
}
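Since the URL contains `callback=getPlpResponse`, the service presumably answers with JSONP (JSON wrapped in a `getPlpResponse(...)` call) rather than plain JSON. That response format is an assumption here. Below is a sketch of three helpers for paging through the results. `offsetFor` assumes `offset` counts items, so `offset=60&pageCount=60` would be the second page. `withParam` rewrites one query parameter, and `stripJsonp` removes the wrapper so any JSON parser can read the rest. `SearchPaging` and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchPaging {

    // Offset of the first item on a zero-based page, assuming "offset" counts items.
    static int offsetFor(int page, int pageCount) {
        return page * pageCount;
    }

    // Returns the URL with one query parameter set to a new value (appended if missing).
    static String withParam(String url, String name, String value) {
        Matcher m = Pattern.compile("([?&]" + Pattern.quote(name) + "=)[^&#]*").matcher(url);
        if (!m.find()) {
            return url + (url.contains("?") ? "&" : "?") + name + "=" + value;
        }
        return url.substring(0, m.end(1)) + value + url.substring(m.end());
    }

    // Strips a JSONP wrapper such as getPlpResponse({...}); and leaves plain JSON untouched.
    static String stripJsonp(String body) {
        String s = body.trim();
        int open = s.indexOf('(');
        int close = s.lastIndexOf(')');
        // Only unwrap when the text before '(' looks like a JavaScript function name.
        if (open > 0 && close > open && s.substring(0, open).matches("[A-Za-z_$][\\w$.]*")) {
            return s.substring(open + 1, close);
        }
        return s;
    }
}
```

For example, `withParam(url, "offset", String.valueOf(offsetFor(page, 60)))` gives the URL for a given page. That URL can be passed to `new WebRequest(new URL(...), HttpMethod.GET)` as in the answer, and the result of `stripJsonp(page.getWebResponse().getContentAsString())` can be handed to a JSON parser.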
I made a JavaScript page that generates a JSON object, which I then want to read from an Android device.
I read it with the following code:
StringBuilder stringBuilder = new StringBuilder();
HttpClient client = new DefaultHttpClient();
HttpGet httpGet = new HttpGet(url);
try {
HttpResponse response = client.execute(httpGet);
StatusLine statusLine = response.getStatusLine();
int statusCode = statusLine.getStatusCode();
if (statusCode == 200){
HttpEntity entity = response.getEntity();
InputStream content = entity.getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(content));
String line;
while ((line = reader.readLine()) != null){
stringBuilder.append(line);
}
} else {
Log.e("JSON", "Failed to download file");
}
} catch (ClientProtocolException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
The problem is that this code returns the page's source code, which is the JavaScript itself, not the JSON string generated when the script runs.
I need the JSON string, and I have to use JavaScript to generate it because it accesses an external service.
I haven't found any solution for this. I don't mind whether the solution involves the server or the Android device.
Thanks.
String myresponse=Html.escapeHtml(YourStringHere);
Try this.
private class MyJavaScriptInterface {
private MyJavaScriptInterface () {
}
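@JavascriptInterface // required when targeting API 17+ for the method to be callable from JavaScript
public void setHtml(String contentHtml) {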
//here you get the content html
}
}
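// Register the bridge before loading the page, e.g. in onCreate():
// webView.getSettings().setJavaScriptEnabled(true);
// webView.addJavascriptInterface(new MyJavaScriptInterface(), "ResponseChecker");
// webView.setWebViewClient(webViewClient);
private WebViewClient webViewClient = new WebViewClient() {
@Override
public void onPageFinished(WebView view, String url) {
view.loadUrl("javascript:window.ResponseChecker.setHtml"
+ "(document.body.innerHTML);");
}
};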