How to create a cross domain HTTP request - javascript

I have a website and need a way to get HTML data from a different website via an HTTP request. I've looked around for ways to implement it, and most sources suggest an AJAX call instead.
An AJAX call is blocked by LinkedIn, so I want to try a plain cross-domain HTTP request and hope it isn't blocked one way or another.

If you have a server running and are able to run code on it, you can make the HTTP call server-side. Keep in mind, though, that most sites limit the number of calls per IP address, so you can't serve a lot of users this way.
This is a simple HttpListener that downloads a website's content when the query string contains ?site=http://linkedin.com:
using System.IO;
using System.Net;

// set up a listener
using (var listener = new HttpListener())
{
    // on port 8080
    listener.Prefixes.Add("http://+:8080/");
    listener.Start();
    while (true)
    {
        // wait for a connection
        var ctx = listener.GetContext();
        var req = ctx.Request;
        var resp = ctx.Response;
        // default page: a link that calls this listener again with ?site=...
        var cnt = "<html><body><a href=\"/?site=http://linkedin.com\">click me</a></body></html>";
        foreach (var key in req.QueryString.Keys)
        {
            if (key != null)
            {
                // if the URL contains ?site=<URL of a site>
                switch (key.ToString())
                {
                    case "site":
                        // let's download
                        using (var wc = new WebClient())
                        {
                            // store the HTML in cnt
                            cnt = wc.DownloadString(req.QueryString[key.ToString()]);
                        }
                        // when needed you can do caching or processing here
                        // of the results, depending on your needs
                        break;
                    default:
                        break;
                }
            }
        }
        // output whatever is in cnt to the calling browser
        using (var sw = new StreamWriter(resp.OutputStream))
        {
            sw.Write(cnt);
        }
    }
}
To make the above code work you might have to set permissions for the URL. If you're on your development box, run:
netsh http add urlacl url=http://+:8080/ user=Everyone listen=yes
On production use sane values for the user.
Once that is set run the above code and point your browser to
http://localhost:8080/
(notice the / at the end)
You'll get a simple page with a link on it:
click me
Clicking that link sends a new request to the HttpListener, this time with the query string site=http://linkedin.com. The server-side code fetches the content at the given URL, in this case from LinkedIn.com. The result is sent back unchanged to the browser, but you can do post-processing, caching, etc., depending on your requirements.
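On the browser side, calling the listener from your own page could look roughly like the sketch below. It is not part of the code above: buildProxyUrl and fetchViaProxy are hypothetical helper names, http://localhost:8080/ is just the development address used here, and it assumes your page is either served from the same origin as the listener or the listener is extended to send an Access-Control-Allow-Origin header (the code above does not send one).

```javascript
// Hypothetical helper: builds the URL that asks the listener to fetch `target`.
// encodeURIComponent keeps characters such as ":", "/", "?" and "&" in the
// target from being misread as part of the listener's own query string.
function buildProxyUrl(proxyBase, target) {
  return proxyBase + "?site=" + encodeURIComponent(target);
}

// Hypothetical helper: requests the page through the listener and resolves
// to its HTML. Assumes a runtime that provides fetch().
async function fetchViaProxy(proxyBase, target) {
  const response = await fetch(buildProxyUrl(proxyBase, target));
  if (!response.ok) {
    throw new Error("Proxy returned HTTP " + response.status);
  }
  return response.text();
}
```

For example, fetchViaProxy("http://localhost:8080/", "http://linkedin.com") would request http://localhost:8080/?site=http%3A%2F%2Flinkedin.com.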
Legal notice/disclaimer
Most sites don't like being scraped this way and their Terms of Service might actually forbid it. Make sure you don't do anything illegal that either harms site reliability or leads to legal action against you.

Related

How to replace ajax with webrtc data channel

** JAVASCRIPT question **
I regularly use AJAX via XMLHttpRequest, but in one case I need one AJAX call per second...
In the long term, with a growing number of simultaneous users, that could easily bloat...
I've been reading about the WebRTC data channel and it seems interesting and promising.
Here is my working AJAX function, as an example of how few lines of code it takes to communicate from the browser to the server and vice versa:
function xhrAJAX(divID, param2) {
    // random value for each call to avoid cache
    var pcache = (Math.floor(Math.random() * 100000000) + 1);
    // parameters
    var params = "divID=" + encodeURIComponent(divID) + "&param2=" + encodeURIComponent(param2);
    // set up XMLHttpRequest with pcache
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/file.php?pcache=" + pcache, true);
    // set up headers
    xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    // when the request completes, put the response into the target element
    xhr.onreadystatechange = function(e) {
        if (xhr.readyState == 4) {
            $("#" + divID).html(e.currentTarget.responseText);
        }
    };
    // send the ajax call
    xhr.send(params);
}
How can I "transpose" or "convert" this AJAX workflow into a WebRTC data channel, so that I can avoid setting up a setInterval of 1000 ms?
Note: I mean how to replace the JavaScript portion of the code. PHP here is only for illustration; I don't want to do WebRTC via PHP...
Is there a simple, few-lines-of-code way to push/receive data like this AJAX function?
The answer I'm looking for is more like a simple function to push and receive
(once the connection with STUN, ICE, TURN is established and working...)
If I need to include a JavaScript library, like jQuery or its equivalent for WebRTC, I welcome any good and simple solution.
*** The main goal is this kind of scenario:
I have a web app: users on desktop and users within a WebView on Android and iOS.
Right now I have this workflow => AJAX every 3 seconds to "tell" the main database that the user is still active and using the browser (or the app).
But I'd like to replace it with this: when the user uses the browser => keep a WebRTC data channel open in the background between the browser and the server.
From what I've read on the web, I think WebRTC is a better solution than WebSocket.
** I did a bit of searching and found PeerJS....
https://github.com/jmcker/Peer-to-Peer-Cue-System/blob/main/send.html
I'll do some testing, but in the meantime, if someone can throw out some ideas, it could be fun.
Cheers
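As a starting point for the "simple function to push and receive" asked about above, here is a minimal sketch. It assumes the data channel is already open (signaling, STUN/ICE/TURN all done) and that both ends exchange JSON text; wireChannel and the message shape are hypothetical and not taken from PeerJS or any other library. Only the standard RTCDataChannel members readyState, send() and onmessage are used.

```javascript
// Hypothetical helper: wraps an already-open RTCDataChannel-like object.
function wireChannel(channel, onData) {
  // Incoming messages arrive as events whose `data` property holds the payload.
  channel.onmessage = function (event) {
    onData(JSON.parse(event.data));
  };
  // Returns a push function. It reports false instead of throwing when the
  // channel is not open, so the caller can decide whether to fall back to AJAX.
  return function push(payload) {
    if (channel.readyState !== "open") {
      return false;
    }
    channel.send(JSON.stringify(payload));
    return true;
  };
}
```

With something like this, the polling function above shrinks to a single push({divID: divID, param2: param2}) call, and the response handling moves into the onData callback. Note that the server then has to act as a WebRTC peer, which is a larger change than the browser side.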

How to recreate Chrome Waterfall column (Network tab) that documents time required for different stages of network request for jQuery AJAX call?

This question is similar but not helpful.
To provide more feedback to users, we want to mimic the Waterfall column in the Network tab of Chrome, which deconstructs network requests into different stages and times them.
An example is included below.
In particular, we want to indicate three stages:
Time uploading a file
Time processing a file on the server
Time downloading results
From the jQuery AJAX docs, it seems like beforeSend could be used to time file uploads. How about download time and time on server (TTFB in screenshot)?
Here's how we implement AJAX calls:
async function doRequest(imageFile) {
    // Set server URL.
    let serverUrl = 'https://test.com/test';
    // Create request form from the image file passed in by the caller.
    let formData = new FormData();
    formData.append('imageFile', imageFile);
    // Set request settings.
    let settings = {
        url: serverUrl,
        method: 'POST',
        timeout: 0,
        contentType: false,
        processData: false,
        data: formData,
        xhr: function() {
            let xhr = new XMLHttpRequest();
            xhr.onreadystatechange = function() {
                // readyState 2: headers received, so the status is known
                if (xhr.readyState == 2) {
                    if (xhr.status == 200) {
                        xhr.responseType = 'blob';
                    } else {
                        xhr.responseType = 'text';
                    }
                }
            };
            return xhr;
        },
    };
    // Make request.
    try {
        let result = await $.ajax(settings);
        // Handle success
    } catch (error) {
        // Handle failure
    }
}
Resource Loading and Timing
As usual, someone has had the same idea and provided a pre-coded solution. I found these resources while trying to help with this rather complicated task. You can use the code as written or place it into a bookmarklet.
I found a detailed article that describes how to use both the Navigation Timing API and the Resource Timing API. The article is titled:
Assessing Loading Performance in Real Life with Navigation and Resource Timing
The two prebuilt solutions provided by that article take completely different approaches to visualizing the data you seek.
To use them without any effort, create a bookmark for each of the following URLs:
More Detailed Analysis <-- copy this link to your bookmarks collection
Performance Waterfall <-- copy this link to your bookmarks collection
As mentioned, these are bookmarklets. They contain JavaScript code that can be executed directly on the page you have loaded. To use them:
In Chrome, load the page you want performance data for
Open your bookmarks and click on one of the two bookmarklets provided here
The result will be the waterfall or other detailed data you are seeking.
Note: The script can be blocked by a Content-Security-Policy and may not work on all sites.
Source Code
The waterfall chart like you originally asked about can be found at the following link. Note I am hosting this file for your answer. I can't guarantee it will be available forever. Please download and host the file. (Open License)
Waterfall by Andy Davies
The more detailed version is found here: (MIT License)
Performance-Bookmarklet by Michael Mrowetz.
File Upload
You'll see that the Resource Timing API provides this data. If you prefer to use the XHR API, a simple way to measure file upload time is the xhr.upload object, which accepts an event listener for progress. As pointed out, this isn't necessary given the previous tools.
xhr.upload.addEventListener("progress", function(evt){
    // Initialize and finalize a timer here
    if (evt.lengthComputable) {
        console.log(evt.loaded + "/" + evt.total);
    }
}, false);
Server Processing Time
In order to achieve the goal of measuring performance of the server and reporting it back to the client, the server must be involved in order to share its internal processing timing that you seek in your question. There is no way to determine that from the browser alone.
I recommend using the Server-Timing header; details about reading it in the browser are in the PerformanceServerTiming API documentation.
It is fairly simple to use this API. As the example shows (using a NodeJS server), all your server has to do is respond with a specific HTTP header that contains the performance data you would like to display in the browser:
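const headers = {
    // The template literal is only for readability; line breaks and the
    // indentation around them are stripped before the header is sent.
    'Server-Timing': `
        cache;desc="Cache Read";dur=23.2,
        db;dur=53,
        app;dur=47.2
    `.replace(/\s*\n\s*/g, '')
};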
Using the information on the client is as simple as this (from the MDN link page):
let entries = performance.getEntriesByType('resource');
console.log(entries[0].serverTiming);
// 0: PerformanceServerTiming {name: "cache", duration: 23.2, description: "Cache Read"}
// 1: PerformanceServerTiming {name: "db", duration: 53, description: ""}
// 2: PerformanceServerTiming {name: "app", duration: 47.2, description: ""}
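If you read the header yourself instead (for example via xhr.getResponseHeader("Server-Timing")), you would need to parse it. The sketch below is one way to do that; parseServerTiming is a hypothetical helper, it only understands the dur and desc parameters shown above, and it assumes descriptions contain no commas or semicolons. For cross-origin requests the server would also have to expose the header via Access-Control-Expose-Headers.

```javascript
// Hypothetical helper: turns a Server-Timing header value into an array of
// { name, duration, description } objects.
function parseServerTiming(headerValue) {
  return headerValue
    .split(",")
    .map((metric) => metric.trim())
    .filter((metric) => metric.length > 0)
    .map((metric) => {
      const [name, ...params] = metric.split(";").map((part) => part.trim());
      const entry = { name: name, duration: 0, description: "" };
      for (const param of params) {
        const eq = param.indexOf("=");
        if (eq === -1) continue;
        const key = param.slice(0, eq).trim().toLowerCase();
        // Strip optional surrounding double quotes from the value.
        const value = param.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
        if (key === "dur") entry.duration = Number(value);
        if (key === "desc") entry.description = value;
      }
      return entry;
    });
}
```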
For monitoring the upload state, I think you need XMLHttpRequestUpload and request.upload.addEventListener("progress", updateProgress), or request.upload.onprogress and onloadend to catch the loadend event. See https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/upload.
I don't see a particular HTTP status that marks the start of a response from a server. Check https://developer.mozilla.org/en-US/docs/Web/HTTP/Status. So at the HTTP API level (XMLHttpRequest) I don't think you can find a clue about that, although the browser should be able to know from the TCP level. If checking devtools is not your preference, you may need to include a timestamp in the response. Once the client gets the response, it knows the time at which the server started responding.
The client can easily get the time at which it receives the response from the server.
So
Dur_uploading = Time_loadend - Time_requeststarts
Dur_serverprocessing = Time_responsespecified - Time_loadend
Dur_download = Time_responsereceived - Time_responsespecified
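The three formulas above can be sketched as a small function. The property names are my own (hypothetical), all timestamps are assumed to be in milliseconds, and the server-supplied timestamp is assumed to come from a clock that is reasonably in sync with the client, which is not guaranteed in practice.

```javascript
// Hypothetical helper: derives the three stage durations from four timestamps.
//   requestStart      - client time just before xhr.send()
//   uploadEnd         - client time of the xhr.upload "loadend" event
//   responseSpecified - timestamp the server puts into its response
//   responseReceived  - client time of the xhr "load" event
function computeStageDurations(t) {
  return {
    uploading: t.uploadEnd - t.requestStart,
    serverProcessing: t.responseSpecified - t.uploadEnd,
    download: t.responseReceived - t.responseSpecified,
  };
}
```

On the client, the first, second and fourth values can be captured with Date.now() inside the corresponding event handlers.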

Retry failed pages with new proxyUrl

I have developed an Actor + PuppeteerCrawler + Proxy based crawler and want to rescrape failed pages. To increase the chance of success on the rescrape, I want to switch to another proxyUrl. The idea is to create a new crawler with a modified launchPuppeteer function and a different proxyUrl, and re-enqueue the failed pages. Please check the sample code below.
Unfortunately, it doesn't work, although I reset the request queue by dropping and reopening it. Is it possible to rescrape failed pages using PuppeteerCrawler with a different proxyUrl, and if so, how?
Best regards,
Wolfgang
for (let retryCount = 0; retryCount <= MAX_RETRY_COUNT; retryCount++) {
    if (retryCount) {
        // Try to reset the request queue, so that failed requests shall be rescraped
        await requestQueue.drop();
        requestQueue = await Apify.openRequestQueue(); // this is necessary to avoid exceptions
        // Re-enqueue failed urls in array failedUrls >>> ignored although using drop() and reopening request queue!!!
        for (let failedUrl of failedUrls) {
            await requestQueue.addRequest({url: failedUrl});
        }
    }
    crawlerOptions.launchPuppeteerFunction = () => {
        return Apify.launchPuppeteer({
            // generates a new proxy url and adds it to a new launchPuppeteer function
            proxyUrl: createProxyUrl()
        });
    };
    let crawler = new Apify.PuppeteerCrawler(crawlerOptions);
    await crawler.run();
}
I think your approach should work but on the other hand it should not be necessary. I'm not sure what createProxyUrl does.
You can supply a generic proxy URL with auto username which will use all your datacenter proxies at Apify. Or you can provide proxyUrls directly to PuppeteerCrawler.
Just don't forget that you have to switch browser to get a new IP from the proxy. More in this article - https://help.apify.com/en/articles/2190650-how-to-handle-blocked-requests-in-puppeteercrawler
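Since the question doesn't show what createProxyUrl does, here is one possible shape for it: a plain round-robin over a fixed list, so that each new browser launch gets the next proxy. This is only a sketch; createProxyRotator is a hypothetical helper, the proxy URLs in the usage line are placeholders, and nothing here calls the Apify SDK.

```javascript
// Hypothetical helper: returns a function that hands out proxy URLs in turn,
// wrapping around to the first one after the last.
function createProxyRotator(proxyUrls) {
  if (!Array.isArray(proxyUrls) || proxyUrls.length === 0) {
    throw new Error("proxyUrls must be a non-empty array");
  }
  let index = 0;
  return function nextProxyUrl() {
    const url = proxyUrls[index % proxyUrls.length];
    index += 1;
    return url;
  };
}
```

Usage could be as simple as const createProxyUrl = createProxyRotator(["http://proxy-a.example:8000", "http://proxy-b.example:8000"]); defined once, outside the retry loop.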

Redirecting XMLHTTP request - Javascript

I have a web page that has a lot of content and JavaScript. When the page loads, it makes multiple requests using Ajax and XMLHttpRequest to load data. Is there a way to hook all of these requests and direct them to a different server?
For example, the web page fetches data from www.apple.com/data and www.mango.com/data after it has loaded. Is it possible to insert a script somewhere in the web page that automatically redirects any such request to www.orange.com/data?
Waiting for answer. Thanks
You can add a global handler for the ajaxSend event, which is triggered right before an Ajax request is sent. In the handler you can check the request URI, apply some filtering logic, and then redirect the request by aborting the original and resending it.
Below is an example
$(document).ajaxSend(function(e, xhr, opt) {
    if (opt.url.indexOf("www.apple.com") !== -1) {
        // abort the request
        xhr.abort();
        // change the uri to www.orange.com
        opt.url = opt.url.replace("www.apple.com", "www.orange.com");
        $.ajax(opt);
    }
});
OK, so I followed Anthony C's answer and it did actually work. The problem with his solution is that it only works with jQuery Ajax requests, not with plain XMLHttpRequests (I am not sure why; I am a beginner on this topic). However, digging into his idea of creating a hook, I came across a similar post here: How to get the URL of a xmlhttp request (AJAX). The code there provides a way to read the requested URL for each request. With a small tweak to that code I came up with this:
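XMLHttpRequest.prototype.open = (function(open) {
    return function(method, url, async) {
        // use getLocation to convert the requested URL string into an object with readable parts
        var uri = getLocation(url);
        if (uri.hostname != "orange.com") {
            // keep only path and query, so an absolute URL is not appended to the new host
            arguments[1] = "https://orange.com" + uri.pathname + uri.search;
        }
        return open.apply(this, arguments);
    };
})(XMLHttpRequest.prototype.open);
// An anchor element resolves the URL and exposes hostname, pathname, search, etc.
var getLocation = function(href) {
    var l = document.createElement("a");
    l.href = href;
    return l;
};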
Placing this code at the top of the page lets me change the host name of all XMLHttpRequests that are not directed at orange.com. I am sure there are better ways to write this, but since I am not a JavaScript expert it will suffice for now.
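The URL rewriting itself can also be done without a DOM element, using the standard URL class. The following is a minimal sketch under that assumption; redirectToHost is a hypothetical helper name, and the hosts are the example ones from the question.

```javascript
// Hypothetical helper: resolves requestUrl against the current page URL and,
// unless it already points at the target host, swaps in the target origin
// while keeping path, query string and fragment.
function redirectToHost(requestUrl, pageUrl, targetOrigin) {
  const resolved = new URL(requestUrl, pageUrl);
  const target = new URL(targetOrigin);
  if (resolved.host === target.host) {
    return resolved.href;
  }
  return target.origin + resolved.pathname + resolved.search + resolved.hash;
}
```

Inside the patched open function this could be called as redirectToHost(url, window.location.href, "https://orange.com").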

How to make JS wait until protocol execution finished

I have a custom URL protocol handler cgit:[...]
It launches a background process which configures some things on the local machine. The protocol works fine and I'm launching it from JavaScript (currently using document.location = 'cgit:[...]'), but I actually want JavaScript to wait until the associated program exits.
So basically these are the steps I want JavaScript to perform:
JavaScript does something
JavaScript launches cgit:[...]
Javascript waits until cgit:[...] exits
JavaScript does something else
Code:
function launchCgit(params)
{
    showProgressBar();
    document.location="cgit:"+params;
    document.addEventListener( /* CGit-Program exited event */, hideProgressBar );
}
or:
function launchCgit(params)
{
    showProgressBar();
    // setLocationAndWait("cgit:"+params);
    hideProgressBar();
}
Any ideas if this is possible?
Since this isn't really an expected use of window.location I would doubt that there's an easy way. My recommendation would be to use an AJAX request and have the c++ program send a response when it's done. That way, whatever code needs to run after the c++ program can be run when the request completes.
As I didn't find a suitable way to solve my problem using AJAX requests or anything similar, I finally solved it with a kind-of-ugly workaround involving XMLHttpRequest.
For launching the protocol I'm still using document.location=cgit:[...]
I'm using a server-side system of "lock files": generic dummy files with names generated for each request.
Once the user requests to open the custom protocol, such a file is generated on the server specifically for that one protocol-opening request.
I created a folder called "$locks" on the server where these files are placed. Once the protocol-associated program exits, the corresponding file is deleted.
The website continuously checks whether the file for a request still exists using XMLHttpRequest and fires a callback if it doesn't (example timeout between tests: 1 sec).
The structure of the new files is the following:
lockThisRequest.php: creates a file in the $locks directory based on the req URL parameter.
unlockThisRequest.php: deletes a file in the $locks directory, again based on the req URL parameter.
The JavaScript part of it goes:
function launchCgit(params, callback)
{
    var lock = /* Generate valid filename from params variable */;
    // "Lock" that request (i.e. tell the server that a request with this ID is now in use)
    var locker = new XMLHttpRequest();
    locker.open('GET', 'lockThisRequest.php?req=' + lock, true);
    locker.send(null);
    function retry()
    {
        // Test if the lock file still exists on the server
        var req = new XMLHttpRequest();
        req.open('GET', '$locks/' + lock, true);
        req.onreadystatechange = function()
        {
            if (req.readyState == 4)
            {
                if (req.status == 200)
                {
                    // lock file exists -> cgit has not exited yet
                    window.setTimeout(retry, 1000);
                }
                else if (req.status == 404)
                {
                    // lock file not found -> request has been processed
                    callback();
                }
            }
        };
        req.send(null);
    }
    document.location = 'cgit:' + params; // execute custom protocol
    retry(); // start the lock-file check loop
}
Usage:
launchCgit("doThisAndThat", function()
{
    alert("ThisAndThat finished.");
});
The lockThisRequest.php file:
<?php
// The dollar sign is escaped so PHP does not treat $locks as a variable
file_put_contents("\$locks/".$_GET["req"],""); // Create lock file
?>
and unlockThisRequest.php:
<?php
unlink("../\$locks/".$_GET["req"]); // Delete lock file
?>
The local program / script executed by the protocol can simply call something like:
#!/bin/bash
curl "http://servername/unlockThisRequest.php?req=$1"
after it has finished.
As I said, this works, but it's anything but nice (congratulations if you kept track of those instructions).
I would have preferred a simpler way, and (important) this may also cause security issues with the lockThisRequest.php and unlockThisRequest.php files!
I'm fine with this solution because I'm only using it on a password-protected private page. If you plan to use it on a public or unprotected page, you may want to add some security to the PHP files.
Anyway, the solution works for me now, but if anyone finds a better way to do it, for example by using AJAX requests, they would be very welcome to add it to the respective Stack Overflow documentation or the like and post a link to it in this thread. I'd still be interested in alternative solutions :)
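The polling loop in launchCgit can be pulled out into a small reusable function, which also makes it possible to stop after a number of attempts (the code above retries forever, and never calls back on statuses other than 200 and 404). This is a sketch rather than a drop-in replacement: pollUntilGone, its options, and the attempt limit are my own additions, and checkStatus stands for whatever performs the XMLHttpRequest and reports the HTTP status.

```javascript
// Hypothetical helper: calls checkStatus repeatedly until it reports 404
// (lock file gone) or maxAttempts is reached.
//   checkStatus(report) - must call report(httpStatus) once per invocation
//   onDone(error, attempts) - called once; error is null on success
function pollUntilGone(checkStatus, onDone, options) {
  const opts = options || {};
  const intervalMs = opts.intervalMs || 1000;
  const maxAttempts = opts.maxAttempts || 60;
  // The scheduler is injectable; by default it is a plain setTimeout.
  const schedule = opts.schedule || ((fn, ms) => setTimeout(fn, ms));
  let attempts = 0;
  function attempt() {
    attempts += 1;
    checkStatus(function (status) {
      if (status === 404) {
        onDone(null, attempts);
      } else if (attempts >= maxAttempts) {
        onDone(new Error("gave up after " + attempts + " attempts"), attempts);
      } else {
        schedule(attempt, intervalMs);
      }
    });
  }
  attempt();
}
```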
