How to efficiently send tons of GET requests with PHP - JavaScript

I'm working on a project in which I have to develop a simple PHP-based web module from which users (admins) can send follow-up SMS messages to students, for advertising and other purposes.
The SMS API is very simple and I just need to send a GET request to a Cross Origin Domain along with the phone number and message.
I tested it with the file_get_contents("sms_api_url?credentials"); and it works fine.
What worries me is that the SMS will be sent to TONS of numbers, so I have to send the request many times in a loop, which will take a long time and, I think, consume too many resources.
Also the max execution time for PHP is set to 30 seconds which I don't want to change.
I thought of using client-side JavaScript to send the cross-origin requests in a loop so that it won't affect my server, but that wouldn't be secure as it would reveal the API credentials.
What technology should I use to send tons of GET requests efficiently?

You've told us nothing about the actual volume you need to handle, the metrics for the processing/connection time, or what constraints there are on the implementation.
As it stands this is way too broad to answer. But some approaches you might consider are:
1) Running concurrent requests - but note that, just like domain sharding, this can undermine your bandwidth if overused
2) You can have PHP scripts running indefinitely outside the webserver (using the CLI SAPI) and these can be launched from a web session.
"I thought to use the Client side JavaScript for sending cross origin request in a loop so that it won't affect my server but that wouldn't be secure as it would reveal the API credentials."
If you send directly to the endpoint, then yes, you'd need the credentials in the browser. But if you implement a proxy script which injects the credentials on your webserver then you can use your own credentials from the browser.
Using cron has certain advantages - but you really don't want to be spawning a task from crond to send one SMS message - it needs to run in batches, and you need to manage the concurrency.
You might want to consider switching to a different aggregator that can offer bulk processing.
Regardless of the approach, you will need a way to store the messages/phone numbers and a locking mechanism around retrieval and processing.
Personally, I'd be tempted to look at using an MTA for this or perhaps even Kannel - but that's more an approach for handling volumes in excess of 300,000 per day.
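As a rough illustration of point 1, here is a minimal Python sketch of capped concurrency (the same idea carries over to PHP, e.g. with curl_multi). The endpoint URL, the query parameter names (key, to, text) and the function names are all hypothetical, since the question doesn't show the real API; the only point is that the number of requests in flight is bounded by max_workers.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://sms.example.invalid/send"  # hypothetical endpoint


def send_sms(number, text, api_key="SECRET"):
    """Send one message with a GET request; return the HTTP status code."""
    # Parameter names are assumptions, not the real provider's API.
    query = urlencode({"key": api_key, "to": number, "text": text})
    with urlopen(f"{API_URL}?{query}", timeout=10) as response:
        return response.status


def send_all(numbers, text, sender=send_sms, max_workers=8):
    """Send to every number with at most max_workers requests in flight."""
    def attempt(number):
        try:
            return number, sender(number, text)
        except Exception as exc:  # record the failure instead of aborting the batch
            return number, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(attempt, numbers))
```

The result maps each number to either a status code or the exception that was raised, so failed numbers can be retried later. Keeping max_workers small is what avoids the "overused" case mentioned above.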

Sending as many network requests as needed and finishing in less than 30 seconds are two requirements that rather contradict each other. Also, raw "efficiency" can just mean squeezing every last resource out of the server, which may not be desirable.
That said, I think the key points are:
I may be wrong but, as far as I know, there are only two ways to prevent an unauthorised party from consuming a web service: private credentials and IP filtering. Neither is possible in browser-based JavaScript.
Don't make a human being stare in front of the computer until a task of this kind completes. There's absolutely no need to and it can even cause the task to abort.
If you need to send the same text to different recipients, find out whether the SMS provider has an API that allows you to do it in a single request. Large batch deliveries get one or two orders of magnitude harder when this feature is not available.
In short you need:
A command line script
A task scheduler (e.g. cron)
Prefer server stability to maximum efficiency (you may even want to throttle your requests)

Send the requests from the server, but don't do it in the PHP script that generates the page.
Instead, store information about the desired messages in a database.
Write another program which, periodically, checks the database for unsent messages and makes the call to the API. You could run it using cron.
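The store-then-send approach above could be sketched roughly like this. It is written in Python with SQLite only to keep the example self-contained; the table name, column names and the sender callback are hypothetical, and the sketch assumes a single worker process (several workers would need an extra step to claim rows before sending them).

```python
import sqlite3
import time


def open_outbox(path=":memory:"):
    """Create (if needed) and open the table of pending messages."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY, number TEXT, body TEXT, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    return db


def enqueue(db, number, body):
    """Called by the web page: just record the message and return."""
    with db:  # commits on success
        db.execute("INSERT INTO outbox (number, body) VALUES (?, ?)", (number, body))


def process_batch(db, sender, batch_size=50, delay=0.0):
    """Called by the scheduled worker: send up to batch_size pending messages."""
    rows = db.execute(
        "SELECT id, number, body FROM outbox WHERE status = 'pending' "
        "ORDER BY id LIMIT ?", (batch_size,)
    ).fetchall()
    sent = 0
    for row_id, number, body in rows:
        try:
            sender(number, body)  # sender is a stand-in for the real API call
            status = "sent"
            sent += 1
        except Exception:
            status = "failed"
        with db:
            db.execute("UPDATE outbox SET status = ? WHERE id = ?", (status, row_id))
        time.sleep(delay)  # optional throttle between API calls
    return sent
```

The page that the admin uses only calls enqueue, so it returns immediately; a script started by cron calls process_batch, and each run handles a bounded number of messages, which keeps it well inside any execution time limit.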

Related

Prevent recursive calls of XmlHttpRequest to server

I've been googling for hours for this issue, but did not find any solution.
I am currently working on this app, built on Meteor.
Now the scenario is, after the website is opened and all the assets have been loaded in browser, the browser constantly makes recursive xhr calls to server. These calls are made at the regular interval of 25 seconds.
This can be seen in the Network tab of browser console. See the Pending request of the last row in image.
I can't figure out from where it originates, and why it is invoked automatically even when the user is idle.
Now the question is, How can I disable these automatic requests? I want to invoke the requests manually, i.e. when the menu item is selected, etc.
Any help will be appreciated.
[UPDATE]
In response to Jan Dvorak's comment:
When I type "e" in the search box, the list of events whose names start with the letter "e" is displayed.
The request goes with all valid parameters and the Payload like this:
["{\"msg\":\"sub\",\"id\":\"8ef5e419-c422-429a-907e-38b6e669a493\",\"name\":\"event_Coll_Search_by_PromoterName\",\"params\":[\"e\"]}"]
And this is the response, which is valid.
a["{\"msg\":\"data\",\"subs\":[\"8ef5e419-c422-429a-907e-38b6e669a493\"]}"]
The code for this action is posted here
But in the case of the automatic recursive requests, the request goes out without a payload and the response is just the letter "h", which is strange, isn't it? How can I get rid of this?
Meteor has a feature called "Live page updates":
"Just write your templates. They automatically update when data in the database changes. No more boilerplate redraw code to write. Supports any templating language."
To support this feature, Meteor needs to do some server-client communication behind the scenes.
Traditionally, HTTP was created to fetch dead data. The client tells the server it needs something, and it gets something. There is no way for the server to tell the client it needs something. Later, it became necessary to push some data to the client. Several alternatives came into existence:
polling:
The client makes periodic requests to the server. The server responds with new data or says "no data" immediately. It's easy to implement and doesn't use many resources. However, it's not exactly live. It can be used for a news ticker but it's not exactly good for a chat application.
If you increase the polling frequency, you improve the update rate, but the resource usage grows with the polling frequency, not with the data transfer rate. HTTP requests are not exactly cheap. One request per second from multiple clients at the same time could really hurt the server.
hanging requests:
The client makes a request to the server. If the server has data, it sends them. If the server doesn't have data, it doesn't respond until it does. The changes are picked up immediately, no data is transferred when it doesn't need to be. It does have a few drawbacks, though:
If a web proxy sees that the server is silent, it eventually cuts off the connection. This means that even if there is no data to send, the server needs to send a keep-alive response anyways to make the proxies (and the web browser) happy.
Hanging requests don't use up (much) bandwidth, but they do take up memory. Today's servers can handle many concurrent TCP connections, so it's less of an issue than it was before. What does need to be considered is the amount of memory associated with the threads holding on to these requests - especially when the connections are tied to specific threads serving them.
Browsers have hard limits on the number of concurrent requests per domain and in total. Again, this is less of a concern now than it was before. Thus, it seems like a good idea to have one hanging request per session only.
Managing hanging requests feels kinda manual as you have to make a new request after each response. A TCP handshake takes some time as well, but we can live with a 300ms (at worst) refractory period.
Chunked response:
The client creates a hidden iFrame with a source corresponding to the data stream. The server responds with an HTTP response header immediately and leaves the connection open. To send a message, the server wraps it in a pair of <script></script> tags that the browser executes when it receives the closing tag. The upside is that there's no connection reopening but there is more overhead with each message. Moreover, this requires a callback in the global scope that the response calls.
Also, this cannot be used with cross-domain requests as cross-domain iFrame communication presents its own set of problems. The need to trust the server is also a challenge here.
Web Sockets:
These start as a normal HTTP connection but they don't actually follow the HTTP protocol later on. From the programming point of view, things are as simple as they can be. The API is a classic open/callback style on the client side and the server just pushes messages into an open socket. No need to reopen anything after each message.
There still needs to be an open connection, but it's not really an issue here with the browser limits out of the way. The browser knows the connection is going to be open for a while, so it doesn't need to apply the same limits as to normal requests.
These seem like the ideal solution, but there is one major issue: IE<10 doesn't know them. As long as IE8 is alive, web sockets cannot be relied upon. Also, the native Android browser and Opera mini are out as well (ref.).
Still, web sockets seem to be the way to go once IE8 (and IE9) finally dies.
What you see are hanging requests with the timeout of 25 seconds that are used to implement the live update feature. As I already said, the keep-alive message ("h") is used so that the browser doesn't think it's not going to get a response. "h" simply means "nothing happens".
Chrome supports web sockets, so Meteor could have used them with a fallback to long requests, but, frankly, hanging requests are not at all bad once you've got them implemented (sure, the browser connection limit still applies).
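The frames quoted in the question (a lone "h", and a[...] wrapping JSON strings) look like SockJS-style framing, where the first character gives the frame type. Assuming that is what is on the wire, a small Python sketch for decoding captured frames might look like this; parse_frame is an illustrative name, not part of Meteor.

```python
import json


def parse_frame(frame):
    """Classify one SockJS-style frame and decode any messages it carries."""
    kind, payload = frame[:1], frame[1:]
    if kind == "o":
        return ("open", [])
    if kind == "h":
        return ("heartbeat", [])  # the keep-alive: nothing happened
    if kind == "a":
        # The payload is a JSON array of strings; each string is itself JSON.
        return ("messages", [json.loads(item) for item in json.loads(payload)])
    if kind == "c":
        return ("close", json.loads(payload))
    raise ValueError(f"unrecognised frame: {frame!r}")
```

Run against the traffic shown in the question, the "h" responses come out as heartbeats with no messages, while the a[...] response decodes to the "data" message for the subscription.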

Is there any good trick for a server to handle more requests if I don't have to send any data back?

I want to handle a lot of (> 100k/sec) POST requests from JavaScript clients with some kind of service server. Not much of this data will be stored, but I have to process all of it, so I cannot spend all of my server's power on just serving requests. All the processing needs to be done in the same server instance; otherwise I'll need to use a database for synchronization between servers, which will be slower by orders of magnitude.
However I don't need to send any data back to the clients, and they don't even expect them.
So far my plan is to create a few proxy server instances that buffer the requests and send them to the main server in bigger packs.
For example, let's say that I need to handle 200k requests/sec and each server can handle 40k. I can split the load between 5 of them. Each one will then buffer requests and forward them to the main server in packs of 100. This results in 2k requests/sec on the main server (however, each message will be 100 times bigger, which probably means around 100-200kB). I could even forward them using UDP to decrease the amount of resources needed (then I need only one socket on the main server, right?).
I'm just wondering whether there is another way to speed things up, especially since, as I said, I don't need to send anything back. I also have full control over the JavaScript clients, but unfortunately JavaScript is unable to send data using UDP, which would probably be a solution for me (I don't even care if 0.1% of the data is lost).
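To make the arithmetic in this plan concrete, here is a small Python sketch. main_server_load and BatchBuffer are illustrative names only; the flush callback stands in for whatever actually forwards a pack to the main server.

```python
def main_server_load(total_rps, proxy_capacity_rps, batch_size):
    """Return (proxies needed, batched requests per second at the main server)."""
    proxies = -(-total_rps // proxy_capacity_rps)  # ceiling division
    return proxies, total_rps / batch_size


class BatchBuffer:
    """Collect items and hand them to `flush` in groups of `batch_size`."""

    def __init__(self, batch_size, flush):
        self.batch_size = batch_size
        self.flush = flush
        self.items = []

    def add(self, item):
        self.items.append(item)
        if len(self.items) >= self.batch_size:
            self.drain()

    def drain(self):
        """Send whatever is buffered, e.g. on a timer so small batches don't stall."""
        if self.items:
            batch, self.items = self.items, []
            self.flush(batch)
```

With the numbers from the example, main_server_load(200_000, 40_000, 100) gives 5 proxies and 2,000 batched requests per second. Calling drain periodically bounds how long an event can sit in a proxy when traffic is low.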
Any ideas?
Edit in response to the answers given so far:
The problem isn't that the server is too slow at processing events from the queue or at putting events into the queue. In fact I plan to use the disruptor pattern (http://code.google.com/p/disruptor/) which was proven to process up to 6 million requests per second.
The only problem I can potentially have is the need to keep 100, 200 or 300k sockets open at the same time, which cannot be handled by any of the mainstream servers. I know some custom solutions are possible (http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3) but I'm wondering whether there is a way to make even better use of the fact that I don't have to reply to clients.
(For example some way to embed part of the data in initial TCP packet and handle TCP packets as they would be UDP. Or some other kind of magic ;))
Make a single, fast function (probably in C) that receives all requests from a very fast server (like nginx). The only job of this function is to store the requests in a very fast queue (like Redis, if you have enough RAM).
In another process (or server), pop items off the queue and do the real work, processing the requests one by one.
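The hand-off described here can be sketched in-process with Python's standard library. This is only an analogue: queue.Queue stands in for the external queue (Redis or similar), and accept and start_worker are hypothetical names.

```python
import queue
import threading


def start_worker(jobs, handle):
    """Start a thread that processes jobs one by one until it sees None."""
    def run():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: stop the worker
                break
            handle(job)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def accept(jobs, request_body):
    """The fast path: enqueue the request and return immediately."""
    jobs.put(request_body)
```

The request handler only ever calls accept, so its cost does not depend on how expensive the real processing is; the queue absorbs bursts while the worker catches up.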
If you have control of the clients, as you say, then your proxy server doesn't even need to be an HTTP server, because you can assume that all of the requests are valid.
You could implement it as a non-HTTP server that simply sends back a 200, reads the client request until it disconnects, and then queues the requests for processing.
I think what you're describing is an implementation of a Message Queue. You also will need something to hand off these requests to whatever queue you use (RabbitMQ is quite good, there are many alternatives).
You'll also need something else running which can do whatever processing you actually want on the requests. You haven't made that very clear, so I'm not too sure exactly what would be right for you. Essentially the idea is that incoming requests are dumped as quickly and simply as possible into the queue by your web server, and then the web server is free to go back to serving more requests. When the system has some resources, it uses them to process the queue, but when it's busy the queue just keeps growing.
Not sure what platform you're on, but you might want to look at something like Lighttpd for serving the POSTs. You might (if same-domain restrictions don't shoot you down) get away with having Lighttpd running on a subdomain of your application (so post.myapp.com). Failing that you could put a proper load balancer in front of your webservers altogether (so all requests go to www.myapp.com and the load balancer decides whether to forward them to the web server or the queue processor).
Hope that helps
Consider using MongoDB for persisting your requests; its fire-and-forget mechanism can help your servers respond faster.

How to deal with a big set of pending requests

I want to implement a web site that displays a notification to the user when some event happens on the server. My plan is:
to make an asynchronous request to the server (ASP.NET) with a 600-second time-out
if an event occurs on the server within those 600 seconds, the server responds with the event details
if no event occurs, the server sends a 'no event' response at the end of the 600 seconds
JS, upon receiving the response from the server, processes it and sends the next request.
The problem with this approach is that, with a large number of visitors, the web site will have a lot of 'pending' requests.
Questions:
Should I consider that a problem? What is the solution for it? Should I perhaps implement another approach?
Please advise; any feedback is welcome.
I don't know specifics about asp.net's handling of pending requests, but what you are describing is basically long-polling. It's tricky for a number of reasons, including but not limited to:
each pending request consumes a thread, and you'll need to store state on each of those threads
if you have enough connections (not necessarily all that many; see above), you'll need them to span multiple machines, and you then need to come up with an architecture to distribute endpoints across those machines, and make sure each incoming request goes to the right machine. If you're only broadcasting the same data to all your users, this becomes much easier.
proxies or ISPs or what-have-you may shut down your long-poll request. You'll need an architecture resilient to that.
Here's a question about long-polling in asp.net: How to do long-polling AJAX requests in ASP.NET MVC? It's probably a good place to start.
Also you could consider a 3rd-party service like pusher to handle these connections for you, or (disclaimer: I work on App Engine) App Engine's Channel API.
Surely you could make more frequent requests to the server that do not consume server resources for 10 whole minutes?
e.g. send an AJAX request every 60 seconds or so, and return whether or not any event has occurred. The downside is that it could take up to a minute for a user to see notification about some event, so if you need it more or less immediately, that is a problem.
If it does have to be immediate, it seems like looking into "long polling" with something like node.js might be a solution, though non-trivial to implement.
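For what it's worth, the long-polling idea can be sketched without one thread per pending request by using an event loop. The following is a minimal Python asyncio illustration, not ASP.NET or node.js code; EventChannel is a hypothetical name, and a real handler would call wait with the desired time-out and turn None into the 'no event' response.

```python
import asyncio


class EventChannel:
    """Lets many waiting requests share one event without a thread each."""

    def __init__(self):
        self._waiters = set()

    async def wait(self, timeout):
        """Return the next published event, or None if the timeout expires."""
        future = asyncio.get_running_loop().create_future()
        self._waiters.add(future)
        try:
            return await asyncio.wait_for(future, timeout)
        except asyncio.TimeoutError:
            return None  # the handler would answer 'no event' here
        finally:
            self._waiters.discard(future)

    def publish(self, event):
        """Wake every request that is currently waiting."""
        for future in list(self._waiters):
            if not future.done():
                future.set_result(event)
```

Each pending request then costs one small future object rather than a blocked thread, which is the main reason event-loop servers are usually suggested for this pattern.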

Understanding mod_proxy and Apache 2 for writing a comet-server

I am currently trying to implement a simple HTTP server for a comet technique (long-polling XHR requests). As JavaScript is very strict about cross-domain requests, I have a few questions:
As I understand it, an Apache worker is blocked while serving a request, so writing the "script" as a usual website would block Apache once all workers have a request to serve. --> Does not work!
I came up with the idea of writing my own simple HTTP server just for serving these long-polling requests. This server should be non-blocking, so each worker could handle many requests at the same time. As my site also contains content, images, etc. and my server does not need to serve content, I started it on a port other than 80. The problem now is that the JavaScript delivered by my Apache can't interact with my comet server running on a different port, because of cross-domain restrictions. --> Does not work!
Then I came up with the idea of using mod_proxy to map my server onto a new subdomain. I couldn't really figure out how mod_proxy works, but I imagine that I would now have the same effect as in my first approach?
What would be the best way to combine a classic website with these long-polling XHR requests? Do I need to implement content delivery in my own server?
I'm pretty sure using mod_proxy will block a worker while the request is being processed.
If you can use 2 IPs, there is a fairly easy solution.
Let's say IP A is 1.1.1.1 and IP B is 2.2.2.2, and let's say your domain is example.com.
This is how it will work:
- Configure Apache to listen on port 80, but ONLY on IP A.
- Start your other server on port 80, but only on IP B.
- Configure the XHR requests to be on a subdomain of your domain, but with the same port, so the cross-domain restrictions don't prevent them. So your site is example.com, and the XHR requests go to xhr.example.com, for example.
- Configure your DNS so that example.com resolves to IP A, and xhr.example.com resolves to IP B.
- You're done.
This solution will work if you have 2 servers and each one has its IP, and it will work as well if you have one server with 2 IPs.
If you can't use 2 IPs, I may have another solution, I'm checking if it's applicable to your case.
This is a difficult problem. Even if you get past the security issues you're running into, you'll end up having to hold a TCP connection open for every client currently looking at a web page. You won't be able to create a thread to handle each connection, and you won't be able to "select" on all the connections from a single thread. Having done this before, I can tell you it's not easy. You may want to look into libevent, which memcached uses to a similar end.
Up to a point you can probably get away with setting long timeouts and allowing Apache to have a huge number of workers, most of which will be idle most of the time. Careful choice and configuration of the Apache worker module will stretch this to thousands of concurrent users, I believe. At some point, however, it will not scale up any more.
I don't know what your infrastructure looks like, but we have load balancing boxes in the network racks called F5s. These present a single external domain, but redirect the traffic to multiple internal servers based on their response times, cookies in the request headers, etc. They can be configured to send requests for a certain path within the virtual domain to a specific server. Thus you could have example.com/xhr/foo requests mapped to a specific server to handle these comet requests. Unfortunately, this is not a software solution, but a rather expensive hardware solution.
Anyway, you may need some kind of load-balancing system (or maybe you have one already), and perhaps it can be configured to handle this situation better than Apache can.
I had a problem years ago where I wanted customers using a client-server system with a proprietary binary protocol to be able to access our servers on port 80 because they were continuously having problems with firewalls on the custom port that the system used. What I needed was a proxy that would live on port 80 and direct the traffic to either Apache or the app server depending on the first few bytes of what came across from the client. I looked for a solution and found nothing that fit. I considered writing an Apache module, a plugin for DeleGate, etc., but eventually rolled my own custom content-sensing proxy service. That, I think, is the worst-case scenario for what you're trying to do.
To answer the specific question about mod_proxy: yes, you can set up mod_proxy to serve content that is generated by a server (or service) that is not public facing (i.e. which is only available via an internal address or localhost).
I've done this in a production environment and it works very, very well: Apache forwards some requests to Tomcat via AJP workers, and others to a GIS application server via mod_proxy. As others have pointed out, cross-site security may stop you working on a sub-domain, but there is no reason why you can't proxy requests to mydomain.com/application
To talk about your specific problem - I think you are getting bogged down in looking at the problem as "long lived requests", i.e. assuming that when you make one of these requests, that's it, the whole process needs to stop. It seems as though you are trying to solve an issue with application architecture via changes to system architecture. In fact, what you need to do is treat these background requests exactly as such, and multi-thread it:
Client makes the request to the remote service "perform task X with data A, B and C"
Your service receives the request: it passes it onto a scheduler which issues a unique ticket / token for the request. The service then returns this token to the client "thanks, your task is in a queue running under token Z"
The client then hangs onto this token, shows a "loading/please wait" box, and sets up a timer that fires, say, every second for argument's sake
When the timer fires, the client makes another request to the remote service "have you got the results for my task, it's token Z"
Your background service can then check with your scheduler, and will likely return an empty document "no, not done yet" or the results
When the client gets the results back, it can simply clear the timer and display them.
So long as you're reasonably comfortable with threading (which you must be if you're looking at writing your own HTTP server), this shouldn't be too complex. On top of the HTTP listener part, you need:
Scheduler object - a singleton object that really just wraps a "first in, first out" queue. New tasks go onto the end of the queue and jobs are pulled off from the beginning; just make sure that the code that issues a job is thread-safe (lest you get two workers pulling the same job from the queue).
Worker threads can be quite simple - get access to the scheduler and ask for the next job; if there is one, do the work and send the results, otherwise just sleep for a period and start over.
This way, you're never going to be blocking Apache for longer than need be, as all you are doing is issuing requests for "do x" or "give me results for x". You'll probably want to build some safety features in at a few points - such as handling tasks that fail, and making sure there is a time-out on the client side so it doesn't wait indefinitely.
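A minimal sketch of the ticket scheme described above, in Python. The Scheduler class and its method names are illustrative only; queue.Queue already provides the thread-safe first-in, first-out behaviour, so two workers cannot pull the same job.

```python
import queue
import threading
import uuid


class Scheduler:
    """Hands out tickets for queued tasks and stores their results."""

    def __init__(self):
        self._jobs = queue.Queue()  # thread-safe FIFO of (ticket, func, args)
        self._results = {}
        self._lock = threading.Lock()

    def submit(self, func, *args):
        """Queue a task and return its ticket immediately."""
        ticket = uuid.uuid4().hex
        self._jobs.put((ticket, func, args))
        return ticket

    def poll(self, ticket):
        """Return (True, result) once the task is done, else (False, None)."""
        with self._lock:
            if ticket in self._results:
                return True, self._results.pop(ticket)
        return False, None

    def work(self):
        """Worker loop: take the next job, run it, store the result."""
        while True:
            ticket, func, args = self._jobs.get()  # blocks until a job exists
            try:
                result = func(*args)
            except Exception as exc:  # keep failures visible to the client
                result = exc
            with self._lock:
                self._results[ticket] = result
            self._jobs.task_done()

    def start(self, workers=2):
        """Start daemon worker threads that run until the process exits."""
        for _ in range(workers):
            threading.Thread(target=self.work, daemon=True).start()
```

The "perform task X" request maps to submit and returns the ticket at once; the client's timer request maps to poll. Here the worker blocks on the queue instead of sleeping and retrying, which is a small variation on the description above.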
For number 2: you can get around crossdomain restrictions by using JSONP.
Three alternatives:
Use nginx. This means you run 3 servers: nginx, Apache, and your own server.
Run your server on its own port.
Use Apache mod_proxy_http (as you suggested).
I've confirmed mod_proxy_http (Apache 2.2.16) works proxying a Comet application (powered by Atmosphere 0.7.1) running in GlassFish 3.1.1.
My test app with full source is here: https://github.com/ceefour/jsfajaxpush

Ajax Security

We have a heavily Ajax-dependent application. What are good ways of making sure that requests to the server-side scripts come from an actual user sitting at a browser and not from standalone programs?
There aren't any really.
Any request sent through a browser can be faked up by standalone programs.
At the end of the day does it really matter? If you're worried then make sure requests are authenticated and authorised and your authentication process is good (remember Ajax sends browser cookies - so your "normal" authentication will work just fine). Just remember that, of course, standalone programs can authenticate too.
"What are the good ways of making it sure that the request to server side scripts are not coming through standalone programs and are through an actual user sitting on a browser"
There are no ways. A browser is indistinguishable from a standalone program; a browser can be automated.
You can't trust any input from the client side. If you are relying on client-side co-operation for any security purpose, you're doomed.
There isn't a way to automatically block "non browser user" requests hitting your server side scripts, but there are ways to identify which scripts have been triggered by your application and which haven't.
This is usually done using something called "crumbs". The basic idea is that the page making the AJAX request should generate (server side) a unique token (which is typically a hash of unix timestamp + salt + secret). This token and timestamp should be passed as parameters to the AJAX request. The AJAX handler script will first check this token (and the validity of the unix timestamp e.g. if it falls within 5 minutes of the token timestamp). If the token checks out, you can then proceed to fulfill this request. Usually, this token generation + checking can be coded up as an Apache module so that it is triggered automatically and is separate from the application logic.
Fraudulent scripts won't be able to generate valid tokens (unless they figure out your algorithm) and so you can safely ignore them.
Keep in mind that storing a token in the session is also another way, but that won't buy any more security than your site's authentication system.
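A rough Python sketch of the crumb idea follows. Note two substitutions: it uses an HMAC rather than a plain hash of timestamp + salt + secret, and it uses the session ID in the role of the salt. SECRET, make_crumb and check_crumb are hypothetical names.

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # hypothetical; keep it out of client code


def make_crumb(session_id, now=None):
    """Return (timestamp, token) to embed in the page that will call AJAX."""
    timestamp = int(time.time() if now is None else now)
    message = f"{session_id}:{timestamp}".encode()
    token = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return timestamp, token


def check_crumb(session_id, timestamp, token, max_age=300, now=None):
    """Accept the request only if the token matches and is recent enough."""
    current = time.time() if now is None else now
    if not 0 <= current - timestamp <= max_age:
        return False
    message = f"{session_id}:{timestamp}".encode()
    expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)  # constant-time comparison
```

The 300-second max_age matches the 5-minute window mentioned above. A program that first loads the page can still read a valid crumb out of it, so this only shows that a request followed a page load; it doesn't show that a human made it.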
I'm not sure what you are worried about. From where I sit I can see three things your question can be related to:
First, you may want to prevent unauthorized users from making a valid request. This is resolved by using the browser's cookie to store a session ID. The session ID needs to be tied to the user, be regenerated every time the user goes through the login process, and have an inactivity timeout. Any request coming in without a valid session ID you simply reject.
Second, you may want to prevent a third party from doing a replay attack against your site (i.e. sniffing an innocent user's traffic and then sending the same calls over). The easy solution is to go over https for this. The SSL layer will prevent somebody from replaying any part of the traffic. This comes at a cost on the server side so you want to make sure that you really cannot take that risk.
Third, you may want to prevent somebody from using your API (that's what AJAX calls are in the end) to implement their own client for your site. For this there is very little you can do. You can always look for the appropriate User-Agent but that's easy to fake and will probably be the first thing somebody trying to use your API will think of. You can always implement some statistics, for example looking at the average number of AJAX requests per minute on a per-user basis and seeing whether some users are way above your average. It's hard to implement and it's only useful if you are trying to prevent automated clients reacting faster than a human can.
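The per-user statistics mentioned in the third point might be sketched like this in Python. RateMonitor, the 60-second window and the "5 times the mean" threshold are all arbitrary choices for illustration, not recommendations.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Track request timestamps per user over a sliding window."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.hits = defaultdict(deque)

    def record(self, user, now):
        """Record one request and return the user's count within the window."""
        times = self.hits[user]
        times.append(now)
        while times and times[0] <= now - self.window:
            times.popleft()  # forget requests older than the window
        return len(times)

    def outliers(self, factor=5.0):
        """Users whose count (as of their last request) exceeds factor * mean."""
        counts = {user: len(times) for user, times in self.hits.items() if times}
        if not counts:
            return []
        mean = sum(counts.values()) / len(counts)
        return sorted(user for user, count in counts.items() if count > factor * mean)
```

A flagged user is only a candidate for closer inspection; a fast but legitimate user can exceed the average too, which is part of why this is hard to get right.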
Is Safari a webbrowser for you?
If it is, note that the same engine is embedded in many applications, for instance those using Qt's WebKit libraries. So I would say there is no way to recognize it.
A user can forge any request they want, faking headers like User-Agent as they like...
One question: why would you want to do what you ask for? What's the difference for you if they request from a browser or from anything else?
Can't think of one reason you'd call "security" here.
If you still want to do this, for whatever reason, think about making your own application, with a browser embedded. It could somehow authenticate to the application in every request - then you'd only send valid responses to your application's browser.
User would still be able to reverse engineer the application though.
Interesting question.
What about browsers embedded in applications? Would you mind those?
You can probably think of a way of "proving" that a request comes from a browser, but it will ultimately be heuristic. The line between browser and application is blurry (e.g. embedded browser) and you'd always run the risk of rejecting users from unexpected browsers (or unexpected versions thereof).
As has been mentioned before, there is no way of accomplishing this. But there is one thing worth noting that is useful for protecting against CSRF attacks that target specific AJAX functionality: set a custom header with the help of the AJAX object, and verify that header on the server side.
And if you set a random (one-time-use) token as the value of that header, you can prevent automated attacks.
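The server-side half of the custom-header check could look roughly like the Python sketch below. The header name X-Ajax-Token and the class and function names are made up for the example, and the in-memory set stands in for wherever issued tokens are really stored (typically the user's session).

```python
import secrets


class OneTimeTokens:
    """Issue random tokens that are valid for exactly one request."""

    def __init__(self):
        self._issued = set()

    def issue(self):
        """Create a token to embed in the page for its next AJAX call."""
        token = secrets.token_urlsafe(32)
        self._issued.add(token)
        return token

    def redeem(self, token):
        """Return True the first time a known token is presented, then never again."""
        if token in self._issued:
            self._issued.remove(token)
            return True
        return False


def is_allowed(headers, tokens, header_name="X-Ajax-Token"):
    """Check the custom header that the page's AJAX code is expected to set."""
    token = headers.get(header_name)
    return token is not None and tokens.redeem(token)
```

Since each token is removed once redeemed, a captured request cannot simply be replayed; the page has to obtain a fresh token for every call.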
