I am trying to understand the basics of the internals of HTTP servers and clients with regard to how they transmit data. I have read many articles about how HTTP works, but I haven't found any that answers some of my questions.
I would like to go through the process of loading a web page as I understand it, and I would appreciate it if you pointed out where I got it wrong.
When I visit a site my browser asks a server for an HTML file. For that my browser creates a socket, binds it to my IP address, and connects it to a listening socket of the server of the site I am visiting. In order to connect my browser's socket to the server I need a port number and a hostname; the port number is 80 because this is HTTP, and the hostname is obtained via DNS resolution. Now that there is a connection between sockets my browser sends a GET request. That request is an ASCII file with the contents corresponding to an HTTP request. My browser writes the raw ASCII bytes to the socket and that is written to the server's socket.
The server writes back the HTML file I requested to the socket. The HTML the server sends is just an ASCII file that the server will write byte by byte to the socket.
My browser receives the ASCII file and parses it. Let's assume here that it finds an image tag. The browser sends an HTTP request for that image file. Here comes something I don't understand. How does the server respond? As far as I can tell the server must send back an ASCII file formed by a set of headers followed by a CRLF and then the body of the message. In this case, assuming my browser asked for a .jpeg, does the server write the headers as ASCII plaintext to the socket and then write the raw bytes of the image to the socket?
If the HTML file has several images do we open a socket per image (per request)?
Let's assume that my browser now finds a javascript tag. When the server answers my request for that script, does the server write the ASCII bytes of the script's source to the socket? What happens with JS libraries? Does the server have to send all the source code for each one?
On writing data to the sockets: is write(2) the correct way to do all this writing between sockets?
On the transmission of large files: if I click a button on the site that lets me download a large PDF, how is this accomplished by the server? I assume that the server tries to transmit this in pieces. As far as I can tell there is an option for chunked encoding. Is this the way? If it is, is the file divided into chunks, and these are appended to the ASCII response and written byte by byte into the socket?
Finally, how is video transmitted? I know video encoding and transmission would require entire books for a detailed explanation, but if you could say something about the generalities of video transmission (for example on YouTube) I would appreciate it.
Anything that you could say about HTTP on the socket level would be appreciated. Thanks.
All my answers below relate to HTTP/1.1, not HTTP/2:
3.- My browser receives the ASCII file and parses it. Let's assume here that it finds an image tag. The browser sends an HTTP request for that image file. Here comes something I don't understand. How does the server respond? As far as I can tell the server must send back an ASCII file formed by a set of headers followed by a CRLF and then the body of the message. In this case, assuming my browser asked for a .jpeg, does the server write the headers as ASCII plaintext to the socket and then write the raw bytes of the image to the socket?
Yes, it does, usually. The body might also be compressed into a different format (gzip, brotli), or it might be chunked if a Content-Length was not set.
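To make that concrete, here's a minimal sketch in Node.js of what those bytes look like on the wire - the header values and the four JPEG bytes are illustrative, not taken from any real server:

```javascript
// An HTTP/1.1 response is an ASCII header block, a blank line (CRLF CRLF),
// then the raw body bytes - here just the start-of-image marker of a JPEG.
const jpegBytes = Buffer.from([0xff, 0xd8, 0xff, 0xe0]); // illustrative body
const header =
  'HTTP/1.1 200 OK\r\n' +
  'Content-Type: image/jpeg\r\n' +
  `Content-Length: ${jpegBytes.length}\r\n` +
  '\r\n';
// The server writes both parts to the same socket, one after the other.
const response = Buffer.concat([Buffer.from(header, 'ascii'), jpegBytes]);
console.log(response.subarray(0, 15).toString('ascii')); // "HTTP/1.1 200 OK"
```

The body bytes are never re-encoded as text; only the header block is constrained to ASCII.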
4.- If the HTML file has several images do we open a socket per image (per request)?
In HTTP/1, modern browsers will open up to 6 sockets per host but no more. If there are more than 6 requests going to the same host, the rest will wait until the earlier responses have been received.
5.- Let's assume that my browser now finds a javascript tag. When the server answers my request for that script, does the server write the ASCII bytes of the script's source to the socket? What happens with JS libraries? Does the server have to send all the source code for each one?
Usually yes, you need 1 HTTP request per javascript file. There are some server-side tools that combine javascript sources along with their dependencies into a single javascript 'file'. Note that javascript sources are typically UTF-8, not ASCII.
6.- On writing data to the sockets: is write(2) the correct way to do all this writing between sockets?
Dunno! Not a C guy
7.- On the transmission of large files: if I click a button on the site that lets me download a large PDF, how is this accomplished by the server? I assume that the server tries to transmit this in pieces. As far as I can tell there is an option for chunked encoding. Is this the way? If it is, is the file divided into chunks, and these are appended to the ASCII response and written byte by byte into the socket?
No, chunked is used for HTTP responses for which the Content-Length is not known ahead of time. The 'splitting up' you're talking about is done at the TCP/IP level, not at the HTTP protocol level. From an HTTP perspective it's just one continuous stream.
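For reference, here's a small sketch of what chunked transfer coding looks like on the wire (the example strings are made up):

```javascript
// HTTP/1.1 chunked transfer coding: each chunk is prefixed by its size in
// hexadecimal plus CRLF, and a zero-length chunk terminates the body.
function chunkedEncode(chunks) {
  const parts = [];
  for (const chunk of chunks) {
    const buf = Buffer.from(chunk);
    parts.push(Buffer.from(buf.length.toString(16) + '\r\n', 'ascii'));
    parts.push(buf);
    parts.push(Buffer.from('\r\n', 'ascii'));
  }
  parts.push(Buffer.from('0\r\n\r\n', 'ascii')); // terminating chunk
  return Buffer.concat(parts);
}

const wire = chunkedEncode(['Hello', ' world']);
// wire now holds: 5 CRLF Hello CRLF 6 CRLF " world" CRLF 0 CRLF CRLF
```

So chunking exists, but it is for responses of unknown length, not a general requirement for large files.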
Finally, how is video transmitted? I know video encoding and transmission would require entire books for a detailed explanation, but if you could say something about the generalities of video transmission (for example on YouTube) I would appreciate it.
Too broad for me to answer.
It is highly recommended to read High-Performance Browser Networking.
About HTTP
HTTP is a message structuring protocol. It can be built on top of TCP/IP, or UDP, or any other communication protocol.
IP solves the problem of figuring out which computer in a network a message is meant to reach, and TCP solves the problem of ensuring the message gets received despite noise interfering. UDP does what TCP does but drops some important guarantees; giving them up makes it better in some situations, such as video streaming.
HTTP only solves the problem of what the messages should look like so everyone can understand what you mean. An HTTP message consists of a header and a body. The body is the message you want to send; the header contains meta-information about the status of the message itself. HTTP lets you structure your applications in a meaningful, context-oriented way through a standard set of terms.
For example, you can communicate character encodings of your body with HTTP, how long your content is, whether you are okay with receiving it in a compressed format, and so on and so forth. So, no, HTTP is not limited to ASCII texts - you can send UTF-8 encoded characters with BOM markings, or not even specify an encoding at all. All HTTP does is let you ask for things in the way you want it, and inform recipients how you've packaged a message.
The actual things responsible for handling how your messages are sent, rather than how they are structured, are TCP/IP and UDP. HTTP has nothing to do with it. Both TCP/IP and UDP add overhead, but it is well worth it so that communication can pass through unimpeded.
About Sockets
Computers listen on "sockets", which is just a fancy name for a communication channel. It does not matter what a socket physically is - it is a generic name used to refer to a communication channel, be it a wire or a wireless radio. All that matters is what a socket can do. Computers can send bytes down a socket and read bytes sent through one (pushing buffered bytes out is called flushing). Sockets always carry a certain amount of memory reserved for incoming messages (like an inbox) called a buffer, and can even bundle many messages together and send them in one shot to save time.
Sockets at the hardware level usually resolve to a network card, which lets you talk to a wireless network or to an Ethernet cable. Note that the computer may have many more sockets than cables - this is because a socket is a generic name for a single communication channel, and a single network/Ethernet card can handle multiple communication channels. Being able to handle multiple channels at once is called multiplexing.
TCP/IP and UDP are just blueprints - it is the responsibility of the operating system to actually do as they lay out, and most OSs have some program designed to implement these standards. At the software level, how information is read and written becomes slightly more complicated than just passing bytes since a computer must also be able to interrupt its running programs when a hardware event happens, including while communicating from a socket - here is a reference for how the Linux kernel implements TCP/IP.
All operating systems expose a set of calls to start listening on (bind) a socket, read from a socket and write to a socket. You can read from a socket in multiple ways, however. These range from the basic select() and poll() on most Unix-like systems, which block the program until one of the watched sockets has data ready to read, to epoll() on Linux, which enables a program to ask to be notified when data has arrived before having to read it.
Windows exposes a completely different set of system calls, so you would be well advised to consult a reference manual if you plan to build applications for Windows.
About TCP/IP
TCP/IP is a combination of two protocols that has mostly become the norm for ensuring reliable communication.
IP is responsible for the term IP address. Every computer has a unique address associated with it, specified as either a 32-bit number (IPv4) or a 128-bit number (IPv6). Note that these addresses do not exist outside of a network: a network is just a collection of computers, and a computer's address only makes sense within that collection. The network that the computer comes from is part of the IP address of a computer; the network itself is given a unique address; and a network may be composed of multiple networks. Ports, by contrast, are introduced at the transport layer (TCP and UDP), not by IP itself: an IP address plus a port number identifies one endpoint of a connection, which is roughly what a socket represents.
I'm just tossing about the term 'network' willy-nilly as an abstract concept, but physically it boils down to a router. A router is a special computer responsible for figuring out who is being addressed in a message using the IP address attached to the message, for assigning IP addresses to computers it is aware of (a network is quite literally the set of computers the router knows about), and for forwarding messages to other computers or routers. An internetwork (or just the Internet) is simply a bunch of routers, each with their own network, able to communicate with each other to form one giant network of connected networks. Effectively, a router implements the IP standard.
TCP and UDP are designed to solve another harrowing problem: how to ensure all of your messages get through. Sending any message down a shared communication channel like wireless or even wired channels organised like a bus topology is inherently messy - different messages can overlap, messages can be lost unexpectedly, messages can be corrupted and so on. TCP aims to solve these problems by guaranteeing all of a message goes through. On the other hand, UDP makes no such guarantees, and thus saves time by skipping a lot of steps TCP does.
TCP and UDP chunk the message into packets of a certain size, so that a message can be sent out as quickly as possible. TCP further adds some additional structure to the exchange called a three-way handshake:
It sends off a TCP-specific message called a SYN packet to the computer it wants to send a message to, and waits for a response.
If the target computer receives it, it responds with a SYN ACK packet. On receiving this, the source computer responds with an ACK packet. This lets both computers know each other is listening, and they can start sending packets.
On the other hand, if either the source or target computer don't hear anything after a while, they wait for a while and send again, and wait some more. Every time they have to wait, they wait for twice as long as they did last time, until a maximum wait period has been reached and they abort a connection. This is called exponential backoff, and is key to TCP.
A three-way handshake ensures everyone is ready and willing to listen. However, the fun doesn't stop there:
As part of the handshake, the source computer specifies an initial window: a certain number of packets, each of a certain size, that it will fire off before waiting for acknowledgement.
After the handshake, the source computer fires off the specified packets and waits for an ACK for every packet sent. If it doesn't receive an ACK for a packet, it goes into exponential backoff before resending that packet.
Meanwhile, the target computer has been told to await a certain number of packets, so it waits until all of them are in. Packets may arrive out of order, depending on how the intervening networks' routers chose to optimise the path for each packet, so each packet is prepended with a sequence number indicating its order, and the target computer sorts them back together into one neat message.
Once the source receives an ACK, it uses the total time taken to see how much it can send next. The better the response time, the more packets TCP is willing to send.
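The retry timing described above can be sketched in a few lines - the starting delay and cap here are made-up values, not what any real TCP stack uses:

```javascript
// Exponential backoff: each retry waits twice as long as the previous one,
// until a maximum wait is reached and the connection is aborted.
function backoffDelays(initialMs, maxMs) {
  const delays = [];
  for (let d = initialMs; d <= maxMs; d *= 2) {
    delays.push(d);
  }
  return delays;
}

console.log(backoffDelays(1000, 16000)); // [ 1000, 2000, 4000, 8000, 16000 ]
```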
UDP skips the three-way handshake. It only chunks and sends. It is not guaranteed that all of your message will get there, nor that it will arrive in order. It is perfect for cases where high network reliability means most of your messages will probably arrive, but where it doesn't matter if all of them arrive (e.g. it is okay if some frames in a video don't arrive).
About Video
Video is fundamentally no different from any other content format. It is perfectly possible to use HTTP for videos. Whether it is advisable to use TCP is another matter, but isn't bad - Skype uses both UDP and TCP.
All video consists of a series of bytes. How those bytes are to be interpreted is the job of the encoding. Video comes in many container formats and codecs: AVI and MP4 come readily to mind. With HTTP, you can specify the content type as part of the message headers.
HTTP enables compression of content, including for video. HTTP also allows you to request that a connection be kept alive, i.e. that a three-way handshake need not be performed again after a full message has been sent. An extension to HTTP called WebSockets was developed that effectively uses these two features to provide support for real-time video passing. These only optimise delivery so the video doesn't look laggy; they don't change how the video itself is encoded.
Of course, sometimes you want more guarantees about video, and there are lots and lots of tricks to use to support high-fidelity video in low-speed Internet environments, or enable multiple people to subscribe to a live broadcast, etc. That's when you have to get creative. But otherwise video content is not fundamentally different from any other content type.
To Answer Your Questions
When I visit a site my browser asks for an HTML file to a server, for
that my browser creates a socket, binds it to my IP address, and
connects it to a listening socket of the server of the site I am
visiting. In order to connect my browser's socket to the server I need
a port number and a hostname, the port number is 80 because this is
HTTP and the hostname is obtained via DNS resolution. Now that there
is a connection between sockets my browser sends a GET request. That
request is an ASCII file with the contents corresponding to an HTTP
request. My browser writes the ASCII raw bytes to the socket and that
is written to the server's socket.
HTTP does not require port 80. It is a convention that port 80 be the default port for HTTP servers and 443 for HTTPS, but any port can be used, so long as that port is not already occupied.
You do not receive a hostname from DNS. Actually, it's the opposite - you supply a hostname, and retrieve an IP address from DNS. It is the IP address that is used to identify a location on another network.
It is not necessary for the response to be ASCII. The headers, yes, are to be interpreted as ASCII, as they are part of an international standard that was developed before UTF-8 gained prominence, but no such restriction applies to the body. In fact, the content encoding is traditionally passed along as a header itself, which the browser or a client can use to decode the body content automatically.
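To illustrate, this is roughly all a client writes for a simple GET - a plain ASCII header block ending in a blank line (the host and path are placeholders):

```javascript
// A raw HTTP/1.1 GET request: request line, headers, then an empty line.
const request =
  'GET /index.html HTTP/1.1\r\n' +
  'Host: example.com\r\n' +
  'Connection: close\r\n' +
  '\r\n';

// Over a plain TCP socket this is literally what gets written, e.g. (sketch):
//   const net = require('net');
//   const sock = net.connect(80, 'example.com', () => sock.write(request, 'ascii'));
console.log(request.split('\r\n')[0]); // "GET /index.html HTTP/1.1"
```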
The server writes back the HTML file I requested to the socket. The
HTML the server sends is just an ASCII file that the server will write
byte by byte to the socket.
Yes, except there is no need for it to be ASCII.
My browser receives the ASCII file and parses it. Let's assume here
that it finds an image tag. The browser sends an HTTP request for that
image file. Here comes something I don't understand. How does the
server respond? As far as I can tell the server must send back an
ASCII file formed by a set of headers followed by a CRLF and then the
body of the message. In this case, assuming my browser asked for a
.jpeg, does the server write the headers as ASCII plaintext to the
socket and then writes the raw bytes of the image to the socket?
Yes.
If the HTML file has several images do we open a socket per image (per
request)?
See this answer. HTML is always downloaded first before the image requests are fired off, and images are always requested in the order that they are encountered in the DOM. If you have 24 images on Chrome, 6 of them will be loaded in parallel at a time over six connections, so four batches in total.
You can additionally answer this yourself by opening up your Network tab in the Chrome console, and inspecting whether requests for images are fired off in parallel.
Let's assume that my browser now finds a javascript tag. When the
server answers my request for that script, does the server write
the ASCII bytes of the script's source to the socket? What
happens with js libraries? Does the server have to send all the source
code for each one?
The HTML specification allows you to select what order you want your Javascript files to be downloaded.
Yes, the server writes bytes. The bytes do not need to be ASCII-encoded. The headers will be in ASCII. Yes, the server must send the source code for each library. This is why an important part of web optimisation is minimising your Javascript file sizes and bundling all the libraries into one file, in order to reduce the number and size of requests.
On writing data to the sockets: is write(2) the correct way to do all
this writing between sockets?
It is certainly the most basic way to write to an open file descriptor on Linux kernels. Everything in Linux is treated like a file, including sockets, so yes, sockets have file descriptors and can be written to this way.
There are more complex ways of accomplishing this, all of which are referenced in the manual page for write. Most languages have support for writing to sockets, however, via glue code that calls write() under a friendlier interface. Perhaps the only time you would need to explicitly call write() in C is if you were writing low-level systems programs or working on embedded hardware.
On the transmission of large files: if I click a button on the site
that lets me download a large PDF, how is this accomplished by the
server? I assume that the server tries to transmit this in pieces. As
far as I can tell there is an option for chunked encoding. Is this the
way? If it is, is the file divided into chunks, and these are appended
to the ASCII response and written byte by byte into the socket?
See the TCP/IP section I wrote above. The HTTP standard does let you break a message up into higher-order chunks (chunked transfer coding) before letting TCP chunk it still further, so you can make do with small segments arriving one at a time.
Finally, how is video transmitted?
See the video section I wrote above.
HTTP, sockets, streaming and packet transmission are different topics.
HTTP is a communication protocol to request or send data. Raw sockets are not used regularly by web developers because they are not very network friendly, due to the persistent connection required. How your browser manages HTTP requests usually should not be a real concern for you.
For big chunks of data like video, streaming is maybe the best technique, because you don't need synchronization between the client and server, or an always active connection like with sockets.
The way streaming is done depends only on you and on the server-side language you use to share your content.
If you want to learn more about HTTP, I recommend reading a little of the RFCs, like RFC 7230 or RFC 7231.
To understand how data is transmitted you should really know the basics of abstraction layers, and for video streaming you might learn how to build a video streaming server with NodeJS (you might pick another language of your preference), or just search for and install an NPM package that already does that job for you.
Related
I'm writing a TCP server application using NodeJS. However, each socket runs in a separate child process (server.on("connection")). To send messages to specific clients, I used an EventEmitter, and each socket registers its own listener (on clientID). So if there are 10000 connected devices, the application will create 10000 listeners. This looks terrible. What dangers will this pose? I can't find a solution for sending a message from one client to another over the TCP protocol in NodeJS code.
Update:
Any idea how to send a message to a specific client without adding custom listeners?
However, each socket runs in a separate process.
Why would you do that? The core idea behind NodeJS is to run things in an event loop. Single threaded, yes, but asynchronous.
This looks terrible. What dangers will this pose?
It is terrible. The biggest issue is that you sacrifice a lot of resources. You not only spawn thousands of processes, but you also spawn lots of emitters. First of all this means lots of RAM eaten. Secondly it means degraded performance due to process context switches, which are typically slower than user-space switches. That's assuming your machine will even allow you to spawn so many processes.
I can't find a solution to send a message from one client to another in the TCP protocol writing NodeJS code.
I assume you have a TCP server, two connected clients, and client A wants to send a message to client B. Is that correct? TCP by itself won't do that for you. You need some protocol on top of it. For example:
Client connects to the server. At this point the client is not logged in and cannot do anything except for authentication.
Client authenticates. It sends (username, password) pair to the server. The server validates the pair. The server keeps a global mapping {"<username>": [sockets]} and adds newly authenticated client to that mapping.
Client A wants to send a message to client B. So it sends data of the form {"type": "direct", "destination": "clientB", "data": "hello B"}. The server parses the message and forwards it to the appropriate client (taken from the global mapping).
In case you want to broadcast the message, you send, say, a {"type": "broadcast", "data": "hello all"} kind of message. The server then parses it, loops through all connected clients (found in the global mapping) and forwards the message to each client.
Of course you also need some framing of packets. Since TCP is a stream, it doesn't really understand messages and where one starts and another ends. Dumping things to JSON is half of the problem, because you then have to send that JSON over the network and the other side has to know how many bytes to read. One way is to prefix each message with, say, 2 bytes that tell the other side how long the message is.
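A sketch of that length-prefix framing in Node.js (the 2-byte big-endian prefix is one arbitrary choice; the message fields match the example above):

```javascript
// Frame: 2-byte big-endian length prefix followed by the UTF-8 JSON payload,
// so the receiver knows exactly where each message ends on the TCP stream.
function frame(message) {
  const payload = Buffer.from(JSON.stringify(message), 'utf8');
  const prefix = Buffer.alloc(2);
  prefix.writeUInt16BE(payload.length, 0);
  return Buffer.concat([prefix, payload]);
}

// Deframe: read the length, then parse exactly that many payload bytes.
function deframe(buf) {
  const len = buf.readUInt16BE(0);
  return JSON.parse(buf.subarray(2, 2 + len).toString('utf8'));
}

const wire = frame({ type: 'direct', destination: 'clientB', data: 'hello B' });
console.log(deframe(wire).data); // "hello B"
```

A real implementation would also buffer partial reads, since one TCP 'data' event may contain half a message or several messages at once.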
Btw, you may want to consider using socket.io (or some other lib) that takes care of some of those tedious details for you.
I have a websocket client and I want it to send a ping frame to my WS server.
According to the RFC 6455 a ping frame is represented by opcode %x9 but I don't know if it's even possible to send from a browser.
Any help will be appreciated
Building on top of the comments in your original post, you can manually send pings via WebSocket.
The link by Daniel W. of the ws library has a ping() method which can be called from a valid websocket client. See an example here.
If you have written your own WebSocket library then you must conform to the standard framing outlined in RFC 6455, The WebSocket Protocol.
You should be able to pack a buffer with the correct header and control opcode and send it over the upgraded HTTP/1.1 TCP connection.
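For example, a client-to-server ping with an empty payload packs into just six bytes - this sketch follows the RFC 6455 layout (the masking key here is an arbitrary example; real clients must pick it randomly):

```javascript
// RFC 6455 ping frame, client to server, empty payload:
// byte 0: FIN=1 plus opcode %x9  -> 0x89
// byte 1: MASK=1 plus length 0   -> 0x80
// bytes 2-5: the 4-byte masking key (nothing to mask with a 0-length payload)
function pingFrame(maskingKey) {
  const frame = Buffer.alloc(6);
  frame[0] = 0x89;
  frame[1] = 0x80;
  maskingKey.copy(frame, 2);
  return frame;
}

const ping = pingFrame(Buffer.from([0x12, 0x34, 0x56, 0x78]));
console.log(ping.toString('hex')); // "898012345678"
```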
Edit:
Further research shows that, at the time of writing, the ping method is not supported by the native browser WebSocket client, and the send method seems to support only payloads, not control messages such as pings with their own opcodes.
An option could be to go with a WASM based solution which could make non-XHR or non-fetch based TCP connections, however this comes with a whole different set of challenges.
I am building a small chat application for friends, but unsure about how to get information in a timely manner that is not as manual or as rudimentary as forcing a page refresh.
Currently, I am implementing this using simple AJAX, but this has the disadvantage of regularly hitting the server when a short timer elapses.
In researching long/short polling, I ran across HTML5 WebSockets. This seems easy to implement, but I'm not sure if there are some hidden disadvantages. For example, I think WebSockets is only supported by certain browsers. Are there other disadvantages to WebSockets that I should be aware of?
Since it seems like both technologies do the same thing, in what sorts of scenarios would one prefer to use one over the other? More specifically, has HTML5 WebSockets made AJAX long/short polling obsolete, or are there compelling reasons to prefer AJAX over WebSockets?
WebSockets is definitely the future now.
Long polling is a dirty workaround to avoid creating a connection for each request, as AJAX does - but long polling was created when WebSockets didn't exist. Now that WebSockets are here, long polling is going away.
WebRTC allows for peer-to-peer communication.
I recommend learning WebSockets.
Comparison of different communication techniques on the web
AJAX - request → response. Creates a connection to the server, sends request headers with optional data, gets a response from the server, and closes the connection.
Supported in all major browsers.
Long poll - request → wait → response. Creates a connection to the server like AJAX does, but maintains a keep-alive connection open for some time (not long though). While the connection is open, the client can receive data from the server. The client has to reconnect periodically after the connection is closed, due to timeouts or data EOF. On the server side it is still treated like an HTTP request, same as AJAX, except the answer to the request will happen now or at some time in the future, as defined by the application logic.
support chart (full) | wikipedia
WebSockets - client ↔ server. Create a TCP connection to the server, and keep it open as long as needed. The server or client can easily close the connection. The client goes through an HTTP-compatible handshake process. If it succeeds, then the server and client can exchange data in both directions at any time. It is efficient if the application requires frequent data exchange in both directions. WebSockets do have data framing that includes masking for each message sent from client to server - note that masking obscures the data from naive intermediaries, but it is not encryption.
support chart (very good) | wikipedia
WebRTC - peer ↔ peer. Establishes communication directly between clients and is transport-agnostic, so it can use UDP, TCP or even more abstract layers. This is generally used for high-volume data transfer, such as video/audio streaming, where reliability is secondary and a few frames or some quality can be sacrificed in favour of response time and at least some data transfer. Both sides (peers) can push data to each other independently. While it can be used totally independently of any centralised servers, it still requires some way of exchanging endpoint data, where in most cases developers still use centralised servers to "link" peers. This is required only to exchange the essential data for establishing a connection, after which a centralised server is not required.
support chart (medium) | wikipedia
Server-Sent Events - client ← server. Client establishes persistent and long-term connection to server. Only the server can send data to a client. If the client wants to send data to the server, it would require the use of another technology/protocol to do so. This protocol is HTTP compatible and simple to implement in most server-side platforms. This is a preferable protocol to be used instead of Long Polling. support chart (good, except IE) | wikipedia
Advantages:
The main advantage of WebSockets server-side is that a WebSocket connection is not an HTTP request (after the handshake), but a proper message-based communication protocol. This enables you to achieve huge performance and architecture advantages. For example, in node.js, you can share the same memory across different socket connections, so they can each access shared variables. Therefore, you don't need to use a database as an exchange point in the middle (as with AJAX or Long Polling in a language like PHP).
You can store data in RAM, or even republish between sockets straight away.
Security considerations
People are often concerned about the security of WebSockets. The reality is that it makes little difference, or even puts WebSockets in a better position. First of all, with AJAX there is a higher chance of a MITM, as each request is a new TCP connection traversing internet infrastructure. With WebSockets, once connected it is far more challenging to intercept traffic in between, with frame masking additionally enforced when data is streamed from client to server, as well as compression, which requires more effort to probe the data. Both protocols can run encrypted: HTTP/HTTPS and WS/WSS.
P.S.
Remember that WebSockets generally take a very different approach to networking logic, more like what real-time games have had all this time, and not like HTTP.
One contending technology you've omitted is Server-Sent Events / Event Source. What are Long-Polling, Websockets, Server-Sent Events (SSE) and Comet? has a good discussion of all of these. Keep in mind that some of these are easier than others to integrate with on the server side.
For chat applications or any other application that is in constant conversation with the server, WebSockets are the best option. However, you can only use WebSockets with a server that supports them, so that may limit your ability to use them if you cannot install the required libraries. In which case, you would need to use Long Polling to obtain similar functionality.
XHR polling - a request is answered when the event occurs (which could be straight away, or after a delay). Subsequent requests will need to be made to receive further events.
The browser makes an asynchronous request of the server,
which may wait for data to be available before responding. The
response can contain encoded data (typically XML or JSON) or
Javascript to be executed by the client. At the end of the processing
of the response, the browser creates and sends another XHR, to await
the next event. Thus the browser always keeps a request outstanding
with the server, to be answered as each event occurs. Wikipedia
Server-Sent Events - the client sends a request to the server; the server can then send new data to the web page at any time.
Traditionally, a web page has to send a request to the server to
receive new data; that is, the page requests data from the server.
With server-sent events, it's possible for a server to send new data
to a web page at any time, by pushing messages to the web page. These
incoming messages can be treated as Events + data inside the web page. Mozilla
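On the wire, server-sent events are just a text stream over a long-lived HTTP response; the client consumes it with `new EventSource('/stream')` (the URL is made up). A sketch of the framing a server writes per message:

```javascript
// Format one SSE message: an optional "event:" field, one "data:" field
// per line of payload, terminated by a blank line.
function sseMessage(data, eventName) {
  let out = '';
  if (eventName) out += 'event: ' + eventName + '\n';
  for (const line of String(data).split('\n')) out += 'data: ' + line + '\n';
  return out + '\n';
}
```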
WebSockets: after the initial handshake (done over the HTTP protocol), communication is bidirectional using the WebSocket protocol.
The handshake starts with an HTTP request/response, allowing servers
to handle HTTP connections as well as WebSocket connections on the
same port. Once the connection is established, communication switches
to a bidirectional binary protocol which does not conform to the HTTP
protocol. Wikipedia
I would like to know what kind of limitations there are in using websockets.
Websockets is just so.. powerful. I can't imagine that it is without disadvantages.
Say, what is the number of users that can simultaneously connect to a server? (If I'm creating a game and users connect to it through WebSockets, what will limit the number of users able to connect at any one time?)
Also, is it true that the quality of the connections (speed and the like) will decrease with each additional connection?
The advantages and disadvantages will of course depend on the specific use case, but I'll try to point out some differences between WebSocket and HTTP.
WebSocket is more complex than HTTP. You can establish an HTTP connection with a telnet client, but you probably cannot do the same with WS. Even if you ignored the handshake requirements (which include the use of the SHA1 hash function), you would then be unable to properly mask and frame the data to be sent, and the server would close the connection.
As Uwe said, WebSocket connections are intended to be more persistent than HTTP connections. If you only want to receive an update every 30 minutes, you will want to go with HTTP. If you want to receive updates every second, a WebSocket might be a better option, because establishing an HTTP connection takes a lot of time.
To establish an HTTP connection, you first have to establish a TCP connection (SYN, SYN/ACK, ACK), then send a GET request with a pretty big header, then finally receive the server's response (along with another big header).
With an open WebSocket you simply receive the response (no request needed), and it comes with a much smaller header: from two bytes for small frames, up to 10 bytes for ridiculously large frames (in the gigabyte range).
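Those header sizes follow from the framing rules: 2 bytes of fixed header, an extended length field for payloads of 126 bytes or more, plus a 4-byte masking key on client-to-server frames. A sketch (illustrative helper, not a real API):

```javascript
// Bytes of WebSocket frame header for a given payload length.
// masked should be true for client-to-server frames.
function frameHeaderSize(payloadLength, masked) {
  let size = 2;                              // opcode/flags + mask bit/7-bit length
  if (payloadLength > 65535) size += 8;      // 64-bit extended payload length
  else if (payloadLength >= 126) size += 2;  // 16-bit extended payload length
  if (masked) size += 4;                     // masking key
  return size;
}
```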
You need to weigh the two costs (keeping a connection open vs establishing a new connection) to decide between the two protocols.
Note: this answer is based on the current draft of the protocol (draft-ietf-hybi-thewebsocketprotocol-09). WebSocket is evolving rapidly, many implementations are still based on older drafts, and some details may change before it is finalized.
From what I read, this seems related to HTTP server push, which I've read is usually not recommended, since it keeps a lot of connections open on the server.
If I have to choose, I probably would always develop a client polling mechanism.
I'm currently trying to implement a simple HTTP server for a comet technique (long-polling XHR requests). As JavaScript is very strict about cross-domain requests, I have a few questions:
As I understand it, every Apache worker is blocked while serving a request, so writing the "script" as a usual page would block Apache once all workers have a request to serve. --> Does not work!
I came up with the idea of writing my own simple HTTP server solely for serving these long-polling requests. This server should be non-blocking, so each worker could handle many requests at the same time. As my site also contains content/images etc., and my server does not need to serve content, I started it on a different port than 80. The problem now is that the JavaScript delivered by Apache cannot interact with my comet server running on a different port, because of cross-domain restrictions. --> Does not work!
Then I came up with the idea of using mod_proxy to map my server onto a new subdomain. I couldn't really figure out how mod_proxy works, but I imagine I would end up with the same effect as in my first approach?
What would be the best way to combine this kind of classic website with these long-polling XHR requests? Do I need to implement content delivery in my server myself?
I'm pretty sure using mod_proxy will block a worker while the request is being processed.
If you can use 2 IPs, there is a fairly easy solution.
Let's say IP A is 1.1.1.1 and IP B is 2.2.2.2, and let's say your domain is example.com.
This is how it will work:
- Configure Apache to listen on port 80, but ONLY on IP A.
- Start your other server on port 80, but only on IP B.
- Configure the XHR requests to go to a subdomain of your domain, on the same port, so the cross-domain restrictions don't prevent them. Your site is example.com, and the XHR requests go to xhr.example.com, for example.
- Configure your DNS so that example.com resolves to IP A, and xhr.example.com resolves to IP B.
- You're done.
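A minimal sketch of that split, using the placeholder IPs and names from above:

```apache
# httpd.conf on the machine holding IP A: bind Apache to IP A only
Listen 1.1.1.1:80
```

```
; DNS zone for example.com (illustrative records)
example.com.      IN  A  1.1.1.1
xhr.example.com.  IN  A  2.2.2.2
```

The comet server is then bound to 2.2.2.2:80 in whatever way it is configured.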
This solution will work if you have two servers, each with its own IP, and it will work just as well if you have one server with two IPs.
If you can't use 2 IPs, I may have another solution, I'm checking if it's applicable to your case.
This is a difficult problem. Even if you get past the security issues you're running into, you'll end up having to hold a TCP connection open for every client currently looking at a web page. You won't be able to create a thread to handle each connection, and you won't be able to "select" on all the connections from a single thread. Having done this before, I can tell you it's not easy. You may want to look into libevent, which memcached uses to a similar end.
Up to a point you can probably get away with setting long timeouts and allowing Apache to have a huge number of workers, most of which will be idle most of the time. Careful choice and configuration of the Apache worker module will stretch this to thousands of concurrent users, I believe. At some point, however, it will not scale up any more.
I don't know what your infrastructure looks like, but we have load-balancing boxes in the network racks called F5s. These present a single external domain, but redirect the traffic to multiple internal servers based on their response times, cookies in the request headers, etc. They can be configured to send requests for a certain path within the virtual domain to a specific server. Thus you could have example.com/xhr/foo requests mapped to a specific server to handle these comet requests. Unfortunately, this is not a software solution, but a rather expensive hardware solution.
Anyway, you may need some kind of load-balancing system (or maybe you have one already), and perhaps it can be configured to handle this situation better than Apache can.
I had a problem years ago where I wanted customers using a client-server system with a proprietary binary protocol to be able to access our servers on port 80, because they were continuously having problems with firewalls on the custom port that the system used. What I needed was a proxy that would live on port 80 and direct the traffic to either Apache or the app server depending on the first few bytes of what came across from the client. I looked for a solution and found nothing that fit. I considered writing an Apache module, a plugin for DeleGate, etc., but eventually rolled my own custom content-sensing proxy service. That, I think, is the worst-case scenario for what you're trying to do.
To answer the specific question about mod_proxy: yes, you can set up mod_proxy to serve content that is generated by a server (or service) that is not public-facing (i.e. which is only available via an internal address or localhost).
I've done this in a production environment and it works very, very well: Apache forwarding some requests to Tomcat via AJP workers, and others to a GIS application server via mod_proxy. As others have pointed out, cross-site security may stop you working on a sub-domain, but there is no reason why you can't proxy requests to mydomain.com/application.
To talk about your specific problem: I think you are really getting bogged down in looking at the problem as "long-lived requests", i.e. assuming that when you make one of these requests, that's it, the whole process needs to stop. It seems as though you are trying to solve an issue with application architecture via changes to system architecture. In fact, what you need to do is treat these background requests exactly as such, and multi-thread it:
Client makes the request to the remote service "perform task X with data A, B and C"
Your service receives the request: it passes it on to a scheduler which issues a unique ticket/token for the request. The service then returns this token to the client: "thanks, your task is in a queue running under token Z"
The client then hangs onto this token, shows a "loading/please wait" box, and sets up a timer that fires, say, every second
When the timer fires, the client makes another request to the remote service: "have you got the results for my task? Its token is Z"
Your background service can then check with your scheduler, and will likely return either an empty document ("no, not done yet") or the results
When the client gets the results back, it can simply clear the timer and display them.
So long as you're reasonably comfortable with threading (which you must be, since you've indicated you're looking at writing your own HTTP server), this shouldn't be too complex. On top of the HTTP listener part you need:
Scheduler object: a singleton that just wraps a first-in, first-out queue. New tasks go onto the end of the queue, and jobs can be pulled off from the beginning; just make sure that the code to issue a job is thread-safe (lest you get two workers pulling the same job from the queue).
Worker threads can be quite simple: get access to the scheduler and ask for the next job. If there is one, do the work and send the results; otherwise just sleep for a period and start over.
This way, you're never going to be blocking Apache for longer than need be, as all you are doing is issuing requests for "do X" or "give me the results for X". You'll probably want to build in some safety features at a few points, such as handling tasks that fail, and making sure there is a time-out on the client side so it doesn't wait indefinitely.
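A minimal sketch of the ticket/token scheduler described above (in-memory, single-process; all names are illustrative):

```javascript
let nextToken = 0;
const queue = [];            // FIFO of pending jobs
const results = new Map();   // token -> result, once a worker finishes

// What the "perform task X" request handler would call.
function submitTask(task) {
  const token = ++nextToken;
  queue.push({ token, task });
  return token;              // the client polls with this token
}

// One worker iteration: pull the oldest job, run it, store the result.
function workOnce() {
  const job = queue.shift();
  if (job) results.set(job.token, job.task());
}

// What the polling request handler would call; null means "not done yet".
function getResult(token) {
  return results.has(token) ? results.get(token) : null;
}
```

Each worker thread would call `workOnce()` in a loop; the polling HTTP handler returns "no, not done yet" while `getResult(token)` yields null.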
For number 2: you can get around crossdomain restrictions by using JSONP.
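JSONP works because script tags are not subject to the XHR same-origin restriction: the client inserts a script tag pointing at the comet server with a callback parameter, and the server replies with a script that invokes that callback. A sketch of the server-side wrapping (names are made up):

```javascript
// Wrap a payload so the response body is executable script, not bare JSON:
// the browser runs it, and the named client-side callback receives the data.
function jsonpBody(callbackName, payload) {
  return callbackName + '(' + JSON.stringify(payload) + ');';
}
```

The client would have defined `handleEvents` beforehand and requested, e.g., `/poll?callback=handleEvents` via a script tag.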
Three alternatives:
Use nginx. This means you run 3 servers: nginx, Apache, and your own server.
Run your server on its own port.
Use Apache mod_proxy_http (as your own suggestion).
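For the mod_proxy_http option, a minimal sketch (hypothetical path and backend port):

```apache
# With mod_proxy and mod_proxy_http loaded, forward only the comet path
# to the backend server; everything else stays on Apache.
ProxyPass        /comet http://localhost:8080/comet
ProxyPassReverse /comet http://localhost:8080/comet
```

Because the comet URLs now share the Apache origin, the cross-domain problem goes away; note the caveat elsewhere in this thread that Apache may still tie up a worker for each proxied request.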
I've confirmed mod_proxy_http (Apache 2.2.16) works proxying a Comet application (powered by Atmosphere 0.7.1) running in GlassFish 3.1.1.
My test app with full source is here: https://github.com/ceefour/jsfajaxpush