I'm looking for a way to transfer video files to a client's mobile device without streaming. The reason (it's the client's request) is to eliminate the cost of a streaming server, given the expected all-at-once high traffic.
So I have looked at base64 encoding and measured how long it takes to fetch a 19 MB file (once over a 100 Mbps internet connection, once over a 3G connection). This could make the waiting painful, especially on a 3G connection.
I have also considered using a byte array to significantly reduce the payload size, but it's hard to pass it via JSON with all the escaping backslashes...
Finally, I have looked at another possible solution: transferring the video directly to the client's phone while the app is closed (and pushing a notification once the file has arrived on the phone), but that seems to run into one of Cordova's limitations (as far as I'm aware).
I've been searching for a solution to this for weeks now, so I have placed a bounty on it, since I believe it's a question worth answering. Somebody someday will thank us for it. :) I'll be the first.
Much thanks, and happy coding.
Hosting vs app serving
First of all, you need to understand that no matter where the file is coming from - a file server (streaming) or an application server (base64-encoded string) - the hosting costs are going to be similar (well, a dedicated file server should be more efficient than anything you write yourself, but that's a minor difference). You still need to store the file somewhere and you still need to send it over the network. The only difference is that in one case Apache/IIS/whatever server you use is handling all the complex stuff, and in the other case you are going to be trying to recreate it all yourself.
Streaming vs Non-Streaming
When you serve a file (be it yourself, or through a file server) you can either allow it to be retrieved in chunks (streamed) or only as one huge file. In the first case - streaming - if the user stops watching halfway through the video, you only need the server capacity to serve maybe 60 or 70% of the file. In the second case - non-streaming - the user first has to wait for the file to be retrieved in its entirety, and on top of that it will always cost you 100% of the bandwidth.
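For illustration, here is a minimal sketch of what "serving in chunks" means in practice, using only Node.js built-in modules; the file name and port are placeholders. A real file server (Apache/IIS) already does this for you, which is the point above.

    // Minimal sketch: serve a video with HTTP Range support (Node.js built-ins only).
    var http = require('http');
    var fs = require('fs');
    var path = require('path');

    var FILE = path.join(__dirname, 'video.mp4');   // placeholder file

    http.createServer(function (req, res) {
        var size = fs.statSync(FILE).size;
        var range = req.headers.range;

        if (!range) {
            // Non-streaming: the whole file in one response.
            res.writeHead(200, { 'Content-Type': 'video/mp4', 'Content-Length': size });
            fs.createReadStream(FILE).pipe(res);
            return;
        }

        // Streaming: only the requested byte range is read and sent.
        var parts = range.replace(/bytes=/, '').split('-');
        var start = parseInt(parts[0], 10);
        var end = parts[1] ? parseInt(parts[1], 10) : size - 1;

        res.writeHead(206, {
            'Content-Type': 'video/mp4',
            'Content-Range': 'bytes ' + start + '-' + end + '/' + size,
            'Accept-Ranges': 'bytes',
            'Content-Length': end - start + 1
        });
        fs.createReadStream(FILE, { start: start, end: end }).pipe(res);
    }).listen(8080);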
Precaching files
That's not to say nothing can be optimized. For example, if you are distributing a single file every week on Saturday at 6 pm, yet already know a full week beforehand what that file is, you could theoretically encrypt the file and serve it in the background, spread out over the course of the entire week. And yes, you could even do that while building a Cordova application (though it will be a bit harder and you might end up writing your own plugin). Still, that situation is rare and usually not worth the development time (it is often done with game files, for example, but that's tens of GBs of data downloaded tens of thousands of times).
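If you do go down that road, a rough sketch of the in-app piece is below, assuming the cordova-plugin-file-transfer and cordova-plugin-file plugins; the URL and file name are placeholders, and downloading while the app is truly closed would still need a native background-download plugin.

    // Sketch: pre-fetch next week's (already published) file while the app is open,
    // so it is on disk before it is needed. URL and filename are assumptions.
    document.addEventListener('deviceready', function () {
        var url = encodeURI('https://example.com/releases/next-week.bin');   // placeholder
        var target = cordova.file.dataDirectory + 'next-week.bin';
        var ft = new FileTransfer();

        ft.download(url, target, function (entry) {
            console.log('Precached at: ' + entry.toURL());
        }, function (err) {
            console.log('Precache failed, will retry later', err);
        });
    }, false);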
I'm working on a mobile front-end project using Cordova, and the back-end developer I'm working with insists that the media files (images/videos) should be transferred as base64-encoded data in JSON files.
Now, with images it's working so far. Although it freezes the UI for a few seconds, that can probably be deferred somehow.
The videos, however, are so far a pain to handle: the string for a single, simple video is nearly 300,000 characters long. It sends my poor laptop into a wild spin, and it takes about 20 seconds of churning through the code to get the URI (and it's still not working, and I don't feel like debugging it because it nearly crashes my laptop with every refresh).
So my questions are:
Is base64 encoding a popular way of transferring media in mobile development?
And if not, what alternative way would you recommend using to transfer/present these videos?
I should mention though, the videos are meant to be viewed at once by a large number of people (hundreds perhaps), and the other developer says that their servers can't handle such traffic.
Much thanks for any advice, I couldn't find this info anywhere. :)
[...] the backend developer [...] insists that the media files (images/videos) should be transferred as base64 encoded in json files.
This is a very bad (and silly) idea up front. You do not want to transfer large amounts of binary data as strings. Especially not Unicode strings.
Here you need to arm up and convince your back-end dev rebel to change his mind with whatever it takes: play some Bieber or Nickelback, or even change his background image to something Hello Kitty, or take a snapshot of his screen, set it as the background and hide all the icons and the taskbar. This should help you change his mind. If not, place a Webasto heater in his office at max and lock all the doors and windows.
Is base64 encoding a popular way of transferring media in mobile development?
It is popular and has a relatively long history; it became very common on Usenet and so forth. In those days, however, the amount of data was very low compared to today, as everything was transferred over modems.
However, just because it is popular doesn't mean it is the right tool for everything. It is not very efficient, as it requires an encoding process which converts every three octets into four bytes, adding roughly 33% to the size.
On top of that: in JavaScript each string character is stored as two bytes due to the Unicode char-set, so in memory your data is doubled and then extended by 33%. Your 300 MB of data is now 300 x 2 x 1.33 = 798 MB (show that to your back-end dev! :) as it's a real factor if the servers cannot handle large amounts of traffic).
This works fine for smaller files, but for larger files as in your example it can cause a significant overhead in both time and memory usage, and of course bandwidth. And of course, on the server side you would need to reverse the process, with its own overhead.
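You can see the 33% figure for yourself in a browser console; the sketch below just base64-encodes 1 MiB of made-up bytes and compares the lengths (the in-memory doubling comes on top of this, since each character of the resulting string is stored as UTF-16).

    // Illustration only: compare the raw byte count with the base64 string length.
    var bytes = new Uint8Array(1024 * 1024).map(function (_, i) { return i % 256; }); // 1 MiB of dummy data
    var binary = '';
    for (var i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
    var b64 = btoa(binary);

    console.log(bytes.length); // 1048576 raw bytes
    console.log(b64.length);   // 1398104 characters, i.e. roughly 33% more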
And if not, what alternative way would you recommend using to transfer/present these videos?
I would recommend:
Separate meta-data out as JSON with a reference to the data. No binary data in the JSON.
Transfer the media data itself separately in native bytes (ArrayBuffer).
Send both at the same time to the server.
The server then only needs to parse the JSON data into something edible for the back-end; the binary data can go straight to disk. A sketch of that request is shown below.
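Here is a rough sketch of that upload; the '/upload' endpoint and the field names are made up for the example. The multipart request keeps the JSON part tiny while the video travels as raw bytes.

    // Sketch: JSON metadata plus the binary video in one multipart/form-data request.
    // '/upload', 'meta' and 'video' are assumed names, not an existing API.
    function uploadVideo(file, meta) {
        var form = new FormData();
        form.append('meta', JSON.stringify(meta));              // small JSON part for the backend to parse
        form.append('video', file, file.name || 'video.mp4');   // raw bytes, can go straight to disk

        var xhr = new XMLHttpRequest();
        xhr.open('POST', '/upload');
        xhr.onload = function () { console.log('done', xhr.status); };
        xhr.send(form);                                          // the browser sets the multipart boundary
    }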
Update: I forgot to mention, as Pablo does in his answer, that you can look into streaming the data.
However, streaming is pretty much a synonym for buffering, so the bandwidth will be about the same, just delivered in a more brute-force way (usually UDP versus TCP, i.e. loss of packets doesn't break the transfer). Streaming will limit your options more than buffering on the client, though.
My 2 cents...
Not sure why the "33% overhead" is always mentioned, when that's complete nonsense. Yes, it does initially add roughly that amount; however, there's a little thing called gzip (ever heard of it?). I've done tons of tests and the difference is typically negligible. In fact, sometimes the gzipped base64 string is actually smaller than the binary file. Check out this guy's tests. So please, can we stop spreading absolute fiction?
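It is easy to measure this for your own files rather than argue about it; here is a quick check using only Node's built-in zlib module ('sample.mp4' is a placeholder). Already-compressed video doesn't gzip much, so results vary; test with your actual content.

    // Compare transfer sizes: raw vs gzipped raw vs base64 vs gzipped base64.
    var fs = require('fs');
    var zlib = require('zlib');

    var raw = fs.readFileSync('sample.mp4');                 // placeholder file name
    var b64 = Buffer.from(raw.toString('base64'));

    console.log('raw           :', raw.length);
    console.log('raw + gzip    :', zlib.gzipSync(raw).length);
    console.log('base64        :', b64.length);
    console.log('base64 + gzip :', zlib.gzipSync(b64).length);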
Base64 is a perfectly acceptable method of retrieving a video. In fact, it works amazingly well for a private messaging system. For instance, if you were using AWS S3, you could store the files privately so there is no public URL.
However, the main disadvantage (imho) of using a gzipped base64 video is that you need to wait for the whole video to load, so pseudo-streaming is out of the question.
Base64 is a convenient (but not efficient) way of transferring binary data. It's inefficient because the transfer size will be 33% bigger than what you're originally transferring. So it's not a popular way of transmitting video. If you are planning to stream that video, you should be looking for an established protocol for doing just that.
I would recommend a streaming protocol (there are a lot to choose from).
I think this is a bad idea, since video files are large. But you can try it with small video files.
Try the online encoder https://base64.online/encoders/encode-video-to-base64
There you can convert a video to a Base64 data URI and try to insert it in HTML.
The result looks like this:
<video controls><source src="data:video/mpeg;base64,AAABuiEAAQALgBexAAABuwAMgBexBeH/wMAg4ODgAAA..."></video>
My company is building a single-page application using JavaScript extensively. As time goes on, the number of JavaScript files to include in the associated HTML page is getting bigger and bigger.
We have a program that minifies the JavaScript files during the integration process, but it does not merge them, so the number of files is not reduced.
Concretely, this means that when the page is loaded, the browser requests the JavaScript files one by one, initiating an HTTP request each time.
Does anyone have metrics or some sort of benchmark that would indicate to what extent the overhead of requesting the JavaScript files one by one is really a problem that would require merging the files into a single one?
Thanks
It really depends on the number of users and connections allowed by the server and the maximum number of connections of the client.
Generally, a browser can do multiple HTTP requests at the same time, so in theory there shouldn't be much difference in having one javascript file or a few.
You don't only have to consider the JavaScript files, but the images too of course, so a high number of files can indeed slow things down (if you hit the maximum number of simultaneous connections on the server or client). In that regard it would be wise to merge those files.
@Patrick already explained the benefits of merging. There is, however, also a benefit to having many small files. Browsers by default give you a maximum number of parallel requests per domain. It used to be 2 per the HTTP spec, but browsers don't follow that anymore. This means that requests beyond that limit have to wait.
You can use subdomains and redirect requests from them to your server. Then you can code the client in such a way that it uses a unique subdomain for each file. Thus you'll be able to download all the files at the same time (requests won't queue), effectively increasing performance (note that you will probably need more static file servers to handle the traffic).
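A sketch of what that could look like on the client is below; the staticN.example.com hostnames are assumptions and would all need to point at your static file server. Hashing the path keeps each file on a stable subdomain so caching still works.

    // Sketch: spread asset requests over several subdomains to work around the
    // per-domain parallel connection limit. Subdomain names are placeholders.
    var SHARDS = ['static1.example.com', 'static2.example.com',
                  'static3.example.com', 'static4.example.com'];

    function shardedUrl(path) {
        var hash = 0;
        for (var i = 0; i < path.length; i++) hash = (hash * 31 + path.charCodeAt(i)) | 0;
        return '//' + SHARDS[Math.abs(hash) % SHARDS.length] + path; // same file -> same subdomain
    }

    var script = document.createElement('script');
    script.src = shardedUrl('/js/module-a.js');
    document.head.appendChild(script);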
I haven't seen this being used in real life but I think that's an idea worth mentioning and testing. Related:
Max parallel http connections in a browser?
I think you should have a look at your app's architecture more than thinking about what is out there.
But this site should give you a good idea: http://www.browserscope.org/?category=network
Browsers and servers may have their own rules, which can differ. If you search for "HTTP request limit", you will find a lot of posts. For example, the maximum number of concurrent HTTP requests is per domain.
But speaking a bit about software development: I like the component-based approach.
You should group your files per component. Depending on your application requirements, you can load the mandatory components first and lazy-load the less-needed ones on the fly (see the sketch below). I don't think you should download the entire app if it's huge and has a lot of different functionalities that may or may not all be used by your users.
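As a rough illustration of that split (the file paths, component names and the Reports.init() call are all made up), loading could look something like this:

    // Sketch: load the mandatory bundle up front, lazy-load other components on demand.
    function loadScript(src) {
        return new Promise(function (resolve, reject) {
            var s = document.createElement('script');
            s.src = src;
            s.onload = resolve;
            s.onerror = reject;
            document.head.appendChild(s);
        });
    }

    var loaded = {};
    function loadComponent(name) {
        if (!loaded[name]) loaded[name] = loadScript('/js/components/' + name + '.min.js');
        return loaded[name];                       // cached promise: each component is fetched once
    }

    loadComponent('core');                         // mandatory component, loaded immediately
    document.getElementById('reports-tab').addEventListener('click', function () {
        loadComponent('reports').then(function () {
            Reports.init();                        // hypothetical entry point exposed by the component
        });
    });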
When optimizing websites, I've used concatenation and spriting to group related, reusable bits together, but I often wonder how much or how little to package assets for delivery to the browser (automated tools aren't always part of my build process, though I prefer them).
I'm curious if there are some sensible guidelines, just in the area of filesize, when combining assets for delivery to the browser. Assume no compression or caching, just straightforward HTTP transfer from a server to a browser, with or without AJAX.
Is there a smallest filesize worth worrying about?
I've heard that because of packet size (right? apologies if that was inept), 1 KB and 2 KB of data will transfer at basically the same speed. Is there a general threshold in KB where additional bytes start impacting transfer rate?
Does transfer speed change linearly with filesize, or does it stair-step?
Extending the first question: does each additional kilobyte increase transfer time in a fairly linear fashion, or does it stair-step at packet-sized intervals (again, possibly inept word choice)?
Is there a maximum size?
Again, I know there are lots of contextual reasons that influence this, but is there a filesize that is inadvisably large given current networks and browsers, or is it heavily dependent on the server and network? If there is a good generalization, that's all I'm curious about.
It probably goes without saying, but I'm not a server/networking expert, just a front-end dev looking for some sensible defaults to guide quick decisions in asset optimization.
It really depends on the server, network, and client.
Use common sense, is the basic answer: Don't try to send several-megabyte bitmaps, or the page will take as long to load as if the person is trying to download any other several-megabyte file. A bunch of PNGs right there on a single page, on the other hand, will not really be noticeable to most modern users. In a more computational realm than you've asked, don't abuse iframes to redirect people to several steps of other web pages.
If you want more information about the actual transmission, the maximum size of a single TCP packet is technically 64 KB, but you're not really going to be sending more than about 1.5 KB in a single packet. However, TCP is stream-based, so the packet size is mostly irrelevant. You should be more concerned with the bandwidth of modern machines, and considering how efficiently we can stream video assets nowadays, I really don't think you should be overly worried about delivering uncompressed assets to your users.
Because of the relative infrequency of actual delivery errors (which have to be corrected over TCP), along with the minuscule packet size relative to the size of most modern web pages, delivery time is going to increase pretty much linearly with total size (again, like one giant file). There are some details about multi-stage web page delivery that I'm leaving out, but they're mostly ignored when you're delivering high-asset-count web pages.
Edit: To address your concern (in the title) about transferring actual html/js files, they're just text in transfer. The browser does all of that rendering and code-running for you. If you have a single jpg, it's going to mostly overshadow any html/js you have on the page.
Transfer size: maximum packet size for a TCP connection
http flow (a rough view): http://www.http.header.free.fr/http.html
Basically, as you get the primary html document representing the page as a whole (which is from your initial request to access the page), you parse for other URLs specified as images or scripts, and request those from the server for delivery to your session. The linked page is old but still relevant, and (I hope) easy to understand.
If you want actual bandwidth statistics for modern users, that's probably too much for me to track down. But if you want more technical info, wikipedia's pages on HTTP and TCP are pretty great.
Scenario: You are building a large javascript-driven web application where you want as few page refreshes as possible. Imagine 80-100MB of unminified javascript, just for the sake of having a number to define "large".
My assumption is that if you lazy-load your JavaScript files you can get a better balance on your load times (meaning you don't have to wait a few seconds each time the page refreshes), hopefully resulting in the user not really noticing a lag during loading. I'm guessing that in a scenario like this, lazy-loading would be more desirable than your typical single minified .js file.
Now, theoretically, there is a fixed cost for a request to any file on a given server, regardless of the file's size. So, too many requests would not be desirable. For example, if a small javascript file loads at the same time as 10 other small- or medium-sized files, it might be better to group them together to save on the cost of multiple requests.
My question is, assuming reasonable defaults (say the client has a 3-5Mbps connection and a decent piece of hardware), what is an ideal size of a file to request? Too large, and you are back to loading too much at one time; too small, and the cost of the request becomes more expensive than the amount of data you are getting back, reducing your data-per-second economy.
Edit: All the answers were fantastic. I only chose Ben's because he gave a specific number.
Google's Page Speed initiative covers this in some detail:
http://code.google.com/speed/page-speed/docs/rules_intro.html
Specifically http://code.google.com/speed/page-speed/docs/payload.html
I would try to keep the amount that needs to be loaded to show the page (even if just the loading indicator) under 300K. After that, I would pull down additional data in chunks of up to 5MB at a time, with a loading indicator (maybe a progress bar) shown. I've had 15MB downloads fail on coffee shop broadband wifi that otherwise seemed OK. If it was bad enough that <5MB downloads failed I probably wouldn't blame a website for not working.
I also would consider downloading two files at a time, beyond the initial <300K file, using a loader like LabJS or HeadJS to programmatically add script tags.
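As a sketch of the "chunks with a progress indicator" idea in a modern browser (the URL and the #progress element are assumptions), you could fetch the next bundle in the background and show how far along it is:

    // Sketch: download a follow-up bundle with a visible progress readout, then inject it.
    function fetchWithProgress(url, onProgress) {
        return fetch(url).then(function (res) {
            var total = Number(res.headers.get('Content-Length')) || 0;
            var reader = res.body.getReader();
            var parts = [];
            var received = 0;

            function pump() {
                return reader.read().then(function (chunk) {
                    if (chunk.done) return new Blob(parts, { type: 'text/javascript' });
                    parts.push(chunk.value);
                    received += chunk.value.length;
                    if (total) onProgress(received / total);
                    return pump();
                });
            }
            return pump();
        });
    }

    fetchWithProgress('/js/bundle-part2.js', function (p) {
        document.getElementById('progress').textContent = Math.round(p * 100) + '%';
    }).then(function (blob) {
        var script = document.createElement('script');
        script.src = URL.createObjectURL(blob);    // run the chunk once it has fully arrived
        document.head.appendChild(script);
    });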
I think it's clear that making the client download more than a megabyte of JS before they can do anything is bad. Making the client download more of anything than is necessary is also bad. But there's a clear benefit to having it all cached.
Factors that will influence the numbers:
Round-trip time
Server response time
Header Size (including cookies)
Caching Technique
Browser (see http://browserscope.com)
Balancing parallel downloading and different cache requirements are also factors to worry about.
This was partially covered recently by Kyle Simpson here: http://www.reddit.com/r/javascript/comments/j7ng4/do_you_use_script_loaders/c29wza8
I'm just looking for ideas/suggestions here; I'm not asking for a full on solution (although if you have one, I'd be happy to look at it)
I'm trying to find a way to only upload changes to text. It's most likely going to be used as a cloud-based application running on jQuery and HTML, with a PHP server running the back-end.
For example, if I have text like
asdfghjklasdfghjkl
And I change it to
asdfghjklXasdfghjkl
I don't want to have to upload the whole thing (the text can get pretty big)
For example, something like 8,X sent to the server could signify:
add an X to the 8th position
Or D8,3 could signify:
go to position 8 and delete the previous 3 terms
However, if a single request is corrupted en route to the server, the whole document could be corrupted, since the positions would be shifted. A simple hash could detect corruption, but then how would one go about recovering from it? The client will have all of the data, but the data is possibly very large, so re-uploading all of it is unlikely to be feasible.
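To make the idea concrete, here is a tiny sketch of such operations plus a checksum the server could use to notice it has diverged; the op format and function names are just illustrations, not a finished protocol.

    // Sketch: apply an insert/delete operation, and checksum the result so the server
    // can detect corruption or divergence and ask the client for a resync.
    function applyOp(text, op) {
        if (op.type === 'insert') {                // e.g. { type: 'insert', pos: 8, chars: 'X' }
            return text.slice(0, op.pos) + op.chars + text.slice(op.pos);
        }
        if (op.type === 'delete') {                // e.g. { type: 'delete', pos: 8, count: 3 }
            return text.slice(0, op.pos - op.count) + text.slice(op.pos);
        }
        throw new Error('unknown op: ' + op.type);
    }

    // Naive checksum for illustration; in practice use something stronger (e.g. SHA-1).
    function checksum(text) {
        var h = 0;
        for (var i = 0; i < text.length; i++) h = (h * 31 + text.charCodeAt(i)) | 0;
        return h;
    }

    // The client would send { op: ..., expected: checksum(newText) }; if the server's copy
    // hashes to something else after applying the op, it requests a full resync.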
So thanks for reading through this. Here is a short summary of what needs suggestions
Change/Modification Detection
Method to communicate the changes
Recovery from corruption
Anything else that needs improvement
There is already an accepted format for transmitting this kind of "differences" information. It's called Unified Diff.
The google-diff-match-patch library provides implementations in Java, JavaScript, C++, C#, Lua and Python.
You should be able to just keep the "original text" and the "modified text" in variables on the client, then generate the diff in javascript (via diff-match-patch), send it to the server, along with a hash, and re-construct it (either using diff-match-patch or the unix "patch" program) on the server.
You might also want to consider including a "version" (or a modified date) when you send the original text to the client in the first place. Then include the same version (or date) in the "diff request" that the client sends up to the server. Verify the version on the server prior to applying the diff, so as to be sure that the server's copy of the text has not diverged from the client's copy while the modification was being made. (of course, in order for this to work, you'll need to update the version number on the server every time the master copy is updated).
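A rough client-side sketch of that flow is below; the '/save-diff' URL and the JSON field names are assumptions, while diff_match_patch, patch_make and patch_toText are the library's actual API.

    // Sketch: build a patch from the original and modified text and send it with a version.
    var dmp = new diff_match_patch();

    function sendChanges(originalText, modifiedText, version) {
        var patches = dmp.patch_make(originalText, modifiedText);
        var patchText = dmp.patch_toText(patches);        // compact, unified-diff-like text

        return $.ajax({
            url: '/save-diff',                            // hypothetical endpoint
            type: 'POST',
            contentType: 'application/json',
            data: JSON.stringify({
                version: version,                         // server rejects the patch if its copy has moved on
                patch: patchText
            })
        });
    }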
You have a really interesting approach. But if the text files are really so large that it would take too much time to upload them every time, why do you have to send the whole thing to the client? Does the client really have to receive the whole 5 MB text file? Wouldn't it be possible to send only what he needs?
Anyway, to your question:
The first thing that comes to my mind when hearing "large text files" and modification detection is diff. For the algorithm, read here. This could be an approach to commit the changes, and it specifies a format for them. You'd just have to rebuild diff (or a part of it) in JavaScript. That will not be easy, but it should be possible, I guess. If the algorithm doesn't help you, at least the definition of the diff file format might.
On the corruption issue: you don't have to fear that your data gets corrupted on the way, because the TCP protocol, on which HTTP is based, ensures that everything arrives without being corrupted. What you should fear is a connection reset. Maybe you can do something like a handshake? When the client sends an update to the server, the server applies the modifications and keeps one old version of the file. To ensure that the client has received the confirmation from the server that the modification went fine (that's where a connection reset could happen), the client sends another AJAX request to the server. If this one doesn't reach the server within some defined time, the file gets reset on the server side.
Another thing: I don't know how well JavaScript will cope with handling such gigantic files/data...
This sounds like a problem that versioning systems (CVS, SVN, Git, Bazaar) already solve very well.
They're all reasonably easy to set up on a server, and you can communicate with them through PHP.
After the setup, you'd get for free: versioning, log, rollback, handling of concurrent changes, proper diff syntax, tagging, branches...
You wouldn't get the 'send just the updates' functionality that you asked for. I'm not sure how important that is to you. Plain text is really very cheap to send as far as bandwidth is concerned.
Personally, I would probably make a compromise similar to what Wikis do. Break down the whole text into smaller semantically coherent chunks (chapters, or even paragraphs), determine on the client side just which chunks have been edited (without going down to the character level), and send those.
The server could then answer with a diff, generated by your versioning system, which is something they do very efficiently. If you want to allow concurrent changes, you might run into cases where editors have to do manual merges, anyway.
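A minimal sketch of the chunk-comparison part (the blank-line split and the payload shape are just assumptions to illustrate the idea; a real implementation would also need to handle inserted or deleted chunks):

    // Sketch: split the text into paragraph-sized chunks and send only the ones that changed.
    function changedChunks(originalText, modifiedText) {
        var oldChunks = originalText.split('\n\n');       // "chunk" = blank-line separated paragraph
        var newChunks = modifiedText.split('\n\n');
        var changes = [];

        for (var i = 0; i < newChunks.length; i++) {
            if (newChunks[i] !== oldChunks[i]) {          // oldChunks[i] is undefined for new chunks
                changes.push({ index: i, text: newChunks[i] });
            }
        }
        return changes;                                   // only these go to the server
    }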
Another general hint might be to look at what Google did with Wave. I have to remain general here, because I haven't really studied it in detail myself, but I seem to remember that there have been a few articles about how they've solved the real-time concurrent editing problem, which seems to be exactly what you'd like to do.
In summary, I believe the problem you're planning to tackle is far from trivial, there are tools that address many of the associated problems already, and I personally would compromise and reformulate the approach in favor of much less workload.