FileReader readAsDataURL creating large base64 images

FileReader readAsDataURL creating large base64 images - javascript

I have a requirement to convert an input image (blob) to base64 image. Im aware the size will increase by approx 1.37.
However i have noticed when using the firereader.readAsDataUrl the size of the image is extremely large when compared to the built in base64 command line tool you have on macOS.
https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsDataURL
Steps to reproduce:
Download this free sample image https://wall.alphacoders.com/big.php?i=685897 make sure to click the Download button (Downlaod Original 8000x600). Do not use right click save as because the image will be compressed
Use the jsfiddle to pick the above downloaded image and check the console log for the size of the base 64. https://jsfiddle.net/b7nkw8j9/ you can see the size is 11.2mb.
Now if your on mac use the base64 command line tool to convert the above downloaded image to base64 and check the file size there. You will see the base64 size here is. 5.9mb. base64 -w 0 downloaded_image.jpg > downloaded_image.base64
This is the function i am using in my code to convert an input file image to base64.
async convertToBase64(file) {
if (file === undefined) {
throw new Error("File could not be found");
}
const fileReader = new FileReader();
return new Promise((resolve, reject) => {
fileReader.readAsDataURL(file);
fileReader.onerror = (error) => {
reject("Input: File could not be read:" + error);
};
fileReader.onloadend = () => {
resolve(fileReader.result);
};
});
}
Why does the filereader.readAsDataURL make the image base64 output extremely large.
How can i make this more efficient?

I have 5,882,455 bytes for the FileReader vs 5,882,433 bytes for base64 output, if you add the 22 bytes from data:image/png;base64,, we're not too far.
However to the question How can i make this more efficient?, just don't use a data URL here. Whatever you've been told you need it too, it was a lie (I'm 99% percent sure).
Instead you should always prefer to work with the Blob directly.
To display it, use a blob URL:
inp.onchange = e => {
img.src = URL.createObjectURL(inp.files[0]);
};
<input type="file" id="inp">
<img id="img" src="" height="200" alt="Image preview..." accept="image/*">
To send it to your server, send the Blob directly, either through a FormData or, directly from xhr.send() or as the body of a fetch request.
The only case can think of where you would need a data URL version of a binary file on the front end is to generate a standalone document which would need to embed that binary file. For all other use cases I met in my career, Blob where far better suited.
For the message that gets printed in the Chrome's tooltip, that's the size of the USVString that the browser has to hold in memory. It's twice the size of the file you have on disk, because USVStrings are encoded in UTF-16, even though this string holds only ASCII characters.
However, to be sent to your server, or to be saved as a text file, it will generally be reencoded in an 8 bit encoding, and retrieve it's normal size.
So don't take this tooltip as a mean to tell you how big your data is, it's only the size it does take in the browser's allocated memory, but outside, it won't be the same at all.
If you wanna check the size of binary data generated here is a better fiddle, which will convert USVStrings to UTF-8 and keep binary as binary (e.g if you pass an ArrayBuffer to it):
function previewFile() {
var preview = document.querySelector('img');
var file = document.querySelector('input[type=file]').files[0];
var reader = new FileReader();
reader.addEventListener("load", function () {
console.log(new Blob([reader.result]).size);
preview.src = reader.result;
}, false);
if (file) {
reader.readAsDataURL(file);
}
}
<!-- Learn about this code on MDN: https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsDataURL -->
<input type="file" onchange="previewFile()"><br>
<img src="" height="200" alt="Image preview...">

Related

which function should I use? .toDataURL() or .toBlob()?

I want to resize image at client side using JavaScript. I found 2 solutions, one is using .toDataURL() function and the other is using .toBlob() function. Both solutions are worked. Well, I just curious what is the difference between those two functions? which one is better? or when should I use .toDataURL() function or .toBlob() function?
Here is the code I used to output those two function, and I got very slightly different in image size (bytes) and image color (I'm not sure about this one). Is something wrong with the code?
<html>
<head>
<title>Php code compress the image</title>
</head>
<body>
<input id="file" type="file" onchange="fileInfo();"><br>
<div>
<h3>Original Image</h3>
<img id="source_image"/>
<button onclick="resizeImage()">Resize Image</button>
<button onclick="compressImage()">Compress Image</button>
<h1 id="source_image_size"></h1>
</div>
<div>
<h3>Resized Image</h3>
<h1> image from dataURL function </h1>
<img id="result_resize_image_dataURL"/>
<h1> image from toBlob function </h1>
<img id="result_resize_image_toBlob"/>
</div>
<div>
<fieldset>
<legend>Console output</legend>
<div id='console_out'></div>
</fieldset>
</div>
<script>
//Console logging (html)
if (!window.console)
console = {};
var console_out = document.getElementById('console_out');
var output_format = "jpg";
console.log = function(message) {
console_out.innerHTML += message + '<br />';
console_out.scrollTop = console_out.scrollHeight;
}
var encodeButton = document.getElementById('jpeg_encode_button');
var encodeQuality = document.getElementById('jpeg_encode_quality');
function fileInfo(){
var preview = document.getElementById('source_image');
var file = document.querySelector('input[type=file]').files[0];
var reader = new FileReader();
reader.addEventListener("load", function(e) {
preview.src = e.target.result;
}, false);
if (file) {
reader.readAsDataURL(file);
}
}
function resizeImage() {
var loadedData = document.getElementById('source_image');
var result_image = document.getElementById('result_resize_image_dataURL');
var cvs = document.createElement('canvas'),ctx;
cvs.width = Math.round(loadedData.width/4);
cvs.height = Math.round(loadedData.height/4);
var ctx = cvs.getContext("2d").drawImage(loadedData, 0, 0, cvs.width, cvs.height);
cvs.toBlob(function(blob) {
var newImg = document.getElementById('result_resize_image_toBlob'),
url = URL.createObjectURL(blob);
newImg.onload = function() {
// no longer need to read the blob so it's revoked
URL.revokeObjectURL(url);
};
newImg.src = url;
console.log(blob.size/1024);
}, 'image/jpeg', 0.92);
var newImageData = cvs.toDataURL('image/jpeg', 0.92);
var result_image_obj = new Image();
result_image_obj.src = newImageData;
result_image.src = result_image_obj.src;
var head = 'data:image/png;base64,';
var imgFileSize = ((newImageData.length - head.length)*3/4)/1024;
console.log(imgFileSize);
}
Edited:
Based on Result of html5 Canvas getImageData or toDataURL - Which takes up more memory?, its said that
"DataURL (BASE64) is imageData compressed to JPG or PNG, then converted to string, and this string is larger by 37% (info) than BLOB."
What is the string mean? is it same as size in bytes? using the code I provided above, the size difference is less than 1Kb (less than 1%). is .toBlob() always better than .toDataURL() function? or there is a specific condition where it would be better to use .toDataURL() function?

Definitely go with toBlob().
toDataURL is really just an early error in the specs, and had it been defined a few months later, it certainly wouldn't be here anymore, since we can do the same but better using toBlob.
toDataURL() is synchronous and will block your UI while doing the different operations
move the bitmap from GPU to CPU
conversion to the image format
conversion to base64 string
toBlob() on the other hand will only do the first bullet synchronously, but will do the conversion to image format in a non blocking manner. Also, it will simply not do the third bullet.
So in raw operations, this means toBlob() does less, in a better way.
toDataURL result takes way more memory than toBlob's.
The data-URL returned by toDataURL is an USVString, which contains the full binary data compressed in base64.
As the quote in your question said, base64 encoding in itself means that the binary data will now be ~37% bigger. But here it's not only encoded to base64, it is stored using UTF-16 encoding, which means that every ascii character will take twice the memory needed by raw ascii text, and we jump to a 174% bigger file than its original binary data...
And it's not over... Every time you'll use this string somewhere, e.g as the src of a DOM element*, or when sending it through a network request, this string can get reassigned in memory once again.
*Though modern browsers seem to have mechanism to handle this exact case
You (almost) never need a data-url.
Everything you can do with a data-url, you can do the same better with a Blob and a Blob-URI.
To display, or locally link to the binary data (e.g for user to download it), use a Blob-URI, using the URL.createObjectURL() method.
It is just a pointer to the binary data held in memory that the Blob itself points to. This means you can duplicate the blob-URI as many times as you wish, it will always be a ~100 chars UTF-16 string, and the binary data will not move.
If you wish to send the binary data to a server, send it as binary directly, or through a multipart/formdata request.
If you wish to save it locally, then use IndexedDB, which is able to save binary files. Storing binary files in LocalStorage is a very bad idea as the Storage object has to be loaded at every page-load.
The only case you may need data-urls is if you wish to create standalone documents that need to embed binary data, accessible after the current document is dead. But even then, to create a data-url version of the image from your canvas, use a FileReader to which you'd pass the Blob returned by canvas.toBlob(). That would make for a complete asynchronous conversion, avoiding your UI to be blocked for no good reasons.

File Uploading ReadAsDataUrl

I have a question about the File API and uploading files in JavaScript and how I should do this.
I have already utilized a file uploader that was quite simple, it simply took the files from an input and made a request to the server, the server then handled the files and uploaded a copy file on the server in an uploads directory.
However, I am trying to give people to option to preview a file before uploading it. So I took advantage of the File API, specifically the new FileReader() and the following readAsDataURL().
The file object has a list of properties such as .size and .lastModifiedDate and I added the readAsDataURL() output to my file object as a property for easy access in my Angular ng-repeat().
My question is, it occurred to me as I was doing this that I could store the dataurl in a database rather than upload the actual file? I was unsure if modifying the File data directly with it's dataurl as a property would affect its transfer.
What is the best practice? Is it better to upload a file or can you just store the dataurl and then output that, since that is essentially the file itself? Should I not modify the file object directly?
Thank you.
Edit: I should also note that this is a project for a customer that wants it to be hard for users to simply take uploaded content from the application and save it and then redistribute it. Would saving the files are urls in a database mitigate against right-click-save-as behavior or not really?

There is more then one way to preview a file. first is dataURL with filereader as you mention. but there is also the URL.createObjectURL which is faster
Decoding and encoding to and from base64 will take longer, it needs more calculations, more cpu/memory then if it would be in binary format.
Which i can demonstrate below
var url = 'https://upload.wikimedia.org/wikipedia/commons/c/cc/ESC_large_ISS022_ISS022-E-11387-edit_01.JPG'
fetch(url).then(res => res.blob()).then(blob => {
// Simulates a file as if you where to upload it throght a file input and listen for on change
var files = [blob]
var img = new Image
var t = performance.now()
var fr = new FileReader
img.onload = () => {
// show it...
// $('body').append(img)
var ms = performance.now() - t
document.body.innerHTML = `it took ${ms.toFixed(0)}ms to load the image with FileReader<br>`
// Now create a Object url instead of using base64 that takes time to
// 1 encode blob to base64
// 2 decode it back again from base64 to binary
var t2 = performance.now()
var img2 = new Image
img2.onload = () => {
// show it...
// $('body').append(img)
var ms2 = performance.now() - t2
document.body.innerHTML += `it took ${ms2.toFixed(0)}ms to load the image with URL.createObjectURL<br><br>`
document.body.innerHTML += `URL.createObjectURL was ${(ms - ms2).toFixed(0)}ms faster`
}
img2.src = URL.createObjectURL(files[0])
}
fr.onload = () => (img.src = fr.result)
fr.readAsDataURL(files[0])
})
The base64 will be ~3x larger. For mobile devices I think you would want to save bandwidth and battery.
But then there is also the latency of doing a extra request but that's where http 2 comes to rescue

How is a file split / chunked? How do I access the chunks sequentially?

For example, if I have an image and I download it -- how does the computer know that all the bytes are supposed to be in that sequential order?
When I upload an image, is it possible to upload it in "chunks" so I can access/use whatever I have uploaded thus far (i.e. I have only uploaded the top half of the image) -- how would I access it?
The same would go for video or PDF, etc.

You can read file into chunks and upload file in chunks.
See the example below.
<input type="file" name="filebrowsefileid">
var filePicker = document.getElementById('filebrowsefileid');
var file = filePicker.files[0];
var chunck = file.slice(a,b);
var reader = new FileReader();
reader.readAsBinaryString(chunck );
....
reader["onloadend"] = function () {
reader.result; // This is what you want
}

Why does canvas.toDataURL() not produce the same base64 as in Ruby for an image?

I'm trying to produce the same base64 data for an image file in both JavaScript and in Ruby. Unfortunately both are outputting two very different values.
In Ruby I do this:
Base64.encode64(File.binread('test.png'));
And then in JavaScript:
var image = new Image();
image.src = 'http://localhost:8000/test.png';
$(image).load(function() {
var canvas, context, base64ImageData;
canvas = document.createElement('canvas');
context = canvas.getContext('2d');
canvas.width = this.width;
canvas.height = this.height;
context.drawImage(this, 0, 0);
imageData = canvas.toDataURL('image/png').replace(/data:image\/[a-z]+;base64,/, '');
console.log(imageData);
});
Any idea why these outputs are different?

When you load the image in Ruby the binary file without any modifications will be encoded directly to base-64.
When you load an image in the browser it will apply some processing to the image before you will be able to use it with canvas:
ICC profile will be applied (if the image file contains that)
Gamma correction (where supported)
By the time you draw the image to canvas, the bitmap values has already been changed and won't necessarily be identical to the bitmap that was encoded before loading it as image (if you have an alpha channel in the file this may affect the color values when drawn to canvas - canvas is a little peculiar at this..).
As the color values are changed the resulting string from canvas will naturally also be different, before you even get to the stage of re-encoding the bitmap (as PNG is loss-less the encoding/compressing should be fairly identical, but factors may exist depending on the browser implementation that will influence that as well. to test, save out a black unprocessed canvas as PNG and compare with a similar image from your application - all values should be 0 incl. alpha and at the same size of course).
The only way to avoid this is to deal with the binary data directly. This is of course a bit overkill (in general at least) and a relative slow process in a browser.
A possible solution that works in some cases, is to remove any ICC profile from the image file. To save an image from Photoshop without ICC choose "Save for web.." in the file menu.

The browser is re-encoding the image as you save the canvas.
It does not generate an identical encoding to the file you rendered.

So I actually ended up solving this...
Fortunately I am using imgcache.js to cache images in the local filesystem using the FileSystem API. My solution is to use this API (and imgcache.js makes it easy) to get the base64 data from the actual cached copy of the file. The code looks like this:
var imageUrl = 'http://localhost:8000/test.png';
ImgCache.init(function() {
ImgCache.cacheFile(imageUrl, function() {
ImgCache.getCachedFile(imageUrl, function(url, fileEntry) {
fileEntry.file(function(file) {
var reader = new FileReader();
reader.onloadend = function(e) {
console.log($.md5(this.result.replace(/data:image\/[a-z]+;base64,/, '')));
};
reader.readAsDataURL(file);
});
});
});
});
Also, and very importantly, I had to remove line breaks from the base64 in Ruby:
Base64.encode64(File.binread('test.png')).gsub("\n", '');

JPEG data obtained from FileReader doesn't match data in file

I'm trying to select a local JPEG file in the web browser via the HTML5 FileReader so I can submit it to a server without reloading the page. All the mechanics are working and I think I'm transferring and saving the exact data that JavaScript gave me, but the result is an invalid JPEG file on the server. Here's the basic code that demonstrates the problem:
<form name="add_photos">
<input type="file" name="photo" id="photo" /><br />
<input type="button" value="Upload" onclick="upload_photo();" />
</form>
<script type="text/javascript">
function upload_photo() {
file = document.add_photos.photo.files[0];
if (file) {
fileReader = new FileReader();
fileReader.onload = upload_photo_ready;
fileReader.readAsBinaryString(file);
}
}
function upload_photo_ready(event) {
data = event.target.result;
// alert(data);
URL = "submit.php";
ajax = new XMLHttpRequest();
ajax.open("POST", URL, 1);
ajax.setRequestHeader("Ajax-Request", "1");
ajax.send(data);
}
</script>
Then my PHP script does this:
$data = file_get_contents("php://input");
$filename = "test.jpg";
file_put_contents($filename, $data);
$result = imagecreatefromjpeg($filename);
That last line throws a PHP error "test.jpg is not a valid JPEG file." If I download the data back to my Mac and try to open it in Preview, Preview says the file "may be damaged or use a file format that Preview doesn’t recognize."
If I open both the original file on my desktop and the uploaded file on the server in text editors to inspect their contents, they are almost but not quite the same. The original file starts like this:
ˇÿˇ‡JFIFˇ˛;CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), quality = 90
But the uploaded file starts like this:
ÿØÿàJFIFÿþ;CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), quality = 90
Interestingly, if I view the data in a JavaScript alert with the commented-out line above, it looks just like the uploaded file's data, so it seems as if the FileReader isn't giving the correct data at the very beginning, as opposed to a problem that is introduced while transferring or saving the data on the server. Can anyone explain this?
I'm using Safari 6 and I also tried Firefox 14.
UPDATE: I just figured out that if I skip the FileReader code and change ajax.send(data) to ajax.send(file), the image is transferred and saved correctly on the server. So my problem is basically solved, but I'll award the answer points to anyone who can explain why my original approach with readAsBinaryString didn't work.

Your problem lies with readAsBinaryString. This will transfer the binary data byte-for-byte into a string, so that you will send a text string to your PHP file. Now a text string always has an encoding; and when you use XmlHttpRequest to upload a string, by default it will use UTF-8.
So each character, which was originally supposed to represent one byte, will be encoded as UTF-8... which uses multiple bytes for each character with a code point above 127!
Your best best is to use readAsArrayBuffer instead of readAsBinaryString. This will avoid all the character set conversions (that are necessary when dealing with strings).

Develop Reference

JavaScript is the programming language of the Web.