File API - Blob to JSON - javascript

I'm trying to do some experiments with HTML5, WebSocket and the File API.
I'm using the Tomcat7 WebSocket implementation.
I'm able to send and receive text messages from the servlet. What I want to do now is send JSON objects from the servlet to the client, but I want to avoid text messages in order to skip the JSON.parse (or similar) on the client, so I'm trying to send binary messages.
The servlet part is really simple:
String s = "{arr : [1,2]}";
CharBuffer cbuf = CharBuffer.wrap(s);
CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
getWsOutbound().writeBinaryMessage(encoder.encode(cbuf));
getWsOutbound().flush();
After sending this message, on the client I see that I received a binary frame, which is converted to a Blob object (http://www.w3.org/TR/FileAPI/#dfn-Blob).
The question is: is it possible to get the JSON object from the Blob?
I took a look at the FileReader interface (http://www.w3.org/TR/FileAPI/#FileReader-interface), and I used code like this to inspect what the FileReader can do (the first line creates a brand new Blob, so you can test on the fly if you want):
var b = new Blob([{"test": "toast"}], {type : "application/json"});
var fr = new FileReader();
fr.onload = function(evt) {
    var res = evt.target.result;
    console.log("onload", arguments, res, typeof res);
};
fr.readAsArrayBuffer(b);
trying all the "readAs..." methods that I saw on the FileReader implementation (I'm using Chrome 22). Anyway, I didn't find anything useful.
Do you have any suggestions? Thanks.

You should have tried readAsText() instead of readAsArrayBuffer() (JSON is text in the end).
You also forgot to stringify the object (convert it to JSON text):
var b = new Blob([JSON.stringify({"test": "toast"})], {type : "application/json"}),
fr = new FileReader();
fr.onload = function() {
    console.log(JSON.parse(this.result));
};
fr.readAsText(b);

To convert a Blob/File that contains JSON data to a JavaScript object, use this:
JSON.parse(await blob.text());
An example:
Select a JSON file; you can then use it in the browser's console (as the json object).
const input = document.createElement("input");
input.type = "file";
input.accept = "application/json";
document.body.prepend(input);
input.addEventListener("change", async event => {
    const json = JSON.parse(await input.files[0].text());
    console.log("json", json);
    globalThis.json = json;
});

What you're doing is conceptually wrong. JSON is a string representation of an object, not an object itself. So, when you send a binary representation of JSON over the wire, you're sending a binary representation of the string. There's no way to get around parsing JSON on the client side to convert a JSON string to a JavaScript Object.
You absolutely should always send JSON as text to the client, and you should always call JSON.parse. Nothing else is going to be easy for you.
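For completeness, a minimal sketch of the text-frame approach on the client (the endpoint URL is hypothetical; note also that the servlet's {arr : [1,2]} would need quoted keys, e.g. {"arr":[1,2]}, to be valid JSON). For text frames, event.data is already a string, so a single JSON.parse call is all that's needed:
var ws = new WebSocket("ws://localhost:8080/websocket/endpoint"); // hypothetical URL
ws.onmessage = function(event) {
    var obj = JSON.parse(event.data); // event.data is a string for text frames
    console.log(obj);
};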

let reader = new FileReader()
reader.onload = e => {
    if (e.target.readyState === 2) {
        let res = {}
        if (window.TextDecoder) {
            const enc = new TextDecoder('utf-8')
            res = JSON.parse(enc.decode(new Uint8Array(e.target.result))) // convert to a JSON object
        } else {
            res = JSON.parse(String.fromCharCode.apply(null, new Uint8Array(e.target.result)))
        }
        console.info('import-back:: ', res)
    }
}
reader.readAsArrayBuffer(response)
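On current browsers the same conversion can be written more compactly; a minimal sketch, assuming Blob.prototype.arrayBuffer() and TextDecoder are available:
async function blobToJson(blob) {
    const bytes = new Uint8Array(await blob.arrayBuffer()) // raw bytes of the blob
    return JSON.parse(new TextDecoder('utf-8').decode(bytes))
}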

Related

Decode XMLHTTPResponseText into dataUrl without base encoding on server side

How can the plain text response of an XMLHttpRequest be converted to a dataUrl on the client side?
Image data is being sent from the server to an iframe, and the only option to retrieve the data is the default-encoded data from a GET request.
I do not have any control over the server. I cannot specify overrideMimeType nor the responseType of the request.
I tried to UTF-8 encode the returned data:
const utf8 = new TextEncoder();
const bytes = utf8.encode(imageDataAsText);
//Convert to data url
const fr = new FileReader();
const blob = new Blob([bytes], { type: attachment.metadata.mediaType });
fr.onload = (e) => {
    setImageData(fr.result as string);
};
fr.readAsDataURL(blob);
Converting via the char code didn't work either:
https://stackoverflow.com/a/6226756/3244464
let bytesv2 = []; // char codes
for (var i = 0; i < imageDataAsString.length; ++i) {
    var code = imageDataAsString.charCodeAt(i);
    bytesv2 = bytesv2.concat([code & 0xff, code / 256 >>> 0]);
}
Raw data as it is displayed by the console output: what is the actual default encoding of the data I am working with here?
Context:
* The data in question is received inside an iframe inside the Atlassian (Jira/Confluence) ecosystem. They do not support piping binary data from the parent frame to the iframe, nor can I make my own request, due to the authorization flow which requires cookies stored on the parent page. All other options mentioned - overriding some encoding, or changing it on the server side - do not apply in this specific case.
When you are using XMLHttpRequest to get binary data, don't return it as a string. Use xhr.responseType = 'blob' (or 'arraybuffer' if you intend to read/modify it):
var xhr = new XMLHttpRequest()
xhr.open('GET', url)
xhr.responseType = 'blob'
xhr.onload = () => {
    // Convert xhr blob to data url
    const fr = new FileReader()
    fr.onload = () => {
        setImageData(fr.result)
    }
    fr.readAsDataURL(xhr.response)
}
xhr.send()
Better yet, use the Fetch API instead of the old XMLHttpRequest to get the binary data.
It is also available in web workers and on servers, and it's simpler and promise-based:
fetch(url)
    .then(res => res.blob())
    .then(blob => { ... })
And why do you need it to be a base64 URL? If it's just to show a preview in an <img>, then it's a waste of time, CPU and memory.
It would be better just to do:
img.src = URL.createObjectURL(blob)
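As a small follow-up sketch: revoking the object URL once the image has loaded lets the browser release the underlying blob.
const url = URL.createObjectURL(blob)
img.onload = () => URL.revokeObjectURL(url) // free the blob once it has rendered
img.src = url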

Why is Node.js not reading the entire binary file from disk?

I have a PDF file which I want to read into memory using Node.js. Ideally I'd like to encode it using base64 for transferring it. But somehow the read function does not seem to read the full PDF file, which makes no sense to me. The original PDF was generated using pdfKit and is fine and viewable using a PDF reader program.
The original file test.pdf is 90kB on disk. But if I read it and write it back to disk, there are just 82kB, and the new PDF test-out.pdf is broken. The PDF viewer says:
Unable to open document. The PDF document is damaged.
The base64 encoding therefore also does not work correctly. I tested it using this webservice. Does someone know why this is happening, and how to resolve it?
I found this post already.
const fs = require('fs');
let buf = fs.readFileSync('test.pdf'); // returns raw buffer binary data
// buf = fs.readFileSync('test.pdf', {encoding:'base64'}); // for the base64 encoded data
// ...transfer the base64 data...
fs.writeFileSync('test-out.pdf', buf); // should be pdf again
EDIT MCVE:
const fs = require('fs');
const PDFDocument = require('pdfkit');
let filepath = 'output.pdf';
class PDF {
    constructor() {
        this.doc = new PDFDocument();
        this.setupdocument();
        this.doc.pipe(fs.createWriteStream(filepath));
    }
    setupdocument() {
        var pageNumber = 1;
        this.doc.on('pageAdded', () => {
            this.doc.text(++pageNumber, 0.5 * (this.doc.page.width - 100), 40, {width: 100, align: 'center'});
        });
        this.doc.moveDown();
        // draw some headline text
        this.doc.fontSize(25).text('Some Headline');
        this.doc.fontSize(15).text('Generated: ' + new Date().toUTCString());
        this.doc.moveDown();
        this.doc.font('Times-Roman', 11);
    }
    report(object) {
        this.doc.moveDown();
        this.doc
            .text(object.location + ' ' + object.table + ' ' + Date.now())
            .font('Times-Roman', 11)
            .moveDown()
            .text(object.name)
            .font('Times-Roman', 11);
        this.doc.end();
        let report = fs.readFileSync(filepath);
        return report;
    }
}
let pdf = new PDF();
let buf = pdf.report({location: 'athome', table: 'wood', name: 'Bob'});
fs.writeFileSync('outfile1.pdf', buf);
The encoding option for fs.readFileSync() is for you to tell the readFile function what encoding the file already is in, so the code reading the file knows how to interpret the data it reads. It does not convert the data into that encoding.
In this case, your PDF is binary - it's not base64 - so you are telling it to convert from base64 into binary, which messes up the data.
You should not be passing the encoding option at all; you will then get the raw binary buffer (which is what a PDF file is - raw binary). If you then want to convert that to base64 for some reason, you can do buf.toString('base64') on it. But that is not its native format, and if you write that converted data back out to disk, it won't be a valid PDF file.
To just read and write the same file out to a different filename, leave off the encoding option entirely:
const fs = require('fs');
let buf = fs.readFileSync('test.pdf'); // get raw buffer binary data
fs.writeFileSync('test-out.pdf', buf); // write out raw buffer binary data
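And if the data really has to cross a text-only transfer layer, a minimal sketch of the base64 round trip described above:
const fs = require('fs');
const b64 = fs.readFileSync('test.pdf').toString('base64'); // binary -> base64 text
// ...transfer the base64 string...
fs.writeFileSync('test-out.pdf', Buffer.from(b64, 'base64')); // base64 text -> binary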
After a lot of searching I found this GitHub issue. The problem in my question seems to be the call to doc.end(), which doesn't wait for the write stream to finish (the write stream's finish event). Therefore, as suggested in the GitHub issue, the following approaches work:
callback based:
const doc = new PDFDocument();
const writeStream = fs.createWriteStream('filename.pdf');
doc.pipe(writeStream);
doc.end();
writeStream.on('finish', function () {
    // do stuff with the PDF file
});
or promise based:
const stream = fs.createWriteStream(localFilePath);
doc.pipe(stream);
// .....
doc.end();
await new Promise<void>(resolve => {
    stream.on("finish", function() {
        resolve();
    });
});
Or, even nicer: instead of calling doc.end() directly, call the function savePdfToFile below:
function savePdfToFile(pdf: PDFKit.PDFDocument, fileName: string): Promise<void> {
    return new Promise<void>((resolve, reject) => {
        // To determine when the PDF has finished being written successfully
        // we need to confirm the following 2 conditions:
        //
        // 1. The write stream has been closed
        // 2. PDFDocument.end() was called synchronously without an error being thrown
        let pendingStepCount = 2;
        const stepFinished = () => {
            if (--pendingStepCount == 0) {
                resolve();
            }
        };
        const writeStream = fs.createWriteStream(fileName);
        writeStream.on('close', stepFinished);
        pdf.pipe(writeStream);
        pdf.end();
        stepFinished();
    });
}
This function should correctly handle the following situations:
PDF generated successfully
Error is thrown inside pdf.end() before write stream is closed
Error is thrown inside pdf.end() after write stream has been closed
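A hypothetical usage sketch of savePdfToFile (inside an async function):
const doc = new PDFDocument();
doc.text('Some content');
await savePdfToFile(doc, 'report.pdf'); // resolves once the file is fully on disk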

Proper way to read a file using FileReader() to generate an md5 hash string from image files?

I'm currently doing this (see snippet below) to get an md5 hash string for the image files I'm uploading (I'm using the hash as fileNames):
NOTE: I'm using the md5 package to generate the hash (it's loaded into the snippet).
There are 4 available methods on FileReader() to read the files. They all seem to produce good results.
readAsText(file)
readAsBinaryString(file);
readAsArrayBuffer(file);
readAsDataURL(file);
Which one should I be using in this case, and why? Can you also explain the difference between them?
function onFileSelect(e) {
    const file = e.target.files[0];
    const reader1 = new FileReader();
    const reader2 = new FileReader();
    const reader3 = new FileReader();
    const reader4 = new FileReader();
    reader1.onload = (event) => {
        const fileContent = event.target.result;
        console.log('Hash from "readAsText()": ');
        console.log(md5(fileContent));
    }
    reader2.onload = (event) => {
        const fileContent = event.target.result;
        console.log('Hash from "readAsBinaryString()": ');
        console.log(md5(fileContent));
    }
    reader3.onload = (event) => {
        const fileContent = event.target.result;
        console.log('Hash from "readAsArrayBuffer()": ');
        console.log(md5(fileContent));
    }
    reader4.onload = (event) => {
        const fileContent = event.target.result;
        console.log('Hash from "readAsDataURL()": ');
        console.log(md5(fileContent));
    }
    reader1.readAsText(file);
    reader2.readAsBinaryString(file);
    reader3.readAsArrayBuffer(file);
    reader4.readAsDataURL(file);
}
.myDiv {
    margin-bottom: 10px;
}
<script src="https://cdn.jsdelivr.net/npm/js-md5@0.7.3/src/md5.min.js"></script>
<div class="myDiv">Pick an image file to see the 4 hash results on console.log()</div>
<input type='file' onChange="onFileSelect(event)" accept='.jpg,.jpeg,.png,.gif' />
Use readAsArrayBuffer.
readAsBinaryString() and readAsDataURL() will make your computer do a lot more work than needs to be done:
read the blob as a binary stream
convert it to a UTF-16 / base64 string (remember strings are immutable in js; any operation you do on one will actually create a copy in memory)
[ pass it to your lib ]
convert it back to a binary string
process the data
Also, it seems your library doesn't handle data URLs and fails on UTF-16 strings.
readAsText() by default will try to interpret your binary data as a UTF-8 text sequence, which is pretty bad for binary data like a raster image:
// generate some binary data
document.createElement('canvas').toBlob(blob => {
    const utf8_reader = new FileReader();
    const bin_reader = new FileReader();
    let done = 0;
    utf8_reader.onload = bin_reader.onload = e => {
        if (++done === 2) {
            console.log('same results: ', bin_reader.result === utf8_reader.result);
            console.log("utf8\n", utf8_reader.result);
            console.log("utf16\n", bin_reader.result);
        }
    }
    utf8_reader.readAsText(blob);
    bin_reader.readAsBinaryString(blob);
});
readAsArrayBuffer, on the other hand, will just allocate the binary data as-is in memory. Simple I/O, no processing.
To manipulate this data, we can use TypedArray views over the binary data, which, being only views, won't create any overhead either.
And if you look at the library you are using, it will pass your input to such a Uint8Array anyway in order to process it. However, beware: it apparently needs you to pass a Uint8Array view of this ArrayBuffer instead of the bare ArrayBuffer directly.
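Putting that together, a minimal sketch, assuming the js-md5 library from the snippet above is loaded:
function hashFile(file) {
    const reader = new FileReader();
    reader.onload = () => {
        const bytes = new Uint8Array(reader.result); // view over the raw buffer, no copy
        console.log(md5(bytes));                     // js-md5 accepts a Uint8Array
    };
    reader.readAsArrayBuffer(file);
}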

Convert Blob data to Raw buffer in javascript or node

I am using the plugin jsPDF, which generates a PDF and saves it to the local file system. Now in jsPDF.js, there is some code which generates the PDF data in blob format:
var blob = new Blob([array], {type: "application/pdf"});
and then saves the blob data to the local file system. Instead of saving, I need to print the PDF using the plugin node-printer.
Here is some sample code to do so:
var fs = require('fs');
var dataToPrinter;
fs.readFile('/home/ubuntu/test.pdf', function(err, data) {
    dataToPrinter = data;
});
var printer = require("../lib");
printer.printDirect({
    data: dataToPrinter,
    printer: 'Deskjet_3540',
    type: 'PDF',
    success: function(id) {
        console.log('printed with id ' + id);
    },
    error: function(err) {
        console.error('error on printing: ' + err);
    }
});
The fs.readFile() reads the PDF file and generates data in raw buffer format.
Now what I want is to convert the 'Blob' data into 'raw buffer' so that I can print the PDF.
If you are not using NodeJS, then you should know that the browser does not have a Buffer class implementation, and you are probably compiling your code for a browser environment with something like browserify. In that case you need this library, which converts your blob into a Buffer class that is intended to be as close to a NodeJS Buffer object as possible (the implementation is at feross/buffer).
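A minimal sketch of one way to do the conversion, assuming your bundler (browserify/webpack) maps require('buffer') to the feross/buffer polyfill and that Blob.prototype.arrayBuffer() is available:
const { Buffer } = require('buffer');
async function blobToBuffer(blob) {
    // wrap the blob's bytes in a Buffer view over the copied ArrayBuffer
    return Buffer.from(await blob.arrayBuffer());
}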
If you are using node-fetch (not OP's case) then you probably got a blob from a response object:
const fetch = require("node-fetch");
const response = await fetch("http://www.stackoverflow.com/");
const blob = await response.blob();
This blob is an internal implementation and exists only inside the node-fetch or fetch-blob libraries; to convert it to a native NodeJS Buffer object, you need to transform it to an arrayBuffer first:
const arrayBuffer = await blob.arrayBuffer();
const buffer = Buffer.from(arrayBuffer);
This buffer object can then be used on things such as file writes and server responses.
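For example, a short usage sketch (the file name is made up):
const fs = require("fs");
fs.writeFileSync("page.html", buffer); // the Buffer from above, written straight to disk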
For me, it worked with the following:
const buffer = Buffer.from(blob, 'binary');
So, this buffer can be stored in Google Cloud Storage or on the local disk with the fs Node package.
I used a blob file to send data from the client to the server through the DDP protocol (Meteor), so when the file arrives at the server I convert it to a buffer in order to store it.
var blob = new Blob([array], {type: "application/pdf"});
var arrayBuffer, uint8Array;
var fileReader = new FileReader();
fileReader.onload = function() {
    arrayBuffer = this.result;
    uint8Array = new Uint8Array(arrayBuffer);
    var printer = require("./js/controller/lib");
    printer.printDirect({
        data: uint8Array,
        printer: 'Deskjet_3540',
        type: 'PDF',
        success: function(id) {
            console.log('printed with id ' + id);
        },
        error: function(err) {
            console.error('error on printing: ' + err);
        }
    });
};
fileReader.readAsArrayBuffer(blob);
This is the final code which worked for me. The printer accepts the Uint8Array format.
Try:
var blob = new Blob([array], {type: "application/pdf"});
var buffer = new Buffer(blob, "binary");

Chrome extension: how to pass ArrayBuffer or Blob from content script to the background without losing its type?

I have this content script that downloads some binary data using XHR, which is sent later to the background script:
var self = this;
var xhr = new XMLHttpRequest();
xhr.open('GET', url);
xhr.responseType = 'arraybuffer';
xhr.onload = function(e) {
    if (this.status == 200) {
        self.data = {
            data: xhr.response,
            contentType: xhr.getResponseHeader('Content-Type')
        };
    }
};
xhr.send();
... later ...
sendResponse({data: self.data});
After receiving this data in background script, I'd like to form another XHR request that uploads this binary data to my server, so I do:
var formData = new FormData();
var bb = new WebKitBlobBuilder();
bb.append(data.data);
formData.append("data", bb.getBlob(data.contentType));
var req = new XMLHttpRequest();
req.open("POST", serverUrl);
req.send(formData);
The problem is that the file uploaded to the server contains just this string: "[object Object]". I guess this happens because the ArrayBuffer type is lost somehow while being transferred from the content process to the background. How can I solve that?
Messages passed between a Content Script and a background page are JSON-serialized.
If you want to transfer an ArrayBuffer object through a JSON-serialized channel, wrap the buffer in a view, before and after transferring.
I show an isolated example, so that the solution is generally applicable, and not just in your case. The example shows how to pass around ArrayBuffers and typed arrays, but the method can also be applied to File and Blob objects, by using the FileReader API.
// In your case: self.data = { data: new Uint8Array(xhr.response), ...
// Generic example:
var example = new ArrayBuffer(10);
var data = {
    // Create a view
    data: Array.apply(null, new Uint8Array(example)),
    contentType: 'x-an-example'
};
// Transport over a JSON-serialized channel. In your case: sendResponse
var transportData = JSON.stringify(data);
//"{"data":[0,0,0,0,0,0,0,0,0,0],"contentType":"x-an-example"}"
// At the receivers end. In your case: chrome.extension.onRequest
var receivedData = JSON.parse(transportData);
// data.data is an Object, NOT an ArrayBuffer or Uint8Array
receivedData.data = new Uint8Array(receivedData.data).buffer;
// Now, receivedData is the expected ArrayBuffer object
This solution has been tested successfully in Chrome 18 and Firefox.
new Uint8Array(xhr.response) is used to create a view of the ArrayBuffer, so that the individual bytes can be read.
Array.apply(null, <Uint8Array>) is used to create a plain array from the elements of the Uint8Array view. This step reduces the size of the serialized message. WARNING: this method only works for small amounts of data. When the size of the typed array exceeds 125836, a RangeError will be thrown. If you need to handle large pieces of data, use other methods to do the conversion between typed arrays and plain arrays (a sketch follows below).
At the receiver's end, the original buffer can be obtained by creating a new Uint8Array and reading its buffer attribute.
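A hedged sketch of such an alternative: Array.from() (or a plain loop) walks the view element by element and does not hit the argument-count limit of Function.prototype.apply:
// Works for arbitrarily large views, unlike Array.apply(null, view)
var plainArray = Array.from(new Uint8Array(example));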
Implementation in your Google Chrome extension:
// Part of the Content script
self.data = {
    data: Array.apply(null, new Uint8Array(xhr.response)),
    contentType: xhr.getResponseHeader('Content-Type')
};
...
sendResponse({data: self.data});
// Part of the background page
chrome.runtime.onMessage.addListener(function(data, sender, callback) {
    // ...
    data.data = new Uint8Array(data.data).buffer;
Documentation
MDN: Typed Arrays
MDN: ArrayBuffer
MDN: Uint8Array
MDN: <Function> .apply
Google Chrome Extension docs: Messaging > Simple one-time requests
"This lets you send a one-time JSON-serializable message from a content script to extension, or vice versa, respectively"
SO bonus: Upload a File in a Google Chrome Extension - Using a Web worker to request, validate, process and submit binary data.
For Chrome extensions using Manifest V3, the URL.createObjectURL() approach doesn't work anymore, because it is prohibited in service workers.
The best (easiest) way to pass data from a service worker to a content script (and vice versa) is to convert the blob into a base64 representation.
const fetchBlob = async url => {
    const response = await fetch(url);
    const blob = await response.blob();
    const base64 = await convertBlobToBase64(blob);
    return base64;
};
const convertBlobToBase64 = blob => new Promise(resolve => {
    const reader = new FileReader();
    reader.readAsDataURL(blob);
    reader.onloadend = () => {
        const base64data = reader.result;
        resolve(base64data);
    };
});
Then send the base64 to the content script.
Service worker:
chrome.tabs.sendMessage(sender.tab.id, { type: "LOADED_FILE", base64: base64 });
Content script:
chrome.runtime.onMessage.addListener(async (request, sender) => {
    if (request.type == "LOADED_FILE" && sender.id == '<your_extension_id>') {
        // do anything you want with the data from the service worker.
        // e.g. convert it back to a blob
        const response = await fetch(request.base64);
        const blob = await response.blob();
    }
});
