Convert Blob data to raw buffer in JavaScript or Node

I am using the jsPDF plugin, which generates a PDF and saves it to the local file system. Inside jsPDF.js there is a piece of code that produces the PDF data as a Blob:
var blob = new Blob([array], {type: "application/pdf"});
and then saves that Blob to the local file system. Instead of saving it, I need to print the PDF using the node-printer plugin.
Here is some sample code to do so:
var fs = require('fs');
var dataToPrinter;
fs.readFile('/home/ubuntu/test.pdf', function(err, data) {
    dataToPrinter = data;
});
var printer = require("../lib");
printer.printDirect({
data: dataToPrinter,
printer:'Deskjet_3540',
type: 'PDF',
success: function(id) {
console.log('printed with id ' + id);
},
error: function(err) {
console.error('error on printing: ' + err);
}
})
fs.readFile() reads the PDF file and returns the data as a raw Buffer.
What I want now is to convert the Blob data into a raw Buffer so that I can print the PDF.

If you are not using Node.js, keep in mind that the browser does not have a Buffer class implementation, so you are probably compiling your code for the browser with something like Browserify. In that case you need the buffer library, which converts your blob into an object that is meant to behave as closely as possible to a Node.js Buffer (the implementation is at feross/buffer).
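A rough sketch of that browser path, assuming your bundler maps require('buffer') to feross/buffer and that Blob.prototype.arrayBuffer() is available:
const { Buffer } = require('buffer'); // resolved to feross/buffer by the bundler

async function blobToBuffer(blob) {
    const arrayBuffer = await blob.arrayBuffer(); // read the Blob's bytes
    return Buffer.from(arrayBuffer);              // wrap them in a Buffer-compatible object
}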
If you are using node-fetch (not OP's case) then you probably got a blob from a response object:
const fetch = require("node-fetch");
const response = await fetch("http://www.stackoverflow.com/");
const blob = await response.blob();
This blob is an internal implementation that exists only inside the node-fetch and fetch-blob libraries. To convert it to a native Node.js Buffer, first transform it into an ArrayBuffer:
const arrayBuffer = await blob.arrayBuffer();
const buffer = Buffer.from(arrayBuffer);
This buffer object can then be used for things such as file writes and server responses.
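For example, a minimal sketch writing the fetched bytes to disk (the output filename is just an illustration):
const fs = require('fs/promises');
await fs.writeFile('page.html', buffer); // persist the raw bytes; the same Buffer works in server responses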

For me, it worked with the following:
const buffer = Buffer.from(blob, 'binary');
This buffer can then be stored in Google Cloud Storage or on the local disk with the fs package.
I used a Blob to send data from the client to the server through the DDP protocol (Meteor), and when the file arrives at the server I convert it to a Buffer in order to store it.

var blob = new Blob([array], {type: "application/pdf"});
var arrayBuffer, uint8Array;
var fileReader = new FileReader();
fileReader.onload = function() {
    arrayBuffer = this.result;
    uint8Array = new Uint8Array(arrayBuffer);
    var printer = require("./js/controller/lib");
    printer.printDirect({
        data: uint8Array,
        printer: 'Deskjet_3540',
        type: 'PDF',
        success: function(id) {
            console.log('printed with id ' + id);
        },
        error: function(err) {
            console.error('error on printing: ' + err);
        }
    });
};
fileReader.readAsArrayBuffer(blob);
This is the final code which worked for me. The printer accepts the data as a Uint8Array.

Try:
var blob = new Blob([array], {type: "application/pdf"});
var buffer = new Buffer(blob, "binary");

Related

Base64 to Image File Conversion in JS

I am working on a project where I have to upload an image as form data along with other text fields. I have my file as a Base64 string at first, and then I convert it into a File before uploading it to the server.
const data = await fetch(base64String);
const blob = await data.blob();
const file = await new File([blob], 'avatar', { type: 'image/png' });
I logged the base64String on the client side before uploading it to the server. Then I upload the file to the server as a File. Before saving it to MongoDB, when I log it as a base64 string again on the server side, I see that my string is not the same as before. I feel like I am doing something wrong while converting the base64 to a file on the client side. Please help me out.
I have figured out my problem. When I take an image file as input from my computer, I get a base64 data URL like this:
data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA...
But when I convert it back into a file, the conversion expects a string like this:
/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA....
So, basically, I had to trim the string to match the expected format, and I wrote a base64-to-file conversion function following this answer.
Here is my function to convert a base64 string to an image file
export function getFileFromBase64(string64: string, fileName: string) {
    const trimmedString = string64.replace('data:image/jpeg;base64,', '');
    const imageContent = atob(trimmedString);
    const buffer = new ArrayBuffer(imageContent.length);
    const view = new Uint8Array(buffer);
    for (let n = 0; n < imageContent.length; n++) {
        view[n] = imageContent.charCodeAt(n);
    }
    const type = 'image/jpeg';
    const blob = new Blob([buffer], { type });
    return new File([blob], fileName, { lastModified: new Date().getTime(), type });
}
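A hypothetical usage, matching the form-data upload described above (the field and file names are just illustrations):
const file = getFileFromBase64(base64String, 'avatar.jpg');
const formData = new FormData();
formData.append('avatar', file); // upload alongside the other text fields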

Why is Node.js not reading the entire binary file from disk?

I have a PDF file which I want to read into memory using Node.js. Ideally I'd like to encode it using base64 for transferring it. But somehow the read function does not seem to read the full PDF file, which makes no sense to me. The original PDF was generated using PDFKit, and is fine and viewable in a PDF reader program.
The original file test.pdf is 90 kB on disk. But if I read it and write it back to disk, only 82 kB are written, and the new PDF test-out.pdf is broken. The PDF viewer says:
Unable to open document. The pdf document is damaged.
The base64 encoding therefore also does not work correctly. I tested it using this webservice. Does someone know what is happening here and how to resolve it?
I found this post already.
const fs = require('fs');
let buf = fs.readFileSync('test.pdf'); // returns raw buffer binary data
// buf = fs.readFileSync('test.pdf', {encoding: 'base64'}); // for the base64 encoded data
// ...transfer the base64 data...
fs.writeFileSync('test-out.pdf', buf); // should be a pdf again
EDIT MCVE:
const fs = require('fs');
const PDFDocument = require('pdfkit');

let filepath = 'output.pdf';

class PDF {
    constructor() {
        this.doc = new PDFDocument();
        this.setupdocument();
        this.doc.pipe(fs.createWriteStream(filepath));
    }

    setupdocument() {
        var pageNumber = 1;
        this.doc.on('pageAdded', () => {
            this.doc.text(++pageNumber, 0.5 * (this.doc.page.width - 100), 40, {width: 100, align: 'center'});
        });

        this.doc.moveDown();
        // draw some headline text
        this.doc.fontSize(25).text('Some Headline');
        this.doc.fontSize(15).text('Generated: ' + new Date().toUTCString());
        this.doc.moveDown();
        this.doc.font('Times-Roman', 11);
    }

    report(object) {
        this.doc.moveDown();
        this.doc
            .text(object.location + ' ' + object.table + ' ' + Date.now())
            .font('Times-Roman', 11)
            .moveDown()
            .text(object.name)
            .font('Times-Roman', 11);
        this.doc.end();

        let report = fs.readFileSync(filepath);
        return report;
    }
}

let pdf = new PDF();
let buf = pdf.report({location: 'athome', table: 'wood', name: 'Bob'});
fs.writeFileSync('outfile1.pdf', buf);
The encoding option for fs.readFileSync() is there for you to tell the function what encoding the file already uses, so that the code reading the file knows how to interpret the data. It does not convert the data into that encoding.
In this case your PDF is binary data, not base64, so you are telling it to convert from base64 into binary, which mangles the data.
You should not be passing the encoding option at all; you will then get the raw binary Buffer (which is what a PDF file is: raw binary). If you then want to convert that to base64 for some reason, you can call buf.toString('base64') on it. But that is not its native format, and if you write that converted data back out to disk, it won't be a valid PDF file.
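If base64 is only needed for the transfer step, a minimal round-trip sketch (file names are just examples) could look like this:
const fs = require('fs');
const raw = fs.readFileSync('test.pdf');      // raw binary Buffer
const b64 = raw.toString('base64');           // safe to embed in JSON, send over the wire, etc.
const restored = Buffer.from(b64, 'base64');  // decode back to raw binary
fs.writeFileSync('test-out.pdf', restored);   // still a valid PDF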
To just read and write the same file out to a different filename, leave off the encoding option entirely:
const fs = require('fs');
let buf = fs.readFileSync('test.pdf'); // get raw buffer binary data
fs.writeFileSync('test-out.pdf', buf); // write out raw buffer binary data
After a lot of searching I found this GitHub issue. The problem in my question seems to be that doc.end() does not wait for the write stream to finish (its finish event). Therefore, as suggested in the GitHub issue, the following approaches work:
callback based:
doc = new PDFDocument();
writeStream = fs.createWriteStream('filename.pdf');
doc.pipe(writeStream);
doc.end();
writeStream.on('finish', function () {
    // do stuff with the PDF file
});
or promise based:
const stream = fs.createWriteStream(localFilePath);
doc.pipe(stream);
.....
doc.end();
await new Promise<void>(resolve => {
    stream.on("finish", function() {
        resolve();
    });
});
Or, even nicer, instead of calling doc.end() directly, call the savePdfToFile function below:
function savePdfToFile(pdf: PDFKit.PDFDocument, fileName: string): Promise<void> {
    return new Promise<void>((resolve, reject) => {
        // To determine when the PDF has finished being written successfully
        // we need to confirm the following 2 conditions:
        //
        //   1. The write stream has been closed
        //   2. PDFDocument.end() was called synchronously without an error being thrown
        let pendingStepCount = 2;

        const stepFinished = () => {
            if (--pendingStepCount == 0) {
                resolve();
            }
        };

        const writeStream = fs.createWriteStream(fileName);
        writeStream.on('close', stepFinished);
        pdf.pipe(writeStream);

        pdf.end();
        stepFinished();
    });
}
This function should correctly handle the following situations:
PDF generated successfully
Error is thrown inside pdf.end() before write stream is closed
Error is thrown inside pdf.end() after write stream has been closed
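For reference, a hypothetical usage sketch (assuming fs and pdfkit are already required as in the MCVE above, and that this runs inside an async function):
const doc = new PDFDocument();
doc.text('Some content');
await savePdfToFile(doc, 'output.pdf');    // the helper calls doc.end() internally
const buf = fs.readFileSync('output.pdf'); // safe to read: the write stream has already closed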

Create image from ArrayBuffer in Nodejs

I'm trying to create an image file from chunks of ArrayBuffers.
all = fs.createWriteStream("out." + imgtype);
for (i = 0; i < end; i++) {
    all.write(picarray[i]);
}
all.end();
where picarray contains ArrayBuffer chunks. However, I get the error TypeError: Invalid non-string/buffer chunk.
How can I convert ArrayBuffer chunks into an image?
Have you tried first converting it into a Node.js Buffer? (This is the native Node.js buffer interface, whereas ArrayBuffer is the browser interface and is not fully supported for Node.js write operations.)
Something along these lines should help:
all = fs.createWriteStream("out." + imgtype);
for (i = 0; i < end; i++) {
    var buffer = new Buffer(new Uint8Array(picarray[i]));
    all.write(buffer);
}
all.end();
After spending some time I got this; it worked for me perfectly.
As mentioned by @Nick, you will have to convert the array buffer you received from the browser into a Node.js Buffer.
var readWriteFile = function (req) {
    var fs = require('fs');
    var data = new Buffer(req);
    fs.writeFile('fileName.png', data, 'binary', function (err) {
        if (err) {
            console.log("There was an error writing the image");
        }
        else {
            console.log("The file was written");
        }
    });
};
An ArrayBuffer is a browser-side type that cannot be written to a file directly; we need to convert it to the native Buffer API of the Node.js runtime.
These few lines of code will create the image:
const fs = require('fs');

let data = arrayBuffer; // your image, stored in the arrayBuffer variable
data = Buffer.from(data);
fs.writeFile(`Assets/test.png`, data, err => { // Assets is a folder present in your root directory
    if (err) {
        console.log(err);
    } else {
        console.log('File created successfully!');
    }
});

JS - How to compute MD5 on binary data

EDIT: changed title from "JS File API - write and read UTF-8 data is inconsistent" to reflect the actual question.
I have some binary content I need to calculate the MD5 of. The content is a WARC file, which means it holds text as well as encoded images. To avoid errors when saving the file, I convert and store all the data in ArrayBuffers. All the data is put into Uint8Arrays to convert it to UTF-8.
My first attempt, for testing, was to use the saveAs library to save files from Chrome extensions. This means I was passing a Blob object to the method to create the file.
var b = new Blob(arrayBuffers, {type: "text/plain;charset=utf-8"});
saveAs(b,'name.warc');
I haven't found a tool that computes the MD5 of a Blob object directly, so what I was doing was using a FileReader to read the Blob as binary data and then using an MD5 tool (I used CryptoJS as well as a tool from faultylabs) to compute the result.
f = new FileReader();
f.readAsBinaryString(b);
f.onloadend = function(a) {
    console.log('Original file checksum: ', faultylabs.MD5(this.result));
};
The resources (images) are downloaded directly in ArrayBuffer format, so I have no need to convert them.
The result was wrong: the MD5 computed in the code and the MD5 of the file I saved on my local machine were two different values. Reading the data as text obviously throws an error.
The workaround I found consists of writing the Blob object to disk using the FileSystem API, reading it back as binary data, computing the MD5, and then saving that retrieved file as the WARC file (not the Blob object directly, but this "refreshed" version of the file).
In this case the computed MD5 is fine (I calculate it on the "refreshed" version of the WARC file), but when I launch the WARC replay instance with the "refreshed" WARC archive it throws errors, while with the original file I don't have any problems (but the MD5 is not correct).
var fd = new FormData();

// To compute the md5 hash and to have it correct on the server side, we need to
// write the file to the system, read it back and then calculate the md5 value.
// We need to send this version of the warc file to the server as well.
window.requestFileSystem = window.requestFileSystem || window.webkitRequestFileSystem;

function computeWARC_MD5(callback, formData) {
    window.requestFileSystem(window.TEMPORARY, b.size, onInitFs);

    function onInitFs(fs) {
        fs.root.getFile('warc.warc', {create: true}, function(fileEntry) {
            fileEntry.createWriter(function(fileWriter) {
                fileWriter.onwriteend = function(e) {
                    readAndMD5();
                };
                fileWriter.onerror = function(e) {
                    console.error('Write failed: ' + e.toString());
                };
                fileWriter.write(b);
            });
        });

        function readAndMD5() {
            fs.root.getFile('warc.warc', {}, function(fileEntry) {
                fileEntry.file(function(file) {
                    var reader = new FileReader();
                    reader.onloadend = function(e) {
                        var warcMD5 = faultylabs.MD5(this.result);
                        console.log(warcMD5);
                        var g = new Blob([this.result], {type: "text/plain;charset=utf-8"});
                        saveAs(g, o_request.file);
                        formData.append('warc_file', g);
                        formData.append('warc_checksum_md5', warcMD5.toLowerCase());
                        callback(formData);
                    };
                    reader.readAsBinaryString(file);
                });
            });
        }
    }
}
function uploadData(formData) {
    // upload
    $.ajax({
        type: 'POST',
        url: server_URL_upload,
        data: fd,
        processData: false,
        contentType: false,
        // [SPECS] fire a progress event named progress at the XMLHttpRequestUpload object
        // about every 50ms or for every byte transmitted, whichever is least frequent
        xhrFields: {
            onprogress: function (e) {
                if (e.lengthComputable) {
                    console.log(e.loaded / e.total * 100 + '%');
                }
            }
        }
    }).done(function(data) {
        console.log('done uploading!');
        //displayMessage(port_to_page, 'Upload finished!', 'normal')
        //port_to_page.postMessage( { method:"doneUpload" } );
    });
}

computeWARC_MD5(uploadData, fd);
saveAs(b, 'warc.warc');
Could anybody explain to me why there is this discrepancy? What am I missing in treating all the objects I am dealing with as binary data (store, read)?
Basically, I tried another route: I converted the Blob back to an ArrayBuffer and computed the MD5 on that. At that point, the file's MD5 and the ArrayBuffer's are the same.
var b = new Blob(arrayBuffers, {type: "text/plain;charset=utf-8"});
var blobHtml = new Blob([str2ab(o_request.main_page_html)], {type: "text/plain;charset=utf-8"});

f = new FileReader();
f.readAsArrayBuffer(b);
f.onloadend = function(a) {
    var warcMD5 = faultylabs.MD5(this.result);
    var fd = new FormData();
    fd.append('warc_file', b);
    fd.append('warc_checksum_md5', warcMD5.toLowerCase());
    uploadData(fd);
};
I guess reading the data as a binary string and reading it as an ArrayBuffer give different results, which is why the MD5 was also inconsistent.

Chrome extension: how to pass ArrayBuffer or Blob from content script to the background without losing its type?

I have this content script that downloads some binary data using XHR, which is sent later to the background script:
var self = this;
var xhr = new XMLHttpRequest();
xhr.open('GET', url);
xhr.responseType = 'arraybuffer';
xhr.onload = function(e) {
    if (this.status == 200) {
        self.data = {
            data: xhr.response,
            contentType: xhr.getResponseHeader('Content-Type')
        };
    }
};
xhr.send();
... later ...
sendResponse({data: self.data});
After receiving this data in the background script, I'd like to form another XHR request that uploads this binary data to my server, so I do:
var formData = new FormData();
var bb = new WebKitBlobBuilder();
bb.append(data.data);
formData.append("data", bb.getBlob(data.contentType));
var req = new XMLHttpRequest();
req.open("POST", serverUrl);
req.send(formData);
The problem is that the file uploaded to the server contains just this string: "[object Object]". I guess this happens because the ArrayBuffer type is lost somehow while transferring it from the content script to the background? How can I solve that?
Messages passed between a content script and a background page are JSON-serialized.
If you want to transfer an ArrayBuffer object through a JSON-serialized channel, wrap the buffer in a view before transferring, and unwrap it afterwards.
I show an isolated example so that the solution is generally applicable, not just to your case. The example shows how to pass around ArrayBuffers and typed arrays, but the method can also be applied to File and Blob objects by using the FileReader API.
// In your case: self.data = { data: new Uint8Array(xhr.response), ...
// Generic example:
var example = new ArrayBuffer(10);
var data = {
    // Create a view
    data: Array.apply(null, new Uint8Array(example)),
    contentType: 'x-an-example'
};

// Transport over a JSON-serialized channel. In your case: sendResponse
var transportData = JSON.stringify(data);
// "{"data":[0,0,0,0,0,0,0,0,0,0],"contentType":"x-an-example"}"

// At the receiver's end. In your case: chrome.extension.onRequest
var receivedData = JSON.parse(transportData);

// data.data is an Object, NOT an ArrayBuffer or Uint8Array
receivedData.data = new Uint8Array(receivedData.data).buffer;
// Now, receivedData is the expected ArrayBuffer object
This solution has been tested successfully in Chrome 18 and Firefox.
new Uint8Array(xhr.response) is used to create a view of the ArrayBuffer, so that the individual bytes can be read.
Array.apply(null, <Uint8Array>) is used to create a plain array, using the keys from the Uint8Array view. This step reduces the size of the serialized message. WARNING: This method only works for small amounts of data. When the size of the typed array exceeds 125836, a RangeError will be thrown. If you need to handle large pieces of data, use other methods to do the conversion between typed arrays and plain arrays.
At the receiver's end, the original buffer can be obtained by creating a new Uint8Array and reading its buffer attribute.
Implementation in your Google Chrome extension:
// Part of the Content script
self.data = {
data: Array.apply(null, new Uint8Array(xhr.response)),
contentType: xhr.getResponseHeader('Content-Type')
};
...
sendResponse({data: self.data});
// Part of the background page
chrome.runtime.onMessage.addListener(function(data, sender, callback) {
...
data.data = new Uint8Array(data.data).buffer;
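As noted above, the same trick can be applied to File and Blob objects via the FileReader API; a rough sketch (the helper name is illustrative):
// Content script side: read the Blob into an ArrayBuffer, then flatten it to a plain array
function blobToPlainArray(blob, callback) {
    var reader = new FileReader();
    reader.onloadend = function() {
        callback(Array.apply(null, new Uint8Array(reader.result)));
    };
    reader.readAsArrayBuffer(blob);
}
The resulting plain array can then be sent with sendResponse exactly like the XHR case above.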
Documentation
MDN: Typed Arrays
MDN: ArrayBuffer
MDN: Uint8Array
MDN: <Function> .apply
Google Chrome Extension docs: Messaging > Simple one-time requests
"This lets you send a one-time JSON-serializable message from a content script to extension, or vice versa, respectively"
SO bonus: Upload a File in a Google Chrome Extension - Using a Web worker to request, validate, process and submit binary data.
For Chromium extensions using Manifest V3, the URL.createObjectURL() approach doesn't work anymore because it is not available in service workers.
The best (easiest) way to pass data from a service worker to a content script (and vice versa) is to convert the blob into a base64 representation.
const fetchBlob = async url => {
    const response = await fetch(url);
    const blob = await response.blob();
    const base64 = await convertBlobToBase64(blob);
    return base64;
};

const convertBlobToBase64 = blob => new Promise(resolve => {
    const reader = new FileReader();
    reader.readAsDataURL(blob);
    reader.onloadend = () => {
        const base64data = reader.result;
        resolve(base64data);
    };
});
Then send the base64 to the content script.
Service worker:
chrome.tabs.sendMessage(sender.tab.id, { type: "LOADED_FILE", base64: base64 });
Content script:
chrome.runtime.onMessage.addListener(async (request, sender) => {
    if (request.type == "LOADED_FILE" && sender.id == '<your_extension_id>') {
        // do anything you want with the data from the service worker.
        // e.g. convert it back to a blob
        const response = await fetch(request.base64);
        const blob = await response.blob();
    }
});
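For the reverse direction (content script to service worker), the same helpers apply; a brief sketch, assuming it runs in an async context and using an illustrative message type:
const base64 = await convertBlobToBase64(blob); // reuse the helper defined above
chrome.runtime.sendMessage({ type: "UPLOAD_FILE", base64: base64 });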
