Why is Node.js not reading the entire binary file from disk?

I have a PDF file which I want to read into memory using Node.js. Ideally I'd like to encode it as base64 for transferring it. But somehow the read function does not seem to read the full PDF file, which makes no sense to me. The original PDF was generated using PDFKit, and it is fine and viewable in a PDF reader.
The original file test.pdf is 90 kB on disk. But if I read it and write it back to disk, the result is only 82 kB and the new file test-out.pdf is broken. The PDF viewer says:
Unable to open document. The pdf document is damaged.
The base64 encoding therefore also does not work correctly (I tested it using this web service). Does anyone know what is happening here, and how to resolve it?
I have already found this post.
const fs = require('fs');
let buf = fs.readFileSync('test.pdf'); // returns raw buffer binary data
// buf = fs.readFileSync('test.pdf', {encoding:'base64'}); // for the base64 encoded data
// ...transfer the base64 data...
fs.writeFileSync('test-out.pdf', buf); // should be pdf again
EDIT MCVE:
const fs = require('fs');
const PDFDocument = require('pdfkit');
let filepath = 'output.pdf';
class PDF {
  constructor() {
    this.doc = new PDFDocument();
    this.setupdocument();
    this.doc.pipe(fs.createWriteStream(filepath));
  }
  setupdocument() {
    var pageNumber = 1;
    this.doc.on('pageAdded', () => {
      this.doc.text(++pageNumber, 0.5 * (this.doc.page.width - 100), 40, {width: 100, align: 'center'});
    });
    this.doc.moveDown();
    // draw some headline text
    this.doc.fontSize(25).text('Some Headline');
    this.doc.fontSize(15).text('Generated: ' + new Date().toUTCString());
    this.doc.moveDown();
    this.doc.font('Times-Roman', 11);
  }
  report(object) {
    this.doc.moveDown();
    this.doc
      .text(object.location + ' ' + object.table + ' ' + Date.now())
      .font('Times-Roman', 11)
      .moveDown()
      .text(object.name)
      .font('Times-Roman', 11);
    this.doc.end();
    let report = fs.readFileSync(filepath);
    return report;
  }
}
let pdf = new PDF();
let buf = pdf.report({location: 'athome', table: 'wood', name: 'Bob'});
fs.writeFileSync('outfile1.pdf', buf);

The encoding option of fs.readFileSync() does not change how the file is read; it only decides what you get back. Without it you get a raw Buffer; with it you get a string produced by converting the file's bytes to that encoding (for 'base64', the base64 text of the file).
In this case your PDF is binary, so that base64 string is a different representation of the data, not the PDF itself. Writing the string straight back to disk produces a text file full of base64 characters, not a PDF.
You should not pass the encoding option at all; you will then get the raw binary Buffer (which is what a PDF file is: raw binary). If you then want to convert that to base64 for transfer, you can call buf.toString('base64') on it. But that is not its native format, and if you write that converted data back out to disk, it won't be a valid PDF file.
To just read and write the same file out to a different filename, leave off the encoding option entirely:
const fs = require('fs');
let buf = fs.readFileSync('test.pdf'); // get raw buffer binary data
fs.writeFileSync('test-out.pdf', buf); // write out raw buffer binary data
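To transfer the PDF as base64 and still end up with a valid PDF, the receiving side has to decode the base64 text back into bytes before writing it. A minimal sketch, using the hypothetical helper names pdfToBase64 and base64ToFile and the file names from the question:

```javascript
const fs = require('fs');

// Read the PDF as raw bytes (no encoding option -> Buffer), then encode for transfer.
function pdfToBase64(path) {
  return fs.readFileSync(path).toString('base64');
}

// Decode the base64 text back into bytes before writing, so the file on
// disk is binary PDF data again rather than base64 text.
function base64ToFile(base64, path) {
  fs.writeFileSync(path, Buffer.from(base64, 'base64'));
}

// Usage:
// const b64 = pdfToBase64('test.pdf');   // ...transfer b64...
// base64ToFile(b64, 'test-out.pdf');
```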

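After a lot of searching I found this GitHub issue. The problem in my question is the call to doc.end(): it does not wait for the piped write stream to finish. The data is written to the file asynchronously, and the file is only complete once the write stream emits its 'finish' event, so reading the file synchronously right after doc.end() returns a truncated PDF. As suggested in the GitHub issue, the following approaches work:
Callback-based: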
const doc = new PDFDocument();
const writeStream = fs.createWriteStream('filename.pdf');
doc.pipe(writeStream);
doc.end();
writeStream.on('finish', function () {
  // do stuff with the PDF file
});
Or promise-based (TypeScript):
const stream = fs.createWriteStream(localFilePath);
doc.pipe(stream);
// .....
doc.end();
await new Promise<void>(resolve => {
  stream.on("finish", function () {
    resolve();
  });
});
Or, even nicer, instead of calling doc.end() directly, call the savePdfToFile function below:
function savePdfToFile(pdf: PDFKit.PDFDocument, fileName: string): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    // To determine when the PDF has finished being written successfully
    // we need to confirm the following 2 conditions:
    //
    // 1. The write stream has been closed
    // 2. PDFDocument.end() was called synchronously without an error being thrown
    let pendingStepCount = 2;
    const stepFinished = () => {
      if (--pendingStepCount === 0) {
        resolve();
      }
    };
    const writeStream = fs.createWriteStream(fileName);
    writeStream.on('close', stepFinished);
    // Reject if the file cannot be written (e.g. an invalid path)
    writeStream.on('error', reject);
    pdf.pipe(writeStream);
    pdf.end();
    stepFinished();
  });
}
This function should correctly handle the following situations:
The PDF is generated successfully
An error is thrown inside pdf.end() before the write stream is closed
An error is thrown inside pdf.end() after the write stream has been closed
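The same idea can also be written without a hand-built Promise by using events.once(), which resolves on the first 'finish' event and rejects if the stream emits 'error' first. A minimal sketch, assuming Node.js 11.13+; endAndSave is a hypothetical name, and doc can be any readable stream, such as a PDFKit PDFDocument:

```javascript
const fs = require('fs');
const { once } = require('events');

// Hypothetical helper: pipe a readable stream (e.g. a PDFKit PDFDocument)
// into a file, end it, and resolve only once the file stream has emitted
// 'finish', i.e. once all data has been written to the file.
async function endAndSave(doc, fileName) {
  const writeStream = fs.createWriteStream(fileName);
  doc.pipe(writeStream);
  doc.end(); // starts the final flush; it does not wait for the file writes
  // events.once() also rejects if writeStream emits 'error' first
  await once(writeStream, 'finish');
}

// Usage sketch: pipe here instead of in the constructor, then in an async report():
//   await endAndSave(this.doc, filepath);
//   return fs.readFileSync(filepath); // now contains the complete PDF
```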

Related

How to read a remote image into a base64 data URL

Actually, there are many answers to this question, but my problem is this:
I want to generate a PDF dynamically with 5 external (URL) images. I'm using the pdfmake Node module.
It supports only two ways of supplying images: local files and base64 format. But I don't want to store the images locally,
so my requirement is one function which takes a URL as a parameter and returns base64,
so that I can store it in a global variable and create PDFs.
Thanks in advance.
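// Desired helper: takes a URL and returns the image as base64
function urlToBase(URL) {
  return base64; // placeholder: this is the part I don't know how to implement
}
var img = urlToBase('https://unsplash.com/photos/MVx3Y17umaE');
var dd = {
  content: [
    {
      text: 'fjfajhal'
    },
    {
      image: img,
    }
  ]
};
var writeStream = fs.createWriteStream('myPdf.pdf');
var pdfDoc = printer.createPdfKitDocument(dd); // printer: a pdfmake PdfPrinter instance created elsewhere
pdfDoc.pipe(writeStream);
pdfDoc.end();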
I'm using the pdfmake module from npm.
The contents of the remote image can first be fetched with an HTTP request, for example using the ubiquitous request npm module. The response body can then be turned into a Buffer and converted to a base64 string. To complete the transformation, add the proper data URL prefix, for example data:image/png;base64, to the beginning of the base64 string.
Here is a rough example for a PNG image:
const request = require('request-promise-native');
let pngDataUrlPrefix = 'data:image/png;base64,';
let imageUrl = 'https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png';
request({
  url: imageUrl,
  method: 'GET',
  encoding: null // Important: without it the body is decoded as a string and the binary data is corrupted
})
  .then(result => {
    let imageBuffer = Buffer.from(result);
    let imageBase64 = imageBuffer.toString('base64');
    let imageDataUrl = pngDataUrlPrefix + imageBase64;
    console.log(imageDataUrl);
  })
  .catch(err => console.error(err));
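On Node.js 18 and later, the built-in fetch can replace the request module. A sketch, assuming Node.js 18+; toDataUrl and urlToDataUrl are hypothetical names, and taking the MIME type from the Content-Type header is a design choice, not something pdfmake requires:

```javascript
// Builds a data URL from raw bytes and a MIME type.
function toDataUrl(buffer, mimeType) {
  return `data:${mimeType};base64,${buffer.toString('base64')}`;
}

// Downloads a URL with the global fetch (Node.js 18+) and returns a data URL.
async function urlToDataUrl(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  // arrayBuffer() yields the raw bytes, so no text decoding can corrupt them
  const bytes = Buffer.from(await response.arrayBuffer());
  // Assumption: fall back to a generic type if the server sends no Content-Type
  const mimeType = response.headers.get('content-type') || 'application/octet-stream';
  return toDataUrl(bytes, mimeType);
}
```

Because the download is asynchronous, the function returns a Promise, so the pdfmake document definition has to be built after awaiting it (for example `const img = await urlToDataUrl(imageUrl);`).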

Upload a file stream to S3 without a file and from memory

I'm trying to create a CSV from a string and upload it to my S3 bucket. I don't want to write a file; I want it all to be in memory.
I don't want to read from a file to get my stream. I would like to make a stream without a file: something like createReadStream, but instead of a file, I would like to pass a string with my stream's contents.
var AWS = require('aws-sdk'),
    zlib = require('zlib'),
    fs = require('fs');
// Set the client config before creating the S3 client used for the upload.
AWS.config.loadFromPath('./config.json');
var s3Stream = require('s3-upload-stream')(new AWS.S3());
// Create the streams
var read = fs.createReadStream('/path/to/a/file');
var upload = s3Stream.upload({
  "Bucket": "bucket-name",
  "Key": "key-name"
});
// Handle errors.
upload.on('error', function (error) {
  console.log(error);
});
upload.on('part', function (details) {
  console.log(details);
});
upload.on('uploaded', function (details) {
  console.log(details);
});
read.pipe(upload);
You can create a Readable stream and push your string directly into it; it can then be consumed by your s3Stream instance.
const Readable = require('stream').Readable
let data = 'this is your data'
let read = new Readable()
read.push(data) // Push your data string
read.push(null) // Signal that you're done writing
// Create upload s3Stream instance and attach listeners go here
read.pipe(upload)
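Newer Node.js versions (12.3+) also offer Readable.from(), which builds the stream from an iterable without manual push() calls. A minimal sketch; stringToStream and streamToString are hypothetical helper names, and upload is the s3Stream upload from the question:

```javascript
const { Readable } = require('stream');

// Hypothetical helper: wrap an in-memory string (e.g. CSV text) in a
// readable stream without touching the file system. Wrapping the Buffer
// in an array makes Readable.from() emit it as a single chunk.
function stringToStream(text) {
  return Readable.from([Buffer.from(text, 'utf8')]);
}

// Collects a stream back into a string; used here only to check the result.
async function streamToString(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString('utf8');
}

// Usage: stringToStream('id,name\n1,Bob\n').pipe(upload);
```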

Create image from ArrayBuffer in Nodejs

I'm trying to create an image file from chunks of ArrayBuffers.
const all = fs.createWriteStream("out." + imgtype);
for (let i = 0; i < end; i++) {
  all.write(picarray[i]);
}
all.end();
where picarray contains ArrayBuffer chunks. However, I get the error TypeError: Invalid non-string/buffer chunk.
How can I convert ArrayBuffer chunks into an image?
Have you tried converting it into a Node.js Buffer first? (Buffer is the native Node.js type for binary data; stream write() accepts strings, Buffers and Uint8Arrays, but not a plain ArrayBuffer.)
Something along the line of this should help:
const all = fs.createWriteStream("out." + imgtype);
for (let i = 0; i < end; i++) {
  // Buffer.from(arrayBuffer) wraps the ArrayBuffer's memory without copying
  const buffer = Buffer.from(picarray[i]);
  all.write(buffer);
}
all.end();
After spending some time I got this working, and it works perfectly for me.
As mentioned by @Nick, you have to convert the array buffer you received from the browser into a Node.js Buffer.
var readWriteFile = function (req) {
  var fs = require('fs');
  var data = Buffer.from(req);
  fs.writeFile('fileName.png', data, function (err) {
    if (err) {
      console.log("There was an error writing the image");
    }
    else {
      console.log("The file was written");
    }
  });
};
An ArrayBuffer cannot be written to a file directly; you need to convert it to a Buffer, the native Node.js API for binary data.
These few lines of code will create the image:
const fs = require('fs');
let data = arrayBuffer; // your image stored in the arrayBuffer variable
data = Buffer.from(data);
fs.writeFile(`Assets/test.png`, data, err => { // Assets is a folder in your root directory
  if (err) {
    console.log(err);
  } else {
    console.log('File created successfully!');
  }
});
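If the chunks arrive as an array of ArrayBuffers, one option is to convert each to a Buffer, join them, and write the file in one call. A minimal sketch, assuming all chunks fit in memory; writeArrayBufferChunks is a hypothetical name, and picarray/imgtype in the usage line come from the question:

```javascript
const fs = require('fs');

// Hypothetical helper: convert ArrayBuffer chunks to Buffers, join them,
// and write the result to disk in a single call.
function writeArrayBufferChunks(fileName, chunks) {
  // Buffer.from(arrayBuffer) creates a Buffer view over the same memory (no copy)
  const buffers = chunks.map(chunk => Buffer.from(chunk));
  fs.writeFileSync(fileName, Buffer.concat(buffers));
}

// Usage with the question's variables:
// writeArrayBufferChunks('out.' + imgtype, picarray);
```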

Convert Blob data to Raw buffer in javascript or node

I am using the jsPDF plugin, which generates a PDF and saves it to the local file system. In jsPDF.js, there is a piece of code which generates the PDF data in Blob format:
var blob = new Blob([array], {type: "application/pdf"});
and then saves the blob data to the local file system. Instead of saving, I need to print the PDF using the node-printer plugin.
Here is some sample code to do so:
var fs = require('fs');
var printer = require("../lib");
fs.readFile('/home/ubuntu/test.pdf', function (err, data) {
  if (err) throw err;
  // data is a raw Buffer; print only once the file has been read
  printer.printDirect({
    data: data,
    printer: 'Deskjet_3540',
    type: 'PDF',
    success: function (id) {
      console.log('printed with id ' + id);
    },
    error: function (err) {
      console.error('error on printing: ' + err);
    }
  });
});
fs.readFile() reads the PDF file and returns the data as a raw Buffer.
Now what I want is to convert the 'Blob' data into 'raw buffer' so that I can print the PDF.
If you are not using Node.js, you should know that the browser does not have a Buffer class, and you are probably compiling your code for a browser environment with a tool such as browserify. In that case you need this library, which converts your Blob into a Buffer object intended to behave as closely as possible to a Node.js Buffer (the Buffer implementation is at feross/buffer).
If you are using node-fetch (not OP's case) then you probably got a blob from a response object:
const fetch = require("node-fetch");
// Inside an async function (top-level await is not available in CommonJS modules):
const response = await fetch("http://www.stackoverflow.com/");
const blob = await response.blob();
This blob is an internal implementation that exists only inside the node-fetch or fetch-blob libraries. To convert it to a native Node.js Buffer object, you need to transform it into an ArrayBuffer first:
const arrayBuffer = await blob.arrayBuffer();
const buffer = Buffer.from(arrayBuffer);
This buffer object can then be used on things such as file writes and server responses.
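On Node.js 18 and later, Blob is also a built-in global with the same promise-based arrayBuffer() method, so the conversion above can be wrapped in a small helper that works for both it and node-fetch's blob. A minimal sketch; blobToBuffer is a hypothetical name:

```javascript
// Converts any Blob-like object with an arrayBuffer() method
// (built-in Node.js Blob, node-fetch blob) into a Node.js Buffer.
async function blobToBuffer(blob) {
  const arrayBuffer = await blob.arrayBuffer();
  // Buffer.from(arrayBuffer) shares the memory instead of copying it
  return Buffer.from(arrayBuffer);
}
```

Since a Buffer is a subclass of Uint8Array, the result can also be passed wherever a Uint8Array is expected.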
For me, it worked with the following:
const buffer = Buffer.from(blob, 'binary');
This buffer can then be stored in Google Cloud Storage or on the local disk with the fs module.
I used a Blob to send data from the client to the server over the DDP protocol (Meteor); when the file arrives at the server, I convert it to a Buffer in order to store it.
var blob = new Blob([array], {type: "application/pdf"});
var arrayBuffer, uint8Array;
var fileReader = new FileReader();
fileReader.onload = function () {
  arrayBuffer = this.result;
  uint8Array = new Uint8Array(arrayBuffer);
  var printer = require("./js/controller/lib");
  printer.printDirect({
    data: uint8Array,
    printer: 'Deskjet_3540',
    type: 'PDF',
    success: function (id) {
      console.log('printed with id ' + id);
    },
    error: function (err) {
      console.error('error on printing: ' + err);
    }
  });
};
fileReader.readAsArrayBuffer(blob);
This is the final code which worked for me. The printer accepts the data as a Uint8Array.
Try:
var blob = new Blob([array], {type: "application/pdf"});
var buffer = new Buffer(blob, "binary");

File API - Blob to JSON

I'm trying to do some experiments with HTML5, WebSocket and the File API.
I'm using the Tomcat7 WebSocket implementation.
I'm able to send and receive text messages from the servlet. What I want to do now is to send JSON objects from the servlet to the client, but I want to avoid text messages in order to skip the JSON.parse (or similar) on the client, so I'm trying to send binary messages.
The servlet part is really simple:
String s = "{\"arr\": [1, 2]}"; // valid JSON requires quoted keys
CharBuffer cbuf = CharBuffer.wrap(s);
CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
getWsOutbound().writeBinaryMessage(encoder.encode(cbuf));
getWsOutbound().flush();
After this message, on the client I see that I received a binary frame, which is converted to a Blob object (http://www.w3.org/TR/FileAPI/#dfn-Blob).
The question is: is it possible to get the JSON object from the Blob?
I took a look at the FileReader interface (http://www.w3.org/TR/FileAPI/#FileReader-interface), and I used code like this to inspect what the FileReader can do (the first line creates a brand new Blob, so you can test on the fly if you want):
var b = new Blob([{"test": "toast"}], {type : "application/json"});
var fr = new FileReader();
fr.onload = function(evt) {
var res = evt.target.result;
console.log("onload",arguments, res, typeof res);
};
fr.readAsArrayBuffer(b);
using all the "readAs..." methods that I saw in the FileReader implementation (I'm using Chrome 22). However, I didn't find anything useful.
Do you have any suggestions? Thanks.
You should have tried readAsText() instead of readAsArrayBuffer() (JSON is text, in the end).
You also forgot to stringify the object (convert it to JSON text):
var b = new Blob([JSON.stringify({"test": "toast"})], {type : "application/json"}),
fr = new FileReader();
fr.onload = function() {
console.log(JSON.parse(this.result))
};
fr.readAsText(b);
To convert a Blob or File that contains JSON data to a JavaScript object, use:
JSON.parse(await blob.text());
Example: select a JSON file; the parsed object is then available in the browser's console as json.
const input = document.createElement("input");
input.type = "file";
input.accept = "application/json";
document.body.prepend(input);
input.addEventListener("change", async event => {
  const json = JSON.parse(await input.files[0].text());
  console.log("json", json);
  globalThis.json = json;
});
What you're doing is conceptually wrong. JSON is a string representation of an object, not an object itself. So, when you send a binary representation of JSON over the wire, you're sending a binary representation of the string. There's no way to get around parsing JSON on the client side to convert a JSON string to a JavaScript Object.
You absolutely should always send JSON as text to the client, and you should always call JSON.parse. Nothing else is going to be easy for you.
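To illustrate the point above, here is a minimal sketch (runnable in Node.js or a modern browser; parseJsonBytes is a hypothetical name) showing that JSON sent as binary data is still just UTF-8 text, which has to be decoded and passed to JSON.parse. In the browser, the ArrayBuffer from FileReader.readAsArrayBuffer() or blob.arrayBuffer() could be passed to the same function.

```javascript
// What the servlet sends: the JSON *text*, encoded as UTF-8 bytes.
const payload = new TextEncoder().encode(JSON.stringify({ arr: [1, 2] }));

// The receiver still has to decode the bytes back into text and parse it;
// a binary frame does not turn JSON into an object by itself.
function parseJsonBytes(bytes) {
  const text = new TextDecoder('utf-8').decode(bytes);
  return JSON.parse(text);
}

console.log(parseJsonBytes(payload));
```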
let reader = new FileReader();
reader.onload = e => {
  if (e.target.readyState === 2) { // 2 === FileReader.DONE
    let res = {};
    if (window.TextDecoder) {
      const enc = new TextDecoder('utf-8');
      res = JSON.parse(enc.decode(new Uint8Array(e.target.result))); // convert to a JSON object
    } else {
      res = JSON.parse(String.fromCharCode.apply(null, new Uint8Array(e.target.result)));
    }
    console.info('import-back:: ', res);
  }
};
reader.readAsArrayBuffer(response); // response: the Blob received from the server
