I am downloading a .csv.gz file from a remote server, and I have the contents of this file stored as a string. Here is a small sample of what I see when I console.log it:
�}�v������)��t�Y�j�8p0�eCR��
l��1�=���6������~̵r�����0c7�7L���������U:���0�����g��
How can I unzip this in Node.js so that it converts it to the original .csv file?
I have tried zlib.gunzip(Buffer.new(body), callback), but then I get an error
incorrect header check at Gunzip.zlibOnError (zlib.js:152:15)
The file itself is valid, and I can double-click to unzip and open it on my computer.
I create the file using: zlib.createGzip(); and then gzip.pipe(writeStream);
Update
The (actual) issue was my data was utf8 encoded so I needed to ensure that it remained either as a Buffer or binary.
The problem is that fs.createWriteStream defaults to utf-8 encoding, you should change that to binary, then you'll be able to create a valid buffer that gunzip will happily accept.
You could probably accomplish this by changing your code to:
gzip.pipe(data => writeStream(data, { encoding: 'binary'})
see https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options
UPDATE:
I have modified the code so that now you have a ArrayBuffer which gets actually decompressed.
function decompressFile(filename) {
var decompress = zlib.createUnzip(),
input = fs.createReadStream(filename);
var data = [];
input.on('data', function(chunk){
data.push(chunk);
}).on('end', function(){
var buf = Buffer.concat(data);
zlib.gunzip(buf, function(err, buffer) {
if (!err) {
console.log(buffer.toString()+'\n');
}else{
console.log(err);
}
});
});
}
decompressFile('TestFileSheet1.csv.gz');
This looks straight forward. But I think the problem might be somewhere else in your code Or the http library that you are using. Check whether the response header's content encoding is gzip and then call the zlib.gunzip. I think your http library might already be decompressing the csv file.
Related
I have a lot of devices sending messages to a TCP Server written in node. The main task of the TCP server is to route some of that messages to redis in order to be processed by another app.
I've written a simple server that does the job quite well. The structure of the code is basically this (not the actual code, details hidden):
const net = require("net");
net.createServer(socket => {
socket.on("data", buffer => {
const data = buffer.toString();
if (shouldRouteMessage(data)) {
redis.publish(data);
}
});
});
Most of the messages are like: {"text":"message body"}, or {"lng":32.45,"lat":12.32}. But sometimes I need to process a message like {"audio":"...encoded audio..."} that spans several "data" events.
What I need in this case is to save the encoded audio into a file and send to redis {"audio":"path/to/audio-file.mp3"} where the route is the file with the audio data received.
One simple option is to store the buffers until I detect the end of the message and then save all them to a file, but that means, among other things, that I must keep the file on memory before saving to disk.
I hope there are better options using streams and pipes. ¿Any suggestions? (some code examples, would be nice)
Thanks
I finally solved, so I post the solution here for documentation purposes (and with some luck, to help others).
The solution was, indeed, quite simple: just open a write stream to a file and write the data packets as they are received. Something like this:
const net = require("net");
const fs = require("fs");
net.createServer(socket => {
socket.on("data", buffer => {
let file = null;
let filePath = null;
const data = buffer.toString();
if (shouldRouteMessage(data)) {
// just publish the message
redis.publish(data);
} else if (isAudioStart(data)) {
// create a write stream to a file and write the first data packet
filePath = buildFilePath(data);
file = fs.createWriteStream(filePath);
file.write(data);
} else if (isLastFragment(data)) {
// if is the last fragment, write it, close the file and publish the result
file.write(data);
file.close();
redis.publish(filePath);
file = filePath = null;
} else if (isDataFragment(data)) {
// just write (stream) it to file
file.write(data);
}
});
});
Note: shouldRouteMessage, buildFilePath, isDataFragment, and isLastFragment are custom functions that depends on the kind of data.
In this way, the incoming data is streamed to the file directly and no need to save the contents in memory before. node's streams rocks!
As always the devil is in the details. Some checks are necesary to, for example, ensure there's always a file when you want to write it. Remember also to set the proper encoding when converting to string (for example: buffer.toString('binary'); did the trick for me). Depending on your data format, the shouldRouteMessage, isAudioStart... and all this custom functions can be more or less complex.
Hope it helps.
We transform HTML to PDF in the backend (PHP) using dompdf. The generated output from dompdf is Base64 encoded with
$output = $dompdf->output();
base64_encode($output);
This Base64 encoded content is saved as a file on the server. When we decode this file content like this:
cat /tmp/55acbaa9600f4 | base64 -D > test.pdf
we get a proper PDF file.
But when we transfer the Base64 content to the client as a string value inside a JSON object (the server provides a RESTful API...):
{
"file_data": "...the base64 string..."
}
And decode it with atob() and then create a Blob object to download the file later on, the PDF is always "empty"/broken.
$scope.downloadFileData = function(doc) {
DocumentService.getFileData(doc).then(function(data) {
var decodedFileData = atob(data.file_data);
var file = new Blob([decodedFileData], { type: doc.file_type });
saveAs(file, doc.title + '.' + doc.extension);
});
};
When we log the decoded content, it seems that the content is "broken", because several symbols are not the same as when we decode the content on the server using base64 -D.
When we encode/decode the content of simple text/plain documents, it's working as expected. But all binary (or not ASCII formats) are not working.
We have searched the web for many hours, but didn't find a solution for this that works for us. Does anyone have the same problem and can provide us with a working solution? Thanks in advance!
This is a example for a on the server Base64 encoded content of a PDF document:
JVBERi0xLjMKMSAwIG9iago8PCAvVHlwZSAvQ2F0YWxvZwovT3V0bGluZXMgMiAwIFIKL1BhZ2VzIDMgMCBSID4+CmVuZG9iagoyIDAgb2JqCjw8IC9UeXBlIC9PdXRsaW5lcyAvQ291bnQgMCA+PgplbmRvYmoKMyAwIG9iago8PCAvVHlwZSAvUGFnZXMKL0tpZHMgWzYgMCBSCl0KL0NvdW50IDEKL1Jlc291cmNlcyA8PAovUHJvY1NldCA0IDAgUgovRm9udCA8PCAKL0YxIDggMCBSCj4+Cj4+Ci9NZWRpYUJveCBbMC4wMDAgMC4wMDAgNjEyLjAwMCA3OTIuMDAwXQogPj4KZW5kb2JqCjQgMCBvYmoKWy9QREYgL1RleHQgXQplbmRvYmoKNSAwIG9iago8PAovQ3JlYXRvciAoRE9NUERGKQovQ3JlYXRpb25EYXRlIChEOjIwMTUwNzIwMTMzMzIzKzAyJzAwJykKL01vZERhdGUgKEQ6MjAxNTA3MjAxMzMzMjMrMDInMDAnKQo+PgplbmRvYmoKNiAwIG9iago8PCAvVHlwZSAvUGFnZQovUGFyZW50IDMgMCBSCi9Db250ZW50cyA3IDAgUgo+PgplbmRvYmoKNyAwIG9iago8PCAvRmlsdGVyIC9GbGF0ZURlY29kZQovTGVuZ3RoIDY2ID4+CnN0cmVhbQp4nOMy0DMwMFBAJovSuZxCFIxN9AwMzRTMDS31DCxNFUJSFPTdDBWMgKIKIWkKCtEaIanFJZqxCiFeCq4hAO4PD0MKZW5kc3RyZWFtCmVuZG9iago4IDAgb2JqCjw8IC9UeXBlIC9Gb250Ci9TdWJ0eXBlIC9UeXBlMQovTmFtZSAvRjEKL0Jhc2VGb250IC9UaW1lcy1Cb2xkCi9FbmNvZGluZyAvV2luQW5zaUVuY29kaW5nCj4+CmVuZG9iagp4cmVmCjAgOQowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDggMDAwMDAgbiAKMDAwMDAwMDA3MyAwMDAwMCBuIAowMDAwMDAwMTE5IDAwMDAwIG4gCjAwMDAwMDAyNzMgMDAwMDAgbiAKMDAwMDAwMDMwMiAwMDAwMCBuIAowMDAwMDAwNDE2IDAwMDAwIG4gCjAwMDAwMDA0NzkgMDAwMDAgbiAKMDAwMDAwMDYxNiAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDkKL1Jvb3QgMSAwIFIKL0luZm8gNSAwIFIKPj4Kc3RhcnR4cmVmCjcyNQolJUVPRgo=
If you atob() this, you don't get the same result as on the console with base64 -D. Why?
Your issue looks identical to the one I needed to solve recently.
Here is what worked for me:
const binaryImg = atob(base64String);
const length = binaryImg.length;
const arrayBuffer = new ArrayBuffer(length);
const uintArray = new Uint8Array(arrayBuffer);
for (let i = 0; i < length; i++) {
uintArray[i] = binaryImg.charCodeAt(i);
}
const fileBlob = new Blob([uintArray], { type: 'application/pdf' });
saveAs(fileBlob, 'filename.pdf');
It seems that only doing a base64 decode is not enough...you need to put the result into a Uint8Array. Otherwise, the pdf pages appear blank.
I found this solution here:
https://github.com/sayanee/angularjs-pdf/issues/110#issuecomment-579988190
You can use btoa() and atob() work in some browsers:
For Exa.
var enc = btoa("this is some text");
alert(enc);
alert(atob(enc));
To JSON and base64 are completely independent.
Here's a JSON stringifier/parser (and direct GitHub link).
Here's a base64 Q&A. Here's another one.
i'm not 100% sure but from what i read when i send a blob (binary data) over websocket, the blob does not contain any file information. (Also the official specification states that wesockets only send the raw binary)
the filesize
the mimetype
user info (explain later)
i'm using https://github.com/websockets/ws
Testing:
Sending directly the blob from an input file.
ws.send(this.files[0]) //this should already contain the info
Creating a new blob with the native javascript api from file setting the proper mimetype.
ws.send(new Blob([this.files[0]],{type:this.files[0].type})); //also this
on both sides you can get only the effective blob without any other information.
Is it possible to append let's say a 4kb predefined json data converted also to binary that contains important information like the mimetype and the filesize,
and then just split off the 4kb when needed?
{"mime":"txt/plain","size":345}____________4KB_REST_OF_THE_BINARY
OR
ws.send({"mime":"txt\/plain","size":345})
ws.send(this.files[0])
Even if the first one is the worst solution ever it would allow me to send everything in one time.
The second one has a big problem:
it's a chat that allows to send also files like documents,images,music videos.
i could write some sort of handshaking system when sending the file/user info before i send the binary data.
BUT
if another person sends also a file, as it's async, the handshaking system has no chance to determine wich file is the right one for the correct user and mimetype.
So how do you properly send a binary file in a multiuser async envoirement?
i know i can convert to base64 but thats 30% bigger.
btw. Totally disappointed with Apple... while chrome shows every binary data properly, my ios devices are not able to handle blob's, only images will show in blob or base64 format, not even a simple txt file. Basically only a <img> tag can read dynamic files.
How everything works (now):
user sends a file
nodejs gets the binary data, also user info... but not mimetype,filename,size.
nodejs broadcasts the raw binary file to all the users.(can't specify user & file info)
clients create a bloburl (who send that? XD).
EDIT
what i have now:
client 1 (sends a file)CHROME
fileInput.addEventListener('change',function(e){
var file=this.files[0];
ws.send(new Blob([file],{
type:file.type //<- SET MIMETYPE
}));
//file.size
},false);
note: file is already a blob ... but this is how you would normally create a new blob specifying the mimetype.
server (broadcasts the binary data to the other clients)NODEJS
aaaaaand the mimetype is gone...
ws.addListener('message',function(binary){
var b=0,c=wss.clients.length;
while(b<c){
wss.clients[b++].send(binary)
}
});
client 2 (recieves the binary)CHROME
ws.addEventListener('message',function(msg){
var blob=new Blob([msg.data],{
type:'application/octet-stream' //<- LOST
});
var file=window.URL.createObjectURL(blob);
},false);
note: m.data is already a blob ... but this is how you would normally create a new blob specifying the mimetype witch is lost.
In client 2 i need the mimetype and naturally i also need the info about the user, wich can be retrieved from client 1 or the server (not a good choice)...
You're a bit out of luck with this because Node doesn't support the Blob interface and so any data you send or receive in Binary with Node is just Binary. You would have to have something that knew how to interpret a Blob object.
Here's an idea, and let me know if this works. Reading through the documentation for websockets\ws it says it supports sending and receiving ArrayBuffers. Which means you can use TypedArrays.
Here's where it gets nasty. You set a certain fixed n number of bytes at the beginning of every TypedArray to signal the mime type encoded in utf8 or what have you, and the rest of your TypedArray contains your file's bytes.
I would recommend using UInt8Array because utf8 characters are 8 bits long and your text will probably be readable when encoded that way. As for the file bits you'll probably just end up writing those down somewhere and appending an ending to it.
Also note, this method of interpretation works both ways whether from Node or in the Browser.
This solution is really just a form of type casting and you might get some unexpected results. The fixed length of your mime type field is crucial.
Here it is illustrated. Copy, paste, set the image file to whatever you want and then run that. You'll see the mime type I set pop out.
var fs = require('fs');
//https://stackoverflow.com/questions/8609289/convert-a-binary-nodejs-buffer-to-javascript-arraybuffer
function toUint8Array(buffer) {
var ab = new ArrayBuffer(buffer.length);
var array = new Uint8Array(ab);
for (var i = 0; i < buffer.length; ++i) {
array[i] = buffer[i];
}
return array;
}
//data is a raw Buffer object
fs.readFile('./ducklings.png', function (err, data) {
var mime = new Buffer('image/png');
var allBuffed = Buffer.concat([mime, data]);
var array = toUint8Array(allBuffed);
var mimeBytes = array.subarray(0,9); //number of characters in mime Buffer
console.log(String.fromCharCode.apply(null, mimeBytes));
});
Here's how you do it on the client side:
SOLUTION A: GET A PACKAGE
Get buffer, an implementation of Node's Buffer API for browsers. The solution to concatenate Byte buffers will work exactly as before. You can append fields like To: and what not as well. The way you format your headers in order to best serve your clients will be an evolving process I'm sure.
SOLUTION B: OLD SCHOOL
STEP 1: Convert your Blob to an ArrayBuffer
Notes: How to convert a String to an ArrayBuffer
var fr = new FileReader();
fr.addEventListener('loadend', function () {
//Asynchronous action in part 2.
var message = concatenateBuffers(headerStringAsBuffer, fr.result);
ws.send(message);
});
fr.readAsArrayBuffer(blob);
STEP 2: Concatenate ArrayBuffers
function concatenateBuffers(buffA, buffB) {
var byteLength = buffA.byteLength + buffB.byteLength;
var resultBuffer = new ArrayBuffer(byteLength);
//wrap ArrayBuffer in a typedArray/view
var resultView = new Uint8Array(resultBuffer);
var viewA = new Uint8Array(resultBuffer);
var viewB = new Uint8Array(resultBuffer);
//Copy 8 bit integers AKA Bytes
resultView.set(viewA);
resultView.set(viewB, viewA.byteLength);
return resultView.buffer
}
STEP 3: Receive and Reblob
I'm not going to repeat how to convert the concatenated String bytes back into a string because I've done it in the server example, but for turning the file bytes into a blob of your mime type is fairly simple.
new Blob(buffer.slice(offset, buffer.byteLength), {type: mimetype});
This Gist by robnyman goes into further details on how you would use an image transmitted via XHR, put it into localstorage, and use it in an image tag on your page.
I liked #Breedly's idea of prepending a fixed length byte array to indicate mime type of the ArrayBuffer so I created this npm package that I use when dealing with websockets but maybe others' might find it useful.
Example usage
const {
arrayBufferWithMime,
arrayBufferMimeDecouple
} = require('arraybuffer-mime')
// some image array buffer
const uint8 = new Uint8Array(1)
uint8[0] = 1
const ab = uint8.buffer
const mime = 'image/png'
const abWithMime = arrayBufferWithMime(ab, mime)
const {mime, arrayBuffer} = arrayBufferMimeDecouple(abWithMime)
console.log(mime) // "image/png"
console.log(arrayBuffer) // ArrayBuffer
I had a problem when try to saving a binary file by IE, ADODB.Stream on client.
Assume the the client's browser has full permission to write file to hard disk.
Here is my client code witten by javascript.:
var base64_encoded_string = "base64 string of a binary file";
var data = window.atob(base64_encoded_string);
var stream = new ActiveXObject("ADODB.Stream");
stream.Type = 1; // text
stream.Open();
stream.Write(data); //Problem in here
stream.SaveToFile("D:\\Whatever.pfx", 2);
stream.Close();
As I marked problem come from writing binary data. I always got error:
"Arguments are of the wrong type, are out of acceptable range, or are in conflict with one another"
Whatever I format or change data variable to array of bytes, blob ...
Please help me how to input data to write binary file in this circumstance.
You need to use WriteText
stream.WriteText(data);
I am encoding a MP3 file to Base64 in Node Js using this method :
encodebase64 = function(mp3file){
var bitmap = fs.readFileSync(mp3file);
var encodedstring = new Buffer(bitmap).toString('base64');
fs.writeFileSync('encodedfile.bin', encodedstring);}
and then again I want to construct the MP3 file from the Base64 bin file, but the file created is missing some headers , so obviously there's a problem with the decoding.
the decoding function is :
decodebase64 = function(encodedfile){
var bitmap = fs.readFileSync(encodedfile);
var decodedString = new Buffer(bitmap, 'base64');
fs.writeFileSync('decodedfile.mp3', decodedString);}
I wondered if anyone can help
Thanks.
Perhaps it is an issue with the encoding parameter. See this answer for details. Try using utf8 when decoding to see if that makes a difference. What platforms are you running your code on?
#Noah mentioned an answer about base64 decoding using Buffers, but if you use the same code from the answer, and you try to create MP3 files with it, then they won't play and their file size will be larger than original ones just like you experienced in the beginning.
We should write the buffer directly to the mp3 file we want to create without converting it(the buffer) to an ASCII string:
// const buff = Buffer.from(audioContent, 'base64').toString('ascii'); // don't
const buff = Buffer.from(audioContent, 'base64');
fs.writeFileSync('test2.mp3', buff);
More info about fs.writeFile / fs.writeFileAsync