I have this format and I want to convert it into a zipped file and unzip that file using JavaScript. When I convert it with https://base64.guru/converter/decode/file?fbclid=IwAR3X1qwrnSLTw9cHT9iKl5HxiCRmKG5l0tForN3Odraz_4pYsYApoVprEJE it gives a zipped file, and I have to unzip that file to access the data:
{
"awslogs": {
"data": "H4sIAAAAAAAAAFVSUU/bMBD+K5FfR8B2YsfuW0UZYhobouVpQZPjXKm1xKlstwgh/jsXF7RNeXB8d9/33X2+VzJCjOYJNi97IAuyWm6Wv2+v1uvl9RU5I9Ozh4BhQRvJZcNkoxmGh+npOkyHPWaijxeHWIKJqWQX/9ZdrFwAm+4O3eDibjPd7SYPPw5jh4yZYp0CmBE5OgBLueSlrVVd1n1vyg4aW+oOaslVzSvbIyQeumiD2yc3+a9uSBAiWfwiLX7kMTNeHcGnOfhKXI/ElaikYFJKraimnGu8VqKRjeBMKdnUNeNVVeFZM6qU0oIJyXmNYsmhMcmMOCMTSiiuNGdYdPZpGNK/tsRPyW2dNXNPLVlg5CN90+O1JbavVKe1KY3lUIotE2UnQZWwlaIxdS+l1S05a//qZRinnJa0LikvGF9Qtqj0OZVY+YalPQzuCOHlpLefbb00ITgIGXuPaeMtFN/cVNz47WSncSy+u9El6IuyWMGwc1lz9BYRSjT47/PD/Nzentq/MyFFTLKsF5PznyO25ItmjVZca1pJnon2wVm48Q/rFRbQc4pWVxiOY5zXKoM2wfho7ExihpO6ndVrKjLBdHQ9hHuI+8nHE+SjlWJnYoEr4gtjLeznIbqXIo+defpnGIYN2nebG5b8/9iDT25YwRE7XNo/WIEbUM82otvpELPS+uHyEnce7SVvj2/vK7R2rRQDAAA="
}
}
We can use Node's zlib and buffer modules to parse the base64 and gunzip it:
const zlib = require("zlib");

function unzip(str) {
  return new Promise((res, rej) => {
    // decode the base64 payload into a Buffer
    const data = Buffer.from(str, "base64");
    // create a transform stream to gunzip it
    const gzip = zlib.createGunzip();
    // initialize ret with an empty buffer
    let ret = Buffer.alloc(0);
    gzip.on("data", function (dat) {
      // concatenate the new chunk onto ret
      ret = Buffer.concat([ret, dat], ret.length + dat.length);
    });
    gzip.on("error", function (err) {
      rej(err);
    });
    gzip.on("end", function () {
      // we're done, resolve the promise
      res(ret);
    });
    // send the data to the gunzip stream
    gzip.end(data);
  });
}
const data = "H4sIAAAAAAAAAFVSUU/bMBD+K5FfR8B2YsfuW0UZYhobouVpQZPjXKm1xKlstwgh/jsXF7RNeXB8d9/33X2+VzJCjOYJNi97IAuyWm6Wv2+v1uvl9RU5I9Ozh4BhQRvJZcNkoxmGh+npOkyHPWaijxeHWIKJqWQX/9ZdrFwAm+4O3eDibjPd7SYPPw5jh4yZYp0CmBE5OgBLueSlrVVd1n1vyg4aW+oOaslVzSvbIyQeumiD2yc3+a9uSBAiWfwiLX7kMTNeHcGnOfhKXI/ElaikYFJKraimnGu8VqKRjeBMKdnUNeNVVeFZM6qU0oIJyXmNYsmhMcmMOCMTSiiuNGdYdPZpGNK/tsRPyW2dNXNPLVlg5CN90+O1JbavVKe1KY3lUIotE2UnQZWwlaIxdS+l1S05a//qZRinnJa0LikvGF9Qtqj0OZVY+YalPQzuCOHlpLefbb00ITgIGXuPaeMtFN/cVNz47WSncSy+u9El6IuyWMGwc1lz9BYRSjT47/PD/Nzentq/MyFFTLKsF5PznyO25ItmjVZca1pJnon2wVm48Q/rFRbQc4pWVxiOY5zXKoM2wfho7ExihpO6ndVrKjLBdHQ9hHuI+8nHE+SjlWJnYoEr4gtjLeznIbqXIo+defpnGIYN2nebG5b8/9iDT25YwRE7XNo/WIEbUM82otvpELPS+uHyEnce7SVvj2/vK7R2rRQDAAA=";
// this uses promises for asynchronous execution
unzip(data).then(v => console.log(v.toString("utf8")));
Related
I'm using Pinata's pinFileToIPFS() function, which requires a ReadableStream to upload to Pinata's IPFS nodes. I have the byte array of the file, for example of a PNG.
I want to convert this byte array to a ReadableStream and upload it to IPFS.
How can I do that in TypeScript?
export async function putFileToIPFS(file: any): Promise<string> {
  readableStream = ** CONVERT FILE TO READABLE **
  let cid;
  try {
    cid = await pinata.pinFileToIPFS(readableStream)
    console.log(cid)
  } catch (error) {
    console.error(error);
  }
  return cid['IpfsHash']
}
Thanks
export async function putFileToIPFS(file: ArrayBuffer) {
  const readableStream = new ReadableBufferStream(file)
  ...
}

function ReadableBufferStream(ab: ArrayBuffer) {
  return new ReadableStream({
    start(controller) {
      controller.enqueue(ab)
      controller.close()
    }
  })
}
Alternatively, the "read size" could be controlled by setting a chunk size with multiple enqueues. This could potentially increase/decrease the number of HTTP requests sent by pinFileToIPFS(). subarray() maintains the memory footprint by reusing the underlying ArrayBuffer.
function ReadableBufferStream(ab: ArrayBuffer, chunkSize = 64 * 1024) { // 64 KiB
  return new ReadableStream({
    start(controller) {
      const bytes = new Uint8Array(ab)
      for (let readIndex = 0; readIndex < bytes.byteLength;) {
        controller.enqueue(bytes.subarray(readIndex, readIndex += chunkSize))
      }
      controller.close()
    }
  })
}
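To sanity-check the chunking, the stream can be drained with a reader. A plain-JavaScript sketch (Node 18+ exposes the WHATWG ReadableStream globally), using a tiny chunk size for illustration:

```javascript
// same chunked stream as above, minus the TypeScript annotations
function ReadableBufferStream(ab, chunkSize = 4) {
  return new ReadableStream({
    start(controller) {
      const bytes = new Uint8Array(ab);
      for (let readIndex = 0; readIndex < bytes.byteLength;) {
        controller.enqueue(bytes.subarray(readIndex, readIndex += chunkSize));
      }
      controller.close();
    }
  });
}

// drain the stream and count the chunks it produced
async function countChunks(stream) {
  const reader = stream.getReader();
  let chunks = 0;
  for (;;) {
    const { done } = await reader.read();
    if (done) break;
    chunks++;
  }
  return chunks;
}

// a 10-byte buffer split into 4-byte chunks yields 3 chunks
countChunks(ReadableBufferStream(new ArrayBuffer(10))).then(n => console.log(n)); // 3
```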
I want to download a zip file from a URL and parse its contents in Node. I do not want to save the file on disk. The zip file is a directory of CSV files that I want to process. How can I approach this? The only package that has an option for unzipping from a URL is unzipper, but it does not work for me. Every other package lets you unzip a file on disk by providing a path to the file, but not a URL.
I am downloading the file like so:
const res = await this.get(test)
But what can I do now? There are packages like AdmZip that can extract zip files but need a path as a string to a file on disk. Is there a way I can pass/stream my res object above to the below?
var AdmZip = require('adm-zip');

// reading archives
var zip = new AdmZip("./my_file.zip");
var zipEntries = zip.getEntries(); // an array of ZipEntry records

zipEntries.forEach(function (zipEntry) {
  console.log(zipEntry.toString()); // outputs zip entries information
  if (zipEntry.entryName == "my_file.txt") {
    console.log(zipEntry.getData().toString('utf8'));
  }
});
Here's a simple example of downloading a .zip file and unzipping using adm-zip. As #anonymouze points out, you can pass a buffer to the AdmZip constructor.
const axios = require("axios");
const AdmZip = require('adm-zip');

async function get(url) {
  const options = {
    method: 'GET',
    url: url,
    responseType: "arraybuffer"
  };
  const { data } = await axios(options);
  return data;
}

async function getAndUnZip(url) {
  const zipFileBuffer = await get(url);
  const zip = new AdmZip(zipFileBuffer);
  const entries = zip.getEntries();
  for (let entry of entries) {
    const buffer = entry.getData();
    console.log("File: " + entry.entryName + ", length (bytes): " + buffer.length + ", contents: " + buffer.toString("utf-8"));
  }
}

getAndUnZip('https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-zip-file.zip');
In this case I'm simply using axios to download a zip file buffer, then parsing it with AdmZip.
Each entry's data can be accessed with entry.getData(), which will return a buffer.
In this case we'll see an output something like this:
File: sample.txt, length (bytes): 282, contents: I would love to try or hear the sample audio your app can produce. I do...
Here's another example, this time using node-fetch:
const fetch = require('node-fetch');
const AdmZip = require('adm-zip');

async function get(url) {
  return fetch(url).then(res => res.buffer());
}

async function getAndUnZip(url) {
  const zipFileBuffer = await get(url);
  const zip = new AdmZip(zipFileBuffer);
  const entries = zip.getEntries();
  for (let entry of entries) {
    const buffer = entry.getData();
    console.log("File: " + entry.entryName + ", length (bytes): " + buffer.length + ", contents: " + buffer.toString("utf-8"));
  }
}

getAndUnZip('https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-zip-file.zip');
Problem Statement:
I am getting a text file where the byte array of a binary file is stored as comma-separated values on a single line.
For example: 82,19,176,106,0,0,0,4,247,35,186,20,87,143,18,120,44,76,100
The string is very long and everything is on a single line; I have no control over this because it depends on the binary file size.
I have to read this byte array and convert it back to the original binary file.
Implemented Logic:
using Node.js and FS
var instream = fs.createReadStream('stream1.txt', { highWaterMark: 1 * 1024, encoding: 'utf8' });

instream.on("data", function (line) {
  lineCount++;
  var splitArray = line.split(',');
  var uintArray = new Uint8Array(splitArray);
  chunks.push(uintArray);
  console.log(lineCount);
});

instream.on("end", function () {
  var fullUint8Array = concatenate(chunks);
  fs.writeFile("abc.prt", Buffer.from(fullUint8Array), function (err) {
    if (err) {
      console.log(err);
    } else {
      console.log("Done");
    }
  });
});
I am not able to get the original binary file; it always comes out corrupted.
If I read the file in a single chunk, the above solution works. But that cannot always be done, because converting a very big string array to a Uint8Array gives a memory error.
But when I read the string in chunks, I am not able to get the binary file back.
I can't see what I am doing wrong. Technologies to be used: Node.js, JavaScript.
Updated the question with samples
This is a sample stream. (stream1.txt)
This is the original binary file which is needed as output after reading stream1.txt.
Link to the files
Code for concatenate:

// For joining Uint8Arrays
function concatenate(arrays) {
  let totalLength = 0;
  for (const arr of arrays) {
    totalLength += arr.length;
  }
  const result = new Uint8Array(totalLength);
  let offset = 0;
  for (const arr of arrays) {
    result.set(arr, offset);
    offset += arr.length;
  }
  return result;
}
I am not able to get the original binary file. It is always getting
corrupted.
No, it's not corrupted. The string was split on commas, its unencoded values were put into a Uint8Array, and the file was then saved with that data.
This is more or less what's happening:
let line = "82,19,176,106,0,0,0,4,247,35,186,20,87,143,18,120,44,76,100";
let result = line.split(',').map(pr => String.fromCharCode(Number(pr))).join('');
console.log(result);
// Solution 1
let encoded = line.split('').map(npr => npr.charCodeAt(0));
result = encoded.map(pr => String.fromCharCode(pr)).join('');
console.log(result);
// Solution 2
const encoder = new TextEncoder();
const decoder = new TextDecoder();
encoded = encoder.encode(line);
result = decoder.decode(encoded);
console.log(result);
If you apply code above it might look like this:
const fs = require('fs');

let lineCount = 0;
let chunks = [];
const encoder = new TextEncoder();

function concatenate(chunks) {
  return chunks.reduce((acc, chunk) => {
    return new Uint8Array([...acc, ...chunk]);
  }, new Uint8Array([]));
}

var instream = fs.createReadStream('stream1.txt', { highWaterMark: 1 * 1024, encoding: 'utf8' });

instream.on("data", function (line) {
  lineCount++;
  var uintArray = encoder.encode(line);
  chunks.push(uintArray);
});

instream.on("end", function () {
  var fullUint8Array = concatenate(chunks);
  fs.writeFile("abc.prt", Buffer.from(fullUint8Array), function (err) {
    if (err) {
      console.log(err);
    } else {
      console.log("Done");
    }
  });
});
If I am reading a file in single chunk and try the above solution it
will work. But always this cannot be done because if try to convert a
very big string array to uint8Array it gives memory error.
You can reduce memory footprint by creating write stream and putting data immediately there.
Example
const fs = require('fs');

let lineCount = 0;
const encoder = new TextEncoder();

var outputStream = fs.createWriteStream("abc.prt");
var inputStream = fs.createReadStream('stream1.txt', { highWaterMark: 1 * 1024, encoding: 'utf8' });

outputStream.on("open", function () {
  inputStream.on("data", function (line) {
    lineCount++;
    var uintArray = encoder.encode(line);
    outputStream.write(uintArray);
  });
  inputStream.on("end", function () {
    outputStream.close();
  });
});
If you are reading the file in chunks, you need to adjust your splitting logic to cope with that. Your code probably produces corrupt results because an input string like 82,19,176,106,0,0 could be read as 82,19,17 + 6,106,0,0 or 82,19 + ,176,106, + 0,0.
Instead, you need to make sure that you always process whole byte values: if a value is not yet followed by a comma or the end of the file, you cannot process it yet. I'd recommend doing this with a Transform stream (see also this article about the technique):
import { createReadStream, createWriteStream } from 'fs';
import { pipeline, Transform } from 'stream';

const parseCommaSeparatedBytes = new Transform({
  transform(chunk, encoding, callback) {
    const prefix = this.leftover || '';
    const string = prefix + chunk.toString();
    // TODO: validate inputs to be numeric and in the byte range
    const splitArray = string.split(',');
    if (splitArray.length)
      this.leftover = splitArray.pop();
    this.push(new Uint8Array(splitArray));
    callback();
  },
  flush(callback) {
    const last = this.leftover || '';
    if (last.length)
      this.push(new Uint8Array([last]));
    callback();
  },
});

const instream = createReadStream('stream1.txt', {
  highWaterMark: 1024,
  encoding: 'utf8'
});
const outstream = createWriteStream('abc.prt');

pipeline(instream, parseCommaSeparatedBytes, outstream, function (err) {
  if (err) {
    console.error(err);
  } else {
    console.log("Done");
  }
});
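As a minimal illustration of the boundary problem: splitting each chunk on its own tears a value like 176 apart, while carrying the leftover between chunks recovers it. A plain Node sketch with hard-coded chunks standing in for the stream:

```javascript
// two read chunks that happen to split the value 176 in the middle
const chunksIn = ["82,19,17", "6,106,0"];

// naive per-chunk split, as in the original code
const naive = chunksIn.flatMap(c => c.split(',').map(Number));

// carry the incomplete trailing value over to the next chunk
let leftover = "";
const correct = [];
for (const c of chunksIn) {
  const parts = (leftover + c).split(',');
  leftover = parts.pop(); // may be an incomplete number
  correct.push(...parts.map(Number));
}
correct.push(Number(leftover)); // flush the final value

console.log(naive);   // [ 82, 19, 17, 6, 106, 0 ] -- 176 was torn into 17 and 6
console.log(correct); // [ 82, 19, 176, 106, 0 ]
```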
I'm sending an image encoded as base64 through sockets and decoding is not working. The file that must contain the new image is written as base64 instead of a jpg file.
encoding socket:
function encode_base64(filename) {
  fs.readFile(path.join(__dirname, filename), function (error, data) {
    if (error) {
      throw error;
    } else {
      console.log(data);
      var dataBase64 = data.toString('base64');
      console.log(dataBase64);
      client.write(dataBase64);
    }
  });
}

rl.on('line', (data) => {
  encode_base64('../image.jpg')
})
decoding socket:
function base64_decode(base64str, file) {
  var bitmap = new Buffer(base64str, 'base64');
  fs.writeFileSync(file, bitmap);
  console.log('****** File created from base64 encoded string ******');
}

client.on('data', (data) => {
  base64_decode(data, 'copy.jpg')
});

// the first few characters in the new file
// k1NRWuGwBGJpmHDTI9VcgOcRgIT0ftMsldCjFJ43whvppjV48NGq3eeOIeeur
Change the encode function as below. Also keep in mind that new Buffer() has been deprecated, so use the Buffer.from() method instead.
function encode_base64(filename) {
  fs.readFile(path.join(__dirname, filename), function (error, data) {
    if (error) {
      throw error;
    } else {
      //console.log(data);
      var dataBase64 = Buffer.from(data).toString('base64');
      console.log(dataBase64);
      client.write(dataBase64);
    }
  });
}
And decode as below:

function base64_decode(base64Image, file) {
  fs.writeFileSync(file, base64Image);
  console.log('******** File created from base64 encoded string ********');
}

client.on('data', (data) => {
  base64_decode(data, 'copy.jpg')
});
You can decode the base64 image using the following method.
EDITED
To strip off the header
let base64String = 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgA'; // Not a real image
// Remove header
let base64Image = base64String.split(';base64,').pop();
To write to a file
import fs from 'fs';
fs.writeFile('image.png', base64Image, { encoding: 'base64' }, function (err) {
  console.log('File created');
});
Note: don't forget the {encoding: 'base64'} here and you will be good to go.
You can use a Buffer.from to decode the Base64, and write it to a file using fs.writeFileSync
const { writeFileSync } = require("fs")
const base64 = "iVBORw0KGgoA..."
const image = Buffer.from(base64, "base64")
writeFileSync("image.png", image)
If you have the Base64 string inside a file, you need to read it in as a string first, like:
const { writeFileSync, readFileSync } = require("fs")
const base64 = readFileSync(path, "ascii")
const image = Buffer.from(base64, "base64")
writeFileSync("image.png", image)
It seems that the decoding function base64_decode gets the data as a buffer.
Thus, the encoding argument in new Buffer(base64str, 'base64') is ignored.
(Compare the docs of Buffer.from(buffer) vs Buffer.from(string[, encoding])).
I suggest to convert to a string first
function base64_decode(base64str, file) {
  var bitmap = Buffer.from(base64str.toString(), 'base64');
  fs.writeFileSync(file, bitmap);
  console.log('******** File created from base64 encoded string ********');
}
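A small sketch of why the toString() matters: when Buffer.from (or the deprecated new Buffer) receives a Buffer as its first argument, any encoding argument is ignored, so the base64 text must go through a string before decoding:

```javascript
// what the sender transmits: base64 text of the bytes "hi"
const payload = Buffer.from("hi").toString("base64"); // "aGk="

// what the socket 'data' handler receives: a Buffer holding that text
const received = Buffer.from(payload);

// decoding only works after converting the Buffer back to a string
const decoded = Buffer.from(received.toString(), "base64");
console.log(decoded.toString()); // hi
```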
I want to download a zip file from the internet and unzip it in memory without saving to a temporary file. How can I do this?
Here is what I tried:
var url = 'http://bdn-ak.bloomberg.com/precanned/Comdty_Calendar_Spread_Option_20120428.txt.zip';
var request = require('request'), fs = require('fs'), zlib = require('zlib');

request.get(url, function (err, res, file) {
  if (err) throw err;
  zlib.unzip(file, function (err, txt) {
    if (err) throw err;
    console.log(txt.toString()); // outputs nothing
  });
});
[EDIT]
As suggested, I tried using the adm-zip library and I still cannot make this work:
var ZipEntry = require('adm-zip/zipEntry');

request.get(url, function (err, res, zipFile) {
  if (err) throw err;
  var zip = new ZipEntry();
  zip.setCompressedData(new Buffer(zipFile.toString('utf-8')));
  var text = zip.getData();
  console.log(text.toString()); // fails
});
You need a library that can handle buffers. The latest version of adm-zip will do:
npm install adm-zip
My solution uses the http.get method, since it returns Buffer chunks.
Code:
var file_url = 'http://notepad-plus-plus.org/repository/7.x/7.6/npp.7.6.bin.x64.zip';
var AdmZip = require('adm-zip');
var http = require('http');

http.get(file_url, function (res) {
  var data = [], dataLen = 0;
  res.on('data', function (chunk) {
    data.push(chunk);
    dataLen += chunk.length;
  }).on('end', function () {
    var buf = Buffer.alloc(dataLen);
    for (var i = 0, len = data.length, pos = 0; i < len; i++) {
      data[i].copy(buf, pos);
      pos += data[i].length;
    }
    var zip = new AdmZip(buf);
    var zipEntries = zip.getEntries();
    console.log(zipEntries.length);
    for (var i = 0; i < zipEntries.length; i++) {
      if (zipEntries[i].entryName.match(/readme/))
        console.log(zip.readAsText(zipEntries[i]));
    }
  });
});
The idea is to create an array of buffers and concatenate them into a new one at the end. This is due to the fact that buffers cannot be resized.
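As a side note, Buffer.concat can do the same array-of-buffers join in one call, without the manual offset bookkeeping. A small sketch:

```javascript
// collect chunks as above, then join them in one call
const data = [Buffer.from("ab"), Buffer.from("cd"), Buffer.from("ef")];
const buf = Buffer.concat(data);
console.log(buf.toString()); // abcdef
```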
Update
This is a simpler solution that uses the request module to obtain the response in a buffer, by setting encoding: null in the options. It also follows redirects and resolves http/https automatically.
var file_url = 'https://github.com/mihaifm/linq/releases/download/3.1.1/linq.js-3.1.1.zip';
var AdmZip = require('adm-zip');
var request = require('request');

request.get({ url: file_url, encoding: null }, (err, res, body) => {
  var zip = new AdmZip(body);
  var zipEntries = zip.getEntries();
  console.log(zipEntries.length);
  zipEntries.forEach((entry) => {
    if (entry.entryName.match(/readme/i))
      console.log(zip.readAsText(entry));
  });
});
The body of the response is a buffer that can be passed directly to AdmZip, simplifying the whole process.
Sadly you can't pipe the response stream into the unzip job the way Node's zlib module allows; you have to cache it and wait for the end of the response. I suggest you pipe the response to an fs stream in the case of big files, otherwise you will fill up your memory in a blink!
I don't completely understand what you are trying to do, but IMHO this is the best approach. You should keep your data in memory only for the time you really need it, and then stream it to the csv parser.
If you want to keep all your data in memory, you can replace the csv parser method fromPath with from, which takes a buffer instead, and in getData return unzipped directly.
You can use AdmZip (as #mihai said) instead of node-zip; just pay attention, because AdmZip is not yet published on npm, so you need:
$ npm install git://github.com/cthackers/adm-zip.git
N.B. Assumption: the zip file contains only one file
var request = require('request'),
    fs = require('fs'),
    csv = require('csv'),
    NodeZip = require('node-zip')

function getData(tmpFolder, url, callback) {
  var tempZipFilePath = tmpFolder + new Date().getTime() + Math.random()
  var tempZipFileStream = fs.createWriteStream(tempZipFilePath)
  request.get({
    url: url,
    encoding: null
  }).on('end', function () {
    fs.readFile(tempZipFilePath, 'base64', function (err, zipContent) {
      var zip = new NodeZip(zipContent, { base64: true })
      Object.keys(zip.files).forEach(function (filename) {
        var tempFilePath = tmpFolder + new Date().getTime() + Math.random()
        var unzipped = zip.files[filename].data
        fs.writeFile(tempFilePath, unzipped, function (err) {
          callback(err, tempFilePath)
        })
      })
    })
  }).pipe(tempZipFileStream)
}
getData('/tmp/', 'http://bdn-ak.bloomberg.com/precanned/Comdty_Calendar_Spread_Option_20120428.txt.zip', function (err, path) {
  if (err) {
    return console.error('error: %s' + err.message)
  }
  var metadata = []
  csv().fromPath(path, {
    delimiter: '|',
    columns: true
  }).transform(function (data) {
    // do things with your data
    if (data.NAME[0] === '#') {
      metadata.push(data.NAME)
    } else {
      return data
    }
  }).on('data', function (data, index) {
    console.log('#%d %s', index, JSON.stringify(data, null, ' '))
  }).on('end', function (count) {
    console.log('Metadata: %s', JSON.stringify(metadata, null, ' '))
    console.log('Number of lines: %d', count)
  }).on('error', function (error) {
    console.error('csv parsing error: %s', error.message)
  })
})
If you're on macOS or Linux, you can use the unzip command to unzip from stdin.
In this example I'm reading the zip file from the filesystem into a Buffer object but it works
with a downloaded file as well:
// Get a Buffer with the zip content
var fs = require("fs")
  , zip = fs.readFileSync(__dirname + "/test.zip");

// Now the actual unzipping:
var spawn = require('child_process').spawn
  , fileToExtract = "test.js"
  // -p tells unzip to extract to stdout
  , unzip = spawn("unzip", ["-p", "/dev/stdin", fileToExtract])
  ;

// Write the Buffer to stdin
unzip.stdin.write(zip);

// Handle errors
unzip.stderr.on('data', function (data) {
  console.log("There has been an error: ", data.toString("utf-8"));
});

// Handle the unzipped stdout
unzip.stdout.on('data', function (data) {
  console.log("Unzipped file: ", data.toString("utf-8"));
});

unzip.stdin.end();
Which is actually just the node version of:
cat test.zip | unzip -p /dev/stdin test.js
EDIT: It's worth noting that this will not work if the input zip is too big to be read in one chunk from stdin. If you need to read bigger files, and your zip file contains only one file, you can use funzip instead of unzip:
var unzip = spawn("funzip");
If your zip file contains multiple files (and the file you want isn't the first one) I'm afraid to say you're out of luck. Unzip needs to seek in the .zip file since zip files are just a container, and unzip may just unzip the last file in it. In that case you have to save the file temporarily (node-temp comes in handy).
Two days ago the node-zip module was released; it is a wrapper for JSZip, the JavaScript-only implementation of Zip:
var NodeZip = require('node-zip')
, zip = new NodeZip(zipBuffer.toString("base64"), { base64: true })
, unzipped = zip.files["your-text-file.txt"].data;