How to save an object with large binary data and other values? - javascript

I am currently trying to save a JS object containing some binary data along with other values. The result should look something like this:
{
  "value": "xyz",
  "file1": "[FileContent]",
  "file2": "[LargeFileContent]"
}
Until now I had no binary data, so I saved everything as JSON. With the binary data I am starting to run into problems with large files (>1GB).
I tried this approach:
JSON.stringify or how to serialize binary data as base64 encoded JSON?
It worked for smaller files of around 20MB. However, with these large files the result of the FileReader is always an empty string.
The result then looks like this:
{
  "value": "xyz",
  "file1": "[FileContent]",
  "file2": ""
}
The code that is reading the blobs is pretty similar to the one in the other post:
const readFiles = async (measurements: FormData) => {
  setFiles([]); // This is where the result is being stored
  let promises: Array<Promise<string>> = [];
  measurements.forEach((value) => {
    let dataBlob = value as Blob;
    console.log(dataBlob); // Everything is fine here
    promises.push(
      new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.readAsDataURL(dataBlob);
        reader.onloadend = function () {
          resolve(reader.result as string);
        };
        reader.onerror = function (error) {
          reject(error);
        };
      })
    );
  });
  let result = await Promise.all(promises);
  console.log(result); // large file shows empty
  setFiles(result);
};
Is there something else I can try?

Since you have to share the data with other computers, you will have to generate your own binary format.
Obviously you can design it however you wish, but given your simple case of just storing Blob objects alongside a JSON string, we can come up with a very simple schema: first some metadata about the Blobs we store, then the Blobs themselves, and finally the JSON string in which each Blob has been replaced with a UUID.
This works because the limitation you hit is actually the maximum length a string can have, and we can .slice() our binary file to read only part of it. Since we never read the binary data as a string we are fine; the JSON only holds a UUID in the places where we had Blobs, so it should not grow too much.
Here is one such implementation I made quickly as a proof of concept:
/*
 * Stores JSON data along with Blob objects in a binary file.
 * Schema:
 *   first 4 bytes        = # of Blobs stored in the file
 *   next 4 * # of Blobs  = size of each Blob
 *   next sum(sizes)      = the Blobs' binary data
 *   remaining            = JSON string
 */
const hopefully_unique_id = "_blob_"; // <-- change that
function generateBinary(JSObject) {
  let blobIndex = 0;
  const blobsMap = new Map();
  const JSONString = JSON.stringify(JSObject, (key, value) => {
    if (value instanceof Blob) {
      if (blobsMap.has(value)) {
        return blobsMap.get(value);
      }
      // 1-based id so the reader can map it back to an array index with "- 1"
      const id = hopefully_unique_id + (++blobIndex);
      blobsMap.set(value, id);
      return id;
    }
    return value;
  });
  const blobsArr = [...blobsMap.keys()];
  const data = [
    new Uint32Array([blobsArr.length]),
    ...blobsArr.map((blob) => new Uint32Array([blob.size])),
    ...blobsArr,
    JSONString
  ];
  return new Blob(data);
}
async function readBinary(bin) {
  const numberOfBlobs = new Uint32Array(await bin.slice(0, 4).arrayBuffer())[0];
  let cursor = 4 * (numberOfBlobs + 1);
  const blobSizes = new Uint32Array(await bin.slice(4, cursor).arrayBuffer());
  const blobs = [];
  for (let i = 0; i < numberOfBlobs; i++) {
    const blobSize = blobSizes[i];
    blobs.push(bin.slice(cursor, cursor += blobSize));
  }
  const pattern = new RegExp(`^${hopefully_unique_id}\\d+$`);
  const JSObject = JSON.parse(
    await bin.slice(cursor).text(),
    (key, value) => {
      if (typeof value !== "string" || !pattern.test(value)) {
        return value;
      }
      const index = +value.replace(hopefully_unique_id, "") - 1;
      return blobs[index];
    }
  );
  return JSObject;
}
// demo usage
(async () => {
  const obj = {
    foo: "bar",
    file1: new Blob(["Let's pretend I'm actually binary data"]),
    // This one is 512MiB, which is bigger than the max string size in Chrome,
    // i.e. it can't be stored in a JSON string in Chrome
    file2: new Blob([Uint8Array.from({ length: 512 * 1024 * 1024 }, () => 255)]),
  };
  const bin = generateBinary(obj);
  console.log("as binary", bin);
  const back = await readBinary(bin);
  console.log({ back });
  console.log("file1 read as text:", await back.file1.text());
})().catch(console.error);
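As a possible follow-up (a hedged sketch, not part of the answer above): once you have the Blob from generateBinary, you can hand it to the user as a download and later read it back from a file input, since a File is also a Blob. The helper names downloadBinary and loadBinary are hypothetical.
// Hypothetical helpers built on generateBinary/readBinary from above.

// Offer the generated binary to the user as a download.
function downloadBinary(obj, filename = "data.bin") {
  const blob = generateBinary(obj);
  const url = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.href = url;
  a.download = filename;
  a.click();
  URL.revokeObjectURL(url);
}

// Read it back from an <input type="file"> element; a File is a Blob,
// so readBinary can slice it without loading it all into memory.
async function loadBinary(fileInput) {
  const file = fileInput.files[0];
  return readBinary(file);
}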

Related

Count the number of pages of a PDF by creating a function in Node.js

I am trying to count the number of pages of a PDF file with a function I wrote myself, without using the built-in property.
const fs = require("fs");
const pdf = require("pdf-parse");
let data = fs.readFileSync("pdf/book.pdf");
const countpage = (str) => {
return str.reduce((el) => {
return el + 1;
}, 0);
};
pdf(data).then(function (data) {
console.log(countpage(data.numpages));
});
I am trying to get output like "number of pages = 272". When I use data.numpages directly I get the result,
but when I try to do the same thing with my own counting function I don't get the expected result;
the output looks like {numberpage:273}.
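For reference, data.numpages returned by pdf-parse is already a plain number, so Array#reduce (whose first callback argument is the accumulator, not the element) cannot be called on it. A minimal sketch of a helper that only formats the count (formatPageCount is a hypothetical name):
const fs = require("fs");
const pdf = require("pdf-parse");

// data.numpages is already the page count, so the helper just formats it.
const formatPageCount = (numpages) => `number of pages = ${numpages}`;

pdf(fs.readFileSync("pdf/book.pdf")).then((data) => {
  console.log(formatPageCount(data.numpages)); // e.g. "number of pages = 272"
});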

Getting a value from within FileReader.onloadend (extracting first line of a csv without reading the whole file)

I'm trying to extract the first line (headers) from a CSV using TypeScript. I found a nifty function that does this using FileReader.onloadend, iterating over the bytes in the file until it reaches a line break. This function assigns the header string to a window namespace. Unfortunately that is not very useful to me, and I can't find a workable way of getting the header string into a global variable. Does anyone know how best to do this? Is it achievable with this function?
The function is here:
declare global {
  interface Window { MyNamespace: any; }
}

export const CSVImportGetHeaders = async (file: File) => {
  const reader = new FileReader();
  reader.readAsArrayBuffer(file);

  // onload triggered each time the reading operation is completed
  reader.onloadend = (evt: any) => {
    // get array buffer
    const data = evt.target.result;
    console.log('reader content ', reader);

    // get byte length
    const byteLength = data.byteLength;
    console.log('HEADER STRING ', byteLength);

    // make iterable array
    const ui8a = new Uint8Array(data, 0);

    // header string, compiled iterably
    let headerString = '';
    let finalIndex = 0;

    // eslint-disable-next-line no-plusplus
    for (let i = 0; i < byteLength; i++) {
      // get current character
      const char = String.fromCharCode(ui8a[i]);
      // check if new line
      if (char.match(/[^\r\n]+/g) !== null) {
        // if not a new line, continue
        headerString += char;
      } else {
        // if new lineBreak, stop
        finalIndex = i;
        break;
      }
    }

    window.MyNamespace = headerString.split(/,|;/);
    const potout = window.MyNamespace;
    console.log('reader result in function', potout);
  };

  const output = await window.MyNamespace;
  console.log('outside onload event', output);
};
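One common way to get the value out of the callback, shown here as a hedged sketch in plain JavaScript (getCSVHeaders is a hypothetical name): wrap the FileReader in a Promise so the caller can await the header fields instead of reading them from window.MyNamespace.
const getCSVHeaders = (file) =>
  new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onerror = () => reject(reader.error);
    reader.onload = () => {
      // reader.result is the decoded text; keep only the first line
      const firstLine = String(reader.result).split(/\r\n|\n|\r/, 1)[0];
      resolve(firstLine.split(/,|;/));
    };
    // For very large files, file.slice(0, 64 * 1024) could be read instead
    // of the whole file, since only the header line is needed.
    reader.readAsText(file);
  });

// usage:
// const headers = await getCSVHeaders(myFile);
// console.log('headers', headers);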

csvtojson node.js (combine two codes)

How can I combine these two pieces of code, so that it doesn't just convert CSV to JSON (first code) but also saves the result as a JSON array in a separate file (second code)?
This (first) code converts a CSV file to a JSON array:
const fs = require("fs");
let fileReadStream = fs.createReadStream("myCsvFile.csv");
let invalidLineCount = 0;
const csvtojson = require("csvtojson");
csvtojson({ "delimiter": ";", "fork": true })
.preFileLine((fileLineString, lineIdx)=> {
let invalidLinePattern = /^['"].*[^"'];/;
if (invalidLinePattern.test(fileLineString)) {
console.log(`Line #${lineIdx + 1} is invalid, skipping:`, fileLineString);
fileLineString = "";
invalidLineCount++;
}
return fileLineString
})
.fromStream(fileReadStream)
.subscribe((dataObj) => {
console.log(dataObj);
// I added the second code hier, but it wirtes the last object of the array (because of the loop?)
}
});
and this (second) code saves the JSON array to an external file:
fs.writeFile('example.json', JSON.stringify(dataObj, null, 4), (err) => { if (err) console.log(err); });
The question is how to put the second code into the first one (combine them)?
You can use csvtojson's .on('done', (error) => { ... }) method. Push the data into a variable in the subscribe callback and write it out as JSON in .on('done'). (I tested this successfully.)
Check it out:
let fileReadStream = fs.createReadStream("username-password.csv");
let invalidLineCount = 0;
let data = []
csvtojson({ "delimiter": ";", "fork": true })
.preFileLine((fileLineString, lineIdx)=> {
let invalidLinePattern = /^['"].*[^"'];/;
if (invalidLinePattern.test(fileLineString)) {
console.log(`Line #${lineIdx + 1} is invalid, skipping:`, fileLineString);
fileLineString = "";
invalidLineCount++;
}
return fileLineString
})
.fromStream(fileReadStream)
.subscribe((dataObj) => {
// console.log(dataObj)
data.push(dataObj)
})
.on('done',(error)=>{
fs.writeFileSync('example.json', JSON.stringify(data, null, 4))
})
Not sure if you are able to change the library but I would definitely recommend Papaparse for this - https://www.npmjs.com/package/papaparse
Your code would then look something like this:
const fs = require('fs'), papa = require('papaparse');

var readFile = fs.createReadStream(file);

papa.parse(readFile, {
  complete: function (results, file) {
    fs.writeFile('example.json', JSON.stringify(results.data), function (err) {
      if (err) console.log(err);
      // callback etc
    });
  }
});

Using DataView with nodejs Buffer

I'm trying to read/write some binary data using the DataView object. It seems to work correctly when the buffer is initialized from a Uint8Array; however, if I pass it a Node.js Buffer object the results seem to be off. Am I using the API incorrectly?
import { expect } from 'chai';

describe('read binary data', () => {
  it('test buffer', () => {
    let arr = Buffer.from([0x55, 0xaa, 0x55, 0xaa, 0x60, 0x00, 0x00, 0x00, 0xd4, 0x03, 0x00, 0x00, 0x1c, 0xd0, 0xbb, 0xd3, 0x00, 0x00, 0x00, 0x00]);
    let read = new DataView(arr.buffer).getUint32(0, false);
    expect(read).to.eq(0x55aa55aa);
  });
  it('test uint8array', () => {
    let arr = new Uint8Array([0x55, 0xaa, 0x55, 0xaa, 0x60, 0x00, 0x00, 0x00, 0xd4, 0x03, 0x00, 0x00, 0x1c, 0xd0, 0xbb, 0xd3, 0x00, 0x00, 0x00, 0x00]);
    let read = new DataView(arr.buffer).getUint32(0, false);
    expect(read).to.eq(0x55aa55aa);
  });
});
The one with the buffer fails with
AssertionError: expected 1768779887 to equal 1437226410
+ expected - actual
-1768779887
+1437226410
Try using buf.copy:
const buf = fs.readFileSync(`...`);
const uint8arr = new Uint8Array(buf.byteLength);
buf.copy(uint8arr, 0, 0, buf.byteLength);
const v = new DataView(uint8arr.buffer);
A Node.js Buffer is just a view over an underlying allocated buffer that can be a lot larger. This is how to get the ArrayBuffer out of a Buffer:
function getArrayBufferFromBuffer(buffer) {
  return buffer.buffer.slice(buffer.byteOffset, buffer.byteOffset + buffer.byteLength);
}
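A short usage sketch (assuming getArrayBufferFromBuffer from above): either pass the copied ArrayBuffer to DataView, or point the DataView directly at the Buffer's region of the shared pool via its byteOffset and byteLength.
const buf = Buffer.from([0x55, 0xaa, 0x55, 0xaa]);

// Option 1: copy the relevant bytes into a standalone ArrayBuffer first.
const view1 = new DataView(getArrayBufferFromBuffer(buf));

// Option 2: no copy, just restrict the DataView to the Buffer's slice of the pool.
const view2 = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);

console.log(view1.getUint32(0, false).toString(16)); // "55aa55aa"
console.log(view2.getUint32(0, false).toString(16)); // "55aa55aa"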
This helped (not sure if this is the most elegant):
const buff = Buffer.from(msgBody, 'base64');
let uint8Array = new Uint8Array(buff.length);
for (let counter = 0; counter < buff.length; counter++) {
  uint8Array[counter] = buff[counter];
  //console.debug(`uint8Array[${counter}]=${uint8Array[counter]}`);
}
let dataview = new DataView(uint8Array.buffer, 0, uint8Array.length);

Read lines of a txt file and organize in a JSON file

I have a text file where each line is separated into 4 categories by colons and I want to put this into a JSON file, where each category is a value to the corresponding name in the JSON file.
Example data.txt file:
Date1:cat1:dog1:bug1
Date2:cat2:dog2:bug2
Date3:cat3:dog3:bug3
Example JSON file:
{
  "Date1": {
    "cat": "cat1",
    "dog": "dog1",
    "bug": "bug1"
  },
  "Date2": {
    "cat": "cat2",
    "dog": "dog2",
    "bug": "bug2"
  },
  ...
  ...
}
I've never used JSON before but I think that's how to format it. How would I sort each line using the colons as markers for the next value and store it in the JSON file with the correct name using JavaScript and Node.js?
Use the csv package if you don't want to handle parsing the CSV file yourself.
const fs = require("fs");
const csv = require("csv");
const result = {};
const keys = ["cat", "dog", "bug"]
// Read data
const readStream = fs.createReadStream("yourfile.txt");
// Parser
const parser = csv.parse({ delimiter: ":" });
parser.on("data", (chunk) => {
result[chunk[0]] = {};
for(let i = 1; i < chunk.length; i ++) {
result[chunk[0]][keys[i - 1]] = chunk[i];
}
});
parser.on("end", () => {
console.log(result);
});
readStream.pipe(parser);
If your JSON has this defined structure you can go about it with the following code:
import * as fs from 'fs';

/* If you have a large file this is a bad idea, refer to reading from a stream
 * from zhangjinzhou's answer
 */
const file = fs.readFileSync('path/to/data.txt', 'utf8');

const json = file.split(/\n|\r\n/).map(line => {
  const values = line.split(":");
  let obj = {};
  obj[values[0]] = {
    cat: values[1],
    dog: values[2],
    bug: values[3],
  };
  return obj;
}).reduce((acc, current) => Object.assign(acc, current), {});
Using RegExp and Array#forEach, convert the string to lines, then iterate over them and fill up the object with the corresponding data via:
const dataFileContent =
`Date1:cat1:dog1:bug1
Date2:cat2:dog2:bug2
Date3:cat3:dog3:bug3`;

function processData(data) {
  // convert to lines
  const lines = data.match(/[^\r\n]+/g) || [];
  const object = {};
  // iterate over the lines
  lines.forEach(line => {
    const parts = line.split(':');
    const main = parts.shift();
    const pattern = /^(.*?)(\d+)$/;
    // create an object for each main part
    object[main] = {};
    // fill each main part with the sub parts
    parts.forEach(part => {
      const match = part.match(pattern);
      if (match) {
        const key = match[1];
        const value = match[2];
        object[main][key] = key + value;
      }
    });
  });
  return object;
}

const processedData = processData(dataFileContent);
console.log(processedData);
Then convert the processedData to JSON by using JSON.stringify and save it to a file via:
const fs = require('fs');
...
// processData
...
const json = JSON.stringify(processedData);
fs.writeFile('my_json_file.json', json, 'utf8', (err) => {
  if (err) console.error(err);
});
For larger files, consider using Streams in Node.js as suggested by #zhangjinzhou.
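A hedged sketch of that streaming approach, using only Node's built-in readline module (the function name convertToJson and the file names are placeholders): the input is read line by line instead of being loaded into memory all at once, and the result is written out at the end.
const fs = require('fs');
const readline = require('readline');

async function convertToJson(inputPath, outputPath) {
  const rl = readline.createInterface({
    input: fs.createReadStream(inputPath),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  const result = {};
  for await (const line of rl) {
    if (!line.trim()) continue; // skip empty lines
    const [date, cat, dog, bug] = line.split(':');
    result[date] = { cat, dog, bug };
  }
  await fs.promises.writeFile(outputPath, JSON.stringify(result, null, 2));
}

convertToJson('data.txt', 'my_json_file.json').catch(console.error);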
