Reading a big file in chunks and adding to an object - JavaScript

I am trying to read a big file in chunks instead of loading it directly into memory, using Node.js. The file is too big to load all at once; my goal is to read it in chunks, group the anagrams, and then output the groups.
I started following the article described here
It basically involves creating a shared buffer at the beginning of the program and passing it down. Essentially it comes down to the following functions:
const fs = require("fs");

function readBytes(fd, sharedBuffer) {
  return new Promise((resolve, reject) => {
    fs.read(fd, sharedBuffer, 0, sharedBuffer.length, null, (err) => {
      if (err) {
        return reject(err);
      }
      resolve();
    });
  });
}

async function* generateChunks(filePath, size) {
  const sharedBuffer = Buffer.alloc(size);
  const stats = fs.statSync(filePath); // file details
  const fd = fs.openSync(filePath); // file descriptor
  let bytesRead = 0; // how many bytes were read
  let end = size;

  for (let i = 0; i < Math.ceil(stats.size / size); i++) {
    await readBytes(fd, sharedBuffer);
    bytesRead = (i + 1) * size;
    if (bytesRead > stats.size) {
      // When we reach the end of the file,
      // we have to calculate how many bytes were actually read
      end = size - (bytesRead - stats.size);
    }
    yield sharedBuffer.slice(0, end);
  }
}
I then call it from main like the following. My goal is to group all the anagrams and then output them. However, the issue I am having is that when I run the program I can access entries up to around index 99,000 via console.log(Object.values(result)[99000]), but past that I get undefined. Any ideas what I am doing wrong?
const CHUNK_SIZE = 10000000; // 10MB

async function main() {
  let result = {};
  for await (const chunk of generateChunks("Data/example2.txt", CHUNK_SIZE)) {
    let words = chunk.toString("utf8").split("\n");
    for (let word of words) {
      let cleansed = word.split("").sort().join("");
      if (result[cleansed]) {
        result[cleansed].push(word);
      } else {
        result[cleansed] = [word];
      }
    }
  }
  console.log(Object.values(result)[99000]);
  return Object.values(result);
}

Related

Can't use Uniswap V3 SwapRouter for multihop swaps, SwapRouter.exactInput(params) throws 'UNPREDICTABLE_GAS_LIMIT'

I'm trying to implement a swap with the new Uniswap V3 contracts.
I'm using the Quoter contract for getting the quotes and the SwapRouter for making the swaps.
If I'm using the methods for a direct swap (when the tokens have a pool), for example:
ethersProvider = new ethers.providers.Web3Provider(web3.currentProvider, 137);
uniSwapQuoter = new ethers.Contract(uniSwapQuoterAddress, QuoterAbi.abi, ethersProvider);
uniSwapRouterV3 = new ethers.Contract(uniSwapRouterAddress, RouterAbi.abi,
  ethersProvider.getSigner());

uniSwapQuoter.callStatic.quoteExactInputSingle(.....)
uniSwapQuoter.callStatic.quoteExactOutputSingle(.....)
uniSwapRouterV3.exactInputSingle(params)
everything works fine. But when I try to use the multihop quotes and multihop swaps, it fails with:
"reason": "cannot estimate gas; transaction may fail or may require manual gas limit",
"code": "UNPREDICTABLE_GAS_LIMIT",
"error": {
"code": -32000,
"message": "execution reverted"
},
"method": "estimateGas",
"transaction": {
"from": "0x532d647481c20f4422A8331339D76b25cA569959",
"to": "0xE592427A0AEce92De3Edee1F18E0157C05861564",
"data": "0xc04b8d59000000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000000000000000000000000000000000a00000000000000000000000002a6b82b6dd3f38eeb63a35f2f503b9398f02d9bb0000000000000000000000000000000000000000000000000000000861c468000000000000000000000000000000000000000000000000000000000000002710000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000422791bca1f2de4661ed88a30c99a7a9449aa841740005007ceb23fd6bc0add59e62ac25578270cff1b9f619003000c26d47d5c33ac71ac5cf9f776d63ba292a4f7842000000000000000000000000000000000000000000000000000000000000",
"accessList": null
}
For encoding the path I'm using the Uniswap example from the tests:
function encodePath(path, fees) {
  const FEE_SIZE = 3

  if (path.length != fees.length + 1) {
    throw new Error('path/fee lengths do not match')
  }

  let encoded = '0x'
  for (let i = 0; i < fees.length; i++) {
    // 20 byte encoding of the address
    encoded += path[i].slice(2)
    // 3 byte encoding of the fee
    encoded += fees[i].toString(16).padStart(2 * FEE_SIZE, '0')
  }
  // encode the final token
  encoded += path[path.length - 1].slice(2)

  return encoded.toLowerCase()
}
And finally my example code for getting quotes:
const routeAndFees = await getAddressPath(path);
const encodedPath = await encodePath(routeAndFees.path, routeAndFees.fees);
const usdcWithDecimals = parseFloat(usdcAmount) * 1000000
const tokenDecimals = path[path.length - 1].tokenOut.decimals;
try {
  const amountOut = await uniSwapQuoter.callStatic.quoteExactInput(encodedPath, usdcWithDecimals.toString());
  console.log("Token amount out:", parseFloat(amountOut) / (10 ** tokenDecimals));
  return {
    tokenOut: parseFloat(amountOut) / (10 ** tokenDecimals),
    usdcIn: parseFloat(usdcAmount)
  };
} catch (e) {
  console.log(e);
  return e;
}
}
and swapping:
async function multiSwap(path, userAddress, usdcAmount) {
  const usdcWithDecimals = parseFloat(usdcAmount) * 1000000
  const routeAndFees = await getAddressPath(path);
  const encodedPath = await encodePath(routeAndFees.path, routeAndFees.fees);
  const params = {
    path: encodedPath,
    recipient: userAddress,
    deadline: Math.floor(Date.now() / 1000) + 900,
    amountIn: usdcWithDecimals.toString(),
    amountOutMinimum: 0,
  }
  try {
    return await uniSwapRouterV3.exactInput(params);
  } catch (e) {
    console.log(e);
    return e;
  }
}
The path is [address, fee, address, fee, address] like it should be. I'm not sure about the encoding of it, but I didn't find any other example. Actually, I didn't find any example of doing Uniswap V3 multihop swaps at all; even in the Uniswap docs there is only a Trade example and a single-pool swap...
Can someone point out what I could have done wrong here?
The same error occurs both when quoting and when swapping :/
I'm testing on Polygon mainnet, and I can make the same path swap directly on Uniswap, but it fails when I trigger it from the script...
You should hex-encode the fee value and pad it to 6 hex characters, i.e. FEE_SIZE = 6. This should work for you:
async function encodePath(path, fees, exactInput) {
  const FEE_SIZE = 6

  if (path.length !== fees.length + 1) {
    throw new Error('path/fee lengths do not match')
  }

  if (!exactInput) {
    path = path.reverse();
    fees = fees.reverse();
  }

  let encoded = '0x'
  for (let i = 0; i < fees.length; i++) {
    // 20 byte encoding of the address
    encoded += path[i].slice(2)
    // 3 byte (6 hex character) encoding of the fee
    let fee = web3.utils.toHex(parseFloat(fees[i])).slice(2).toString();
    encoded += fee.padStart(FEE_SIZE, '0');
  }
  encoded += path[path.length - 1].slice(2)

  return encoded
}
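For illustration, here is how a two-hop exactInput path would be encoded with this function. The token addresses are hypothetical placeholders, not values from the question; the fee tiers are standard Uniswap V3 tiers:

// Hypothetical token addresses, for illustration only
const tokenA = '0xAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA';
const tokenB = '0xBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB';
const tokenC = '0xCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC';

// Fee tiers 500 (0.05%) and 3000 (0.3%): 500 hex-encodes to '0001f4',
// 3000 to '000bb8', each padded to 6 hex characters (3 bytes)
const encoded = await encodePath([tokenA, tokenB, tokenC], [500, 3000], true);
// -> '0x' + 'AAAA...AAAA' + '0001f4' + 'BBBB...BBBB' + '000bb8' + 'CCCC...CCCC'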

How to use multiple promises in recursion?

I am trying to solve a problem where the script enters a website, takes the first 10 links from it, then visits those 10 links and takes the first 10 links found on each of those pages, and so on, until the number of visited pages reaches 1000.
I was trying to do this using a for loop inside a promise plus recursion. This is my code:
const rp = require('request-promise');

const url = 'http://somewebsite.com/';
const websites = []
const promises = []

const getOnSite = (url, count = 0) => {
  console.log(count, websites.length)
  promises.push(new Promise((resolve, reject) => {
    rp(url)
      .then(async function (html) {
        let links = html.match(/https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)/g)
        if (links !== null) {
          links = links.splice(0, 10)
        }
        websites.push({
          url,
          links,
          emails: emails === null ? [] : emails // note: `emails` is never defined in this snippet
        })
        if (links !== null) {
          for (let i = 0; i < links.length; i++) {
            if (count < 3) {
              resolve(getOnSite(links[i], count + 1))
            } else {
              resolve()
            }
          }
        } else {
          resolve()
        }
      }).catch(err => {
        resolve()
      })
  }))
}
getOnSite(url)
I think you might want a recursive function that takes three arguments:
an array of urls to extract links from
an array of the accumulated links
a limit for when to stop crawling
You'd kick it off by calling it with just the root url and awaiting the returned promise:

const allLinks = await crawl([rootUrl]);
On the initial call the second and third arguments could assume default values:
async function crawl (urls, accumulated = [], limit = 1000) {
  ...
}
The function would fetch each url, extract its links, and recurse until it hit the limit. I haven't tested any of this, but I'm thinking something along these lines:
// limit the number of links per page to 10
const perPageLimit = 10;

async function crawl (urls, accumulated = [], limit = 1000) {
  // if the limit has been depleted or we have no urls,
  // return the accumulated result
  if (limit <= 0 || urls.length === 0) {
    return accumulated;
  }

  // process this set of urls: fetch each one and extract its links
  const links = (await Promise.all(
    urls
      .splice(0, perPageLimit) // limit to 10
      .map(url => fetchHtml(url) // fetch the url
        .then(extractUrls)) // and extract its links
  )).flat();

  // then recurse
  return crawl(
    links, // newly extracted array of links from this call
    [...accumulated, ...links], // pushed onto the accumulated list
    limit - links.length // reduce the limit and recurse
  );
}
async function fetchHtml (url) {
  //
}

const extractUrls = (html) => html.match( ... )
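For completeness, fetchHtml and extractUrls could be backed by the same request-promise client and url regex the question already uses (a sketch, not part of the original answer):

const rp = require('request-promise');

// Fetch a page's HTML; resolve to an empty string on failure so a single
// bad url doesn't reject the whole Promise.all batch.
const fetchHtml = (url) => rp(url).catch(() => '');

// Reuse the url-matching regex from the question; match() returns null
// when nothing matches, so normalize that to an empty array.
const extractUrls = (html) =>
  html.match(/https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)/g) || [];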

JavaScript promise won't resolve or reject, it just blocks for some reason

I'm writing a node script that is supposed to order images by a score value, calculated by a function called getImageScore(). This score takes quite some time to compute, so I created a promise (promiseImageScore) that would return the score value. After getting the promised value, the program should log the name of each image and its score. Despite creating a promise for the score values, they still come back as undefined. I observed that the promises never settle; they always remain pending. I tried tracing what actually happens in getImageScore() by logging messages throughout the function and, as expected, those logs stop at some point, but I cannot figure out why. Here is the full program:
const fs = require('fs');
const { resolve } = require('path');
const { reject } = require('q');
const { Console } = require('console');
const gm = require('gm').subClass({imageMagick: true});
const PNG = require("pngjs").PNG;

let pathToFolder = '/home/eugen/Pictures/wallpapers1';
let pathToImage = '';

let promiseImageScore = new Promise((resolve, reject) => {
  resolve(getImageScore());
});
function getImageScore() {
  console.log('entered this promise....');
  let img = gm(pathToImage);
  // Get the PNG buffer
  img.toBuffer("PNG", (err, buff) => {
    if (err) return new Error(err);
    console.log('got buffer...');
    // Get the image size
    img.size((err, size) => {
      if (err) {
        console.log(err);
        return new Error(err);
      }
      console.log('got image size...');
      // Parse the PNG buffer
      let str = new PNG();
      str.end(buff);
      // After it's parsed...
      str.on("parsed", buffer => {
        // Get the pixels from the image
        let idx, score = 0, rgb = {r: 0, g: 0, b: 0};
        for (let y = 0; y < size.height; y++)
          for (let x = 0; x < size.width; x++) {
            idx = (size.width * y + x) << 2;
            rgb.r = buffer[idx];
            rgb.g = buffer[idx + 1];
            rgb.b = buffer[idx + 2];
            score += (rgb.r + rgb.g + rgb.b) / 765;
          }
        console.log('one promised finished...');
        return score / (size.height * size.width);
      });
      str.on("error", e => {
        return new Error(e);
      });
    });
  });
}
// see which images are to be found in the specified directory
fs.readdir(pathToFolder, function (err, files) {
  if (err) return console.log('Unable to scan directory: ' + err);
  console.log('files in directory:\n');
  files.forEach(function (file) {
    pathToImage = pathToFolder + '/' + file;
    //showImageScore();
    promiseImageScore
      .then(imageScore => {
        console.log(file + ' has a score of ' + imageScore);
      })
      .catch(e => {
        throw e;
      })
  });
});
Here is the output of the code:
entered this promise....
files in directory:
Boats_Thailand_Sea_Crag_Nature_8000x5224.jpg has a score of undefined
Wallpaper_8K_0_7680x4320.jpg has a score of undefined
Water_mountains_landscapes_nature_snow_valley_rocks_switzerland_rivers_3840x2400.jpg has a score of undefined
Waterfalls_USA_Crag_Trees_Hocking_Hills_State_Park_Ohio_Nature_10929x5553.jpg has a score of undefined
cats_blue_eyes_animals_pets_4288x2848.jpg has a score of undefined
cats_blue_eyes_animals_pets_4288x2848.png has a score of undefined
city_night_panorama_117682_3840x2160.jpg has a score of undefined
starry_sky_tree_night_sky_119989_1920x1080.jpg has a score of undefined
got buffer...
After the got buffer... log, the program keeps running, never stopping, apparently doing nothing.
Your code is not working with promises correctly. You need to make several changes:
Pass the resolve and reject to getImageScore:
const promiseImageScore = new Promise((resolve, reject) => {
  // You need to pass resolve and reject to getImageScore
  getImageScore(resolve, reject);
});
Use resolve and reject in getImageScore:
// For example
img.toBuffer("PNG", (err, buff) => {
// If you get an error you need to call reject and return from the function
// You will need to do the same for img.size((err, size) => {})
// and for str.on("error", e => {})
if (err) return reject(err);
// Then when you have your result and there were no errors,
// you will need to call resolve with the result
resolve(score / (size.height * size.width));
});
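Putting the two changes together, getImageScore could itself return a fresh promise per image. This is a sketch along the answer's lines, keeping the question's gm/pngjs pipeline; note that it also takes the image path as a parameter, so each file gets its own promise instead of all files sharing one:

function getImageScore(pathToImage) {
  return new Promise((resolve, reject) => {
    const img = gm(pathToImage);
    img.toBuffer("PNG", (err, buff) => {
      if (err) return reject(err);
      img.size((err, size) => {
        if (err) return reject(err);
        const str = new PNG();
        str.end(buff);
        str.on("error", reject);
        str.on("parsed", buffer => {
          // average the normalized brightness over all pixels
          let score = 0;
          for (let y = 0; y < size.height; y++)
            for (let x = 0; x < size.width; x++) {
              const idx = (size.width * y + x) << 2;
              score += (buffer[idx] + buffer[idx + 1] + buffer[idx + 2]) / 765;
            }
          resolve(score / (size.height * size.width));
        });
      });
    });
  });
}

// usage: one promise per file
// getImageScore(pathToFolder + '/' + file).then(imageScore => ...)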

Batch a stream of requests into promises, grouped by time interval

I have an API endpoint that receives a large volume of requests from various sources.
For every request received, I create a promise that invokes an internal API.
I want to batch these promises by source, where each batch contains at most 10 seconds' worth of requests.
How can this be done?
If you have multiple requests from multiple sources, you can keep placing them into a Map object, with the sources as keys and arrays of received requests as values. Such a map, let's call it myMap, would look something like:

{source1: [req1, req2, req3],
 source2: [req1, req2],
 ...
 sourceN: [req1, req2, ..., reqm]}
You can then set up a pseudo-recursive setTimeout loop to invoke your internal API:

var apiInterval = 10000;

function runner() {
  setTimeout(mv => {
    Promise.all(mv.map(reqs => Promise.all(reqs.map(req => apiCall(req)))))
      .then(resultsBySource => resultsBySource.forEach(results =>
        results.forEach(r => doSomethingWithEachApiCallResult(r))));
    clearMapValues(); // empty the map; it refills over the next 10 seconds
    runner();
  }, apiInterval, [...myMap.values()]);
}

Please take the above as pseudocode, just to give you an idea. Note that Map.prototype.values() returns an iterator object, which is why it is spread into an array with [...myMap.values()] before being passed to setTimeout. This is a little better than a setInterval loop, as you can change the interval value dynamically depending on the workload.
I propose the following solution.
It uses a Map to store a string key and an array of values.
It uses setTimeout for every map key to flush the values of that key to a callback.
Code
/**
 * A stream of requests coming from various sources can be transposed into batches indexed
 * by the source of the request.
 *
 * The size of each batch is defined by a time interval, i.e. any request received within the
 * time interval is stored in the batch.
 */
export class BatchStream<K, V> {
  cache: Map<K, V[]>
  flushRate: number
  onBatch: (k: K, v: V[]) => Promise<void>
  debug: boolean

  constructor(onBatch: (k: K, v: V[]) => Promise<void>, flushRate = 5000, debug = false) {
    this.cache = new Map<K, V[]>()
    this.onBatch = onBatch
    this.debug = debug
    this.flushRate = flushRate
    this.flush = this.flush.bind(this)
  }

  push(k: K, v: V) {
    if (this.cache.has(k)) {
      let batch = this.cache.get(k)
      batch.push(v)
      this.cache.set(k, batch)
    } else {
      this.cache.set(k, [v])
      setTimeout(this.flush, this.flushRate, k)
    }
  }

  flush(k: K) {
    this.debug && console.log("Flush", k)
    let batch = this.cache.get(k)
    this.cache.delete(k)
    this.onBatch(k, batch)
    this.debug && console.log("Size", this.cache.size)
  }
}
Test
it("BatchStream", (done) => {
let sources = []
let iterations = 10
let jobs = []
let jobsDone = 0
let debug = true
// Prepare sources
for (let i = 97; i < 123; i++) {
sources.push(String.fromCharCode(i))
}
// Prepare a stream of test data
for (let k of sources) {
for (let i = 0; i < iterations; i++) {
jobs.push({ k, v: k + i.toString() })
}
}
shuffle(jobs)
// Batch handler
let onBatch = (k: string, v: string[]) => {
return new Promise<void>((resolve, reject) => {
jobsDone += v.length
debug && console.log(" --> " + k, v.length, v.join(","), jobsDone, sources.length * iterations)
if (jobsDone == sources.length * iterations) {
done()
}
resolve()
})
}
let batchStream = new BatchStream<string, string>(onBatch, 5000, debug)
// Stream test data into batcher
let delay = 0
for (let j of jobs) {
delay += 100
setTimeout(() => {
batchStream.push(j.k, j.v)
}, delay)
}
})
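Tying this back to the question: push each incoming request under its source key, and every batch (at most flushRate milliseconds' worth of requests per source) is handed to the callback. Here internalApi.processBatch is a hypothetical stand-in for the actual internal API client:

const stream = new BatchStream(async (source, requests) => {
  await internalApi.processBatch(source, requests); // hypothetical internal API client
}, 10000); // 10 second batches, per the question

// for every incoming request:
stream.push(request.source, request);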

CryptoJS - Decrypt an encrypted file

I'm trying to write an application that does end-to-end encryption for files with JS in the browser. However, I can't seem to get all files to decrypt correctly.
TL;DR: As it's impractical to encrypt files bigger than 1MB as a whole, I'm trying to encrypt them chunk by chunk. After doing so, I write the encrypted words (from CryptoJS's WordArray results) into a blob. For decryption I read the file, split it into chunks according to a map generated while encrypting, and try to decrypt them. The problem is the decrypted result is empty (sigBytes: 0)!
I guess I'm not reading the chunks correctly while decrypting. Please take a look at the code below for the function getBlob (writing data to the blob) and the last part of decryptFile (reading chunks).
More explanation
I'm using CryptoJS AES with default settings.
Right now my code looks like this:
function encryptFile (file, options, resolve, reject) {
  if (!options.encrypt) {
    return resolve(file)
  }
  if (!options.processor || !options.context) {
    return reject('No encryption method.')
  }

  function encryptBlob (file, optStart, optEnd) {
    const start = optStart || 0
    let stop = optEnd || CHUNK_SIZE
    if (stop > file.size - 1) {
      stop = file.size
    }
    const blob = file.slice(start, stop)
    const fileReader = new FileReader()
    fileReader.onloadend = function () {
      if (this.readyState !== FileReader.DONE) return
      const index = Math.ceil(optStart / CHUNK_SIZE)
      const result = CryptoJS.lib.WordArray.create(this.result)
      encryptedFile[index] = encrypt(result)
      chunksResolved++
      if (chunksResolved === count) {
        const {sigBytes, sigBytesMap, words} = getCipherInfo(encryptedFile)
        const blob = getBlob(sigBytes, words)
        resolve(blob, Object.keys(sigBytesMap))
      }
    }
    fileReader.readAsArrayBuffer(blob)
  }

  let chunksResolved = 0
  const encryptedFile = []
  const CHUNK_SIZE = 1024 * 1024
  const count = Math.ceil(file.size / CHUNK_SIZE)
  const encrypt = value => options.processor.call(
    options.context, value, 'file',
    (v, k) => CryptoJS.AES.encrypt(v, k))

  for (let start = 0; (start + CHUNK_SIZE) / CHUNK_SIZE <= count; start += CHUNK_SIZE) {
    encryptBlob(file, start, start + CHUNK_SIZE - 1)
  }
}
As you can see, I'm reading the file chunk by chunk (each chunk is 1MB, or fileSize % 1MB for the last one) as an ArrayBuffer, converting it to a WordArray for CryptoJS to understand, and encrypting it.
After encrypting all the chunks, I write each word they contain to a blob (using code I found among CryptoJS's issues on Google Code, shown below), and I guess this is where something goes wrong. I also generate a map of where each encrypted chunk ends, so I can later use it to cut the chunks back out of the binary file for decryption.
And here's how I decrypt the files:
function decryptFile (file, sigBytesMap, filename, options, resolve, reject) {
  if (!options.decrypt) {
    return resolve(file)
  }
  if (!options.processor || !options.context) {
    return reject('No decryption method.')
  }

  function decryptBlob (file, index, start, stop) {
    const blob = file.slice(start, stop)
    const fileReader = new FileReader()
    fileReader.onloadend = function () {
      if (this.readyState !== FileReader.DONE) return
      const result = CryptoJS.lib.WordArray.create(this.result)
      decryptedFile[index] = decrypt(result)
      chunksResolved++
      if (chunksResolved === count) {
        const {sigBytes, words} = getCipherInfo(decryptedFile)
        const finalFile = getBlob(sigBytes, words)
        resolve(finalFile, filename)
      }
    }
    fileReader.readAsArrayBuffer(blob)
  }

  let chunksResolved = 0
  const count = sigBytesMap.length
  const decryptedFile = []
  const decrypt = value => options.processor.call(
    options.context, value, 'file',
    (v, k) => CryptoJS.AES.decrypt(v, k))

  for (let i = 0; i < count; i++) {
    decryptBlob(file, i, parseInt(sigBytesMap[i - 1]) || 0, parseInt(sigBytesMap[i]) - 1)
  }
}
Decryption is exactly like encryption, but it doesn't work. Although the chunks are no longer 1MB, they are bounded by the sigBytes values recorded in the map. Yet there is no result from the decryption: sigBytes is 0.
Here's the code for generating the blob and getting sigBytesMap:
function getCipherInfo (ciphers) {
  const sigBytesMap = []
  const sigBytes = ciphers.reduce((tmp, cipher) => {
    tmp += cipher.sigBytes || cipher.ciphertext.sigBytes
    sigBytesMap.push(tmp)
    return tmp
  }, 0)
  const words = ciphers.reduce((tmp, cipher) => {
    return tmp.concat(cipher.words || cipher.ciphertext.words)
  }, [])
  return {sigBytes, sigBytesMap, words}
}

function getBlob (sigBytes, words) {
  const bytes = new Uint8Array(sigBytes)
  for (var i = 0; i < sigBytes; i++) {
    const byte = (words[i >>> 2] >>> (24 - (i % 4) * 8)) & 0xff
    bytes[i] = byte
  }
  return new Blob([ new Uint8Array(bytes) ])
}
I'm guessing the issue is the method I'm using to read the encrypted chunks, or maybe the way I write them!
I should also mention that I previously did the encryption differently: I stringified each WordArray returned by CryptoJS.AES.encrypt using the toString method with the default encoding (which I believe is CryptoJS.enc.Hex), but some files didn't decrypt correctly. It didn't have anything to do with the size of the original file, but rather with its type. Again, I'm guessing!
Turns out the problem was that the WordArray returned by CryptoJS.AES.decrypt(value, key) carries extra padding words that should not be included in the final result. CryptoJS tries to unpad the result, but it only adjusts sigBytes accordingly and leaves words unchanged. So when decrypting, pop those extra words before writing the chunks to the file: 4 words for full chunks and 3 for smaller ones (the last chunk).
Check this issue.
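In code, the trim can be derived from sigBytes instead of hard-coding 4 or 3 words (a sketch; decrypted is the WordArray returned by CryptoJS.AES.decrypt):

// CryptoJS unpadding updates sigBytes but leaves the padding words in
// place, so keep only ceil(sigBytes / 4) words before writing the chunk.
function trimPadding (decrypted) {
  decrypted.words.length = Math.ceil(decrypted.sigBytes / 4)
  return decrypted
}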
import CryptoJS from "crypto-js";

async function encryptBlobToBlob(blob: Blob, secret: string): Promise<Blob> {
  const wordArray = CryptoJS.lib.WordArray.create(await blob.arrayBuffer());
  const result = CryptoJS.AES.encrypt(wordArray, secret);
  return new Blob([result.toString()]);
}

export async function decryptBlobToBlob(blob: Blob, secret: string): Promise<Blob> {
  const decryptedRaw = CryptoJS.AES.decrypt(await blob.text(), secret);
  return new Blob([wordArrayToByteArray(decryptedRaw)]);
}

function wordToByteArray(word, length) {
  const ba = [];
  const xFF = 0xff;
  if (length > 0) ba.push(word >>> 24);
  if (length > 1) ba.push((word >>> 16) & xFF);
  if (length > 2) ba.push((word >>> 8) & xFF);
  if (length > 3) ba.push(word & xFF);
  return ba;
}

function wordArrayToByteArray({ words, sigBytes }: { sigBytes: number; words: number[] }) {
  const result = [];
  let bytes;
  let i = 0;
  while (sigBytes > 0) {
    bytes = wordToByteArray(words[i], Math.min(4, sigBytes));
    sigBytes -= bytes.length;
    result.push(bytes);
    i++;
  }
  return new Uint8Array(result.flat());
}

async function main() {
  const secret = "bbbb";
  const blob = new Blob(["1".repeat(1e3)]);
  const encryptedBlob = await encryptBlobToBlob(blob, secret);
  console.log("encrypted blob size", encryptedBlob.size);
  const decryptedBlob = await decryptBlobToBlob(encryptedBlob, secret);
  console.log("decryptedBlob", decryptedBlob);
  console.log(await decryptedBlob.text());
}

main();
