How to read a large csv as a stream - javascript

I am using @aws-sdk/client-s3 to read a JSON file from S3, take the contents and dump them into DynamoDB. This all currently works fine using:
const data = await new S3Client(region).send(new GetObjectCommand(bucketParams));
And then deserialising the response body etc.
However, I'm looking to migrate to the jsonlines format (effectively CSV, in the sense that it needs to be streamed in line by line, or in chunks of lines, and processed). I can't seem to find a way of doing this that doesn't load the entire file into memory (using response.text() etc.).
Ideally, I would like to pipe the response into a createReadStream, and go from there.

I found this example with createReadStream() from the fs module in Node.js:
import fs from 'fs';

function read() {
  let data = '';
  const readStream = fs.createReadStream('business_data.csv', 'utf-8');
  readStream.on('error', (error) => console.log(error.message));
  // Note: this accumulates the whole file in `data`; see the line-by-line variant below.
  readStream.on('data', (chunk) => data += chunk);
  readStream.on('end', () => console.log('Reading complete'));
}
read();
You can modify it for your use. Hope this helps.
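For the JSON Lines case in the question, one possible modification (a rough, untested sketch; the file name and processRecord callback are placeholders) is to split each chunk on newlines while carrying any trailing partial line over to the next chunk, so only one chunk is ever held in memory:
import fs from 'fs';

// Sketch: process a JSON Lines file chunk by chunk, keeping only the
// current chunk plus a possible partial trailing line in memory.
function readLines(path, processRecord) {
  let remainder = '';
  const readStream = fs.createReadStream(path, 'utf-8');

  readStream.on('error', (error) => console.log(error.message));

  readStream.on('data', (chunk) => {
    const lines = (remainder + chunk).split('\n');
    remainder = lines.pop(); // the last element may be an incomplete line
    for (const line of lines) {
      if (line.trim()) processRecord(JSON.parse(line));
    }
  });

  readStream.on('end', () => {
    if (remainder.trim()) processRecord(JSON.parse(remainder));
    console.log('Reading complete');
  });
}

readLines('business_data.jsonl', (record) => console.log(record));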
You can connect to S3 like this:
var s3 = new AWS.S3({apiVersion: '2006-03-01'});
var params = {Bucket: 'myBucket', Key: 'myImageFile.jpg'};
var file = require('fs').createWriteStream('/path/to/file.jpg');
s3.getObject(params).createReadStream().pipe(file);
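Since the question uses @aws-sdk/client-s3 (v3) rather than the v2 aws-sdk shown above, here is a rough, untested sketch of the same idea in v3 (bucket, key and region are placeholders): in Node.js the GetObjectCommand response's Body is a Readable stream, so it can be fed straight into readline and processed line by line without buffering the whole file.
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';
import readline from 'readline';

const client = new S3Client({ region: 'eu-west-1' }); // placeholder region

async function streamJsonLines() {
  const response = await client.send(
    new GetObjectCommand({ Bucket: 'my-bucket', Key: 'data.jsonl' }) // placeholders
  );

  // In Node.js, response.Body is a stream.Readable
  const rl = readline.createInterface({ input: response.Body, crlfDelay: Infinity });

  for await (const line of rl) {
    if (!line.trim()) continue;
    const record = JSON.parse(line);
    // ...batch-write `record` to DynamoDB here
  }
}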

Related

How To Write and Read JSON texts on single file

I'm receiving events in JSON format via a POST route, I would like to save these events in a file like 'example.json' and be able to query it.
I tried using writeFileSync, but it rewrites the entire file. With the flag {flag: 'a+'} I was able to save more than one record, but when I try to require 'example.json', I get an error 'Unexpected token { in JSON'.
Works fine when the file has only one record, but gives the error after the second one.
Code:
const filePath = './example.json';
const fs = require('fs');
const file = require('./example.json');
app.post('/events', (request, response) => {
  response.send(request.body);
  const contentString = JSON.stringify(request.body);
  return fs.writeFileSync(filePath, contentString, {flag: 'a+'});
});
example.json that works:
{"type":"call.new","call_id":"71252742562.40019","code":"h9e8j7c0tl0j5eexi07sy6znfd1ponj4","direction":"inbound","our_number":"1130900336","their_number":"11999990000","their_number_type":"mobile","timestamp":"2020-04-01T00:00:00Z"}
example.json (with two records) that stops working:
{"type":"call.new","call_id":"71252742562.40019","code":"h9e8j7c0tl0j5eexi07sy6znfd1ponj4","direction":"inbound","our_number":"1130900336","their_number":"11999990000","their_number_type":"mobile","timestamp":"2020-04-01T00:00:00Z"}{"type":"call.ongoing","call_id":"71252731962.40019","code":"h9e8j7c0tl0j5eexi07sy6znfd1ponj4","direction":"inbound","our_number":"1130900336","their_number":"11999990000","their_number_type":"mobile","timestamp":"2020-04-01T00:00:00Z"}
How can I write this JSON in a form that can be read back, one that doesn't produce the error above and can still be loaded with require?
Could someone help me, please?
Try to read the JSON file, parse it, add new elements to the array and then overwrite the file.
const fs = require("fs");
const path = require("path");

const FILE_PATH = path.join(__dirname, "./elements.json");

// elements.json is expected to already contain a JSON array, e.g. []
const file = fs.readFileSync(FILE_PATH);
const elements = JSON.parse(file);

const newElement = { id: Date.now() };
const updatedElements = [...elements, newElement];

fs.writeFileSync(FILE_PATH, JSON.stringify(updatedElements));
See more here: https://nodejs.org/api/fs.html#fsappendfilesyncpath-data-options
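Put into the POST route from the question, that read/parse/append/overwrite approach might look roughly like this (a sketch, assuming example.json starts out containing an empty array []):
const fs = require('fs');
const path = require('path');

const FILE_PATH = path.join(__dirname, 'example.json');

app.post('/events', (request, response) => {
  // Read the current array, append the new event, and write the whole file back.
  const events = JSON.parse(fs.readFileSync(FILE_PATH, 'utf8'));
  events.push(request.body);
  fs.writeFileSync(FILE_PATH, JSON.stringify(events, null, 2));

  response.send(request.body);
});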

Downloading Image locally from GitHub Raw link using fs.writeFileSync() JS

Currently trying to download an image from GitHub locally. Everything seems to work: the fetch goes through with a 200 OK response. However, I don't understand how to store the image itself:
const rawGitLink = "https://raw.githubusercontent.com/cardano-foundation/CIPs/master/CIP-0001/CIP_Flow.png"
const folder = "/Folder"
const imageName = "/Test"
const imageResponse = await axios.get(rawGitLink)
fs.writeFileSync(___dirname + folder + imageName, imageResponse, (err) => {
  //Error handling
})
Four problems had to be fixed:
The image name must include the .png extension in this case
The response must be requested in the correct format (a buffer) for an image
You must write the response data and not the object itself
__dirname only needs two underscores
const rawGitLink = "https://raw.githubusercontent.com/cardano-foundation/CIPs/master/CIP-0001/CIP_Flow.png"
const folder = "/Folder"
const imageName = "/Test.png"
const imageResponse = await axios.get(rawGitLink, { responseType: 'arraybuffer' });
fs.writeFileSync(__dirname + folder + imageName, imageResponse.data)
Axios returns a special object: https://github.com/axios/axios#response-schema
let {data} = await axios.get(...)
await fs.writeFile(filename, data) // you can use fs.promises instead of sync
As @Leau said, you should include the extension in the filename
Another suggestion is to use the path module to create the filename:
filename = path.join(__dirname, "/Folder", "Test.png")
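Putting those suggestions together, a complete version might look like this (a sketch, untested; the target folder and file name are just examples):
const axios = require('axios');
const fs = require('fs/promises');
const path = require('path');

const rawGitLink = 'https://raw.githubusercontent.com/cardano-foundation/CIPs/master/CIP-0001/CIP_Flow.png';

async function downloadImage() {
  // Ask axios for raw bytes rather than a parsed body
  const { data } = await axios.get(rawGitLink, { responseType: 'arraybuffer' });

  // Build the target path with the .png extension included
  const filename = path.join(__dirname, 'Folder', 'Test.png');

  // Make sure the folder exists, then write the image data
  await fs.mkdir(path.dirname(filename), { recursive: true });
  await fs.writeFile(filename, data);
}

downloadImage().catch(console.error);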

NodeJS - How to download from a blob?

How can I return a file from a BLOB column using NodeJS?
I'm using the oracledb library to handle the database operations and I have the following code:
async function getFile(req, res) {
  let filename = req.params.filename;
  let file = await selectFileFromDb(filename);
  file = file.rows[0][0]; //Column that contains the blob content

  //I would like to return something like this
  res.download(file);
}
What should I do to read the BLOB content from the column and return as a download to the requester?
Thank you.
You have to send a content header with the type of the file to be downloaded, then send the buffer (assuming what you got from the DB is a buffer) in the body. Finally, end the response after sending the content. Here is a sample:
async function getFile(req, res) {
  let filename = req.params.filename;
  let file = await selectFileFromDb(filename);
  file = file.rows[0][0]; //Column that contains the blob content

  // Tell the client what it is getting and that it should be downloaded
  res.setHeader('Content-Type', 'application/octet-stream'); // or the file's real MIME type
  res.setHeader('Content-Disposition', `attachment; filename="${filename}"`);
  res.setHeader('Content-Length', file.length);

  res.write(file, 'binary');
  res.end();
}
HOW TO GET THE BLOB CONTENT AS A BUFFER
Do not forget to set the oracledb.fetchAsBuffer property:
const oracledb = require('oracledb');
oracledb.fetchAsBuffer = [oracledb.BLOB];
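For completeness, a rough sketch of what selectFileFromDb could look like with oracledb (the table, column names and credentials here are made up, since the question doesn't show the query):
const oracledb = require('oracledb');
oracledb.fetchAsBuffer = [oracledb.BLOB];

// Hypothetical schema: table FILES with columns FILENAME and CONTENT (BLOB)
async function selectFileFromDb(filename) {
  const connection = await oracledb.getConnection({
    user: 'scott',               // placeholder credentials
    password: 'tiger',
    connectString: 'localhost/XEPDB1'
  });
  try {
    // With fetchAsBuffer set, the BLOB column comes back as a Node Buffer,
    // so result.rows[0][0] is the file content.
    return await connection.execute(
      'SELECT content FROM files WHERE filename = :filename',
      [filename]
    );
  } finally {
    await connection.close();
  }
}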

Multiple file stream instead of download to disk and then zip?

I have an API method that, when called and passed an array of file keys, downloads them from S3. I'd like to stream them, rather than download to disk, followed by zipping the files and returning that to the client.
This is what my current code looks like:
reports.get('/xxx/:filenames', async (req, res) => {
  var AWS = require('aws-sdk');
  var s3 = new AWS.S3();
  var str_array = req.params.filenames.split(',');

  for (var i = 0; i < str_array.length; i++) {
    var filename = str_array[i].trim();
    localFileName = './' + filename;
    var params = {
      Bucket: config.reportBucket,
      Key: filename
    }

    s3.getObject(params, (err, data) => {
      if (err) console.error(err)
      var file = require('fs').createWriteStream(localFileName);
      s3.getObject(params).createReadStream().pipe(file);
      console.log(file);
    })
  }
});
How would I stream the files rather than downloading them to disk and how would I zip them to return that to the client?
The main problem is zipping multiple files.
More specifically, downloading them from AWS S3 in bulk.
I've searched through the AWS SDK and didn't find any bulk S3 operations.
Which brings us to one possible solution:
Load the files one by one and store them in a folder
Zip the folder (with a package like zip-folder)
Send the zipped folder
This is a raw and untested example, but it might give you the idea:
// Always import packages at the beginning of the file.
const AWS = require('aws-sdk');
const fs = require('fs');
const zipFolder = require('zip-folder');

const s3 = new AWS.S3();

reports.get('/xxx/:filenames', async (req, res) => {
  const filesArray = req.params.filenames.split(',');

  for (const fileName of filesArray) {
    const trimmedName = fileName.trim();
    const localFileName = './' + trimmedName;
    const params = {
      Bucket: config.reportBucket,
      Key: trimmedName
    };

    // Wrap each download in a Promise so we know the stream has finished
    // before moving on (and before zipping).
    await new Promise((resolve, reject) => {
      const fileStream = fs.createWriteStream(localFileName);
      s3.getObject(params)
        .createReadStream()
        .on('error', reject)
        .pipe(fileStream)
        .on('finish', resolve)
        .on('error', reject);
    });
  }

  // After that, all required files are in the target folder.
  // Now we need to compress the folder and send it back to the user.
  // We wrap the callback in a Promise to make the code read in a "sync" way.
  await new Promise((resolve, reject) =>
    zipFolder('/path/to/the/folder', '/path/to/archive.zip', (err) => (err ? reject(err) : resolve()))
  );

  // And now you can send the zipped folder to the user (also using streams).
  fs.createReadStream('/path/to/archive.zip').pipe(res);
});
Attention: you could run into problems with async behaviour due to the nature of streams, so please make sure all files have been stored in the folder before zipping.
Just a mention, I haven't tested this code, so if any questions come up, let's debug together.
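If you'd rather avoid the temporary folder entirely (which is what the question actually asks for), one common alternative is the archiver package: each S3 read stream is appended as an entry in a zip that is piped straight to the response, so nothing is written to disk. A rough, untested sketch, reusing the route and config names from the question:
const AWS = require('aws-sdk');
const archiver = require('archiver');

const s3 = new AWS.S3();

reports.get('/xxx/:filenames', (req, res) => {
  const filenames = req.params.filenames.split(',').map((name) => name.trim());

  // Stream a zip archive directly into the HTTP response
  res.attachment('reports.zip');
  const archive = archiver('zip');
  archive.on('error', (err) => res.status(500).send({ error: err.message }));
  archive.pipe(res);

  for (const filename of filenames) {
    const params = { Bucket: config.reportBucket, Key: filename };
    // Each S3 object becomes an entry in the zip, never written to disk
    archive.append(s3.getObject(params).createReadStream(), { name: filename });
  }

  // No more entries; finish the archive (the response ends when the zip is done)
  archive.finalize();
});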

Upload a file stream to S3 without a file and from memory

I'm trying to create a CSV from a string and upload it to my S3 bucket. I don't want to write a file; I want it all to be in memory.
I don't want to read from a file to get my stream. I would like to make a stream without a file. I like the createReadStream approach, but instead of a file, I would like to pass a string with my stream's contents.
var AWS = require('aws-sdk'),
    zlib = require('zlib'),
    fs = require('fs'),
    s3Stream = require('s3-upload-stream')(new AWS.S3());

// Set the client to be used for the upload.
AWS.config.loadFromPath('./config.json');
// Create the streams
var read = fs.createReadStream('/path/to/a/file');
var upload = s3Stream.upload({
  "Bucket": "bucket-name",
  "Key": "key-name"
});

// Handle errors.
upload.on('error', function (error) {
  console.log(error);
});

upload.on('part', function (details) {
  console.log(details);
});

upload.on('uploaded', function (details) {
  console.log(details);
});

read.pipe(upload);
You can create a Readable stream and push your string directly to it, which can then be consumed by your s3Stream upload instance.
const Readable = require('stream').Readable
let data = 'this is your data'
let read = new Readable()
read.push(data) // Push your data string
read.push(null) // Signal that you're done writing
// Create upload s3Stream instance and attach listeners go here
read.pipe(upload)
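Alternatively, since the CSV already exists as a string in memory, the aws-sdk's own s3.upload() accepts a string, Buffer or stream as Body, so a stream (or the s3-upload-stream package) isn't strictly required. A rough sketch, with bucket and key names as placeholders:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const csvString = 'id,name\n1,Alice\n2,Bob\n'; // your in-memory CSV

s3.upload(
  { Bucket: 'bucket-name', Key: 'key-name.csv', Body: csvString },
  (err, data) => {
    if (err) return console.log(err);
    console.log('Uploaded to', data.Location);
  }
);
If you do want to keep the piping approach, newer Node versions also offer Readable.from(csvString) as a one-liner for turning a string into a readable stream.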
