Is there a way to read CSV in node.js synchronously? - javascript

I am parsing CSV in node.js using the csv-parser lib, but I need the parsed data to be available in the whole project, not only inside the 'fs' section. I know there is the fs.readFileSync option, but it is not useful here since CSV is a binary file (at least in node.js's interpretation). What should I do?
const csv = require("csv-parser");
const fs = require("fs");
const cities = [];
let content = fs.createReadStream('data.csv')
.pipe(csv())
.on('data', (row) => {
cities.push(row);
});
var city_data = {
createArrayId: function(){
console.log(cities);
return cities;
}
}
module.exports = city_data;
As you can see, I need to export "cities" array. Right now it returns empty value (initialized value).

csv-parser seems to always be asynchronous, so you're out of luck there.
If data.csv doesn't change often, I'd recommend parsing it into JSON once and then require the JSON directly.
Otherwise, you can either implement synchronous CSV parsing yourself, or alternately refactor your code to wait until the asynchronous parsing is complete before carrying on with that data.
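As a rough illustration of the "implement synchronous CSV parsing yourself" option, here is a minimal sketch. It assumes a simple file with a header row, comma separators, and no quoted fields or embedded commas; parseCsvSync and loadCities are hypothetical names, not part of csv-parser.

```javascript
const fs = require("fs");

// Hypothetical helper: parses simple CSV text (header row, no quoted fields)
// into an array of objects keyed by column name.
function parseCsvSync(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const headers = lines[0].split(",").map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row = {};
    headers.forEach((header, i) => {
      row[header] = (cells[i] || "").trim();
    });
    return row;
  });
}

// Hypothetical wrapper: blocks until the file is read, so the result can be
// exported directly from a module.
function loadCities(path) {
  return parseCsvSync(fs.readFileSync(path, "utf8"));
}
```

With something like this, module.exports = loadCities('data.csv') would export a fully populated array, at the cost of blocking startup while the file is read.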

Related

Iterate over cells in a CSV file in Node.js

I have a CSV file: "myCSV.csv" with two columns: "first" and "second".
All the data inside is just numbers. So the file looks like this:
first, second
138901801, 849043027
389023890, 382903205
749029820, 317891093
...
I would like to iterate over these numbers and perform some custom parsing on them, then store results in an array.
How can I achieve a behavior like the following?
const parsedData = [];
for (const row of file) {
parsedData.push(row[0].toString() + row[1].toString());
}
If you're working with a file the user has selected in the browser, you'd make a FileReader in response to the user's action. (See FileReader - MDN.)
But it sounds like you already have the file on your server, in which case you'd use Node's built-in File System module. (See File System - NodeJS.)
If you just want the module's readFile function, you'd require it in your file like:
const {readFile} = require("fs");
And you'd use it to process a text file like:
readFile("myFile.txt", "utf8", (error, textContent) => {
  if(error){ throw error; }
  const parsedData = [];
  for(const row of textContent.split("\n")){
    const rowItems = row.split(",");
    // skip blank or incomplete lines, such as the one after a trailing newline
    if(rowItems.length < 2){ continue; }
    parsedData.push(rowItems[0].trim() + rowItems[1].trim());
  }
});
(See Node.js - Eloquent JavaScript).
However, if you want to handle your CSV directly as binary data (rather than having readFile decode it to text), omit the "utf8" argument and replace the textContent parameter in the arrow function with a buffer parameter. readFile then passes a Buffer, which you'd convert at the top of the callback with something like this:
const textContent = buffer.toString("utf8");
(If the file is not UTF-8, pass the matching encoding instead, for example "utf16le" or "latin1".)
You might also find these resources helpful:
JS CSV Tutorial - SeegateSite
JS read-text demo - GeeksForGeeks

How would I parse a large TSV file in node.js?

I'm extremely new to Node and JS. I have a large TSV file (1.5gb) that I need to read in and parse into either an array or JSON object. How would I go about doing that? I don't get an error when I try the code below but it doesn't even enter into it.
var d3 = require("d3-dsv");
d3.tsvParse("amazon_reviews_us_Mobile_Apps_v1_00.tsv", function(error, data)
{
var sum = 0;
data.forEach(function(d)
{
d.helpful_votes += d.helpful_votes;
sum += d.helpful_votes;
});
console.log("Total Helpful Votes: " + sum);
});
Any help would be appreciated.
You need to find a module that provides a streaming parser for a TSV file, meaning that it doesn't load the whole file into memory. You can use readline if your parser is synchronous:
const {createInterface} = require("readline");
const {createReadStream} = require("fs");
createInterface({input: createReadStream("amazon_reviews_us_Mobile_Apps_v1_00.tsv")})
.on('line', (data) => doSomethingWith(data.split("\t")))
// readline emits 'close', not 'end', once the input is exhausted
.on('close', () => doSomethingWhenDone())
You wrote that you want to parse that file and change it to an array or object of some sort. You'll still need to keep an eye on your memory, but you could use my scramjet, which will allow you to transform the data any way you like:
const {StringStream} = require("scramjet");
const {createReadStream, createWriteStream} = require("fs");
StringStream.from(createReadStream("amazon_reviews_us_Mobile_Apps_v1_00.tsv"))
// read the file
.CSVParse({delimiter: "\t"})
// parse as csv
.map((entry) => doSomething(entry))
// whatever you return here it will be changed
// this can be asynchronous too, so you can do requests...
.toJSONArray()
.pipe(createWriteStream("somefile.json"))
Let me know what you are trying to achieve besides counting and I'll edit the answer.
BTW, for just counting votes the solution by @hugo-elhaj-lahsen is also good; I'm not sure why it was downvoted.
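Use d3.tsvParse from d3-dsv. It takes the file contents as a string (not a path) and returns the rows synchronously, so read the file first. Since your file is very large, one optimisation is to avoid a separate forEach over the parsed rows and instead do the work in the row function that D3 calls at parsing time. Note that this still loads the whole file into memory, which may not be feasible at 1.5 GB:
var d3 = require("d3-dsv");
var fs = require("fs");
var sum = 0;
var text = fs.readFileSync("amazon_reviews_us_Mobile_Apps_v1_00.tsv", "utf8");
d3.tsvParse(text, d => {
  sum += +d.helpful_votes; // fields are parsed as strings, so convert to a number
  return d; // the row function must return the parsed object to keep the row
});
console.log("Total helpful votes", sum);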
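For a file of that size, a line-by-line approach avoids holding everything in memory. The following is a minimal sketch using Node's built-in readline module; createColumnSummer and sumColumnInFile are hypothetical names, and it assumes the first line is a tab-separated header and that fields contain no embedded tabs or newlines.

```javascript
const fs = require("fs");
const readline = require("readline");

// Hypothetical accumulator: feed it one line at a time; the first line is
// treated as the header and used to locate the column.
function createColumnSummer(columnName) {
  let columnIndex = null;
  let total = 0;
  return {
    onLine(line) {
      const cells = line.split("\t");
      if (columnIndex === null) {
        columnIndex = cells.indexOf(columnName);
        if (columnIndex === -1) throw new Error("Unknown column: " + columnName);
        return;
      }
      const value = Number(cells[columnIndex]);
      if (!Number.isNaN(value)) total += value; // skip blank or non-numeric cells
    },
    total: () => total,
  };
}

// Streams the file and resolves with the column total once readline closes.
function sumColumnInFile(path, columnName) {
  return new Promise((resolve, reject) => {
    const summer = createColumnSummer(columnName);
    const input = fs.createReadStream(path, "utf8");
    input.on("error", reject);
    const rl = readline.createInterface({ input, crlfDelay: Infinity });
    rl.on("line", (line) => {
      try {
        summer.onLine(line);
      } catch (err) {
        reject(err);
        rl.close();
        input.destroy();
      }
    });
    rl.on("close", () => resolve(summer.total()));
  });
}
```

Something like sumColumnInFile("amazon_reviews_us_Mobile_Apps_v1_00.tsv", "helpful_votes").then(console.log) would then print the total without building an array of rows.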

Reading changes in JSON file using JS

I'm trying to read an updating JSON file from syslog-ng. Currently syslog, a logging program, is set to continually append logs of the data I want to a JSON file. I display the data on my cyber attack map for only 30 seconds, after which it's not needed anymore. I can read the file and parse what I need, but is there a way to, over time, read & parse only the most recent additions to the file?
Sample code:
//Assume JSON output = {attack source, attack destination, attack type}
//Required modules
var JSONStream = require('JSONStream')
var fs = require('fs');
//Creates readable stream for JSON file parsing
var stream = fs.createReadStream( 'output.json', 'utf8'),
parser = JSONStream.parse(['source', 'dest', 'type']);
//Send read data to parser function
stream.pipe(parser);
//Intake data from parser function
parser.on('data', function (obj) {
//Do something with the object
console.log(obj);
});
I'm using JSONStream to avoid having to read the whole log file into memory. JSONStream should still be able to parse the bits I want, but is there a method to only read changes after the original reading is complete?
Use the code example provided in the library:
JSONStream Test code
You don't have to wait for the end; you can use the callback to do your work object by object.
But the file structure should suit what the library expects, like the files given in that folder.
Example file: all_npm.json
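One way to pick up only what was appended since the last read is to remember the byte offset reached so far and read from there. The sketch below is an illustration under assumptions rather than something JSONStream provides: readAppended is a hypothetical helper, and it assumes the log is append-only with one complete JSON object per line.

```javascript
const fs = require("fs");

// Hypothetical helper: reads the bytes appended to `path` after `offset`,
// parses each complete line as JSON, and returns the new offset.
function readAppended(path, offset) {
  const size = fs.statSync(path).size;
  if (size <= offset) return { records: [], offset };

  const fd = fs.openSync(path, "r");
  const buffer = Buffer.alloc(size - offset);
  let bytesRead;
  try {
    bytesRead = fs.readSync(fd, buffer, 0, buffer.length, offset);
  } finally {
    fs.closeSync(fd);
  }

  // Only consume up to the last newline so that a half-written line is left
  // for the next call.
  const end = buffer.lastIndexOf(0x0a, bytesRead - 1);
  if (end === -1) return { records: [], offset };

  const records = buffer
    .toString("utf8", 0, end)
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  return { records, offset: offset + end + 1 };
}
```

Calling it from a timer or from an fs.watch callback, and feeding the returned offset back in each time, would process each new record once.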

Read json file content with require vs fs.readFile

Suppose that for every response from an API, I need to map a value from the response to an existing JSON file in my web application and display the value from the JSON. What is the better approach for reading the JSON file in this case: require or fs.readFile? Note that there might be thousands of requests coming in at the same time.
Note that I do not expect any changes to the file during runtime.
request(options, function(error, response, body) {
// compare response identifier value with json file in node
// if identifier value exist in the json file
// return the corresponding value in json file instead
});
I suppose you'll JSON.parse the json file for the comparison, in that case, require is better because it'll parse the file right away and it's sync:
var obj = require('./myjson'); // no need to add the .json extension
If you have thousands of requests using that file, require it once outside your request handler and that's it:
var myObj = require('./myjson');
request(options, function(error, response, body) {
// myObj is accessible here and is a nice JavaScript object
var value = myObj.someValue;
// compare response identifier value with json file in node
// if identifier value exist in the json file
// return the corresponding value in json file instead
});
There are two versions for fs.readFile, and they are
Asynchronous version
require('fs').readFile('path/test.json', 'utf8', function (err, data) {
  if (err) {
    // error handling
    return;
  }
  var obj = JSON.parse(data);
});
Synchronous version
var json = JSON.parse(require('fs').readFileSync('path/test.json', 'utf8'));
To use require to parse a JSON file, do as below:
var json = require('path/test.json');
But, note that
require is synchronous and only reads the file once, following calls return the result from cache
If your file does not have a .json extension, require will not treat the contents of the file as JSON.
Since no one ever cared to write a benchmark, and I had a feeling that require works faster, I made one myself.
I compared fs.readFile (promisified version) vs require (without cache) vs fs.readFileSync.
You can see benchmark here and results here.
For 1000 iterations, it looks like this:
require: 835.308ms
readFileSync: 666.151ms
readFileAsync: 1178.361ms
So what should you use? The answer is not so simple.
Use require when you need to cache the object forever. It is also better to use Object.freeze to avoid mutating it in the application.
Use readFileSync in unit tests or during blocking application startup; it is the fastest.
Use readFile or the promisified version when the application is running and you don't want to block the event loop.
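To make the Object.freeze suggestion concrete, here is a small sketch. Object.freeze is shallow, so nested objects need to be frozen recursively; deepFreeze and loadFrozenJson are hypothetical helper names, not part of Node.

```javascript
const fs = require("fs");

// Hypothetical helper: recursively freezes an object and everything it contains.
function deepFreeze(value) {
  if (value !== null && typeof value === "object" && !Object.isFrozen(value)) {
    Object.freeze(value);
    Object.values(value).forEach(deepFreeze);
  }
  return value;
}

// Hypothetical helper: reads and parses a JSON file once, then freezes the
// result so request handlers cannot mutate the shared object by accident.
function loadFrozenJson(path) {
  return deepFreeze(JSON.parse(fs.readFileSync(path, "utf8")));
}
```

Loading the file once at startup with loadFrozenJson and reusing the result in every request handler keeps the per-request cost at a plain property lookup.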
Use node-fixtures if dealing with JSON fixtures in your tests.
The project will look for a directory named fixtures which must be child of your test directory in order to load all the fixtures (*.js or *.json files):
// test/fixtures/users.json
{
"dearwish": {
"name": "David",
"gender": "male"
},
"innaro": {
"name": "Inna",
"gender": "female"
}
}
// test/users.test.js
var fx = require('node-fixtures');
fx.users.dearwish.name; // => "David"
I only want to point out that require seems to keep the file in memory even when the variables should have been released. I had the following case:
for (const file of fs.readdirSync('dir/contains/jsons')) {
// this variable should be deleted after each loop
// but actually not, perhaps because of "require"
// it leads to "heap out of memory" error
const json = require('dir/contains/jsons/' + file);
}
for (const file of fs.readdirSync('dir/contains/jsons')) {
// this one with "readFileSync" works well
const json = JSON.parse(fs.readFileSync('dir/contains/jsons/' + file));
}
The first loop with require can't read all the JSON files because of a "heap out of memory" error. The second loop with readFileSync works.
If your file is empty, require will break. It will throw an error:
SyntaxError ... Unexpected end of JSON input.
With readFileSync/readFile you can deal with this:
let routesJson = JSON.parse(fs.readFileSync('./routes.json', 'UTF8') || '{}');
or:
let routesJson
fs.readFile('./dueNfe_routes.json', 'UTF8', (err, data) => {
routesJson = JSON.parse(data || '{}');
});
{
"country": [
"INDIA",
"USA"
],
"codes": [
"IN",
"US"
]
}
// countryInfo.json
const { country, codes } = require('./countryInfo.json');
console.log(country[0]); // "INDIA"
console.log(codes[0]); // "IN"

How to parse JSON using Node.js? [closed]

How should I parse JSON using Node.js? Is there some module which will validate and parse JSON securely?
You can simply use JSON.parse.
The definition of the JSON object is part of the ECMAScript 5 specification. node.js is built on Google Chrome's V8 engine, which adheres to the ECMA standard. Therefore, node.js also has a global JSON object.
Note - JSON.parse can tie up the current thread because it is a synchronous method. So if you are planning to parse big JSON objects use a streaming json parser.
you can require .json files.
var parsedJSON = require('./file-name');
For example if you have a config.json file in the same directory as your source code file you would use:
var config = require('./config.json');
or (file extension can be omitted):
var config = require('./config');
Note that require is synchronous and only reads the file once; following calls return the result from cache.
Also note that you should only use this for local files under your absolute control: if the path resolves to a .js file rather than a .json file (for example, when the extension is omitted), require executes any code within the file.
You can use JSON.parse().
You should be able to use the JSON object on any ECMAScript 5 compatible JavaScript implementation. And V8, upon which Node.js is built is one of them.
Note: If you're using a JSON file to store sensitive information (e.g. passwords), that's the wrong way to do it. See how Heroku does it: https://devcenter.heroku.com/articles/config-vars#setting-up-config-vars-for-a-deployed-application. Find out how your platform does it, and use process.env to retrieve the config vars from within the code.
Parsing a string containing JSON data
var str = '{ "name": "John Doe", "age": 42 }';
var obj = JSON.parse(str);
Parsing a file containing JSON data
You'll have to do some file operations with fs module.
Asynchronous version
var fs = require('fs');
fs.readFile('/path/to/file.json', 'utf8', function (err, data) {
if (err) throw err; // we'll not consider error handling for now
var obj = JSON.parse(data);
});
Synchronous version
var fs = require('fs');
var json = JSON.parse(fs.readFileSync('/path/to/file.json', 'utf8'));
You wanna use require? Think again!
You can sometimes use require:
var obj = require('path/to/file.json');
But, I do not recommend this for several reasons:
require is synchronous. If you have a very big JSON file, it will choke your event loop. You really need to use JSON.parse with fs.readFile.
require will read the file only once. Subsequent calls to require for the same file will return a cached copy. Not a good idea if you want to read a .json file that is continuously updated. You could use a hack. But at this point, it's easier to simply use fs.
If your file does not have a .json extension, require will not treat the contents of the file as JSON.
Seriously! Use JSON.parse.
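One commonly used hack of the kind mentioned above is to remove the module's entry from require.cache so that the next require call re-reads the file. A minimal sketch, where requireFresh is a hypothetical helper name:

```javascript
// Hypothetical helper: drops the cached copy of a module (here, a .json file)
// and requires it again, so changes on disk are picked up.
function requireFresh(modulePath) {
  const resolved = require.resolve(modulePath);
  delete require.cache[resolved];
  return require(resolved);
}
```

Relative paths are resolved against the file that defines requireFresh, so passing an absolute path is the safer choice. For a file that changes continuously, reading it with fs and JSON.parse remains the simpler option.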
load-json-file module
If you are reading a large number of .json files (and if you are extremely lazy), it becomes annoying to write boilerplate code every time. You can save some characters by using the load-json-file module.
const loadJsonFile = require('load-json-file');
Asynchronous version
loadJsonFile('/path/to/file.json').then(json => {
// `json` contains the parsed object
});
Synchronous version
let obj = loadJsonFile.sync('/path/to/file.json');
Parsing JSON from streams
If the JSON content is streamed over the network, you need to use a streaming JSON parser. Otherwise it will tie up your processor and choke your event loop until JSON content is fully streamed.
There are plenty of packages available in NPM for this. Choose what's best for you.
Error Handling/Security
If you are unsure if whatever that is passed to JSON.parse() is valid JSON, make sure to enclose the call to JSON.parse() inside a try/catch block. A user provided JSON string could crash your application, and could even lead to security holes. Make sure error handling is done if you parse externally-provided JSON.
use the JSON object:
JSON.parse(str);
Another example of JSON.parse :
var fs = require('fs');
var file = __dirname + '/config.json';
fs.readFile(file, 'utf8', function (err, data) {
if (err) {
console.log('Error: ' + err);
return;
}
data = JSON.parse(data);
console.dir(data);
});
I'd like to mention that there are alternatives to the global JSON object.
JSON.parse and JSON.stringify are both synchronous, so if you want to deal with big objects you might want to check out some of the asynchronous JSON modules.
Have a look: https://github.com/joyent/node/wiki/Modules#wiki-parsers-json
Use the built-in fs module.
var fs = require("fs");
var file = JSON.parse(fs.readFileSync("./PATH/data.json", "utf8"));
For more info on the 'fs' module, refer to the documentation at http://nodejs.org/api/fs.html
Since you don't know that your string is actually valid, I would put it into a try/catch first. Also, since older versions of V8 did not optimize functions containing try/catch blocks, I would put the entire thing into its own function:
function tryParseJson(str) {
try {
return JSON.parse(str);
} catch (ex) {
return null;
}
}
OR in "async style"
function tryParseJson(str, callback) {
process.nextTick(function () {
try {
callback(null, JSON.parse(str));
} catch (ex) {
callback(ex)
}
})
}
Parsing a JSON stream? Use JSONStream.
var request = require('request')
, JSONStream = require('JSONStream')
, es = require('event-stream') // provides mapSync, used below
request({url: 'http://isaacs.couchone.com/registry/_all_docs'})
.pipe(JSONStream.parse('rows.*'))
.pipe(es.mapSync(function (data) {
return data
}))
https://github.com/dominictarr/JSONStream
Everybody here has talked about JSON.parse, so I thought of saying something else. There is a great module, Connect, with many middleware to make development of apps easier and better. One of the middleware is bodyParser. It parses JSON, HTML forms, etc. There is also a specific middleware for JSON parsing only.
Take a look at the links above, it might be really helpful to you.
JSON.parse("your string");
That's all.
as other answers here have mentioned, you probably want to either require a local json file that you know is safe and present, like a configuration file:
var objectFromRequire = require('path/to/my/config.json');
or to use the global JSON object to parse a string value into an object:
var stringContainingJson = '\"json that is obtained from somewhere\"';
var objectFromParse = JSON.parse(stringContainingJson);
note that when you require a file the content of that file is evaluated, which introduces a security risk in case it's not a json file but a js file.
here, i've published a demo where you can see both methods and play with them online (the parsing example is in app.js file - then click on the run button and see the result in the terminal):
http://staging1.codefresh.io/labs/api/env/json-parse-example
you can modify the code and see the impact...
Using JSON for your configuration with Node.js? Read this and get your configuration skills over 9000...
Note: People claiming that data = require('./data.json'); is a
security risk and downvoting people's answers with zealous zeal: You're exactly and completely wrong.
Try placing non-JSON in that file... Node will give you an error, exactly like it would if you did the same thing with the much slower and harder to code manual file read and then subsequent JSON.parse(). Please stop spreading misinformation; you're hurting the world, not helping. Node was designed to allow this; it is not a security risk!
Proper applications come in 3+ layers of configuration:
Server/Container config
Application config
(optional) Tenant/Community/Organization config
User config
Most developers treat their server and app config as if it can change. It can't. You can layer changes from higher layers on top of each other, but you're modifying base requirements. Some things need to exist! Make your config act like it's immutable, because some of it basically is, just like your source code.
Failing to see that lots of your stuff isn't going to change after startup leads to anti-patterns like littering your config loading with try/catch blocks, and pretending you can continue without your properly set up application. You can't. If you can, that belongs in the community/user config layer, not the server/app config layer. You're just doing it wrong. The optional stuff should be layered on top when the application finishes its bootstrap.
Stop banging your head against the wall: Your config should be ultra simple.
Take a look at how easy it is to set up something as complex as a protocol-agnostic and datasource-agnostic service framework using a simple json config file and simple app.js file...
container-config.json...
{
"service": {
"type" : "http",
"name" : "login",
"port" : 8085
},
"data": {
"type" : "mysql",
"host" : "localhost",
"user" : "notRoot",
"pass" : "oober1337",
"name" : "connect"
}
}
index.js... (the engine that powers everything)
var config = require('./container-config.json'); // Get our service configuration.
var data = require(config.data.type); // Load our data source plugin ('npm install mysql' for mysql).
var service = require(config.service.type); // Load our service plugin ('http' is built-in to node).
var processor = require('./app.js'); // Load our processor (the code you write).
var connection = data.createConnection({ host: config.data.host, user: config.data.user, password: config.data.pass, database: config.data.name });
var server = service.createServer(processor);
connection.connect();
server.listen(config.service.port, function() { console.log("%s service listening on port %s", config.service.type, config.service.port); });
app.js... (the code that powers your protocol-agnostic and data-source agnostic service)
module.exports = function(request, response){
response.end('Responding to: ' + request.url);
}
Using this pattern, you can now load community and user config stuff on top of your booted app, and dev ops is ready to shove your work into a container and scale it. You're ready for multitenant. Userland is isolated. You can now separate the concerns of which service protocol you're using, which database type you're using, and just focus on writing good code.
Because you're using layers, you can rely on a single source of truth for everything, at any time (the layered config object), and avoid error checks at every step, worrying about "oh crap, how am I going to make this work without proper config?!?".
If you need to parse JSON with Node.js in a secure way (aka: the user can input data, or a public API) I would suggest using secure-json-parse.
The usage is like the default JSON.parse but it will protect your code from:
prototype poisoning
and constructor abuse:
const badJson = '{ "a": 5, "b": 6, "__proto__": { "x": 7 }, "constructor": {"prototype": {"bar": "baz"} } }'
const infected = JSON.parse(badJson)
console.log(infected.x) // print undefined
const x = Object.assign({}, infected)
console.log(x.x) // print 7
const sjson = require('secure-json-parse')
console.log(sjson.parse(badJson)) // it will throw by default, you can ignore malicious data also
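To illustrate the idea without the library, JSON.parse accepts a reviver function that can drop suspicious keys during parsing. This is only a rough sketch, not how secure-json-parse is implemented; parseWithoutProtoKeys is a hypothetical name, and unlike the library's default it silently removes the keys instead of throwing.

```javascript
// Hypothetical helper: parses JSON and removes "__proto__" and "constructor"
// keys at every nesting level. Returning undefined from a reviver deletes
// the property.
function parseWithoutProtoKeys(text) {
  return JSON.parse(text, (key, value) =>
    key === "__proto__" || key === "constructor" ? undefined : value
  );
}

const badJson = '{ "a": 5, "b": 6, "__proto__": { "x": 7 }, "constructor": {"prototype": {"bar": "baz"} } }';
const cleaned = parseWithoutProtoKeys(badJson);
const copy = Object.assign({}, cleaned);
console.log(copy.x); // undefined, because the "__proto__" key was dropped
```

Dropping every key named "constructor" is deliberately blunt; data that legitimately uses that key would need a narrower check.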
Just to complete the answer (as I struggled with it for a while), I want to show how to access the JSON information. This example shows accessing a JSON array:
var request = require('request');
request('https://server/run?oper=get_groups_joined_by_user_id&user_id=5111298845048832', function (error, response, body) {
if (!error && response.statusCode == 200) {
var jsonArr = JSON.parse(body);
console.log(jsonArr);
console.log("group id:" + jsonArr[0].id);
}
})
Just to make this as complicated as possible, and bring in as many packages as possible...
const fs = require('fs');
const bluebird = require('bluebird');
const _ = require('lodash');
const readTextFile = _.partial(bluebird.promisify(fs.readFile), _, {encoding:'utf8',flag:'r'});
const readJsonFile = filename => readTextFile(filename).then(JSON.parse);
This lets you do:
var dataPromise = readJsonFile("foo.json");
dataPromise.then(console.log);
Or if you're using async/await:
let data = await readJsonFile("foo.json");
The advantage over just using readFileSync is that your Node server can process other requests while the file is being read off disk.
JSON.parse will not ensure safety of json string you are parsing. You should look at a library like json-safe-parse or a similar library.
From json-safe-parse npm page:
JSON.parse is great, but it has one serious flaw in the context of JavaScript: it allows you to override inherited properties. This can become an issue if you are parsing JSON from an untrusted source (eg: a user), and calling functions on it you would expect to exist.
Leverage Lodash's attempt function to return an error object, which you can handle with the isError function.
// Returns an error object on failure
function parseJSON(jsonString) {
return _.attempt(JSON.parse.bind(null, jsonString));
}
// Example Usage
var goodJson = '{"id":123}';
var badJson = '{id:123}';
var goodResult = parseJSON(goodJson);
var badResult = parseJSON(badJson);
if (_.isError(goodResult)) {
console.log('goodResult: handle error');
} else {
console.log('goodResult: continue processing');
}
// > goodResult: continue processing
if (_.isError(badResult)) {
console.log('badResult: handle error');
} else {
console.log('badResult: continue processing');
}
// > badResult: handle error
Always be sure to use JSON.parse in a try/catch block, as Node throws a SyntaxError if there is corrupted data in your JSON. Use this code instead of a bare JSON.parse:
try{
JSON.parse(data)
}
catch(e){
throw new Error("data is corrupted")
}
As mentioned in the above answers, we can use JSON.parse() to parse a JSON string into an object.
But before parsing, be sure you are parsing valid data, or else it might bring your whole application down.
It is safe to use it like this:
let parsedObj = {}
try {
  parsedObj = JSON.parse(data);
} catch(e) {
  console.log("Cannot parse because data is not in proper JSON format")
}
Use JSON.parse(str);. Read more about it here.
Here are some examples:
var jsonStr = '{"result":true, "count":42}';
var obj = JSON.parse(jsonStr);
console.log(obj.count); // expected output: 42
console.log(obj.result); // expected output: true
If you want to add some comments in your JSON and allow trailing commas, you might want to use the implementation below:
var fs = require('fs');
var data = parseJsData('./message.json');
console.log('[INFO] data:', data);
function parseJsData(filename) {
var json = fs.readFileSync(filename, 'utf8')
.replace(/\s*\/\/.+/g, '')
.replace(/,(\s*\})/g, '}')
;
return JSON.parse(json);
}
Note that it might not work well if you have something like "abc": "foo // bar" in your JSON. So YMMV.
If the JSON source file is pretty big, you may want to consider the asynchronous route via the native async / await approach with Node.js 8.0, as follows:
const fs = require('fs')
const fsReadFile = (fileName) => {
fileName = `${__dirname}/${fileName}`
return new Promise((resolve, reject) => {
fs.readFile(fileName, 'utf8', (error, data) => {
// reject only on a real error; an empty file is passed on and fails in JSON.parse
if (error) {
reject(error);
} else {
resolve(data);
}
});
})
}
async function parseJSON(fileName) {
try {
return JSON.parse(await fsReadFile(fileName));
} catch (err) {
return { Error: `Something has gone wrong: ${err}` };
}
}
parseJSON('veryBigFile.json')
.then(res => console.log(res))
.catch(err => console.log(err))
I use fs-extra. I like it a lot because -although it supports callbacks- it also supports Promises. So it just enables me to write my code in a much more readable way:
const fs = require('fs-extra');
fs.readJson("path/to/foo.json").then(obj => {
//Do some stuff with obj
})
.catch(err => {
console.error(err);
});
It also has many useful methods which do not come along with the standard fs module and, on top of that, it also bridges the methods from the native fs module and promisifies them.
NOTE: You can still use the native Node.js methods. They are promisified and copied over to fs-extra. See notes on fs.read() & fs.write()
So it's basically all advantages. I hope others find this useful.
You can use JSON.parse() (which is a built in function that will probably force you to wrap it with try-catch statements).
Or use some JSON parsing npm library, something like json-parse-or
Use this to be on the safe side
var data = JSON.parse(Buffer.concat(arr).toString());
NodeJs is a JavaScript based server, so you can do the way you do that in pure JavaScript...
Imagine you have this Json in NodeJs...
var details = '{ "name": "Alireza Dezfoolian", "netWorth": "$0" }';
var obj = JSON.parse(details);
And you can do above to get a parsed version of your json...
No further modules need to be required.
Just use
var parsedObj = JSON.parse(yourObj);
I don't think there are any security issues regarding this.
It's simple: you can convert an object to a JSON string using JSON.stringify(json_obj), and convert a JSON string back to an object using JSON.parse("your json string").
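A short round trip shows the two directions side by side. The object below is made-up sample data, used only for illustration.

```javascript
// Round trip: object -> JSON string -> object.
const original = { name: "John Doe", age: 42, tags: ["a", "b"] };
const asString = JSON.stringify(original);
const restored = JSON.parse(asString);

console.log(typeof asString); // "string"
console.log(restored.age); // 42
// The restored value is a new object with the same data, not the same reference.
console.log(restored === original); // false
```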
