How to combine these two codes, so it doesn't just covert csv to Json (first code), but also save this as an json array in an extra file?(second code)
this (first) code converts csv file to json array:
const fs = require("fs");
let fileReadStream = fs.createReadStream("myCsvFile.csv");
let invalidLineCount = 0;
const csvtojson = require("csvtojson");
csvtojson({ "delimiter": ";", "fork": true })
.preFileLine((fileLineString, lineIdx)=> {
let invalidLinePattern = /^['"].*[^"'];/;
if (invalidLinePattern.test(fileLineString)) {
console.log(`Line #${lineIdx + 1} is invalid, skipping:`, fileLineString);
fileLineString = "";
invalidLineCount++;
}
return fileLineString
})
.fromStream(fileReadStream)
.subscribe((dataObj) => {
console.log(dataObj);
// I added the second code hier, but it wirtes the last object of the array (because of the loop?)
}
});
and this (second) code saves the json array to an external file:
fs.writeFile('example.json', JSON.stringify(dataObj, null, 4);
The quistion is how to put the second codes into the first code (combine them)?
You can use .on('done',(error)=>{ ... }) method. (csvtojson). Push the data into a variable in subscribe method and write the data as JSON in .on('done'). (test was successful).
Check it out:
let fileReadStream = fs.createReadStream("username-password.csv");
let invalidLineCount = 0;
let data = []
csvtojson({ "delimiter": ";", "fork": true })
.preFileLine((fileLineString, lineIdx)=> {
let invalidLinePattern = /^['"].*[^"'];/;
if (invalidLinePattern.test(fileLineString)) {
console.log(`Line #${lineIdx + 1} is invalid, skipping:`, fileLineString);
fileLineString = "";
invalidLineCount++;
}
return fileLineString
})
.fromStream(fileReadStream)
.subscribe((dataObj) => {
// console.log(dataObj)
data.push(dataObj)
})
.on('done',(error)=>{
fs.writeFileSync('example.json', JSON.stringify(data, null, 4))
})
Not sure if you are able to change the library but I would definitely recommend Papaparse for this - https://www.npmjs.com/package/papaparse
Your code would then look something like this:
const fs = require('fs'), papa = require('papaparse');
var readFile = fs.createReadStream(file);
papa.parse(readFile, {
complete: function (results, file) {
fs.writeFile('example.json', JSON.stringifiy(results.data), function (err) {
if(err) console.log(err);
// callback etc
})
}
});
Related
I am currently trying to save an js object with some binary data and other values. The result should look something like this:
{
"value":"xyz",
"file1":"[FileContent]",
"file2":"[LargeFileContent]"
}
Till now I had no binary data so I saved everything in JSON. With the binary data I am starting to run into problems with large files (>1GB).
I tried this approach:
JSON.stringify or how to serialize binary data as base64 encoded JSON?
Which worked for smaller files with around 20MB. However if I am using these large files then the result of the FileReader is always an empty string.
The result would look like this:
{
"value":"xyz:,
"file1":"[FileContent]",
"file2":""
}
The code that is reading the blobs is pretty similar to the one in the other post:
const readFiles = async (measurements: FormData) => {
setFiles([]); //This is where the result is beeing stored
let promises: Array<Promise<string>> = [];
measurements.forEach((value) => {
let dataBlob = value as Blob;
console.log(dataBlob); //Everything is fine here
promises.push(
new Promise((resolve, reject) => {
const reader = new FileReader();
reader.readAsDataURL(dataBlob);
reader.onloadend = function () {
resolve(reader.result as string);
};
reader.onerror = function (error) {
reject(error);
};
})
);
});
let result = await Promise.all(promises);
console.log(result); //large file shows empty
setFiles(result);
};
Is there something else I can try?
Since you have to share the data with other computers, you will have to generate your own binary format.
Obviously you can make it as you wish, but given your simple case of just storing Blob objects with a JSON string, we can come up with a very simple schema where we first store some metadata about the Blobs we store, and then the JSON string where we replaced each Blob with an UUID.
This works because the limitation you hit is actually on the max length a string can be, and we can .slice() our binary file to read only part of it. Since we never read the binary data as string we're fine, the JSON will only hold a UUID in places where we had Blobs and it shouldn't grow too much.
Here is one such implementation I made quickly as a proof of concept:
/*
* Stores JSON data along with Blob objects in a binary file.
* Schema:
* 4 first bytes = # of blobs stored in the file
* next 4 * # of blobs = size of each Blob
* remaining = JSON string
*
*/
const hopefully_unique_id = "_blob_"; // <-- change that
function generateBinary(JSObject) {
let blobIndex = 0;
const blobsMap = new Map();
const JSONString = JSON.stringify(JSObject, (key, value) => {
if (value instanceof Blob) {
if (blobsMap.has(value)) {
return blobsMap.get(value);
}
blobsMap.set(value, hopefully_unique_id + (blobIndex++));
return hopefully_unique_id + blobIndex;
}
return value;
});
const blobsArr = [...blobsMap.keys()];
const data = [
new Uint32Array([blobsArr.length]),
...blobsArr.map((blob) => new Uint32Array([blob.size])),
...blobsArr,
JSONString
];
return new Blob(data);
}
async function readBinary(bin) {
const numberOfBlobs = new Uint32Array(await bin.slice(0, 4).arrayBuffer())[0];
let cursor = 4 * (numberOfBlobs + 1);
const blobSizes = new Uint32Array(await bin.slice(4, cursor).arrayBuffer())
const blobs = [];
for (let i = 0; i < numberOfBlobs; i++) {
const blobSize = blobSizes[i];
blobs.push(bin.slice(cursor, cursor += blobSize));
}
const pattern = new RegExp(`^${hopefully_unique_id}\\d+$`);
const JSObject = JSON.parse(
await bin.slice(cursor).text(),
(key, value) => {
if (typeof value !== "string" || !pattern.test(value)) {
return value;
}
const index = +value.replace(hopefully_unique_id, "") - 1;
return blobs[index];
}
);
return JSObject;
}
// demo usage
(async () => {
const obj = {
foo: "bar",
file1: new Blob(["Let's pretend I'm actually binary data"]),
// This one is 512MiB, which is bigger than the max string size in Chrome
// i.e it can't be stored in a JSON string in Chrome
file2: new Blob([Uint8Array.from({ length: 512*1024*1024 }, () => 255)]),
};
const bin = generateBinary(obj);
console.log("as binary", bin);
const back = await readBinary(bin);
console.log({back});
console.log("file1 read as text:", await back.file1.text());
})().catch(console.error);
I try to download, unzip and process a GTFS file in zip format. Downloading and unzipping are working, but I get error message when try to use txt files with gtfs-utils module in gtfsFunc(). Output is undefined. Delays are hardcoded just for testing purpose.
const dl = new DownloaderHelper('http://www.bkk.hu/gtfs/budapest_gtfs.zip', __dirname);
dl.on('end', () => console.log('Download Completed'))
dl.start();
myVar = setTimeout(zipFunc, 30000);
function zipFunc() {
console.log('Unzipping started...');
var zip = new AdmZip("./budapest_gtfs.zip");
var zipEntries = zip.getEntries();
zip.extractAllTo("./gtfsdata/", true);
}
myVar = setTimeout(gtfsFunc, 40000);
function gtfsFunc() {
console.log('Processing started...');
const readFile = name => readCsv('./gtfsdata/' + name + '.txt')
const filter = t => t.route_id === 'M4'
readStops(readFile, filter)
.then((stops) => {
const someStopId = Object.keys(stops)[0]
const someStop = stops[someStopId]
console.log(someStop)
})
}
As #ChickenSoups said, you are trying to filter stops files with route_id field and this txt doesnt have this field.
The fields that stops has are:
stop_id, stop_name, stop_lat, stop_lon, stop_code, location_type, parent_station, wheelchair_boarding, stop_direction
Perhaps what you need is read the Trips.txt file instead Stops.txt, as this file has route_id field.
And you can accomplish this using readTrips function:
const readTrips = require("gtfs-utils/read-trips");
And your gtfsFunc would be:
function gtfsFunc() {
console.log("Processing started...");
const readFile = (name) => {
return readCsv("./gtfsdata/" + name + ".txt").on("error", console.error);
};
//I used 5200 because your Trips.txt contains routes id with this value
const filterTrips = (t) => t.route_id === "5200";
readTrips(readFile, filterTrips).then((stops) => {
console.log("filtered stops", stops);
const someStopId = Object.keys(stops)[0];
const someStop = stops[someStopId];
console.log("someStop", someStop);
});
}
Or if what you really want is to read Stops.txt, you just need to change your filter
const filter = t => t.route_id === 'M4'
to use some valid field, for example:
const filter = t => t.stop_name=== 'M4'
Stop data don't have route_id field.
You should try other data, such as Trip or Route
You can look at the first row in your data file to see which field do they have.
GTFS data structure here
I have to parse a csv file of test cases exported from Zephyr for Jira. I need to format them in a way that I can import them into a new test management system. The new format requires test steps to be in one cell separated by a line break like so:
Cell formatting
Here is the script I have written to do the parsing. The steps and expected results are all added to the same cell but without the line breaks.
const fs = require("fs");
const Papa = require("papaparse");
// ******************* user defined values ******************* //
// file to parse
const file = process.argv[2];
// ******************* parse csv ******************* //
const parsedTests = [];
// create file object
const fileObject = fs.readFileSync(file, "utf8");
// set config
const config = {
header: true,
complete: function(results) {
return results;
}
};
// parse and save results
const results = Papa.parse(fileObject, config);
const arr = Object.values(results.data);
function addTests(parsedTests, test) {
const found = parsedTests.some(el => el.ExecutionId === test.ExecutionId);
if (!found) {
parsedTests.push(test);
}
return parsedTests;
}
function addSteps(parstedTests, test, el) {
const found = parsedTests.some(el => el.ExecutionId === test.ExecutionId);
if (found && test.OrderId > 1) {
test.Step += `${test.Step}\n\n`;
test.ExpectedResults += `${test.ExpectedResults}\n\n`;
}
}
arr.forEach(el => {
addTests(parsedTests, el);
});
arr.forEach(el => {
addSteps(parsedTests, el);
});
console.log(parsedTests)
// write back to csv
fs.writeFileSync("./parsedtests.csv", Papa.unparse(parsedTests));
I have a text file where each line is separated into 4 categories by colons and I want to put this into a JSON file, where each category is a value to the corresponding name in the JSON file.
Example data.txt file:
Date1:cat1:dog1:bug1
Date2:cat2:dog2:bug2
Date3:cat3:dog3:bug3
Example JSON file:
{
"Date1": {
"cat": "cat1",
"dog": "dog1",
"bug": "bug1"
},
"Date2": {
"cat": "cat2",
"dog": "dog2",
"bug": "bug2"
...
...
}
I've never used JSON before but I think that's how to format it. How would I sort each line using the colons as markers for the next value and store it in the JSON file with the correct name using JavaScript and Node.js?
Use the csv package if you don't want to handle parsing csv file by yourself.
const fs = require("fs");
const csv = require("csv");
const result = {};
const keys = ["cat", "dog", "bug"]
// Read data
const readStream = fs.createReadStream("yourfile.txt");
// Parser
const parser = csv.parse({ delimiter: ":" });
parser.on("data", (chunk) => {
result[chunk[0]] = {};
for(let i = 1; i < chunk.length; i ++) {
result[chunk[0]][keys[i - 1]] = chunk[i];
}
});
parser.on("end", () => {
console.log(result);
});
readStream.pipe(parser);
If your JSON has this defined structure you can go about it with the following code:
import * as fs from 'fs';
/* If you have a large file this is a bad Idea, refer to reading from a stream
* From zhangjinzhou's answer
*/
const file = fs.readFileSync('path/to/data.txt', 'utf8');
const json = file.split(/\n|\r\n/).map(line => {
const values = line.split(":");
let obj = {}
obj[values[0]] = {
cat: values[1],
dog: values[2],
bug: values[3],
};
return obj
}).reduce((acc, current) => Object.assign(acc, current), {})
Using RegExp and Array#forEach, convert the string to lines, then iterate over them and fill up the object with the corresponding data via:
const dataFileContent =
`Date1:cat1:dog1:bug1
Date2:cat2:dog2:bug2
Date3:cat3:dog3:bug3`;
function processData(data) {
// convert to lines
const lines = data.match(/[^\r\n]+/g) || [];
const object = {};
// iterate over the lines
lines.forEach(line => {
const parts = line.split(':');
const main = parts.shift();
const pattern = /^(.*?)(\d+)$/;
// create an object for each main part
object[main] = {};
// fill each main part with the sub parts
parts.forEach(part => {
const match = part.match(pattern) || [];
const key = match[1];
const value = match[2];
if (match) {
object[main][key] = key + value;
}
});
});
return object;
}
const processedData = processData(dataFileContent);
console.log(processedData);
Then convert the processedData to JSON by using JSON.stringify and save it to a file via:
const fs = require('fs');
...
// processData
...
const json = JSON.stringify(processedData);
fs.writeFile('my_json_file.json', json, 'utf8');
For larger files, consider using Streams in Node.js as suggested by #zhangjinzhou.
I have a txt file contains:
{"date":"2013/06/26","statement":"insert","nombre":1}
{"date":"2013/06/26","statement":"insert","nombre":1}
{"date":"2013/06/26","statement":"select","nombre":4}
how I can convert the contents of the text file as array such as:
statement = [
{"date":"2013/06/26","statement":"insert","nombre":1},
{"date":"2013/06/26","statement":"insert","nombre":1},
{"date":"2013/06/26","statement":"select","nombre":4}, ];
I use the fs module node js. Thanks
Sorry
I will explain more detailed:
I have an array :
st = [
{"date":"2013/06/26","statement":"insert","nombre":1},
{"date":"2013/06/26","statement":"insert","nombre":5},
{"date":"2013/06/26","statement":"select","nombre":4},
];
if I use this code :
var arr = new LINQ(st)
.OrderBy(function(x) {return x.nombre;})
.Select(function(x) {return x.statement;})
.ToArray();
I get the result I want.
insert select insert
but the problem my data is in a text file.
any suggestion and thanks again.
There is no reason for not to do your file parser yourself. This will work on any size of a file:
var fs = require('fs');
var fileStream = fs.createReadStream('file.txt');
var data = "";
fileStream.on('readable', function() {
//this functions reads chunks of data and emits newLine event when \n is found
data += fileStream.read();
while( data.indexOf('\n') >= 0 ){
fileStream.emit('newLine', data.substring(0,data.indexOf('\n')));
data = data.substring(data.indexOf('\n')+1);
}
});
fileStream.on('end', function() {
//this functions sends to newLine event the last chunk of data and tells it
//that the file has ended
fileStream.emit('newLine', data , true);
});
var statement = [];
fileStream.on('newLine',function(line_of_text, end_of_file){
//this is the code where you handle each line
// line_of_text = string which contains one line
// end_of_file = true if the end of file has been reached
statement.push( JSON.parse(line_of_text) );
if(end_of_file){
console.dir(statement);
//here you have your statement object ready
}
});
If it's a small file, you might get away with something like this:
// specifying the encoding means you don't have to do `.toString()`
var arrayOfThings = fs.readFileSync("./file", "utf8").trim().split(/[\r\n]+/g).map(function(line) {
// this try/catch will make it so we just return null
// for any lines that don't parse successfully, instead
// of throwing an error.
try {
return JSON.parse(line);
} catch (e) {
return null;
}
// this .filter() removes anything that didn't parse correctly
}).filter(function(object) {
return !!object;
});
If it's larger, you might want to consider reading it in line-by-line using any one of the many modules on npm for consuming lines from a stream.
Wanna see how to do it with streams? Let's see how we do it with streams. This isn't a practical example, but it's fun anyway!
var stream = require("stream"),
fs = require("fs");
var LineReader = function LineReader(options) {
options = options || {};
options.objectMode = true;
stream.Transform.call(this, options);
this._buffer = "";
};
LineReader.prototype = Object.create(stream.Transform.prototype, {constructor: {value: LineReader}});
LineReader.prototype._transform = function _transform(input, encoding, done) {
if (Buffer.isBuffer(input)) {
input = input.toString("utf8");
}
this._buffer += input;
var lines = this._buffer.split(/[\r\n]+/);
this._buffer = lines.pop();
for (var i=0;i<lines.length;++i) {
this.push(lines[i]);
}
return done();
};
LineReader.prototype._flush = function _flush(done) {
if (this._buffer.length) {
this.push(this._buffer);
}
return done();
};
var JSONParser = function JSONParser(options) {
options = options || {};
options.objectMode = true;
stream.Transform.call(this, options);
};
JSONParser.prototype = Object.create(stream.Transform.prototype, {constructor: {value: JSONParser}});
JSONParser.prototype._transform = function _transform(input, encoding, done) {
try {
input = JSON.parse(input);
} catch (e) {
return done(e);
}
this.push(input);
return done();
};
var Collector = function Collector(options) {
options = options || {};
options.objectMode = true;
stream.Transform.call(this, options);
this._entries = [];
};
Collector.prototype = Object.create(stream.Transform.prototype, {constructor: {value: Collector}});
Collector.prototype._transform = function _transform(input, encoding, done) {
this._entries.push(input);
return done();
};
Collector.prototype._flush = function _flush(done) {
this.push(this._entries);
return done();
};
fs.createReadStream("./file").pipe(new LineReader()).pipe(new JSONParser()).pipe(new Collector()).on("readable", function() {
var results = this.read();
console.log(results);
});
fs.readFileSync("myfile.txt").toString().split(/[\r\n]/)
This gets your each line as a string
You can then use UnderscoreJS or your own for loop to apply the JSON.parse("your json string") method to each element of the array.
var arr = fs.readFileSync('mytxtfile', 'utf-8').split('\n')
I think this is the simplest way of creating an array from your text file