CSV parser documentation - JavaScript

I've been using fast-csv as my CSV parsing library for a while now. A problem emerged when a client uploaded a CSV file that used ';' as the delimiter instead of the default ','. The NPM documentation explicitly says that all methods should accept an 'option' (don't understand why not an object) to actually switch these flags. Of course I can always go into the source js file and change the delimiter manually, but I'd really like to understand this documentation, since it's all part of my growing as a developer; still, I can't grasp how to actually use these options (delimiter) when parsing in my code. If none of you can understand it either, maybe you have some recommendations for CSV parsers in JavaScript? Or would a hand-written script be more versatile and useful?
Documentation sample from (fast-csv on npm):
All methods accept the following options
objectMode=true: Ensure that data events have an object emitted rather than the stringified version. Set to false to have a stringified buffer.
headers=false: Set to true if you expect the first line of your CSV to contain headers; alternately you can specify an array of headers to use. You can also specify a sparse array to omit some of the columns.
ignoreEmpty=false: If you wish to ignore empty rows.
discardUnmappedColumns=false: If you want to discard columns that do not map to a header.
strictColumnHandling=false: If you want to consider empty lines/lines with too few fields as errors - Only to be used with headers=true
renameHeaders=false: If you want the first line of the file to be removed and replaced by the one provided in the headers option - Only to be used with headers=[String]
delimiter=',': If your data uses an alternate delimiter such as ; or \t.
Also, here is sample code of how it works and how I use it (with pipe):
var stream = fs.createReadStream("my.csv");

var csvStream = csv()
    .on("data", function (data) {
        console.log(data);
    })
    .on("end", function () {
        console.log("done");
    });

stream.pipe(csvStream);

// or

var csvStream = csv
    .parse()
    .on("data", function (data) {
        console.log(data);
    })
    .on("end", function () {
        console.log("done");
    });

stream.pipe(csvStream);
PS: I have tried asking elsewhere (where the package is published), but had no replies.

The NPM documentation explicitly says that all methods should accept
an 'option' (don't understand why not an object) to actually switch
these flags
The quoted text basically means that all methods accept a so-called options object as their last parameter. You can specify an alternate delimiter by setting the corresponding field in that object.
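Applied to your own pipe-based snippet, that could look like this (a minimal sketch; the only change is the options object passed to parse, using the delimiter option from the docs you quoted):

var fs = require("fs");
var csv = require("fast-csv");

var stream = fs.createReadStream("my.csv");

// Pass the options object as the argument; here we switch
// the delimiter from the default ',' to ';'
var csvStream = csv
    .parse({ delimiter: ";" })
    .on("data", function (data) {
        console.log(data);
    })
    .on("end", function () {
        console.log("done");
    });

stream.pipe(csvStream);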
but I'd really like to understand this documentation, since it's all part of my growing as a developer
I strongly recommend looking at the tests whenever you feel something's not clearly explained in the docs. There's actually a test case for the exact scenario you're describing:
it.should("support semicolon delimiters", function (next) {
var actual = [];
csv
.fromPath(path.resolve(__dirname, "./assets/test16.txt"), {headers: true, delimiter: ";"})
.on("data", function (data) {
actual.push(data);
})
.on("error", next)
.on("end", function (count) {
assert.deepEqual(actual, expected14);
assert.equal(count, actual.length);
next();
});
});

Related

Fetching and processing data with semicolon delimiters and no headers

I have had some trouble understanding the D3.js fetch documentation:
My data source is:
20180601 000000;1.168200;1.168240;1.168140;1.168230;0;
20180601 000100;1.168220;1.168230;1.168190;1.168190;0;
20180601 000200;1.168180;1.168180;1.168080;1.168120;0;
20180601 000300;1.168130;1.168160;1.168130;1.168140;0;
where the format is:
%Y%m%d %H%M%S;number1;number2;number3;number4;number5;
My difficulties are:
Adding headers to the data
Dealing with semicolons as the delimiter instead of commas
1) From what I can work out I need to read the file without parsing it, then join a text string to the beginning of the file then finally parse the data.
d3.text(data.csv, function(error, textString){});
var headers = ["date","time","data1","data2"].join("\t");
d3.csv.parse(headers + textString);
2) I can use the dsv format and set the delimiter to semicolons?
d3.dsv(";", "text/plain")
The rough code I ended up with is:
var time_parse = d3.timeParse( '%Y%m%d %H%M%S');
var time_format = d3.timeFormat('%H%M');
d3.text(data.csv, function(error, textString){
var headers = ["time;number1;number2;number3;number4;number5;"].join("\t")
d3.csv.parse(headers + textString)
d3.dsv(";", "text/plain")
data.forEach(function(e,i){
data[i].time = time_parse(e.date);
})
})
Ideally I want the data to look like this when logged:
Time, Number1, Number2, Number3, Number4, Number5
00:00, 1.168200, 1.168240, 1.168140, 1.168230, 0
etc
What is the flaw in my thinking and can anyone offer advice on how to solve my problem and similar problems in the future?
Note: I am new to JavaScript and d3. Although I have been able to work through most of the documentation involving drawing SVGs, creating axes and scales, transitions, etc. with no problems, I am struggling to get my head around actually getting data from real sources (e.g. the internet) and processing it into something workable. Please heavily critique anything I have said and offer advice; I want to learn.
It's not clear what version of d3 you are using: you reference the fetch API, but some of the code in your question looks like d3v3 and v4 (which could be the problem), and those versions don't use the fetch API. In any event, I'll go through v5, but also versions 4 and 3.
In all of these your thoughts look pretty close based on the code blocks you have. We need to:
read in the dsv as text,
add headers (with an end-of-line \n),
and run everything through a dsv format function that will use a ; as a delimiter.
There's no need for d3.csv.parse, though, as in your question's code block.
In all the below I drop the date formatting for simplicity (oops, left it in the v5 demo).
Because d3v5 uses the d3-fetch module, its approach is a bit different from that of d3v3/v4 (which are more closely related to each other, in that they both use the d3-request module; otherwise there's a fair bit of difference).
d3-fetch: d3v5
With d3v5, using the d3-fetch module the process could look like:
var dsv = d3.dsvFormat(";");
var headers = ["time;number1;number2;number3;number4;number5;\n"]

d3.text("dsv.dsv").then(function(text) {
    var data = dsv.parse(headers + text);
    console.log(data);
    console.log(data.columns);
})
d3-request: d3v4
There's a bit more flexibility with d3v4 for this.
If we look at the API docs, we see that d3.csv is equivalent to:
d3.request(url)
    .mimeType("text/csv")
    .response(function(xhr) { return d3.csvParse(xhr.responseText, row); });
So if we create a new format with d3.dsvFormat, we can run the content through that format and get our data. We can also tack on the headers in this process, all in one step:
var dsv = d3.dsvFormat(";");
var headers = ["time;number1;number2;number3;number4;number5;\n"]

d3.request("dsv.dsv")
    .mimeType("text/plain")
    .response(function(data) { return dsv.parse(headers + data.response) })
    .get(function(data) {
        // use data here:
        console.log(data);
        console.log(data.columns);
    });
This might be the more atypical approach, so we could emulate the way I did it with v5 above:
var psv = d3.dsvFormat(";");
var headers = ["time;number1;number2;number3;number4;number5;\n"]

d3.text("dsv.dsv", function(error, text) {
    // d3.text hands the callback the response text directly
    var data = psv.parse(headers + text);
    console.log(data);
    console.log(data.columns);
})
d3-request: d3v3
As with the second option in d3v4 above and d3v5, we can parse in text and then run it through the dsv format function (here we only need to account for changes in d3 namespace between v3/v4):
var dsv = d3.dsv(";", "text/plain");
var headers = ["time;number1;number2;number3;number4;number5;\n"]

d3.text("dsv.dsv", function(error, text) {
    var data = dsv.parse(headers + text);
    console.log(data);
    // console.log(data.columns) // -> no columns property in v3
})
Note
The ; at the end of each row will create an empty column, as a value is expected after it before the next row.
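A quick way to see the effect (a minimal sketch using the same dsvFormat parser as above):

var dsv = d3.dsvFormat(";");

// The trailing ';' in both the header line and the data row
// yields an extra column whose name is the empty string:
var rows = dsv.parse("a;b;\n1;2;\n");
console.log(rows.columns); // ["a", "b", ""]
console.log(rows[0]);      // {a: "1", b: "2", "": ""}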

Need Help to implement Tincan Javascript API

I'm working with the TinCan JavaScript API. The issue is that my data is in a completely different format, while TinCan specifies a particular way to pass data along with each call. Help me adjust my data to the TinCan API format. Here is sample data from one of my calls:
var data = {
    "groupId": "groupId",
    "groupName": "gNameEncrypt",
    "tutorNames": "tutorNames",
    "actorNames": "actorNames",
    "otherNames": "otherNames"
};
Currently I simply decode this data and send it like this:
var actionList = new TinCan({
    recordStores: [{
        endpoint: "http://example.com",
        username: username,
        password: password,
        allowFail: false
    }]
});

var action = new TinCan.Agent({
    "name": "insert"
});

actionList.getStatements({
    'params': {
        'agent': action,
        'verb': {
            'id': $.base64.encode(data)
        }
    },
    'callback': function (err, data) {
        console.info(data.more);
        var urlref = "http://<?php echo $_SERVER['SERVER_NAME'] . ":" . $_SERVER['SERVER_PORT'] . $uriParts[0] . "?" ?>t=" + data.more.TutorToken;
        window.location.href = urlref;
    }
});
crypt.finish();
});
There are really two parts here:
need to get data into an xAPI (formerly Tin Can) format, and
the code itself.
In depth:
I think you need to take another look at how xAPI is used in general. Data is stored as a JSON "Statement" object that has 3 required properties and various other optional ones. These properties often contain complex objects that are very extensible. It is hard to tell from what you've shown what you are really trying to capture and what the best approach would be. I suggest reading some material about the xAPI statement format. http://experienceapi.com/statements-101/ is a good starting point, and to get at least some coverage of all the possibilities continue with http://experienceapi.com/statements/ .
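For orientation, the three required properties are the actor, the verb, and the object, so a minimal statement looks something like this (the mbox, name, and IDs below are placeholders, not values from your data):

{
    "actor": {
        "mbox": "mailto:learner@example.com",
        "name": "Example Learner"
    },
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/completed",
        "display": { "en-US": "completed" }
    },
    "object": {
        "id": "http://example.com/activities/group-session"
    }
}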
The code you've listed is attempting to get already stored statements based on two parameters rather than trying to store a statement, the two parameters being "agent" and "verb". In this case we can't tell what the verb is supposed to be, since we don't know what data contains; I suspect it isn't going to make sense as a verb, which is intended to be the action of a statement. Having said that, the fact that the agent is given a name of "insert" is questionable, as that really sounds more like what a "verb" should contain. Getting the statements right as part of #1 should make it obvious how you would retrieve them. As far as storing statements goes, if you're using the TinCan interface object you would need to use its sendStatement method. But this interface is no longer recommended; the recommended practice is to construct a TinCan.LRS object and interact directly with it, in which case you'd use the saveStatement method.
I would recommend looking at the "Basic Usage" section of the project home page here: http://rusticisoftware.github.io/TinCanJS/ for more specifics look at the API doc: http://rusticisoftware.github.io/TinCanJS/doc/api/latest/
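The TinCan.LRS pattern from the "Basic Usage" docs looks roughly like this (a sketch; the endpoint, credentials, and statement values are placeholders):

var lrs = new TinCan.LRS({
    endpoint: "https://example.com/lrs/",
    username: "username",
    password: "password",
    allowFail: false
});

var statement = new TinCan.Statement({
    actor: { mbox: "mailto:learner@example.com" },
    verb: { id: "http://adlnet.gov/expapi/verbs/completed" },
    target: { id: "http://example.com/activities/group-session" }
});

lrs.saveStatement(statement, {
    callback: function (err, xhr) {
        if (err !== null) {
            console.log("Failed to save statement: ", err);
            return;
        }
        console.log("Statement saved");
    }
});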

How to load multiple files with Queue.js and D3.js?

Situation
I am trying to load multiple xml files (located on a server) without having to hard-code their names. For this I am trying to use the d3.queue library: https://github.com/d3/d3-queue.
I have adapted the xml-to-force-layout example to my own needs (https://bl.ocks.org/mbostock/1080941), but there is one crucial flaw: I need to manually type in the name of the xml file that I want to load...
Reproduce
Given (adjusted example from http://learnjsdata.com/read_data.html) :
queue()
    .defer(d3.xml, "/mappings/Customer.hbm.xml")
    .defer(d3.xml, "/mappings/Actor.hbm.xml")
    .await(analyze);

function analyze(error, Customer, Actor) {
    if (error) { console.log(error); }
    // do stuff with Customer data, do stuff with Actor data
}
And given my implementation of the processing of an xml:
d3.xml("mappings/Customer.hbm.xml","application/xml", function(error,xml){
if (error) throw error;
// do stuff with the data retrieved from Customer.hbm.xml
});
Question
How do I combine the above two snippets in such a way that I don't have to hard-code the locations of the xml files and can pass all the parameters to the analyze function? Any nudge in the right direction would be much appreciated.
In pseudocode I have tried something like the following (but I can't get it to work):
a function to get all the names of the xmls from the mappings folder (probably with node.js's fs.readdir or fs.readdirSync methods, but I am unsure of how that would work exactly)
for each xml: .defer(d3.xml, nameofxml)
pass all the found names as parameters to the analyze function
In Java I would have done this with varargs, but I don't know how to do it in JS.
There are really two parts to this question:
How do I get a list of server-side files to client-side JavaScript?
The short answer is you don't, unless you have a server-side API that can return that list. Depending on what backend you are using, you write a method that returns a JSON array of the files in your target directory. You call this first, get the response, and then process them all with queue:
d3.json('/get/list/of/xml/files', function(error, fileArray){
    var q = d3.queue();
    fileArray.forEach(function(d){
        q = q.defer(d3.xml, d);
    });
    q.await(analyze);
});
How do I process a variable number of arguments in JavaScript?
This is actually very well supported in JavaScript.
function analyze(error) {
    if (error) { console.log(error); }
    // skip index 0, it's the error variable
    for (var i = 1; i < arguments.length; i++) {
        var xml = arguments[i];
        ...
    }
}
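For what it's worth, in newer JavaScript (ES2015+) rest parameters express the same idea more directly; a brief sketch:

function analyze(error, ...documents) {
    if (error) { console.log(error); }
    // documents is a real array of all arguments after error
    documents.forEach(function (xml) {
        // process each xml document here
    });
}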

Sanitizing JSON data for usage as JavaScript object

I'm going to be dynamically generating a JSON file which is then passed to SCEditor as the emoticons object; this data will come from the database, so essentially it should be safe, but one can never be 100% sure.
This is how it is being called:
// Create var to store emoticons
var emoticons = false;

$.getJSON('../../images/emoticons/default/emoticons.json')
    .done(function(response) {
        emoticons = response;
        console.log(emoticons);
    })
    .always(function() {
        // always initialize sceditor
        $(".sceditor").sceditor({
            // Other options.....
            plugins: "bbcode",
            emoticons: emoticons,
        });
    });
An example of the JSON file would look like:
{
    "dropdown": {
        ":)": "smile.png",
        ":angel:": "angel.png",
        ":angry:": "angry.png",
        "8-)": "cool.png",
        ":'(": "cwy.png"
    }
}
So the emoticon code and filename are pulled from the database. Is there anything I need to do here other than escape double quotes? Whilst this data will be coming from the database, it's possible the codes/filenames will be provided by the user.
When I store them in the database I will be stripping tags with PHP's strip_tags function.
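For reference, this is roughly how I'd generate the file from the database rows; note that a real serializer such as JSON.stringify (or PHP's json_encode) escapes embedded double quotes by itself (the row data below is hypothetical):

// Hypothetical rows as they might come back from the database
var rows = [
    { code: ':")', filename: 'smile.png' },
    { code: ':angel:', filename: 'angel.png' }
];

var dropdown = {};
rows.forEach(function (row) {
    dropdown[row.code] = row.filename;
});

// JSON.stringify escapes the embedded double quote in ':")'
// automatically, so no manual escaping is needed
console.log(JSON.stringify({ dropdown: dropdown }, null, 4));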
I wanted to avoid turning the data into HTML entities, as that doesn't seem to play nice with the editor: if you set the code as :") it won't turn the emoticon into a smiley within the editor; it will literally output :") rather than show the smiley.
Edit: To see an example of how the code is used, check out the SCEditor demo. The only difference is that the demo uses the default codes provided within the JS file itself, while mine will be provided via a JSON file passed as an option.
What are my best options here?

Transforming JSON in a node stream with a map or template

I'm relatively new to JavaScript and Node, and I like to learn by doing, but my lack of awareness of JavaScript design patterns makes me wary of trying to reinvent the wheel. I'd like to know from the community if what I want to do already exists in some form or another. I'm not looking for specific code for the example below, just a nudge in the right direction and what I should be searching for.
I basically want to create my own private IFTTT/Zapier for plugging data from one API to another.
I'm using the node module request to GET data from one API and then POST to another.
request supports streaming to do neat things like this:
request.get('http://example.com/api')
    .pipe(request.put('http://example.com/api2'));
In between those two requests, I'd like to pipe the JSON through a transform, cherry-picking the key/value pairs that I need and changing the keys to what the destination API is expecting.
request.get('http://example.com/api')
    .pipe(apiToApi2Map)
    .pipe(request.put('http://example.com/api2'));
Here's a JSON sample from the source API: http://pastebin.com/iKYTJCYk
And this is what I'd like to send forward: http://pastebin.com/133RhSJT
The transformed JSON in this case takes its keys from the value of each object's "attribute" key and its values from each object's "value" key.
So my questions:
Is there a framework, library or module that will make the transform step easier?
Is streaming the way I should be approaching this? It seems like an elegant way to do it; as I've created some JavaScript wrapper functions with request to easily access API methods, I just need to figure out the middle step.
Would it be possible to create "templates" or "maps" for these transforms? Say I want to change the source or destination API; it would be nice to create a new file that maps the source to the destination key/values required.
Hope the community can help and I'm open to any and all suggestions! :)
This is an Open Source project I'm working on, so if anyone would like to get involved, just get in touch.
Yes, you're definitely on the right track. There are two stream libs I would point you towards: through, which makes it easier to define your own streams, and JSONStream, which helps to convert a binary stream (like what you get from request.get) into a stream of parsed JSON documents. Here's an example using both of those to get you started:
var through = require('through');
var request = require('request');
var JSONStream = require('JSONStream');
var _ = require('underscore');

// Our function(doc) here will get called to handle each
// incoming document in the attributes array of the JSON stream
var transformer = through(function(doc) {
    var steps = _.findWhere(doc.items, {
        label: "Steps"
    });
    var activeMinutes = _.findWhere(doc.items, {
        label: "Active minutes"
    });
    var stepsGoal = _.findWhere(doc.items, {
        label: "Steps goal"
    });
    // Push the transformed document into the outgoing stream
    this.queue({
        steps: steps.value,
        activeMinutes: activeMinutes.value,
        stepsGoal: stepsGoal.value
    });
});

request
    .get('http://example.com/api')
    // The attributes.* here will split the JSON stream into chunks
    // where each chunk is an element of the array
    .pipe(JSONStream.parse('attributes.*'))
    .pipe(transformer)
    .pipe(request.put('http://example.com/api2'));
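To address the "templates or maps" part of the question: a transform like the one above can be driven by a plain mapping object, so switching APIs only means swapping out the map. A sketch along those lines (the labels and keys are taken from the transform above; the map itself is hypothetical):

// A hypothetical map from source labels to destination keys;
// this could live in its own file per source/destination pair
var fieldMap = {
    "Steps": "steps",
    "Active minutes": "activeMinutes",
    "Steps goal": "stepsGoal"
};

var transformer = through(function (doc) {
    var out = {};
    doc.items.forEach(function (item) {
        var destKey = fieldMap[item.label];
        if (destKey) {
            out[destKey] = item.value;
        }
    });
    // Push the remapped document into the outgoing stream
    this.queue(out);
});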
As Andrew pointed out, there's through or event-stream; however, I made something even easier to use: scramjet. It works the same way as through, but its API is nearly identical to that of Arrays, so you can use the map and filter methods easily.
The code for your example would be:
DataStream
.pipeline(
request.get('http://example.com/api'),
JSONStream.parse('attributes.items.*')
)
.filter((item) => item.attibute) // filter out ones without attribute
.reduce((acc, item) => {
acc[item.attribute] = item.value;
return acc;
.then((result) => request.put('http://example.com/api2', result))
;
I guess this is a little easier to use. However, in this example you do accumulate the data into an object, so if the JSONs are actually much longer than this, you may want to turn the result back into a JSONStream again.
