Remove unwanted columns from CSV file using Papaparse - javascript

I have a situation where a user can upload a CSV file. This CSV file contains a lot of data, but I am only interested in two columns (ID and Date). At the moment, I am parsing the CSV using Papaparse:
Papa.parse(ev.data, {
    delimiter: "",
    newline: "",
    quoteChar: '"',
    header: true,
    error: function(err, file, inputElem, reason) { },
    complete: function (results) {
        this.parsed_csv = results.data;
    }
});
When this runs, this.parsed_csv holds the rows as objects keyed by field name. So if I JSON.stringify it, the output is something like this:
[
    {
        "ID": 123456,
        "Date": "2012-01-01",
        "Irrelevant_Column_1": 123,
        "Irrelevant_Column_2": 234,
        "Irrelevant_Column_3": 345,
        "Irrelevant_Column_4": 456
    },
    ...
]
So my main question is: how can I get rid of the columns I don't need and produce a new CSV containing only the ID and Date columns?
Thanks
One thing I realised: is there a way to use dynamic variables? For instance, I am letting users select the columns they want to map. Now I need to do something like this:
let ID = this.selectedIdCol;
this.parsed_csv = results.data.map(element => ({ID: element.ID, Date: element.Date}));
It says that ID is unused, however. Thanks
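A minimal sketch of one way around that, using computed property names so the selected column name becomes the key (this.selectedDateCol is hypothetical here, mirroring this.selectedIdCol):
// The user's selected column names (selectedDateCol is an assumed counterpart).
const idCol = this.selectedIdCol;
const dateCol = this.selectedDateCol;

// Computed property names ([idCol]) make the output keys follow the selection.
this.parsed_csv = results.data.map(element => ({
    [idCol]: element[idCol],
    [dateCol]: element[dateCol]
}));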

let data = [
    {
        "ID": 123456,
        "Date": "2012-01-01",
        "Irrelevant_Column_1": 123,
        "Irrelevant_Column_2": 234,
        "Irrelevant_Column_3": 345,
        "Irrelevant_Column_4": 456
    },
    ...
]
you can produce the results by using the following code:
data = data.map(element => ({ID: element.ID, Date: element.Date}))
Now that you have the desired columns, you can generate a new CSV from them.
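For that last step, Papa Parse also provides an unparse function; a minimal sketch, assuming you want to offer the result as a browser download:
// Turn the reduced row objects back into CSV text.
const csv = Papa.unparse(data);

// One common way to offer it as a download in the browser.
const blob = new Blob([csv], { type: "text/csv;charset=utf-8;" });
const link = document.createElement("a");
link.href = URL.createObjectURL(blob);
link.download = "filtered.csv"; // hypothetical file name
link.click();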

As Serrurier pointed out above, you should use the step/chunk function to alter the data rather than mapping after the parse, since by that point the whole data set is already in memory.
// Requires lodash for _.pick
PapaParse.parse(file, {
    skipEmptyLines: true,
    header: true,
    step: (results, parser) => {
        results.data = _.pick(results.data, ['column1', 'column2']);
        return results;
    }
});

Note that if you are loading a huge file, you will have the whole file in memory right after parsing. Moreover, it may freeze the browser due to the heavy workload. You can avoid that by reading and discarding columns:
row by row
chunk by chunk.
You should read Papaparse's FAQ before implementing that. To sum up, you will store required columns by extracting them from the step or chunk callbacks.
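A rough sketch of the row-by-row variant, keeping only the ID and Date fields from the question and discarding the rest of each row as it is read (note that with a step callback Papa Parse does not accumulate results.data for you):
// Accumulate only the needed fields so the full parsed file never sits in memory.
// "file" stands for the File object or CSV string you already pass to Papa.parse.
const kept = [];
Papa.parse(file, {
    header: true,
    skipEmptyLines: true,
    step: function (row) {
        // Depending on the Papa Parse version, row.data is either the row object
        // itself or a one-element array of row objects.
        const record = Array.isArray(row.data) ? row.data[0] : row.data;
        kept.push({ ID: record.ID, Date: record.Date });
    },
    complete: function () {
        console.log("Rows kept:", kept.length);
    }
});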


Struggling to show data from JavaScript in HTML

Note: I am new to JavaScript and HTML, so there are a lot of things I do not quite understand yet.
I am trying to make a web application that uses a bus-schedule API to show the bus times around my apartment. I have managed to retrieve the data, but I struggle to display that data in the HTML. (Edit: it is possibly an SDK, I don't really know the difference.)
// this is the function write_bus() in my javascript file departures.js:
function write_bus() {
    document.getElementById("bustime-kong").innerHTML = data;
}
// 'data' is a variable in my JavaScript that contains the information I fetched from the client, and is what I am trying to show. It is in the following format:
const data = {
    aimedArrivalTime: '2022-01-06T12:36:00+0100',
    aimedDepartureTime: '2022-01-06T12:36:00+0100',
    cancellation: false,
    date: '2022-01-06',
    destinationDisplay: {
        frontText: 'xxx'
    },
    expectedDepartureTime: '2022-01-06T12:38:12+0100',
    expectedArrivalTime: '2022-01-06T12:37:32+0100',
    forAlighting: true,
    forBoarding: true,
    notices: [],
    predictionInaccurate: false,
    quay: {
        id: 'xxx',
        name: 'xxx',
        publicCode: 'P2',
        situations: [],
        stopPlace: [Object]
    }
}
<script src="departures.js"></script>
<input type="button" onclick="write_bus()" value="Busstider"> <br>
<div id="bustime-kong"></div>
(In order to save some space I removed 2/3 of the data.)
I appreciate every bit of help I can get!
You can't insert raw objects like that as DOM text. You have to use specific properties, but if you really want to insert the object like that, use the JSON.stringify() method.
document.getElementById("bustime-kong").innerHTML = JSON.stringify(data);
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
BTW, it is good practice to put the <script> tag at the end of the body, to make sure the HTML is loaded before running any JavaScript.
You can create a function that loops through the JSON and prints it in table form, like below.
// this is the function write_bus() in my javascript file departures.js:
function write_bus() {
    document.getElementById("bustime-kong").innerHTML = createTable(data);
}

// 'data' is a variable in my JavaScript that contains the information I fetched from the client, and is what I am trying to show. It is in the following format:
const data = {
    aimedArrivalTime: '2022-01-06T12:36:00+0100',
    aimedDepartureTime: '2022-01-06T12:36:00+0100',
    cancellation: false,
    date: '2022-01-06',
    destinationDisplay: {
        frontText: 'xxx'
    },
    expectedDepartureTime: '2022-01-06T12:38:12+0100',
    expectedArrivalTime: '2022-01-06T12:37:32+0100',
    forAlighting: true,
    forBoarding: true,
    notices: [],
    predictionInaccurate: false,
    quay: {
        id: 'xxx',
        name: 'xxx',
        publicCode: 'P2',
        situations: [],
        stopPlace: [Object]
    }
}
function createTable(data) {
    var table = "<table>";
    for (var key in data) {
        table += "<tr>";
        table += "<td>" + key + "</td>";
        table += "<td>" + data[key] + "</td>";
        table += "</tr>";
    }
    table += "</table>";
    return table;
}
<input type="button" onclick="write_bus()" value="Busstider"> <br>
<div id="bustime-kong"></div>
Since it is structured data (an object), you need to use the JSON.stringify method:
document.getElementById("bustime-kong").innerHTML = JSON.stringify(data);

How do I reuse the result arrays of Papa Parse?

I was reluctant to use Papa Parse, but now I realize how powerful it is. I am using Papa Parse on a local file, but I don't know how to use the results. I want to be able to use the results so I can combine the array with another and then sort highest to lowest based on a certain element. Console.log doesn't work. From what I have researched, it may have something to do with a callback function. I am stuck on how to do the callback function with Papa Parse. Thanks for any advice.
This is my output
Finished input (async).
Time: 43.90000000000873
Arguments(3)
0:
data:
Array(1136) [0 … 99]
0: (9) [
"CONTENT TYPE", "TITLE", "ABBR", "ISSN",
"e-ISSN", "PUBLICATION RANGE: START",
"PUBLICATION RANGE: LATEST PUBLISHED",
"SHORTCUT URL", "ARCHIVE URL"
]
1: (9) [
"Journals", "ACM Computing Surveys ",
"ACM Comput. Surv.", "0360-0300", "1557-7341",
"Volume 1 Issue 1 (March 1969)",
"Volume 46 Issue 1 (October 2013)",
"http://dl.acm.org/citation.cfm?id=J204",
"http://dl.acm.org/citation.cfm?id=J204&picked=prox"
]
Based on a conversation with you, it appears you're trying to retrofit the Papa Parse demo for your own needs. Below is a stripped down code snippet that should be drop-in-ready for your project and will get you started.
document.addEventListener('DOMContentLoaded', () => {
    const file = document.getElementById('file');
    file.addEventListener('change', () => {
        Papa.parse(file.files[0], {
            complete: function(results) {
                // Here you can do something with results.data
                console.log("Finished:", results.data);
            }
        });
    });
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.6.2/papaparse.js"></script>
<input type="file" id="file" />
Original Answer
Since I suspect you're loading your local csv file from the files system, and not an upload form, you'll need to use download: true to make it work.
Papa.parse('data.csv', {
    download: true,
    complete: function(results) {
        console.log("Finished:", results.data);
    }
});
Technically, when loading local files, you're supposed to supply Papa.parse with a File object. This is a snippet from the MDN File API documentation:
File objects are generally retrieved from a FileList object returned
as a result of a user selecting files using the input element
If of course you're running this in NodeJS, then you'd just do the following:
const fs = require('fs');
const Papa = require('papaparse');

const csv = fs.createReadStream('data.csv');

Papa.parse(csv, {
    complete: function(results) {
        console.log("Finished:", results);
    }
});
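Since the question mentions combining the parsed array with another and then sorting highest to lowest on a certain element, here is a rough sketch of that step inside the complete callback, reusing the csv stream from the Node snippet above (the column index 3 and the otherRows array are placeholders):
Papa.parse(csv, {
    complete: function(results) {
        // Placeholder for the second array you want to merge in.
        const otherRows = [];

        // Combine both arrays, then sort highest to lowest by column index 3
        // (use whichever column index holds the value you care about).
        const combined = results.data.concat(otherRows);
        combined.sort((a, b) => Number(b[3]) - Number(a[3]));

        console.log(combined);
    }
});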
Documentation
https://www.papaparse.com/docs#local-files
https://developer.mozilla.org/en-US/docs/Web/API/File

Scripting MongoDB shell, return ID in clean format

The script below is supposed to return a CSV of the values I want from Mongo. All the data I want is returned, but two items come back in a different format, and try as I might, I cannot get the value only.
Question 1: The first returned item, "$_id", returns ObjectId("5a4b7775d9cc09000185b908"), but I want ONLY the value 5a4b7775d9cc09000185b908. Every time I try to parse it or use valueOf, it returns a blank value.
Question 2: The 4th item I am requesting is supposed to be how long something took, computed from the two date values with {$subtract: [ "$finished", "$started" ] } (start and finish times). What I get back is NumberLong(5844), which should be just the milliseconds, 5844.
Script
var cur = db.submissions.aggregate([
    { $match: { started: { '$gte': ISODate('2018-01-02 01:01:01.001'), '$lte': ISODate('2018-01-02 13:15:59.999') } } },
    { $project: {
        data: [
            "$_id",
            { $dateToString: { format: "%Y-%m-%d %H:%M:%S", date: "$started" } },
            { $dateToString: { format: "%Y-%m-%d %H:%M:%S", date: "$finished" } },
            { $subtract: [ "$finished", "$started" ] },
            "$inputs.x12InputFile.size"
        ]
    } }
])
cur.forEach(function(doc) { print(doc.data) })
Current Results
ObjectId("5a4b7775d9cc09000185b908"),2018-01-02 12:13:42,2018-01-02 12:13:48,NumberLong(5844),5322
ObjectId("5a4b77d530391100017d92df"),2018-01-02 12:15:18,2018-01-02 12:15:26,NumberLong(8593),5178
Expected Results
5a4b7775d9cc09000185b908,2018-01-02 12:13:42,2018-01-02 12:13:48,5844,5322
5a4b77d530391100017d92df,2018-01-02 12:15:18,2018-01-02 12:15:26,8593,5178
Any help would be appreciated. I am quite the newbie at scripting mongo queries, so details/examples in responses help if at all possible.
I wouldn't use the aggregation framework for formatting results. You can't convert from an ObjectId to a string in the aggregation pipeline anyway (at least not before MongoDB 4.0, which added $toString).
var cursor = db.submissions.find(
    { started: { '$gte': ISODate('2018-01-02 01:01:01.001'), '$lte': ISODate('2018-01-02 13:15:59.999') } },
    { started: 1, finished: 1, "inputs.inputFile.size": 1 }
);
cursor.forEach(function(doc) {
    var arr = [];
    arr.push(doc._id.str);
    arr.push(doc.started.toISOString());
    arr.push(doc.finished.toISOString());
    arr.push(doc.finished.getTime() - doc.started.getTime());
    print(arr);
});
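For completeness, on MongoDB 4.0 or newer the conversion can also stay inside the aggregation pipeline via $toString; a hedged sketch keeping the same fields as the original script:
// Requires MongoDB 4.0+ for $toString (also used here to flatten the NumberLong).
var cur = db.submissions.aggregate([
    { $match: { started: { $gte: ISODate('2018-01-02 01:01:01.001'), $lte: ISODate('2018-01-02 13:15:59.999') } } },
    { $project: {
        _id: 0,
        data: [
            { $toString: "$_id" },
            { $dateToString: { format: "%Y-%m-%d %H:%M:%S", date: "$started" } },
            { $dateToString: { format: "%Y-%m-%d %H:%M:%S", date: "$finished" } },
            { $toString: { $subtract: [ "$finished", "$started" ] } },
            "$inputs.x12InputFile.size"
        ]
    } }
]);
cur.forEach(function(doc) { print(doc.data.join(",")); });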

Optimalization of firebase query. Getting data by ids

I'm new to Firebase. I would like to create an app (using Angular and the AngularFire library) which shows the current price of some wares. I have a list of all available wares in the Firebase Realtime Database in the following format:
"warehouse": {
"wares": {
"id1": {
"id": "id1",
"name": "name1",
"price": "0.99"
},
"id2": {
"id": "id2",
"name": "name2",
"price": "15.00"
},
... //much more stuff
}
}
I'm using ngrx with my app, so I think I can load all wares into the store as an object rather than a list, because of state tree normalization. I wanted to load the wares into the store this way:
this.db.object('warehouse/wares').valueChanges();
The problem is that the wares' prices will refresh every 5 minutes. The number of wares is huge (about 3000 items), so one response will weigh about 700 kB. I know that I would exceed the download limit in a short time this way.
I want to limit the loaded data to what is interesting for the user, so every user will be able to choose wares. I will store these choices in the following way:
"users": {
"user1": {
"id": "user1",
"wares": {
"id1": {
"order": 1
},
"id27": {
"order": 2
},
"id533": {
"order": 3
}
},
"waresIds": ["id1", "id27", "id533"]
}
}
And my question is:
Is there a way to get wares based on the current user's waresIds? I mean, is there a way to get only the wares whose ids are in a given array? E.g.
"wares": {
"id1": {
"id": "id1",
"name": "name1",
"price": "0.99"
},
"id27": {
"id": "id27",
"name": "name27",
"price": "0.19"
},
"id533": {
"id": "id533",
"name": "name533",
"price": "1.19"
}
}
for a query like:
this.db.object('warehouse/wares').contains(["id1", "id27", "id533"]).valueChanges();
I saw query limits in AngularFire like equalTo etc., but every one of them is for lists. I'm totally confused. Is there anyone who can help me? Maybe I'm making mistakes in the design of the app structure. If so, I am asking for clarification.
Because you are saving the ids inside the user node, try this way.
wares: Observable<any[]>;

// inside ngOnInit or a function
this.wares = this.db.list('users/currentUserId/wares').snapshotChanges().map(changes => {
    return changes.map(c => {
        const id = c.payload.key; // gets the ids under users/currentUserId/wares
        let wares = [];
        // now get the wares; note that this inner subscription is asynchronous,
        // so the wares array is returned before it has been filled
        this.db.list('warehouse/wares', ref => ref.orderByChild('id').equalTo(id)).valueChanges().subscribe(res => {
            res.forEach(data => {
                wares.push(data);
            });
        });
        return wares;
    });
});
There are two things you can do. I don't believe Firebase allows you to query for multiple equal values at once. You can, however, loop over the array of ids and query for each one directly.
I am assuming you already queried for "waresIds" and have stored those IDs in an array named idArray:
for (const id of idArray) {
    database.ref('warehouse/wares').orderByChild('id').equalTo(id).once('value').then((snapshot) => {
        console.log(snapshot.val());
    });
}
In order to use the above query efficiently you'll have to index your data on id.
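A hedged sketch of what that index could look like in the Realtime Database rules, assuming the warehouse/wares path from the question:
{
  "rules": {
    "warehouse": {
      "wares": {
        ".indexOn": ["id"]
      }
    }
  }
}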
Your second option would be to listen for child_changed events to get only the updated data after your initial fetch. This should cut down drastically on the amount of data you need to download.
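A rough sketch of that listener with the plain JavaScript SDK, assuming the same warehouse/wares path:
// After the initial fetch, receive only wares whose data actually changes.
firebase.database().ref('warehouse/wares').on('child_changed', (snapshot) => {
    const updatedWare = snapshot.val();
    console.log('Update for', snapshot.key, updatedWare.price);
});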
Yes, you can get exactly the data that you want in Firebase.
See the official Firebase documentation about filtering data.
You need to get each waresID:
var waresID = // logic to get waresID
var userId = // logic to get userId
var ref = firebase.database().ref("wares/" + userId).child(waresID);
ref.once("value")
.then(function(snapshot) {
console.log(snapshot.val());
});
This will return only data related to that waresID or userId.
Note: this is JavaScript code; I hope this will work for you.

Firebase - JS Query - Get a Node where the Child value has a Child with value=xxx

I ran into a little problem with my Firebase queries.
I want to get all series with episodes that I haven't watched yet. At the moment I load them all from Firebase and use the filter function in JavaScript, but I would like to minimize the data download from Firebase a bit.
So I only want to get the series that match this dummy-code criteria:
ref("series").orderByChild("seasons/*/episodes/*/episodeWatched").equalTo(false)
This is a small output of my structure in Firebase:
series >
    <unique series id>
        seriesName: string
        seasons >
            seasonCount: number
            <season number>
                episodes >
                    <episode number>
                        episodeName: string
                        episodeWatched: boolean
                        episodeAirDate: Date
a more "real" exampel would be:
{ series:
    [{ 123456:
        { seriesName: "Mr Robot",
          seasonCount: 3,
          seasons:
            [{ 0: episodeCount: 10,
               episodes:
                 [{ 0:
                      { episodeName: "Eps1.0_hellofriend.mov",
                        episodeWatched: true,
                        episodeAirDate: 27-05-2015 }
                    { 1:
                      { episodeName: "Eps1.1_ones-and-zer0es.mpeg",
                        episodeWatched: false,
                        episodeAirDate: 01-07-2015 }
                    ...
                 }]
               ...
            }]
        }
    }]
}
I have a problem with the "wildcards" in my dummy code above: I only get the values of specific nodes, and I can't chain the orderByChild function.
If I could chain the function, my approach to this would be like:
ref("series").orderByChild("seasons").orderByChild("episodes").orderByChild("episodeWatched").equalTo(false)
At the moment I have tried several things with wildcards and the like; the only thing that works at the moment is:
ref("series").orderByChild("seasons/0/episodes/0/episodeWatched").equalTo(false)
but this only checks whether the first episode of the first season is watched.
Has anyone had this problem?
I think this is one of those times you duplicate / restructure your db to optimize for reads.
user/uid/notWatched: {
    "showId_seasonId_epId": true
}
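A hedged sketch of how a read could then look against that denormalized node (the path and the showId_seasonId_epId key format come from the answer above; the rest is illustrative):
// uid: the current user's id (assumed to be available).
firebase.database().ref('user/' + uid + '/notWatched').once('value').then((snapshot) => {
    const notWatched = snapshot.val() || {};
    // Keys look like "showId_seasonId_epId"; split them back apart as needed.
    Object.keys(notWatched).forEach((key) => {
        const [showId, seasonId, epId] = key.split('_');
        console.log(showId, seasonId, epId);
    });
});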
