d3/js data manipulation dropping columns - javascript

I have a CSV dataset that looks like this:
country,col1,col2,col3
Germany,19979188,11233906,43.7719591
UK,3839766,1884423,50.92349378
France,1363608,796271,41.60557873
Italy,957516,557967,41.72765781
I'd like to drop col1 and col2 while keeping country and col3. If possible, I'd like to wrap this in a function to which I can pass the list of columns to drop or keep.
Using pandas, which I'm familiar with, this is easy, e.g. data.drop(['col1', 'col2'], axis = 1). But the d3 way, or the JS way in general, works row by row, so I couldn't come up with a way to drop columns.
I was thinking of using d3.map() to take only the desired columns, but I got stuck building a general function that accepts the column list.
Does anyone have any thoughts?

D3 fetch methods, like d3.csv, retrieve the whole CSV and create an array of objects based on it. Because of that, filtering out some columns after loading does not reduce what is downloaded or parsed. Actually, it's worse than useless: you'll spend time and resources on an unnecessary operation.
Therefore, the only really useful solution, if you own the CSV and can modify it, is to create a new CSV without those columns. That way you'll have a smaller file that loads faster. Otherwise, if you cannot change the CSV itself, don't bother: just load the whole thing and use the columns you want (which will be properties of the objects), ignoring the others.
Finally, if you do a lot of data manipulation, it may be worth reducing the size of the objects in the data array. If that's your case, use a row function to return only the properties you want. For instance:
d3.csv(url, function (d) {
  // row function: called once per row, returns only the wanted properties
  return { country: d.country, col3: d.col3 };
}).then(etc...)
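For the general "pass a column list" version the question asks about, a minimal sketch could look like the following. The helper names keepColumns and dropColumns are hypothetical (they are not part of D3), and the sample rows simply mirror the CSV from the question as d3.csv would deliver them (all values as strings).

```javascript
// Hypothetical helpers, not part of D3. Both take the array of row objects
// and return new objects, leaving the input rows untouched.
function keepColumns(rows, columns) {
  return rows.map(function (row) {
    const picked = {};
    columns.forEach(function (name) {
      if (name in row) picked[name] = row[name];
    });
    return picked;
  });
}

function dropColumns(rows, columns) {
  const dropped = new Set(columns);
  return rows.map(function (row) {
    const kept = {};
    Object.keys(row).forEach(function (name) {
      if (!dropped.has(name)) kept[name] = row[name];
    });
    return kept;
  });
}

// Sample rows shaped like the parsed CSV from the question
const rows = [
  { country: "Germany", col1: "19979188", col2: "11233906", col3: "43.7719591" },
  { country: "UK", col1: "3839766", col2: "1884423", col3: "50.92349378" }
];

const slim = dropColumns(rows, ["col1", "col2"]);
const same = keepColumns(rows, ["country", "col3"]);
console.log(slim);
```

Both calls produce rows with only country and col3. The same logic could also be applied per row inside the row function, so the unwanted properties are never stored in the first place.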

Related

JSON in localforage, efficient way to update property values (Binarysearch, etc.)?

I would like to come straight to the point and show you my sample data, which averages around 180,000 lines from a .csv file, so a lot of lines. I am reading in the .csv with papaparse. Then I am saving the data as an array of objects, which looks like this:
I used this picture because it also shows all the properties my objects have or should have. The data is from Media Transparency Data, which is open source and shows the payments between institutions.
The array of objects is saved using localforage, which is basically IndexedDB or WebSQL with a localStorage-like API. So the data is never saved on a server, only on the client!
The Question:
The user can add the sourceHash and/or targetHash attributes in a client interface. For example, assume the user loaded the "Energie Steiermark Kunden GmbH" object and now adds the sourceHash "company" to it, so basically a tag. This is already reflected and shown in the client, but I also need to get it into localforage and therefore rewrite the initial array of objects. So I would need to search my huge 180,000-line array for every object that has the name "Energie Steiermark Kunden GmbH" (there can be several), set the property sourceHash to "company", and then save the array in localforage again.
The first question is how to do this most efficiently. I can get the data out of localforage and set it again using the following methods.
Get:
localforage.getItem('data').then((value) => {
...
});
Set:
localforage.setItem('data', dataObject);
However, the question is how to do this most efficiently. I mean, if the sourceNode starts with "E", for example, we don't need to search all sourceNodes. The same goes of course for the targetNode.
Thank you in advance!
UPDATE:
Thanks for the answers already! How would you do it the most efficient way in JavaScript? I mean, is it possible to do it in a few lines? Assume, for example, that I have the current sourceHash "company" and want to assign it to every node starting with "Energie Steiermark Kunden GmbH" across all timeNodes. These could be 20151, 20152, 20153, 20154 and so on...
Localforage is only a localStorage/sessionStorage-like wrapper over the actual storage engine, and so it only offers you the key-value capabilities of localStorage. In short, there's no more efficient way to do this for arbitrary queries.
This sounds more like a case for IndexedDB, as you can define search indexes over the data, for instance for sourceNodes, and do more efficient queries that way.
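With the whole array stored under a single key, the update boils down to a linear pass followed by writing the array back. The sketch below assumes the records carry sourceNode, sourceHash and timeNode properties as described in the question; the function names and the sample records are made up for illustration. The store argument is anything with promise-returning getItem/setItem methods, which is how localforage behaves.

```javascript
// Pure helper: sets sourceHash on every record whose sourceNode equals `name`.
// Returns how many records were changed.
function tagSourceNodes(records, name, hash) {
  let changed = 0;
  for (const record of records) {
    if (record.sourceNode === name) {
      record.sourceHash = hash;
      changed += 1;
    }
  }
  return changed;
}

// Reads the array, tags it, and writes it back only if something changed.
// `store` is assumed to expose getItem/setItem returning promises (e.g. localforage).
async function tagAndSave(store, key, name, hash) {
  const records = (await store.getItem(key)) || [];
  const changed = tagSourceNodes(records, name, hash);
  if (changed > 0) {
    await store.setItem(key, records);
  }
  return changed;
}

// Made-up sample records
const sample = [
  { sourceNode: "Energie Steiermark Kunden GmbH", timeNode: 20151 },
  { sourceNode: "Other Institution", timeNode: 20151 },
  { sourceNode: "Energie Steiermark Kunden GmbH", timeNode: 20152 }
];
const updatedCount = tagSourceNodes(sample, "Energie Steiermark Kunden GmbH", "company");
console.log(updatedCount);
```

In the browser this would be called as tagAndSave(localforage, 'data', name, hash). A single pass over 180,000 objects is usually cheap compared to reading and writing the whole array, which is the part an IndexedDB index would avoid.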

How to update series data array in highcharts

I have the following snippet of code which updates my chart type from one type to another.
chart.series.forEach(function (serie) {
  serie.update({
    type: type,
    stacking: stacking,
  })
})
I can pass in different values for type and stacking.
If I want my new chart to use a different array of values, where the arrays are already defined and contain data from the database, how would I do it?
The reason I wish to do this is because I have two values, which are entered in their own arrays, say for example, 40 and 50 from a database. These are then stacked in a column chart (It's the only way I can stack).
I've created another array to store both these values in the one array when deciding to show it in a pie chart (not stacked).
I've tried adding 'data:arrayName' as a property, but this prevents my chart from rendering. Any ideas?
Thank you.
EDIT: My problem probably only requires one object to call. I don't have that much going on, so the solution in the possible duplicate isn't really helpful to me. The duplicate code is trying to generate a new array on the fly using loops. I already have an array in place; I'm just trying to figure out how to update the array once the chart has changed to another type.

Can I Create a Single HighCharts Graph from Multiple Data Sources (Multiple GoogleSheets in this case)

I’m trying to use data from multiple GoogleSheets to produce a single HighChart graph.
I’d like to do this without moving all the data into one area of a single spreadsheet, particularly as I want to use the drilldown option which would make it difficult to collect all the data together.
I thought I could pass the columns as an array and then reference the array in the data property of my chart, but I’m struggling to do that with even one column from one sheet.
I have searched for answers online, but I have not found anything relating to highcharts getting data from multiple sources.
Previous Research:
Using GoogleSheets Data as an array before creating the chart: (Removed Link) - The problem is that I could only use one GoogleSheets reference here as the chart object sits inside the data object.
API documentation - (Removed Link) – tells me I can access the columns but that’s not the part I’m having problems with
Querying the Chart: (Removed Link) - I have actually considered making hidden charts, then interrogating the data and making new charts from those, but that seems like a very long way round and I’m still not sure I could grab all the data I need.
Using two GoogleSheets for separate charts on the same page: (Removed Link) I have done this.
Please could you help me to understand how I can access the properties and methods of this object outside of the object itself?
Thank you.
My Code:
//Function to produce a data array ***Not Working - Cannot extract array from object method***
function getData() {
  Highcharts.data({
    googleSpreadsheetKey: '12x66_QEkTKg_UzvpHEygdTmfnu7oH81pSQYn78Hxt80',
    googleSpreadsheetWorksheet: 4,
    startColumn: 16,
    endColumn: 22,
    startRow: 63,
    endRow: 76,
    parsed: function (columns) {
      var dataTest2 = [];
      var columnLength = columns[1].length;
      for (var i = 0; i < columnLength; i = i + 1) {
        dataTest2.push(columns[1][i]);
      }
      alert(dataTest2); //This works here, but not if I move it outside of the method, even if I return it.
      var testReturn = this.googleSpreadsheetKey; //trying to return any property using "this" - I've also tried this.googleSpreadsheetKey.value but still undefined
      return testReturn; //returns "Undefined"
    }
  });
}
You could use Google Sheets webQuery. Basically, this is a method to export the Spreadsheet's data into a given format such as csv, json, etc. In your case, the link should look like this:
https://docs.google.com/spreadsheet/tq?key=12x66_QEkTKg_UzvpHEygdTmfnu7oH81pSQYn78Hxt80&gid=4&tq=select%20A,%20B&tqx=reqId:1;out:csv;%20responseHandler:webQuery
Please note that "tq?key=" is followed by the key of your Google Sheet, and "&gid=" is NOT 4. The 4 only tells Highcharts to select Sheet 4; for Google Sheets, look at the sheet's own link and copy the number that comes after "&gid=". Furthermore, "&tq=" selects the columns of the Google Sheet; in the link above it selects "Column A" and "Column B". To find out more about selecting columns and querying the output, refer to:
https://developers.google.com/chart/interactive/docs/querylanguage?csw=1#Setting_the_Query_in_the_Data_Source_URL
Lastly, "&tqx=" sets the format in which your data is output. The above link uses "out:csv", which outputs the data as comma-separated values. This could just as well be JSON if you like. Have a look at this documentation:
https://developers.google.com/chart/interactive/docs/dev/implementing_data_source#requestformat
To use this output in the JavaScript code that builds your chart, you can use external JavaScript libraries. If you output your data as CSV, you can use "papaparse.js" to parse the CSV into JSON, which can then be read by Highcharts to build the chart. Refer to this documentation:
http://papaparse.com/docs#remote-files
An alternative is to output your Google Sheet directly as JSON, then use jQuery to make an Ajax call and load the JSON-encoded data into your JavaScript code. Specifically, you could use jQuery.getJSON() to get the data. Look at this link for more details on how to get JSON:
http://api.jquery.com/jquery.getjson/
Finally, the output format is up to you. I prefer JSON, as it saves the extra step of converting the CSV into JSON. Choose whatever suits you best and is easiest for you to understand.
Once you have your data, you may have to parse your JSON with JSON.parse(), and then organize your data into an array with .push(). As @jlbriggs stated, organize your data before you build the chart. You can then make two, three or more Ajax calls to import data from different sources. I would not make too many, as this will slow down loading and data transfer.
NB: Highcharts only recognizes correctly formatted data, so use parseFloat() to convert strings into numbers, Date.UTC() to turn date parts into timestamps, etc.
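As a rough illustration of that formatting step, the sketch below converts string rows into the [x, y] number pairs that a Highcharts series accepts as data. The input shape (a "YYYY-MM-DD" date string followed by a numeric string), the function name toSeriesData and the sample values are all assumptions made for the example; real sheet output will likely need its own parsing.

```javascript
// Assumed input: each row is [dateString in "YYYY-MM-DD" form, numericString].
function toSeriesData(rows) {
  return rows.map(function (row) {
    const parts = row[0].split("-").map(Number);          // [year, month, day]
    const x = Date.UTC(parts[0], parts[1] - 1, parts[2]); // months are 0-based
    const y = parseFloat(row[1]);
    return [x, y];
  });
}

// Made-up rows standing in for two separately loaded sources
const sheetA = [["2016-01-01", "40"], ["2016-02-01", "42.5"]];
const sheetB = [["2016-01-01", "50"], ["2016-02-01", "47"]];

// Each source becomes its own series; the array can then be passed
// as the `series` option when the chart is created.
const series = [
  { name: "Sheet A", data: toSeriesData(sheetA) },
  { name: "Sheet B", data: toSeriesData(sheetB) }
];
console.log(series);
```

Converting each source on its own and assembling the series array only once everything has loaded keeps the chart creation itself independent of where the data came from.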
Hope this helps you.

Reading columns in their order of the csv file

When data is loaded from a csv file, is it possible to get the order of the columns?
E.g. the typical way to load a csv file is by calling the d3.csv function:
d3.csv("data.csv", function(error, data) {
  var ageNames = d3.keys(data[0]).filter(function(key) { return key !== "State"; });
  // ...
});
Typically, columns are stored as properties of an object, so the order can no longer be retrieved. Also, d3.keys() returns an array with an undefined order.
I am asking this because I want to sort the columns by the order in the csv file. Any help would be greatly appreciated.
You don't need to use the name of the column to access the data, you can use the index of the keys. The index is determined by the order in the file, so exactly what you're looking for.
To get the keys of an object, use Object.keys():
var keys = Object.keys(data[0]);
data[0][keys[0]]; // datum in the first row, first column
Coming from a relational view, it seems surprising that the d3.js way of reading from a file has the opposite properties:
The relational view:
rows don't have any guaranteed order
columns have a well-defined order and can be addressed by referring to their position
The d3.js view (of a csv file):
rows have a well-defined order and can be addressed by referring to their row number (they are stored in an array)
columns end up as a set of properties in an Object, thus they don't have a well-defined order
If you want the order of the columns as in the csv file, you can safely assume that a "for...in" loop over the properties, or d3.keys(), returns the columns in the given order.
This order can only be destroyed if columns are added or removed, which is not the case for most practical applications.
See also https://javascriptweblog.wordpress.com/2011/01/04/exploring-javascript-for-in-loops/. The one exception described there: when column names are purely numeric, the Chrome browser orders them by their numeric value rather than by their column order.
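The behaviour described above, including the numeric-name exception, can be checked without D3. The sketch below builds row objects the way a CSV parser would, one property per header in file order; the helper name rowFromHeader and the sample values are made up for illustration.

```javascript
// Build a row object with one property per header name, added in file order.
function rowFromHeader(header, values) {
  const row = {};
  header.forEach(function (name, i) {
    row[name] = values[i];
  });
  return row;
}

// Ordinary column names: Object.keys() follows the insertion (file) order.
const named = rowFromHeader(
  ["State", "Under 5 Years", "5 to 13 Years"],
  ["AL", "310504", "552339"]
);
const namedKeys = Object.keys(named);

// Purely numeric column names: integer-like keys come first, in ascending order.
const numeric = rowFromHeader(["State", "2010", "2005"], ["AL", "1", "2"]);
const numericKeys = Object.keys(numeric);

console.log(namedKeys, numericKeys);
```

Here namedKeys matches the header, while numericKeys comes back as 2005, 2010, State. If the column names may be numeric, keeping the header array itself is the safer source for the order; newer D3 versions also expose it as the columns property of the parsed array.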

Most Efficient way of Filtering an Html Table?

I have an ajax function which calls a servlet to get a list of products from various web services; the number of products can go up to 100,000. I need to show this list in an HTML table.
I am trying to provide users with an interface to filter this list based on several criteria. Currently I am using a simple jQuery plugin to achieve this, but I found that it hogs memory and time.
The JavaScript that I use basically uses regex to search and filter rows matching the filtering criteria.
I was thinking of an alternative solution in which I filter the JSON array returned by my servlet and bind the HTML table to it. Is there a way to achieve this, and if there is, is it more efficient than the regex approach?
Going through up to 100,000 items and checking if they meet your criteria is going to take a while, especially if the criteria might be complex (must be CONDO with 2 OR 3 bedrooms NOT in zip code 12345 and FIREPLACE but not JACUZZI).
Perhaps your servlet could cache the data for the 100,000 items and it could do the filtering, based on criteria posted by the user's browser. It could return, say, "items 1-50 of 12,456 selected from 100,000" and let the user page forward to the next 50 or so, and even select how many items to get back (25, 50, all).
If they select "all" before narrowing down the number very far, then a halfway observant user will expect it to take a while to load.
In other words, don't even TRY to manage the 100,000 items in the browser, let the server do it.
User enters filter and hits search.
Ajax call to database, database has indexes on appropriate columns and the database does the filtering.
Database returns result.
Show result in table. (Probably want it to be paged to only show 100-1000 rows at a time because 100,000 rows in a table can really slow down your browser.)
Edit: Since you don't have a database, the best you're going to be able to do is run the regex over the JSON dataset and add results that match to the table. You'll want to save the JSON dataset in a variable in case they change the search. (I'm assuming that right now you're adding everything to the table and then using the jquery table plugin to filter it)
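A minimal sketch of that in-memory approach: keep the full JSON dataset in a variable, run the regex over the chosen fields, and hand the table only one page of matches at a time. The function names filterItems and getPage, the field names and the sample products are all hypothetical.

```javascript
// Returns the items where at least one of the given fields matches the pattern.
// The regex has no "g" flag, so test() keeps no state between calls.
function filterItems(items, pattern, fields) {
  const regex = new RegExp(pattern, "i");
  return items.filter(function (item) {
    return fields.some(function (field) {
      return regex.test(String(item[field]));
    });
  });
}

// Returns one page of results; pageNumber starts at 1.
function getPage(items, pageNumber, pageSize) {
  const start = (pageNumber - 1) * pageSize;
  return items.slice(start, start + pageSize);
}

// Made-up dataset standing in for the JSON returned by the servlet
const products = [
  { name: "Oak table", category: "furniture" },
  { name: "Pine chair", category: "furniture" },
  { name: "Desk lamp", category: "lighting" },
  { name: "Table lamp", category: "lighting" }
];

const matches = filterItems(products, "table", ["name"]);
const firstPage = getPage(matches, 1, 1);
console.log(matches.length, firstPage);
```

Only the current page needs to be turned into table rows, so the DOM stays small even when the dataset and the number of matches are large.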
I'm assuming that by filtering you mean only displaying a subset of the data; and not sorting.
As you are populating the data into the table add classes to each row for everything in that row you want to filter by. e.g.:
<tr class="filter1 filter2 filter3">....
<tr class="filter1 filter3">....
<tr class="filter2">....
<tr class="filter3">....
Then when you want to apply a filter you can do something like:
$('TR:not(.filter1)').hide();
I agree with Berry that 100,000 rows in the browser is a bit of a stretch, but if anything comes close to handling something of that magnitude, it's jOrder. http://github.com/danstocker/jorder
Create a jOrder table based on your JSON, and add the most necessary indexes. I mean the ones that you must at all cost filter by.
E.g. you have a "Name" field with people's names.
var table = jOrder(json)
.index('name', ['Name'], { sorted: true, ordered: true });
Then, for instance, this is how you select the records where the Name field starts with "John":
var filtered = table.where([{ Name: 'John' }], { mode: jOrder.startof, renumber: true });
Later, if you need paging in your table, just feed the table builder a filtered.slice(...).
If you're getting back XML, you could just use a jQuery selection:
$('.class', context) where context is your XML response.
From this selection, you could just write the XML to the page and use CSS to style it. That's where I'd start, at least. I'm doing something similar in one of my applications, but my dataset is smaller.
I don't know what you mean by "bind". You can parse the JSON and then use a for loop (or $.each()) to populate the table, either with straight HTML or by using a grid plugin's insert/add method.
