How do I parse large amounts of data in XLSX with Javascript? - javascript

I want to process some 48,000 rows to build a dashboard and show some stats based on the data in those rows. One particular field, which has a length of 30 characters, also holds data in the form of substrings. How do I parse all of this data, row by row, to come up with the end result? There are plenty of examples out there, but I couldn't quite relate them to my case.

I'm using the "js-xlsx" library in one of my applications and the performance seems to be quite good.
Here is the GitHub URL:
https://github.com/SheetJS/js-xlsx
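For the row-by-row pass, a minimal sketch with js-xlsx looks like this; the file name, the 'CodeField' column, and the substring boundaries are placeholders you would adapt to your own sheet:

var XLSX = require('xlsx');

var workbook = XLSX.readFile('data.xlsx');           // or XLSX.read(buffer, { type: 'array' }) in the browser
var sheet = workbook.Sheets[workbook.SheetNames[0]]; // first worksheet
var rows = XLSX.utils.sheet_to_json(sheet);          // one plain object per row, keyed by the header row

var stats = {};
rows.forEach(function (row) {
  // Hypothetical 30-character field that packs several values; split it into
  // fixed-width substrings and adjust the boundaries to your actual layout.
  var field = String(row['CodeField'] || '');
  var part1 = field.substring(0, 10);
  var part2 = field.substring(10, 20);
  var part3 = field.substring(20, 30);
  stats[part1] = (stats[part1] || 0) + 1;            // aggregate something for the dashboard
});
console.log(stats);

For 48,000 rows the expensive part is the initial parse, so do the aggregation in a single pass like this rather than re-scanning the sheet per stat.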

Related

Most memory efficient way to store very large array of objects in javascript

I am trying to write embedded JavaScript code in an OBIEE report. Basically the idea of the report is to take tabular data (rows and columns) and provide a way to extract user-specified columns from it and download the resulting data as a CSV or Excel file. I'm trying to do this by storing the data as an array of objects, so something like this:
[
  {'column1':'Entry1','column2':'Entry2', ...},
  {'column1':'Entry1','column2':'Entry2', ...},
  ...
]
The problem is that I get a C runtime error (std::bad_alloc), which I assume means it is running out of memory, because it works when I take in fewer rows. The expected data is a maximum of about 200 columns (which could be empty or non-empty) and 1-2 million rows. What is the most memory-efficient way to store such data: one copy of the full data, and then one copy with only the required columns? I can't post the exact code here for security reasons, as it's on a work laptop on a secure server.
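One illustrative option (the column names below are just the placeholders from the snippet above): store the data column-wise, one flat array per column, so you pay for the values rather than for ~200 property slots on millions of row objects:

// Column-wise store instead of an array of row objects (illustrative only).
var columnNames = ['column1', 'column2' /* , ... up to ~200 */];

var store = {};
columnNames.forEach(function (name) { store[name] = []; });

function addRow(row) {
  // row is the original { column1: 'Entry1', column2: 'Entry2', ... } shape
  columnNames.forEach(function (name) {
    store[name].push(row.hasOwnProperty(name) ? row[name] : null); // keep columns index-aligned
  });
}

function selectColumns(selected) {
  // Returns only the user-specified columns, ready to serialize to CSV/Excel
  var result = {};
  selected.forEach(function (name) { result[name] = store[name]; });
  return result;
}

Selecting a subset of columns then reuses the existing arrays instead of copying every row, which avoids holding a second full copy of the data.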

How to handle an extremely big table in a search?

I'm looking for suggestions on how to handle the following use case with the Python Django framework; I'm also open to using JavaScript libraries/AJAX.
I'm working with a pre-existing table/model called revenue_code with over 600 million rows of data.
The user needs to search three fields within one search (code, description, room) and be able to select multiple search results, similar to Kendo's multi-select control. I first started off by combining the codes in django-filters as shown below, but my application became unresponsive; after waiting 10-15 minutes I was able to view the search results but couldn't select anything.
https://simpleisbetterthancomplex.com/tutorial/2016/11/28/how-to-filter-querysets-dynamically.html
I've also tried Kendo controls, Select2, and Chosen, because I need the user to be able to select as many rev codes as they need (upwards of 10-20), but all gave the same unresponsive page when they attempted to load the data into the control/multi-select.
Essentially what I'm looking for is something like the link below, which allows the user to make multiple selections and handles a massive amount of data without becoming unresponsive. Ideally I'd like to be able to run my search without displaying all the data.
https://petercuret.com/add-ajax-to-django-without-writing-javascript/
Is the Django framework meant to handle this type of volume? Would it be better to export this data into a file and read the file? I'm not looking for code, just some pointers on how to handle this use case.
What is the basic mechanism of "searching 600 million rows"? Basically, what a database does is build an index before search time, general enough to serve different types of queries; at search time you only search the index, which is much smaller (so it can be held in memory) and faster. But no matter what, "searching" by its nature has no "pagination" concept, and if 600 million records cannot fit in memory at the same time, then parts of them have to be swapped in and out repeatedly; the more parts, the slower the operation. These details are hidden behind the algorithms in databases like MySQL.
There are very compact representations, like bitmap indexes, which let you search data like male/female very fast, or any data where you can use one bit per piece of information.
So whether you use Django or not does not really matter. What matters is the tuning of the database, the design of the tables to facilitate the queries (types of indices), and the total amount of memory on the server end to keep the data in memory.
Check this out:
https://dba.stackexchange.com/questions/20335/can-mysql-reasonably-perform-queries-on-billions-of-rows
https://serverfault.com/questions/168247/mysql-working-with-192-trillion-records-yes-192-trillion
How many rows are 'too many' for a MySQL table?
You can't load all the data into your page at once. 600 million records is too many.
Since you mentioned select2, have a look at their example with pagination.
The trick is to limit your SQL results to maybe 100 or so at a time. When the user scrolls to the bottom of the list, it can automatically load in more.
Send the search query to the server, and do the filtering in SQL (or NoSQL or whatever you use). Database engines are built for that. Don't try filtering/sorting in JS with that many records.
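A minimal select2 sketch of that pattern, assuming a <select multiple> element and a hypothetical /revenue-codes/search/ endpoint that does the LIMIT/OFFSET filtering in SQL and returns { results: [...], has_more: true }:

// assumes <select multiple id="revenue-code-select"></select> in the page
$('#revenue-code-select').select2({
  minimumInputLength: 2,           // don't query the server on 1-character input
  ajax: {
    url: '/revenue-codes/search/', // hypothetical view that filters code/description/room in SQL
    dataType: 'json',
    delay: 250,                    // debounce keystrokes
    data: function (params) {
      return { q: params.term, page: params.page || 1 };
    },
    processResults: function (data, params) {
      params.page = params.page || 1;
      return {
        results: data.results,               // [{ id: ..., text: ... }, ...]
        pagination: { more: data.has_more }  // true while the server has another page
      };
    }
  }
});

Each response then carries only 100 or so rows, so the 600 million records never reach the browser.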

Handling huge amount of data in Fixed Data Tables

I am trying to use Fixed Data Tables in my web application, and I am dealing with a large amount of data: hundreds of thousands of records. I am trying to load all the data at once to make the best use of the search and sort functionality of the data table.
Here is the link to the data table which I am using.
It takes a huge amount of time to load the data, which is expected, but after the data loads I get glitches in the browser; it gets stuck.
How do I handle a huge amount of data in data tables with complete functionality?
The main advantage of using the fixed data table is that you can render the entire table based on an array or an object.
The official link for the fixed data table is given at:
http://schrodinger.github.io/fixed-data-table-2/example-object-data.html
The link above shows how to render the table from object/JSON data. Additional features like client-side sorting and filtering can also be added since, as you mentioned, your data is huge.
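As a rough sketch of that object-based rendering with fixed-data-table-2 (assuming React and a rows array like [{ name, amount }, ...]; the column names are placeholders):

import React from 'react';
import { Table, Column, Cell } from 'fixed-data-table-2';
import 'fixed-data-table-2/dist/fixed-data-table.css';

function RecordsTable({ rows }) {
  return (
    <Table rowHeight={40} headerHeight={40} rowsCount={rows.length} width={800} height={500}>
      <Column
        width={300}
        header={<Cell>Name</Cell>}
        cell={({ rowIndex, ...props }) => <Cell {...props}>{rows[rowIndex].name}</Cell>}
      />
      <Column
        width={200}
        header={<Cell>Amount</Cell>}
        cell={({ rowIndex, ...props }) => <Cell {...props}>{rows[rowIndex].amount}</Cell>}
      />
    </Table>
  );
}

export default RecordsTable;

Only the rows currently visible are rendered to the DOM, so sorting or filtering just means reordering or slicing the rows array and re-rendering; the table itself never holds more nodes than fit on screen.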

How to do a bulk insert while avoiding duplicates in Postgresql

I'm working in nodejs, hosted at Heroku (free plan so far).
I get the data from elsewhere automatically (this part works fine and I get JSON or CSV), and my goal is to add it to a PostgreSQL DB.
While I'm new to DB management and PostgreSQL, I've done my research before posting this. I'm aware that the COPY command exists, and I know how to INSERT multiple rows without duplicates. But my problem is a mix of both (plus another difficulty).
I hope my question is not breaking the rules.
Short version, I need to :
Add lots of data at once
Never create duplicates
Rename column names between the source data and my table
Long version with details :
The data I collect comes from multiple sources (2 for now, but that will grow) and is quite big (>1000).
I also need to remap the column name to one unified system. What could be called "firstDay" on one source is called "dateBegin" in another, and I want them to be called "startDate" in my table.
If I use INSERT, I take care of this myself (in JS) while constructing the query, but maybe COPY could do it in a better way. Also, INSERT seems to have a limit on how much data you can push at one time, so I would need to split my query into multiple parts and maybe use callbacks or promises to avoid drowning the DB.
And finally, I will update this DB regularly and automatically, and there will be a lot of duplicates. Fortunately, every piece of data has a unique id, and I have made the column that stores this id the PRIMARY KEY of the table. I thought that might eliminate any problem with duplicates, but I may be wrong.
My first version was very ugly (a for loop making a new query on every iteration) and didn't work. I was thinking about processing 1,000 rows at a time recursively, waiting for a callback before sending another batch, but it seems clunky and time-consuming to do it that way. COPY seems perfect if I can select/rename/remap columns and avoid duplicates, but I've read the documentation and I don't see a way to do that.
Thank you very much, any help is welcome. I'm still learning so please be kind.
I have done this before by using a temporary table to "stage" the data and then doing an INSERT ... SELECT to move it from staging to the production table.
For populating your staging table you can use bulk INSERTs or COPY.
For example,
BEGIN;
CREATE TEMPORARY TABLE staging_my_table ( ... );  -- your columns etc.
-- Now that you have your staging table you can bulk INSERT or COPY
-- into it from your code, e.g.,
INSERT INTO staging_my_table (blah, bloo, firstDay) VALUES (1, 2, 3), (4, 5, 6);  -- etc.
-- Now you can do an INSERT into your live table from your staging table, e.g.,
INSERT INTO my_table (blah, bloo, startDate)
SELECT blah, bloo, firstDay
FROM staging_my_table staging
WHERE NOT EXISTS (
    SELECT 1
    FROM my_table
    WHERE staging.bloo = my_table.bloo
);
COMMIT;
There are always exceptions, but this might just work for you.
Have a good one
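For the Node.js side, here is a minimal sketch with the pg library, reusing the hypothetical table and column names above and assuming the id primary key is the deduplication key:

const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL }); // Heroku-style connection string

async function loadBatch(rows) {
  // rows: [{ id: ..., bloo: ..., startDate: ... }, ...] -- already remapped in JS
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query('CREATE TEMPORARY TABLE staging_my_table (LIKE my_table) ON COMMIT DROP');

    // One multi-row, parameterized INSERT: ($1,$2,$3),($4,$5,$6),...
    // Keep batches around 1,000 rows to stay well under the parameter limit.
    const values = [];
    const placeholders = rows.map(function (r, i) {
      values.push(r.id, r.bloo, r.startDate);
      const n = i * 3;
      return '($' + (n + 1) + ', $' + (n + 2) + ', $' + (n + 3) + ')';
    });
    await client.query(
      'INSERT INTO staging_my_table (id, bloo, startDate) VALUES ' + placeholders.join(', '),
      values
    );

    // Move only rows whose primary key is not already in the live table.
    await client.query(
      'INSERT INTO my_table (id, bloo, startDate) ' +
      'SELECT s.id, s.bloo, s.startDate FROM staging_my_table s ' +
      'WHERE NOT EXISTS (SELECT 1 FROM my_table t WHERE t.id = s.id)'
    );
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}

On PostgreSQL 9.5+, INSERT ... ON CONFLICT (id) DO NOTHING is an alternative that skips the staging table entirely.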

Using Jquery AutoComplete with dictionary list

I have a dictionary list of about 58,040 words, and I don't think jQuery autocomplete can handle that many words, as the browser hangs.
The list is
words = ['axxx', 'bxxx', 'cxxx', and so on];
$(".CreateAddKeyWords input").autocomplete({ source: words });
Am I doing something wrong?
Is there another free tool that I can use?
Edit
I am using .NET and I have retrieved the data from the database and can loop through it server-side, but how do I send the data back? If it's JSON, what should the format look like?
Is there another free tool that I can use?
Yes: instead of hardcoding 58,040 words in your HTML or JavaScript file, you could load them from a remote data source using AJAX. Basically, you have a server-side script which, when queried with the current user input, prefilters the results and sends them to the client to display as suggestions.
You should require a minimum length of user entry before searching (so it isn't querying with 1 or 2 characters).
$(".CreateAddKeyWords input").autocomplete({ source: words, minLength: 3 });
It's possible the browser is hanging because it is trying to search on the very first character, which is not very useful. ~58k entries is not a large dataset by most regards, especially once you narrow it with a 2-3 character minimum.
That's just way too much data to load into your webpage. Limit the search to at least 2 letters.
1) Set the autocomplete min length to at least 2
2) Create a webpage that returns JSON data - http://mydomain.com/words.php?q={letters}
You can have the filter sort be 'begins with' before 'contains'; or any variation you prefer.
Use that page as your remote data source. With the min length set, autocomplete knows when to query for new data.
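A minimal sketch of that setup: the words.php URL is the hypothetical endpoint from point 2, and the server only needs to return a JSON array, either plain strings or { label, value } objects, which is the format jQuery UI autocomplete accepts:

// Server returns e.g. ["apple", "apricot"] or [{ "label": "apple", "value": "apple" }, ...]
$(".CreateAddKeyWords input").autocomplete({
  minLength: 2,
  source: function (request, response) {
    // request.term is what the user has typed so far
    $.getJSON("http://mydomain.com/words.php", { q: request.term }, response);
  }
});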
I thought this was an interesting problem, and hacked up a backend service that solves auto-completion.
My code is at https://github.com/badgerman/fastcgi/ (look for complete.c), and the quick and dirty javascript proof of concept from that repository is currently at http://enno.kn-bremen.de/prefix.html (no guarantees that it will stay up for very long, since this is running on the Raspberry Pi in my home).
