Website remote render d3.js server side - javascript

Looking for a solution to an admittedly strange problem. We are using d3.js to plot charts and graphs, but our data sets range from very small to massive. Right now most of what we are doing is internal prototyping; however, we do show clients these charts, drawing them in real time and rapidly changing the inputs as we go.
Doing this in D3 looks great but, as expected, can be slow. I'm more interested in the possibilities for this process: go to our website, log in, and be shown an instance of our dashboard rendered remotely on the server. Our server cluster is a super demon beast, so I'm not worried about it doing the heavy lifting; it can run these processes about 100x faster than our best PC. So it seems we could set up our website to create instances of our dashboard on the fly, BUT each instance would only have access to that user account's data.
This is getting a bit convoluted, so let me explain. We have a database full of millions of data points and about 10 user accounts, each with access to different pieces of that data: one has access to all of it, the others to some of it. None of that is the issue we need solved. We are more interested in the ability of our server to create multiple instances of our site that the user remotely controls through a window, essentially like a Remote Desktop. We could even start with the user login form being part of the remote render, so our system is fully hosted and operated on the server itself and the web page is essentially a KVM into the server. However, it needs to handle multiple users at the same time.
We are using CentOS 6.4, lots of Python for the back-end work, PHP and HTML, and a mixture of Postgres and SQLite, but I doubt any of this is important. Just want to cover my bases.

It seems unlikely to me that you'd be able to meaningfully display millions of data points on a single screen without grouping and summarizing them in some way. Do the processing and summarizing on the server and ship the resulting smaller datasets to the client, which then plots your graphs and charts from those. You'll likely end up with more than one dataset, but it should result in much better client performance, e.g.:
{millions of points} -> transform on server -> data for bar chart to client
{millions of points} -> transform on server -> data for XY-scatter chart
etc.
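For example, a rough sketch of that server-side step, here using a Node/Express endpoint in front of Postgres (the route, table, and column names are made up for illustration; the same idea applies to a Python or PHP back end):

const express = require('express');
const { Pool } = require('pg'); // node-postgres

const app = express();
const pool = new Pool(); // connection settings taken from the usual PG* environment variables

// Ship a small, pre-aggregated dataset instead of the raw points.
app.get('/api/bar-chart', async (req, res) => {
  const result = await pool.query(
    'SELECT category, count(*) AS n, avg(value) AS mean FROM readings GROUP BY category'
  );
  res.json(result.rows); // a handful of rows -- cheap for d3 to bind and draw
});

app.listen(3000);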
What you've proposed is not really a programming issue, and isn't going to scale very well.

Related

Fetching millions of records with MariaDB, Sequelize & Node.js to display in DevExtreme PivotGrid

The title already sums it up.
I am currently facing this task: a SELECT statement results in 3.6 million records with many columns. This complete set of data should be sent to the client so that the browser can display everything in a pivot grid, so pagination is unfortunately not an option.
The stack is MariaDB, Node.js v8.11.3, Sequelize v3, and DevExtreme on the client side.
As you can imagine, the node server crashes with this amount of data.
I'd really appreciate any idea of how this could be done, or hearing whether you have tackled such a task and concluded that it simply can't be done in a web application yet.
Thank you very much and have a great day!
Patrick
To answer your question: YOU CANNOT.
Providing huge datasets directly to the client is generally a bad idea.
Imagine that, as in your example, each row is 1 KB: you would be loading roughly 3.6 GB of data.
That much data has to go over the network, so either the server or the client will run out of memory, or, if neither does, it will be painfully slow to load and then to use.
There is no straightforward solution here, but you can avoid sending that much data by rethinking how the data is modeled so you only send aggregations of it, or by using a client library that paginates (DevExpress's pivot grid doesn't do that).
Just to answer part of the question: you can avoid crashing Node.js by not loading the whole dataset into memory before sending it to the client, using techniques like streaming and WebSockets.
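To illustrate, here is one minimal sketch of the batching idea with Express and Sequelize, which writes the response out in slices so the full 3.6 million rows never sit in Node's memory at once (the Record model and the route are hypothetical):

const express = require('express');
const { Record } = require('./models'); // hypothetical Sequelize model

const app = express();

app.get('/records', async (req, res) => {
  const batchSize = 10000;
  res.type('application/json');
  res.write('[');
  let offset = 0;
  let first = true;
  while (true) {
    // fetch one slice at a time instead of the whole result set
    const rows = await Record.findAll({ raw: true, limit: batchSize, offset: offset });
    if (rows.length === 0) break;
    res.write((first ? '' : ',') + rows.map(r => JSON.stringify(r)).join(','));
    first = false;
    offset += batchSize;
  }
  res.write(']');
  res.end();
});

app.listen(3000);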
3.6 million rows. Let's assume each row holds 128 bytes of data; that's 460,800,000 bytes, or about 439 MB of raw data you're trying to select, which doesn't sound too bad.
But this data will be wrapped in objects/models and turned into JSON, so your memory requirement grows at least roughly tenfold: 4.3 GB.
Okay, still not too bad. Now we need to push it to the browser, style it, wrap it in HTML, JSON, etc.
We're going to push roughly 1.4 GB of JSON to the client. The client downloads it happily. The JSON is in the browser and gets turned into an object: memory times roughly 4, so 5.6 GB. Not too shabby, except the browser will already have given up, because it has a memory limit on the order of 256 MB per tab (I've run into this when coding a game; it varies per browser).
But let's say it's a custom, unbounded browser that can just do it.
Iterate over the JSON and build a spreadsheet-like display: create all the DOM nodes, attach them to the DOM tree, attach event handlers, etc. Memory times 20: 112 GB.
So the customer needs a big gaming rig with an incredible amount of RAM, a browser that can handle that address space, and an OS that can cope with all of this.
Now you get into the fun territory of paging. The OS needs to page that RAM because most of it goes unused and the OS has higher-priority tasks to run while the user stares at the screen; no microsecond goes unspent. Write to disk, read from disk on every scroll, killing your client's hard drive.
In short, the browser won't allow it because of its memory limit. Explain to your customer that what he wants requires a custom OS, custom browser, and custom computer, and that it will still be slow because of CPU limitations.
Just do what Google Docs does: load as needed. When the user scrolls, load the display data that's needed, no more and no less, and unload data that has been off screen for 5 minutes to stay under the 256 MB limit. Once you have the query written, it's just a matter of setting an offset and limiting the number of results you want; the rest works the same.
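A sketch of that windowed endpoint, assuming an existing Express app and a hypothetical Sequelize model named Record:

// GET /records?offset=120000&limit=100 -- the grid asks only for the rows currently on screen
app.get('/records', async (req, res) => {
  const offset = parseInt(req.query.offset, 10) || 0;
  const limit = Math.min(parseInt(req.query.limit, 10) || 100, 500); // hard cap per request
  const { count, rows } = await Record.findAndCountAll({ raw: true, offset: offset, limit: limit });
  res.json({ total: count, rows: rows }); // total lets the grid size its scrollbar
});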
The real world has limits; your client's wishes do not. Bring them into balance.

JavaScript for visualizing, manipulating, and exporting data points?

I have been using Tkinter and Pygame for a GUI to visualize data points.
With Tkinter, files are chosen from a list, and then a Pygame window is opened where the data points are graphed.
Pygame is not meant for data visualization, and I would like to use an alternative.
I want to use JavaScript to visualize the data, but I don't know how I would bring the large amounts of point data from my files into a browser window, then let the user manipulate the data (moving, adding, or deleting points), and then save that data back out.
What tools do I need to bring the data into JavaScript, generate graphics for graphing and some other basic shapes (text, lines, dots), and then export the data out?
plotly.js is a great JS library that will allow you to visualize any data on the web with ease. You can check out the link and search for some tutorials on the internet, such as this one, but I don't think that is the problem.
The issue is with getting the data from a file.
When it comes to the web, you will need some sort of web server that can serve and receive files.
The web browser lets the user choose the file (this can be done with the HTML input tag), then sends a request containing the file data to your web server (this is where your server-side code runs), where you can do whatever you like with the data and send a response back for the browser to display.
Although web servers might be easy to get started with, and there are many different languages you can write one in (including JavaScript), I'm assuming building web servers is not in your interest just yet.
If you insist on building this with JavaScript, you will have to find another way to get the data you want to plot, or actually build some kind of server that can handle the files you want the user to input. If this is really for you, here is a starter tutorial for building a web server with Node.js (JavaScript), and here is another for building a web server with Django (Python).
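Just to give a feel for that server side, a bare-bones sketch with Express and the multer upload middleware (the field name datafile and the route are assumptions; it only echoes back what it parsed):

const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer(); // keep the upload in memory for this sketch

// the browser posts the chosen file here, e.g. from <input type="file" name="datafile">
app.post('/upload', upload.single('datafile'), (req, res) => {
  const text = req.file.buffer.toString('utf8');
  // parse the point data however your files are laid out, e.g. one "x,y" pair per line
  const points = text.trim().split('\n').map(line => line.split(',').map(Number));
  res.json({ count: points.length, points: points });
});

app.listen(3000);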
But if you don't like this idea and don't mind going back to Python, you can use matplotlib.
Python has a 3rd party module called matplotlib, which allows you to very easily plot points and graph them with many different customizations.
So, after you extract your points from your files, you can then remove whatever Pygame code you were using and instead very simply do:
import matplotlib.pyplot as plt
... # extracting your points from the files
plt.plot(xpoints, ypoints)
plt.show()
Where xpoints and ypoints are the points from the files
This will create a line graph. You can customize the graph by passing an optional third argument to the plot function, e.g.:
plt.plot(xpoints, ypoints, "x")
This will only plot the points instead of drawing a line. You can also change "x" to "ro", "bo", "r+" and so many more.
You can refer to the link above to look through the matplotlib documentation and decide whether you would like to try it out or still stick with JS. IMO, though, you will have to find another way to get your data in that case, as building web servers takes time and understanding and may be very confusing at first :)

Storing real time canvas session data on Node JS

I'm writing a real-time paint application using Node.js + HTML5 canvas & websockets.
Currently the server just acts as a relay, and whatever each user draws is broadcast to the rest of the users.
The problem here is that when new users show up, they start with an empty canvas.
I have two ideas on how to solve this:
1) The event-driven approach: I persist every draw event in memory, and when a new user joins the session, all events are replayed and sent to him/her.
2) The server maintains a copy of the canvas. Rather than just relaying the draw events, the server also renders them, and when a new user shows up, this state is passed along.
Does anyone have thoughts on the pros and cons of the two approaches, or better yet, a better way to solve it?
This is my opinionated answer.
The best approach is to have the server maintain a copy of the data via a DB. That way, whenever a client starts, it always has data to work from, even in the case of dropped packets when a new client joins, and you can keep historical data as well. When I developed a similar concept I used game objects as the example and got good results across many clients, without noticeable lag on a local network even with a flawed design. Hope this helps.
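As a rough sketch of the relay-plus-history idea with the ws package (the event format and where you persist it are up to you; here it is just an in-memory array you would mirror to your DB):

const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });
const history = []; // every draw event received so far; mirror this to a DB for durability

wss.on('connection', (socket) => {
  // replay everything drawn so far, so the newcomer's canvas is not blank
  history.forEach(event => socket.send(event));

  socket.on('message', (event) => {
    history.push(event);
    // relay the draw event to everyone else
    wss.clients.forEach(client => {
      if (client !== socket && client.readyState === WebSocket.OPEN) {
        client.send(event);
      }
    });
  });
});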

Where in my stack can streams solve problems, replace parts and/or make it better?

If I take a look at the stream library landscape, I see a lot of nice stuff (like mapping/reducing streams), but I'm not sure how to use it effectively.
Say I already have an Express app that serves static files and has some JSON REST handlers connecting to a MongoDB database server. I have a client-heavy app that can display information in widgets and charts (think Highcharts), with the user filtering, drilling down into information, etc. I would like to move to real-time updating of the interface, and this is the perfect little excuse to introduce Node.js into the project, I think. However, the data isn't really real-time, so pushing new data to lots of clients isn't what I'm trying to achieve (yet); I just want a fast experience.
I want to use browserify, which gives me access to the Node.js streams API in the browser (and more), and given the size of the data sets, processing is done server-side (by a back-end API over JSONP).
I understand that most of the connections are, at some point, already expressed as streams, but I'm not sure where else I could use streams effectively to solve a problem:
Right now, when sliders/inputs are changed, spinning loaders appear in the affected components until the new JSON has arrived, been parsed, and is ready to be fed into the chart/widget. With a Node.js server in between, can streaming the data, instead of request/responding with chunks of JSONP-ified numbers, speed up the interactivity of the application?
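To make that concrete, this is roughly the pipeline I have in mind (a sketch only; it assumes an existing Express app, a connected MongoDB db handle, the driver's cursor stream, the JSONStream package, and a hypothetical buildQuery helper):

const JSONStream = require('JSONStream');

// instead of collection.find(query).toArray() followed by res.json(bigArray) ...
app.get('/api/series', (req, res) => {
  res.type('application/json');
  db.collection('measurements')      // placeholder collection name
    .find(buildQuery(req.query))     // hypothetical helper mapping UI filters to a Mongo query
    .stream()                        // Node readable stream of documents
    .pipe(JSONStream.stringify())    // serialize as a JSON array, piece by piece
    .pipe(res);                      // flush to the client as it is produced
});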
Say I have some time-series data. Can a stream be reused, so that when I say I want to see only a subset of the data (by time), I can have the stream re-send its data, filtering out the points I don't care about?
Would streaming data to a (high)chart be a better user experience than using a for loop and an array?

To Ajaxify Or Not?

I really love the way Ajax makes a web app feel more like a desktop app, but I'm worried about the hits on a high-volume site. I'm developing an intranet-based database app right now that no more than 2-4 people will be accessing at one time. I'm Ajaxing the hell out of it, but it got me wondering: how much Ajax is too much?
At what point does the volume of hits outweigh the benefits of using Ajax? It doesn't really seem like it would, versus a whole-page refresh, since you are, in theory, only updating the parts that need updating.
I'm curious whether any of you have used Ajax on high-volume sites, and in what capacity you used it. Does it create scaling issues?
On my current project we do use Ajax, and we have had scaling problems. My current project is a J2EE site that does timekeeping for the employees of a large urban city, and we've found it's best if the browser side can cache data that won't change for the duration of a user session. Fortunately we're moving to a model where a single admin processes the timekeeping for as many employees as possible, akin to how an ERP application (or an email application) might work. Consequently our business need is for the browser side to hold a lot of data, but we don't expect the volume of hits to be a serious problem. So we've kept an XML data island on the browser side, and in addition we load data only on an as-needed basis.
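In today's terms, the caching part of that would look something like this sketch (sessionStorage standing in for the data island; the endpoint and the pay-code example are hypothetical):

// fetch reference data once per session and reuse it for every later Ajax-driven view
async function getPayCodes() {
  const cached = sessionStorage.getItem('payCodes');
  if (cached) return JSON.parse(cached);           // already loaded during this session
  const response = await fetch('/api/pay-codes');  // hypothetical endpoint
  const codes = await response.json();
  sessionStorage.setItem('payCodes', JSON.stringify(codes));
  return codes;
}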
I highly recommend the book Ajax Design Patterns or their site.
Ajax should help your bandwidth on a high-volume site, if that is your concern, since as you said you are only updating the parts that need updating. My problem with Ajax is that your site can be rendered useless if visitors don't have JavaScript enabled, and most of the time I don't feel like coding the site again for non-JavaScript users.
Look at it this way: AJAX must not be the only option, because of the possibility of disabled JavaScript (!script); it must exist as a layer on top of an existing architecture to provide a superior experience in some respects. Given that, it is impossible for AJAX to create more requests or more work than plain HTML, because it is handling the exact same data transfer.
Where it can save you bandwidth and server load is that AJAX gives you the ability to transfer only the data: you save on the redundant HTML, image, CSS, etc. requests of every full page refresh, while providing a snappier user experience.
As mike nvck points out, the technique of polling is a big exception to this rule, but that's about the technique, not the tech: you would get the same kind of impact with a simple page-refresh poll.
Understand the tool and use it for what it was designed for. If an AJAX implementation is reducing performance, you've done something wrong.
(FWIW, my experience of profiling AJAX vs. plain HTML tends to show roughly 60% bandwidth and 80-90% performance benefits.)
The most common scaling issue with Ajax apps arises when they are set up to check back with the server to see whether content has been updated in the meantime, without the user actively requesting it. 5 clients checking every 10 seconds is not the same as 5000 clients checking every 10 seconds.
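For reference, the pattern in question is just a timer like this running in every open page, so the request rate grows with the number of clients whether or not anything has changed (the endpoint and update shape are hypothetical):

let lastUpdateId = 0; // id of the newest change this client has seen
setInterval(async () => {
  const response = await fetch('/api/updates?since=' + lastUpdateId); // hypothetical endpoint
  const updates = await response.json();
  if (updates.length > 0) {
    lastUpdateId = updates[updates.length - 1].id;
    // refresh only the affected widgets here
  }
}, 10000); // one request per client every 10 seconds, updated or not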
On one hand, Ajax reduces the server workload because it usually shows or refreshes just part of the page; on the other hand, it increases the number of hits to the server. I would say it all depends on the architecture of your web application. If your application needs a lot of processing for every hit (like database access) regardless of the size of the response, then Ajax will hit you hard.
