Parse and predict TSV values for a Highcharts Gantt chart - javascript

I have to build a small management dashboard that consists of several parts. Most of them were no problem, but on the current page I got completely stuck.
Explanation:
I used an example I found, together with some examples from the Highcharts forum, to build my own chart.
The basic function is working: I read my data from a tsv file, build the arrays and display them.
Now comes the tricky part and I am not able to get my head around it…
The data I'm getting is live data from SAP and is the record of current production processes. (The provided dataset is a finished one, but later I want to display the actual live data.)
Since it is a pilot there is no chance to access the SAP database at the moment and I have to handle the data in tsv format.
There are 3 different states for each package.
B10: start
B30: pause
B40: stop
At the moment I am just separating all the work packages and plotting them, but according to the management here, many functions are still missing.
Prerequisite: Refresh every x seconds
What they want (the easy version):
If one of the processes gets a B10, it starts “running” and its bar should turn green. If there is no B30 in the following rows, the last point gets the current time as “now” and keeps extending (the bar grows).
If there is a B30, the process should “pause” and the color of the bar should change to red (including the data points of that process); it only continues once another B10 arrives. Between a B30 and the next B10 there are no data points.
On B40 the process simply stops and turns grey again.
My problem: I have no idea how to analyze the “future” data (the next steps). Should I run through the tsv separately in advance and check it for occurrences of B30 or B40? What makes my life even more complicated is that two independent processes can run at the same time (e.g. process 1 and process 40). Also, an indicator line should “run” across the chart to mark the current time.
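One way I could imagine structuring this is a single forward pass that keeps per-process state, so no look-ahead in the tsv is needed (untested sketch; the row fields processId, status and time are assumptions about my parsed file):

```javascript
// Untested sketch; row fields (processId, status, time) are assumptions
// about the parsed tsv. One pass, keeping per-process state: no look-ahead.
const STATE_FOR = { B30: "paused", B40: "stopped" };

function buildProcessStates(rows, now) {
  const procs = {}; // keyed by process id
  for (const { processId, status, time } of rows) {
    const p = procs[processId] || (procs[processId] = { segments: [], state: "idle" });
    if (status === "B10") {
      p.segments.push({ start: time, end: null }); // open a green segment
      p.state = "running";
    } else if (status === "B30" || status === "B40") {
      const seg = p.segments[p.segments.length - 1];
      if (seg && seg.end === null) seg.end = time; // close the open segment
      p.state = STATE_FOR[status];
    }
  }
  // A still-open segment extends to "now", so a running bar keeps growing
  // on every refresh. Independent processes coexist in the same map.
  for (const id of Object.keys(procs)) {
    const seg = procs[id].segments[procs[id].segments.length - 1];
    if (seg && seg.end === null) seg.end = now;
  }
  return procs;
}
```

Each refresh I would re-run this over the (growing) tsv and map the segments to Highcharts series, coloring by `state`.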
What they want in the future (or how I call it: nightmare):
Two or even more people could work on the same process, but they always want to see just one bar per process… With multiple operators it also has to hold that all operators set their status to B30, otherwise the process is not interrupted.
They further want to zoom into the current day with only that day's processes, and if a process (for example process 40) is started in “advance”, it should appear at the same zoom level. The processes for the day and a schedule will be provided in a separate file, and I have to compare my data points against the schedule.
I know it's a weird explanation for a weird request. I'm not asking anyone for a finished product, since I have to develop this here by myself, but I have a real problem with the logic. If I didn't explain well enough or there are further questions, please ask and I will answer immediately.

Related

Fastest way to read in a very long file starting on an arbitrary line X

I have a text file which is written to by a Python program and then read in by another program for display in a web browser. Currently it is read in by JavaScript, but I will probably move this functionality to Python and have the results passed into JavaScript using an AJAX request.
The file is irregularly updated every now and then, sometimes appending one line, sometimes as many as ten. I then need to get an updated copy of the file to javascript for display in the web browser. The file may grow to as large as 100,000 lines. New data is always added to the end of the file.
As it is currently written, JavaScript checks the length of the file once per second, and if the file is longer than it was the last time it was read, it reads the whole file in again from the beginning. This quickly becomes unwieldy for files of 10,000+ lines, doubly so since the program may sometimes need to update the file every single second.
What is the fastest/most efficient way to get the data displayed to the front end in javascript?
I am thinking I could:
1. Keep track of how many lines the file had before, and only read in from that point in the file next time.
2. Have one program pass the data directly to the other without reading an intermediate file (although the file must still be written to as a permanent log for later access).
Are there specific benefits/problems with each of these approaches? How would I best implement them?
For Approach #1, I would rather not call file.next() 15,000 times in a for loop just to get to where I want to start reading; is there a better way?
For Approach #2: since I need to write to the file no matter what, am I saving much processing time by not reading it too?
Perhaps there are other approaches I have not considered?
Summary: The program needs to display, in a web browser, data from Python that is constantly being updated and may grow to 100k lines. Since I am checking for updates every second, it needs to be efficient, just in case it has to do a lot of updates in a row.
The function you seek is seek. From the docs:
f.seek(offset, from_what)
The position is computed from adding offset to a reference point; the reference point is selected by the from_what argument. A from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. from_what can be omitted and defaults to 0, using the beginning of the file as the reference point.
Limitation for Python 3:
In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from the f.tell(), or zero. Any other offset value produces undefined behaviour.
Note that seeking to a specific line is tricky, since lines can be variable length. Instead, take note of the current position in the file (f.tell()), and seek back to that.
Opening a large file and reading the last part is simple and quick: Open the file, seek to a suitable point near the end, read from there. But you need to know what you want to read. You can easily do it if you know how many bytes you want to read and display, so keeping track of the previous file size will work well without keeping the file open.
If you have recorded the previous size (in bytes), read the new content like this.
with open("logfile.txt", "rb") as fp:
    fp.seek(old_size)        # whence defaults to 0: offset from the start of the file
    new_content = fp.read()  # read everything past that point
On Python 3, this will read bytes which must be converted to str. If the file's encoding is latin1, it would go like this:
new_content = str(new_content, encoding="latin1")
print(new_content)
You should then update old_size and save the value in persistent storage for the next round. You don't say how you record context, so I won't suggest a way.
If you can keep the file open continuously in a server process, go ahead and do it the tail -f way, as in the question that @MarcJ linked to.

Match a recurring number pattern in data stream

I have an ongoing stream of data that consists of nothing but a single integer for every piece of data I receive.
So I get something like:
6462
6533
6536
6530
6462
376135
623437
616665
616362
616334
Here is a graph of a complete pattern.
Now I know I will get a specific pattern in this stream of ints, within a certain error margin. I know it will never be exactly the same pattern, but I will get a very similar one every now and then. It's a very high-amplitude pattern, similar to the numbers shown in the example. It basically oscillates between 3 states and shows finer-grained differences within each state, but the interesting parts are the big differences.
I have no experience in pattern matching or analyzing data streams and no idea where to start. Ideally I would provide my code a data set, and it would check whether the incoming data stream matches that data set within a certain margin.
That is my biggest problem: I can't compare complete data sets. I have to compare a complete set against one that is being generated continuously, and detect when my pattern starts to occur in that stream.
The main program is currently running in JavaScript/node.js and I'm not sure if JavaScript is suitable for this task but it would be great if I can stay in JavaScript.
Though if there are libraries that help with these kinds of tasks in other languages, I would be keen to test them out.
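To make it concrete, something like a sliding window compared against the reference pattern is what I picture (rough, untested sketch; the z-normalization and the threshold value are guesses on my part):

```javascript
// Rough sketch: compare the trailing window of the stream against a known
// reference pattern using z-normalized Euclidean distance. The threshold
// is a guess and would need tuning against real data.
function zNormalize(xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const sd = Math.sqrt(xs.reduce((a, b) => a + (b - mean) ** 2, 0) / xs.length) || 1;
  return xs.map(x => (x - mean) / sd);
}

function distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

class PatternMatcher {
  constructor(pattern, threshold) {
    this.pattern = zNormalize(pattern); // normalization ignores scale/offset
    this.threshold = threshold;
    this.buffer = [];
  }
  // Feed one integer; returns true when the trailing window matches.
  push(value) {
    this.buffer.push(value);
    if (this.buffer.length > this.pattern.length) this.buffer.shift();
    if (this.buffer.length < this.pattern.length) return false;
    return distance(zNormalize(this.buffer), this.pattern) < this.threshold;
  }
}
```

Is this roughly the right direction, or is there a standard technique (cross-correlation, dynamic time warping, ...) I should be reading about instead?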

Most efficient way to populate/distribute indexedDB [datastore] with 350000 records

So, I have a main indexedDB object store with around 30,000 records on which I have to run full-text search queries. Doing this with the ydn fts plugin generates a second object store with around 300,000 records. As generating this 'index' datastore takes quite long, I figured it would be faster to distribute the content of the index as well. This in turn produced a zip file of around 7MB, which after decompressing on the client side gives more than 40MB of data.

Currently I loop over this data, inserting it one record at a time (async, so callback time is used to parse the next lines), which takes around 20 minutes. As I am doing this in the background of my application through a web worker, this is not entirely unacceptable, but it still feels hugely inefficient. Once populated, the database is actually fast enough to be used even on mid- to high-end mobile devices; however, a population time of 20 minutes to one hour on mobile devices is just crazy.

Any suggestions how this could be improved? Or is the only option minimizing the number of records? (That would mean writing my own full-text search... not something I would look forward to.)
Your data size is quite large for a mobile browser. Unless the user is constantly using your app, it is not worth sending all the data to the client. You should run the full-text search on the server side, while caching opportunistically as illustrated by this example app. That way, users don't have to wait for full-text-search indexing.
Full-text search requires indexing all tokens (words) except some stop words. Stemming is activated only when lang is set to en. You should profile your app to see which parts are taking the time. My guess is that the browser itself is taking most of the time; in that case, you cannot do much optimization other than parallelization. Sending pre-indexed data (as you described above) will not help much (but please confirm by comparing). A web worker will not help either. I assume your app has no problem with slow responses due to indexing.
Do you have any complaints other than the slow indexing time?
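If you do keep the client-side population step, one general mitigation (not specific to ydn-db; the store name is illustrative) is to batch many put() calls into a single IndexedDB transaction instead of paying per-record transaction overhead. A minimal sketch:

```javascript
// Sketch only: batch inserts so each IndexedDB transaction covers many
// records. "fts-index" is an illustrative store name.
function chunk(records, size) {
  const out = [];
  for (let i = 0; i < records.length; i += size) out.push(records.slice(i, i + size));
  return out;
}

// Assumes `db` is an already-open IDBDatabase containing the store.
function populateInBatches(db, records, batchSize = 1000) {
  return chunk(records, batchSize).reduce(
    (prev, batch) =>
      prev.then(
        () =>
          new Promise((resolve, reject) => {
            const tx = db.transaction("fts-index", "readwrite");
            const store = tx.objectStore("fts-index");
            batch.forEach(r => store.put(r)); // all puts share one transaction
            tx.oncomplete = resolve;
            tx.onerror = () => reject(tx.error);
          })
      ),
    Promise.resolve()
  );
}
```

Please benchmark this against your one-by-one loop; the gain depends heavily on the browser's IndexedDB implementation.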

saving project as incremental json diffs?

I've been building a web paint program wherein the state of a user's artwork is saved as a JSON object. Every time I add to the client's undo stack (just an array of JSON objects describing the state of the project), I want to save the state to the server too.
I am wondering if there is an elegant way to (1) send up only the diffs and then (2) be able to download the project later and recreate its current state. I fear this could get messy, and I am leaning towards just uploading the complete JSON project state at every undo step. Any suggestions or pointers to projects which tackle this sort of problem gracefully?
Interesting - and pretty large - question.
A lot of implementations / patterns / solutions apply to this problem, and they vary depending on the type of "document" whose updates you're tracking.
Anyway, a simple approach that avoids going mad is to save, instead of "states", the "commands which produced those states".
If your application is completely deterministic (which I assume it is, since it's a painting program), you can be sure that for every command at given time & position, the result will be the same at every execution.
So, I would instead note down an "alphabet" representing the commands available in your program:
Draw[x,y,size, color]
Erase[x,y,size]
Move[x,y]
and so on. You can take inspiration from SVG's path syntax. Then push/pull strings of commands to/from the server:
timestamp: MOVE[0,15]DRAW[15,20,4,#000000]ERASE[4,8,10]DRAW[15,20,4,#ff0000]
This is obviously only a general, pseudocoded idea. Hope you can get some inspiration.
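For instance, a minimal replay sketch (the command names and state shape are only illustrative, not a real API):

```javascript
// Sketch of the command-log idea: commands are recorded as data and
// replayed deterministically to rebuild the project state.
const handlers = {
  MOVE: (state, [x, y]) => ({ ...state, cursor: [x, y] }),
  DRAW: (state, [x, y, size, color]) =>
    ({ ...state, strokes: [...state.strokes, { x, y, size, color }] }),
  ERASE: (state, [x, y, size]) =>
    ({ ...state, strokes: state.strokes.filter(s => !(s.x === x && s.y === y)) }),
};

// Fold the command log over an initial state; same log in, same state out.
function replay(commands, initial = { cursor: [0, 0], strokes: [] }) {
  return commands.reduce((state, { op, args }) => handlers[op](state, args), initial);
}
```

Undo then becomes "replay all but the last command", and the server only ever stores the compact log.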

Saving and updating the state of an AJAX based graph in Rails

I have a Google Charts ColumnChart in a Rails project. It is generated and populated in JavaScript by calling a Rails controller action which renders JSON.
The chart displays a month's worth of information for a customer.
Above the chart I have next and previous arrows which allow a customer to change the month displayed on the chart. These don't have any functionality as it stands.
My question is: what is the best way to save the state of the chart, in terms of its current month, for a customer viewing the chart?
Here is how I was thinking of doing the workflow:
1. One of the arrows is selected.
2. This event is captured in JavaScript.
3. Another request to the Rails action rendering JSON is performed, with an additional GET parameter passed, based on a data attribute of the arrow button (either + or -).
4. The chart is re-rendered using the new JSON response.
Would the logic around incrementing or decrementing the graph's current date be performed on the server side, with the chart's date stored in the session, defaulting to the current date on first load?
On the other hand, would it make sense to save the chart state on the client side, within the JavaScript code or in a cookie, and manipulate the date before it's sent to the Rails controller?
I've been developing with Rails for about 6 months and feel comfortable with it, but have only just recently started developing with JavaScript, using AJAX. My experience tying JS code together with Rails is somewhat limited at this point, so I am looking for some advice/best practices on how to approach this.
Any advice is much appreciated.
I'm going to go through a couple of options, some good, some bad.
First, what you definitely don't want to do is maintain any notion of what month you are in within cookies or any other form of persistent server-side storage. Certainly, server state is sometimes necessary, but it shouldn't be used when there are easy alternatives. Part of REST (which Rails is largely built around) is trying to represent data in pure attributes rather than letting its state be spread around like that.
From here, most solutions are probably acceptable, and opinion plays a greater role. One thing you could do is calculate a month from the +/- sign using the current month and send that to the server, which will return the information for the month requested.
I'm not a huge fan of this, though, as you have to write JavaScript capable of creating valid date ranges, and most of this functionality will probably exist on the server already. Just passing a +/- and the current month to the server will work as well; you'll just have to do a bit of additional routing and logic to resolve the sign into a different month on the server.
While either of these would work, my preferred solution would instead have the initial request for a month also generate valid representations of the neighbouring months and return them to the client. Then, when you update the graph with the requested data, you also replace the forward/backward links on the graph with the ones provided by the server. This fuses the benefits of the prior two solutions: no additional routing on the server, and no substantive addition to the client-side code. You also gain the ability to grey out transitions to months for which no data exists (i.e. months before they became a customer, and future months). Without this, you'd have to create separate logic to handle client requests for information that doesn't exist, which is extra work for you and more confusion for the customer.
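To make that concrete, here is a sketch of the kind of payload and client-side swap this implies (all field names and URLs are illustrative, not an existing API):

```javascript
// Sketch only: the JSON a month endpoint might return, with navigation
// links generated server-side. Field names and URLs are illustrative.
const monthPayload = {
  month: "2014-03",
  rows: [],                                  // the chart data for the month
  prev: "/customers/42/chart?month=2014-02", // link supplied by the server
  next: null,                                // null: no data, grey the arrow out
};

// Client side: after re-rendering the chart, swap the arrow targets
// based purely on what the server sent back.
function arrowState(payload) {
  return { prevEnabled: payload.prev !== null, nextEnabled: payload.next !== null };
}
```

The client never computes dates itself; it only follows or disables the links it was handed.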
