saving project as incremental json diffs?

I've been building a web paint program wherein the state of a user's artwork is saved as a json object. Every time I add to the client's undo stack (just an array of json objects describing the state of the project), I want to save the state to the server too.
I am wondering if there is an elegant way to [1] only send up the diffs and then [2] be able to download the project later and recreate the current state of the project? I fear this could get messy and am leaning towards just uploading the complete json project state at every undo step. Any suggestions or pointers to projects which tackle this sort of problem gracefully?

Interesting - and pretty large - question.
A lot of implementations / patterns / solutions apply to this problem and they vary depending on the type of "document" you're keeping track of updates of.
Anyway, a simple approach that keeps you from going mad is to save, instead of "states", the "commands which produced those states".
If your application is completely deterministic (which I assume it is, since it's a painting program), you can be sure that for every command at given time & position, the result will be the same at every execution.
So, I would instead note down an "alphabet" representing the commands available in your program:
Draw[x,y,size, color]
Erase[x,y,size]
Move[x,y]
and so on. You can take inspiration from the SVG implementation. Then push/pull strings of commands to/from the server:
timestamp: MOVE[0,15]DRAW[15,20,4,#000000]ERASE[4,8,10]DRAW[15,20,4,#ff0000]
This is obviously only a general, pseudocoded idea. Hope you can get some inspiration.
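
To make the idea a little more concrete, here is a minimal sketch of a command log that can be replayed to rebuild the project. The command names come from the alphabet above, but the state shape (`cursor`, `strokes`), the array encoding of a command, and the helpers `replay` and `pendingCommands` are assumptions made up for illustration, not part of any library.

```javascript
// Hypothetical handlers for the command alphabet; the state shape is assumed.
const handlers = {
  MOVE: (state, [x, y]) => ({ ...state, cursor: { x, y } }),
  DRAW: (state, [x, y, size, color]) => ({
    ...state,
    strokes: [...state.strokes, { x, y, size, color }],
  }),
  ERASE: (state, [x, y, size]) => ({
    ...state,
    // Drop every stroke whose centre lies within the eraser radius.
    strokes: state.strokes.filter((s) => Math.hypot(s.x - x, s.y - y) > size),
  }),
};

const emptyState = () => ({ cursor: { x: 0, y: 0 }, strokes: [] });

// Rebuild a project by applying every logged command in order.
// Each command is an array: [name, ...args].
function replay(commands, state = emptyState()) {
  return commands.reduce((acc, [name, ...args]) => {
    const handler = handlers[name];
    if (!handler) throw new Error(`Unknown command: ${name}`);
    return handler(acc, args);
  }, state);
}

// Only the commands added since the last successful upload need to be sent.
function pendingCommands(log, lastSyncedIndex) {
  return log.slice(lastSyncedIndex);
}
```

Because replay is deterministic, the server only has to append what `pendingCommands` returns, and a later download can rebuild the state by replaying the whole log.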

Related

Nodejs can i save an array in my backend, and use it when i need it?

I have a small website, when you go into it it'll show you a quote.
Until today, whenever a user visited my website, a random quote was fetched directly from the database (by "directly" I mean that a connection was made to the database and a quote was returned from it), but sometimes that took a while, like 1 or 2 seconds. Today I did something different: when my nodejs application starts, I grab every quote in the database and store them inside an Array. When someone comes to my website, I randomly choose a quote from the Array. It is so much faster than the first way of doing it, and I made some changes so that when I add a new quote to the database the Array is automatically updated.
So here is my question, is it bad to store data inside an array and serve users with it?

There are a few different answers depending on your intentions. First of all, if the dataset of quotes is very large, I assure you it is a very bad idea, but if you are talking about a few items, it's acceptable. However, if you are building a scalable application, it's not really recommended, because each node will keep its own copy of the dataset.
If you want very fast quote storage, I would recommend Redis (an in-memory key-value store). It shares the state between nodes: all of your nodes connect to Redis and the quotes are kept there, so you do not need to keep copies and it stays fast. Also, if you enable its disk persistence option, you can use Redis as your primary quote storage. After all, you won't update these quotes much and they won't be searched with a complex query.
Alternatively, if your database is MySQL, PostgreSQL or MongoDB, you can enable an in-memory storage option so that you don't need to keep the data in your array but can take it directly from the db, which is much faster and still queryable.

There's the old joke: The two hard things in software engineering are naming things, caching things, and off-by-one errors.
You're caching something: your array of strings. Then you select one at random from the array each time you need one.
What is right? You get your text string from memory, and eliminate the time-delay involved in getting it from the database. It's a good optimization.
What can go wrong?
Somebody can add or remove strings from your database, which makes your cache stale.
You can have so many text strings you blow out your nodejs RAM. This seems unlikely; it's hard to imagine a list of quotes that big. The Hebrew Bible, the New Testament, and the Qur'an together comprise less than a million words. You probably won't have more text in your quotable-quotes than that. 10-20 megabytes of RAM is nothing these days.
So, what about your stale cache in RAM? What to do?
You could ignore the problem. Who cares if the cache is stale?
You could reread the cache every so often.
Your use of RAM for this is a good optimization. But, it adds a cache to your application. A cache adds complexity, and the potential for a bug. Is the optimization worth the trouble? Only you can guess the answer to that question.
And, it's MUCH MUCH better than doing SELECT ... ORDER BY RAND() LIMIT 1; every time you need something random. That is a notorious query-performance antipattern.
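
As a rough sketch of the "reread the cache every so often" option: the helper below keeps the quotes in memory and replaces them on each refresh. `createQuoteCache` and `loadQuotes` are hypothetical names; `loadQuotes` stands for whatever async function reads all quotes from your database.

```javascript
// loadQuotes is a hypothetical async function returning an array of strings
// (for example, the rows of a "SELECT text FROM quotes" query).
function createQuoteCache(loadQuotes) {
  let quotes = [];

  async function refresh() {
    // The array is swapped only after the load succeeds, so a failed query
    // leaves the previous (stale but usable) cache in place.
    quotes = await loadQuotes();
    return quotes.length;
  }

  // The random source is injectable so the function can be tested.
  function randomQuote(random = Math.random) {
    if (quotes.length === 0) return null;
    return quotes[Math.floor(random() * quotes.length)];
  }

  return { refresh, randomQuote };
}
```

At startup you would call `refresh()` once and then schedule it, for example with `setInterval(() => cache.refresh().catch(console.error), 10 * 60 * 1000)`; the ten-minute interval is an arbitrary choice.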

"fragmenting" HTTP requests

I have an Angular app pulling data from a REST server. Each item we pull has some "core" data - what's needed to display its basic representation - and then what I call "secondary" data, comments and other things that the user might want to see and might not.
I'm trying to optimize our request pattern to minimize the overall amount of time the user spends looking at a loading spinner: Pulling all (core/secondary) data at once causes the initial request to return far too slowly, but pulling only the bare essentials until the user asks for something we haven't requested yet also creates unnecessary load time, at least inasmuch as I could've anticipated them wanting to see it and loaded it while they were busy reading the core content.
So, right now I'm doing a "core content" pull first and then initiating a "secondary" pull at the end of the success callback from the first. This is going to be an experimental process, but I'm wondering what (if any) best practices have been established in this situation. (I'm sure a good answer to that is a google away, but in this instance I'm not quite sure what to google - thus the quotation marks in this question's title)
A more concrete question: Am I better off initiating many small HTTP transactions or a few large ones? My instinct is to do many small ones, particularly if I can anticipate a few things the user is likeliest to want to see first and get those loaded as soon as possible. But surely there's an asymptote here? Or am I off-base in this line of thinking entirely?

I use the same approach as you, and it works pretty well for a many-keyed, 10,0000+ collection.
The collection is paginated with ui.bootstrap.pagination, only a maximum of 10 items are displayed at once. It can be searched on title.
So my approach is to retrieve only id and title, for the whole collection, so the search can be used straight away.
Then, as the items displayed on screen are in an array, I place a $watch on that array. The job of the $watch is to go fetch full details of the items in the array (secondary pull), but of course only when the array is changed.
So, in the worst case scenario, you are pulling the full details of only 10 items.
Results are cached for more efficiency. It displays instant results, as the $watch acts as a pre-loader.
Am I better off initiating many small HTTP transactions or a few large ones?
I believe large transactions, for just a few items (the ones which are clickable on the screen) are very efficient.
Regarding the best practice bit: I suppose there are many ways to achieve your goals; however, the technique you are using works extremely well, as it retrieves only what is needed, and only just before it is needed.
Besides, it is simple enough to implement.
Also, like you I would have thought many smaller pulls were surely better than several large ones. However, I was advised to go for a large pull as a comment to this question: Fetching subdocuments with angular $http
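
The pre-loading described above can be sketched without any framework. This is only an illustration under assumptions: `createDetailLoader` is a made-up helper, `fetchDetails(id)` stands for your own request function (for instance a wrapper around `$http.get`), and items are assumed to carry an `id` property. In AngularJS, `preload` would be called from the `$watch` on the array of visible items.

```javascript
// fetchDetails(id) is a hypothetical function returning a Promise of the
// full record for one item.
function createDetailLoader(fetchDetails) {
  const cache = new Map(); // id -> Promise of the full details

  function load(id) {
    if (!cache.has(id)) {
      const promise = fetchDetails(id).catch((err) => {
        cache.delete(id); // allow a retry after a failed request
        throw err;
      });
      cache.set(id, promise);
    }
    return cache.get(id);
  }

  // Call this whenever the array of visible items changes.
  function preload(visibleItems) {
    return Promise.all(visibleItems.map((item) => load(item.id)));
  }

  return { load, preload };
}
```

Caching the promise rather than the result means that a click on an item whose details are still in flight reuses the pending request instead of starting a second one.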

To answer your question about which keywords to search for, I suggest:
progressive loading
An alternative could be using websockets and streaming loading: Oboe.js does this quite well:
http://oboejs.com/examples

Most efficient way to populate/distribute indexedDB [datastore] with 350000 records

So, I have a main indexedDB objectstore with around 30.000 records on which I have to run full text search queries. Doing this with the ydn fts plugin generates a second objectstore with around 300.000 records. Now, as generating this 'index' datastore takes quite long, I figured it would be faster to distribute the content of the index as well. This in turn generated a zip file of around 7MB which, after decompressing on the client side, gives more than 40MB of data. Currently I loop over this data, inserting it one by one (async, so callback time is used to parse the next lines), which takes around 20 minutes. As I am doing this in the background of my application through a webworker this is not entirely unacceptable, but it still feels hugely inefficient. Once it has been populated, the database is actually fast enough to be used even on mid to high end mobile devices; however, the population time of 20 minutes to one hour on mobile devices is just crazy. Any suggestions on how this could be improved? Or is the only option minimizing the number of records? (which would mean writing my own full text search... not something I would look forward to)

Your data size is quite large for a mobile browser. Unless users are constantly using your app, it is not worth sending all the data to the client. You should use the server side for full text search, while caching opportunistically as illustrated by this example app. This way, users don't have to wait for full text search indexing.
Full-text search requires indexing all tokens (words), except for some stemmed words. Stemming is activated only when lang is set to en. You should profile your app to see which parts are taking time. I guess the browser is taking most of the time; in that case, you cannot do much optimization other than parallelization. Sending indexed data (as you described above) will not help much (but please confirm by comparing). A Web worker will not help either. I assume your app has no problem with slow response due to indexing.
Do you have any complaint other than slow indexing time?

Timestamp-based conflict resolution without reliable time synchronization

Let's take as an example a js "app", that basically does CRUD, so it creates, updates and deletes (not really) some "records".
In the most basic case, one does not need to resolve conflicts in such an application because the ACID properties of the DBMS are used to eliminate concurrent updates (I'm skimming over a ton of details here, I know). When there's no way to emulate serial execution of updates, one can use timestamps to determine which update "wins". Even then the client need not worry about timestamps, because they can be generated at request time on the server.
But what if we take it one step further and allow the updates to queue up on the client for some unspecified amount of time (say, to allow the app to work when there's no network connectivity) and then be pushed to the server? Then the timestamp cannot be generated on the server, since the time when the update was pushed to the server and the actual time when the update was performed may vary greatly.
In the ideal world, where all the clocks are synchronized, this is not a problem - just generate a timestamp on the client at the time when the update is performed. But in reality, time often drifts from the "server" time (which is assumed to be perfect; after all, it's us configuring the server, what could ever go wrong with it?) or is just plain wrong by hours (possible when you don't set the time zone, but instead update the time / date of the system to match). What would one do to account for reality in such a case?
Perhaps there's some other way of conflict resolution, that may be used in such a case?

Your question has two aspects:
1. Synchronizing/serializing at the server using timestamps, via the ACID properties of the database.
2. Queues that live on the client (delays which the server is not aware of).
If you maintain queues on the client that push to the server whenever the client sees fit, then the synchronization had better be trivial, because queuing defeats the purpose of the timestamps the server relies on.
The scope of ACID is limited here: if client updates are not real-time, the server cannot serialize them based on either the time the request was created or the time it arrived. This creates a scenario where a request R2, created later than request R1, arrives before R1.
Time is a relative concept; using the local time of either the client or the server will cause drift for the other. It also does not scale (it is inefficient if you have several distributed peer nodes), and it introduces a single point of failure.
Vector clocks were designed to resolve this. They are logical clocks: each machine atomically increments its own counter whenever an event occurs on it. BASE databases (Basically Available, Soft state, Eventual consistency) use them.
The result is that 1 and 2 will never work out together. You should never queue requests which use timestamps for conflict resolution.
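
For readers unfamiliar with vector clocks, a minimal sketch follows. It represents a clock as a plain object mapping a node id to a counter; the function names `increment`, `merge` and `compare` are illustrative choices and not taken from any particular database.

```javascript
// Record a local event on nodeId by bumping that node's counter.
function increment(clock, nodeId) {
  return { ...clock, [nodeId]: (clock[nodeId] || 0) + 1 };
}

// Combine two clocks when a message is received: element-wise maximum.
function merge(a, b) {
  const merged = { ...a };
  for (const [node, count] of Object.entries(b)) {
    merged[node] = Math.max(merged[node] || 0, count);
  }
  return merged;
}

// Returns "before", "after", "equal" or "concurrent" for a relative to b.
function compare(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[node] || 0;
    const y = b[node] || 0;
    if (x > y) aAhead = true;
    if (x < y) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}
```

A result of "concurrent" is exactly the case where two updates were made without knowledge of each other, so the application has to resolve the conflict by some rule other than wall-clock time.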

Nice challenge. While I appreciated user568109's answer, this is how I handled a similar situation in a CQRS/DDD application.
While in a DDD application I had a lot of different commands and queries, in a CRUD application we have, for each type of "record", CREATE, UPDATE and DELETE commands and a READ query.
In my system, on the client I kept track of the previous sync in a tuple containing: UTC Time on the Server, Time on the Client (let's call this LastSync).
READ
Read queries won't participate in synchronization. Still, in some situations you may have to send the server a LogRead command to keep track of the information that was used to take decisions. Such commands contained the entity's type, the entity's identifier and LastSync.ServerTime.
CREATE
Create commands are idempotent by definition: they either succeed or fail (when a record with the same identity already exists). At sync time you will have to either notify the user of the conflict (so that they can handle the situation, e.g. by changing the identifier) or fix the timestamp as explained later.
UPDATE
Update commands are a bit trickier, since you should probably handle them differently for different types of records. To keep it simple, you should be able to impose on the users that the last update always wins, and design the command to carry only the properties that should be updated (just like a SQL UPDATE statement). Otherwise you'll have to handle automatic/manual merges (but believe me, it's a nightmare: most users won't ever understand it!). Initially my customer required this feature for most entities, but after a while they accepted that the last update wins to avoid such complexity. Moreover, in case of an Update on a Deleted object you should notify the user of the situation and, according to the type of entity updated, apply the update or not.
DELETE
Delete commands should be simple, unless you have to notify the user that an update occurred that could have led them to keep the record instead of deleting it.
You should carefully analyze how to handle each of these commands for each of your entity types (and in the case of UPDATEs you could be forced to handle them differently for different sets of properties to update).
SYNC PROCESS
The sync session should start by sending the server a message with
Current Time on the Client
LastSync
This way the server can calculate the offset between its time and the client's time and apply that offset to every command it receives. Moreover, it can check whether the offset has changed since the LastSync and choose a strategy to handle this change. Note that, this way, the server won't know when the client's clock was adjusted.
At the end of a successful sync (it's up to you to decide what successful means here), the client would update the LastSync tuple.
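
The offset arithmetic of the sync process can be sketched as follows. The function names and the field names (`clientTimestamp`, `serverTimestamp`, `serverTime`, `clientTime`) are assumptions for illustration; all times are taken to be milliseconds since the Unix epoch, as returned by `Date.now()`.

```javascript
// Offset to add to a client time to express it in server time.
function clockOffset(serverNow, clientNow) {
  return serverNow - clientNow;
}

// Re-stamp a queued command using the offset computed at sync start.
function toServerTime(command, offset) {
  return { ...command, serverTimestamp: command.clientTimestamp + offset };
}

// How much the offset has changed since the previous sync. A large value
// suggests the client clock was adjusted at some unknown point in between.
function offsetDrift(lastSync, currentOffset) {
  const previousOffset = lastSync.serverTime - lastSync.clientTime;
  return currentOffset - previousOffset;
}
```

What counts as an acceptable drift, and what to do with commands queued across a clock adjustment, is the strategy decision mentioned above and is not settled by this sketch.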
A final note
This is quite a complex solution. You should carefully consider with the customer whether such complexity gives you enough value before you start implementing it.

Mass astar pathfinding

I'm trying to create a tower defence game in Javascript.
It's all going well apart from the pathfinding.
I'm using the astar code from this website: http://www.briangrinstead.com/blog/astar-search-algorithm-in-javascript which uses a binary heap (which I believe is fairly optimal)
The problem I'm having is that I want to allow people to block the path of the "attackers". This means that each "attacker" needs to be able to find its way to the exit on its own (as someone could just cut off a single "attacker" and it would need to find its own way to the exit). Now 5 or 6 attackers can pathfind at any one time with no issue. But say the path is blocked for 10+ attackers: all 10 of them will need to fire their pathfinding script at the same time, which just drops the frame rate to about 1 or 2 FPS.
This must be a common problem for anyone who has a lot of entities pathfinding at any one time, so I imagine there must be a better way than my approach.
So my question is: What is the most efficient way to implement pathfinding for many "bots" at once?
Thanks,
James

Use anti-objects; this is the only way to get cheap pathfinding, afaik:
http://www.cs.colorado.edu/~ralex/papers/PDF/OOPSLA06antiobjects.pdf
Anti-objects basically mean that instead of bots having individual AI, you will have one "swarm AI", which is bound to your game map.
p.s.: Here is another link about pathfinding in general (possibly the best online reference available):
http://theory.stanford.edu/~amitp/GameProgramming/index.html
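
One simple way to bind the pathfinding to the map rather than to each bot is a shared distance field: a single breadth-first search from the exit labels every cell with its distance, and each attacker just walks downhill. Note that this is a simplified stand-in and not the diffusion technique from the linked paper. The grid layout (1 means blocked, 0 means walkable) and the function names are assumptions.

```javascript
// grid[y][x] === 1 means blocked, 0 means walkable (an assumed layout).
const NEIGHBOURS = [[1, 0], [-1, 0], [0, 1], [0, -1]];

// One breadth-first search from the exit, shared by every attacker.
function buildDistanceField(grid, exit) {
  const height = grid.length;
  const width = grid[0].length;
  const dist = grid.map((row) => row.map(() => Infinity));
  dist[exit.y][exit.x] = 0;
  const queue = [exit];
  for (let head = 0; head < queue.length; head++) {
    const { x, y } = queue[head];
    for (const [dx, dy] of NEIGHBOURS) {
      const nx = x + dx;
      const ny = y + dy;
      if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
      if (grid[ny][nx] === 1 || dist[ny][nx] !== Infinity) continue;
      dist[ny][nx] = dist[y][x] + 1;
      queue.push({ x: nx, y: ny });
    }
  }
  return dist;
}

// Each attacker steps to the neighbouring cell that is closest to the exit.
// Returns null when the attacker is cut off or already at the exit.
function nextStep(dist, pos) {
  let best = null;
  for (const [dx, dy] of NEIGHBOURS) {
    const nx = pos.x + dx;
    const ny = pos.y + dy;
    const d = dist[ny] && dist[ny][nx];
    if (d !== undefined && d < dist[pos.y][pos.x] && (!best || d < best.d)) {
      best = { x: nx, y: ny, d };
    }
  }
  return best && { x: best.x, y: best.y };
}
```

The field only needs rebuilding when a tower is placed or removed, so the per-frame cost for each attacker is a constant-time lookup of its four neighbours.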

Just cache the result.
Store the path as the value in a hash table (object): give each node a UUID, concatenate the UUIDs to form a unique hash table key and insert the path under it.
When you retrieve the path back out of the hash table, walk the path and see if it's still valid; if not, recalculate and insert the new one back in.
There are many optimizations that you can do :)
Like c69 said swarm AI or hive mind come to mind :P
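
The caching idea can be sketched like this. `createPathCache` is a made-up helper; `findPath(start, goal)` stands for a wrapper around your A* search that returns an array of nodes, `isWalkable(node)` reports whether a node is currently unblocked, and each node is assumed to have a unique `id`.

```javascript
// findPath and isWalkable are hypothetical callbacks supplied by the game.
function createPathCache(findPath, isWalkable) {
  const cache = new Map(); // "startId>goalId" -> array of nodes

  return function getPath(start, goal) {
    const key = `${start.id}>${goal.id}`;
    const cached = cache.get(key);
    // Walk the cached path and reuse it only if every node is still free.
    if (cached && cached.every(isWalkable)) return cached;
    const path = findPath(start, goal); // recalculate and store the new path
    cache.set(key, path);
    return path;
  };
}
```

Attackers that start from the same node share one A* run, and a blocked path triggers only one recalculation per start and goal pair instead of one per attacker.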
