Timestamp-based conflict resolution without reliable time synchronization - javascript

Let's take as an example a JS "app" that basically does CRUD: it creates, updates and deletes (not really) some "records".
In the most basic case, one does not need to resolve conflicts in such an application, because the ACID properties of the DBMS are used to eliminate concurrent updates (I'm skimming over a ton of details here, I know). When there's no way to emulate serial execution of updates, one can use timestamps to determine which update "wins". Even then the client need not worry about timestamps, because they can be generated at request time on the server.
But what if we take it one step further and allow the updates to queue up on the client for some unspecified amount of time (say, to allow the app to work when there's no network connectivity) and then be pushed to the server? Then the timestamp cannot be generated on the server, since the time when the update was pushed to the server and the time when the update was actually performed may differ greatly.
In an ideal world, where all clocks are synchronized, this is not a problem: just generate a timestamp on the client at the time the update is performed. But in reality, client time often drifts from the "server" time (which is assumed to be perfect; after all, it's us configuring the server, what could ever go wrong with it?) or is just plain wrong by hours (possible when the user doesn't set the time zone, but instead changes the system time/date to match). What would one do to account for reality in such a case?
Is there perhaps some other conflict-resolution approach that could be used in such a case?

Your question has two aspects:
1. Synchronizing/serializing updates at the server using timestamps, relying on the ACID properties of the database.
2. Queues held by the client (delays the server is not aware of).
If you maintain queues on the client that push to the server whenever the client sees fit, the synchronization had better be trivial, because such queuing defeats the purpose of the timestamps the server relies on.
The scope of ACID is limited here: if client updates are not real-time, the server cannot serialize them by either the request's creation timestamp or its arrival time. This creates a scenario where a request R2 created later than request R1 arrives before R1.
Time is relative: using the client's local time or the server's will cause drift for the other. It also does not scale (it is inefficient when you have several peer nodes in a distributed setup), and it introduces a single point of failure.
Vector clocks were designed to solve this. They are logical clocks: each machine atomically increments its own counter whenever an event occurs on it. BASE databases (Basically Available, Soft state, Eventual consistency) use them.
The upshot is that 1 and 2 will never work together: you should never queue requests that rely on timestamps for conflict resolution.
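A minimal sketch of vector clocks in plain JavaScript, assuming each node (the server and every client) has a unique string id. The names `tick`, `receive` and `compare` are hypothetical, not from any library. Each node increments its own entry on a local event and takes the element-wise maximum when it receives another node's clock; two clocks where neither dominates the other are "concurrent", which is exactly the case of a genuine conflict, detected without trusting anyone's wall clock.

```javascript
// Clocks are plain objects mapping nodeId -> counter.

// Record a local event on `nodeId` (e.g. a client queuing an update).
function tick(clock, nodeId) {
  return { ...clock, [nodeId]: (clock[nodeId] || 0) + 1 };
}

// On receiving a remote clock: element-wise maximum, then a local tick.
function receive(local, remote, nodeId) {
  const merged = { ...local };
  for (const [id, n] of Object.entries(remote)) {
    merged[id] = Math.max(merged[id] || 0, n);
  }
  return tick(merged, nodeId);
}

// Returns "before", "after", "equal" or "concurrent" (a real conflict).
function compare(a, b) {
  let aLess = false;
  let bLess = false;
  for (const id of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[id] || 0;
    const y = b[id] || 0;
    if (x < y) aLess = true;
    if (y < x) bLess = true;
  }
  if (aLess && bLess) return "concurrent";
  if (aLess) return "before";
  if (bLess) return "after";
  return "equal";
}

// Two clients edit offline, starting from the same server version.
const base = tick({}, "server");     // { server: 1 }
const edit1 = tick(base, "clientA"); // { server: 1, clientA: 1 }
const edit2 = tick(base, "clientB"); // { server: 1, clientB: 1 }
console.log(compare(base, edit1));   // before
console.log(compare(edit1, edit2));  // concurrent
```

Vector clocks only tell you whether two updates are ordered or in conflict; deciding which one wins (or how to merge them) is still up to the application.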

Nice challenge. While I appreciated user568109's answer, this is how I handled a similar situation in a CQRS/DDD application.
While in a DDD application I had a lot of different commands and queries, in a CRUD application, for each type of "record" we have CREATE, UPDATE and DELETE commands and a READ query.
In my system, the client kept track of the previous sync in a tuple containing the UTC time on the server and the time on the client (let's call this LastSync).
READ
Read queries don't participate in synchronization. Still, in some situations you may have to send the server a LogRead command to keep track of the information that was used to make decisions. Such commands contained the entity's type, the entity's identifier and LastSync.ServerTime.
CREATE
Create commands are idempotent by definition: they either succeed or fail (when a record with the same identity already exists). At sync time you will have to either notify the user of the conflict (so that they can handle the situation, e.g. by changing the identifier) or fix the timestamp as explained later.
UPDATE
Update commands are a bit trickier, since you should probably handle them differently for different types of records. To keep it simple, you should be able to impose on users the rule that the last update always wins, and design each command to carry only the properties that should be updated (just like a SQL UPDATE statement). Otherwise you'll have to handle automatic/manual merges (but believe me, it's a nightmare: most users will never understand it!). Initially my customer required this feature for most entities, but after a while they accepted that the last update wins to avoid such complexity. Moreover, in the case of an update to a deleted object, you should notify the user of the situation and, depending on the type of entity updated, apply the update or not.
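One way to implement "last update wins" for commands that carry only the changed properties is to keep a timestamp per property on the server, so an older update that arrives late (e.g. from a client that was offline) cannot overwrite a newer value. This is a sketch under assumptions: the record shape, the `applyUpdate` name and the `"deleted"` conflict marker are hypothetical, and command timestamps are assumed to have already been translated into server time.

```javascript
// Server-side record (hypothetical shape):
//   { deleted: boolean, values: { field: value }, fieldTimes: { field: timestamp } }
// Update command: { timestamp, changes: { field: newValue } }
function applyUpdate(record, command) {
  if (record.deleted) {
    // Update on a deleted record: leave it alone and report it so the user can be notified.
    return { record, conflict: "deleted" };
  }
  const values = { ...record.values };
  const fieldTimes = { ...record.fieldTimes };
  for (const [field, value] of Object.entries(command.changes)) {
    const current = fieldTimes[field] ?? -Infinity;
    // Last update wins, property by property; on a tie the later arrival wins.
    if (command.timestamp >= current) {
      values[field] = value;
      fieldTimes[field] = command.timestamp;
    }
  }
  return { record: { ...record, values, fieldTimes }, conflict: null };
}

// A stale offline edit to `name` is ignored, but its `email` change applies.
const record = { deleted: false, values: { name: "Ann" }, fieldTimes: { name: 200 } };
const { record: updated } = applyUpdate(record, {
  timestamp: 100,
  changes: { name: "Anne", email: "ann@example.com" },
});
console.log(updated.values); // { name: 'Ann', email: 'ann@example.com' }
```

Whether an update on a deleted record should be dropped or applied depends on the entity type, as noted above; the sketch only reports it.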
DELETE
Delete commands should be simple, unless you have to notify the user that an update occurred that could have led them to keep the record instead of deleting it.
You should carefully analyze how to handle each of these commands for each of your entity types (and in the case of UPDATEs you could be forced to handle them differently for different sets of properties).
SYNC PROCESS
The sync session should start by sending the server a message containing:
Current Time on the Client
LastSync
This way the server can calculate the offset between its time and the client's time and apply that offset to every command it receives. Moreover, it can check whether the offset changed since LastSync and choose a strategy to handle the change. Note that, this way, the server won't know when the client's clock was adjusted.
At the end of a successful sync (it's up to you to decide what successful means here), the client would update the LastSync tuple.
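A minimal sketch of the offset calculation described above, assuming all times are milliseconds since the Unix epoch (UTC), that the server returns its own time at the end of a sync so the client can store LastSync, and ignoring network latency (a real implementation might, for instance, subtract half the round-trip time). The message shape, `analyzeSync`, `toServerTime` and the `toleranceMs` parameter are hypothetical.

```javascript
// Message sent by the client at the start of a sync session (hypothetical shape):
//   { clientNow, lastSync: { serverTime, clientTime } | null }

// Server side: estimate the offset (server clock minus client clock).
// Network latency is ignored, so the estimate is only approximate.
function analyzeSync(message, serverNow, toleranceMs = 1000) {
  const offset = serverNow - message.clientNow;
  let offsetChanged = false;
  if (message.lastSync) {
    const previousOffset = message.lastSync.serverTime - message.lastSync.clientTime;
    // A different offset means the client clock was adjusted (or drifted)
    // at some unknown moment since the last sync.
    offsetChanged = Math.abs(offset - previousOffset) > toleranceMs;
  }
  return { offset, offsetChanged };
}

// Translate a queued command's client timestamp into estimated server time.
function toServerTime(clientTimestamp, offset) {
  return clientTimestamp + offset;
}

// Example: the client clock runs one hour behind the server, as it did last time.
const HOUR = 60 * 60 * 1000;
const message = {
  clientNow: Date.UTC(2024, 0, 1, 12, 0, 0),
  lastSync: {
    serverTime: Date.UTC(2024, 0, 1, 9, 0, 0),
    clientTime: Date.UTC(2024, 0, 1, 8, 0, 0),
  },
};
const { offset, offsetChanged } = analyzeSync(message, Date.UTC(2024, 0, 1, 13, 0, 0));
console.log(offset === HOUR, offsetChanged); // true false
```

When `offsetChanged` is true, the sketch cannot tell which queued commands were recorded before the adjustment, which matches the caveat above; that is where the handling strategy has to be chosen.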
A final note
This is quite a complex solution. You should carefully consider with the customer whether such complexity gives you enough value before you start implementing it.

Related

Confusion regarding bounded contexts and interaction between them

I'm trying to implement my first domain-driven application, after going through Eric Evans's book Domain-Driven Design. I'm a bit confused about how to go about this.
In my application, a user can purchase a service that gets them a certain number of views on a video they post on YouTube; the views are fulfilled by the other users of my app, who watch those videos (basically a replica of the many YouTube promoter apps already available, for learning).
Say the service is represented in the app as an aggregate called WatchTime. The WatchTime entity contains information such as the id of the user who purchased the service, the maximum number of views purchased, the number of views already fulfilled, and the points earned by someone who views the video once.
I decided to go with 3 bounded contexts: one for authentication, one for handling WatchTimes (such as adding or removing them), and one for managing users and their data. Each user has their personal info and some points collected while using the application.
At first I was thinking that all the user data and related actions, like adding points to a user or deducting them, would live in the 3rd context. But while building the model, I realized that if the WatchTime purchasing service is in the 2nd context, it will have to communicate with the 3rd one every time a WatchTime is purchased, to tell a service there to deduct points for that purchase. It wouldn't make sense to keep them in two different contexts.
So instead I'm thinking of having a model of the user in the 2nd bounded context, with only the points and the WatchTimes that this user purchased, so it no longer has to call anything in the 3rd context.
My question is how to properly separate things into contexts. Should it be based on the models, or on the functionality, with all models related to that functionality in the same context?
And another thing: how do I ensure that all objects representing the same entity have the same values and are properly persisted in the database? Should only one object representing a particular entity exist at a time, which is persisted and disposed of by the end of a function? I ask because if two objects representing the same entity exist at the same time, they could hold different values or change to different values.
If I sound like I'm rambling, please let me know if I need to be clearer. Thanks.
Bounded contexts basically define areas of functionality where the ubiquitous language (and thus the model) is the same. In different bounded contexts, "user" can mean different things: in a "user profile" context, you might have their email address, but in the "viewing time" context, you'd just have the points granted and viewership purchased.
Re "another thing", in general you need to keep an aggregate strongly consistent and only allow an update to succeed if the update is cognizant of every prior update which succeeded, including any updates which succeeded after a read from the datastore. This is the single-writer principle.
There are a couple of ways to accomplish this. First, you can use optimistic concurrency control and store a version number with each aggregate. You then update the aggregate in the DB only if the version hasn't changed; otherwise you retry the operation (performing all the validations etc. again) against the new version of the aggregate. This requires some support in the DB for an atomic check of the version and update (e.g. a transaction).
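A minimal sketch of optimistic concurrency control with a version number. `VersionedStore` is a hypothetical in-memory stand-in for the database; in this single-threaded sketch `saveIfVersion` is trivially atomic, whereas a real DB would need a transaction or a conditional update (only write where the stored version still equals the expected one). `execute` and the `spendPoints` command are illustrative names only.

```javascript
class VersionedStore {
  constructor() {
    this.rows = new Map(); // id -> { version, state }
  }

  load(id) {
    const row = this.rows.get(id);
    return row ? { ...row } : { version: 0, state: undefined };
  }

  // Save only if nobody else has saved since we loaded `expectedVersion`.
  saveIfVersion(id, expectedVersion, state) {
    const currentVersion = this.rows.get(id)?.version ?? 0;
    if (currentVersion !== expectedVersion) return false; // someone saved first
    this.rows.set(id, { version: expectedVersion + 1, state });
    return true;
  }
}

// Run a command against the latest version of the aggregate, retrying on conflict.
function execute(store, id, command, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { version, state } = store.load(id);
    const newState = command(state); // validations run against this exact version
    if (store.saveIfVersion(id, version, newState)) return newState;
  }
  throw new Error(`Concurrency conflict on ${id} after ${maxAttempts} attempts`);
}

// Hypothetical command: spend points on a WatchTime purchase.
const spendPoints = (cost) => (user) => {
  if (user.points < cost) throw new Error("Not enough points");
  return { ...user, points: user.points - cost };
};

const store = new VersionedStore();
execute(store, "user-1", () => ({ points: 100 }));
execute(store, "user-1", spendPoints(30));
console.log(store.load("user-1")); // { version: 2, state: { points: 70 } }
```

In a real service the `load` and `save` calls are asynchronous, so another request can save between them; that is exactly when `saveIfVersion` returns false and the command is re-run against the fresh state.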
An alternative approach (my personal preference) is to recognize that a DDD aggregate has a high level of mechanical sympathy to the actor model of computation (e.g. both are units of strong consistency). There are implementations of the actor model (e.g. Microsoft Orleans, Akka Cluster Sharding) which allow an aggregate to be represented by at most one actor at a given time (even if there is a cluster of many servers).
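Orleans and Akka are not JavaScript, but the single-writer idea behind them can be illustrated in one Node.js process: queue all operations for the same aggregate id so they run one at a time, like messages in an actor's mailbox. This is only a sketch; `AggregateMailboxes` is a hypothetical class, it does not coordinate across several servers the way cluster sharding does, and it never cleans up old ids.

```javascript
class AggregateMailboxes {
  constructor() {
    this.tails = new Map(); // aggregate id -> promise for its last queued operation
  }

  send(id, operation) {
    const previous = this.tails.get(id) ?? Promise.resolve();
    // Start only after everything queued earlier for this id has settled.
    const result = previous.then(() => operation());
    // Keep the chain usable even if this operation fails; the caller still
    // sees the failure through `result`.
    this.tails.set(id, result.catch(() => {}));
    return result;
  }
}

const mailboxes = new AggregateMailboxes();
const watchTime = { viewsFulfilled: 0 };
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Read, await some I/O, then write: unsafe if two of these interleave.
async function recordView() {
  const current = watchTime.viewsFulfilled;
  await pause(5);
  watchTime.viewsFulfilled = current + 1;
}

Promise.all([
  mailboxes.send("watchtime-1", recordView),
  mailboxes.send("watchtime-1", recordView),
]).then(() => console.log(watchTime.viewsFulfilled)); // 2 (would be 1 without the mailbox)
```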

How to deal with a race condition

I'm pretty new to web development. From what I'd read about race conditions, I thought they wouldn't be possible with Node or JS because it is single-threaded, but I guess I was wrong. Using this little example, can someone explain how it would work?
Say there is a bank account with $1000 in it, and two people charge the account in the same second, hitting the server at exactly the same time. The first person charges $600 and the second person charges $200.
The first charge would do $1000 - $600 leaving the balance at $400.
But since the second charge hit at the same time, it would do $1000 - $200, leaving the balance at $800, when obviously the balance should now be $200.
From my understanding that would cause a race condition, no? How would you set this up to avoid this problem? I don't need exact code, just maybe someone to explain this to me, or pseudo code.
Thanks in advance.
EDIT: Here is how the code would initially be set up, causing the race condition.
As the answer below says, when the account is hit, the code would subtract the amount and return the new balance. Obviously that would cause the race condition.
Your example cannot be answered specifically without seeing the exact code being used as there are safe ways to write that code and unsafe ways to write it.
node.js is single threaded, but as soon as a request makes an async call, then other requests can run while that async request is being carried out. Thus, you can have multiple requests in flight at the same time. Whether or not this causes a "race condition" depends entirely upon how you write your code and, in your particular case, how you access the database.
If you write code like this (pseudo-code):
get total from database
subtract from total
write new total to database
And, the calls to the database are asynchronous (which they likely are), then you definitely have a race condition because in between the time you get the total and write the total, other requests could be attempting to access the same total value and attempting to modify it and one request will either not have the latest total value or the two will stomp on each other's results (one overwriting the other).
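To see the interleaving concretely, here is a runnable simulation of the get/subtract/write pseudo-code. The "database" is a hypothetical in-memory stand-in whose calls each wait 10 ms, as a real asynchronous driver would; `createFakeDb`, `charge` and `simulate` are illustrative names only.

```javascript
// Hypothetical async "database": every call yields to the event loop,
// so other requests can run in between, just like a real network round trip.
function createFakeDb(initialTotal) {
  let total = initialTotal;
  const roundTrip = () => new Promise((resolve) => setTimeout(resolve, 10));
  return {
    async getTotal() { await roundTrip(); return total; },
    async setTotal(value) { await roundTrip(); total = value; },
  };
}

// The unsafe pattern: read, subtract in JavaScript, write back.
async function charge(db, amount) {
  const total = await db.getTotal(); // both requests can read 1000 here
  await db.setTotal(total - amount); // last writer wins; the other charge is lost
}

async function simulate() {
  const db = createFakeDb(1000);
  await Promise.all([charge(db, 600), charge(db, 200)]); // two "simultaneous" requests
  return db.getTotal();
}

simulate().then((total) => console.log(total)); // 800, not 200
```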
If, on the other hand, you have a database that can do an atomic modification of the total value in the database as in:
subtract x from total in database
Then, you will be protected from that particular race condition.
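For comparison, a sketch where the subtraction happens inside the (simulated) database in a single step, so there is no window between reading and writing. `subtract` is a hypothetical method standing in for an atomic query such as an UPDATE that computes the new total from the stored one; the "enough funds" check is an added assumption showing how such a query could also refuse an overdraft.

```javascript
// Hypothetical async "database" exposing an atomic operation.
function createFakeDb(initialTotal) {
  let total = initialTotal;
  const roundTrip = () => new Promise((resolve) => setTimeout(resolve, 10));
  return {
    // Stands in for something like:
    //   UPDATE accounts SET total = total - ? WHERE id = ? AND total >= ?
    // The read-modify-write happens in one synchronous step inside the "DB".
    async subtract(amount) {
      await roundTrip();
      if (total < amount) return false; // insufficient funds: nothing changed
      total -= amount;
      return true;
    },
    async getTotal() { await roundTrip(); return total; },
  };
}

async function simulate() {
  const db = createFakeDb(1000);
  await Promise.all([db.subtract(600), db.subtract(200)]); // two "simultaneous" requests
  return db.getTotal();
}

simulate().then((total) => console.log(total)); // 200
```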
Because node.js is single threaded, it is not as complicated to write safe code in node.js as it is in a multi-threaded web server. This is because there is only one path of Javascript executing at the same time. So, until you make some sort of asynchronous I/O call, no other request will literally be running at the same time. This makes accessing shared variables in your node.js app massively simpler than in a true multi-threaded web server environment where any access to a shared variable must be protected by a mutex (or something similar). But, as soon as you make an async call, you must be aware that at that point in time, other requests can run.

Call SQL "function" (stored procedure?) every time a database column is selected

I am running MySQL 5.6. I have a number of various "name" columns in the database (in various tables). These get imported every year by each customer as a CSV data dump. There are a number of places that these names are displayed throughout this website. The issue is, the names have almost no formatting (and to this point, no sanitization existed upon importation):
Phil Eaton, PHIL EATON, Phil EATON, etc.
Thus, the website sometimes looks like a mess where these names are involved. There are a number of ways I can think of to fix this, but none that are particularly appealing.
First, I can have a filter in Javascript. However, as I said, these names exist in a number of places throughout this (large) site. I may end up missing a page. The names do not exist already within nice "name"-classed divs/spans, etc.
Second, I could filter in PHP (the backend). This seems about as effective as doing it in JavaScript. I could do it in the API, but there is still no central method for pulling names from the database, so I could still miss an API call anyway.
Finally, the obvious "best" way is to sanitize the existing data in place for each name column, and at the same time immediately start sanitizing all names that get imported each time we add a customer. The issue with the first part is that there are hundreds of millions of rows of names in the database. Updating these could take a long time and be disruptive to the clients' daily routines.
So, the most appealing short-term fix is to invoke a function every time a column is selected. That way I could "decorate" every name column with a formatting function so the names appear uniform on the frontend. So ultimately, my question is: is it possible to invoke a specific function in SQL to format each row every time a specific column is selected? In other words, can I call a stored procedure every time a column is selected? (The point is, I'm trying to keep the formatting in SQL so that it doesn't have to be repeated everywhere the names are used.)
In MySQL you can't trigger something on SELECT, but I have an idea (it's only an idea, now I don't have time to try it, sorry).
You could probably create a VIEW on this table with the same structure, but with a stored function applied to the name fields, and select from this view in your PHP.
But it has two drawbacks:
You have to modify all the SELECT statements in your PHP code.
The server will always call that function. Maybe you could store the formatted values and check for them (i.e. cache them).
That said, I agree with HLGEM: I also suggest formatting the data on import, because it's very bad practice to import data you don't check into a DB (SQL injection?). The batch job is also a good idea for cleaning up the mess.
I presume names are retrieved frequently, so invoking a sanitization function every time they are read could severely slow down your system. Further, you can't just flip a simple setting to get this; you would have to change every bit of SQL code that is run that includes names.
Personally, I would fix the imports so they insert a sanitized version of new names. It is a bad idea to put any data directly into a database without some sort of staging and cleanup.
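A sketch of the kind of cleanup an import step could apply to each name, written in JavaScript purely for illustration (the same logic would have to be ported to PHP or to a MySQL stored function). `normalizeName` is a hypothetical helper: it trims, collapses whitespace and capitalizes the first letter of each word, including after hyphens and apostrophes. It will get some names wrong (e.g. "McDonald" becomes "Mcdonald", "van der Berg" becomes "Van Der Berg"), so treat it as a starting point.

```javascript
function normalizeName(raw) {
  return raw
    .trim()
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .toLowerCase()
    // Uppercase a letter at the start, or after a space, apostrophe or hyphen.
    .replace(/(^|[\s'-])(\p{L})/gu, (_, sep, letter) => sep + letter.toUpperCase());
}

console.log(normalizeName("PHIL EATON"));    // Phil Eaton
console.log(normalizeName("Phil EATON"));    // Phil Eaton
console.log(normalizeName("  o'brien-SMITH ")); // O'Brien-Smith
```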
Then I would tackle the old names and fix them in batches in a nightly run scheduled when the fewest people are using the system. You would have to do some testing on dev to determine how big a batch you can run without interfering with other things the database is doing. The larger the batch, the sooner you will get through all the names. Even though this will take time, it is the surest method of getting the data cleaned up, and over time the data will appear better to the users. If the design of your database allows you to identify the more active names (such as an is_active flag for a customer, or an order in the last year), I would prioritize the update by that. Alternatively, you could clean up one client at a time, starting with whichever one has noticed the problem and is driving this change.
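The nightly batch approach can be sketched with keyset pagination (resume after the last id processed instead of using OFFSET), so each run can stop after a fixed number of batches and the next night can pick up where it left off. The data-access functions `fetchBatch` and `saveBatch` are hypothetical and are replaced here by in-memory stand-ins; `cleanNamesInBatches` assumes positive, increasing integer ids, and the batch size is a placeholder to be tuned on dev.

```javascript
// fetchBatch(afterId, limit) -> Promise<Array<{ id, name }>>, ordered by id
//   (in SQL terms: WHERE id > ? ORDER BY id LIMIT ?)
// saveBatch(rows) -> Promise<void>
async function cleanNamesInBatches({
  fetchBatch, saveBatch, transform,
  batchSize = 1000, startAfterId = 0, maxBatches = Infinity,
}) {
  let lastId = startAfterId;
  let processed = 0;
  for (let batch = 0; batch < maxBatches; batch++) {
    const rows = await fetchBatch(lastId, batchSize);
    if (rows.length === 0) break; // nothing left to clean
    const changed = [];
    for (const row of rows) {
      const name = transform(row.name);
      if (name !== row.name) changed.push({ id: row.id, name }); // skip clean rows
    }
    if (changed.length > 0) await saveBatch(changed);
    processed += rows.length;
    lastId = rows[rows.length - 1].id; // remember where to resume next time
  }
  return { processed, lastId };
}

// In-memory stand-ins for the demo.
const table = [
  { id: 1, name: "PHIL EATON" },
  { id: 2, name: "Phil Eaton" },
  { id: 3, name: "Phil EATON" },
];
const fetchBatch = async (afterId, limit) =>
  table.filter((r) => r.id > afterId).slice(0, limit).map((r) => ({ ...r }));
const saveBatch = async (rows) => {
  for (const row of rows) table.find((r) => r.id === row.id).name = row.name;
};
const titleCase = (s) => s.toLowerCase().replace(/\b\w/g, (c) => c.toUpperCase());

cleanNamesInBatches({ fetchBatch, saveBatch, transform: titleCase, batchSize: 2 })
  .then((result) => console.log(result)); // { processed: 3, lastId: 3 }
```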
The other answers give some possible solutions, but the short answer to the specific option you are asking about is: no. There is no such thing as a "SELECT statement trigger", let alone one for a single column. Triggers come close to this kind of expectation, but only for INSERT, UPDATE and DELETE operations.

Strategies to handle frequently changing time series data using socket io and angularjs

I'm currently sending data to browsers using socket.io. Many devices send data to the server, which then broadcasts it to browsers. The data pushed from the server is far too frequent (about once every second per device), and each change kicks off an Angular digest loop. I guess this will have a performance impact on mobile devices. I've changed the code so the server pushes data once every 10 seconds, with the most recent data for all devices. I'd like to make it more real-time; are there any best practices for dealing with a push model while staying real-time within Angular? I know this is a performance-related question without any hard numbers; I'd be happy to run some tests to give you numbers if you want.
Use $evalAsync. I'm pushing more than 100 updates per second and have had no performance problems.
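The approach from the question (sending only the most recent data per device at a fixed interval) can be made more real-time by shortening the interval while still coalescing bursts, so a device that reports five times between flushes costs one update, not five. A minimal sketch; `createCoalescer`, `push` and `drain` are hypothetical names.

```javascript
function createCoalescer() {
  let pending = new Map(); // deviceId -> latest data not yet sent
  return {
    // Called for every incoming device update; older unsent data is overwritten.
    push(deviceId, data) {
      pending.set(deviceId, data);
    },
    // Returns everything collected since the last drain and starts a new batch.
    drain() {
      const batch = pending;
      pending = new Map();
      return batch;
    },
  };
}

const coalescer = createCoalescer();
coalescer.push("device-1", { temp: 20 });
coalescer.push("device-1", { temp: 21 }); // replaces the previous reading
coalescer.push("device-2", { temp: 18 });
console.log(Object.fromEntries(coalescer.drain()));
// { 'device-1': { temp: 21 }, 'device-2': { temp: 18 } }
```

On the server, a `setInterval` callback could call `drain()` and, if the batch is not empty, broadcast it with socket.io's `io.emit` (the event name is up to you). On the client, applying each whole batch inside a single `$scope.$evalAsync` callback, in the spirit of the answer above, should mean one digest per batch rather than one per device update; that is an untested assumption, so measure it on the target mobile devices.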

saving project as incremental json diffs?

I've been building a web paint program wherein the state of a user's artwork is saved as a JSON object. Every time I add to the client's undo stack (just an array of JSON objects describing the state of the project), I want to save the state to the server too.
I'm wondering whether there is an elegant way to [1] send up only the diffs and then [2] download the project later and recreate its current state. I fear this could get messy, and I'm trending towards just uploading the complete JSON project state at every undo step. Any suggestions, or pointers to projects that tackle this sort of problem gracefully?
Interesting - and pretty large - question.
A lot of implementations / patterns / solutions apply to this problem, and they vary depending on the type of "document" whose updates you're keeping track of.
Anyway, a simple approach that keeps you from going mad is, instead of saving "states", to save the "commands that produced those states".
If your application is completely deterministic (which I assume it is, since it's a painting program), you can be sure that for every command at given time & position, the result will be the same at every execution.
So, I would instead note down an "alphabet" representing the commands available in your program:
Draw[x,y,size,color]
Erase[x,y,size]
Move[x,y]
and so on. You can take inspiration from SVG implementation. Then push/pull strings of commands to/from the server:
timestamp: MOVE[0,15]DRAW[15,20,4,#000000]ERASE[4,8,10]DRAW[15,20,4,#ff0000]
This is obviously only a general, pseudocoded idea. Hope you can get some inspiration.
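A minimal sketch of the command-log idea, assuming the bracketed string format shown above. `parseCommands`, `replay`, `unsentCommands` and the state shape (a cursor plus a list of strokes) are hypothetical; a real painting program would render each command instead of just recording it. Only the commands added since the last upload need to be sent, which gives you the "diffs" for free, and replaying the full log from an empty state recreates the project.

```javascript
// Parse a command string such as "MOVE[0,15]DRAW[15,20,4,#000000]".
// Numeric arguments become numbers; anything else (e.g. "#ff0000") stays a string.
function parseCommands(text) {
  const commands = [];
  for (const match of text.matchAll(/([A-Z]+)\[([^\]]*)\]/g)) {
    const args = match[2]
      .split(",")
      .map((arg) => arg.trim())
      .map((arg) => (arg !== "" && !Number.isNaN(Number(arg)) ? Number(arg) : arg));
    commands.push({ name: match[1], args });
  }
  return commands;
}

// Rebuild the project state by replaying every command in order. This only
// works because the program is deterministic: same commands, same result.
function replay(commands) {
  const initialState = { cursor: [0, 0], strokes: [] };
  return commands.reduce((state, { name, args }) => {
    switch (name) {
      case "MOVE":
        return { ...state, cursor: [args[0], args[1]] };
      case "DRAW":
      case "ERASE":
        return { ...state, strokes: [...state.strokes, { name, args }] };
      default:
        throw new Error(`Unknown command: ${name}`);
    }
  }, initialState);
}

// The "diff" to upload is simply the part of the log the server hasn't seen yet.
function unsentCommands(log, uploadedCount) {
  return log.slice(uploadedCount);
}

const log = parseCommands("MOVE[0,15]DRAW[15,20,4,#000000]ERASE[4,8,10]DRAW[15,20,4,#ff0000]");
const state = replay(log);
console.log(state.cursor, state.strokes.length); // [ 0, 15 ] 3
```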
