Let's imagine a DB with two tables: items and orders. The first table contains all items available for sale. The second one keeps track of all orders made by customers. Each item has a column 'quantity', which says how many of these items are available in stock. When making a new order, the backend checks whether the ordered amount is greater than the amount in stock. If it is, the order is not created. Otherwise the order is created and the quantity of available items is updated.
The problem is that when two orders are created simultaneously, both checks are executed at the same time and two orders are created (not knowing about each other). As a result, there are two orders in DB with the total ordered amount larger than the actual quantity in stock.
I've already searched for how to handle this issue and encountered concepts such as transactions, locks, isolation levels and so on. I've come to understand these terms, but I still don't see what architectural solution needs to be implemented.
What exactly do I need to do to solve this problem? What SQL query should I write to check the stock before creating an order? Should I wrap it in a transaction and apply some isolation level to it? Or maybe I just need to lock the tables when making an order? Is it possible to simply make the order-creation operation wait until the concurrent order has been created? I still have no answers to these questions.
Hope for your help. Thanks!
For low to mid volume you can implement real time processing. For high volume you need to settle for near-real time.
For real time processing there are two options:
Optimistic Locking: The app doesn't lock the records it needs to modify but only reads them. When the processing is finished (ideally after a short period of time) the app updates the records with a "concurrency check". Typically the records will carry a version number or a timestamp, or in extreme cases the entire record will be compared. If the update passes the concurrency check, then all is good, the transaction is committed, and the order is complete. If the concurrency check does not pass, compensating logic needs to take place to retry the action, to recover it somehow, or to consider it failed. The benefit of this strategy is that it can process more orders with the same hardware. The downside is that it's a more complex solution, since it needs to take care of the extra paths, not just the happy path.
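To make this concrete, here is a minimal sketch of the optimistic path in JavaScript with node-postgres; the items table, its quantity and version columns, and the retry count are assumptions for illustration, not a prescribed design:

const { Pool } = require('pg');
const pool = new Pool(); // connection settings come from the environment

async function placeOrderOptimistic(itemId, amount) {
  for (let attempt = 0; attempt < 3; attempt++) {
    // Read without locking.
    const { rows } = await pool.query(
      'SELECT quantity, version FROM items WHERE id = $1', [itemId]);
    const item = rows[0];
    if (item.quantity < amount) return false; // not enough stock

    // Concurrency check: the update only succeeds if nobody changed the row since the read.
    const res = await pool.query(
      `UPDATE items SET quantity = quantity - $1, version = version + 1
        WHERE id = $2 AND version = $3`,
      [amount, itemId, item.version]);

    if (res.rowCount === 1) return true; // check passed; the order row would be inserted here too
    // Check failed: someone else updated the item first, so retry (compensating logic).
  }
  return false;
}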
Pessimistic Locking: The app reads and locks all the necessary records. All locks need to be acquired in the same order by all competing processes, otherwise they risk deadlocking each other. If all the locks are secured, then the order can be processed safely without the fear of a hiccup at the end of the task. The benefit of this strategy is that it is simple to understand and to debug. The downside is that locks are expensive to obtain and can significantly impact the throughput of the app -- that is, how many orders per minute it can process.
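And a matching sketch of the pessimistic path, again with node-postgres and the same assumed items table; SELECT ... FOR UPDATE makes competing transactions wait until this one commits or rolls back:

async function placeOrderPessimistic(pool, itemId, amount) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // Lock the row; a concurrent order for the same item blocks here.
    const { rows } = await client.query(
      'SELECT quantity FROM items WHERE id = $1 FOR UPDATE', [itemId]);
    if (rows[0].quantity < amount) {
      await client.query('ROLLBACK');
      return false; // not enough stock
    }
    await client.query(
      'UPDATE items SET quantity = quantity - $1 WHERE id = $2', [amount, itemId]);
    // ...insert the order row here, inside the same transaction...
    await client.query('COMMIT');
    return true;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}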
Finally, for high volume you probably need to settle for near real time -- aka deferred processing. There are two strategies:
If your app can triage the orders by some criteria (region, client, type of products, warehouse, etc.) you can implement a microservice or a set of queues where each instance of the service/queue serves separate clients or stock items. This can provide a decent level of parallelism in this case.
If there can be no triage for the orders, then a single queue can process all the orders, one by one. This can be slow for some apps.
There are many ways to solve the problem. Assuming you are not trying to create a major retail site here, the simplest is to lock the row in items.
The easiest way is probably not even to check until after the subtraction has been done.
UPDATE items SET quantity_avail = quantity_avail - $2
WHERE part_num = $1
RETURNING quantity_avail
And then if the returned value is less than 0, do a rollback of the transaction and report to the user that it is now out of stock.
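Roughly, as a fragment with node-postgres (the $1/$2 placeholders above already follow its style; the pool and the orders insert are assumptions):

const client = await pool.connect();
try {
  await client.query('BEGIN');
  const { rows } = await client.query(
    `UPDATE items SET quantity_avail = quantity_avail - $2
      WHERE part_num = $1
      RETURNING quantity_avail`,
    [partNum, amount]);
  if (rows[0].quantity_avail < 0) {
    await client.query('ROLLBACK'); // oversold: undo the subtraction and report "out of stock"
  } else {
    // ...INSERT the order row here...
    await client.query('COMMIT');
  }
} finally {
  client.release();
}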
But now what happens if the shipping department drops it on the floor and breaks it while preparing it for shipping?
I'm trying to implement my first domain-driven application, after going through Eric Evans's book on Domain-Driven Design. I'm a bit confused about how to go about this.
In my application, a user can purchase a service that gets them a certain number of views on a video they post on YouTube, which is fulfilled by the other users of my app who watch those videos (basically a replica of the many YouTube promoter apps already available, built for learning).
Say the service is represented in the app as an aggregate called WatchTime. The WatchTime entity contains some information like the id of the user who purchased this service, the max number of views purchased, the number of views already fulfilled, and the points earned by someone who views the video once.
I decided to go with 3 bounded contexts, one for authentication, one for handling the watchtimes, like adding or removing them, and one for managing users and their data. Now the user has his personal info and some points that he collected while using the application.
At first I was thinking that all the user data and related actions would be in the 3rd context, like adding points to a user or reducing his points. But then, while making the model, I realized that if the watch time purchasing service is going to be in the second one, it's going to have to communicate with the third one every time a WatchTime is purchased, to tell a service there to reduce points for that purchase. It wouldn't make sense to keep them in two different ones.
So instead what I'm thinking of is having a model of the user in the 2nd bounded context, but with only the points and the WatchTimes that this user purchased, so now it doesn't have to call something on the 3rd context.
My question is: how do I properly separate things into contexts? Is it based on the models, or should it be based on the functionality, with all models related to that functionality going into the same context?
And another thing: how do I ensure that all the objects representing the same entity have the same value and are properly persisted in the database? Should only one object representing a particular entity be present at a time, which is persisted and disposed of by the end of a function? Because I was thinking that if two objects representing the same entity are present at the same time, there's a possibility of them holding different values or changing to different values.
If I sound like I'm rambling, please let me know and I'll try to be clearer. Thanks.
Bounded contexts basically define areas of functionality where the ubiquitous language (and thus the model) are the same. In different bounded contexts, "user" can mean different things: in a "user profile" context, you might have their email address but in the "viewing time" context, you'd just have the points granted and viewership purchased.
Re "another thing", in general you need to keep an aggregate strongly consistent and only allow an update to succeed if the update is cognizant of every prior update which succeeded, including any updates which succeeded after a read from the datastore. This is the single-writer principle.
There are a couple of ways to accomplish this. First, you can use optimistic concurrency control and store a version number with each aggregate. You then update the aggregate in the DB only if the version hasn't changed; otherwise you attempt the operation (performing all the validations etc.) against the new version of the aggregate. This requires some support in the DB for an atomic check of the version and update (e.g. a transaction).
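As a rough sketch (the watch_times table, its columns, and the surrounding code are all hypothetical), the atomic check-and-update can be a single conditional statement:

// The UPDATE succeeds only against the version that was originally loaded.
const res = await pool.query(
  `UPDATE watch_times
      SET views_fulfilled = $1, version = version + 1
    WHERE id = $2 AND version = $3`,
  [aggregate.viewsFulfilled, aggregate.id, aggregate.version]);
if (res.rowCount === 0) {
  // Stale version: reload the aggregate, re-run the domain validations, and try again.
}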
An alternative approach (my personal preference) is to recognize that a DDD aggregate has a high level of mechanical sympathy to the actor model of computation (e.g. both are units of strong consistency). There are implementations of the actor model (e.g. Microsoft Orleans, Akka Cluster Sharding) which allow an aggregate to be represented by at most one actor at a given time (even if there is a cluster of many servers).
I would appreciate your time on answering this question before I commit it to some code.
I've looked through multiple docs on MongoDB and various forums for the answer to this question, however, I would like to know exactly how certain series of operations execute based on my specific circumstances.
Fundamentally, I would like to perform a Read (findOne) operation on a specific document in a collection within its own shard to obtain its current value, modify it, and then store it back into that document (findOneAndUpdate). That sounds easy enough; however, if I understand properly, this specific document can potentially get written to during the delay between that Read and the subsequent Write operation.
For more context, this is a stock market based application. The shard and collection I am referring to here is a user's portfolio which contains information such as the stock ticker, the number of shares owned of that stock, and the total cost they have historically paid for it. The issue here is that when a player sells a share, I have to proportionally adjust the historic total cost by multiplying a ratio of (formerOwnedShares - executedSoldShares)/(formerOwnedShares) where formerOwnedShares is information that needs to be Read, processed, then Updated (i.e. the user currently owns 10 shares and has paid $10 for them over time, they sell 5 shares, now they own 5 shares and have paid $5 to maintain a consistent average cost per share).
I am concerned about the delay between the Read and Update operations. Several other areas of code can operate on this portfolio document from other users (such as other users buying/selling shares with them), so there is a very real possibility that the first Read operation pulls information from an older version of this document that has just recently been updated.
Is there any way to process this sort of situation in one atomic operation in order to maintain perfect concurrency? The issue is clearly the result of having to update a document's value based on that document's most recent value - which I do not think you can do in a single findOneAndUpdate operation.
Thank you very much for your time.
Use a transaction with read concern snapshot, which is an implementation of MVCC, to utilize the same base document version for both operations (find and find-and-modify).
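A minimal sketch with the Node MongoDB driver (requires a replica set; the portfolios collection and field names are assumptions taken from the question):

const session = client.startSession();
try {
  await session.withTransaction(async () => {
    const portfolios = client.db('app').collection('portfolios');
    // Both the read and the write work against the same snapshot of the document.
    const doc = await portfolios.findOne({ userId, ticker }, { session });
    const ratio = (doc.shares - soldShares) / doc.shares;
    await portfolios.updateOne(
      { _id: doc._id },
      { $inc: { shares: -soldShares }, $mul: { totalCost: ratio } },
      { session });
  }, { readConcern: { level: 'snapshot' }, writeConcern: { w: 'majority' } });
} finally {
  await session.endSession();
}

If another write touches the document after the snapshot was taken, the transaction aborts with a transient error and withTransaction retries the callback.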
Is there any way to process this sort of situation in one atomic operation in order to maintain perfect concurrency?
This question makes no sense. First, what do you mean by "perfect concurrency"? Second, concurrency is not the same as serialization (in fact, they are exact opposites).
A baseline approach to performing multiple operations that appear sequential in a concurrent environment is to 1) maintain a "version" field in the document, 2) condition the second and subsequent updates on the version not having changed, and 3) if an update fails, retry the entire process until it succeeds.
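A sketch of that retry loop against the question's portfolio document (collection and field names are hypothetical):

// The update is conditioned on the version that was read; zero modified documents means
// someone else got there first, so read again and retry.
async function sellShares(portfolios, userId, ticker, sold) {
  for (let attempt = 0; attempt < 5; attempt++) {
    const doc = await portfolios.findOne({ userId, ticker });
    const ratio = (doc.shares - sold) / doc.shares;
    const res = await portfolios.updateOne(
      { _id: doc._id, version: doc.version },
      { $set: { shares: doc.shares - sold, totalCost: doc.totalCost * ratio },
        $inc: { version: 1 } });
    if (res.modifiedCount === 1) return;
  }
  throw new Error('portfolio update kept conflicting; giving up');
}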
Transactions with snapshot read concern hopefully automate all of this for you.
So, the gist of my question. Imagine you have a service that handles 2-3-4-10 actions. And to communicate between several components, you have 2-3-4-10 Subjects.
So, is it better to have 1 subject, and pass in on next an object identifying which of the actions it relates to, and filter inside your subscription...or have the lot of them and subscribe separately?
How many subjects is too many? They more or less remain active all at once throughout.
I'm kind of curious in as abstract a sense as possible, rather than about my own use case and whether or not it could be done better.
I work on large angular applications that use hundreds if not thousands of subjects (Subject, BehaviorSubject, ReplaySubject, AsyncSubject).
Is there a performance hit for using many subjects?
To this, I'd say it's not the subjects that matter, since they are only taking up memory space. It's the pipelines that you attach to them, which place work on the CPU, that matter. This depends on the pipelines themselves, and not on the subjects. You could have a single subject connected to a long, computationally heavy pipeline which, if done incorrectly, would slow your program down, since JavaScript runs on a single thread of execution (you could use web workers to avoid this problem).
Therefore, the number of subjects is irrelevant here if we are talking about how "performant" your application is. It's the pipelines that determine whether your application is slow, i.e., data moving down a pipe and having operators manipulate it.
StackBlitz single pipe that is computationally heavy to prove my point.
Is it better to have 1 subject, and pass in on next an object identifying which of the actions it relates to, and filter inside your subscription?
I would say this is more of a design decision to have a bus of information ("a single subject") passing all your data along, instead of breaking them up into their respective streams. This could be handy if your data is interconnected, meaning your events depend on each other, and if the order they appear within the stream matters (like navigation events: started, performing, ended, etc).
I would be unhappy if a dev used one giant bin to place all their data into instead of breaking it up into respective streams. I.e., if I have a user object, company information, and notifications, I'd expect these to have separation of concerns, and to not be delivered through a bus system (a single subject), but instead through different services, each with their own respective subjects and pipelines.
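To make the contrast concrete, a small sketch (the event shapes are made up):

const { Subject } = require('rxjs');
const { filter, map } = require('rxjs/operators');

// One "bus" Subject carrying tagged events, with each consumer filtering what it cares about...
const bus$ = new Subject();
const userChanges$ = bus$.pipe(filter(e => e.type === 'user'), map(e => e.payload));
bus$.next({ type: 'user', payload: { name: 'Ada' } });

// ...versus separate, self-describing streams per concern.
const user$ = new Subject();
const notifications$ = new Subject();
user$.next({ name: 'Ada' });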
How many subjects is too many? They more or less remain active all at once throughout.
If you're doing trivial maps and filtering, then don't worry about how many subjects you're using. Worry about if your streams of data make logical/logistical sense, and that they are structured/siloed correctly.
StackBlitz program combining 1 million behavior subjects to prove my point.
So imagine, I want to retrieve all orders for an array of customers.
The arrayList in the example below will have an array of customer IDs.
This array will be passed into the get method below and processed asynchronously retrieving orders for each customer ID in the array.
Here's where I get lost: how can you paginate the database result set and pull only a small set of records at a time without having to pull all the records across the network?
What's confusing me is the asynchronous nature, as well as the fact that we won't know how many orders there are per customer. So how can you efficiently return a set page size at a time?
service.js
function callToAnotherService(id) {
  return new Promise((resolve, reject) => {
    // calls the service passing id, then resolve(...) or reject(...) with the result
  });
}
exports.get = arrayList => Promise.all(arrayList.map(callToAnotherService))
.then((result) => result);
In MySQL there are more than one way to achieve this.
The method you choose depends on many variables: your actual pagination method (whether you just want "previous" and "next" buttons, or actually want to provide a range from 1...n, where n is the total number of matching records divided by your per-page record count); the database design, planned growth, partitioning and/or sharding; the current and predicted database load; and possible hard query limits (for example, if you have years' worth of records, you might require the end user to choose a reasonable time range for the query -- last month, last 3 months, last year, and so on -- so they don't overload the database with unrestricted and too-broad queries).
To paginate:
- using simple previous and next buttons, you can use the simple LIMIT [START_AT_RECORD,] NUMBER_OF_RECORDS method, as Rick James proposes in his answer.
- using (all) page numbers, you need to know the number of matching records, so based on your page size you'd know how many total pages there'd be.
- using a mix of the two methods above. For example you could present a few clickable page numbers (previous/next 5 for example), as well as first and last links/buttons.
If you choose one of the last two options, you'd definitely need to know the total number of found records.
As I said above, there is more than one way to achieve the same goal. The choice must be made depending on the circumstances. Below I'm describing a couple of simpler ideas:
FIRST:
If your solution is session based, and you can persist the session, then you can use a temporary table into which you select only order_id (assuming it's the primary key in the orders table). Optionally, if you want to get the counts (or otherwise filter) per customer, you can also add customer_id as a second column next to order_id from the orders table. Once you have populated the temporary table with this minimal data, you can easily count the rows in the temporary table and create your pagination based on that number.
Now as you start displaying pages, you only select the subset of these rows (using the LIMIT method above), and join the corresponding records (the rest of the columns) from orders on temporary table order_id.
This has two benefits: 1) Browsing records page by page is fast, as it's not querying the (presumably) large orders table any more. 2) You're not running aggregate queries on the orders table; depending on the number of records and the design, these can have pretty bad performance, as well as potentially impacting other concurrent users.
Just bear in mind that the initial temporary-table creation is a somewhat slower query, and it would be slower still if you didn't restrict the temporary table to only the essential columns. It's still really advisable to set a reasonable maximum hard limit (number of temporary table records, or some time range) for that initial query.
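A rough sketch of the idea with mysql2/promise (table and column names are assumptions; the temporary table only lives for the current connection, so the same connection must be reused across the session):

// Build the id-only table once, count it for pagination, then page by joining back to orders.
await conn.query(
  `CREATE TEMPORARY TABLE tmp_order_ids
     SELECT order_id, customer_id FROM orders WHERE customer_id IN (?)`,
  [customerIds]);
const [[{ total }]] = await conn.query('SELECT COUNT(*) AS total FROM tmp_order_ids');
const [page] = await conn.query(
  `SELECT o.*
     FROM tmp_order_ids t
     JOIN orders o ON o.order_id = t.order_id
    ORDER BY t.order_id
    LIMIT ?, ?`,
  [pageNumber * pageSize, pageSize]);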
SECOND:
This is my favourite, as with this method I've been able to solve customers' huge database (or specific query) performance issues on more than one occasion. And we're talking about going from 50-55 second query times down to milliseconds. This method is especially immune to slowdowns related to database scalability.
The main idea is that you can pre-calculate all kinds of aggregates (be that cumulative sum of products, or number of orders per customer, etc...). For that you can create an additional table to hold the aggregates (count of orders per customer in your example).
And now comes the most important part:
You must use custom database triggers, namely in your case you can use ON INSERT and ON DELETE triggers, which would update the aggregates table and would increase/decrease the order count for the specific customer, depending on whether an order was added/deleted. Triggers can fire either before or after the triggering table change, depending on how you set them up.
Triggers have virtually no overhead on the database, as they only fire quickly once per (inserted/deleted) record (unless you do something stupid and, for example, run a COUNT(...) query against some big table, which would completely defeat the purpose anyway). I usually go even more granular, keeping counts/sums per customer per month, etc...
When done properly, it's virtually impossible for aggregate counts to go out of sync with the actual records. If your application allows an order's customer_id to change, you might also need to add an ON UPDATE trigger, so the customer id change for an order automatically gets reflected in the aggregates table.
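For illustration, minimal MySQL triggers of that kind (the order_counts aggregates table is an assumption), issued here from Node for consistency with the rest of the examples:

// Aggregates table plus insert/delete triggers that keep the per-customer count in sync.
await conn.query(`
  CREATE TABLE IF NOT EXISTS order_counts (
    customer_id INT PRIMARY KEY,
    order_count INT NOT NULL DEFAULT 0
  )`);
await conn.query(`
  CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
  FOR EACH ROW
    INSERT INTO order_counts (customer_id, order_count)
    VALUES (NEW.customer_id, 1)
    ON DUPLICATE KEY UPDATE order_count = order_count + 1`);
await conn.query(`
  CREATE TRIGGER orders_after_delete AFTER DELETE ON orders
  FOR EACH ROW
    UPDATE order_counts
       SET order_count = order_count - 1
     WHERE customer_id = OLD.customer_id`);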
Of course there are many more ways you can go with this. But these two above have proven to be great. It all depends on the circumstances...
I'm hoping that my somewhat abstract answer can lead you on the right path, as I could only answer based on the little information your question presented...
In MySQL, use ORDER BY ... LIMIT 30, 10 to skip 30 rows and grab 10.
Better yet remember where you left off (let's say $left_off), then do
WHERE id > $left_off
ORDER BY id
LIMIT 10
The last row you grab is the new 'left_off'.
Even better is the same thing, but with LIMIT 11. Then you can show 10, but also discover whether there are more (by the existence of an 11th row being returned from the SELECT).
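For example, with mysql2 (column names assumed):

// Ask for one extra row to learn whether another page exists.
const [rows] = await pool.query(
  'SELECT * FROM orders WHERE id > ? ORDER BY id LIMIT 11', [leftOff]);
const hasMore = rows.length === 11;
const page = rows.slice(0, 10);
const newLeftOff = page.length ? page[page.length - 1].id : leftOff; // remember for the next call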
I'm pretty new to web development. From what I've read on race conditions, I thought they wouldn't be possible with Node or JS because it's single threaded, but I see that is, I guess, wrong. With this little example, can someone explain how it would work?
Say there is a bank account with $1000 in it, and two people charge the account at the exact same second, hitting the server at the exact same time. The first person charges $600 and the second person charges $200.
The first charge would do $1000 - $600 leaving the balance at $400.
But since the second charge hit at the exact same time it would do $1000 - $200 leaving the balance at $800. When obviously the balance should now be $200.
From my understanding that would cause a race condition, no? How would you set this up to avoid this problem? I don't need exact code just maybe someone to explain this to me, or pseudo code.
Thanks in advance.
EDIT: I'll edit it for how the code would be set up initially causing the race condition.
Like the answer below said, the code would be set up so that when the account is hit, it subtracts the amount and gives the new balance. Obviously that causes the race condition.
Your example cannot be answered specifically without seeing the exact code being used as there are safe ways to write that code and unsafe ways to write it.
node.js is single threaded, but as soon as a request makes an async call, then other requests can run while that async request is being carried out. Thus, you can have multiple requests in flight at the same time. Whether or not this causes a "race condition" depends entirely upon how you write your code and, in your particular case, how you access the database.
If you write code like this (pseudo-code):
get total from database
subtract from total
write new total to database
And, the calls to the database are asynchronous (which they likely are), then you definitely have a race condition because in between the time you get the total and write the total, other requests could be attempting to access the same total value and attempting to modify it and one request will either not have the latest total value or the two will stomp on each other's results (one overwriting the other).
If, on the other hand, you have a database that can do an atomic modification of the total value in the database as in:
subtract x from total in database
Then, you will be protected from that particular race condition.
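For example, with a SQL store the whole charge can be one conditional statement (table and column names are made up), so the check and the subtraction happen atomically inside the database:

// Succeeds only if the balance still covers the charge; zero rows updated means "declined".
const res = await pool.query(
  `UPDATE accounts
      SET balance = balance - $1
    WHERE id = $2 AND balance >= $1`,
  [amount, accountId]);
const charged = res.rowCount === 1;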
Because node.js is single threaded, it is not as complicated to write safe code in node.js as it is in a multi-threaded web server. This is because there is only one path of Javascript executing at the same time. So, until you make some sort of asynchronous I/O call, no other request will literally be running at the same time. This makes accessing shared variables in your node.js app massively simpler than in a true multi-threaded web server environment where any access to a shared variable must be protected by a mutex (or something similar). But, as soon as you make an async call, you must be aware that at that point in time, other requests can run.