I'm building an instant messenger for a mobile client that talks to a RESTful API over HTTP. The pagination endpoint is quite standard: it takes a starting location (offset) and the number of items per page (limit). I'm having trouble figuring out how to ensure 100% data consistency with pagination when the database can change rapidly.
For example, with a dozen or so participants, there could be a dozen new messages in a conversation within a second. I don't think it's far-fetched to guess that some of those messages could be written to the database while the HTTP request for a page is still in flight. Fortunately, since this is a messenger, I do not have to consider data deletion, only data addition.
In my research, the following two links were quite helpful but didn't provide a clear solution:
How to ensure data integrity in paginated REST API?
How to implement robust pagination with a RESTful API when the resultset can change?
The only potential solution I can come up with is using the timestamp of the last object in the previously fetched page. The HTTP query would take that timestamp as a parameter, and the server would return a page of objects created after it.
Is there any potential problem I'm not seeing, or even better, a much better solution to this issue?
It seems that the method I've thought of has a name - cursor based pagination.
The link below has a great graphical description and explanation, plus an example in PHP.
http://www.sitepoint.com/paginating-real-time-data-cursor-based-pagination/
There's also a helpful guide from Django REST framework that compares two different pagination techniques (LimitOffsetPagination and CursorPagination).
http://www.django-rest-framework.org/api-guide/pagination/
Cursor-based pagination requires a unique, unchanging ordering of items. Facebook and Twitter use generated IDs. As for me, I've decided to simply use the timestamp at object creation, as it supports millisecond precision. That should be good enough for now.
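To make the idea concrete, here is a minimal sketch of cursor-based pagination over an in-memory list. It is not taken from either linked article. The `Message` class, the `fetch_page` function and the use of an `id` as a tie-breaker are assumptions for illustration: a timestamp alone is only a safe cursor if no two messages ever share the same millisecond, so the sketch orders by the pair (timestamp, id).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Cursor = Tuple[int, int]  # (created_at_ms, message id)


@dataclass(frozen=True)
class Message:
    id: int             # hypothetical unique id, used only to break timestamp ties
    created_at_ms: int  # creation time in milliseconds
    text: str


def fetch_page(messages: List[Message], after: Optional[Cursor], limit: int):
    """Return up to `limit` messages strictly after `after`, plus the next cursor."""
    ordered = sorted(messages, key=lambda m: (m.created_at_ms, m.id))
    if after is not None:
        ordered = [m for m in ordered if (m.created_at_ms, m.id) > after]
    page = ordered[:limit]
    # If the page is empty, keep the old cursor so the client can retry later.
    next_cursor = (page[-1].created_at_ms, page[-1].id) if page else after
    return page, next_cursor


store = [Message(1, 1000, "a"), Message(2, 1000, "b"), Message(3, 1001, "c")]
page1, cursor = fetch_page(store, None, 2)
store.append(Message(4, 1002, "d"))  # a new message arrives between the two requests
page2, cursor = fetch_page(store, cursor, 2)
```

Because the second request is positioned by the cursor rather than by an offset, the message that arrived in between neither shifts the page nor produces a duplicate.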
I am running MySQL 5.6. I have a number of various "name" columns in the database (in various tables). These get imported every year by each customer as a CSV data dump. There are a number of places that these names are displayed throughout this website. The issue is, the names have almost no formatting (and to this point, no sanitization existed upon importation):
Phil Eaton, PHIL EATON, Phil EATON, etc.
Thus, the website sometimes looks like a mess when these names are involved. There are a number of ways that I can think of to fix this, but none that are particularly appealing.
First, I can have a filter in JavaScript. However, as I said, these names exist in a number of places throughout this (large) site. I may end up missing a page. The names are not already wrapped in nice "name"-classed divs/spans, etc.
Second, I could filter in PHP (the backend). This seems about as effective as doing it in JavaScript. I could do it in the API, but there is still no central method for pulling names from the database, so I could still miss an API call anyway.
Finally, the obvious "best" way is to sanitize the existing data in place for each name column, and at the same time start sanitizing all names that get imported each time we add a customer. The issue with the first part is that there are hundreds of millions of rows of names in the database. Updating these could take a long time and be disruptive to the clients' daily routines.
So, the most appealing way to correct this in the short term is to invoke a function every time a column is selected. In this way I could "decorate" every name column with a formatting function so the names appear uniform on the frontend. So ultimately, my question is: is it possible to invoke a specific function in SQL to format each row every time a specific column is selected? In other words, can I call a stored procedure every time a column is selected? (The point being, I'm trying to keep the formatting in SQL to avoid having to change every place the names are used.)
In MySQL you can't trigger something on SELECT, but I have an idea (it's only an idea; I don't have time to try it right now, sorry).
You can probably create a VIEW on this table with the same structure, but with a stored function applied to the name fields (a view can call a stored function, not a stored procedure), and select from this view in your PHP.
But it has two drawbacks:
You have to modify all the SELECT statements in your PHP code.
The server will call that function on every select. Maybe you can store the formatted values and check for them first (cache them).
On the other hand, I agree with HLGEM: I also suggest formatting the data on import, because it's very bad practice to import something into a DB without checking it (SQL injection?). Running the cleanup as a batch task is also a good idea.
I presume names are selected frequently, so invoking a sanitization function every time they are selected could severely slow down your system. Further, you can't get this with a simple setting; you would have to change every bit of SQL code that includes names.
Personally, I would handle it by fixing the imports so they store a sanitized version of new names. It is a bad idea to put any data directly into a database without some sort of staging and cleanup.
Then I would tackle the old names and fix them in batches in a nightly run, scheduled for when the fewest people are using the system. You would have to do some testing on dev to determine how big a batch you could run without interfering with other things the database is doing. The larger the batch, the sooner you would get through all the names. Even though this will take time, it is the surest method of getting the data cleaned up, and over time the data will appear better to the users. If the design of your database allows you to identify the more active names (such as an is_active flag for a customer or an order in the last year), I would prioritize the update by that. Alternatively, you could clean up one client at a time, starting with whichever one has noticed the problem and is driving this change.
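The batched cleanup described above could look roughly like the sketch below. Everything in it is an assumption for illustration: the `customers` table, the `normalize_name` and `clean_batch` helpers, and the use of Python's built-in sqlite3 as a stand-in for MySQL so the example runs on its own. Note that simple word capitalization will mangle names such as "McDonald" or "van der Berg", so a real rule set would need more care.

```python
import sqlite3


def normalize_name(name: str) -> str:
    """Collapse whitespace and capitalize each word: 'PHIL  EATON' -> 'Phil Eaton'."""
    return " ".join(part.capitalize() for part in name.split())


def clean_batch(conn: sqlite3.Connection, last_id: int, batch_size: int) -> int:
    """Normalize one batch of rows with id > last_id.

    Returns the last id processed, or -1 when no rows are left.
    """
    rows = conn.execute(
        "SELECT id, name FROM customers WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    if not rows:
        return -1
    # Only write rows whose name actually changes, to keep each batch small.
    changed = [(normalize_name(n), i) for i, n in rows if normalize_name(n) != n]
    conn.executemany("UPDATE customers SET name = ? WHERE id = ?", changed)
    conn.commit()  # one short transaction per batch
    return rows[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [("Phil Eaton",), ("PHIL EATON",), ("Phil EATON",)],
)

last_id = 0
while last_id != -1:  # in production, one call per scheduled run instead of a loop
    last_id = clean_batch(conn, last_id, batch_size=2)

names = [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY id")]
```

Walking the table by primary key keeps each batch a cheap range scan, and the same `normalize_name` function can be reused in the import path so new data arrives clean.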
Earlier answers give some possible solutions. But the short answer to the specific option you are asking about is: no. There is no such thing as a "SELECT statement trigger", let alone one for a single column. Triggers come close to what you want, but they exist only for INSERT, UPDATE and DELETE operations.
I am using SSE for my real-time application. I have two types of notifications that I need to check in the database. One type is sent to the browser whenever there is an update (which can take 1 hour, 2 hours, etc.), and for the other type I need to fetch data from the database every 5 seconds. Would it be better to use two SSE scripts, one for each type, or should I check everything in one script? Won't it be very slow if I use only one script? (By the way, I'm using PHP/MySQL on the server side.)
I wrote something on this just this morning. Personally, I would use one script and only include the long-term update conditionally in the PHP when you need it.
This should cover everything you need with a few extras.
How to send json_encode data with HTML5 SSE
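As a rough illustration of the single-script approach, the sketch below builds the event-stream text for one 5-second tick and adds the rare update only when there is one. It is written in Python rather than PHP, and the event names `poll` and `update` and both helper functions are made up for the example. The wire format itself (an `event:` line, a `data:` line, then a blank line) is the standard one for `text/event-stream`.

```python
import json
from typing import Any, Iterator, Optional


def format_sse(event: str, data: Any) -> str:
    """Serialize one message in the text/event-stream wire format."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def events_for_tick(poll_data: Any, update: Optional[Any]) -> Iterator[str]:
    """Yield the regular event, plus the rare one only if there is an update."""
    yield format_sse("poll", poll_data)
    if update is not None:
        yield format_sse("update", update)


quiet = list(events_for_tick({"count": 3}, None))       # nothing new this tick
busy = list(events_for_tick({"count": 4}, {"id": 7}))   # an update arrived
```

On the browser side, a single EventSource can then register a separate listener for each event name with addEventListener, so one connection serves both kinds of notification.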
I have a set of data in a .js file that contains stock prices from 2013 to 2015, second by second.
I use Flot charts to update the chart in real time in order to follow the stock price movements instantly.
The problem is that anyone can open the .js file and have a look at the future prices. I would like to keep that information hidden. Is it possible to do this without having to move my data set to a server?
If it is not possible, what are my options for doing it in the best way, keeping the process smooth,
as I plan to get one new value every second?
Thanks,
Deeprod
One idea is an AJAX-based polling approach on a timer.
Depending on the technology you're using, a more elegant solution could be a push framework called SignalR, which can push updates to your client.
http://signalr.net/
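Whichever transport is used, the part that keeps future prices hidden is that the server only ever returns values up to the current moment. A minimal sketch of that server-side filter follows. The `ticks_between` function, the `Tick` shape and the sample numbers are assumptions for illustration, not part of any library.

```python
from bisect import bisect_right
from typing import List, Tuple

Tick = Tuple[int, float]  # (unix second, price)


def ticks_between(ticks: List[Tick], last_seen: int, now: int) -> List[Tick]:
    """Return ticks with last_seen < time <= now; `ticks` must be sorted by time."""
    times = [t for t, _ in ticks]
    return ticks[bisect_right(times, last_seen):bisect_right(times, now)]


series = [(100, 10.0), (101, 10.5), (102, 10.2), (103, 11.0)]

# The client polls with the last timestamp it has; the server supplies `now`.
first = ticks_between(series, last_seen=99, now=101)
second = ticks_between(series, last_seen=101, now=102)
```

The client never sends `now` itself. If it did, anyone could ask for a later time and read ahead.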
I have a lot of data which is date-based in nature but with irregular reporting intervals. What I was hoping to do is have my PHP backend send the data to the frontend JS/jQuery for display, but in order to report effectively I need to be able to report by "week", "month", etc. I could try to do the SQL on the backend and structure a different data series for each interval, but that would be CPU-intensive on the backend, and I'd like the frontend to be a bit dynamic in being able to move between these intervals (e.g., "show me monthly, no wait, show me weekly", etc.)
What I'm looking for -- I think -- is a JS/jQuery library that will help me take tabular data and group/aggregate it based on date based conditions. If need be I could try adding a column to the tabular structure which specifies the week number (and thereby simplifies the data math on frontend). In any case, I'm flexible at the moment and just hoping to hear of some good resources or approaches that have been tried before to solve this kind of problem.
Note: I am using jqGrid for tabular views on the frontend and Highcharts for graphical presentation. This is potentially not important but it might also open some creative alternatives.
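For reference, the grouping itself is small enough to do without a dedicated library. The sketch below shows the logic in Python, and the same approach carries over to plain JavaScript on the frontend. The `bucket_key` and `aggregate` helpers, the ISO week numbering and the choice of summing values are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Row = Tuple[date, float]  # (report date, value)


def bucket_key(day: date, interval: str) -> str:
    """Map a date to its bucket label for the chosen interval."""
    if interval == "month":
        return f"{day.year}-{day.month:02d}"
    if interval == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    raise ValueError(f"unsupported interval: {interval}")


def aggregate(rows: Iterable[Row], interval: str) -> Dict[str, float]:
    """Sum values per bucket; irregular gaps between dates do not matter."""
    totals: Dict[str, float] = defaultdict(float)
    for day, value in rows:
        totals[bucket_key(day, interval)] += value
    return dict(totals)


rows = [(date(2015, 1, 5), 2.0), (date(2015, 1, 9), 3.0), (date(2015, 2, 2), 4.0)]
monthly = aggregate(rows, "month")
weekly = aggregate(rows, "week")
```

Since the raw rows are sent once and re-bucketed on demand, switching between weekly and monthly views needs no extra request to the backend.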