Okay, let me start by saying that I know this is weird. I do.
But here goes:
Let's say I have an SQL database which stores my data, and let's say I don't have a choice in this: it has to be SQL. The application I'm building has somewhere in the region of 100,000 records in its database, and once every single record has been processed by the application's users, they all get sent off to a different application entirely. So for a short period of time this application will be in use, and then it stops being used until the same time next year. While the application is in use, no external sources will be touching the database at all.
When the (Node) server starts up, it loads everything from the database into an object literal on the server.
The client-side of this application, on a very basic level, makes requests (to an API on the server) for data, and sends updated versions of records back to the server once they've been processed.
So here's where it gets weird: let's say I don't want the client-side application to retrieve records directly from the database, nor do I want it to be able to write to them directly. The data from the entire database already exists in memory on the server, and there's already a module on the server that can handle changing the representation of that data (again, the client application only interacts with APIs on the server; the database module exists to facilitate this).
Multiple users access the system at once, but due to the way the system works, it is not possible for two users to be sent the same record, so two users will never be sending an update back for the same record (records are processed individually, and sequentially).
So, let's say that I decided that, since I was already managing all of this data in memory on the server, I would just send an updated version of the current data, in its entirety, back to the database, every time it changed.
The question is, where does this rank on the crazy scale?
Performance would obviously suffer when writing the entire database rather than single records. But in a database that is only read from once (on start-up of the application), is that even a concern? If every operation other than "write all the stuff when any of the stuff changes" happens in memory on the server, does it matter how long those writes actually take? And if a new update comes in while the database is being written, surely SQL will take care of that?
It feels like the correct way to do this, of course, is to have each user get their info directly from the database and make updates directly to the database too (or at least interact with API endpoints to make this happen). But is just... not doing that utter lunacy?
Like I said, I know it's weird, but other than the fact that "it feels kind of wrong", I'm not sure I'm convinced that it is in fact entirely wrong. So I figured that this place would have an opinion.
The way that I think it currently works is:
[SQL DB] is updated whenever a change happens on {in-memory DB}
{in-memory DB} is updated in various ways based on API calls to the server
[Client application] makes requests for data, and sends updates to data, both of which are processed on the {in-memory DB}
Multiple requests can happen at the same time from the application, but multiple users cannot see the same record, because records are allocated to a given user before they're sent
Multiple updates can come from multiple users, each of which ultimately ends with the entire SQL database being overwritten with the contents of the {in-memory DB}.
(Note: I'm not saying "is this the best way to do this". I'm just asking: is there a significant argument for caring about the performance of a database being written to, if it's not going to be read from again unless the server needs to be restarted?)
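For concreteness, here's a rough sketch of the flow I mean, assuming Express and the 'pg' library (every table, column, and route name here is made up for illustration):

```js
// Rough sketch only: all names (records table, payload column, routes)
// are placeholders, not the real application.
const express = require('express');
const { Pool } = require('pg');

const app = express();
const pool = new Pool(); // connection details via PG* environment variables
const cache = {};        // the in-memory "database": id -> record

// On start-up, load every record from SQL into memory.
async function loadAll() {
  const { rows } = await pool.query('SELECT id, payload FROM records');
  for (const row of rows) cache[row.id] = row.payload;
}

app.use(express.json());

// Clients only ever talk to the API; they never touch the database.
app.get('/records/:id', (req, res) => res.json(cache[req.params.id]));
app.put('/records/:id', async (req, res) => {
  cache[req.params.id] = req.body;
  await writeEverythingBack(); // the "write it all whenever anything changes" part
  res.sendStatus(204);
});

// Rewrite the whole table from the in-memory copy, inside one transaction.
async function writeEverythingBack() {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query('TRUNCATE records');
    for (const [id, payload] of Object.entries(cache)) {
      await client.query(
        'INSERT INTO records (id, payload) VALUES ($1, $2)',
        [id, payload]
      );
    }
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}

loadAll().then(() => app.listen(3000));
```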
What I think that I would do, in this situation, is to add an attribute to each cached record to indicate that the record is "dirty." In other words, that something has been done to it, by someone, since it was originally read from the database.
(You could also add an attribute that indicates that someone "has this particular record 'checked-out,'" so that you can be sure that two users are not updating the same record at the same time.)
At some convenient moment, you can then walk through the collection, posting the "dirty" records back to the database. Use an SQL Transaction, not only for efficiency but also to be sure that the final update to the database is atomic.
You will need to be very mindful of the possibility of race-conditions. One possible strategy is to use a Unix timestamp as a "dirty" indicator. A record is selected for posting to the database only if its "dirty-time" is greater-than or equal-to the timestamp when the commit-process was last run.
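A minimal sketch of that commit pass, assuming a Node server that caches records in a plain object and talks to the database through the 'pg' library (the cache shape, table, and column names are all my assumptions, not from the question):

```js
// Sketch of the "walk the collection, post the dirty records" pass.
// Each cached record is assumed to look like { id, data, dirtyTime }.
const { Pool } = require('pg');
const pool = new Pool();

let lastCommitTime = 0; // timestamp of the last commit-process run

async function flushDirtyRecords(cache) {
  const commitStarted = Date.now();

  // A record is selected only if its dirty-time is >= the last run.
  const dirty = Object.values(cache).filter(
    (rec) => rec.dirtyTime !== undefined && rec.dirtyTime >= lastCommitTime
  );
  if (dirty.length === 0) return;

  const client = await pool.connect();
  try {
    await client.query('BEGIN'); // one atomic transaction for the whole batch
    for (const rec of dirty) {
      await client.query(
        'UPDATE records SET payload = $1 WHERE id = $2',
        [rec.data, rec.id]
      );
    }
    await client.query('COMMIT');
    lastCommitTime = commitStarted; // advance the marker only on success
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```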
(And, P.S.: no, I've seen even "weirder" things than this, in all my crazy years in this crazy business...)
I am currently in the following situation:
I am hosting a private web app, which involves sending AJAX calls to the database server and displaying the results (REST architecture).
Users can query the database by toggling various buttons, each of which corresponds to a specific query that performs the above operation.
Users have access to the entire database (product feature). However, we only allow the retrieval of a small portion of the database in a single call.
However, I do not want the user to be able to create a bot that can recreate the entire database in a short period of time. At the moment, a user could easily do so by duplicating the HTTP request (they have access to the JS script and can use a packet analyzer).
I have considered the following solutions -
Bot detection algorithms
Captcha
iframe
disabling the ability to read source code/JS files with a custom browser
I would like to find out if any of these solutions are feasible, or if there are any better alternatives/architectures available. I just want to prevent the scraping of the database.
Please help! Thank you!
I'm currently using AJAX and PHP to send updates to a PostgreSQL database.
Say I had 1000 users, each sending one AJAX POST request per second to a PHP script. Say that PHP script opened a connection, executed two SQL UPDATE commands each time it was run, and then closed the connection.
That would be 1000 connections per second - I'm guessing that isn't going to work out very well for me.
If it's not, how should I deal with it? I've read that node.js is a good solution - if it is, are there any good guides for updating a PostgreSQL database from a webpage using JavaScript?
I already have data (some JSON, some other) in the PostgreSQL database and it needs to stay there, so ideally I would be able to just change the way the handshake between JavaScript and the database works and leave the rest the same.
As a side question: How many connections per second should I expect to be able to handle if that's my only bottleneck? And if there are more than the max 150 connections does it just queue the connection or does it do something obnoxious like post a message saying 'max connections hit' and not allow page loads?
Connection pooling, or a "connection proxy". Try a search combining postgres and connection-pooling.
There is a current survey on the Postgres project page:
What Connection Proxy do you use for PostgreSQL?
haproxy and pgbouncer are the most popular ATM.
I would start with pgBouncer at the Postgres Wiki.
Of course, keeping the same connection open and reusing it only works for the same session user. You can switch context within the same connection with SET ROLE, though:
How to check role of current PostgreSQL user from Qt application?
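For completeness, here's a rough sketch of what application-side pooling looks like from Node with the 'pg' library, as an alternative (or complement) to a proxy like pgbouncer; the pool size, table, and role handling are illustrative assumptions:

```js
// Each request borrows an already-open connection from the pool instead of
// opening and closing its own, so 1000 requests/second can share a handful
// of long-lived connections.
const { Pool } = require('pg');

const pool = new Pool({
  max: 20,                 // cap well below Postgres' max_connections
  idleTimeoutMillis: 30000 // close connections idle for 30s
});

async function handleUpdate(id, value) {
  await pool.query('UPDATE items SET value = $1 WHERE id = $2', [value, id]);
}

// Switching user context on a reused connection, per the SET ROLE note above.
async function runAs(role, sql, params) {
  const client = await pool.connect();
  try {
    await client.query(`SET ROLE ${role}`); // role name must come from a trusted list
    return await client.query(sql, params);
  } finally {
    await client.query('RESET ROLE'); // hand the connection back in a clean state
    client.release();
  }
}
```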
I have a web application in PHP. One component of the application submits data to a backend pipeline (also written in PHP). The pipeline is an external script, which my application calls using PHP's exec function. It is a multi-step pipeline that executes several programs, each taking input from the program run before it. I want to be able to display a message on the page that submits to the pipeline as each step of the pipeline completes, such as:
Step 1 completed... Now doing step 2.
Step 2 completed... Now doing step 3.
and so on and so forth.
I am open to using JavaScript/AJAX to do this, and also any other language compatible with PHP.
Any help is greatly appreciated.
Well, working on the assumption that you have some kind of database backing your PHP front-end and pipeline, you don't specifically need something compatible with PHP, but rather something that can interface with your database.
Without any further details on what you've set up/tried, etc., I can only offer an overview of the workflow I would use in this situation.
Front-end script is submitted and pushes the request into a processing queue.
User is shown a "processing, please wait" type page. This page makes a long-polling AJAX request or a WebSocket connection to a script on the front-end site which polls the database for updates on the pipeline processing.
The pipeline scripts chain off each other and push the details of their completion into the database, where they are read off by the WebSocket/long-polling front-end script and returned to the user via JavaScript to display on the page.
Using the database as a go-between would be the easiest and most flexible approach. You could also use other languages if you're more comfortable with them so long as they're compatible with your database used on the PHP/pipeline side.
EDIT
While I don't have any links to a tutorial on exactly what you want to do, there are some basics behind it that you can use to piece together your solution:
I would start by getting your pipeline (processing) script to run in the background at an interval using cron. Alternatively, you could daemonize that pipeline using something like this PHP-Daemon framework so that it runs in the background, perhaps with a cron task to check whether it's running and restart it if needed.
From there, you can build a table in your database that contains status updates on the processing tasks at hand, and build a PHP script that checks the status of a given task and outputs JSON data about it. This JSON data can easily be read using AJAX requests, the simplest of which would probably be jQuery's .ajax() method. It could be called at an interval on the client-side "status" page, using a setTimeout call at the end of each "loop" to poll for status changes every X seconds. This would be the easiest implementation of what you're after, although not the best-performing or most optimal way to do it.
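A rough sketch of that client-side polling loop, assuming a status.php endpoint that returns JSON shaped like {"step": 2, "done": false} (the response shape and element ID are my own assumptions):

```js
// Poll status.php for the given task, re-scheduling the next poll with
// setTimeout only after the current request completes.
function pollStatus(taskId) {
  $.ajax({
    url: 'status.php',
    data: { task: taskId },
    dataType: 'json'
  }).done(function (status) {
    $('#status').text(
      'Step ' + status.step + (status.done ? ' completed.' : ' in progress...')
    );
    if (!status.done) {
      setTimeout(function () { pollStatus(taskId); }, 3000); // every 3 seconds
    }
  });
}

pollStatus(12345); // task ID would come from the submission step
```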
So the general workflow here would change to this:
Front-end script is submitted and pushes the request into the processing queue with status set to pending.
User is shown the processing page which pings status.php?task=12345 every X seconds.
Back-end daemon/cron script picks up the request and begins processing, pushing status updates into the database at intervals.
The status.php script begins to return different status information in its JSON output, which is displayed to the user.
I have been tasked with reading information from a table on a 3rd-party page. The website will have multiple pages, so the bookmarklet will have to be run on it once per page. I currently have the bookmarklet pulling the data and putting it into a pipe-delimited array. I would like to send this pipe-delimited array to a server-side function that sanitizes the data (in case of injection) and then checks whether it exists in a temp table; if it doesn't exist in the table, it inserts it.
After all of that is said and done, the script will send back information about what happened during the server-side processing, and the results will be presented to the user on the web page where the bookmarklet was executed.
I have looked into JSON, AJAX, and JavaScript as possible solutions to submit and work with the data (which I quickly detoured away from).
I am limited to using Microsoft solutions because of the environment I am working in.
So my question is: what would be best, and how would I go about it? I have been unable to understand or execute any of these solutions.
What would be the most efficient way to post data to a database, and get a response that the user sees, in a Microsoft environment, using a bookmarklet on a 3rd-party page?
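To make the question concrete, here's a rough sketch of what I picture the bookmarklet's client side doing; the endpoint URL, payload shape, and response fields are all placeholders I've made up:

```js
// Bookmarklet body: POST the pipe-delimited rows to our server and show the
// server's summary on the 3rd-party page. The endpoint must send CORS headers,
// since this code runs on the 3rd-party page's origin.
(function () {
  // Example pipe-delimited rows scraped from the page's table.
  var rows = ['123|Widget|4.99', '124|Gadget|9.99'];

  var xhr = new XMLHttpRequest();
  xhr.open('POST', 'https://myserver.example/api/import'); // placeholder URL
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.onload = function () {
    var result = JSON.parse(xhr.responseText);
    alert(result.inserted + ' new rows inserted, ' +
          result.skipped + ' duplicates skipped.');
  };
  xhr.send(JSON.stringify({ rows: rows }));
})();
```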