I need to gather information when a user sees an article. A user will browse through 1-30 articles in a minute (maybe even more if they just scroll through everything looking for something specific). I was wondering which of these approaches would keep my server costs to a minimum:
Option 1: In client-side JavaScript, I push article IDs into an array and send it to the server once there are 30-60 IDs. On the server, I loop through all the IDs and insert them into the database.
Option 2: Every single time the user sees an article, I send that one article ID to the server. In some cases this can mean over 60 requests in a minute. On the server, I insert the ID into the database.
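For reference, a minimal sketch of what option 1 could look like on the client; the /api/article-views endpoint name is made up, and the flush-on-unload via navigator.sendBeacon is an extra safeguard beyond the description above, so a partial batch isn't lost:

```javascript
// Option 1 sketch: buffer viewed article IDs, flush in batches.
// '/api/article-views' is a hypothetical endpoint.
var viewedIds = [];
var BATCH_SIZE = 30;

function recordView(articleId) {
  viewedIds.push(articleId);
  if (viewedIds.length >= BATCH_SIZE) flush();
}

function flush() {
  if (viewedIds.length === 0) return;
  var batch = viewedIds;
  viewedIds = [];
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/api/article-views');
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.send(JSON.stringify({ ids: batch }));
}

// Flush whatever is still buffered when the user leaves the page.
window.addEventListener('unload', function () {
  if (viewedIds.length > 0 && navigator.sendBeacon) {
    navigator.sendBeacon('/api/article-views',
                         JSON.stringify({ ids: viewedIds }));
  }
});
```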
In most cases there is a trade-off, and a lot of the time the optimal solution lies somewhere in the middle. I feel you should support both and switch between them depending on the situation. Please go through the following scenarios:
Will your end users have bandwidth issues? If yes, it may make sense to go with option 2, or to reduce the number of articles to one that can be fetched easily on a low-bandwidth connection.
Assuming the user does not have bandwidth issues, so loading 30-60 articles won't take much time, you can go with option 1 and keep using it for subsequent fetches as well.
A lot of the time it will make sense to go with option 1 for the initial fetch and then fetch a smaller number of articles after that.
Regarding server cost, it makes sense to send 30-60 articles together provided the user reads them all. If you feel they won't, find an optimal number using your app's analytics and send that many articles in one go, provided bandwidth won't be an issue for the user.
tl;dr: In data you should trust. Use your intuition, existing app-usage patterns, and the user's available bandwidth to make an informed decision. Also, server cost is not the only thing; experience matters more, I think.
Good evening,
my project uses the MEAN Stack and has a few collections and a single database from which the data is retrieved.
Thinking about how the user would interact with the web app I am going to build, I realized that my initial idea for the application is quite wasteful.
The application is hosted on a private server on the LAN, which makes requests very fast, and it runs an Express server.
The application is built around employee management, services, and the places where those services can take place. I'm just describing it so you have an idea.
The "ring to rule them all" is pretty much the first collection, services, which starts the core of the application. There's a page that let's you add rows, one for each service that you intend to do and within that row you choose an employee to "run the service", based on characteristics that this employee has, meaning that if the service is about teaching Piano, the employee must know how to play Piano. The same logic works for the rest of the "columns" that will build up my row into a full service recognized by the app as such.
Now, what I said above is pretty much information retrieval from a database, plus the logic to make the application model the retrieved information and build something with it.
My question, or rather my doubt, comes from how I imagined the querying would work for each field that is part of the service row. Right now I'm thinking of querying the database (MongoDB) each time I have to pick a value for a field, but consider that I might want to add 100 rows, each with 10 fields: that would add up to a lot of requests to the database. That doesn't seem elegant, or intelligent, but I can't come up with a better solution or idea.
Any suggestions or rules of thumb for a MEAN newb?
Thanks in advance!
EDIT: Answering a question from the comments, since it's needed context.
No, the database is pretty static (unless the user willingly inserts a new value, say a new employee who can do a service), and that wouldn't happen very often. Considering the query that would return all the employees for a given service, those employees would (ideally) be in an associative array, with the possibility of being "popped" from it if chosen for a service, making them unavailable for further services (because one person can't do two services at the same time). Hope I was clear; I'm surely not the best at explaining myself.
It would query the database on who is available when a user looks at that page and another query if the user assigns an employee to do a service.
In general 1 query on page load and another when data is submitted is standard.
You would only want to use an in-memory cache for:
- frequent queries (but most databases will do this automatically)
- values that change frequently, like:
  - how many users are connected
  - the last query sent
  - something that happens on almost every query (>95%)
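For instance, a minimal Angular sketch of that one-query-on-load pattern applied to the employee question above; the /api/employees endpoint and the field names are assumptions:

```javascript
// Sketch: one query on page load, reused for every service row.
// '/api/employees' is a hypothetical endpoint.
app.controller('ServicesCtrl', function ($scope, $http) {
  $scope.availableEmployees = [];

  // Single fetch on page load; every row picks from this array.
  $http.get('/api/employees').then(function (res) {
    $scope.availableEmployees = res.data;
  });

  // Remove an employee from the local pool once assigned, so they
  // can't be picked for two services at the same time.
  $scope.assign = function (row, employee) {
    row.employee = employee;
    var i = $scope.availableEmployees.indexOf(employee);
    if (i !== -1) $scope.availableEmployees.splice(i, 1);
  };
});
```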
I have an Angular app pulling data from a REST server. Each item we pull has some "core" data - what's needed to display its basic representation - and then what I call "secondary" data: comments and other things that the user might want to see and might not.
I'm trying to optimize our request pattern to minimize the overall time the user spends looking at a loading spinner. Pulling all the core and secondary data at once makes the initial request return far too slowly; but pulling only the bare essentials, and waiting until the user asks for something we haven't requested yet, also creates unnecessary load time, at least insofar as I could have anticipated what they'd want to see and loaded it while they were busy reading the core content.
So, right now I'm doing a "core content" pull first and then initiating a "secondary" pull at the end of the success callback from the first. This is going to be an experimental process, but I'm wondering what (if any) best practices have been established for this situation. (I'm sure a good answer is a Google away, but in this instance I'm not quite sure what to Google - thus the quotation marks in this question's title.)
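Roughly, that chain looks like this in $http terms; the endpoints here are made-up names:

```javascript
// Sketch of the core-then-secondary request chain.
// '/api/items/:id' and '/api/items/:id/comments' are hypothetical.
function loadItem($http, $scope, itemId) {
  $http.get('/api/items/' + itemId)
    .then(function (res) {
      $scope.item = res.data;          // core data: render immediately
      // Start the secondary pull only once the core request succeeds.
      return $http.get('/api/items/' + itemId + '/comments');
    })
    .then(function (res) {
      $scope.item.comments = res.data; // secondary data fills in later
    });
}
```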
A more concrete question: Am I better off initiating many small HTTP transactions or a few large ones? My instinct is to do many small ones, particularly if I can anticipate a few things the user is likeliest to want to see first and get those loaded as soon as possible. But surely there's an asymptote here? Or am I off-base in this line of thinking entirely?
I use the same approach as you, and it works pretty well for a many-keyed collection of 10,000+ items.
The collection is paginated with ui.bootstrap.pagination; at most 10 items are displayed at once. It can be searched by title.
So my approach is to retrieve only id and title, for the whole collection, so the search can be used straight away.
Then, since the items displayed on screen are in an array, I place a $watch on that array. The job of the $watch is to fetch the full details of the items in the array (the secondary pull) - but of course only when the array changes.
So, in the worst case scenario, you are pulling the full details of only 10 items.
Results are cached for more efficiency. It displays instant results, as the $watch acts as a pre-loader.
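A rough sketch of that $watch pre-loader; the endpoint and cache shape are assumptions, and $watchCollection is used since we only care about items entering or leaving the array:

```javascript
// Sketch: watch the currently displayed page, pre-fetch full details.
// '/api/items/:id' is hypothetical; detailCache is a plain object.
var detailCache = {};

$scope.$watchCollection('displayedItems', function (items) {
  (items || []).forEach(function (item) {
    if (detailCache[item.id]) {
      item.details = detailCache[item.id]; // instant: already cached
      return;
    }
    $http.get('/api/items/' + item.id).then(function (res) {
      detailCache[item.id] = res.data;
      item.details = res.data;
    });
  });
});
```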
Am I better off initiating many small HTTP transactions or a few large ones?
I believe larger transactions, covering just the few items that are actually clickable on screen, are very efficient.
Regarding the best practice bit: I suppose there are many ways to achieve your goals; however, the technique you are using works extremely well, as it retrieves only what is needed, and only just before it is needed.
Besides, it is simple enough to implement.
Also, like you, I would have thought many smaller pulls were surely better than a few large ones. However, I was advised to go for a large pull in a comment on this question: Fetching subdocuments with angular $http
To answer your question about which keywords to search for, I suggest:
progressive loading
An alternative could be using WebSockets and streaming the loading; Oboe.js does this quite well:
http://oboejs.com/examples
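For example, Oboe.js can hand you each array element as it streams in, before the response has finished downloading; the URL and render function below are illustrative:

```javascript
// Streaming load with Oboe.js: act on each item as soon as it arrives.
// '/api/items' is a hypothetical endpoint returning a JSON array.
oboe('/api/items')
  .node('!.*', function (item) {
    // Called once per top-level array element, mid-download.
    renderItem(item); // hypothetical render function
  })
  .done(function () {
    console.log('all items loaded');
  });
```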
So, I have a main IndexedDB object store with around 30,000 records on which I have to run full-text search queries. Doing this with the YDN FTS plugin generates a second object store with around 300,000 records. As generating this "index" store takes quite long, I figured it would be faster to distribute the content of the index as well. That produced a zip file of around 7 MB which, after decompressing on the client side, gives more than 40 MB of data.

Currently I loop over this data inserting records one by one (async, so callback time is used to parse the next lines), which takes around 20 minutes. As I am doing this in the background of my application through a web worker, this is not entirely unacceptable, but it still feels hugely inefficient. Once populated, the database is actually fast enough to be used even on mid to high-end mobile devices; however, a population time of 20 minutes to one hour on mobile devices is just crazy.

Any suggestions how this could be improved? Or is the only option minimizing the number of records? (That would mean writing my own full-text search... not something I look forward to.)
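For what it's worth, a common way to speed up bulk population with raw IndexedDB is to group many put() calls into a single transaction per chunk, instead of one transaction per record; a minimal sketch with a made-up store name (whether this helps under the YDN wrapper would need measuring):

```javascript
// Sketch: bulk-insert in chunks, one transaction per chunk.
// 'ft-index' is a hypothetical store; assumes it defines a keyPath.
function bulkInsert(db, records, chunkSize, done) {
  var offset = 0;
  (function insertChunk() {
    if (offset >= records.length) return done();
    var tx = db.transaction('ft-index', 'readwrite');
    var store = tx.objectStore('ft-index');
    var end = Math.min(offset + chunkSize, records.length);
    for (var i = offset; i < end; i++) {
      store.put(records[i]); // all puts share one transaction
    }
    tx.oncomplete = function () {
      offset = end;
      insertChunk(); // next chunk; yields back to the event loop
    };
  })();
}
```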
Your data size is quite large for a mobile browser. Unless the user is constantly using your app, it is not worth sending all the data to the client. You should do the full-text search on the server side, while caching opportunistically as illustrated by this example app. That way, the user doesn't have to wait for full-text search indexing.
Full-text search requires indexing all tokens (words), except that some words are stemmed; stemming is activated only when lang is set to en. You should profile your app to see which parts are taking the time. I guess the browser is taking most of it, in which case you cannot do much optimization other than parallelization. Sending pre-indexed data (as you described above) will not help much, but please confirm by comparing. A web worker will not help either. I assume your app has no problem with slow responses due to indexing.
Do you have any complaints other than the slow indexing time?
So I have an Umbraco site with a number of products in it that are content-managed. I need to search/filter this dataset on the front end based on 5 criteria.
I'd estimate I will have 300 products. I need to filter this data very fast and hide/show options that are no longer relevant based on the previous selections.
I'm currently building a webservice and jquery implementation using AJAX.
Is the best way to do this to load it into a JavaScript data structure and operate on it there, or will AJAX calls be fast enough? Obviously the client-side route will mean duplicating the functionality on the server side for non-JavaScript users.
If you need to filter the data "very fast" then I imagine the best way is to preload all the data and then manipulate it client-side. If you're waiting for an AJAX response every time the user filters the data, it's not going to be as fast as filtering on the client (assuming they haven't got an ancient computer running IE6).
It would depend on the complexity of your filtering. If all you're doing is showing results where, for example, the product's price is greater than $10, then client-side will definitely be much faster. If you're going to be doing complex searches, it's possible that server-side processing could be faster. The other question is how much data is stored for each product - preloading a few hundred products with a lot of data each may take some time.
As always, the only way you'll truly be able to answer this question is by profiling the two solutions.
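For scale, filtering a few hundred preloaded products client-side is a single array pass; the criteria and field names below are made up:

```javascript
// Sketch: filter ~300 preloaded products on several criteria.
// 'products' and its field names are hypothetical.
function filterProducts(products, criteria) {
  return products.filter(function (p) {
    return (!criteria.category || p.category === criteria.category) &&
           (!criteria.maxPrice || p.price <= criteria.maxPrice) &&
           (!criteria.colour   || p.colour === criteria.colour);
  });
}

// e.g. filterProducts(allProducts, { category: 'piano', maxPrice: 500 });
```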
This is a JavaScript security question: suppose a page finds out the screen resolution of the computer, such as 1024 x 768, and wants to use an AJAX call to log this data into the DB.
Is there a way to actually prevent fake data from being entered into the DB? I think that whatever the HTML or JavaScript does, the user can reverse-engineer the code so that fake numbers get entered into the DB - or is there a way to prevent this from happening entirely (100% secure)?
Update: Or, in a similar situation... if I write a simple JavaScript game, is there a way for the user to send back their score by AJAX and lie about it?
If you start with the assumption that the user you are communicating with is malicious, then no; there is nothing you can do to control what data they pass you. Certainly not with 100% certainty - in the worst case, they can use network tools to rewrite or replace any "correct" content with whatever they want.
If you just want to prevent casual maliciousness, you could obfuscate or encrypt your code and/or data. This will not deter a determined attacker.
If you actually trust the real user, but suspect that others might try to impersonate them, you can use other techniques like a dynamic canary: send the user a random number, and if they return that same number to you, you know that it really came from them. (Or you're being hit by a man-in-the-middle attack, but hey; that's what SSL is for.)
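A minimal sketch of that canary idea with Express; the routes and the use of session storage are assumptions, and it presumes session and JSON body-parsing middleware are already configured:

```javascript
// Sketch: issue a random canary, require it back on submission.
var crypto = require('crypto');

app.get('/api/canary', function (req, res) {
  req.session.canary = crypto.randomBytes(16).toString('hex');
  res.json({ canary: req.session.canary });
});

app.post('/api/report', function (req, res) {
  if (!req.session.canary || req.body.canary !== req.session.canary) {
    return res.status(403).send('canary mismatch');
  }
  delete req.session.canary; // single use
  // ...record req.body.data here...
  res.sendStatus(200);
});
```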
It's not possible to stop users from sending any numbers they like back from JavaScript.
I think the best you could do is do some sort of check on the server-side to make sure the numbers sent back look like a realistic resolution.
I'm not sure why someone would spend the time to spoof those numbers in the first place though.
Yes, you are correct. Since you're using client-side code, you have to hand the user's computer (and thus the user), in one way or another, whatever encryption or obfuscation you're using. There's no way around it.
For the resolution, it would basically be impossible to determine whether it's a valid one. My resolution is usually sent to the server as 5120 x 1600, which seems pretty unrealistic, but that's because my two screens are often reported as one. Beyond that, there is such a huge variety of possible screen resolutions and configurations that you'd probably end up rejecting a lot of valid ones, even if they are few.
For the game score, you could add extra checks that make it harder to fake. Things like sending multiple notices of the score throughout the game and requiring X of them before the final score is considered valid (i.e., the server must receive one between 200-300, one between 400-500, one between 700-800, and then the final score of 1000). With the final score, you could also include some kind of encrypted value that can only be used once, or that carries some data with a CRC on it. Basically, in the end, require receiving more data than just the score, especially for higher scores.
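One way to sketch that "multiple notices plus a tamper-evident value" idea is to have the server issue HMAC-signed checkpoint tokens during play and demand them all back with the final score; every name here is hypothetical:

```javascript
// Sketch: server-side HMAC tokens for score checkpoints (Node.js).
// SECRET stays on the server; the sessionId/checkpoint scheme is made up.
var crypto = require('crypto');
var SECRET = 'keep-this-on-the-server';

// Issued as the player passes each milestone (200-300, 400-500, ...).
function checkpointToken(sessionId, checkpoint) {
  return crypto.createHmac('sha256', SECRET)
    .update(sessionId + ':' + checkpoint)
    .digest('hex');
}

// On final submission the server recomputes and compares every token,
// so a score of 1000 without matching checkpoint tokens is rejected.
function verifyTokens(sessionId, checkpoints, tokens) {
  return checkpoints.every(function (cp, i) {
    return tokens[i] === checkpointToken(sessionId, cp);
  });
}
```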
To attempt an answer by elaborating on comments made by Dok and yourself: there is a clear distinction between manipulating an application to "cheat" it out of something (whether it be an online business to get something cheaper, or an MMORPG to get more experience) and manipulating it in such a way that it renders the interface incorrectly and diminishes the overall user experience for that particular (hacker?) user.
Your time would be better spent focusing on other aspects of your site. I don't recommend that users of my site manipulate the HTML to make it look funny on their machines, but I'm not going to go all out and obfuscate my server output to stop them from hurting themselves. In your case, range-checking against pre-defined safe values in the DB, to ensure the user is viewing at an "allowed" resolution, puts an unnecessary burden on your application and takes time to build.