I have a large DB with > 20,000 rows. I have two tables, songs and albums:
table songs contains songid, albumid, songname; table albums contains albumid, albumname
Currently, when a user searches for a song, I give results instantly as soon as they start typing, just like Google Instant.
What I am doing now: every time the user types, I send the query string to my backend PHP file, and there I run a query like this against my DB:
SELECT * FROM songs JOIN albums ON songs.albumid = albums.albumid WHERE songs.songname LIKE '%{$query_string}%';
But hitting the DB on every keystroke is very inefficient, and it won't scale as my DB grows every day.
I want the same feature, but faster, more efficient, and scalable.
Also, I don't want exact pattern matching only. For example:
If the user types "Rihana" instead of "Rihanna", it should still be able to give the results related to Rihanna.
Thanks.
You should index the songs.songname column on its first n characters, say 6, to get better performance from the query. Note, though, that a prefix index only helps when the pattern has no leading wildcard (LIKE 'term%'); MySQL cannot use a B-tree index to satisfy LIKE '%term%'.
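A minimal sketch, assuming MySQL (the index name is made up):

ALTER TABLE songs ADD INDEX idx_songname_prefix (songname(6));

-- The prefix index can serve this:
SELECT songid, songname FROM songs WHERE songname LIKE 'rihan%';

-- ...but not this, because of the leading wildcard:
SELECT songid, songname FROM songs WHERE songname LIKE '%rihan%';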
Trigger the query search only after n characters have been typed, say 3 (jQuery UI's autocomplete has a minLength option for this, for example).
You may also consider an in-memory DB if performance is truly crucial (it sounds like it is) and the amount of data will not consume too much resident memory.
Google, btw, does not use a legacy RDBMS to perform its absurdly fast searches (continually amazed...)
First of all, you should find MySQL's FULLTEXT search support far faster than your current approach.
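For example, assuming MySQL 5.6+ (or a MyISAM table on older versions), a sketch against the schema in the question:

ALTER TABLE songs ADD FULLTEXT INDEX ft_songname (songname);

SELECT s.songname, a.albumname
FROM songs s
JOIN albums a ON a.albumid = s.albumid
WHERE MATCH(s.songname) AGAINST('rihanna' IN NATURAL LANGUAGE MODE);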
Given the kind of speed you'd like from this solution and the need to match misspelled words, I suspect you'd be better off investigating a more fully featured full-text search engine. These include:
Sphinx Search
Solr
ElasticSearch
Try full-text search.
Note that before MySQL 5.6, FULLTEXT indexing requires MyISAM tables; InnoDB gained FULLTEXT support in 5.6.
If you need ACID and full text search, use PostgreSQL.
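A sketch of what that looks like in PostgreSQL, reusing the songs/albums schema from the question:

SELECT s.songname, a.albumname
FROM songs s
JOIN albums a ON a.albumid = s.albumid
WHERE to_tsvector('english', s.songname) @@ plainto_tsquery('english', 'rihanna');

And for the "Rihana" -> "Rihanna" misspelling case from the question, the pg_trgm extension adds trigram similarity:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

SELECT songname
FROM songs
WHERE similarity(songname, 'rihana') > 0.3   -- 0.3 is pg_trgm's default threshold
ORDER BY similarity(songname, 'rihana') DESC;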
Related
I made a web page for selling items online. The website has a lot of products and will probably have several thousand in the near future. It has a search bar, and I want to create a search results page, but I am not sure what the best way of doing this is. I thought about using JavaScript to loop through the list of all the products until it finds a match, but that process is probably too slow.

My question is: what is the best way to store a large list of items, and what is the best way to find matches in that list for a search query? I know that many people use SQL databases for storing lists, but is that method any better than simply storing everything in a JavaScript array, and why? Also, how do I find a match in the list? Can I use JavaScript, or is it necessary (or better) to use a server-side language like PHP?
I'm trying to come up with the most practical autosuggest search bar, like the one on Instagram (it can match the handles of strangers as you type). Say I have a million users, each with a unique handle, and I want to suggest matches as the user types on the front end. For example, if the input field currently contains 'dav', I want to match any handle in the entire MongoDB collection against that pattern. I know Mongo supports regex in find(), but how do I cap the number of results for efficiency? I can't find an answer on this for Mongoose. Say the user searches for 'dave' and some 3k handles match; I want the search to stop at the first 10 and return those, so the user has to be more specific with the handle. Thank you.
You can do that using limit() (and additionally skip() if you want to use pagination)
db.users.find(...).limit(10)
This will return only the first 10 results; chain .skip(n) in front of .limit() if you want to page past earlier ones.
When looking at products like DnD Insider and the Kindle app, users can quickly search for matching text strings in a large structure of text data. If I were to make a web application that allowed users to quickly search a "rulebook" (or similar text) for a matching entry and pull up the data to read, how should I organize the data?
I don't think it's a good idea to put all the data into memory. But if I stored it in some kind of database, what would be a good way to search the database and retrieve the appropriate matching entry?
So far, I believe I'm going to use the Boyer-Moore algorithm to do the actual searching. I can put the various sections of rule text into different database entries. The user search will prioritize section titles over section body text. Since the text will be static and not user-editable, perhaps an array storing every word would work?
Typically some kind of inverted index is used for this purpose: https://en.wikipedia.org/wiki/Inverted_index
Basically this is a map from each word to a list of the places in which it appears. Each "place" could be a (document ID, occurrence count), or something more precise if you want to support phrase searching or if you want to give more weight to matches in titles, etc.
Search results are usually ranked with some variant of tf-idf: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
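A minimal relational sketch of the idea (the table and column names are hypothetical, and ranking here is raw term frequency rather than full tf-idf):

-- One row per (word, document) pair.
CREATE TABLE inverted_index (
    word        VARCHAR(64) NOT NULL,
    doc_id      INT         NOT NULL,
    occurrences INT         NOT NULL,   -- term frequency in that document
    PRIMARY KEY (word, doc_id)
);

-- Find documents containing 'initiative', most frequent first.
SELECT doc_id, occurrences
FROM inverted_index
WHERE word = 'initiative'
ORDER BY occurrences DESC
LIMIT 20;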
I'm looking for suggestions on how to handle the following use case with the Python Django framework; I'm also open to using JavaScript libraries/AJAX.
I'm working with a pre-existing table/model called revenue_code with over 600 million rows of data.
The user needs to search three fields within one search (code, description, room) and be able to select multiple search results, similar to Kendo UI's multi-select control. I first started by combining the codes in django-filter as shown below, but my application became unresponsive; after waiting 10-15 minutes I was able to view the search results but couldn't select anything.
https://simpleisbetterthancomplex.com/tutorial/2016/11/28/how-to-filter-querysets-dynamically.html
I've also tried Kendo UI controls, Select2, and Chosen, because I need the user to be able to select as many rev codes as they need, upwards of 10-20, but all of them gave the same unresponsive page when they attempted to load the data into the control/multi-select.
Essentially what I'm looking for is something like the link below, which allows the user to make multiple selections and handles a massive amount of data without becoming unresponsive. Ideally I'd like to be able to run my search without loading all the data up front.
https://petercuret.com/add-ajax-to-django-without-writing-javascript/
Is the Django framework meant to handle this type of volume? Would it be better to export this data into a file and read the file? I'm not looking for code, just some pointers on how to handle this use case.
What is the basic mechanism of "searching 600 million rows"? Basically, a database builds an index before search time, general enough to serve different types of query; at search time it searches the index, which is much smaller (small enough to put into memory) and faster. But no matter what, "searching" by its nature has no "pagination" concept, and if 600 million records cannot fit into memory at the same time, parts of them must be repeatedly swapped in and out; the more parts, the slower the operation. All of this is hidden behind the algorithms in databases like MySQL.
There are very compact representations, such as a bitmap index, which can let you search data like male/female very fast, or any data where one bit per piece of information is enough.
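For example (Oracle syntax; the table and column are hypothetical):

CREATE BITMAP INDEX idx_users_gender ON users (gender);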
So whether it's Django or not does not really matter. What matters is the tuning of the database, the design of the tables to facilitate the queries (the types of indices), and the total amount of memory at the server end to keep the data in memory.
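As a sketch against the revenue_code table from the question (the index names are made up), separate B-tree indexes let equality or prefix filters on each of the three searched fields use an index:

CREATE INDEX idx_rev_code        ON revenue_code (code);
CREATE INDEX idx_rev_description ON revenue_code (description);
CREATE INDEX idx_rev_room        ON revenue_code (room);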
Check this out:
https://dba.stackexchange.com/questions/20335/can-mysql-reasonably-perform-queries-on-billions-of-rows
https://serverfault.com/questions/168247/mysql-working-with-192-trillion-records-yes-192-trillion
How many rows are 'too many' for a MySQL table?
You can't load all the data into your page at once. 600 million records is too many.
Since you mentioned select2, have a look at their example with pagination.
The trick is to limit your SQL results to maybe 100 or so at a time. When the user scrolls to the bottom of the list, it can automatically load in more.
Send the search query to the server, and do the filtering in SQL (or NoSQL or whatever you use). Database engines are built for that. Don't try filtering/sorting in JS with that many records.
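A sketch of the kind of paged query the multi-select's AJAX handler could run, using the revenue_code fields from the question (the search term is hypothetical, and this assumes a database with LIMIT/OFFSET support):

SELECT code, description, room
FROM revenue_code
WHERE description LIKE 'cardio%'   -- the user's typed search term
ORDER BY code
LIMIT 100 OFFSET 0;                -- next page: OFFSET 100, then 200, ...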
Are there any best practices for returning large lists of orders to users?
Let me try to outline the problem we are trying to solve. We have a list of customers with 1-5,000+ orders associated with each. We pull these orders directly from the database and present them to the user in a paginated grid. The view is a very simple "select columns from orders", which worked fine when we were first starting, but as we grow it's causing performance/contention problems. There seem to be a million and one ways to skin this cat (return only a page worth of data, return only the last 6 months of data, etc.), but as I said, I'm just wondering if there are any resources out there that provide a little more hand-holding on how to solve this problem.
We use SQL Server as our transactional database and select the data out in XML format. We then use a mixture of XSLT and JavaScript to create our grid. We aren't married to the presentation solution, but we are married to the database solution.
My experience.
Always set reasonable default values in the UI for the user. You don't want them clicking "Retrieve" and getting everything.
Set a limit on the number of records that can be returned.
Only return from the database the records you are going to display.
If forward/backward consistency is important, store the entire result set from the query in a temp table and return just the page you need to display. When paging up/down, retrieve the next set from the temp table (see the sketch after this list).
Make sure your indexes are covering your queries.
Use different queries for different purposes. Think "Open Orders" vs. "Closed Orders"; these might perform much better as separate queries than as one generic query.
Set parameter defaults in the stored procedures. Protect your query from a UI that is not setting reasonable limits.
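A minimal T-SQL sketch of the temp-table approach above (the table and column names are hypothetical; @CustomerId would be a stored-procedure parameter):

-- Materialize the full result set once, numbered for paging.
SELECT o.*, ROW_NUMBER() OVER (ORDER BY o.OrderDate DESC) AS rn
INTO #order_results
FROM Orders o
WHERE o.CustomerId = @CustomerId;

-- Then return one page at a time from the temp table.
SELECT * FROM #order_results WHERE rn BETWEEN 1 AND 50;    -- page 1
SELECT * FROM #order_results WHERE rn BETWEEN 51 AND 100;  -- page 2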
I wish we did all these things.
I'd recommend doing some profiling to find the actual bottlenecks. Perhaps you have access to Visual Studio Profiler? http://msdn.microsoft.com/en-us/magazine/cc337887.aspx There are plenty of good profilers out there.
Otherwise, my first stop would be pagination to bring back fewer records from the DB, which is easier on the connection and the memory footprint. Take a look at this (I'm assuming you're on SQL Server >= 2005):
http://www.15seconds.com/issue/070628.htm
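The standard SQL Server 2005 paging pattern looks roughly like this (the column names are hypothetical):

WITH numbered AS (
    SELECT o.*, ROW_NUMBER() OVER (ORDER BY o.OrderDate DESC) AS rn
    FROM Orders o
)
SELECT * FROM numbered WHERE rn BETWEEN 26 AND 50;  -- rows 26-50, i.e. page 2 at 25 per page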
I"m not sure from the question exactly what UI problem you are trying to solve.
If it's that the customer can't work with a table that is just one big amorphous blob, then let him sort on the fields: order date, order number, your SKU number, his SKU number maybe, and I guess others,too. He might find it handy to do a multi-column stable sort, too.
If it's that the table headers scroll up and disappears when he scrolls down through his orders, that's more difficult. Read the SO discussion to see if the method there gives a solution you can use.
There is also a JQuery mechanism for keeping the header within the viewport.
HTH
EDIT: plus I'll second @Iain's answer: do some profiling.
Another EDIT: @Scott Bruns's answer reminded me that when we started designing the UI, the biggest issue by far was limiting the number of records the user had to look at. So yes, I agree with Scott that you should give the user some way to see only a limited number of records right from the start; that is, before he ever sees a table, he has told you a lot about what he wants to see.
Stupid question, but have you asked the users of your application for input on what records they would like to see initially?