Algorithm to mine millions of records [closed] - javascript

I have more than a million chat records of data in the format of
chat_message
city
timestamp
Now, we need to check for keywords related to travel, like "travel", "accommodation", "hotels", etc. Let us say we have gathered around 15 such keywords.
The requirement is to mine the chat messages related to travel using these keywords. How?
The solution I can think of: have an array of travel-related keywords, then scan through all the messages for each keyword (using some string-matching algorithm).
I think this solution is pretty brute-force. Any ideas on a more efficient search algorithm, or on how to set up the chat records and/or keywords? A sketch of the brute-force baseline follows below.
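For reference, a minimal sketch of that brute-force baseline, assuming a records array where each record is a plain object with a chat_message string (the records variable and keyword list here are illustrative):

```javascript
// Hypothetical keyword list; in practice there would be ~15 of them.
// Note: these contain no regex metacharacters, so no escaping is needed.
const keywords = ['travel', 'accommodation', 'hotel', 'flight', 'booking'];

// One combined alternation scans each message once,
// instead of once per keyword.
const pattern = new RegExp(`\\b(?:${keywords.join('|')})\\b`, 'i');

// records = [{ chat_message, city, timestamp }, ...]
const travelChats = records.filter(r => pattern.test(r.chat_message));
```

This is still linear in the total text, but it avoids rescanning every message once per keyword.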

Your mileage may vary.
If your host language is JavaScript, I recommend using a full-text search engine such as lunr.js. It requires pre-processing your raw data (tokenization, stemming, and indexing), after which you can search the data much more conveniently.
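For example, a minimal sketch with lunr.js, assuming a records array where each record carries an id field to use as the reference (the field names are illustrative):

```javascript
const lunr = require('lunr');

// Build the index once; lunr handles tokenization and stemming.
// Assumed record shape: { id, chat_message, city, timestamp }
const idx = lunr(function () {
  this.ref('id');
  this.field('chat_message');
  records.forEach(r => this.add(r));
});

// Returns scored matches for any of the travel keywords.
const hits = idx.search('travel accommodation hotel');
```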
Still, your data set is quite large, at least for browsers (since you are using JavaScript). If you are going to implement this on the client side, many details other than the algorithm need to be taken into consideration: memory allocation, data transfer, and so on.
However, if you are on the server side, more mature solutions like Elasticsearch are worth your consideration.

Related

Clean Unique ID's [closed]

Problem summary:
We are developing an App and want to give our users an easily memorable ID so they can share items quickly.
Problem detailed:
We currently uniquely identify items by the automatically generated IDs from MongoDB. Unique IDs in MongoDB are long and hard to remember. We would like to give our users easily memorable IDs to easily share items with other users.
For example, when an element in the App needs to be worked on, telling a colleague to go to ourapp.com/a7wefweg43tr is a pain in the ass. What we would like, in addition to the unique ID that technically identifies the element within the app, is a more human-readable / memorable unique ID for shareability. (Very similar to what Jira offers: AA-0001, etc.)
Are there best practices on how to implement this, or any JS libraries that would do the job?
If you can have a central service that you call out to for generating unique IDs, do that, and implement whatever ID-generation pattern you like in that service. MongoDB IDs are designed to be generated by non-communicating clients in a manner that is as collision-free as possible.
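Purely as an illustration, a minimal sketch of such a service's generator, producing Jira-style keys from an incrementing counter (in production the counter would be an atomic increment in the service's datastore, not an in-memory variable):

```javascript
// Hypothetical in-memory counter; a real service must persist this.
let counter = 0;

function nextId(prefix = 'AA') {
  counter += 1;
  return `${prefix}-${String(counter).padStart(4, '0')}`;
}

nextId(); // 'AA-0001'
nextId(); // 'AA-0002'
```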
Human-readable and technically unique are at opposite ends of the same road... why not let the creator of the ticket type in a human-chosen short tag, check whether it already exists, and offer a slightly extended version if it is already taken? Like account creation in online games.
If no context between item and ID is needed... pull from a dictionary of common words, create a new random triple of these words, check whether it already exists, and repeat if it is already used. Easy to memorize and to tell over the phone, like the military spelling alpha-zulu-tango.
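A minimal sketch of that word-triple idea, assuming a word list and an async exists() check against your datastore (both names are hypothetical):

```javascript
// Hypothetical dictionary; a real one would hold a few thousand common words.
const WORDS = ['alpha', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot', 'tango', 'zulu'];

const randomWord = () => WORDS[Math.floor(Math.random() * WORDS.length)];

// exists(id) is assumed to resolve to true if the ID is already taken.
async function generateMemorableId(exists) {
  for (;;) {
    const id = [randomWord(), randomWord(), randomWord()].join('-'); // e.g. 'echo-zulu-tango'
    if (!(await exists(id))) return id;
  }
}
```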

Good way of searching users? I have an idea but am unsure [closed]

This is all somewhat theoretical, since I am sort of planning ahead here, but I have a generic Parse User class and am trying to implement a search function so one user can search for a friend's user account.
My initial plan was to use the Parse Query contains method to find all the users whose names contain 'xxx'. However, I noticed Parse warns this would be slow with a large database. Ideally, my hope for the app is to have thousands of users. I know that can sound a bit ambitious, but that is what I am thinking.
Is Parse just not the right platform for this?
I had thought about downloading all the user objects and then using local code to filter through them quickly, but that surely couldn't be faster.
Would love to hear your thoughts!
Your idea could work, but it could be slow. You can do some things to make it a bit faster, but contains is always going to be slow for the server. If you can change to 'begins with' instead of contains, that will be faster. Exact matches should be faster again.
If you limit the search results that would help. So, as the user types, don't make any request to the server until 3 characters have been typed, and set the query limit to 5 results. Ideally also add a timer so if the user types a fourth character within 1 second the request isn't made. If the request is made and another character is typed cancel the current request before making a new one.
As more characters are typed you could extend the limit to get more results; a sketch of this pattern follows below.
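A minimal sketch of that pattern with the Parse JS SDK, assuming the search is over the built-in username field (the 3-character threshold, 1-second debounce, and limit of 5 mirror the suggestion above):

```javascript
let latestRequest = 0;
let debounceTimer = null;

function searchUsers(text, onResults) {
  clearTimeout(debounceTimer);
  if (text.length < 3) return; // don't hit the server before 3 characters

  // Debounce: wait for a pause in typing before sending the request.
  debounceTimer = setTimeout(() => {
    const requestId = ++latestRequest;
    const query = new Parse.Query(Parse.User);
    query.startsWith('username', text); // 'begins with' is faster than contains()
    query.limit(5); // keep the result set small

    query.find().then(results => {
      // A newer keystroke supersedes this response; drop stale results.
      if (requestId === latestRequest) onResults(results);
    });
  }, 1000);
}
```

Per the note below, in practice the query itself would live in Cloud Code (which can use the master key), with the client calling that cloud function instead of querying users directly.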
Definitely don't download everything and search locally. Your users should also only really be accessible to cloud code because it can use the master key (your users should have an ACL which denies access to other users).

Performance: Object array vs database for 100s of records? [closed]

I have non-sensitive data in an object array:
data = [ { prop1: value, prop2: value}, ... ]
which contains about 200 objects and will probably grow to the low thousands. Reads greatly outnumber writes, and I am using some JavaScript to sort and extract from the array as needed. While I'd like to automate the sorting and filtering of the data, using a database even for a few thousand records seems like overkill.
At what size would an object array begin to severely impact performance, and when would a database make more sense?
EDIT: My dilemma is: can I safely load the entire array client-side and let the browser do the heavy lifting, saving me the trouble of managing a database for a simple set of data and operations? A rough sketch of the scale involved follows below.
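For a rough sense of scale, a minimal sketch that filters and sorts a few thousand small objects client-side (the data shape is illustrative):

```javascript
// Illustrative data in the same shape as the question.
const data = Array.from({ length: 5000 }, (_, i) => ({
  prop1: i,
  prop2: `item-${i % 50}`,
}));

console.time('filter+sort');
const result = data
  .filter(d => d.prop2 === 'item-7')
  .sort((a, b) => b.prop1 - a.prop1);
console.timeEnd('filter+sort'); // typically a few milliseconds at most on modern hardware
```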
There are hard constraints on the length of an Array, because the JavaScript specification defines an array's length as a 32-bit unsigned integer; this holds regardless of whether the underlying hardware is 32-bit or 64-bit.
According to this answer:
the longest possible array could have 2^32 - 1 = 4,294,967,295 ≈ 4.29 billion elements
That's obviously a lot of records.
The burning question is: will it impact the user if I use large arrays? It clearly depends on the hardware running the code. And if you can't be sure of the hardware (which you can't when serving JavaScript to browsers on the web), you should assume that the technology is the lowest common denominator (in a derogatory sense, not a mathematical sense).
That means researching your demographic, looking into consumer statistics, browser statistics, testing on various architectures/setups, and then still worrying about the question after it's all finished...
Or.
Use a data-store that does one thing and one thing well.
I think, deep down, you know what you need to do. It's an interesting question, but a red-herring in terms of achieving peace-of-mind.

technical considerations for including an extraneous javascript library in a Rails project [closed]

I recently included Handlebars.js in a Rails project, and a coworker balked at the notion. What are the realistic technical considerations when including an extra JavaScript library in a Rails project?
Does the addition of an extraneous library significantly slow down the site delivery and user experience? Is this an example of engineering drama?
Has this been measured?
Adding an additional library can slow down site delivery, potentially by several hundred milliseconds, and it requires some client time to parse and run its onload()-type functionality. From a human standpoint, it also takes a bit of time to get used to the new library. Depending on the library's complexity, usefulness, and time savings, this may be an acceptable tradeoff.
Handlebars is a great tool for templating, but you really need everybody on your team to be on board to use it. It's not very nice to simply introduce a brand-new way of doing things without discussing it first. Handlebars is a big enough change to warrant at least a discussion, if not a vote.
If you were just wanting to put it there to see if it would work in the future, or maybe just convert over a page or two, then you should do that in a separate branch and do a quick prototype and demo for the team.
Depending on whether there is a valid business case and legitimate usefulness, you and the team can decide whether to convert your application to use it.

JavaScript distributed computing project [closed]

I made a website that does absolutely nothing, and I've proven to myself that people like to stay there - I've already logged 11+ hours worth of cumulative time on the page.
My question is whether it would be possible (or practical) to use the website as a distributed computing site.
My first impulse was to find out if there were any JavaScript distributed computing projects already active, so that I could put a piece of code on the page and be done. Unfortunately, all I could find was a big list of websites that thought it might be a cool idea.
I'm thinking that I might want to start with something like integer factorization - in this case, RSA numbers. It would be easy for the server to check whether an answer is correct (simply test whether the modulus is zero), and also easy to implement.
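That server-side check really is tiny. A minimal sketch using BigInt, since RSA numbers are far larger than Number.MAX_SAFE_INTEGER (the values below are toy examples, not actual RSA numbers):

```javascript
// n is the number to factor; candidate is a factor reported by a client.
// Both are BigInt, since RSA numbers exceed Number.MAX_SAFE_INTEGER.
function isNontrivialFactor(n, candidate) {
  return candidate > 1n && candidate < n && n % candidate === 0n;
}

isNontrivialFactor(15n, 3n); // true
isNontrivialFactor(15n, 4n); // false
```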
Is my idea feasible? Is there already a project out there that I can use?
Take a look at http://www.igvita.com/2009/03/03/collaborative-map-reduce-in-the-browser/ and http://www.igvita.com/2009/03/07/collaborative-swarm-computing-notes/
