Getting a unique parameter off of client's device - javascript

I need to create completely random numbers using a seed in JavaScript. I'm not using the built-in Math.random(), but rather something else that can take a seed and generate a random number based on that.
This solution is supposed to serve a situation in which two users log in at the same time to a website (it happens A LOT, and I'm getting a lot of users with identical IDs because of it). Math.random() isn't working for me, and I can't use timestamps because these also don't provide an accurate number (they're not being sampled every MS). I also don't want to use any ajax call in order to get an IP or something similar.
Is there anything anyone can think of that's either unique, or might be rare enough to use to create a good seed?
**EDIT:** This is NOT a duplicate of the GUID generation question, because that one is still using Math.random(). I can't use that function anywhere in my code. I sometimes have thousands of hits at more or less the exact same moment, and that's what screws up the random numbers. It's also the reason why I need to find some attribute I can use as a seed.
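For the "something else that can take a seed" part, one option is a small seedable generator such as mulberry32, a widely circulated 32-bit PRNG. The sketch below is only an illustration: the seed value is an arbitrary example, and a seeded generator is only as unique as its seed, so two clients given the same seed still produce identical output.

```javascript
// mulberry32: a small seedable 32-bit PRNG (illustrative sketch).
// Returns a function that yields floats in [0, 1), like Math.random().
function mulberry32(seed) {
  let a = seed | 0;
  return function next() {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same seed, same sequence; the seed (12345) is an arbitrary example.
const randA = mulberry32(12345);
const randB = mulberry32(12345);
const firstA = [randA(), randA(), randA()];
const firstB = [randB(), randB(), randB()];
console.log(firstA, firstB);
```

In browsers that support the Web Crypto API, crypto.getRandomValues(new Uint32Array(1))[0] is one way to obtain a per-client seed without an ajax call; whether that is acceptable depends on which browsers you need to support.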

Related

Nodejs can i save an array in my backend, and use it when i need it?

I have a small website; when you visit it, it shows you a quote.
Until today, whenever a user visited my website, a random quote was fetched directly from the database (a connection was made to the database and a quote was returned from it), but that sometimes took 1 or 2 seconds. Today I tried something different: when my Node.js application starts, I load every quote from the database and store them in an array. When someone visits my website, I pick a random quote from the array. It is much faster than the first approach, and I made some changes so that when I add a new quote to the database, the array is updated automatically.
So here is my question, is it bad to store data inside an array and serve users with it?
The answer depends on your intentions. If the dataset of quotes is large, it is a very bad idea; if you are talking about a few items, it's acceptable. However, if you are building a scalable application, it's not recommended, because every node will keep its own copy of the dataset.
If you want very fast quote storage, I would recommend Redis (an in-memory key-value store). It shares state across nodes: all of your nodes connect to Redis and the quotes are kept there, so you don't need to keep copies and it stays fast. Also, if you enable the disk persistence option, you can use Redis as your primary quote storage. After all, you won't update these quotes often and they won't be searched with complex queries.
Alternatively, if your database is MySQL, PostgreSQL or MongoDB, you can enable its in-memory storage option so that you don't need to keep the data in your array but can read it directly from the database, which is much faster and still queryable.
There's the old joke: The two hard things in software engineering are naming things, caching things, and off-by-one errors.
You're caching something: your array of strings. Then you select one at random from the array each time you need one.
What is right? You get your text string from memory, and eliminate the time-delay involved in getting it from the database. It's a good optimization.
What can go wrong?
Somebody can add or remove strings from your database, which makes your cache stale.
You can have so many text strings you blow out your nodejs RAM. This seems unlikely; it's hard to imagine a list of quotes that big. The Hebrew Bible, the New Testament, and the Qur'an together comprise less than a million words. You probably won't have more text in your quotable-quotes than that. 10-20 megabytes of RAM is nothing these days.
So, what about your stale cache in RAM? What to do?
You could ignore the problem. Who cares if the cache is stale?
You could reread the cache every so often.
Your use of RAM for this is a good optimization. But, it adds a cache to your application. A cache adds complexity, and the potential for a bug. Is the optimization worth the trouble? Only you can guess the answer to that question.
And, it's MUCH MUCH better than doing SELECT ... ORDER BY RAND() LIMIT 1; every time you need something random. That is a notorious query-performance antipattern.
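The array-as-cache idea, including the "reread the cache every so often" option, might be sketched like this. All names are hypothetical, and the quotes passed in stand in for rows loaded from the database at startup.

```javascript
// In-memory quote cache (sketch). The array passed in stands in for the
// rows loaded from the database when the application starts.
function createQuoteCache(initialQuotes) {
  let quotes = initialQuotes.slice();
  return {
    // Swap in a freshly loaded list, e.g. from a periodic reload.
    replace(freshQuotes) {
      quotes = freshQuotes.slice();
    },
    // Pick one quote at random; rng is injectable to make testing easier.
    random(rng = Math.random) {
      if (quotes.length === 0) return undefined;
      return quotes[Math.floor(rng() * quotes.length)];
    },
    size() {
      return quotes.length;
    },
  };
}

const cache = createQuoteCache(["first quote", "second quote"]);
console.log(cache.random());

// Rereading "every so often" could look like this, where loadQuotesFromDb
// is a hypothetical async function that queries your database:
// setInterval(async () => cache.replace(await loadQuotesFromDb()), 60000);
```

Replacing the whole array in one assignment keeps readers from ever seeing a half-updated list, which is one of the simpler ways to keep the added cache complexity small.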

What is the guarantee of uniqueness of shortid?

I'm trying to include a field in Mongodb document called myId. I am using shortid. I am wondering, in case of big data, like millions of documents in a collections:
What's the guarantee that the shortid will be always unique and never ever be repeated for any other document?
What keeps a track of the generated ids?
What are the chances of the id been repeated?
What's the guarantee that the shortid will be always unique and never ever be repeated for any other document
To cut a long story short: these shortids are pretty much just "hashed" timestamps. Not Unix timestamps, their own breed, but nonetheless not much more than timestamps.
All that bling with Random is pretty much that, just bling.
As long as all these shortids are generated on the same computer (a single thread) with the same seed, collisions are impossible.
What keeps a track of the generated ids?
A counter that gets incremented when you request ids too fast, so that the same timestamp is hit. This counter is reset to 0 as soon as a new timestamp is reached.
There's nothing significant in there that is really random.
What are the chances of the id been repeated?
During usage, little to nonexistent.
As far as I can tell, the only two things that may lead to a collision are
changing the seed for the prng (this leads to a new alphabet, so newer dates may be encoded to ids that have already been generated with a different seed; not very likely, but possible)
generating ids on multiple threads/machines because the counter is not synced.
Summary: I'd nag about pretty much everything in that code, but even as it is, it does the job reliably. And I've told you the limitations.
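The timestamp-plus-counter scheme described above can be sketched roughly as follows. This is an illustration of the general idea, not shortid's actual code; the function names and the base-36 encoding are assumptions.

```javascript
// Timestamp-plus-counter IDs (sketch of the general idea, not shortid's code).
function createIdGenerator(now = Date.now) {
  let lastTimestamp = -1;
  let counter = 0;
  return function nextId() {
    const timestamp = now();
    if (timestamp === lastTimestamp) {
      counter += 1; // same tick as the previous id: bump the counter
    } else {
      lastTimestamp = timestamp; // new tick: reset the counter
      counter = 0;
    }
    return timestamp.toString(36) + "-" + counter.toString(36);
  };
}

const nextId = createIdGenerator();
console.log(nextId(), nextId(), nextId());
```

Two independent generators that see the same timestamp produce the same id, which is the multi-thread/multi-machine limitation mentioned above.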
Shortid generates a random 64 bit id. This is done in multiple steps, but the base of it is this pseudo-random function:
function getNextValue() {
    seed = (seed * 9301 + 49297) % 233280;
    return seed/(233280.0);
}
To generate the same id twice, this function has to return the same exact values in the same exact order in the same exact second. This is very rare, but can happen if they reset the timer (based on the comments, they do, but it's still rare).
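Because the state is taken modulo 233280, this generator can only ever be in one of 233280 states before it repeats. The constants satisfy the usual full-period conditions for a linear congruential generator, so it should walk through all of them; the sketch below measures that directly. The helper names are hypothetical.

```javascript
// The linear congruential step quoted above, written as a pure function.
function nextSeed(seed) {
  return (seed * 9301 + 49297) % 233280;
}

// Count how many steps it takes for the state to return to its start.
function cycleLength(startSeed) {
  let seed = nextSeed(startSeed);
  let steps = 1;
  while (seed !== startSeed) {
    seed = nextSeed(seed);
    steps += 1;
  }
  return steps;
}

const period = cycleLength(1);
console.log(period);
```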

Generating unique base-64 ID's in javascript

I'm making an application in which I need to generate Unique IDs.
When generating IDs, is the best way to avoid clashes simply brute-force generate-then-check, or is there a way to guarantee unique generation?
I'm sure the brute force method would do well for a while however I have a feeling companies like Google aren't using this method.
The other thing is how to actually generate them in node.js.
I thought of generating ints from '0 - 63' and then mapping them to base-64 like this:
var map = {
0:0, 1:1 ... 16:"Q" ... 63:"/"
}; /* I used object so you can see the indexes,
this would be an array - or even a string? */
for (var i = IDLENGTH, id=""; i--;) id+=map[~~(Math.random()*64)];
However this seems inefficient, especially having the map in the first place.
I saw this piece of code in another SO post
console.log(new Buffer("Hello World").toString('base64'));
> SGVsbG8gV29ybGQ=
Which seems to make more sense, but this doesn't seem to fit what I need, isn't this conversion of char-sets?
This is quite an open question, but you probably want to generate a UUID. Basically it's a random 128 bit number. If done correctly, the probability of a duplicate is extremely low. I believe for most applications it's not strictly necessary to check, but you might still want to do it.
Use a well tested library instead of implementing this yourself; I recommend node-uuid. Generating good random numbers is very similar to crypto operations: it's not trivial to get right.
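As an alternative sketch that uses only Node's built-in crypto module rather than node-uuid: 128 random bits, the same size as a UUID, encoded as URL-safe base64. It assumes a reasonably recent Node.js version (the "base64url" encoding is not available in old releases), and the function name is hypothetical.

```javascript
const { randomBytes } = require("node:crypto");

// 16 random bytes = 128 bits, the same size as a UUID, encoded as
// URL-safe base64 (22 characters, no padding).
function generateId(byteLength = 16) {
  return randomBytes(byteLength).toString("base64url");
}

console.log(generateId());
```

Recent Node.js versions also expose crypto.randomUUID() if the standard UUID text format is preferred.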

How can dates and random numbers be used for evil in Javascript?

The ADsafe subset of Javascript prohibits the use of certain things that are not safe for guest code to have access to, such as eval, window, this, with, and so on.
For some reason, it also prohibits the Date object and Math.random:
Date and Math.random
Access to these sources of non-determinism is restricted in order to make it easier to determine how widgets behave.
I still don't understand how using Date or Math.random will accommodate malevolence.
Can you come up with a code example where using either Date or Math.random is necessary to do something evil?
According to a slideshow posted by Douglas Crockford:
ADsafe does not allow access to Date or random
This is to allow human evaluation of ad content with confidence that
behavior will not change in the future. This is for ad quality and
contractual compliance, not for security.
I don't think anyone would consider them evil per se. However the crucial part of that quote is:
easier to determine how widgets behave
Obviously Math.random() introduces indeterminism so you can never be sure how the code would behave upon each run.
What is not obvious is that Date brings similar indeterminism. If your code is somehow dependent on the current date, it will (again obviously) work differently in some conditions.
I guess it's not surprising that these two methods/objects are not pure functions; in other words, each run may return a different result irrespective of the arguments.
In general there are some ways to fight this indeterminism: storing the initial random seed to reproduce the exact same series of random numbers (not possible with JavaScript's built-in Math.random), and supplying client code with a sort of TimeProvider abstraction rather than letting it create Dates everywhere.
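A TimeProvider-style abstraction might look like the sketch below: the code under review receives its clock as a parameter, so a reviewer or test can pin the date. All names here are hypothetical.

```javascript
// Code under review takes its clock as a parameter instead of calling
// new Date() itself. All names here are hypothetical.
function isWeekend(clock) {
  const day = clock.now().getDay(); // 0 = Sunday, 6 = Saturday
  return day === 0 || day === 6;
}

// Production code passes the real clock; reviews and tests pass a fixed one.
const systemClock = { now: () => new Date() };
const fixedClock = (date) => ({ now: () => date });

console.log(isWeekend(systemClock));
console.log(isWeekend(fixedClock(new Date(2024, 0, 6)))); // a Saturday
```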
According to their website, they don't include Date or Math.random to make it easier to determine how third party code will behave. The problem here is Math.random (using Date you can make a pseudo-random number as well): they want to know how third party code will behave and can't know that if the third party code is allowed access to random numbers.
By themselves, Date and Math.random shouldn't pose security threats.
At a minimum they allow you to write loops that can not be shown to be non-terminating, but may run for a very long time.
The quote you exhibit seems to suggest that a certain amount of static analysis is being done (or is at least contemplated), and these features make it much harder. Mind you, these restrictions aren't enough to actually prevent you from writing difficult-to-statically-analyze code.
I agree with you that it's a strange limitation.
The justification that using date or random would make it difficult to predict widget behavior is of course nonsense. For example, implement a simple counter, compute the sha-1 of the current number and then act depending on the result. I don't think it's any easier to predict what the widget will do in the long term compared to a random or date... short of running it forever.
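The counter-plus-sha-1 example can be sketched as below, here using Node's built-in crypto module for the hash. The two action strings and the function names are made up for illustration.

```javascript
const { createHash } = require("node:crypto");

// A widget with no Date and no Math.random whose behavior is still hard to
// predict by reading it: it branches on the SHA-1 of a plain counter.
function createWidget() {
  let counter = 0;
  return function step() {
    counter += 1;
    const digest = createHash("sha1").update(String(counter)).digest();
    return digest[0] % 2 === 0 ? "show banner" : "stay hidden";
  };
}

const step = createWidget();
console.log([step(), step(), step(), step()]);
```

The sequence is fully deterministic (two widgets always agree), yet the only practical way to know what step N does is to run it.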
The history of math has shown that trying to classify functions on how they compute their value is a path that leads nowhere... the only sensible solution is classifying them depending on the actual results (black box approach).

Network-efficient difference between two strings in Javascript

I have a web application where a client side editor is editing a really really large text which is known on the server side.
The client can make any kind of modifications to this text.
What is the most network-efficient way to transmit the result difference in a way that the server understands? Also, since this will happen on client side (Javascript), I would also like it to be 'fast' (or at least not noticeably slow)
Some scenarios:
User modifies ONE character
User modifies several sentences in random positions
User erases everything and results in a blank text.
I cannot use diff-like syntax since it's not network efficient: it works on lines, so examples 1 and 3 will produce horrible differences (especially the last one, where the result will be larger than the old text itself).
Does anyone have experience in this matter? The user operates on a really large set of data, around 3-5MB of text, and uploading the whole "new" content is a big no-no.
To be clear, I'm looking for a "protocol" of transfer, string comparison is not the issue.
I'm not very familiar with this topic but I can point you to an open source (Apache License 2.0) project which may be very useful.
It is a Diff, Match and Patch library written in several languages, including JavaScript, from a Google engineer and it is used in several online collaborative editing services.
Here is a list of resources:
The Diff, Match and Patch project
The MobWrite project (Editor implementation based on the above project)
"Differential Synchronization" (A Google Tech Talk by the engineer)
A simple approach, assuming that you know the copy on the server isn't going to change, would just be to send a list of edits (deletions and additions), with the deletions represented as a start and end index, and the additions represented as a start index and the text to insert.
If you have more than a simple diff algorithm to work with (I'm not sure exactly what you mean by "string comparison is not the issue"), you could also detect moved or copied chunks of text, and send those as the start and end index of the moved or copied piece of text, as well as the destination to insert it.
Note that you'll need to make sure to keep track of whether your indices refer to the original document, or the document as edited so far. An easy approach to avoid this problem is to always perform the edits from the end of the document towards the beginning; then earlier edits won't affect the offsets specified by later edits.
For an example of an approach like this, see the ed format that diff -e outputs. This is basically input that could be fed into the ed line-oriented text editor. If you want the absolute smallest diffs to send across you may want to do character based indexing rather than line based indexing, but the same basic approach could work.
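Applying a list of character-indexed edits from the end of the document towards the beginning, as described above, could look like this sketch. The edit shape ({ start, end, text }) is an assumption, and the edits are assumed not to overlap.

```javascript
// Each edit replaces original[start, end) with text; offsets always refer
// to the ORIGINAL document. Edits are assumed not to overlap.
function applyEdits(original, edits) {
  // Work from the end of the document backwards so that applying one edit
  // never shifts the offsets of the edits still to be applied.
  const ordered = edits.slice().sort((a, b) => b.start - a.start);
  let result = original;
  for (const { start, end, text } of ordered) {
    result = result.slice(0, start) + text + result.slice(end);
  }
  return result;
}

const edited = applyEdits("hello world", [
  { start: 0, end: 5, text: "HELLO" }, // replace "hello"
  { start: 11, end: 11, text: "!" },   // pure insertion at the end
]);
console.log(edited); // "HELLO world!"
```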
Any edits the user's performing can be efficiently broken down into: delete from X for length Y; insert at X text "whatever". X and Y are offsets in characters from the start of the text; Y is a number of characters; "whatever" is any string of characters. You say you need no help computing the diff, but an example is here, except it's richer in its output than you need, but does identify "removals and insertions", so, just change the output part.
The exact format in which you send the data to the server can be tuned, but I don't think there's much mileage in doing that -- pending measurement, I'd start by sending the commands as D for delete or I for insert, the numbers in decimal, the inserted string in quoted form. Once you have some statistics on actual transfers being performed, you can see how much overhead is in the numbers (decimal vs binary) and quotes, but I suspect that may not be all that meaningful (if it proves to be, there are all sorts of things you can try, such as giving offsets from the latest point of insertion or deletion, rather than always from the start, to make things faster).
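One possible reading of that wire format, as a sketch: one command per line, "D offset length" or "I offset quoted-string", with JSON string syntax used for the quoting. The exact layout and all function names are assumptions, not a defined protocol.

```javascript
// Encode a command object: { op: "D", at, length } or { op: "I", at, text }.
function encodeCommand(cmd) {
  return cmd.op === "D"
    ? `D ${cmd.at} ${cmd.length}`
    : `I ${cmd.at} ${JSON.stringify(cmd.text)}`;
}

// Decode one line back into a command object.
function decodeCommand(line) {
  const op = line[0];
  const rest = line.slice(2);
  const space = rest.indexOf(" ");
  const at = Number(rest.slice(0, space));
  const payload = rest.slice(space + 1);
  return op === "D"
    ? { op, at, length: Number(payload) }
    : { op, at, text: JSON.parse(payload) };
}

// Server side: apply commands in order; offsets refer to the text as
// edited so far.
function applyCommands(text, lines) {
  for (const line of lines) {
    const cmd = decodeCommand(line);
    text = cmd.op === "D"
      ? text.slice(0, cmd.at) + text.slice(cmd.at + cmd.length)
      : text.slice(0, cmd.at) + cmd.text + text.slice(cmd.at);
  }
  return text;
}

const wire = [
  encodeCommand({ op: "D", at: 0, length: 5 }),
  encodeCommand({ op: "I", at: 0, text: 'say "hi"' }),
];
console.log(wire);
console.log(applyCommands("hello world", wire));
```

JSON quoting also escapes newlines, so each command stays on a single line even when the inserted text spans several.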
You can sample what the user is doing every few seconds, and just send the incremental changes over those last few seconds (if any) -- this way, each packet you're sending will be small, and if the net connection or the user's computer/browser crashes, the user won't have lost much work.
You could just send changes every 500ms, so, whatever changes were made in the last 500ms would be sent, but you only send data when there was a change.
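The "send only when something changed" loop might be sketched as follows. To keep it short, the patch is computed by trimming the common prefix and suffix of the old and new text, which is a swapped-in simplification rather than a full diff; the names and the patch shape are hypothetical.

```javascript
// Describe the change between two strings as a single replaced span by
// trimming their common prefix and common suffix.
function singleSpanDiff(oldText, newText) {
  let start = 0;
  const maxStart = Math.min(oldText.length, newText.length);
  while (start < maxStart && oldText[start] === newText[start]) start++;
  let oldEnd = oldText.length;
  let newEnd = newText.length;
  while (oldEnd > start && newEnd > start &&
         oldText[oldEnd - 1] === newText[newEnd - 1]) {
    oldEnd--;
    newEnd--;
  }
  return { start, deleteCount: oldEnd - start, insert: newText.slice(start, newEnd) };
}

// Returns a tick() function meant to be called on a timer.
function createChangeSender(getText, send) {
  let lastSent = getText();
  return function tick() {
    const current = getText();
    if (current === lastSent) return false; // nothing changed: send nothing
    send(singleSpanDiff(lastSent, current));
    lastSent = current;
    return true;
  };
}

// Demo with an in-memory "editor"; a real page would call
// setInterval(tick, 500) and send() would make a network request.
let editorText = "hello world";
const sent = [];
const tick = createChangeSender(() => editorText, (patch) => sent.push(patch));
tick(); // no change, nothing sent
editorText = "hello brave world";
tick(); // one small patch
console.log(sent);
```

A single span is tiny for a one-character edit or a full erase, but edits scattered across the document within one interval collapse into one large span, which is where a real diff library would do better.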
With this approach you could send the position of each changed word along with the entire word, with the position measured from the start of the text.
It won't be several sentences' worth, though several words may be involved; if you send them in order of change, the result should be consistent.
Because there are so many ways to do edits--even within short periods of time like 500ms--including dragging and dropping, or cutting and pasting, large sections of text around within the document or from outside it--I don't know if there's going to be something that will cover all scenarios really well. This is certainly a non-answer to your question at face value, but I would consider carefully the trouble of developing and maintaining something like this compared to changing the interface to restrict the text size and breaking existing texts into smaller pieces.
Maybe that's not possible in your situation, but if it is, I would guess it would be much less trouble in the end to dodge the issue in this way and just send full documents after an edit.
