Javascript D3 Histogram: thresholds producing wrong number of bins

I'm in the process of creating a histogram JS script using D3, and it all seems to be working correctly... except for the number of bins.
Following is the relevant part of my code:
//Define the scales for the x and y attributes
var x = d3.scaleBand()
    .range([0, width])
    .padding(configProperties.barPadding);
var y = d3.scaleLinear()
    .range([height, 0]);
//Create the bins
var bins = d3.histogram()
    .domain(d3.extent(data))
    .thresholds(configProperties.binsCount)
    (data);
console.log("number of bins: " + bins.length); //9
console.log("intended number of bins: " + configProperties.binsCount); //10
If I set configProperties.binsCount to 9, bins.length is still 9.
If I set configProperties.binsCount to 14, bins.length is still 9.
If I set binsCount to 15 or higher, however... bins.length outputs 23.
My understanding of how histogram.thresholds works based on the documentation is that if I give it a value, it will divide the data into that many + 1 equal segments (i.e. that many bins). However, it doesn't seem to be doing that at all. All of the example code that I could find seemed to indicate that I am using it correctly, but I can't get the number of bins that I need.
I've also tried using d3.ticks as a thresholds argument, but I encounter the same issue.
Is there something I'm missing? Does it have to do with my domain? Thanks in advance.

You are passing a count (that is, a simple number) to the thresholds function, not an array.
What you're seeing is the expected behaviour when you pass a number. According to the same docs:
If a count is specified instead of an array of thresholds, then the domain will be uniformly divided into approximately count bins;
Let's see it in this demo:
var data = d3.range(100);
const histogram = d3.histogram()
    .value(d => d)
    .thresholds(5);
var bins = histogram(data);
console.log("The number of bins is " + bins.length);
<script src="https://d3js.org/d3.v4.js"></script>
As you can see, count is 5 and the number of bins is also 5.
If you pass an array, however, the behaviour is what you expect: the number of bins will be array.length + 1:
Thresholds are defined as an array of values [x0, x1, …]. Any value less than x0 will be placed in the first bin; any value greater than or equal to x0 but less than x1 will be placed in the second bin; and so on. Thus, the generated histogram will have thresholds.length + 1 bins.
Here is the demo:
var data = d3.range(100);
const histogram = d3.histogram()
    .value(d => d)
    .thresholds([10, 30, 50, 70, 90]);
var bins = histogram(data);
console.log("The number of bins is " + bins.length);
<script src="https://d3js.org/d3.v4.js"></script>
As you can see, the array has 5 values and the number of bins is 6.
Finally, keep in mind that when you pass a count, the actual number of bins depends on the data you pass to the histogram generator. That explains the other results you're describing in your question.
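If an exact number of equal-width bins is required, one option is to build the threshold array by hand and pass that array instead of a count. The sketch below is plain JavaScript; `uniformThresholds` is a hypothetical helper name (not part of D3), the domain of 0 to 45 is an assumed example, and the commented lines only suggest how it might be wired into the question's code.

```javascript
// Hypothetical helper (not a D3 function): returns the binCount - 1 interior
// cut points that split [min, max] into binCount equal-width bins.
function uniformThresholds(min, max, binCount) {
  const step = (max - min) / binCount;
  const thresholds = [];
  for (let i = 1; i < binCount; i++) {
    thresholds.push(min + i * step);
  }
  return thresholds;
}

// Example: 10 bins over an assumed domain of [0, 45] need 9 thresholds.
const cuts = uniformThresholds(0, 45, 10);
console.log(cuts.length + 1); // 10

// Assumed usage with the question's variables:
// var extent = d3.extent(data);
// var bins = d3.histogram()
//     .domain(extent)
//     .thresholds(uniformThresholds(extent[0], extent[1], configProperties.binsCount))
//     (data);
```

The cut points will usually not be round numbers, which is the trade-off against passing a count.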

I realize this is a little old, and that Gerardo explained how to do what you were asking, but he didn't actually answer the why of the question. So here's that, in case anyone else comes across this question and is curious. If you pass a number to the thresholds function, D3 finds a number of bins that is near to that number, such that the thresholds are 'nice' numbers. And it's the choosing of those 'nice' numbers that results in the number of bins being different than what you specify.
So if your data goes from 0 to 24.37, and you request 8 bins, the thresholds will not be multiples of 3.04625 ( = 24.37 / 8). Instead D3 will pick a 'nice' step, which as far as I can tell is 1, 2 or 5 times a power of ten, so the thresholds will be multiples of 2 (to make 13 bins) or multiples of 5 (to make 5 bins). These numbers are much nicer to display on a graph, and are what a human would probably choose if they were making the histogram by hand.
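The choice of a 'nice' step can be sketched in plain JavaScript. This is a simplified approximation of the 1-2-5 rule that D3's tick generator appears to follow, not D3's actual source; `niceStep` and `approxBinCount` are hypothetical names, and the domain of 0 to 45 is an assumed example, picked because it reproduces the jump from 9 to 23 bins described in the question.

```javascript
// Approximate the step D3 would pick: 1, 2, 5 or 10 times a power of ten,
// whichever is closest (on a log scale) to the raw step (max - min) / count.
function niceStep(min, max, count) {
  const rawStep = (max - min) / count;
  const magnitude = Math.pow(10, Math.floor(Math.log10(rawStep)));
  const error = rawStep / magnitude; // between 1 and 10
  let factor = 1;
  if (error >= Math.sqrt(50)) factor = 10;
  else if (error >= Math.sqrt(10)) factor = 5;
  else if (error >= Math.sqrt(2)) factor = 2;
  return factor * magnitude;
}

// Count the multiples of the nice step that fall strictly inside (min, max);
// each one is a threshold, and n thresholds give n + 1 bins.
function approxBinCount(min, max, count) {
  const step = niceStep(min, max, count);
  let thresholds = 0;
  for (let k = Math.ceil(min / step); k * step < max; k++) {
    if (k * step > min) thresholds++;
  }
  return thresholds + 1;
}

console.log(approxBinCount(0, 45, 9));  // 9  (step 5)
console.log(approxBinCount(0, 45, 14)); // 9  (step 5)
console.log(approxBinCount(0, 45, 15)); // 23 (step 2)
```

Under these assumptions, any requested count from 9 to 14 lands on a step of 5, and 15 tips over to a step of 2, which is why the bin count jumps instead of changing gradually.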

Related

linear scale value between two values

I have three values.
let a = 10;
let b = 200;
let c = 140;
I want to make a linear scale chart: a straight line whose starting point is 10 and whose end point is 200.
Now I need a calculation that tells me where the value 140 lies along that line.
It's like 0, 50 and 100: 0 is the starting point, 100 is the end point and 50 is the central point.
So if I pass 50 percent, the value will be at the center.
I have made the UI. I just need the percent value to pass to it.
Any suggestions would be highly appreciated.

If I understand you right, the formula you need is
(c-a)/(b-a)*100
or in JavaScript
((c-a)/(b-a)*100).toFixed(1)+'%'
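As a small runnable sketch of that formula with the question's numbers (`percentAlong` is a hypothetical name, and the guard for a equal to b is an added assumption):

```javascript
// Position of value c on the line from a to b, as a percentage.
function percentAlong(a, b, c) {
  if (a === b) throw new RangeError("a and b must differ");
  return ((c - a) / (b - a)) * 100;
}

const a = 10;
const b = 200;
const c = 140;
console.log(percentAlong(a, b, c).toFixed(1) + "%");    // 68.4%
console.log(percentAlong(0, 100, 50).toFixed(1) + "%"); // 50.0%
```

Values of c outside the range from a to b give results below 0 or above 100, so clamp the result if the UI cannot handle that.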

Inverted proportion

I have information that
1X = 98 N
98X = 0.01020408163265306 N
How can I calculate the number N from the number X, for X in the range 1 to 98, knowing that X=1 gives N=98 and X=98 gives N=0.01020408163265306?
Maybe it is silly, but how do I write a function to calculate N from X?

Update: Now that you've posted more data, I'll amend my answer to say that this function does a pretty good job of approximating the data you provided:
y = 100*exp(-x/3.5)
I guessed a form that uses the exponential function. The value at zero is clear, as is the value when x becomes large. I guessed a time constant that gave a good visual fit.

You can think of it as a fractional function;
Result = N/(X squared)
Hopefully this is what you were looking for. But if you want N to be a function of X:
N = 1/X, which is an inverse function.
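As a quick check of the inverse-square form against the two data points in the question: taking the numerator to be 98 (an assumption drawn from the X = 1 case) gives N = 98 / X², which reproduces both stated values. `nFromX` is a hypothetical name, and many other curves also pass through two points, so this is only one candidate.

```javascript
// One candidate function through both stated points: N = 98 / X^2.
function nFromX(x) {
  return 98 / (x * x);
}

console.log(nFromX(1));  // 98
console.log(nFromX(98)); // approximately 0.0102, i.e. 1/98
```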

Generate random numbers with logarithmic distribution and custom slope

I'm trying to generate random integers with a logarithmic distribution. I use the following formula:
idx = Math.floor(Math.log((Math.random() * Math.pow(2.0, max)) + 1.0) / Math.log(2.0));
This works well and produces sequence like this for 1000 iterations (each number represents how many times that index was generated):
[525, 261, 119, 45, 29, 13, 5, 1, 1, 1]
Fiddle
I am now trying to adjust the slope of this distribution so it doesn't drop as quickly and produces something like:
[150, 120, 100, 80, 60, ...]
Blindly playing with coefficients didn't give me what I wanted. Any ideas how to achieve it?

You mention a logarithmic distribution, but it looks like your code is designed to generate a truncated geometric distribution instead, although it is flawed. There is more than one distribution called a logarithmic distribution and none of them are that common. Please clarify if you really do mean one of them.
You compute floor[log_2 U] where U is uniformly distributed from 1 to (2^max)+1. This has a 1/2^max chance to produce max, but you clamp that to max-1. So, you have a 1/2^max chance to produce 0, 2/2^max chance to produce 1, 4/2^max chance to produce 2, ... up to a 1/2 + 1/2^max chance to produce max-1.
Present in your code, but missing from the description in the question, is that you are flipping the computed index around with
idx = (max-idx) - 1
After this, your chance to produce 0 is 1/2 + 1/2^max, and your chance to produce a value of k is 1/2^(k+1).
I think it is a mistake to let U be uniform on [1,2^max+1]. Instead, I think you want U to be uniform on [1,2^max]. Then your chance to generate idx=k is 2^(max-k-1)/((2^max)-1).
idx = Math.floor(Math.log((Math.random()*(Math.pow(2.0, max)-1.0)) + 1.0) / Math.log(2.0));
zmii's comment that you could get a flatter distribution by replacing both 2.0s with a value closer to 1.0 is good. The reason it produced unsatisfactory results for small values of max was that you were sampling uniformly from [1,1.3^max+1] instead of [1,1.3^max]. The extra +1 made a larger difference when max was smaller and the base was smaller. Try the following:
var zmii = 1.3;
idx = Math.floor(Math.log((Math.random()*(Math.pow(zmii, max)-1.0))+1.0) / Math.log(zmii));
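Putting the corrected formula and the index flip together, a minimal sketch might look like this. `sampleIndex` and `countSamples` are hypothetical names, the injectable `random` parameter is an assumption added so the function can be exercised deterministically, and the clamp is a defensive addition against floating-point rounding.

```javascript
// Returns an index in [0, max - 1]; lower indices are more likely.
// base = 2.0 roughly halves the frequency at each step; values closer
// to 1.0 give a flatter distribution.
function sampleIndex(max, base, random = Math.random) {
  // u is uniform on [1, base^max)
  const u = random() * (Math.pow(base, max) - 1.0) + 1.0;
  let idx = Math.floor(Math.log(u) / Math.log(base));
  idx = Math.min(idx, max - 1); // guard against rounding at the top edge
  return (max - idx) - 1;       // flip so that 0 is the most frequent index
}

// Tally how often each index comes up.
function countSamples(max, base, iterations) {
  const counts = new Array(max).fill(0);
  for (let i = 0; i < iterations; i++) {
    counts[sampleIndex(max, base)]++;
  }
  return counts;
}

console.log(countSamples(10, 2.0, 1000)); // steep drop-off
console.log(countSamples(10, 1.3, 1000)); // flatter drop-off
```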

Linear regression to predict the y-value for the trend series

I have [x,y] pairs where x is a Unix timestamp and y is a float. I need to find the best-fit line for this series. I am using the linear regression model from the link below:
http://dracoblue.net/dev/linear-least-squares-in-javascript/159/
I am getting the values correctly. But since my x-data is in Unix timestamps, I get really huge values. Does anyone have suggestions on how to tone them down? I tried using seconds instead of milliseconds, by dividing the x-data by 1000. But that just makes the difference in the final y-values negligible, and I don't see a proper trendline.
Any help would be appreciated.
Thanks,S.

Make it start at 0: subtract the first x value (say x0) from every x value.
For instance, on line 31 of your link:
replace x = values_x[v]; with x = values_x[v] - values_x[0];
If values_x is sorted in ascending order, this should be fine.
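A self-contained sketch of the same idea, independent of the linked script: fit an ordinary least-squares line on x values shifted by the first timestamp, and shift again when predicting. `fitTrendLine` is a hypothetical name and the sample timestamps are made up for illustration.

```javascript
// Least-squares line through [x, y] pairs, computed on x - x0 to keep
// the intermediate sums small when x is a Unix timestamp.
function fitTrendLine(points) {
  const x0 = points[0][0];
  const n = points.length;
  let sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
  for (const [x, y] of points) {
    const dx = x - x0;
    sumX += dx;
    sumY += y;
    sumXY += dx * y;
    sumXX += dx * dx;
  }
  // The denominator is zero if all x values are identical.
  const slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
  const intercept = (sumY - slope * sumX) / n;
  return {
    slope,      // change in y per unit of x
    intercept,  // fitted y at x = x0
    predict: (x) => intercept + slope * (x - x0),
  };
}

// Made-up sample: y rises by 2 every 1000 ms.
const t0 = 1500000000000;
const fit = fitTrendLine([[t0, 1], [t0 + 1000, 3], [t0 + 2000, 5]]);
console.log(fit.slope);              // about 0.002
console.log(fit.predict(t0 + 3000)); // about 7
```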

Can you subtract the first x value from the entire series so that x starts from 0?

Influence Math.random()

I'm looking for a way to influence Math.random().
I have this function to generate a number from min to max:
var rand = function(min, max) {
    return Math.floor(Math.random() * (max - min + 1)) + min;
};
Is there a way to make it more likely to get a low and high number than a number in the middle?
For example, rand(0, 10) would return 0, 1, 9 and 10 more often than the rest.

Is there a way to make it more likely to get a low and high number than a number in the middle?
Yes. You want to change the distribution of the numbers generated.
http://en.wikipedia.org/wiki/Random_number_generation#Generation_from_a_probability_distribution

One simple solution would be to generate an array with, say, 100 elements.
In those 100 elements, represent the numbers you are interested in more frequently.
As a simple example, say you wanted the numbers 1 and 10 to show up more frequently: you could overrepresent them in the array, i.e. have the number 1 in the array 20 times, the number 10 in the array 20 times, and the rest of the numbers distributed evenly. Then use a random integer from 0 to 99 as the array index. This will increase your probability of getting a 1 or a 10 versus the other numbers.
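A minimal sketch of the lookup-array idea follows. `buildPool` and `pickFrom` are hypothetical names, and the favoured values and the factor of 5 are arbitrary choices for illustration rather than the 100-element layout described above.

```javascript
// Build an array in which favoured values appear `copies` times
// and every other value in [min, max] appears once.
function buildPool(min, max, favoured, copies) {
  const pool = [];
  for (let n = min; n <= max; n++) {
    const repeat = favoured.includes(n) ? copies : 1;
    for (let i = 0; i < repeat; i++) {
      pool.push(n);
    }
  }
  return pool;
}

// Pick a uniformly random element; `random` is injectable for testing.
function pickFrom(pool, random = Math.random) {
  return pool[Math.floor(random() * pool.length)];
}

const pool = buildPool(0, 10, [0, 1, 9, 10], 5);
console.log(pool.length);    // 27
console.log(pickFrom(pool)); // 0, 1, 9 or 10 in about 20 of every 27 draws
```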

You need a distribution map: a mapping from the random output in [0,1] to your desired distribution of outcomes. For example, [0,.3] will yield 0, [.3,.5] will yield 1, and so on.

Sure. It's not entirely clear whether you want a smooth rolloff so (for example) 2 and 8 are returned more often than 5 or 6, but the general idea works either way.
The typical way to do this is to generate a larger range of numbers than you'll output. For example, let's start with 5 as the baseline, occurring with frequency N. Let's assume that you want 4 or 6 to occur at frequency 2N, 3 or 7 at frequency 3N, 2 or 8 at frequency 4N, 1 or 9 at frequency 5N, and 0 or 10 at frequency 6N.
Adding those up, we need values from 1 to 41 (or 0 to 40, or whatever) from the generator. Any of the first 6 gives an output of 0. Any of the next 5 gives an output of 1. Any of the next 4 gives an output of 2, and so on.
Of course, this doesn't change the values returned by the original generator -- it just lets us write a generator of our own that produces numbers following the distribution we've chosen.
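One way to sketch such a generator in JavaScript is shown below. The weighting rule used here (distance from the centre plus one) is an assumed scheme for illustration, `makeEdgeBiasedRand` is a hypothetical name, and `random` is injectable only so the behaviour can be checked deterministically.

```javascript
// Returns a generator for integers in [min, max] in which values far
// from the centre are drawn more often than values near it.
function makeEdgeBiasedRand(min, max, random = Math.random) {
  const centre = (min + max) / 2;
  const weights = [];
  let total = 0;
  for (let n = min; n <= max; n++) {
    const w = Math.floor(Math.abs(n - centre)) + 1; // assumed weighting rule
    weights.push(w);
    total += w;
  }
  return function () {
    // Draw from the larger range 0 .. total - 1, then walk the weights
    // to find which output value owns that slot.
    let ticket = Math.floor(random() * total);
    for (let i = 0; i < weights.length; i++) {
      if (ticket < weights[i]) return min + i;
      ticket -= weights[i];
    }
    return max; // not reached while random() stays below 1
  };
}

const edgeRand = makeEdgeBiasedRand(0, 10);
console.log(edgeRand());
```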

Not really. There is a sequence of numbers that are generated based off the seed. Your random numbers come from the sequence. When you call random, you are grabbing the next element of the sequence.

Can you influence the output of Math.random in javascript (which runs client side)?
No. At least not in any feasible/practical manner.
But what you could do is create your own random number generator that produces numbers in the distribution that you need.

There are probably an infinite number of ways of doing it, and you might want to think about the exact shape/curvature of the probability function.
It can probably be done in one line, but here is a multi-line approach that uses your existing function definition (named rand, here):
var dd = rand(1,5) + rand(0,5);
var result;
if (dd > 5) {
    result = dd - 5;
} else {
    result = 6 - dd;
}

One basic result is that if U is a random variable with uniform distribution on [0,1] and F is the cumulative distribution you want to sample from, then Y = G(U), where G is the inverse of F, has F as its cumulative distribution. This is not necessarily the most efficient way of doing it, and generating random numbers from all sorts of distributions is a research subfield in and of itself. But for a simple transformation it might just do the trick. In your case, F(x) could be 4*(x-.5)^3+.5 on [0,1]: it seems to satisfy all constraints and is easy to invert and use as a transformation of the basic random number generator.
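A sketch of that transformation in JavaScript, assuming the cubic F suggested above: solving u = 4(y - 0.5)^3 + 0.5 for y gives y = 0.5 + cbrt((u - 0.5) / 4), which is then scaled to the integer range. `edgeBiasedRand` is a hypothetical name, and the clamp is an added guard against rounding at the edges.

```javascript
// Inverse-transform sampling with F(x) = 4(x - 0.5)^3 + 0.5 on [0, 1].
// Results cluster near min and max and are rare near the middle.
function edgeBiasedRand(min, max, random = Math.random) {
  const u = random();                        // uniform on [0, 1)
  const y = 0.5 + Math.cbrt((u - 0.5) / 4);  // inverse of F, lies in [0, 1)
  const n = Math.floor(y * (max - min + 1)) + min;
  return Math.min(max, Math.max(min, n));    // clamp against rounding
}

// Tally 10000 draws from 0..10 to see the shape.
const counts = new Array(11).fill(0);
for (let i = 0; i < 10000; i++) {
  counts[edgeBiasedRand(0, 10)]++;
}
console.log(counts);
```

With this particular F the bias is strong, so the middle values come up very rarely; a gentler curve would need a different F.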
