I want to check the autoscaling behavior of our NestJS/NodeJS application inside a cluster. I want to generate CPU and/or memory usage >70% based on the request count per second.
I have tried accumulating multiplications of random numbers on every request for 1 second, but it seems like requests are processed one by one and never generate much load.
What would be your suggestion?
If you are trying to basically drain the computer using nodejs, the following works for me:
Generate way too many random numbers and multiply them with a VERY LARGE random number
Put all of the numbers above in an array and sort it
Parse the list into integers
Filter the list to have only primes remaining
That should be enough to take up memory (as the lists aren't recycled by the GC so quickly), and the sorting and prime bits should lag out the process. If it's too much, just scale down the array size. :)
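For concreteness, here is a rough sketch of those steps in plain Node.js/JavaScript (the array size, the 1,000,000 cap on the random values, and the function name are arbitrary choices of mine, not part of the suggestion above; tune them until your pods cross the 70% threshold):

const ARRAY_SIZE = 1000000; // tune this first
const MAX_VALUE = 1000000;  // keeps the prime check from running forever

function isPrime(n) {
    if (n < 2) return false;
    for (let d = 2; d * d <= n; d++) {
        if (n % d === 0) return false;
    }
    return true;
}

function burnCpuAndMemory() {
    // 1. Generate way too many random numbers, multiplied by a large random number.
    const big = Math.random() * MAX_VALUE;
    const numbers = Array.from({ length: ARRAY_SIZE }, () => Math.random() * big);
    // 2. Sort them (O(n log n) over a large array).
    numbers.sort((a, b) => a - b);
    // 3. Parse the list into integers.
    const ints = numbers.map(Math.trunc);
    // 4. Keep only the primes (trial division is deliberately expensive).
    return ints.filter(isPrime).length;
}

Calling burnCpuAndMemory() from a request handler keeps the work synchronous, so each request blocks the event loop and CPU usage climbs with request rate; the large temporary arrays also put pressure on memory.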
Imagine we have two independent pseudo-random number generators using the same algorithm but seeded differently. And we are generating numbers of the same size using these generators, say 32-bit integers. Provided the algorithm gives us a uniform distribution, there is a 1/2^32 probability (or is it?) of a collision. If a collision just happened, what is the probability that the very next pair will also be a collision? It seems to me this probability might be different from (higher than) that initial uniform-based collision chance. Most currently existing pseudo-random number generators hold internal state to maintain their own stability, and a collision that just happened might signal that those internal states are somewhat "entangled", giving a modified (higher) chance of a collision happening again.
The question is probably too broad to give any precise answer, but revealing general directions/trends could also be nice. Here are some interesting aspects:
* Does the size of the initial collision matter? Is there a difference after a collision of 8 consecutive bits vs 64 bits? How, approximately, does the chance of the next collision depend on the size of the generated sequence?
* Does the pattern of pair generation matter? For example, we could find the initial collision by executing the first generator only once and "searching" the second generator. Or we could invoke each generator on every iteration.
I'm particularly interested in the default JavaScript Math.random(). 32-bit integers can be generated from it like this (for example). EDIT: As pointed out in the comments, conversion of a random value from the [0; 1) range should be done carefully, as the exponent of such values is very likely to repeat (and it takes up a decent part of the result extracted this way).
I am new to machine learning and still trying to wrap my head around the concepts, so please bear this in mind if my question is not as concise as it should be.
I am building a Tensorflow JS model with LSTM layers for time-series prediction (RNN).
The data source is pinged every few hundred milliseconds (at random intervals). However, the values produced cover a very wide range: the majority of the data received will be values like 20, 40, 45, etc., but sometimes the value will reach 75,000 at the extreme end.
So the data range is between 1 to 75,000.
When I normalise this data using a standard min/max method to produce a value between 0 and 1, the normalised value for the majority of the data ends up with many significant decimal places, e.g. '0.0038939328722009236'.
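For concreteness, here is roughly what that min/max scaling looks like (the sample readings below are made up to mirror the skew described above):

function minMaxScale(value, min, max) {
    return (value - min) / (max - min);
}

const min = 1;
const max = 75000;
console.log([20, 40, 45, 75000].map(r => minMaxScale(r, min, max)));
// Most readings end up tiny, e.g. 20 -> ~0.00025, while 75,000 -> 1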
So my question(s) are:
1) Is this min/max the best approach for normalising this type of data?
2) Will the RNN model work well with so many significant decimal places and precision?
3) Should I also be normalising the output label? (of which there will be 1 output)
Update
I have just discovered a very good resource, a Google quick course, that delves into preparing data for ML. One technique suggested is to 'clip' the data at the extremes. Thought I would add it here for reference: https://developers.google.com/machine-learning/data-prep
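As a quick illustration of the clipping idea (the 4,000 ceiling here is an arbitrary example of mine, not a value from the guide):

const CLIP_MAX = 4000; // arbitrary example ceiling
const clip = value => Math.min(value, CLIP_MAX);
console.log([20, 45, 75000].map(clip)); // [20, 45, 4000]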
After doing a bit more research I think I have a decent solution now;
I will be performing two steps, the first being 'quantile bucketing' (sometimes called 'binning'; ref: https://developers.google.com/machine-learning/data-prep/transform/bucketing).
Effectively it involves splitting the range of values into smaller subset ranges and applying an integer value to each smaller range. e.g. An initial range of 1 to 1,000,000 could be broken down into ranges of 100k, so 1 to 100,000 would be range number 1, and 100,001 to 200,000 would be range number 2.
In order to have an even distribution of samples within each bucket range, given the skewed dataset I have, I modify the subset ranges so they capture roughly the same number of samples in each 'bucket'. For example, the first range of the example above could be 1 to 1,000 instead of 1 to 100,000, the next bucket range could be 1,001 to 2,000, the third 2,001 to 10,000, and so on.
In my use case I ended up with 22 different bucket ranges. The next step is my own adaptation, since I don't want 22 different features (as seems to be suggested in the link). Instead, I apply standard min/max scaling to these bucket values, resulting in the need for only 1 feature. This gives me a final result of normalised data between 0 and 1, which deals perfectly with my skewed dataset.
Now the lowest normalised value I get (other than 0) is 0.05556.
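Here is a hedged sketch of the two-step approach; the bucket boundaries below are invented for illustration (the real 22 ranges came from inspecting the data), but the mechanics are the same: map a raw value to its bucket number, then min/max scale that number.

const bucketUpperBounds = [1000, 2000, 10000, 75000]; // example boundaries only, not the actual 22

function toBucketNumber(value) {
    for (let i = 0; i < bucketUpperBounds.length; i++) {
        if (value <= bucketUpperBounds[i]) return i + 1; // buckets numbered from 1
    }
    return bucketUpperBounds.length;
}

function normalise(value) {
    const minBucket = 1;
    const maxBucket = bucketUpperBounds.length;
    return (toBucketNumber(value) - minBucket) / (maxBucket - minBucket); // standard min/max over bucket numbers
}

console.log(normalise(45));    // low reading -> first bucket -> 0
console.log(normalise(75000)); // extreme reading -> top bucket -> 1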
Hope this helps others.
So let's say I have a sensor that's giving me a number, let's say the local temperature, or whatever really, every 1/100th of a second.
So in a second I've filled up an array with a hundred numbers.
What I want to do is, over time, create a statistical model, most likely a bell curve, of this streaming data so I can get the population standard deviation of this data.
Now on a computer with a lot of storage, this won't be an issue, but on something small like a raspberry pi, or any microprocessor, storing all the numbers generated from multiple sensors over a period of months becomes very unrealistic.
When I looked at the math of getting the standard deviation, I thought of simply storing a few numbers:
The total running sum of all the numbers so far, the count of numbers, and lastly a running sum of (each number - the current mean)^2.
Using this, whenever I get a new number, I simply add one to the count, add the number to the running sum, compute the new mean, add (new number - new mean)^2 to the running sum of squared differences, divide that by the count, and take the square root to get the new standard deviation.
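In code, the idea looks roughly like this (a sketch of the approach just described, kept here to make the problems below concrete):

var count = 0;
var sum = 0;
var sumSquaredDiffs = 0;

function onNewNumber(x) {
    count++;
    sum += x;
    var mean = sum / count;                     // the mean changes with every sample
    sumSquaredDiffs += (x - mean) * (x - mean);
    return Math.sqrt(sumSquaredDiffs / count);  // "standard deviation" so far
}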
There are a few problems with this approach, however:
It would take 476 years to overflow the sum of the numbers streaming in, assuming the values are temperatures averaging 60 degrees Fahrenheit and streamed at 100 Hz.
The same level of confidence cannot be held for the sum of the (number - mean)^2 since it is a sum of squared numbers.
Most importantly, this approach is highly inaccurate, since a new mean is used for each number, which completely obliterates the mathematical value of a standard deviation, especially a population standard deviation.
If you believe a population standard deviation is impossible to achieve, then how should I go about a sample standard deviation? Taking every nth number will still result in the same problems.
I also don't want to limit my data set to a time interval (i.e. a model built from only the last 24 hours of sensor data), since I want my statistical model to be representative of the sensor data over a long period of time, namely a year, and if I have to wait a year to do testing and debugging, or even to get a usable model, I won't be having fun.
Is there any sort of mathematical work around to get a population, or at least a sample standard deviation of an ever increasing set of numbers, without actually storing that set since that would be impossible, and still being able to accurately detect when something is multiple standard deviations away?
The closest answer I've seen is wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm, but I have no idea what it is saying or whether it requires storing the set of numbers.
Thank you!
The link shows code, and it is clear that you need to store only 3 variables: the number of samples so far, the current mean, and the running sum of squared differences.
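For anyone who finds the Wikipedia page hard to parse, here is a minimal sketch of that online (Welford) algorithm in JavaScript; the function and variable names are mine, but the update rules follow the linked article:

function createRunningStats() {
    var count = 0;
    var mean = 0;
    var m2 = 0; // running sum of squared differences from the current mean

    return {
        push: function (x) {
            count++;
            var delta = x - mean;
            mean += delta / count;
            m2 += delta * (x - mean); // uses the updated mean
        },
        mean: function () { return mean; },
        populationStdDev: function () { return Math.sqrt(m2 / count); },
        sampleStdDev: function () { return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0; }
    };
}

// Usage: push each reading as it arrives; no history is stored.
var stats = createRunningStats();
[60.1, 59.8, 60.4, 61.0].forEach(function (t) { stats.push(t); });
console.log(stats.mean(), stats.populationStdDev());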
I want to add objects into some table in IndexedDB in one transaction:
_that.bulkSet = function(data, key) {
    var transaction = _db.transaction([_tblName], "readwrite"),
        store = transaction.objectStore(_tblName),
        ii = 0;

    _bulkKWVals.push(data);
    _bulkKWKeys.push(key);

    // Once 3000 values/keys have been collected, start inserting them one by one.
    if (_bulkKWVals.length == 3000) {
        insertNext();
    }

    function insertNext() {
        if (ii < _bulkKWVals.length) {
            // Chain the inserts: each add() triggers the next one on success.
            store.add(_bulkKWVals[ii], _bulkKWKeys[ii]).onsuccess = insertNext;
            ++ii;
        } else {
            console.log(_bulkKWVals.length);
        }
    }
};
It looks like it works fine, but it is not a very optimized way of doing this, especially if the number of objects is very high (~50,000-500,000). How could I possibly optimize it? Ideally I want to add the first 3000, then remove them from the array, then add another 3000, and so on, in chunks. Any ideas?
When inserting that many rows consecutively, it is not possible to get good performance.
I'm an IndexedDB dev and have real-world experience with IndexedDB at the scale you're talking about (writing hundreds of thousands of rows consecutively). It ain't too pretty.
In my opinion, IDB is not suitable for use when a large amount of data has to be written consecutively. If I were to architect an IndexedDB app that needed lots of data, I would figure out a way to seed it slowly over time.
The issue is writes, and the problem as I see it is that the slowness of writes, combined with their I/O-intensive nature, gets worse over time. (Reads are always lightning fast in IDB, for what it's worth.)
To start, you'll get savings from re-using transactions. Because of that, your first instinct might be to try to cram everything into the same transaction. But what I've found in Chrome, for example, is that the browser doesn't seem to like long-running writes, perhaps because of some mechanism meant to throttle misbehaving tabs.
I'm not sure what kind of performance you're seeing, but average numbers might fool you depending on the size of your test. The limiting factor is throughput, but if you're trying to insert large amounts of data consecutively, pay attention to writes over time specifically.
I happen to be working on a demo with several hundred thousand rows at my disposal, and have stats. With my visualization disabled, running pure dash on IDB, here's what I see right now in Chrome 32 on a single object store with a single non-unique index with an auto-incrementing primary key.
With a much, much smaller 27k row dataset, I saw 60-70 entries/second:
* ~30 seconds: 921 entries/second on average (there's always a great burst of inserts at the start), 62/second at the moment I sampled
* ~60 seconds: 389/second average (the sustained decrease starting to outweigh the effect of the initial burst), 71/second at the moment
* ~1:30: 258/second, 67/second at moment
* ~2:00 (~1/3 done): 188/second on average, 66/second at moment
Some examples with a much smaller dataset show far better performance, but similar characteristics. Ditto much larger datasets - the effects are greatly exaggerated, and I've seen as little as <1 entry per second when leaving it running for multiple hours.
IndexedDB is actually designed to optimize for bulk operations. The problem is that the spec and certain docs do not advertise the way it works. If you pay close attention to the parts of the IndexedDB specification that define how the mutating operations on IDBObjectStore work (add(), put(), delete()), you'll find that they allow callers to call them synchronously and omit listening to the success events on all but the last one. By omitting them (but still listening to onerror), you will get enormous performance gains.
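A minimal sketch of that pattern, using the same store.add(value, key) shape as the question's code (the function and variable names here are illustrative):

function bulkAdd(db, storeName, values, keys, done) {
    var transaction = db.transaction([storeName], "readwrite"),
        store = transaction.objectStore(storeName),
        request;
    for (var i = 0; i < values.length; i++) {
        // Fire the add() calls synchronously, back to back.
        request = store.add(values[i], keys[i]);
        request.onerror = function (event) {
            console.error("add failed", event.target.error);
        };
    }
    // Only the very last request gets an onsuccess listener.
    if (request) {
        request.onsuccess = function () { done(); };
    }
}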
This example using Dexie.js shows the possible bulk speed, as it inserts 10,000 rows in 680 ms on my MacBook Pro (using Opera/Chromium).
Accomplished by the Table.bulkPut() method in the Dexie.js library:
db.objects.bulkPut(arrayOfObjects)
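Wired up end to end, it could look roughly like this (the database name and schema below are placeholders, not from the original code):

var db = new Dexie("MyDatabase");
db.version(1).stores({ objects: "++id" }); // auto-incrementing primary key
db.objects.bulkPut(arrayOfObjects)
    .then(function () {
        console.log("Added " + arrayOfObjects.length + " objects");
    })
    .catch(function (error) {
        console.error("Bulk put failed: " + error);
    });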
I was reading about output buffering in JavaScript here, and was trying to get my head around the script the author says was the fastest at printing 1 to 1,000,000 to a web page. (Scroll down to the header "The winning one million number script".) After studying it a bit, I have a few questions:
What makes this script so efficient compared to other approaches?
Why does buffering speed things up?
How do you determine the proper buffer size to use?
Does anyone here have any tricks up her/his sleeve that could optimize this script further?
(I realize this is probably CS101, but I'm one of those blasted, self-taught hackers and I was hoping to benefit from the wisdom of the collective on this one. Thanks!)
What makes this script so efficient compared to other approaches?
There are several optimizations that the author is making in this algorithm. Each of these requires a fairly deep understanding of how the underlying mechanisms are utilized (e.g. JavaScript, CPU, registers, cache, video card, etc.). I think there are 2 key optimizations that he is making (the rest are just icing):
* Buffering the output
* Using integer math rather than string manipulation
I'll discuss buffering shortly since you ask about it explicitly. The integer math that he's utilizing has two performance benefits: integer addition is cheaper per operation than string manipulation and it uses less memory.
I don't know how JavaScript and web browsers handle the conversion of an integer to a display glyph in the browser, so there may be a penalty associated with passing an integer to document.write when compared to a string. However, he is performing (1,000,000 / 1000) document.write calls versus 1,000,000 - 1,000 integer additions. This means he is performing roughly 3 orders of magnitude more operations to form the message than to send it to the display. Therefore, the penalty for sending an integer vs. a string to document.write would have to exceed 3 orders of magnitude to offset the performance advantage of manipulating integers.
Why does buffering speed things up?
The specifics of why it works vary depending on what platform, hardware, and implementation you are using. In any case, it's all about balancing your algorithm to your bottleneck inducing resources.
For instance, in the case of file I/O, buffering is helpful because it takes advantage of the fact that a rotating disk can only write a certain amount at a time. Give it too little work and it won't be using every available bit that passes under the head as the disk rotates. Give it too much, and your application will have to wait (or be put to sleep) while the disk finishes your write - time that could be spent getting the next record ready for writing! Some of the key factors that determine the ideal buffer size for file I/O include: sector size, file system chunk size, interleaving, number of heads, rotation speed, and areal density, among others.
In the case of the CPU, it's all about keeping the pipeline full. If you give the CPU too little work, it will spend time spinning on NO-OPs while it waits for you to task it. If you give the CPU too much, you may not dispatch requests to other resources, such as the disk or the video card, which could execute in parallel. This means that later on the CPU will have to wait for these to return with nothing to do. The primary factor for buffering in the CPU is keeping everything you need (for the CPU) as close to the FPU/ALU as possible. In a typical architecture this is (in order of decreasing proximity): registers, L1 cache, L2 cache, L3 cache, RAM.
In the case of writing a million numbers to the screen, it's about drawing polygons on your screen with your video card. Think about it like this: let's say that for each new number that is added, the video card must do 100,000,000 operations to draw the polygons on your screen. At one extreme, if you put 1 number on the page at a time, have your video card write it out, and do this for 1,000,000 numbers, the video card will have to do 10^14 operations - 100 trillion operations! At the other extreme, if you took the entire 1 million numbers and sent them to the video card all at once, it would take only 100,000,000 operations. The optimal point is somewhere in the middle. If you do it one at a time, the CPU does a unit of work and then waits around for a long time while the GPU updates the display. If you write the entire 1M item string first, the GPU is doing nothing while the CPU churns away.
How do you determine the proper buffer size to use?
Unless you are targeting a very specific and well-defined platform with a specific algorithm (e.g. writing packet routing for an internet router), you typically cannot determine this mathematically. Typically, you find it empirically. Guess a value, try it, record the results, then pick another. You can make some educated guesses about where to start and what range to investigate based on the bottlenecks you are managing.
Does anyone here have any tricks up her/his sleeve that could optimize this script further?
I don't know if this would work and I have not tested it; however, buffer sizes typically come in powers of 2, since the underpinnings of computers are binary and word sizes are typically multiples of two (but this isn't always the case!). For example, 64 bytes is more likely to be optimal than 60 bytes, and 1024 is more likely to be optimal than 1000. One of the bottlenecks specific to this problem is that most browsers to date (Google Chrome being the first exception that I'm aware of) run JavaScript serially within the same thread as the rest of the web page rendering mechanics. This means that the JavaScript does some work filling the buffer and then waits a long time until the document.write call returns. If the JavaScript were run as a separate process, asynchronously, like in Chrome, you would likely get a major speedup. This is of course attacking the source of the bottleneck, not the algorithm that uses it, but sometimes that is the best option.
Not nearly as succinct as I would like, but hopefully it's a good starting point. Buffering is an important concept for all sorts of performance issues in computing. Having a good understanding of the underlying mechanisms that your code is using (both hardware and software) is extremely useful in avoiding or addressing performance issues.
I would bet the slowest thing in printing 1m numbers is the browser redrawing the page, so the fewer times you call document.write(), the better. Of course this needs to be balanced against large string concatenations (because they involve allocating and copying).
The right buffer size is found through experimentation.
In other examples, buffering helps align along natural boundaries. Here are some examples:
* 32-bit CPUs can transfer 32 bits more efficiently.
* TCP/IP packets have maximum sizes.
* File I/O classes have internal buffers.
* Images, like TIFFs, may be stored with their data in strips.
Aligning with the natural boundaries of other systems can often have performance benefits.
One way to think about it is to consider that a single call to document.write() is very expensive. However, building an array and joining that array into a string is not. So reducing the number of calls to document.write() effectively reduces the overall computational time needed to write the numbers.
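In code, that trade-off looks roughly like this (a sketch, with an arbitrary chunk size of 1,000):

var CHUNK = 1000;
var buffer = [];
for (var i = 1; i <= 1000000; i++) {
    buffer.push(i);
    if (buffer.length === CHUNK) {
        document.write(buffer.join(' ') + ' '); // one expensive call per 1,000 numbers
        buffer.length = 0;                      // reuse the array instead of reallocating
    }
}
if (buffer.length > 0) {
    document.write(buffer.join(' '));
}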
Buffers are a way to tie together two pieces of work with different costs.
Computing and filling arrays has a small cost for small arrays, but a large cost for large arrays.
document.write has a large constant cost regardless of the size of the write, but scales less than O(n) for larger inputs.
So queuing up larger strings to write, or buffering them, speeds overall performance.
Nice find on the article by the way.
So this one has been driving me crazy, because I don't think it really is the fastest. So here is my experiment that anyone can play with. Perhaps I wrote it wrong or something, but it would appear that doing it all at once, instead of using a buffer, is actually a faster operation. Or at least in my experiments.
<html>
<head>
<script type="text/javascript">
function printAllNumberBuffered(n, bufferSize)
{
    var startTime = new Date();
    var oRuntime = document.getElementById("divRuntime");
    var oNumbers = document.getElementById("divNumbers");
    var i = 0;
    var currentNumber;
    var numArray = new Array(bufferSize);
    // Fill the buffer and flush it to the page every bufferSize numbers.
    for(currentNumber = 1; currentNumber <= n; currentNumber++)
    {
        numArray[i] = currentNumber;
        if(currentNumber % bufferSize == 0)
        {
            oNumbers.textContent += numArray.join(' ') + ' ';
            i = 0;
        }
        else
        {
            i++;
        }
    }
    // Flush whatever is left over when n is not a multiple of bufferSize.
    if(i > 0)
    {
        numArray.length = i; // keep only the entries filled since the last flush
        oNumbers.textContent += numArray.join(' ');
    }
    var endTime = new Date();
    oRuntime.innerHTML += "<div>Number: " + n + " Buffer Size: " + bufferSize + " Runtime: " + (endTime - startTime) + "</div>";
}

function PrintNumbers()
{
    var oNumbers = document.getElementById("divNumbers");
    var tbNumber = document.getElementById("tbNumber");
    var tbBufferSize = document.getElementById("tbBufferSize");
    var n = parseInt(tbNumber.value, 10);
    var bufferSize = parseInt(tbBufferSize.value, 10);
    oNumbers.textContent = "";
    printAllNumberBuffered(n, bufferSize);
}
</script>
</head>
<body>
<table border="1">
<tr>
<td colspan="2">
<div>Number: <input id="tbNumber" type="text" />Buffer Size: <input id="tbBufferSize" type="text" /><input type="button" value="Run" onclick="PrintNumbers();" /></div>
</td>
</tr>
<tr>
<td style="vertical-align:top" width="30%">
<div id="divRuntime"></div>
</td>
<td width="90%">
<div id="divNumbers"></div>
</td>
</tr>
</table>
</body>
</html>