Retrying failed attempts with NodeJS/ES6 - javascript

Foreword: I am newer to Javascript after coming from a C++ background.
I am writing a NodeJS app using a public npm library to request a few sets of data. The data source rate limits requests, and in a few extreme cases these rate limits are hit. When these limits are hit, the API will only return "Rate Limit Exceeded" for a few seconds before processing more requests.
When I receive the data, I try to parse it using .map(). However, when the rate limit exceeds, the whole app comes crashing down because map() is only available for arrays, and the "Rate Limit Exceeded" message is just a simple object.
if (message.message === 'Rate limit exceeded') { //This check doesn't work btw
console.log('There was a problem parsing data from the server: ' + err);
return;
}
var items = data.map(item => ({
time: new Date(item[0] * 1000),
low: item[1],
high: item[2],
open: item[3],
close: item[4],
volume: Number(item[5])
}));
for(var i = 0; i < items.length; i++)
dataStore.push(items[i]);
I want to approach this by detecting the "Rate Limit Exceeded" message, waiting a few seconds, and then retrying.
From my current understanding, setTimeout() would be a good candidate for this, but I do not understand how to get it to work recursively. Essentially, I would like it to re-request data every five seconds until the data is correctly processed.
TL;DR: I want a function to recursively call itself with setTimeout() until it properly receives data; or if there is a better way to achieve this, I am all ears.

What you can do here is use setInterval() and keep on checking every n seconds. Once you get the data, you can exit out the setInterval() using clearInterval().
Note that am using jQuery here in this context as am not sure if you are using any particular NPM package to request the data.
let retryAfter = 10000; //10 seconds
setInterval(function fetchData() {
$.get('//api.jsonbin.io/b/5a3823a38aaf400a97709c43', (data) => {
// Keep on retrying
console.log(data);
// If you get the data, just exit the setInterval
if(data) {
clearInterval(fetchData);
}
});
}(), retryAfter);
//() brackets here is to execute the function without the first 10 sec delay
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
As of now, in this example, my setInterval() will run just once as it gets the data at first. But in your example, it will run until you receive the data.
Also, I can't suggest how you are supposed to compare the error message of yours as am not sure the API which is returning you the error message is in the form of a JSON or plain text. It depends on how you can compare the error message based on the response data type.

You could just do
if(Array.isArray(data) === false) return;
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/isArray

Related

Bulk Upsert Javascript stored procedure always exceeds execution cap of 5 seconds and results in a timeout

I'm currently running a script in python SDK which programmatically bulk upserts 1.5 million documents into a collection in azure cosmos db. I've been using the bulk import sproc from the samples provided in the github repo: https://github.com/Azure/azure-cosmosdb-js-server/tree/master/samples/stored-procedures, the only change being that I've swapped collection.createDocument with collection.upsertDocument. I'll include my sproc in full below.
The stored procedure does run successfully - it upserts documents consistently and relatively quickly. Although this will be the case only up until around 30% progress when this error will be thrown:
CosmosHttpResponseError: (RequestTimeout) Message: {"Errors":["The requested operation exceeded maximum alloted time. Learn more: https://aka.ms/cosmosdb-tsg-service-request-timeout"]}
ActivityId: 9f2357c6-918c-4b67-ba20-569034bfde6f, Request URI: /apps/4a997bdb-7123-485a-9808-f952db2b7e52/services/a7c137c6-96b8-4b53-a20c-b9577981b353/partitions/305a8287-11d1-43f8-be1f-983bd4c4a63d/replicas/132488328092882514p/, RequestStats:
RequestStartTime: 2020-11-03T23:43:59.9158203Z, RequestEndTime: 2020-11-03T23:44:05.3858559Z, Number of regions attempted:1
ResponseTime: 2020-11-03T23:44:05.3858559Z, StoreResult: StorePhysicalAddress: rntbd://cdb-ms-prod-centralus1-fd22.documents.azure.com:14354/apps/4a997bdb-7123-485a-9808-f952db2b7e52/services/a7c137c6-96b8-4b53-a20c-b9577981b353/partitions/305a8287-11d1-43f8-be1f-983bd4c4a63d/replicas/132488328092882514p/, LSN: -1, GlobalCommittedLsn: -1, PartitionKeyRangeId: , IsValid: False, StatusCode: 408, SubStatusCode: 0, RequestCharge: 0, ItemLSN: -1, SessionToken: , UsingLocalLSN: False, TransportException: null, ResourceType: StoredProcedure, OperationType: ExecuteJavaScript, SDK: Microsoft.Azure.Documents.Common/2.11.0
Is there a way to add some retry logic or to extend the timeout period for bulk upserts? I believe the section of code in the sproc below if (!isAccepted) getContext().getResponse().setBody(count); is supposed to help with this scenario but it doesn't seem to work in my case.
Bulk upsert stored procedure in Javascript:
function bulkUpsert(docs) {
var collection = getContext().getCollection();
var collectionLink = collection.getSelfLink();
// The count of imported docs, also used as current doc index.
var count = 0;
// Validate input.
if (!docs) throw new Error("The array is undefined or null.");
var docsLength = docs.length;
if (docsLength == 0) {
getContext().getResponse().setBody(0);
return;
}
// Call the CRUD API to create a document.
tryCreate(docs[count], callback);
// Note that there are 2 exit conditions:
// 1) The upsertDocument request was not accepted.
// In this case the callback will not be called, we just call setBody and we are done.
// 2) The callback was called docs.length times.
// In this case all documents were created and we don't need to call tryCreate anymore. Just call setBody and we are done.
function tryCreate(doc, callback) {
var isAccepted = collection.upsertDocument(collectionLink, doc, callback);
// If the request was accepted, callback will be called.
// Otherwise report current count back to the client,
// which will call the script again with remaining set of docs.
// This condition will happen when this stored procedure has been running too long
// and is about to get cancelled by the server. This will allow the calling client
// to resume this batch from the point we got to before isAccepted was set to false
if (!isAccepted) {
getContext().getResponse().setBody(count);
}
}
// This is called when collection.upsertDocument is done and the document has been persisted.
function callback(err, doc, options) {
if (err) throw err;
// One more document has been inserted, increment the count.
count++;
if (count >= docsLength) {
// If we have created all documents, we are done. Just set the response.
getContext().getResponse().setBody(count);
} else {
// Create next document.
tryCreate(docs[count], callback);
}
}
}
I think that the problem may lie in the stored procedure rather than the python script, if this isn't the case though I can provide my python script. Any help on this would be massively appreciated, it's been a head scratcher for me for days now!
Extra Info:
Throughput = 10,000, partition upsert size ~ 1.9MB consistently.
If anyone else has this problem, the workaround I've used is to increase the throughput to 100,000 instead of 10,000 temporarily whilst the bulk upsert operation is underway. The error doesn't occur if you use that bulk upsert stored procedure in conjunction with a sufficiently high throughput. I think the timeout was happening frequently once the bulk upsert operation had upserted around 30% of the 1.5 million records, likely because the throughput wasn't divided sufficiently between partitions and it was causing a bottleneck. I may have to again assign a greater throughput to my container once it is used in practice or maybe I'll be able to reduce it to save costs. Either way the code to do this is quite simple with just the method below:
new_throughput = 10000; container.replace_throughput(new_throughput)
Stored procedures have a bounded execution time of 5 seconds. However you can write your stored procedure to handle bounded execution by checking a boolean return value and then use the count of items inserted in each invocation of the stored procedure to track and resume progress across batches. There is an example here.

nodeJS wait for event that cannot be promisified

I'm trying to read an STDIN PIPE from my nodejs file and make a POST request to an URL with every line given fom STDIN then wait for the response, read next line, send, wait etc.
'use strict';
const http = require('http');
const rl = require('readline').createInterface(process.stdin,null);
rl.on('line', function (line) {
makeRequest(line); // I need to wait calling the next callback untill the previous finishes
}).on('close',function(){
process.exit(0);
});
the problem is, rl.on('line') will instantly read thousands of lines from my pipe and launch thousands of requests instantly what will lead into an EMFILE exception. I know this is the expected behavior of non-blocking IO but in this case, one cannot use promises/futures because .on('line') is a callback itself and I cannot manipulate it to not trigger without loosing data from my input.
So, if callbacks cannot be used and timeout hacks aren't elegant enough how can one break out of the curse of nonblockIO?
Keep a counter of active requests (increment on send, decrement on response). Once the counter exceeds a constant (say, 200), (check on every 'line' event) call rl.pause(). On every response, check if the counter is smaller than your constant, and if it is, call rl.resume(). This should limit the rate of requests and current lines in memory, and fix your problem.
Node's readline class has pause and resume functions that defer to the underlying stream equivalents. These functions are specifically made for throttling parts of a pipeline to assist with bottlenecks. See the following example from the stream.Readable.pause documentation:
var readable = getReadableStreamSomehow();
readable.on('data', (chunk) => {
console.log('got %d bytes of data', chunk.length);
readable.pause();
console.log('there will be no more data for 1 second');
setTimeout(() => {
console.log('now data will start flowing again');
readable.resume();
}, 1000);
});
That gives you fine grained control over how much data flows into your URL fetching code.

IndexedDB and large amount of inserts on Angular app

I'm struggling with amounts of 20-50k JSON object response from server which I should insert into our indexeddb datastore.
Response is repeated with foreach and every single row is added with each. Calls with response less than 10k rows are working fine and inserted within a minute or so. But when the amounts get larger, the database goes unresponsive after a while and returns this error message
"db Error err=transaction aborted for unknown reason"
I'm using a Dexie wrapper for the database and an angular wrapper for dexie called ngDexie.
var deferred = $q.defer();
var progress = 0;
// make the call
$http({
method: 'GET',
headers: headers,
url: '/Program.API/api/items/getitems/' + user
}).success(function (response) {
// parse response
var items = angular.fromJson(response);
// loop each item
angular.forEach(items, function (item) {
// insert into db
ngDexie.put('stuff', item).then(function () {
progress++;
$ionicLoading.show({
content: 'Loading',
animation: 'fade-in',
template: 'Inserting items to db: ' + progress
+ '/' + items.length,
showBackdrop: true,
maxWidth: 200,
showDelay: 0
});
if (progress == items.length) {
setTimeout(function () {
$ionicLoading.hide();
}, 500);
deferred.resolve(items);
}
});
});
}).error(function (error) {
$log('something went wrong');
$ionicLoading.hide();
});
return deferred.promise;
Do I have the wrong approach with dealing with the whole data in one chunk? Could there be better alternatives? This whole procedure is only done once when the user opens up the site. All help is greatly appreciated. The target device is tablets running Android with Chrome.
Since you are getting a unknown error, there is something going wrong with I/O. My guess is the db underneath has troubles handling the amout of data. May try to split up in batches with a maximum of 10k each.
A transaction can fail for reasons not tied to a particular IDBRequest. For example due to IO errors when committing the transaction, or due to running into a quota limit where the implementation can't tie exceeding the quota to a partcular request. In this case the implementation MUST run the steps for aborting a transaction using the transaction as transaction and the appropriate error type as error. For example if quota was exceeded then QuotaExceededError should be used as error, and if an IO error happened, UnknownError should be used as error.
you can find this in the specs
An other possibility, do you have any indexes defined on the objectstore? Because for every index you have, that index needs to be maintained with every insert.
If you insert many new records i would suggest using add. This was added for performance reasons. See the documentation here:
https://github.com/FlussoBV/NgDexie/wiki/ngDexie.add
I had problems with massive bulk insert (100.000 - 200.000 records). I've solved all my IndexedDB performance problems using bulkPut() from Dexie library. It has this important feature:
Dexie has a kick-ass performance. It's bulk methods take advantage of
a not well known feature in indexedDB that makes it possible to store
stuff without listening to every onsuccess event. This speeds up the
performance to a maximum.
Dexie: https://github.com/dfahlander/Dexie.js
BulkPut() -> http://dexie.org/docs/Table/Table.bulkPut()

Nodejs Mongoose - Serve clients a single query result

I'm looking to implement a solution where I can query the Mongoose Database on a regular interval and then store the results to serve to my clients.
I'm assuming this will reduce my response time when my users pull the collection.
I attempted to implement this plan by creating an empty global object and then writing a function that queries the db and then stores the results as the global object mentioned previously. At the end of the function I setTimeout for 60 seconds and then ran the function again. I call this function the first time the server controller gets called when the app is first run.
I then set my clients up so that when they requested the collection, it would first look to see if the global object exists, and if so return that as the response. I figured this would cut my 7-10 second queries down to < 1 sec.
In my novice thinking I assumed that Nodejs being 'single-threaded' something like this could work quite well - but it just seemed to eat up all my RAM and cause fatal errors.
Am I on the right track with my thinking or is it better to query the db every time people pull the collection?
Here is the code in question:
var allLeads = {};
var getAllLeads = function(){
allLeads = {};
console.log('Getting All Leads...');
Lead.find().sort('-lastCalled').exec(function(err, leads) {
if (err) {
console.log('Error getting leads');
} else {
allLeads = leads;
}
});
setTimeout(function(){
getAllLeads();
}, 60000);
};
getAllLeads();
Thanks in advance for your assistance.

Strange issue with socket.on method

I am facing a strange issue with calling socket.on methods from the Javascript client. Consider below code:
for(var i=0;i<2;i++) {
var socket = io.connect('http://localhost:5000/');
socket.emit('getLoad');
socket.on('cpuUsage',function(data) {
document.write(data);
});
}
Here basically I am calling a cpuUsage event which is emitted by socket server, but for each iteration I am getting the same value. This is the output:
0.03549148310035006
0.03549148310035006
0.03549148310035006
0.03549148310035006
Edit: Server side code, basically I am using node-usage library to calculate CPU usage:
socket.on('getLoad', function (data) {
usage.lookup(pid, function(err, result) {
cpuUsage = result.cpu;
memUsage = result.memory;
console.log("Cpu Usage1: " + cpuUsage);
console.log("Cpu Usage2: " + memUsage);
/*socket.emit('cpuUsage',result.cpu);
socket.emit('memUsage',result.memory);*/
socket.emit('cpuUsage',cpuUsage);
socket.emit('memUsage',memUsage);
});
});
Where as in the server side, I am getting different values for each emit and socket.on. I am very much feeling strange why this is happening. I tried setting data = null after each socket.on call, but still it prints the same value. I don't know what phrase to search, so I posted. Can anyone please guide me?
Please note: I am basically Java developer and have a less experience in Javascript side.
You are making the assumption that when you use .emit(), a subsequent .on() will wait for a reply, but that's not how socket.io works.
Your code basically does this:
it emits two getLoad messages directly after each other (which is probably why the returning value is the same);
it installs two handlers for a returning cpuUsage message being sent by the server;
This also means that each time you run your loop, you're installing more and more handlers for the same message.
Now I'm not sure what exactly it is you want. If you want to periodically request the CPU load, use setInterval or setTimeout. If you want to send a message to the server and want to 'wait' for a response, you may want to use acknowledgement functions (not very well documented, but see this blog post).
But you should assume that for each type of message, you should only call socket.on('MESSAGETYPE', ) once during the runtime of your code.
EDIT: here's an example client-side setup for a periodic poll of the data:
var socket = io.connect(...);
socket.on('connect', function() {
// Handle the server response:
socket.on('cpuUsage', function(data) {
document.write(data);
});
// Start an interval to query the server for the load every 30 seconds:
setInterval(function() {
socket.emit('getLoad');
}, 30 * 1000); // milliseconds
});
Use this line instead:
var socket = io.connect('iptoserver', {'force new connection': true});
Replace iptoserver with the actual ip to the server of course, in this case localhost.
Edit.
That is, if you want to create multiple clients.
Else you have to place your initiation of the socket variable before the for loop.
I suspected the call returns average CPU usage at the time of startup, which seems to be the case here. Checking the node-usage documentation page (average-cpu-usage-vs-current-cpu-usage) I found:
By default CPU Percentage provided is an average from the starting
time of the process. It does not correctly reflect the current CPU
usage. (this is also a problem with linux ps utility)
But If you call usage.lookup() continuously for a given pid, you can
turn on keepHistory flag and you'll get the CPU usage since last time
you track the usage. This reflects the current CPU usage.
Also given the example how to use it.
var pid = process.pid;
var options = { keepHistory: true }
usage.lookup(pid, options, function(err, result) {
});

Categories

Resources