Request Size is too large - javascript

I have written a stored procedure for bulk insert, and I am handling the SP's timeout as well, but I am still getting a "Request size is too large" exception while executing the SP. The SP is below. Please help me figure out where I am going wrong; I took the code from a Pluralsight course and I am handling it the same way they do.
function spBulkInsert(docs) {
    if (!docs) {
        throw new Error('Documents array is null or not defined!');
    }
    var context = getContext();
    var collection = context.getCollection();
    var response = context.getResponse();
    var docCount = docs.length;
    if (docCount == 0) {
        response.setBody(0);
        return;
    }
    var count = 0;
    createDoc(docs[0]);

    function createDoc(doc) {
        var isAccepted = collection.createDocument(collection.getSelfLink(), doc, docCreated);
        // If the request is not accepted (e.g. the SP is close to its time limit),
        // return the number of documents created so far so the client can resume.
        if (!isAccepted) {
            response.setBody(count);
        }
    }

    function docCreated(err, doc) {
        if (err) throw err;
        count++;
        if (count == docCount) response.setBody(count);
        else createDoc(docs[count]);
    }
};
Client code that calls the above SP:
var totalInsertedCount = 0;
while (totalInsertedCount < data.Count)
{
    var insertedCount = await client.ExecuteStoredProcedureAsync<int>(
        UriFactory.CreateStoredProcedureUri("TestDoc", "coll", "spBulkInsert"),
        new RequestOptions { PartitionKey = new PartitionKey("partitionKey") }, data);
    totalInsertedCount += insertedCount;
    Console.WriteLine("Inserted {0} documents ({1} total, {2} remaining)", insertedCount, totalInsertedCount, data.Count - totalInsertedCount);
    data = data.GetRange(insertedCount, data.Count - insertedCount);
}

Just as a summary: the document size in the request exceeded the allowable document size for a request. The maximum allowable document size is 2 MB, as mentioned here.
Bulk importing data via a stored procedure means the stored procedure executes in just one HTTP request, and Cosmos DB limits the size of the documents sent per HTTP request to 2 MB.
Suggestions:
1. You can split your document data and import it in batches (see the batching sketch below).
2. You can try to shrink your document data, for example by removing unnecessary whitespace such as ' ' and '\n'.

Regardless of where the write occurs (stored procedure, SDK API, or the portal), there is always a 2 MB limit on document size in Cosmos DB. The documents must be split/disembedded/chained/linked/etc. on the client side before being submitted to Cosmos DB.
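To illustrate suggestion 1, here is a minimal sketch (plain Node.js; nothing in it comes from the original post) of splitting an array of documents into batches whose serialized payload stays under a threshold. The 1.5 MB figure is an arbitrary choice to leave headroom under the 2 MB limit, and the spBulkInsert call in the trailing comment is only indicative.
// Split docs into batches whose JSON payload stays under maxBytes.
function chunkBySize(docs, maxBytes) {
    var batches = [];
    var current = [];
    var currentBytes = 0;
    docs.forEach(function (doc) {
        var docBytes = Buffer.byteLength(JSON.stringify(doc), 'utf8');
        // Start a new batch if adding this document would push the payload over the limit.
        if (current.length > 0 && currentBytes + docBytes > maxBytes) {
            batches.push(current);
            current = [];
            currentBytes = 0;
        }
        current.push(doc);
        currentBytes += docBytes;
    });
    if (current.length > 0) batches.push(current);
    return batches;
}

// Usage sketch: execute the stored procedure once per batch instead of once for the
// whole data set, resuming from the count it returns, e.g.
//   var batches = chunkBySize(allDocs, 1.5 * 1024 * 1024);
//   batches.forEach(function (batch) { /* execute spBulkInsert with `batch` */ });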

Related

Azure CosmosDb Stored Procedure IfMatch Predicate

In a DocDb stored procedure, as the first step in a process retrieving data that I'm mutating, I read and then use the data iff it matches the etag like so:
collection.readDocument(reqSelf, function(err, doc) {
    if (doc._etag == requestEtag) {
        // Success - want to update
    } else {
        // CURRENTLY: Discard the read result I just paid lots of RUs to read
        // IDEALLY: check whether response `options` or similar indicates retrieval
        //          was skipped due to doc not being present with that etag anymore
        ...
        // ... Continue with an alternate strategy
    }
});
Is there a way to pass an options object to the readDocument call such that the callback will be informed, "It's changed, so we didn't get it, as you requested"?
(My real problem here is that I can't find any documentation other than the readDocument undocumentation in the js-server docs)
Technically you can do that by creating a responseOptions object and passing it to the call.
function sample(selfLink, requestEtag) {
    var collection = getContext().getCollection();
    var responseOptions = { accessCondition: { type: "IfMatch", condition: requestEtag } };
    var isAccepted = collection.readDocument(selfLink, responseOptions, function(err, doc, options) {
        if (err) {
            throw new Error('Error thrown. Check the status code for PreconditionFailed errors');
        }
        var response = getContext().getResponse();
        response.setBody(doc);
    });
    if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
However, even if the etag you provide is not the one the document currently has, you won't get an error and you will still get the document back. It simply isn't meant to work that way with the readDocument function in a stored procedure.
Thanks to some pushing from @Nick Chapsas, and this self-answer from @Redman, I worked out that in my case I can achieve my goal (either read the current document via the self-link, or the newer one that has replaced it bearing the same id) by instead generating an alt link within the stored procedure, like so:
var docId = collection.getAltLink() + "/docs/" + req.id;
var isAccepted = collection.readDocument(docId, {}, function (err, doc, options) {
    if (err) throw err;
    // Will be null or not depending on whether it exists
    executeUpsert(doc);
});
if (!isAccepted) throw new Error("readDocument not Accepted");
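For completeness, here is a hedged sketch of what an executeUpsert helper along these lines might look like. executeUpsert is the poster's own unshown function and `req` is assumed to be the incoming document (the snippet above reads req.id), so this body is purely illustrative; in particular, treating the `etag` replace option as an If-Match precondition is an assumption on my part, not something stated in the answer.
// Hypothetical sketch only: create the document if the read found nothing,
// otherwise replace it, asking the server to reject the replace if the etag changed.
function executeUpsert(existingDoc) {
    var collection = getContext().getCollection();
    var response = getContext().getResponse();
    var accepted;
    if (!existingDoc) {
        accepted = collection.createDocument(collection.getSelfLink(), req, function (err, created) {
            if (err) throw err;
            response.setBody(created);
        });
    } else {
        // Assumption: the `etag` option acts as an If-Match precondition,
        // so a concurrent update makes this callback receive an error.
        accepted = collection.replaceDocument(existingDoc._self, req, { etag: existingDoc._etag },
            function (err, replaced) {
                if (err) throw err;
                response.setBody(replaced);
            });
    }
    if (!accepted) throw new Error("write not accepted");
}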

Node.js update client-accessible JSON file

A beginner's question, as I am new to web programming. I am using the MEAN stack and writing a JSON file on the server in order to make some weather information available to any connected clients.
I am updating the JSON file every hour using the node-schedule library. Will constantly updating the file on the server cause any concurrency issues if clients happen to be reading the file's data at the same time?
Code snippet below:
server.js
// fs, weatherSearch and scheduleWeather are defined elsewhere in server.js
function updateWeatherFile() {
    var weather = require('weather-js');
    weather.find({ search: weatherSearch, degreeType: 'C' }, function(err, result) {
        if (err) {
            console.log(err);
            return; // don't attempt to write if the lookup failed
        }
        var w = JSON.stringify(result, null, 2);
        fs.writeFile('public/weather.json', w, function(err) {
            if (err) {
                console.log(err);
            }
        });
    });
}
if (scheduleWeather) {
    var schedule = require('node-schedule');
    var sequence = '1 * * * *'; // cron string to specify the first minute of every hour
    var j = schedule.scheduleJob(sequence, function() {
        updateWeatherFile();
        console.log('weather is updated to public/weather.json at ' + new Date());
    });
}
else {
    updateWeatherFile();
}
client_sample.js
// get the current weather from the server
$http.get('weather.json').then(function(response) {
    console.log(response['data'][0]['current']);
    vm.weather = response['data'][0]['current'].skytext;
    vm.temperature = response['data'][0]['current'].temperature;
});
Node.js is a single-threaded environment.
However, to read and write files Node performs the I/O outside the JavaScript thread, so the file can end up being read and written simultaneously. In this case the concurrency is not handled by Node but by the operating system.
If you think this concurrency may harm your program, consider using a lock file, as commented and explained here.
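As a lighter-weight alternative to a lock file (my suggestion, not part of the original answer), a common pattern is to write the new JSON to a temporary file and then rename it over weather.json; on POSIX systems the rename is atomic, so a client reading the file sees either the old or the new contents, never a partial write. A minimal sketch:
var fs = require('fs');

function writeWeatherAtomically(json, done) {
    var target = 'public/weather.json';
    var tmp = target + '.tmp';
    // Write the full payload to a temp file first...
    fs.writeFile(tmp, json, function(err) {
        if (err) return done(err);
        // ...then atomically swap it into place.
        fs.rename(tmp, target, done);
    });
}
updateWeatherFile() could then call writeWeatherAtomically(w, ...) instead of calling fs.writeFile on weather.json directly.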

Node.js - http request is not working when inside a while loop

I'm using the unirest library to fetch all of the data from an API, which is paginated by offset and limit parameters and has no known total number of results.
I'm using a while loop to iterate through the data, and at the point where no results are returned I end the loop by setting an 'incomplete' variable to false.
But for some reason, when I run the following code nothing happens (no data is added to my database and nothing is output to the console) until I get the 'call_and_retry_last allocation failed' error (I assume this happens when a while loop goes on too long). When I remove the while condition altogether, the code works fine.
Is there a particular reason why this isn't working?
Here's my code:
var limit = 50,
    offset = 0,
    incomplete = true;

while (incomplete) {
    // make api call
    unirest.get("https://www.theapiurl.com")
        .header("Accept", "application/json")
        .send({ "limit": limit, "offset": offset })
        .end(function (result) {
            // parse the json response
            var data = JSON.parse(result.raw_body);
            // if there is data
            if (data.length > 0) {
                // save the api data
                // + increase the offset value for next set of data
                offset += limit;
            }
            else {
                // if there is no data left, end loop
                incomplete = false;
                console.log("finished!");
            }
        });
}
You can use a recursive function instead, for example:
function getServerData(offset) {
    // Your API call with a callback; if the callback receives data, call getServerData again with the new offset.
}
getServerData(0);
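To flesh that out, here is a hedged sketch using the question's own unirest pattern; saveToDatabase is a placeholder for the poster's actual persistence code, not a real function from the question.
var unirest = require('unirest');
var limit = 50;

function getServerData(offset) {
    unirest.get("https://www.theapiurl.com")
        .header("Accept", "application/json")
        .send({ "limit": limit, "offset": offset })
        .end(function (result) {
            var data = JSON.parse(result.raw_body);
            if (data.length > 0) {
                saveToDatabase(data);          // placeholder for the real save logic
                getServerData(offset + limit); // only request the next page after this one arrives
            } else {
                console.log("finished!");
            }
        });
}

getServerData(0);
Because the next request is only issued from inside the callback, the loop can never outrun the responses, which is what the busy while loop was doing.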

Node.js process out of memory

I have written a service to download files from an external partner site. There are around 1000 files of 1 MB each. My process is going out of memory every time I reach around 800 files.
How should I identify the root cause?
var request = require('sync-request');
var fs = require('graceful-fs');

function find_starting_url(xyz_category) {
    var feed_url = "<url>";
    var response = request("GET", feed_url).getBody().toString();
    response = JSON.parse(response);
    var apiListings = response['apiGroups']['affiliate']['apiListings'];
    var starting_url = apiListings[xyz_category]['availableVariants']['v0.1.0']['get'];
    return starting_url;
}

function get_all_files(feed_category, count, next_url, retry_count) {
    retry_count = retry_count || 0; // default so the retry limit check works on the first call
    var headers = {
        'Id': '<my_header>',
        'Token': '<my key>'
    };
    console.log(Date());
    console.log(count);
    var products_url;
    if (next_url) {
        products_url = next_url;
    }
    else {
        products_url = find_starting_url(feed_category);
    }
    try {
        var products = request("GET", products_url, { "headers": headers }).getBody().toString();
        var parsed = JSON.parse(products);
        var home = process.env.HOME;
        var fd = fs.openSync(home + "/data/abc/xyz/" + feed_category + "/" + count + ".json", 'w');
        fs.writeSync(fd, products);
        fs.closeSync(fd);
        next_url = parsed['nextUrl'];
        count++;
        if (next_url) {
            get_all_files(feed_category, count, next_url);
        }
    } catch (e) {
        if (retry_count >= 5) {
            console.log("TERRIBLE ENDING!!!", e);
        } else {
            retry_count++;
            console.log("some error... retrying ..", e);
            get_all_files(feed_category, count, next_url, retry_count);
        }
    }
}

var feed_category = process.argv[2];
get_all_files(feed_category, 1);
You're calling a synchronous function recursively, so every request you make, and all the data from each request, is retained in memory in your local variables until all of the requests are done and all of the recursive calls can unwind and finally free all the sets of local variables. This requires monster amounts of memory (as you have discovered).
It would be best to restructure your code so that the current request is processed, written to disk and then nothing from that request is retained when it goes onto the next request. The simplest way to do that would be to use a while loop instead of a recursive call. In pseudo code:
initialize counter
while (more to do) {
    process the next item
    increment counter
}
I don't understand the details of what your code is trying to do well enough to propose a rewrite, but hopefully you can see how you can replace the recursion with the type of non-recursive structure above.
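As an illustration only (not the answerer's code), here is a rough sketch of that loop using the question's own sync-request pattern, reusing its request, fs, and find_starting_url names; error handling and the retry logic are omitted to keep it short.
function get_all_files(feed_category) {
    var headers = { 'Id': '<my_header>', 'Token': '<my key>' };
    var count = 1;
    var next_url = find_starting_url(feed_category);

    // Each iteration fetches one page, writes it to disk, and then lets go of it
    // before the next iteration starts, so memory use stays roughly constant.
    while (next_url) {
        var products = request("GET", next_url, { "headers": headers }).getBody().toString();
        var fd = fs.openSync(process.env.HOME + "/data/abc/xyz/" + feed_category + "/" + count + ".json", 'w');
        fs.writeSync(fd, products);
        fs.closeSync(fd);
        next_url = JSON.parse(products)['nextUrl'];
        count++;
    }
}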
It's because you are performing a recursive call to the get_all_files function, and it keeps the response body in memory for every single execution, since every child execution needs to complete before the memory is released.

Parse afterSave function getting skipped over

So I have a messaging app using Parse.com as my backend. When I send a message from the app it is saved on Parse.com in a class called "NewMessages". Then in my Cloud Code I have an afterSave function dedicated to this class, so that when a new object gets saved to "NewMessages" it picks a random user, attaches it to the message, and saves it in a new class called "Inbox". Then it deletes the original message from "NewMessages".
So the "NewMessages" class should always be empty, right? But when I send a bunch of messages very quickly, some get skipped over. How do I fix this?
Is there a better way to structure this than using afterSave?
function varReset(leanBody, leanSenderName, leanSenderId, randUsers) {
    leanBody = "";
    leanSenderName = "";
    leanSenderId = "";
    randUsers = [];
    console.log("The variables were set");
}

Parse.Cloud.afterSave("Lean", function(leanBody, leanSenderName, leanSenderId, randUsers, request) {
    varReset(leanBody, leanSenderName, leanSenderId, randUsers);
    var query = new Parse.Query("NewMessages");
    query.first({
        success: function(results) {
            leanBody = (results.get("MessageBody"));
            leanSenderName = (results.get("senderName"));
            leanSenderId = (results.get("senderId"));
            getUsers(leanBody, leanSenderName, leanSenderId);
            results.destroy({
                success: function(results) {
                    console.log("deleted");
                }, error: function(results, error) {
                }
            });
        }, error: function(error) {
        }
    });
});

function getUsers(leanBody, leanSenderName, leanSenderId, response) {
    var query = new Parse.Query(Parse.User);
    query.find({
        success: function(results) {
            var users = [];
            console.log(leanBody);
            console.log(leanSenderName);
            // extract out user names from results
            for (var i = 0; i < results.length; ++i) {
                users.push(results[i].id);
            }
            for (var i = 0; i < 3; ++i) {
                var rand = users[Math.floor(Math.random() * users.length)];
                var index = users.indexOf(rand);
                users.splice(index, 1);
                randUsers.push(rand);
            }
            console.log("The random users are " + randUsers);
            sendMessage(leanBody, leanSenderName, leanSenderId, randUsers);
        }, error: function(error) {
            response.error("Error");
        }
    });
}

function sendMessage(leanBody, leanSenderName, leanSenderId, randUsers) {
    var Inbox = Parse.Object.extend("Inbox");
    for (var i = 0; i < 3; ++i) {
        var inbox = new Inbox();
        inbox.set("messageBody", leanBody);
        inbox.set("senderName", leanSenderName);
        inbox.set("senderId", leanSenderId);
        inbox.set("recipientId", randUsers[i]);
        console.log("leanBody = " + leanBody);
        console.log("leanSenderName = " + leanSenderName);
        console.log("leanSenderId = " + leanSenderId);
        console.log("recipient = " + randUsers[i]);
        inbox.save(null, {
            success: function(inbox) {
                // Execute any logic that should take place after the object is saved.
                alert('New object created with objectId: ' + inbox.id);
            },
            error: function(inbox, error) {
                // Execute any logic that should take place if the save fails.
                // error is a Parse.Error with an error code and message.
                alert('Failed to create new object, with error code: ' + error.message);
            }
        });
    }
}
Have you checked your logs? You may be falling afoul of resource limits (https://parse.com/docs/cloud_code_guide#functions-resource). If immediacy is not important, it may be worth setting up a background job that runs every few minutes and tackles undelivered messages. It may also be possible to combine the two approaches: have the afterSave function attempt an immediate delivery to Inboxes, while the background job picks up any NewMessages left over on a regular basis. Not the prettiest solution, but at least you have a bit more reliability. (You'll have to think about race conditions, though, where the two may attempt deliveries on the same NewMessage.)
Regarding your question about a better structure: if the two classes are identical (or close enough), is it possible to just have a single Messages class? Initially the "to" field would be null, and it would be assigned a random recipient in a beforeSave function. This may be faster and neater.
EDIT: Adding a 3rd observation which was originally a comment:
I saw that you are using a Query.first() in afterSave in order to find the NewMessage to take care of. Potentially, a new NewMessage could have snuck in between the time afterSave was called and the query was run. Why not get the ID of the saved NewMessage and use that in the query, instead of first()?
query.get(request.object.id,...);
This ensures that the code in afterSave handles the NewMessage that it was invoked for, not the one that was most recently saved.
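To make that concrete, here is a hedged sketch of what the afterSave could look like if it works directly from request.object instead of querying for the most recent NewMessage. The field names are taken from the question; the rest is illustrative, not the answerer's code, and it assumes the handler is registered for the "NewMessages" class the question describes.
Parse.Cloud.afterSave("NewMessages", function(request) {
    // The object that triggered this afterSave - no query needed, so there is
    // nothing to race with other saves happening at the same time.
    var message = request.object;
    var leanBody = message.get("MessageBody");
    var leanSenderName = message.get("senderName");
    var leanSenderId = message.get("senderId");

    getUsers(leanBody, leanSenderName, leanSenderId);

    message.destroy({
        success: function() { console.log("deleted " + message.id); },
        error: function(obj, error) { console.log("delete failed: " + error.message); }
    });
});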
