MongoDB: Javascript out of memory - javascript

I'm generating some random data to store in MongoDB.
To separate the generation step from the insert step (so each can be measured), I generate a lot of data and store it in an array first, and an out-of-memory error occurs.
The code:
for (var i = 0; i < amount; i++) {
    var doc = {
        starttime: get_datetime(),
        endtime: get_datetime(),
        tos: null,
        sourceport: get_port(),
        sourcehost: get_ip(),
        duration: get_duration(),
        destinationhost: get_ip(),
        destinationport: get_port(),
        protocol: get_protocol(),
        flags: get_flags(),
        packets: get_packets()
    };
    docs[i] = doc;
}
I chose, for example, amount = 10,000,000.
all functions look like:
function get_flags() {
    var tmpstring = Math.floor((Math.random() * 8) + 1);
    return tmpstring;
}
How does such an error occur? How can I solve this problem?

How does such an error occur? The docs array itself needs memory, so holding 10 million entries means using roughly 100 x 10 million bytes (if each doc entry is about 100 bytes), which is about 1 GB of memory.
Proposed solution: try running the generate-insert cycle in batches of, say, 1000 entries: generate 1000 docs, save them, reuse the array for the next 1000 docs, and so on, as in the sketch below.
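A minimal sketch of that batching idea, assuming a mongo shell / Node-driver style collection handle and a hypothetical 'flows' collection name (insertMany exists in modern shells and drivers; older ones use insert):
var batchSize = 1000;
var batch = [];
for (var i = 0; i < amount; i++) {
    batch.push({
        starttime: get_datetime(),
        endtime: get_datetime(),
        tos: null,
        sourceport: get_port(),
        sourcehost: get_ip(),
        duration: get_duration(),
        destinationhost: get_ip(),
        destinationport: get_port(),
        protocol: get_protocol(),
        flags: get_flags(),
        packets: get_packets()
    });
    if (batch.length === batchSize) {
        db.flows.insertMany(batch); // 'flows' is a placeholder collection name
        batch = [];                 // reuse the array for the next batch
    }
}
if (batch.length > 0) {
    db.flows.insertMany(batch);     // insert the remainder
}
This keeps at most one batch in memory at a time instead of the whole 10-million-element array.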

Related

How to solve the NetSuite Restlet timeout limit issue?

I am working with a NetSuite Restlet for the first time.
I have the following data retrieved from a saved search.
{
    "recordType": "receipt",
    "id": "sample-id",
    "values": {
        "customer.customerid": "sample-id",
        "customer.customercompany": "sample-customercompany",
        "customer.addressone": "sample-addressone",
        "customer.addresstwo": "sample-addresstwo",
        "customer.addresscity": "sample-addresscity",
        "customer.addressstate": "sample-addressstate",
        "country": "Australia",
        "transacitionrecordid": "sample-id",
        "unit": "Dollar",
        "total": "120"
    }
}
I have to loop over the result set, push each record to an array, and return the array at the end.
There are no fields that I can drop. All the fields have to be included.
The problem is that the number of records is roughly 31,000.
When I run my script, the execution goes over 5 minutes, which is the Restlet execution time limit.
Here is my script.
define(['N/search'], function (search) {
    function get(event) {
        var saved = search.load({ id: "search-id" });
        var searchResultSet = saved.run();
        var results = [];
        var start = 0;
        var searchRecords;
        do {
            searchRecords = searchResultSet.getRange({ start: start, end: start + 1000 });
            start = start + 1000;
            results = results.concat(searchRecords);
        } while (searchRecords.length > 0);
        return JSON.stringify(results); // return as string for now to see the output on browser
    }
    return {
        get: get
    };
});
This is what my script looks like.
Ideally, I call this script once and return the whole 31,000 records of data.
However, due to the execution limit, I am thinking of passing a parameter (which works as a pointer/index/cursor) to the getRange function as a starting index.
I have tested that I can fetch 10,000 records per call, so I could call this script three times with the parameter set to 0, 10000, and 20000.
But is there any better way to solve this issue? What I am really looking for is to call this script only once and return all 31,000 records without hitting the timeout.
Can I have any suggestions, please?
Thank you very much in advance.
It sounds like you need a Map/Reduce script type. I am not sure what overall result you are trying to achieve, but in general Map/Reduce scripts are made for processing large amounts of data.
The script can be scheduled, OR you can trigger it using N/task (which seems like what you need if you want to trigger it from the RESTlet); a sketch of that trigger follows.
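A minimal, hedged sketch of such a RESTlet trigger with N/task (the script and deployment IDs are placeholders):
define(['N/task'], function (task) {
    function post(body) {
        // Create and submit a Map/Reduce task from the RESTlet.
        var mrTask = task.create({
            taskType: task.TaskType.MAP_REDUCE,
            scriptId: 'customscript_my_mapreduce',    // placeholder script ID
            deploymentId: 'customdeploy_my_mapreduce' // placeholder deployment ID
        });
        var taskId = mrTask.submit();
        return JSON.stringify({ taskId: taskId });
    }
    return { post: post };
});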
Map/Reduce scripts have 4 entry-point functions. Each function has its own usage limit each time it is triggered, which makes this script type ideal for processing large amounts of data like this.
The first (getInputData) is used for generating the dataset (you can return a search: return search.create({...})).
The second (map) is for grouping data; it runs once per search result.
The third (reduce) is for executing code on the data; it runs once per unique key passed from the previous function.
The fourth (summarize) is for summarizing the script run. You can catch errors here, etc.
It is possible that 31k results will be too large for the first function to query, in which case you can split the work into sections. For example, fetch up to 5k results, then in summarize check whether there are still more results to process and, if there are, trigger the script again (to do this you will need multiple deployments and either a marker on the transaction to tell you it was already processed OR a global field which holds the last chunk of data that was processed). A skeleton of such a script is sketched below.
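A hedged skeleton of such a Map/Reduce script; the saved search ID is a placeholder and the actual per-record processing would go in map/reduce:
/**
 * @NApiVersion 2.x
 * @NScriptType MapReduceScript
 */
define(['N/search'], function (search) {
    function getInputData() {
        // First function: return the dataset; the framework pages through the search results.
        return search.load({ id: 'search-id' }); // placeholder saved search ID
    }
    function map(context) {
        // Second function: runs once per search result; context.value is the result as a JSON string.
        var result = JSON.parse(context.value);
        context.write({ key: result.id, value: result.values });
    }
    function reduce(context) {
        // Third function: runs once per unique key written from map.
        // Write the processed record somewhere the RESTlet can read later
        // (e.g. a custom record or a file in the File Cabinet) - placeholder.
    }
    function summarize(summary) {
        // Fourth function: log errors, check whether more chunks remain, optionally re-trigger the script.
    }
    return {
        getInputData: getInputData,
        map: map,
        reduce: reduce,
        summarize: summarize
    };
});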

How to deserialize dumped BSON with arbitrarily many documents in JavaScript?

I have a BSON file that comes from a mongodump of a database. Let's assume the database is todo and the collection is items. Now I want to load the data offline into my React Native (RN) app. Since the collection may contain arbitrarily many documents (let's say 2 currently), I want a method that parses the file however many documents it contains.
I have tried the following methods:
Use external bsondump executable.
We can convert the file to JSON using an external command:
bsondump --outFile items.json items.bson
But I am developing a mobile app, so invoking a third-party executable via a shell command is not ideal. Also, the output is a series of one-line JSON objects, so it is technically not a valid JSON file, and parsing it afterwards is not graceful.
Use deserialize in js-bson library
According to the js-bson documentation, we can do
const bson = require('bson')
const fs = require('fs')
bson.deserialize(fs.readFileSync(PATH_HERE))
But this raises an error
Error: buffer length 173 must === bson size 94
and by adding this option,
bson.deserialize(fs.readFileSync(PATH_HERE), {
allowObjectSmallerThanBufferSize: true
})
the error is resolved, but only the first document is returned. Because the documentation doesn't mention that this function can only parse a single document, I wonder if there is some option that enables reading multiple documents.
Use deserializeStream in js-bson
let docs = []
bson.deserializeStream(fs.readFileSync(PATH_HERE), 0, 2, docs, 0)
But this method requires the document count as a parameter (2 here).
Use bson-stream library
I am actually using react-native-fetch-blob instead of fs, and according to its documentation, the stream object does not have a pipe method, which is the one and only method demonstrated in the bson-stream docs. So although this method does not require the number of documents, I am confused about how to use it.
// fs
const BSONStream = require('bson-stream');
fs.createReadStream(PATH_HERE).pipe(new BSONStream()).on('data', callback);

// RNFetchBlob
const RNFetchBlob = require('react-native-fetch-blob');
RNFetchBlob.fs.readStream(PATH_HERE, ENCODING)
    .then(stream => {
        stream.open();
        stream.can_we_pipe_here(new BSONStream());
        stream.onData(callback);
    });
Also, I'm not sure what ENCODING should be above.
I have read the source code of js-bson and have figured out a way to solve the problem. I think it's worth keeping a detailed record here:
Approach 1
Split documents by ourselves, and feed the documents to parser one-by-one.
BSON internal format
Let's say the .json dump of our todo/items.bson is
{_id: "someid#1", content: "Launch a manned rocket to the sun"}
{_id: "someid#2", content: "Wash my underwear"}
This clearly violates JSON syntax because there is no outer object wrapping things together.
The internal BSON has a similar shape, and BSON allows this kind of multi-document concatenation in one file.
For each document, the four leading bytes (a little-endian 32-bit integer) give the length of the document, including the length prefix itself and the suffix. The suffix is simply a 0 byte.
The final BSON file resembles
LLLLDDDDDDD0LLLLDDD0LLLLDDDDDDDDDDDDDDDDDDDDDD0...
where L is length, D is binary data, 0 is literally 0.
The algorithm
Therefore, we can use a simple algorithm: read the document length, call bson.deserialize with allowObjectSmallerThanBufferSize (which extracts the first document from the start of the buffer), then slice that document off the buffer and repeat.
About encoding
One extra thing I mentioned is encoding in the React Native context. The libraries dealing with React Native persistent storage all seem to lack support for reading a raw buffer from a file. The closest choice we have is base64, which is a string representation of arbitrary binary data. We then use Buffer to convert the base64 string to a buffer and feed it into the algorithm above.
The code
deserialize.js
const BSON = require('bson');
function _getNextObjectSize(buffer) {
    // This is how BSON encodes a document's size: a little-endian int32 in the first four bytes.
    return buffer[0] | (buffer[1] << 8) | (buffer[2] << 16) | (buffer[3] << 24);
}
function deserialize(buffer, options) {
    let _buffer = buffer;
    let _result = [];
    while (_buffer.length > 0) {
        let nextSize = _getNextObjectSize(_buffer);
        if (_buffer.length < nextSize) {
            throw new Error("Corrupted BSON file: the last object is incomplete.");
        }
        else if (_buffer[nextSize - 1] !== 0) {
            throw new Error(`Corrupted BSON file: the ${_result.length + 1}-th object does not end with 0.`);
        }
        let obj = BSON.deserialize(_buffer, {
            ...options,
            allowObjectSmallerThanBufferSize: true,
            promoteBuffers: true // BSON supports raw buffers as a data type; this option keeps
                                 // them as Buffers, which is valid in a JS object but not in JSON.
        });
        _result.push(obj);
        _buffer = _buffer.slice(nextSize);
    }
    return _result;
}
module.exports = deserialize;
App.js
import RNFetchBlob from 'rn-fetch-blob';
const deserialize = require('./deserialize.js');
const Buffer = require('buffer/').Buffer;

RNFetchBlob.fs.readFile('...', 'base64')
    .then(b64Data => Buffer.from(b64Data, 'base64'))
    .then(bufferData => deserialize(bufferData))
    .then(jsData => { /* Do anything here */ });
Approach 2
The above method reads the file as a whole. With a very large .bson file, the app may crash. Of course, one could change readFile to readStream above and add various checks to determine whether the current chunk contains the end of a document, but that is troublesome, and we would effectively be rewriting the bson-stream library!
So instead, we can create an RNFetchBlob file stream and a bson-stream parsing stream. This brings us back to attempt #4 in the question.
After reading the source code, it turns out the BSON parsing stream inherits from a Node.js Transform stream. Instead of piping, we can manually forward chunks from RNFetchBlob's onData and onEnd into the parser and listen for its 'data' and 'end' events.
Since bson-stream does not support passing options to the underlying bson library calls, one may want to tweak the library's source code a little in their own project.
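A hedged sketch of this manual forwarding, assuming rn-fetch-blob's readStream delivers base64 chunks and that bson-stream's parser behaves like a standard Transform stream; the path, chunk size, and callbacks are placeholders:
import RNFetchBlob from 'rn-fetch-blob';
const Buffer = require('buffer/').Buffer;
const BSONStream = require('bson-stream');

const parser = new BSONStream();
parser.on('data', doc => { /* one deserialized document at a time */ });
parser.on('end', () => { /* all documents have been parsed */ });

// Read in base64 chunks; a chunk size that is a multiple of 3 keeps each base64 chunk independently decodable.
RNFetchBlob.fs.readStream(PATH_HERE, 'base64', 4095)
    .then(stream => {
        stream.open();
        stream.onData(chunk => parser.write(Buffer.from(chunk, 'base64')));
        stream.onEnd(() => parser.end());
    });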

MongoImport csv combine/concat various columns to one array for import

I have another interesting case which I have never faced before, so I'm asking the SO community for help and also sharing my experience with it.
The case || What we have:
A csv file (exported from another SQL DB) with the following structure
(headers):
ID,SpellID,Reagent[0],Reagent[1..6],Reagent[7],ReagentCount[0],ReagentCount[1..6],ReagentCount[7]
You could also check the full .csv data file here, at my
Dropbox.
My gist from GitHub, which helps you to understand how MongoImport works.
What we need:
I'd like to receive the following structure (schema) to import into a MongoDB collection:
ID(Number),SpellID(Number),Reagent(Array),ReagentCount(Array)
6,898,[878],[1]
with ID, SpellID, and two arrays: in the first we store all Reagent IDs (e.g. [0,1,2,3,4,5,6,7]) from the Reagent[n] columns, and the second is an array of the same length that holds the quantity of each Reagent ID, taken from the ReagentCount[n] columns.
OR
Transposed objects with the following structure (schema):
ID(Number),SpellID(Number),ReagentID(Number),Quantity/Count(Number)
80,2675,1,2
80,2675,134,15
80,2675,14,45
As you may see, the difference between the first example and this one is that every document in the collection represents one ReagentID and its quantity for a SpellID. So if one Spell_ID has N different reagents, there will be N documents in the collection; we know there can't be more than 7 unique Reagent_IDs belonging to one Spell_ID according to our .csv file.
I am working on this problem right now with the help of Node.js and npm i csv (or any other module for parsing csv files), just to make my csv file available for importing into my DB via mongoose. I'll be very thankful to anyone who can provide a relevant contribution to this case. In any case, I will solve this problem eventually and share my solution in this question.
As for the first variant, I guess there should be a one-time script for MongoImport that could concat all the Reagent[n] & ReagentCount[n] columns into two separate arrays like I mentioned above, via --fields, but unfortunately I don't know how, and there are no relevant examples on SO or in the official Mongo docs. So if you have enough experience with MongoImport, feel free to share it.
Finally I solved my problem the way I wanted, but without using mongoimport.
I used npm i csv and wrote a function for parsing my csv file. In short:
async function FuncName(path) {
    try {
        let eva = fs.readFileSync(path, 'utf8');
        csv.parse(eva, async function (err, data) {
            // console.log(data[0]); // data[0] holds the headers, if they exist
            for (let i = 1; i < data.length; i++) { // start from 1 because row 0 is the headers; without headers, start from 0
                console.log(data[i][34]); // where i is the row number and 34 is the column index
            }
        });
    } catch (err) {
        console.log(err);
    }
}
It loops over the csv file and gives you the data as an array, so you can operate on it however you want; for example, each row can be mapped to the target schema as in the sketch below.
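A hedged sketch of mapping one parsed row to the target schema; the column positions (0: ID, 1: SpellID, 2-9: Reagent[0..7], 10-17: ReagentCount[0..7]) are assumptions and must be adjusted to the real header order:
function rowToDoc(row) {
    const reagents = [];
    const reagentCounts = [];
    for (let j = 0; j < 8; j++) {
        const reagent = Number(row[2 + j]);  // assumed Reagent[j] column position
        const count = Number(row[10 + j]);   // assumed ReagentCount[j] column position
        if (reagent !== 0) {                 // skip empty reagent slots
            reagents.push(reagent);
            reagentCounts.push(count);
        }
    }
    return {
        ID: Number(row[0]),
        SpellID: Number(row[1]),
        Reagent: reagents,
        ReagentCount: reagentCounts
    };
}
Each returned object can then be saved via mongoose or collected into an array for a bulk insert.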

Querying a parse table and eagerly fetching Relations for matching

Currently, I have a table named Appointments; on Appointments, I have a Relation of Clients.
In searching the Parse documentation, I haven't found much help on how to eagerly fetch the child collection of Clients when retrieving Appointments. I have attempted a standard query, which looked like this:
var Appointment = Parse.Object.extend("Appointment");
var query = new Parse.Query(Appointment);
query.equalTo("User", Parse.User.current());
query.include('Rate'); // a pointer object
query.find().then(function(appointments) {
    let appointmentItems = [];
    for (var i = 0; i < appointments.length; i++) {
        var appt = appointments[i];
        var clientRelation = appt.relation('Client');
        clientRelation.query().find().then(function(clients) {
            appointmentItems.push({
                objectId: appt.id,
                startDate: appt.get("Start"),
                endDate: appt.get("End"),
                clients: clients, // should be a Parse object collection
                rate: appt.get("Rate"),
                type: appt.get("Type"),
                notes: appt.get("Notes"),
                scheduledDate: appt.get("ScheduledDate"),
                confirmed: appt.get("Confirmed"),
                parseAppointment: appt
            }); // add to appointmentItems
        }); // query.find
    }
});
This does not return a correct Clients collection.
I then switched over to attempt this in Cloud Code. Assuming the issue was somehow on my side, I thought I'd create a function that did the same thing, only on their server, to reduce the number of network calls.
Here is what that function was defined as:
Parse.Cloud.define("GetAllAppointmentsWithClients", function(request, response) {
    var Appointment = Parse.Object.extend("Appointment");
    var query = new Parse.Query(Appointment);
    query.equalTo("User", request.user);
    query.include('Rate');
    query.find().then(function(appointments) {
        // for each appointment, get all client items
        var apptItems = appointments.map(function(appointment) {
            var ClientRelation = appointment.get("Clients");
            console.log(ClientRelation);
            return {
                objectId: appointment.id,
                startDate: appointment.get("Start"),
                endDate: appointment.get("End"),
                clients: ClientRelation.query().find(),
                rate: appointment.get("Rate"),
                type: appointment.get("Type"),
                notes: appointment.get("Notes"),
                scheduledDate: appointment.get("ScheduledDate"),
                confirmed: appointment.get("Confirmed"),
                parseAppointment: appointment
            };
        });
        console.log('apptItems Count is ' + apptItems.length);
        response.success(apptItems);
    });
});
and the resulting "Clients" returned look nothing like the actual object class:
clients: {_rejected: false, _rejectedCallbacks: [], _resolved: false, _resolvedCallbacks: []}
When I browse the data, I see the related objects just fine. The fact that Parse cannot eagerly fetch relational queries within the same call seems a bit odd coming from other data providers, but at this point I'd take the overhead of additional calls if the data was retrieved properly.
Any help would be beneficial, thank you.
Well, in your Cloud Code example, ClientRelation.query().find() will return a Parse.Promise. So the output clients: {_rejected: false, _rejectedCallbacks: [], _resolved: false, _resolvedCallbacks: []} makes sense: that's what a promise looks like in the console. ClientRelation.query().find() is an async call, so your response.success(apptItems) is going to happen before the client queries finish anyway.
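For the Cloud Code version, a hedged sketch of waiting for every relation query before responding, using Parse.Promise.when from the older Parse SDK that the question's code appears to use (depending on the SDK version, the results may arrive spread as arguments or as a single array):
query.find().then(function (appointments) {
    var itemPromises = appointments.map(function (appointment) {
        return appointment.relation('Clients').query().find().then(function (clients) {
            return {
                objectId: appointment.id,
                clients: clients
                // ...the other appointment fields from the question
            };
        });
    });
    // Wait for all the relation queries before responding.
    return Parse.Promise.when(itemPromises);
}).then(function () {
    // Older SDKs spread the results as arguments; newer ones pass a single array.
    var apptItems = (arguments.length === 1 && Array.isArray(arguments[0]))
        ? arguments[0]
        : Array.prototype.slice.call(arguments);
    response.success(apptItems);
});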
Your first example looks good as far as I can see, though. What do you see as your clients response if you just output it like the following? Are you sure you're getting an array of Parse.Objects? Are you getting an empty []? (Meaning, do the objects with client relations you're querying actually have clients added?)
clientRelation.query().find().then(function(clients){
console.log(clients); // Check what you're actually getting here.
});
Also, one more helpful thing: are you going to have more than 100 clients in any given appointment object? Parse.Relation is really meant for very large related collections of other objects. If you know that your appointments aren't going to have more than 100 (rule of thumb) related objects, a much easier way of doing this is to store your Client objects in an Array within your Appointment objects.
With a Parse.Relation, you can't get around having to make that second query to get the related collection (on the client or in Cloud Code). But with an Array datatype you could do the following.
var query = new Parse.Query(Appointment);
query.equalTo("User", request.user);
query.include('Rate');
query.include('Clients'); // Assumes Client column is now an Array of Client Parse.Objects
query.find().then(function(appointments){
// You'll find Client Parse.Objects already nested and provided for you in the appointments.
console.log(appointments[0].get('Clients'));
});
I ended up solving this using "promises in series".
The final code looked something like this:
var Appointment = Parse.Object.extend("Appointment");
var query = new Parse.Query(Appointment);
query.equalTo("User", Parse.User.current());
query.include('Rate');
var appointmentItems = [];
query.find().then(function(appointments) {
    var promise = Parse.Promise.as();
    _.each(appointments, function(appointment) {
        promise = promise.then(function() {
            var clientRelation = appointment.relation('Clients');
            return clientRelation.query().find().then(function(clients) {
                appointmentItems.push({
                    //...object details
                });
            });
        });
    });
    return promise;
}).then(function(result) {
    // return/use appointmentItems with the sub-collection of clients that were fetched within the subquery.
});
You can apparently do this in parallel, but that was really not needed for me, as the query I'm using seems to return instantaneously. I got rid of the Cloud Code, as it didn't seem to provide any performance boost. I will say that not being able to debug Cloud Code is truly limiting, and I wasted a bit of time waiting for console.log statements to show up in the Cloud Code panel's log. Overall, the Parse.Promise object was the key to getting this to work properly.

MongoDb bulk insert limit issue

I'm new to Mongo and Node. I was trying to upload a csv into MongoDB.
Steps include:
Reading the csv.
Converting it into JSON.
Pushing it to the mongodb.
I used the 'csvtojson' module to convert the csv to JSON and pushed it using this code:
MongoClient.connect('mongodb://127.0.0.1/test', function (err, db) { // connect to mongodb
    var collection = db.collection('qr');
    collection.insert(jsonObj.csvRows, function (err, result) {
        console.log(JSON.stringify(result));
        console.log(JSON.stringify(err));
    });
    console.log("successfully connected to the database");
    // db.close();
});
This code works fine with a csv up to about 4 MB in size; beyond that it does not work.
I tried logging the error with
console.log(JSON.stringify(err));
and it returned {}.
Note: mine is a 32-bit system.
Is it because there is a 4 MB document limit on 32-bit systems?
I'm in a scenario where I can't restrict the size and number of attributes in the csv file (i.e., the code will be handling various kinds of csv files). So how do I handle that? Are there any modules available?
If you are not having a problem parsing the csv into JSON, which presumably you are not, then perhaps just restrict the size of the list being passed to insert.
As far as I can see, the .csvRows element is an array, so rather than sending all of the elements at once, slice it up and batch the elements in the call to insert. It seems likely that the number of elements is the cause of the problem rather than the size. Splitting the array up into several inserts rather than one should help.
Experiment with 500, then 1000 and so on until you find a happy medium.
Sort of coding it:
var batchSize = 500;
for (var i = 0; i < jsonObj.csvRows.length; i += batchSize) {
    var docs = jsonObj.csvRows.slice(i, i + batchSize); // slice's end index is exclusive
    collection.insert(docs, function (err, result) {
        // Also, don't JSON convert a plain *string*
        console.log(err);
        // Whatever
    });
}
And do it in chunks like this.
You can assemble the data as an array of documents and then simply use the MongoDB insert function, passing that array to it.
