Insert only if unique using Knex - JavaScript

I am writing a Node.js script which accepts an array of objects, each representing a different device recording different measurements. I would like to store information about the device in a PostgreSQL database using Knex.js, but I only want to store the device information from objects representing a new/unique device. Validating the device_id before inserting works as long as the same device shows up in different POST requests. But when the same device shows up twice
in the same POST request, the asynchronous nature of the program seems to cause the validation to occur before the previous insertion is complete.
I've tried making the script call two separate async/await functions (one to validate and the other to actually insert), but I'm not sure if this is the easiest approach or if I did it right, since it failed anyway.
app.post('/testsite', (req, res) => {
  const data = req.body.measurements;
  for (let i = 0; i < data.length; i++) {
    database.select("device_id").from("device_info").where("device_id", data[i].device_id)
      .then(MatchList => {
        console.log(MatchList);
        if (MatchList.length === 0) {
          return database('device_info')
            .returning('device_id')
            .insert({
              device_id: data[i].device_id,
              device_name: data[i].device_name,
              site_id: data[i].site_id,
              site_name: data[i].site_name
            })
            .then((newdevice) => {
              console.log('inserted device id', newdevice);
            });
        }
        return;
      });
  }
});
I expect it not to insert when the validation fails, but the validation never fails even when it should, and I get this error:
Unhandled rejection error: duplicate key value violates unique constraint "device_info_pkey"

I'm not sure if the issue is due to the asynchronous nature of Knex or something else. However, I rewrote your code with async/await syntax to make it more readable. I'm also checking whether device comes back as undefined (no row) instead of checking an array's length, since I added .first(). Could you check whether device gets console-logged when you call this function?
app.post('/testsite', async (req, res) => {
  const data = req.body.measurements;
  for (let i = 0; i < data.length; i++) {
    const device = await database
      .select('*')
      .from('device_info')
      .where('device_id', data[i].device_id)
      .first();
    console.log(device);
    if (!device) { // only insert when no matching row was found
      const newDevice = await database
        .from('device_info')
        .insert({
          device_id: data[i].device_id,
          device_name: data[i].device_name,
          site_id: data[i].site_id,
          site_name: data[i].site_name
        })
        .returning('*');
      console.log('inserted device', newDevice);
    }
  }
  res.sendStatus(204); // respond once, after the loop finishes
});

I was finally able to solve the issue. Essentially, using async/await on the search and insert alone was not enough, as the for loop would still move on to the next element. To get around this I made the POST request call an async function.
app.post('/testsite', (req, res) => {
  const data = req.body.measurements;
  postfxn(data);
});
This function awaits every iteration of the for loop.
async function postfxn(data) {
  for (let i = 0; i < data.length; i++) {
    await insertdevice(data[i]);
  }
}
The insertdevice function then uses async/await to search for the device and insert it if it is not in the database already, as suggested by technogeek1996's answer.
Hope this helps others with similar issues.
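For readers hitting the same problem: the race only exists because duplicate device_ids can appear within one request. A small pre-filter (my own sketch, not part of the original solution) removes them before any database work starts:

```javascript
// Hypothetical helper (not from the original post): keep only the
// first occurrence of each device_id within a single request body.
function uniqueDevices(measurements) {
  const seen = new Set();
  return measurements.filter((m) => {
    if (seen.has(m.device_id)) return false;
    seen.add(m.device_id);
    return true;
  });
}
```

Another option, assuming your table has a unique constraint on device_id and your Knex version supports it, is to let PostgreSQL handle duplicates entirely with `.insert(...).onConflict('device_id').ignore()`.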


Mongoose Cursor: http bulk request from collection

I have a problem involving RxJS and bulk HTTP requests over a huge collection (1M+ docs).
I have the following code, with quite simple logic: I push all the docs from the collection into an allPlayers array and make 20 bulk HTTP requests to the API at once (I guess you understand why it's limited). The code works fine, but I guess it's time to refactor it from this:
const cursor = players_db.find(query).lean().cursor();
cursor.on('data', function (player) { allPlayers.push(player); });
cursor.on('end', function () {
  logger.log('warng', `S,${allPlayers.length}`);
  from(allPlayers).pipe(
    mergeMap(player => getPlayer(player.name, player.realm), 20),
  ).subscribe({
    next: player => console.log(`${player.name}#${player.realm}`),
    error: error => console.error(error),
    complete: () => console.timeEnd(`${updatePlayer.name}`),
  });
});
For now I'm using find with a cursor and batchSize, but if I understood it right (checking via .length), and according to this question: mongoose cursor batchSize, batchSize is just an optimization and doesn't give me back an array of X docs.
So what should I do now, and which RxJS operator should I choose?
For example, I could form arrays of the necessary length (like 20) and hand them to RxJS as I did before. But I guess there should be another way, where I could use RxJS inside this for/await promise loop:
const players = await players_db.find(query).lean().cursor({batchSize: 10});
for (let player = await players.next(); player != null; player = await players.next()) {
//do something via RxJS inside for loop
}
I also found this question (Best way to query all documents from a mongodb collection in a reactive way w/out flooding RAM), which is also relevant to my problem, and I understand the logic but not its syntax. I also know that the cursor variable isn't a doc, so I can't do anything useful with it. Or actually, could I?
RxJS's bufferCount is quite an interesting operator.
https://gist.github.com/wellcaffeinated/f908094998edf54dc5840c8c3ad734d3 — a probable solution?
So, in the end I found that RxJS isn't needed (though it can be used) for this case.
The solution was quite simple, using just the MongoCursor:
async function BulkRequest (bulkSize = 10) {
  try {
    let BulkRequest_Array = [];
    const cursor = collection_db.find({}).lean().cursor({batchSize: bulkSize});
    cursor.on('data', async (doc) => {
      BulkRequest_Array.push(/* any function or axios instance */);
      if (BulkRequest_Array.length >= bulkSize) {
        cursor.pause();
        console.time(`========================`);
        await Promise.all(BulkRequest_Array);
        BulkRequest_Array.length = 0;
        cursor.resume();
        console.timeEnd(`========================`);
      }
    }); // this closing parenthesis was missing in the original snippet
  } catch (e) {
    console.error(e);
  }
}
BulkRequest();
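The pause/await/resume idea above can also be written without cursor events. Here is a generic sketch (the names are my own, not from the answer) that drains any iterable or async iterable in fixed-size batches, awaiting each batch before pulling more items:

```javascript
// Sketch: process an (async) iterable in fixed-size batches, waiting
// for each batch of handler() promises before starting the next one.
async function processInBatches(iterable, batchSize, handler) {
  const results = [];
  let batch = [];
  for await (const item of iterable) {
    batch.push(item);
    if (batch.length >= batchSize) {
      results.push(...(await Promise.all(batch.map(handler))));
      batch = [];
    }
  }
  if (batch.length > 0) {
    results.push(...(await Promise.all(batch.map(handler))));
  }
  return results;
}
```

With a mongoose QueryCursor this would be `processInBatches(cursor, 20, doc => axios.get(...))`, since the cursor is async-iterable in recent mongoose versions.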

Trying to make an external api call then insert data in to mongo db and then render the resulting entry to the template

I am fairly new to web development and am having a difficult time getting this route right. I want the route to call out to an API, then insert the data from the API into my Mongo DB, and then render that result to a handlebars template. I'm not getting the promises thing, I guess. Code below:
The code runs the API call, gets back data, and even inserts into the db, but will not render the template afterwards (I originally had it rendering with the API response, but I want a db id attached when it renders to the template). I am sure it has something to do with the promises. I have tried callbacks with no luck, so I tried async/await functions, and this doesn't seem to work either. Again, I still have issues with multiple callbacks, so I was trying something else.
async function getRecipeData(param) {
  let res = await axios.get("https://api.edamam.com/search?q=" + param + "&app_id=0abb0580&app_key=bc931d03c51359082244df2fa414c487");
  var dataArray = res.data.hits;
  return dataArray;
}
async function insertSearchedRecipes(resArray) {
  let response = await Recipe.create({
    name: resArray[i].recipe.label,
    image: resArray[i].recipe.image,
    url: resArray[i].recipe.url
  });
  return response;
};
router.get('/getRecipes/:ingredient', function (req, res) {
  res.redirect("/");
  var params = req.params.ingredient;
  console.log(params);
  let recipeFind = getRecipeData(params);
  recipeFind.then(function (result) {
    console.log(result[0]);
    for (i = 0; i < result.length; i++) {
      var recipeFindCreate = insertSearchedRecipes(result);
    };
    recipeFindCreate.then(function (results) {
      console.log("HELLO")
      // console.log(results);
      res.render("recipeResults", {
        data: results
      });
    });
  });
});
This is the standard async/await trap: values returned from a function declared async must ALWAYS be obtained via await or .then().
By calling recipeFindCreate.then() you only wait for the last insert; the rest run off on their own, unawaited.
let _ = [];
for (let i = 0; i < result.length; i++) {
  _.push(insertSearchedRecipes(result[i]));
}
let results = await Promise.all(_);
As you refactor the code, make sure each variable is declared properly (e.g. let i in the loop, rather than an implicit global).
async function insertSearchedRecipes(resArray) {
  let response = await Recipe.create({
    name: resArray.recipe.label, // NOT: resArray[i]
    image: resArray.recipe.image,
    url: resArray.recipe.url
  });
  return response;
}
// and:
var recipeFindCreate = insertSearchedRecipes(result[i]); // not `result`
Also, take a look at bulk/batch inserts — adding a whole series of documents in one operation (e.g. Model.insertMany).
BTW: try not to use var (function/global scope); better to use let/const.
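Putting the answer's main point in one place: start every insert, collect the promises, and await them together. A stripped-down sketch (insertOne here is a stand-in for insertSearchedRecipes, not the real model call):

```javascript
// Sketch: fire one insert per item, then wait for all of them.
// Only after Promise.all resolves is it safe to res.render().
async function insertAll(items, insertOne) {
  const pending = items.map((item) => insertOne(item));
  return Promise.all(pending);
}
```

In the route this becomes `const results = await insertAll(result, insertSearchedRecipes); res.render("recipeResults", { data: results });` (and note the route must not also call res.redirect, since a response can only be sent once).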

Mongoose inserting same data three times instead of iterating to next data

I am trying to seed the following data to my MongoDB server:
const userRole = {
role: 'user',
permissions: ['readPost', 'commentPost', 'votePost']
}
const authorRole = {
role: 'author',
permissions: ['readPost', 'createPost', 'editPostSelf', 'commentPost',
'votePost']
}
const adminRole = {
role: 'admin',
permissions: ['readPost', 'createPost', 'editPost', 'commentPost',
'votePost', 'approvePost', 'approveAccount']
}
const data = [
{
model: 'roles',
documents: [
userRole, authorRole, adminRole
]
}
]
When I iterate through this object/array and insert the data into the database, I end up with three copies of adminRole instead of the three individual roles. I feel very foolish for being unable to figure out why this is happening.
My code to iterate through the object and seed it is below. I know it's actually reaching every value, since I've done console.log testing and can get all the data properly:
for (i in data) {
  m = data[i]
  const Model = mongoose.model(m.model)
  for (j in m.documents) {
    var obj = m.documents[j]
    Model.findOne({'role': obj.role}, (error, result) => {
      if (error) console.error('An error occurred.')
      else if (!result) {
        Model.create(obj, (error) => {
          if (error) console.error('Error seeding. ' + error)
          console.log('Data has been seeded: ' + obj)
        })
      }
    })
  }
}
Update:
Here is the solution I came up with after reading everyone's responses. Two private functions generate Promise objects for deleting existing data and inserting the new data, and then all promises are settled with Promise.all.
// Stores all promises to be resolved
var deletionPromises = []
var insertionPromises = []
// Fetch the model via its name string from mongoose
const Model = mongoose.model(data.model)
// For each object in the 'documents' field of the main object
data.documents.forEach((item) => {
  deletionPromises.push(promiseDeletion(Model, item))
  insertionPromises.push(promiseInsertion(Model, item))
})
console.log('Promises have been pushed.')
// We need to fulfil the deletion promises before the insertion promises.
Promise.all(deletionPromises).then(() => {
  return Promise.all(insertionPromises).catch(() => {})
}).catch(() => {})
I won't include both promiseDeletion and promiseInsertion as they're functionally the same.
const promiseDeletion = function (model, item) {
  console.log('Promise Deletion ' + item.role)
  return new Promise((resolve, reject) => {
    model.findOneAndDelete(item, (error) => {
      if (error) reject()
      else resolve()
    })
  })
}
Update 2: You should ignore my most recent update. I've modified the result I posted a bit, but even then, half of the time the roles are deleted and not inserted. It's very random as to when it will actually insert the roles into the server. I'm very confused and frustrated at this point.
You ran into a very common JavaScript problem: you shouldn't define (async) callbacks over a shared, var-scoped variable in a regular for (-in) loop. While you loop through the three values, the first async findOne is called. Since the call is async, Node.js does not wait for it to finish before continuing to the next loop iteration, and obj counts up to the third value, here the admin role.
Because your callbacks were defined in the loop over that shared obj, by the time the first async call completes, the loop has already moved obj to the last value, which is why admin is inserted three times.
To avoid this, you can move the async functions out of the loop to force a call by value rather than by reference. Still, this can bring up other problems, so I'd recommend having a look at promises and how to combine them (e.g. put all mongoose promises in an array and await them with Promise.all), or use the more modern async/await syntax together with a for-of loop, which allows both easy readability and sequential async instructions.
Check this very similar question: Calling an asynchronous function within a for loop in JavaScript.
Note: for-of is sometimes discussed as performance-heavy, so check whether this applies to your use case.
Using async functions inside loops can cause problems like this.
You should change the way you work with findOne so that you wait for its result.
First set your function to async, then use findOne like so:
async function myFunction() {
  // exec() fires the query and returns a promise which await can handle
  let res = await Model.findOne({'role': obj.role}).exec();
  // do what you need to do here with the result...
}
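To make the suggested for-of + await approach concrete, here is a sketch of the whole seeding loop. The in-memory "model" is only a stand-in so the sequencing is visible; swap in your real mongoose model's findOne/create:

```javascript
// Sketch: seed documents sequentially, skipping ones that already
// exist. Each iteration fully awaits findOne (and create, if needed)
// before the loop advances, so `doc` is always the right object.
async function seedRoles(model, docs) {
  const created = [];
  for (const doc of docs) {
    const existing = await model.findOne({ role: doc.role });
    if (!existing) {
      created.push(await model.create(doc));
    }
  }
  return created;
}
```

Because the loop is sequential, a second occurrence of the same role sees the row the first occurrence just created, so no duplicates are inserted.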

Refactoring: return or push value to a new array from a mongoose callback

Actually I'm not sure that the title of my question is correct; if you have any idea for it, leave a comment and I'll rename it.
I am trying to rewrite my old function, which makes HTTP requests and inserts many objects into MongoDB via mongoose. I already have a working version of it, but I face a problem while using it: when I insertMany 20 arrays from 20+ requests with ~50,000 elements per request, it causes a huge memory leak, even with MongoDB optimization.
Logic of my code:
function main() {
  server.find({locale: "en_GB"}).exec(function (err, server) {
    for (let i = 0; i < server.length; i++) { // for example 20 servers
      rp({url: server[i].slug}).then(response => {
        auctions.count({
          server: server[i].name,
          lastModified: {$gte: response.data.files[0].lastModified}
        }).then(function (docs) {
          if (docs < 0) {
            // We don't insert data if they are already up-to-date
          }
          else {
            // I needed response.data.files[0].url and server[i].name from prev. block
            // And here is my problem:
            // requests & insertMany and then => loop main()
          }
        })
      }).catch(function (error) {
        console.log(error);
      })
    }
  })
}
main()
Actually, I have already tried many different things to fix it. First of all, I tried adding setTimeout after the else block, like this:
setTimeout(function () {
  // request every server with an interval, instead of all at once
}, 1000 * (i + 1));
but that created another problem for myself, because I needed to recursively call main() right afterwards. So I can't use if (i === server.length - 1) to call the garbage collector or to restart main(), because not all servers skip the count validation.
Or let's look at another example of mine:
I changed for (let i = 0; i < server.length; i++) on the 3rd line to .map and moved it closer to the else block, but setTimeout doesn't work with the .map version, and as you may already understand, the script loses the correct order and I can't add a delay with it.
Actually, I already roughly understand how to fix it: re-create the array via let array_new = []; array_new.push(response.data.files[0].url) using async/await. But I'm not a big expert in it, so I've already wasted a couple of hours. The only problem now is that I don't know how to return values from the else block.
For now I'm trying to form the array inside the else block:
function main() {
  // added: let array_new = [];
  // [v1] array_new.url += response.data.files[0].url;
  // [v2] array_new.push(response.data.files[0].url);
  return array_new
and then consume array_new via .then, but neither version works so far. So maybe someone will give me a tip or point me to an already answered Stack Overflow question that could be useful in my situation.
Since you are essentially dealing with promises, you can refactor your function logic to use async/await as follows:
async function main() {
  try {
    const servers = await server.find({locale: "en_GB"}).exec()
    // map() over async callbacks yields promises, so await them all
    // before filtering the resolved values
    const results = await Promise.all(servers.map(async ({ name, slug }) => {
      const response = await rp({ url: slug })
      const { lastModified, url } = response.data.files[0]
      const count = await auctions.count({
        server: name,
        lastModified: { $gte: lastModified }
      })
      let result = {}
      if (count > 0) result = { name, url }
      return result
    }))
    const data = results.filter(d => Object.keys(d).length > 0)
    await Model.insertMany(data)
  } catch (err) {
    console.error(err)
  }
}
Your problem is logic obscured by your promises. Your main function recursively calls itself N times, where N is the number of servers. This builds up exponentially and eats memory, both in the Node process and in MongoDB while it handles all the requests.
Instead of jumping straight into async/await, start by using the promises you have and waiting for the batch of N queries to complete before starting another batch. You can use Promise.all for this.
function main() {
  server.find({locale: "en_GB"}).exec(function (err, server) {
    // need to keep track of each promise for each server
    let promises = []
    for (let i = 0; i < server.length; i++) {
      let promise = rp({
        url: server[i].slug
      }).then(function (response) {
        // instead of nesting promises, return the promise so it is handled by
        // the next then in the chain.
        return auctions.count({
          server: server[i].name,
          lastModified: {
            $gte: response.data.files[0].lastModified
          }
        });
      }).then(function (docs) {
        if (docs > 0) {
          // do whatever you need to here regarding making requests and
          // inserting into DB, but don't call main() here.
          return requestAndInsert();
        }
      }).catch(function (error) {
        console.log(error);
      })
      // add the above promise to our list.
      promises.push(promise)
    }
    // register a new promise to run once all of the above promises generated
    // by the loop have been completed
    Promise.all(promises).then(function () {
      // now you can call main again, optionally in a setTimeout so it waits a
      // few seconds before fetching more data.
      setTimeout(main, 5000);
    })
  })
}
main()
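A middle ground between batches of N and a fully sequential loop: a fixed-size worker pool that keeps at most `limit` requests in flight at once, so one slow request doesn't stall a whole batch. This helper is my own sketch, not from either answer:

```javascript
// Sketch: map over items with at most `limit` async tasks running
// concurrently. Each worker repeatedly claims the next index (safe
// because index claiming happens synchronously, before any await)
// and stores its result in order.
async function mapWithLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // claim an index before awaiting
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

For the original use case this would be something like `await mapWithLimit(servers, 20, s => rp({ url: s.slug }))`.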

Asynchronous loop inside an asynchronous loop: a good idea?

I have some javascript code which does the following:
Read a .txt file and fill up an array of objects
Loop through these items
Loop through an array of links inside each of these items and make a request using nightmarejs
Write the result to SQL Server
My code is like this:
const Nightmare = require('nightmare');
const fs = require('fs');
const async = require('async');
const sql = require('mssql');
var links = recuperarLinks();
function recuperarLinks() {
  // Read the txt file and return an array
}
const bigFunction = () => {
  var aparelho = '';
  async.eachSeries(links, async function (link) {
    console.log('Zip Code: ' + link.zipCode);
    async.eachSeries(link.links, async function (url) {
      console.log('URL: ' + url);
      try {
        await nightmare.goto(link2)
          .evaluate(function () {
            // return some elements
          })
          .end()
          .then(function (result) {
            // ajust the result
            dadosAjustados.forEach(function (obj) {
              // save the data
              saveDatabase(obj, link.cep);
            });
          });
      } catch (e) {
        console.error(e);
      }
    }, function (err) {
      console.log('Erro: ');
      console.log(err);
    });
  }, function (erro) {
    if (erro) {
      console.log('Erro: ');
      console.log(erro);
    }
  });
}
async function salvarBanco(dados, cep) {
  const pool = new sql.ConnectionPool({
    user: 'sa',
    password: 'xxx',
    server: 'xxx',
    database: 'xxx'
  });
  pool.connect().then(function () {
    const request = new sql.Request(pool);
    const insert = "some insert"
    request.query(insert).then(function (recordset) {
      console.log('Dado inserido');
      pool.close();
    }).catch(function (err) {
      console.log(err);
      pool.close();
    })
  }).catch(function (err) {
    console.log(err);
  });
}
lerArquivo();
It works fine, but this async loop inside another async loop feels like a hack of some sort.
My outputs are something like this:
Fetching Data from cep 1
Fetching Data from url 1
Fetching Data from cep 2
Fetching Data from url 2
Fetching Data from cep 3
Fetching Data from url 3
Then it starts making the requests. Is there a better (and possibly more correct) way of doing this?
If you want to serialize your calls to nightmare.goto() and simplify your code, which is what you seem to be trying to do with await, then you can avoid mixing the callback-based async library with promises and accomplish your goal using only promises, like this:
async function bigFunction() {
  var aparelho = '';
  for (let link of links) {
    for (let url of link.links) {
      try {
        let result = await nightmare.goto(url).evaluate(function () {
          // return some elements
        }).end();
        // ajust the result
        await Promise.all(dadosAjustados.map(obj => saveDatabase(obj, link.cep)));
      } catch (e) {
        // log error and continue processing
        console.error(e);
      }
    }
  }
}
Asynchronous loop inside an asynchronous loop: a good idea?
It's perfectly fine, and sometimes necessary, to nest loops that involve asynchronous operations. But the loops have to be designed carefully, both to work correctly and to remain clean, readable, maintainable code. Your bigFunction() does not seem to be either of those to me, with its mix of async coding styles.
It works fine, but i'm finding this async loop inside another async loop like a hack of some sort.
If I were teaching a junior programmer or doing a code review for code from any level of developer, I would never allow code that mixes promises and the callback-based async library. It's just mixing two completely different programming styles for both control flow and error handling and all you get is a very hard to understand mess. Pick one model or the other. Don't mix. Personally, it seems to me that the future of the language is Promises for both async control flow and error propagation so that's what I would use.
Note: This appears to be pseudo-code because some references used in this code are not used or defined such as result and dadosAjustados. So, you will have to adapt this concept to your real code. For the future, we can offer you much more complete answers and often make suggested improvements that you're not even aware of if you include your real code, not an abbreviated pseudo-code.
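One more option, if the nesting itself bothers you (my own sketch, not from the answer above): flatten each {zipCode, links} entry into [zipCode, url] pairs first, then run a single sequential for...of loop over the pairs:

```javascript
// Sketch: turn nested {zipCode, links: [...]} entries into flat
// [zipCode, url] pairs so one loop can process them in order.
function flattenLinks(entries) {
  return entries.flatMap((entry) =>
    entry.links.map((url) => [entry.zipCode, url])
  );
}
```

The processing loop then becomes `for (const [zipCode, url] of flattenLinks(links)) { await fetchAndSave(zipCode, url); }`, with no nesting at all (fetchAndSave being a hypothetical name for the goto/save step).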
