Implementing nested async calls in node.js - javascript

Node noobie here. After doing a lot of searching and reading a lot of articles, posts, and Stackoverflow discussions, I remain stuck when implementing nested async calls in node.js. Here is the scenario I'm trying to implement:
My code receives an event to open the third party CSV file. Once the file is opened, the content must be validated. If the data is valid, it gets persisted to the backend data store. Any exception to the validation rules must be stored in a separate error log on the back end.
I've modularized the solution into separate functions: one function to open the file, the next function to validate the contents of the file, the next function to persist the content, and finally a last function to persist any errors generated by the validation function.
Each of these functions incurs certain latency, which necessitates using callbacks. But because the nesting of the callbacks can go a few levels deep, I've tried to keep my code shallow and to separate concerns and compartmentalize the logic in separate modules/functions. Now when I'm trying to string them together, I am failing at implementing correct callback logic.
For example, my top level module looks something like this:
module.exports = {
openFile: (filePath) => {
// code to open file and load the content
readAndParseFile(fileContent, validateAndReturnStatus);
}
}
function readAndParseFile(fileContent, next) {
// code to parse file content into rows of data
next(rows, persistValues);
}
function validateAndReturnStatus(rows, next) {
// code to validate
if(valid) {
persistValues(row, sendResponse);
}
}
function persistValues(row, next) {
//code to persist validated data
next('success');
}
function sendResponse(status) {
console.log(status);
}
Above is a very simplified pseudo-code, but I hope it illustrates my intention. My question is: what is the most reliable, correct way to nest functions that are unavoidably latent and make sure that the batch process is waiting on the correct status at the end of the processing?
Thanks.
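One way the pipeline described above could be wired together is to have each stage return a promise and let a single async function await them in order. The sketch below is only an illustration: the CSV parsing, the validation rule (a row needs exactly two non-empty fields), and the in-memory `store` and `errorLog` arrays are stand-ins invented here for the real file format and back end.

```javascript
const fs = require('fs').promises;

// Hypothetical in-memory stand-ins for the backend data store and error log.
const store = [];
const errorLog = [];

// Parse raw CSV text into rows of trimmed fields (no quoting support).
function parseRows(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => line.split(',').map((field) => field.trim()));
}

// Assumed rule for illustration: a row is valid when it has exactly
// two non-empty fields.
function validateRows(rows) {
  const valid = [];
  const errors = [];
  rows.forEach((row, index) => {
    const ok = row.length === 2 && row.every((field) => field !== '');
    if (ok) valid.push(row);
    else errors.push({ rowNumber: index + 1, row });
  });
  return { valid, errors };
}

// Persistence stages are async so a real database call can replace them.
async function persistValues(rows) {
  store.push(...rows);
}

async function persistErrors(errors) {
  errorLog.push(...errors);
}

// Each await finishes before the next stage starts, so the returned
// status reflects the whole batch.
async function processFile(filePath) {
  const text = await fs.readFile(filePath, 'utf8');
  const { valid, errors } = validateRows(parseRows(text));
  await persistValues(valid);
  await persistErrors(errors);
  return { saved: valid.length, rejected: errors.length };
}

module.exports = { processFile };
```

A caller would wait on the final status with something like `processFile(filePath).then(sendResponse)`, and a failure in any stage surfaces as a single rejected promise instead of being lost inside a nested callback.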

Related

How to initialize a child process with passed in functions in Node.js

Context
I'm building a general purpose game playing A.I. framework/library that uses the Monte Carlo Tree Search algorithm. The idea is quite simple, the framework provides the skeleton of the algorithm, the four main steps: Selection, Expansion, Simulation and Backpropagation. All the user needs to do is plug in four simple(ish) game related functions of his making:
a function that takes in a game state and returns all possible legal moves to be played
a function that takes in a game state and an action and returns a new game state after applying the action
a function that takes in a game state and determines if the game is over and returns a boolean and
a function that takes in a state and a player ID and returns a value based on whether the player has won, lost or the game is a draw. With that, the algorithm has all it needs to run and select a move to make.
What I'd like to do
I would love to make use of parallel programming to increase the strength of the algorithm and reduce the time it needs to run each game turn. The problem I'm running into is that, when using Child Processes in NodeJS, you can't pass functions to the child process and my framework is entirely built on using functions passed by the user.
Possible solution
I have looked at this answer but I am not sure this would be the correct implementation for my needs. I don't need to be continually passing functions through messages to the child process, I just need to initialize it with functions that are passed in by my framework's user, when it initializes the framework.
I thought about one way to do it, but it seems so inelegant, on top of probably not being the most secure, that I find myself searching for other solutions. I could, when the user initializes the framework and passes his four functions to it, get a script to write those functions to a new js file (let's call it my-funcs.js) that would look something like:
const func1 = {... function implementation...}
const func2 = {... function implementation...}
const func3 = {... function implementation...}
const func4 = {... function implementation...}
module.exports = {func1, func2, func3, func4}
Then, in the child process worker file, I guess I would have to find a way to lazy load require my-funcs.js. Or maybe I wouldn't, I guess it depends how and when Node.js loads the worker file into memory. This all seems very convoluted.
Can you describe other ways to get the result I want?
child_process is less about running a user's function and more about starting a new process to exec a file or command.
Node is inherently a single-threaded system, so for I/O-bound things, the Node Event Loop is really good at switching between requests, getting each one a little farther. See https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/
What it looks like you're doing is trying to get JavaScript to run multiple threads simultaneously. Short answer: can't ... or rather it's really hard. See is it possible to achieve multithreading in nodejs?
So how would we do it anyway? You're on the right track: child_process.fork(). But it needs a hard-coded function to run. So how do we get user-generated code into place?
I envision a datastore where you can take userFn.toString() and save it to a queue. Then fork the process, and let it pick up the next unhandled thing in the queue, marking that it did so. Then write the results to another queue, and this "GUI" thread then polls against that queue, returning the calculated results back to the user. At this point, you've got multi-threading ... and race conditions.
Another idea: create a REST service that accepts the userFn.toString() content and execs it. Then in this module, you call out to the other "thread" (service), await the results, and return them.
Security: Yeah, we just flung this out the window. Whether you're executing the user's function directly, calling child_process#fork to do it, or shimming it through a service, you're trusting untrusted code. Sadly, there's really no way around this.
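The userFn.toString() idea can be sketched as a round trip: turn the function into source text, ship the text across the process boundary, and rebuild it on the other side. The helper names `serializeFn` and `reviveFn` below are made up for this illustration, and the approach assumes the user's functions are self-contained, since variables captured from an enclosing scope do not survive serialization. Rebuilding code from a string executes it with full privileges, which is the security trade-off described above.

```javascript
// Turn a function into source text that can travel in a message,
// for example via child.send({ source }) after child_process.fork().
function serializeFn(fn) {
  return fn.toString();
}

// Rebuild a callable function from its source text. Wrapping the source
// in parentheses makes function declarations and arrow functions parse
// as expressions.
function reviveFn(source) {
  return new Function(`return (${source});`)();
}

// Example user-supplied function: legal moves are the empty cells.
const legalMoves = (state) =>
  state.board
    .map((cell, index) => (cell === null ? index : -1))
    .filter((index) => index !== -1);

const source = serializeFn(legalMoves);
const revived = reviveFn(source);

console.log(revived({ board: ['x', null, 'o', null] })); // [ 1, 3 ]
```

Because the functions only need to be sent once, the child could revive all four on its first message and keep them for the rest of its lifetime.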
Assuming that security isn't an issue you could do something like this.
// Client side
<input id="func1"> // For example user inputs '(gamestate)=>{return 1}'
<input id="func2">
<input id="func3">
<input id="func4">
<script>
socket.on('syntax_error',function(err){alert(err)});
function submit_funcs_strs(){
// Get function strings from user input and then put into array
socket.emit('functions',[document.getElementById('func1').value,document.getElementById('func2').value,...
}
</script>
// Server side
// Socket listener is async
socket.on('functions',(funcs_strs)=>{
let funcs = []
for (let i = 0; i < funcs_strs.length; i++){
try {
// Evaluate each submitted string individually
funcs.push(eval(funcs_strs[i]));
} catch (e) {
if (e instanceof SyntaxError) {
socket.emit('syntax_error',e.message);
return;
}
}
}
// Run algorithm here
});

Page Object Pattern asynchronous using node.js selenium

I am having a hard time adjusting to asynchronous code in node.js. I ran into an issue when using selenium-webdriver and the page object pattern. I feel like some things have to be synchronous when doing automation testing, or your tests will fail because you clicked a button before inserting data. I am having an issue similar to this. I want to add an employee and then search for the employee, but the search runs before the add has finished.
var employee = new Employee('grimlek', 'Charles', 'Sexton', 'TitleTitle',
'Upper Management', 'Company Admin', 'Contractor', '-7', 'Remote',
'05212016', '3369407787', '3368791234', 'charles#example.com',
'charles.sexton', 'Skype', 'abcdefgh');
driver.get('https://website.com/login')
.then(function() {
//This behaves as intended
loginPage.login('company.admin', 'password') })
.then(function() {
//Add employee
employeePage.addEmployee(employee) })
.then(function() {
//Search for employee after employee is added
employeePage.searchEmployee(employee)});
EmployeePage Object
var EmployeePage = function (driver) {
this.addEmployee = function (employee) {
driver.findElement(webdriver.By.css('button[class=\'btn btn-default\']')).then(function (element) {
//
//Search employee function is done before the line below this
//
element.click();
}).then(function () {
setTimeout(function () {
driver.findElement(webdriver.By.id('employee_username')).then(function (element) {
element.sendKeys(employee.username);
});
driver.findElement(webdriver.By.id('employee_first_name')).then(function (element) {
element.sendKeys(employee.firstName);
});
driver.findElement(webdriver.By.id('employee_last_name')).then(function (element) {
element.sendKeys(employee.lastName);
});
driver.findElement(webdriver.By.id('employee_title_id')).then(function (element) {
element.sendKeys(employee.title);
});
driver.findElement(webdriver.By.id('employee_role')).then(function (element) {
element.sendKeys(employee.role);
});
}, 5000);
});
//
//
//Search employee should occur when the thread leaves the function
//
};
this.searchEmployee = function (employee) {
driver.findElement(webdriver.By.css('input[class=\'form-control ng-pristine ng-valid\']')).then(function(element) {
element.sendKeys(employee.firstName + ' ' + employee.lastName);
});
};
};
module.exports = EmployeePage;
I know that both searchEmployee and addEmployee functions don't return a promise and I am trying to chain them with the .then function. I do believe this is sorta my problem but I need help with how it should be done and not how I can rig it. Should I use callbacks? I have worked on this problem for going on four hours now and I have tried googling and doing research on various topics. If I didn't provide enough code please let me know and I will provide a simplified runnable example.
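The ordering problem can be reproduced without a browser. In the sketch below, `addEmployee` and `searchEmployee` are stand-ins that simulate slow WebDriver work with a timer rather than real selenium-webdriver calls; the point is only that each method returns its promise and each `.then` callback returns the method's result, so the chain waits for one step before starting the next.

```javascript
const events = [];

// Resolve after `ms` milliseconds, standing in for browser latency.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stand-in page object: every method RETURNS the promise for its work.
const employeePage = {
  addEmployee(employee) {
    return delay(50).then(() => {
      events.push(`added ${employee.username}`);
    });
  },
  searchEmployee(employee) {
    return delay(10).then(() => {
      events.push(`searched ${employee.username}`);
    });
  },
};

const employee = { username: 'grimlek' };

// Returning the inner promise from each callback is what makes the
// next .then wait for it.
const run = Promise.resolve()
  .then(() => employeePage.addEmployee(employee))
  .then(() => employeePage.searchEmployee(employee))
  .then(() => events);

run.then((result) => console.log(result)); // [ 'added grimlek', 'searched grimlek' ]
```

Applied to the page object in the question, this would mean returning the `driver.findElement(...)` chain from each method and returning `employeePage.addEmployee(employee)` from the `.then` callback; the work inside the `setTimeout` would also have to be wrapped in a promise, since a bare timer is invisible to the chain.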
A laudable goal is to make each test independent. If a change is made to the application (e.g., a bug fix), only the impacted test(s) need to be executed. It also makes moving to a grid feasible.
But this is difficult to achieve in practice. Your test has to include all tests needed to satisfy the prerequisites.
Cucumber has feature files that include scenarios. Each scenario is a test. Scenarios are executed in the order they are listed in the feature file. So one way to organize things is to include all the prerequisite scenarios before your test in a feature file. You can add tag(s) before the Feature statement so that when you execute that tag the entire feature file runs. Perhaps the first scenario resets (a subset of) the database to a known state.
The trick would be to run features in parallel on multiple machines. If you point those multiple clients to the same server beware that the features should not create or update overlapping entities that could collide when written to the database by the server. E.g. "What do you mean that user 'tom' already exists?" Each feature needs to create a unique user name.
The approach with Cucumber is to divide your steps so that each one covers an individual operation.
Ex:
Given I am on XYZ Form
And I provide all form details
In the above case, the step "And I provide all form details" would cover all the fields, so name, last name, address and so on would all be filled in within a single step definition.
Instead of this we should divide the step for every individual field like:
Given I am on XYZ Form
And I provide name details in XYZ Form
And I provide last name details in XYZ Form
And I provide address details in XYZ Form
And then we will be writing three step definitions, which of course will run sequentially.
You may feel that this increases the typing work and the number of step definitions unnecessarily, but it will actually help you when a field gets removed from the application itself: you will only need to delete the related step from the feature file.
Moreover, you can easily test validation for fields by just commenting out one of the steps in your feature file.
And your code will be easier to maintain, as every step works independently.
And of course the steps will run sequentially.

Wait until the data returns, synchronous call with angular and breeze

I'm working on a small blog engine where the user can create a blog entry and link tags to it. This is a many-to-many relation, but because Breeze cannot yet manage this kind of relation, I have to expose the join table to Breeze so that I can persist the data step by step. And this is where my problem is.
Tables:
BlogEntry
BlogEntryTag
Tag
Scenario:
user opens the "new blog entry" form or selects an existing one to be edited
enters the text, etc
selects one or more tags
Business logic:
create a new entity by Breeze / query the selected one
save the blog entry (1st server call which gives back the blog_id if the blog entry is new one)
check the already existing connections between the tags and blog entry, if the blog entry is edited then the already existing blogEntry-tag relations might change ( 2nd server call)
based on the tag name selecting the tag_id from tag table (3rd server call)
create the BlogEntrytag entities by breeze
persist the BlogEntrytag entities into database ( 4th server call)
I think the order must be consecutive.
I have this code and, as you can see in the attached screenshot, the console logging marked '_blogEntryEnttity' does not wait until the data returns from the server; it is executed before the console logging marked '_blogEntryEnttity inside'. The code throws a reference exception when it tries to set the title property a few lines later.
var blogEntryEntityQueryPromise = datacontext.blogentry.getById(_blogsObject.id);
blogEntryEntityQueryPromise.then(function (result)
{
console.log('result', result);
_blogEntryEntity = result[0];
console.log('_blogEntryEnttity inside', _blogEntryEntity);
//if I need synchronous execution then I have to put the code here which must be executed consecutively
});
console.log('_blogEntryEnttity', _blogEntryEntity);
}
//mapping the values we got
_blogEntryEntity.title = _blogsObject.title;
_blogEntryEntity.leadWithMarkup = _blogsObject.leadWithMarkup;
_blogEntryEntity.leadWithoutMarkup = _blogsObject.leadWithoutMarkup;
_blogEntryEntity.bodyWithMarkup = _blogsObject.bodyWithMarkup;
_blogEntryEntity.bodyWithoutMarkup = _blogsObject.bodyWithoutMarkup;
console.log('_blogEntryEnttity', _blogEntryEntity);
The example comes from here.
My question is: why does it not wait until the data comes back? What is the way to handle cases like this?
However, I figured out that, if I need synchronous execution then I should place the code into the success method following the data retrieving from the promise. However, I really don't like this solution because my code will be ugly after a while and hard to maintain.
The datacontext.blogentry.getById looks like below and the implementation is in an abstract class, you can find the code below too. The whole repository pattern comes from John Papa's course on Pluralsight.
Repository class method
function getById(id)
{
return this._getById(this.entityName, id);
}
Abstract repository class method. According to Breeze's documentation page the EntityQuery class' execute method returns a Promise.
function _getById(resource, id) {
var self = this;
var manager = self.newManager;
var Predicate = breeze.Predicate;
var p1 = new Predicate('id', '==', id);
return EntityQuery.from(resource)
.where(p1)
.using(manager).execute()
.then(success).catch(_queryFailed);
function success(data) {
return data.results;
}
}
I appreciate your help in advance!
I don't think you need all these round trips. I'd do this:
Query all available Tag entities, so they'll be in the EntityManager's cache (you need these to populate the UI anyway).
If it's an existing BlogEntry, just query the BlogEntry and all its associated BlogEntryTag entities; Breeze will connect the BlogEntryTags to their associated Tags in the cache. You'll add/delete BlogEntryTags if the user selects/unselects Tags for the BlogEntry.
var query = EntityQuery.from("BlogEntries").where("id", "==", id).expand("BlogEntryTags");
If it's a new BlogEntry, it won't have any BlogEntryTags. You'll create these when you save, after the user selects some tags.
Save the added/updated BlogEntry and any added/deleted BlogEntryTag entities to the database in a single saveChanges call.
See the Presenting Many-to-Many doc and its associated plunker for a deeper dive. The UI is different from what you want, but the underlying concepts are useful.
why does it not wait until the data comes back?
Because promises don't magically synchronize execution. They're still asynchronous, they still rely on callbacks.
What is the way of handling cases like this?
You need to put the code that should wait in the then callback.
However, I really don't like this solution because my code will be ugly after a while and hard to maintain.
Not really, you can write concise and elegant asynchronous code with promises. If your code is turning into spaghetti, abstract parts of it into their own functions. You should be able to get to a clean and flat promise chain.
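A flat chain for the blog entry case might look like the sketch below. The `getById` and `save` functions here are hypothetical stand-ins that return already-resolved promises, not Breeze APIs, and `firstResult` and `applyValues` are names invented for this example; only the shape of the chain is the point.

```javascript
// Hypothetical stand-in for datacontext.blogentry.getById: resolves
// with an array of matching entities, like the query in the question.
function getById(id) {
  return Promise.resolve([{ id, title: '', bodyWithMarkup: '' }]);
}

// Hypothetical stand-in for a save call that returns a promise.
function save(entity) {
  return Promise.resolve({ saved: true, entity });
}

// Each step is a small named function, so the chain stays flat.
function firstResult(results) {
  return results[0];
}

function applyValues(values) {
  return (entity) => Object.assign(entity, values);
}

function updateBlogEntry(blogsObject) {
  return getById(blogsObject.id)
    .then(firstResult)
    .then(applyValues({ title: blogsObject.title }))
    .then(save);
}

updateBlogEntry({ id: 7, title: 'Hello' }).then((outcome) => {
  console.log(outcome.entity.title); // Hello
});
```

The mapping code that previously ran too early now lives in one step of the chain, so it only executes once the entity has arrived.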

Conflicting purposes of IndexedDB transactions

As I understand it, there are three somewhat distinct reasons to put multiple IndexedDB operations in a single transaction rather than using a unique transaction for each operation:
Performance. If you’re doing a lot of writes to an object store, it’s much faster if they happen in one transaction.
Ensuring data is written before proceeding. Waiting for the “oncomplete” event is the only way to be sure that a subsequent IndexedDB query won’t return stale data.
Performing an atomic set of DB operations. Basically, “do all of these things, but if one of them fails, roll it all back”.
#1 is fine, most databases have the same characteristic.
#2 is a little more unique, and it causes issues when considered in conjunction with #3. Let’s say I have some simple function that writes something to the database and runs a callback when it's over:
function putWhatever(obj, cb) {
var tx = db.transaction("whatever", "readwrite");
tx.objectStore("whatever").put(obj);
tx.oncomplete = function () { cb(); };
}
That works fine. But now if you want to call that function as a part of a group of operations you want to atomically commit or fail, it's impossible. You'd have to do something like this:
function putWhatever(tx, obj, cb) {
tx.objectStore("whatever").put(obj).onsuccess = function () { cb(); };
}
This second version of the function is very different than the first, because the callback runs before the data is guaranteed to be written to the database. If you try to read back the object you just wrote, you might get a stale value.
Basically, the problem is that you can only take advantage of one of #2 or #3. Sometimes the choice is clear, but sometimes not. This has led me to write horrible code like:
function putWhatever(tx, obj, cb) {
if (tx === undefined) {
tx = db.transaction("whatever", "readwrite");
tx.objectStore("whatever").put(obj);
tx.oncomplete = function () { cb(); };
} else {
tx.objectStore("whatever").put(obj).onsuccess = function () { cb(); };
}
}
However even that still is not a general solution and could fail in some scenarios.
Has anyone else run into this problem? How do you deal with it? Or am I simply misunderstanding things somehow?
The following is just opinion as this doesn't seem like a 'one right answer' question.
First, performance is an irrelevant consideration. Avoid this factor entirely, unless later profiling suggests a material problem. Chances of perf issues are ridiculously low.
Second, I prefer to organize requests into transactions solely to maintain integrity. Integrity is paramount. Integrity as I define it here simply means that the database at any one point in time does not contain conflicting or erratic data. Essentially the database is never able to enter into a 'bad' state. For example, to impose a rule that cross-store object references point to valid and existing objects in other stores (a.k.a. referential integrity), or to prevent duplicated requests such as a double add/put/delete. Obviously, if the app were something like a bank app that credits/debits accounts, or a heart-attack monitor app, things could go horribly wrong.
My own experience has led me to believe that code involving indexedDB is not prone to the traditional facade pattern. I found that what worked best, in terms of organizing requests into different wrapping functions, was to design functions around transactions. I found that quite often there are very few DRY violations because every request is nearly always unique to its transactional context. In other words, while a similar 'put object' request might appear in more than one transaction, it is so distinct in its behavior given its separate context that it merits violating DRY.
If you go the function-per-request route, I am not sure why you are checking if the transaction parameter is undefined. Have the caller create the transaction and then pass it to the requests in turn. Expect the tx to always be defined and do not over-zealously guard against it. If it is ever not defined there is either a serious bug in indexedDB or in your calling function.
Explicitly, something like:
function doTransaction1(db, onComplete) {
var tx = db.transaction(...);
tx.oncomplete = onComplete;
doRequest1(tx);
doRequest2(tx);
doRequest3(tx);
}
function doRequest1(tx) {
var store = tx.objectStore(...);
// ...
}
// ...
If the requests should not execute in parallel, and must run in a series, then this indicates a larger and more difficult design issue.
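One way to reconcile atomic grouping with waiting for the commit is to let a single helper own the transaction and report only on `oncomplete`, while the individual request functions just receive the `tx`. The helper name `withTransaction` is invented for this sketch; it relies only on the standard `db.transaction`, `oncomplete`, `onerror`, `onabort`, and `abort` members, and it is meant to run in a browser where `db` is an open `IDBDatabase`.

```javascript
// Run `work(tx)` inside one transaction and resolve only once the
// transaction has committed, so later reads cannot see stale data.
function withTransaction(db, storeNames, mode, work) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction(storeNames, mode);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
    tx.onabort = () => reject(tx.error);
    try {
      work(tx);
    } catch (err) {
      // A synchronous failure while queuing requests rolls everything back.
      tx.abort();
      reject(err);
    }
  });
}

// Request functions take the transaction and queue their work on it;
// they do not decide when the caller may proceed.
function putWhatever(tx, obj) {
  tx.objectStore('whatever').put(obj);
}

// Atomic group: both puts commit together or not at all.
function putBoth(db, first, second) {
  return withTransaction(db, 'whatever', 'readwrite', (tx) => {
    putWhatever(tx, first);
    putWhatever(tx, second);
  });
}
```

A single write is then `withTransaction(db, 'whatever', 'readwrite', (tx) => putWhatever(tx, obj))`, so the same request function serves both the standalone and the grouped case.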

PyBossa loading and presenting tasks

I am trying to set up a project on CrowdCrafting.org by using the PyBOSSA framework.
I followed their tutorial for project development.
The first parts seemed very clear to me, creating the project and adding the tasks worked fine.
Then I built my own HTML webpage to present the task to the users. Now the next step would be to load the tasks from the project, present them to the users, and save their answers.
Unfortunately, I don't understand how to do this.
I will try to formulate some questions to make you understand my problem:
How can I try this out? The only way seems to be by updating the code and then running pbs update_project
Where can I find documentation for PyBossa.js? I just saw (in the tutorial and on other pages) that there are some functions like pybossa.taskLoaded(function(task, deferred){}); and pybossa.presentTask(function(task, deferred){});. But I don't know how they work and what else there is. This page looks like it would contain some documentation, but it doesn't (broken links or empty index).
How do I use the library? I want to a) load a task, b) present it to the user, c) show the user his progress, and, d) send the answer. So I think I'll have to call 4 different functions. But I don't know how.
Looking at the example project's code, I don't understand what this stuff about loading disqus is. I think disqus is a forum software, but I am not sure about that and I don't know what this has to do with my project (or theirs).
As far as I understand, the essential parts of the JS-library are:
pybossa.taskLoaded(function(task, deferred) {
if ( !$.isEmptyObject(task) ) {
deferred.resolve(task);
}
else {
deferred.resolve(task);
}
});
pybossa.presentTask(function(task, deferred) {
if ( !$.isEmptyObject(task) ) {
// choose a container within your html to load the data into (depends on your layout and on the way you created the tasks)
$("#someID").html(task.info.someName);
// on clicking "next_button", save the answer and load the next task
$("#next_button").click( function () {
// save answer into variable here
var answer = $("#someOtherID").val();
if (typeof answer != 'undefined') {
pybossa.saveTask(task.id, answer).done(function() {
deferred.resolve();
});
}
});
}
else {
$("#someID").html("There are no more tasks to complete. Thanks for participating in ... ");
}
});
pybossa.run('<short name>');
I will try to answer your points one by one:
You can either run pbs update_project or go to the project page >
tasks > task presenter and edit the code there.
I believe this link works, and there you should find the
information you want.
So, once you've created the project and added the tasks and the
presenter (the HTML you've built) you should include the Javascript
code inside the presenter itself. You actually only need to write
those two functions: pybossa.taskLoaded(function(task,
deferred){}); and pybossa.presentTask(function(task, deferred){});
Within the first one you'll have to write what you want to happen
once the task has been loaded but before you're ready to present it
to the user (e.g. load additional data associated to the tasks,
other than the task itself, like images from external sites). Once
this is done, you must call deferred.resolve(), which is the way
to tell pybossa.js that we are done with the load of the task
(either if it has been successful or some error has happened).
After that, you must write the callback for the second one
(pybossa.presentTask) where you set up everything for your task,
like the event handlers for the button answer submission and here is
where you should put the logic of the user completing the task
itself, and where you should then call pybossa.saveTask(). Again,
you should in the end call deferred.resolve() to tell pybossa.js
that the user is done with this task and present the next one. I
would recommend doing it inside the callback for
pybossa.saveTask(task).done(callbackFunc()), so you make sure you
go on to the next task once the current one has correctly been
saved.
You can forget about that Disqus code. These are only templates
provided, which include some code to allow people to
comment on the tasks. For that, Disqus is used, but it is up to
you whether you want to use it or not, so you can safely remove this
code.
