Server-side solution for handling asynchronous tasks - javascript

I'm building an application in React that lets users upload pictures to an S3 bucket. After the upload is finished, I want to spin up some workers to process the pictures through a neural network that will analyze each image and tag it based on its content.
I don't want to run this action on the server itself, but rather delegate it to a separate set of instances that will handle the processing. What would be the best solution to handle such a problem? It'll need to scale nicely depending on the amount of data to process.
It'd be great if this can be easily integrated using Node.js or Python if possible.


How to use a CNN code in python inside a website?

I have a website with a backend in Python (Django) and JavaScript, hosted on Heroku. I also have Python code that does image classification with EfficientNet, and I want to integrate this code into my website.
The logical sequence of ideas is as follows:
The user uploads an image on the site;
This image will be classified with the Python code;
The algorithm will return an image;
The returned image should be posted on the site.
Does anyone know what would be the best way to do this?
First of all, yes, it is possible to implement what you describe. I would do the following:
Use Celery to run the classification as an asynchronous task. When the photo is uploaded, Django hands the job (in this case, running the CNN) to Celery and stores the photo with a pending status. Once the task completes, it changes the status and the photo appears as published on the platform.
I recommend using asynchronous tasks for this because of the following:
Running the convolutional neural network can take a while. Since the site is hosted on Heroku, remember that the Heroku router cuts off any request that has not started responding within 30 seconds; the user would see that as an error, and even when the request succeeds, making them wait that long makes the site feel slow. With asynchronous tasks, the HTTP response can immediately tell the user that the image is being analyzed, and the analysis itself is no longer bound by the 30-second limit. Many simultaneous uploads could also overload the web server if the classification ran inside the request; Celery avoids this by placing the tasks on a queue (backed by Redis or RabbitMQ) and letting workers consume them at their own pace.
If you want to report the status of the image in real time, you could add a WebSocket: the upload response would include the URL of a WebSocket on which the client receives information about the image once it has been processed. You can use Django Channels for this.
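The pending/published flow can be sketched with the standard library alone. This is not Celery: a ThreadPoolExecutor stands in for the Celery workers and a dict stands in for the database, and the names photo_status, classify_image, process_photo and handle_upload are all hypothetical. It only illustrates the shape of the idea: the request records "pending", enqueues the work and returns at once, and the background task updates the status when it is done.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins (assumptions): a dict instead of a database table,
# a thread pool instead of Celery workers fed by Redis or RabbitMQ.
photo_status = {}
executor = ThreadPoolExecutor(max_workers=2)


def classify_image(image_bytes):
    """Hypothetical placeholder for the slow CNN call."""
    return "large" if len(image_bytes) > 10 else "small"


def process_photo(photo_id, image_bytes):
    """Runs in the background and records the outcome for the photo."""
    try:
        label = classify_image(image_bytes)
        photo_status[photo_id] = {"state": "published", "label": label}
    except Exception as exc:
        photo_status[photo_id] = {"state": "failed", "error": str(exc)}


def handle_upload(image_bytes):
    """What the HTTP view would do: mark as pending, enqueue, return at once."""
    photo_id = str(uuid.uuid4())
    photo_status[photo_id] = {"state": "pending"}
    future = executor.submit(process_photo, photo_id, image_bytes)
    return photo_id, future
```

The view would return photo_id to the browser, which can then poll for the status (or listen on a WebSocket) until the state is no longer "pending".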

Difference between WebApp that connects to API vs backend rendering

Sometimes when I create basic web tools, I will start with a Node.js backend, typically creating an API server with Express.js. When certain routes are hit, the server responds by rendering the HTML from EJS using the live state of the connection and then sends it over to the browser.
This app will typically expose a directory for the public static resources and will serve those as well. I imagine this creates a lot of overhead for this form of web app, but I'm not sure.
Other times I will start with an API (which could be the exact same Node.js structure, with no HTML rendering, just state management and API exposure) and I will build an Angular 2 or other HTML web page that will connect to the API, load in information on load, and populate the data in the page.
These pages tend to rely on a lot of AJAX calls and jQuery in order to refresh Angular components after a bunch of async callbacks get triggered. In this structure, I'll use a web server like Apache to serve all the files and define the routes, and the JS in the web pages will do the rest.
What are the overall strengths and weaknesses of both? And why should I use one strategy versus the other? Are they both viable and dependent upon scale and use? I imagine horizontal scaling with load balancers could work in both situations.
There is no good or bad approach here. Each of the approaches you described has some advantages, and you need to decide which one best suits your project.
Some points that you might consider:
Server-side processing
Security - You don't have to expose sensitive information (API tokens, logins, etc.).
More control - You will have more control over what you do with your resources.
"Better" client support - Some clients (IE) do not support the same features as others. Rendering HTML on the server rather than manipulating it on the client gives you support for more clients.
It can be simpler to pre-render your resources on the server than to deal with an asynchronous approach on the client.
SEO, social sharing, etc. - Bots see your resources exactly as your server sends them. If you pre-render everything on the server, bots will be able to scrape your site, tag it, etc. If you render on the client, they will just see the unprocessed page. That being said, there are ways to work around that.
Client-side processing
Waiting times - Doing work on the client side can improve your load times. But be careful not to do too much, since JS is single-threaded and heavy work will block your UI.
CDN - You can serve static resources (HTML, CSS, JS, etc.) from a CDN, which will be much faster than serving them from your server app directly.
Testing - It is easy to mock the backend server when testing your UI.
A client is a front end for a particular application, device, etc. The more logic you put into the client, the more code you will have to replicate across different clients. Therefore, if you plan to have a mobile app, it will be better to have a collection of APIs to call than to include your logic in the client.
Security - Whatever runs on the client can be fully read by the client. No matter how much you minify, compress or encrypt everything, a resourceful person will always be able to do whatever they want with your code.
I did not mark pro/con on each point on purpose because it is up to you to decide which it is.
This list could go on and on, I didn't want to think about more points because it is very subjective, and in the end it depends on the developer and the application.
I personally tend to choose the "client making AJAX requests" approach or a blend of both: pre-render something on the server and let the client take care of the rest. Be careful with the latter, though, as it will break your automated tests, IDE integration, etc. if not implemented correctly.
Last note - You should always do crucial validations on the server. Never rely on data from the client.
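As a small illustration of that last note, here is a sketch of a server-side check that re-validates an upload description no matter what the client claims to have checked already. The field names, the allowed content types and the 5 MiB limit are all hypothetical.

```python
ALLOWED_CONTENT_TYPES = {"image/png", "image/jpeg"}  # hypothetical whitelist
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # hypothetical 5 MiB limit


def validate_upload(payload):
    """Return a list of error messages; an empty list means the payload is acceptable."""
    errors = []

    filename = payload.get("filename")
    if not isinstance(filename, str) or not filename.strip():
        errors.append("filename is required")

    if payload.get("content_type") not in ALLOWED_CONTENT_TYPES:
        errors.append("unsupported content type")

    size = payload.get("size")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(size, int) or isinstance(size, bool) or not 0 < size <= MAX_UPLOAD_BYTES:
        errors.append("size must be between 1 byte and the upload limit")

    return errors
```

The client can run the same checks for faster feedback, but the server's answer is the one that counts.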

Uploading large image files and video to Google Cloud Storage

I am using the standard Python App Engine environment and am currently looking at how to upload multiple large media files to Google Cloud Storage (public-readable), either through App Engine or directly from the client (preferred).
I currently send a batch of smaller images (20 at most, between 30 and 100 KB each on average) at the same time via a POST directly to the server. These images are provided by the client and put in my project's default bucket. I handle the request's images in a separate thread, write them one at a time to the cloud and then associate them with an ndb object. This is all fine and dandy when the images are small and do not cause the request to run out of memory or raise a DeadlineExceededError.
But what is the best approach for large image files of 20 MB or more apiece, or video files of up to 1 GB? Are there efficient ways to do this directly from the client? Would it be possible via the JSON API with a resumable upload, for example? If so, are there any clear examples of how to do this purely in JavaScript on the client? I have looked at the docs, but it's not intuitively obvious, at least to me.
I have been looking at the possibilities for a day or two, but nothing gives a clear, linear description or approach. I notice in the Google docs there is a way to upload via a POST directly from the client using PHP. Is this only relevant to PHP on App Engine, or is there an equivalent to createUploadUrl for Python or JavaScript?
Anyway, I'll keep exploring but any pointers would be greatly appreciated.
Try the Blobstore API with Cloud Storage, or the Images service.

Moving node.js server javascript processing to the client

I'd like some opinions on the practical implications of moving processing that would traditionally be done on the server to be handled instead by the client in a node.js web app.
Example case study:
The user uploads a CSV file containing a year's worth of their bank statement entries. We want to parse the file, categorise each entry and calculate cumulative values for each category so that we can store the newly categorised statement in a db and display spending analysis to the user.
The entries are categorised by matching strings in the descriptions. There are many categories and many entries and it takes a fair amount of time to process.
In our node.js server, we can happily free up the event loop whilst waiting for network responses and so on, but if there is any data crunching or similar processing, the server will be blocked from responding to requests, and this seems unavoidable.
Traditionally, the CSV file would be passed to the server, the server would process, save in db, and send back the output of the processing.
It seems to make sense in our single threaded node.js server that this processing is handled by the browser, and the output displayed and sent to server to be stored. Of course the client will have to wait while this is done, but their processing will not be preventing the server from responding to requests from other clients.
I'm interested to see if anyone has had experience building apps using this model.
So, the question is: are there any issues in getting browsers rather than the server to handle, wherever possible, any processing that will block the event loop? Is this a good/sensible/viable approach to node.js application development?
I don't think trusting client processed data is a good idea.
Instead you should look into creating a work queue that a separate process listens on, separating the CPU intensive tasks from your node.js process handling HTTP requests.
My proposed data flow would be:
HTTP upload request
App server (save raw file somewhere the worker process can access)
Notification to 'csv' work queue
Worker processes uploaded csv file.
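A minimal sketch of that data flow, under some loud assumptions: a thread and an in-process queue stand in for the separate worker process and the message broker, the job carries the CSV text directly instead of a path to the saved file, and the keyword rules, the column names (description, amount) and all function names are hypothetical.

```python
import csv
import io
import queue
import threading
from collections import defaultdict

# Hypothetical keyword-to-category rules; real rules would come from the app.
CATEGORY_KEYWORDS = {
    "groceries": ["supermarket", "grocer"],
    "transport": ["bus", "rail"],
}


def categorise(description):
    """Pick the first category whose keyword appears in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorised"


def process_csv(raw_text):
    """Parse the statement and sum the amounts per category."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(raw_text)):
        totals[categorise(row["description"])] += float(row["amount"])
    return dict(totals)


def worker(work_queue, results):
    """Consume jobs until the None sentinel arrives."""
    while True:
        job = work_queue.get()
        if job is None:
            break
        job_id, raw_text = job
        results[job_id] = process_csv(raw_text)


def run_jobs(jobs):
    """Start one worker, enqueue (job_id, csv_text) pairs, wait, return results."""
    results = {}
    work_queue = queue.Queue()
    thread = threading.Thread(target=worker, args=(work_queue, results), daemon=True)
    thread.start()
    for job in jobs:
        work_queue.put(job)
    work_queue.put(None)  # sentinel: tell the worker to stop
    thread.join()
    return results
```

In a real deployment the worker would be a separate process (or machine) so that the parsing never competes with the node.js process for the event loop; the request handler's only job is to save the file and enqueue the notification.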
Although perfectly possible, simply shifting the processing to the client machine does not solve the basic problem.
Now the client's event loop is blocked, preventing the user from interacting with the browser. Browsers tend to detect this problem and stop execution of the page's script altogether, which your users will certainly hate.
There is no way around either delegating or splitting up the workload.
Using a second process (for example, a second Node instance) for the number crunching on the server has the added benefit of letting the operating system use a second CPU core. Ideally, you run as many Node instances as the server has CPU cores and balance your workload between them. Have a look at the diode module for some inspiration on how to implement multi-process communication in Node.

How to create temporary files on the client machine, from Web Application?

I am creating a web application using JSP, Struts, EJB and Servlets. The application is a combined CRM and accounting package, so the database is very large. To make execution faster, I want to avoid round trips to the database.
For that purpose, I want to create some temporary XML files on the client machine and use them whenever required. How can I do this, given that JavaScript does not permit me to do so? Is there any way of doing this? Or is there any other solution I can adopt to make my application faster?
You do not have unfettered access to the client file system to create a temporary file on the client. The browser sandbox prevents this for very good reasons.
What you can do, perhaps, is make some creative use of caching in the browser. jQuery's data method is an example of this. TIBCO General Interface makes extensive use of a browser cache for XML data. Their code is open source and you could take a look to see how they've implemented their browser cache.
If the database is large and you are attempting to store large files, the browser is likely not going to be a great place for that data. If, however, the information you want to store is fairly small, using an in-browser cache may accomplish what you'd like.
You should be caching on the web server.
As you've no doubt realised by now, there is a very limited set of things you can do on the client machine from a web app (e.g., write a cookie).
You can make your application use the browser plugin Google Gears, which gives you real client-side storage.
Apart from that, remember that every single request carries a large overhead. If needed, you can easily pack a few hundred kB into one response, but far-away users might only be able to execute a few requests per second. Try to keep the number of requests down, even if it means adding overhead in the form of more data.
@justkt Actually, there is no good reason not to allow a web application to store data. Indeed, the HTML5 specifications include a database similar to the one offered by Google Gears; browser support is just too sporadic to rely on that feature.
If you absolutely want to cache it on the client, you can create the file on your server and have your web app retrieve it. The browser will then fetch it and keep it in its cache.
But keep in mind that this could be a pain for the client if the file is large enough.
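Whether the browser actually keeps such a file depends on the caching headers the server sends with it. The following is a framework-independent sketch of that idea using standard HTTP headers (ETag, Cache-Control, If-None-Match); the function name build_response and the one-hour max-age are hypothetical, and a real servlet or web framework would set these headers through its own API.

```python
import hashlib


def build_response(body, if_none_match=None, max_age=3600):
    """Return (status, headers, body) for a cacheable, server-generated file.

    body is the file content as bytes; if_none_match is the value of the
    request's If-None-Match header, if the browser sent one.
    """
    # Derive the validator from the content, so it changes when the file does.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "ETag": etag,
        "Cache-Control": "public, max-age=%d" % max_age,
    }
    if if_none_match == etag:
        # The browser's cached copy is still current: send no body at all.
        return 304, headers, b""
    return 200, headers, body
```

On the first request the browser receives the file with status 200 and stores it; until max-age expires it does not ask again, and afterwards it revalidates with If-None-Match and gets a body-less 304 as long as the file is unchanged.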

