I have written an application in Node.js that takes input from the user and generates PDF files based on a few templates.
I am using the pdfkit npm package for this purpose. My application is running in production, but it is very slow. The details are below.
The problem I am facing:
It works synchronously. For example, suppose a request comes to the application to generate a PDF: it starts processing and, when it finishes, returns a response with the generated PDF URL. But if multiple requests come to the server, it processes them one by one (synchronously).
All requests in the queue have to wait until the previous one is finished.
Most of the time my application returns a Timeout or Internal Server Error.
Why can't I change the library?
I have written 40 templates in JS for pdfkit, and each template is 1000 - 3000 lines long.
If I change the library, I have to rewrite those templates for the new library.
It will take many months to rewrite and test them properly.
The solution I am using now:
I am managing a queue now: once a request comes in, it is queued and a confirmation message is sent back to the user in the response.
Why is this solution not feasible?
The user should be given a valid PDF URL when the request succeeds. But with the queue approach, the user only gets a confirmation message, and the PDF is processed later in the queue.
What kind of solution am I seeking now?
Is there any way to make this application multi-threaded/asynchronous, so that it can handle multiple requests at a time without blocking?
Please save my life.
I hate to break it to you, but doing computation in the order tasks come in is a pretty fundamental part of node. It sounds like loading these templates is a CPU-bound task, and since Node is single-threaded, it knocks these off the queue in the order they come in.
On the other hand, any framework would have a similar problem. Node being single-threaded means it's actually very efficient, because it doesn't lose cycles to context switching.
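To make the blocking behaviour concrete, here is a minimal sketch. `renderTemplateSync` is a hypothetical stand-in for a CPU-bound template render (it is not pdfkit code), and `demoBlocking` is an illustrative helper that records the order in which things actually run.

```javascript
// Hypothetical stand-in for a CPU-bound template render (not pdfkit code).
function renderTemplateSync(iterations) {
  let checksum = 0;
  for (let i = 0; i < iterations; i++) {
    checksum = (checksum + i * 31) % 1000003;
  }
  return checksum;
}

// Records the order in which the steps actually run.
function demoBlocking(iterations) {
  const order = [];
  return new Promise((resolve) => {
    // Scheduled first, but it cannot fire until the synchronous work returns.
    setTimeout(() => {
      order.push('timer fired');
      resolve(order);
    }, 0);
    order.push('render started');
    renderTemplateSync(iterations);
    order.push('render finished');
  });
}
```

Calling `demoBlocking(5e7).then(console.log)` should show the timer firing only after the render has finished, which is the same reason a second HTTP request has to wait for the first one.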
How many PDF-generations can your program handle at once? What type of hardware are you running this on? If it's failing on a few requests a second, then there's probably a programming fix.
For node, the more things you can make asynchronous the better. For example, any time you're reading a file in, it should be asynchronous.
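As a rough illustration of the file-reading point, the sketch below reads files with the promise-based `fs` API instead of `readFileSync`. The helper names `loadAsset` and `loadAssets` are made up for this example; only `fs.promises.readFile` is a real Node API.

```javascript
const fs = require('node:fs');

// Asynchronous read: the event loop stays free while the OS fetches the file.
async function loadAsset(filePath) {
  return fs.promises.readFile(filePath, 'utf8');
}

// Starts all reads at once instead of waiting for each one in turn.
async function loadAssets(filePaths) {
  return Promise.all(filePaths.map(loadAsset));
}
```

This only helps with I/O waits. If the time is going into CPU work inside the templates, asynchronous reads alone will not remove the bottleneck.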
Can you post the code for one of your PDF-creating request functions?
Related
I have a website with a backend in Python (Django) and JavaScript, hosted on Heroku. I also have Python code that does image classification with EfficientNet, and I want to integrate this code into my website.
The logical sequence of ideas is as follows:
The user uploads an image on the site;
This image will be classified with the Python code;
The algorithm will return an image;
The returned image should be posted on the site.
Does anyone know what would be the best way to do this?
First of all, yes, it is possible to implement what you describe. I would implement the following:
Use Celery to run the classification as an asynchronous task. When the photo is uploaded, Django hands the job (in this case, running the CNN) to Celery and marks the photo as pending. Once the task completes, the status changes and the photo appears as published on the platform.
I recommend using asynchronous tasks for this because of the following:
Running the convolutional neural network can take a while. Many hosting platforms cut off an HTTP request after about 30 seconds (Heroku's router does this), so the request could be terminated midway and the user would see an error. Even when it succeeds, a long wait after uploading a photo makes the site feel slow. With asynchronous tasks, the HTTP response can tell the user right away that the image is being analyzed, and the analysis is no longer bound by the 30-second limit. Many simultaneous uploads could also crash the server if each one is processed inside its own request. With Celery you can put the work on queues to handle this (using Redis or RabbitMQ as the broker).
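The pending/published flow can be sketched in a few lines. This is not Celery and not Django: it is an in-process JavaScript illustration of the pattern only, with an in-memory `Map` standing in for the task store and a hypothetical `classifyImage` standing in for the CNN. A real deployment would run the task in a separate Celery worker.

```javascript
const crypto = require('node:crypto');

const jobs = new Map(); // jobId -> { status, result }

// Hypothetical stand-in for the slow classifier; resolves after a short delay.
function classifyImage(imageName) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`classified:${imageName}`), 20);
  });
}

// Called by the upload handler: records the job and returns immediately.
function submitImage(imageName, classify = classifyImage) {
  const jobId = crypto.randomUUID();
  jobs.set(jobId, { status: 'pending', result: null });
  classify(imageName)
    .then((result) => jobs.set(jobId, { status: 'published', result }))
    .catch((err) => jobs.set(jobId, { status: 'failed', result: String(err) }));
  return jobId;
}

// Called by a status endpoint to report where the job stands.
function getStatus(jobId) {
  return jobs.get(jobId) ?? { status: 'unknown', result: null };
}
```

The upload response would carry the `jobId`, and the page would ask for `getStatus(jobId)` until it reports `published`.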
If you want the status of the image in real time, you could add a websocket: the upload response includes a websocket URL on which you receive information about the image once it is processed. You can use django-channels for this.
The Goal
I have two processes running:
Node.js server which handles external communication (to other computers)
C code which runs in an infinite loop doing real-time processing
The node.js server will receive data from an external computer, send it to the C code (which is already running), wait for a reply, and send that reply back to an external computer.
HOWEVER, the C code cannot wait for input from the node.js server. It needs to be continuously running using the most recent data from the js code (updating those recent values each time new data is received). In other words, the C code needs non-blocking input from the node.js server. The node.js server should get an event (bound to a callback) each time the C code periodically sends it information back.
What I have done
The node.js code already handles external IO correctly (it's fairly complete)
The C code in isolation does the task I want, just without getting data from the node.js code
I am now trying to combine them so that the C code gets its sensor data from the node.js code (as I described above).
How I think I can do it
My current best guess of how to do this is using sockets (I think they're non-blocking?). I've never done this before, though, so before I spend a bunch of time learning about how sockets work, and writing a bunch of code, I want to make sure that this is a method which might actually work. So, I want to see if anyone has any suggestions on the easiest way to achieve what I described. If it does turn out to be sockets, then at least I know I'm not wasting my time. Thanks for any suggestions!
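For what it's worth, the Node.js half of a socket approach could look roughly like the sketch below. The port number, the host, and the newline-delimited message format are all assumptions made for illustration, and `createLineSplitter` and `connectToProcessor` are hypothetical helper names. Node sockets are event-driven, so each reply from the C process arrives in a callback.

```javascript
const net = require('node:net');

// TCP is a byte stream, so one 'data' event may carry half a message or
// several. This rebuilds newline-terminated messages from the chunks.
function createLineSplitter(onLine) {
  let buffered = '';
  return (chunk) => {
    buffered += chunk;
    let newline;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      onLine(buffered.slice(0, newline));
      buffered = buffered.slice(newline + 1);
    }
  };
}

// Connects to the C process. Port 5000 and the line format are assumptions.
function connectToProcessor(onReply, port = 5000, host = '127.0.0.1') {
  const socket = net.createConnection({ port, host });
  socket.setEncoding('utf8'); // deliver chunks as strings
  socket.on('data', createLineSplitter(onReply));
  return {
    // Sends the latest reading without waiting for a reply.
    sendReading: (value) => socket.write(`${value}\n`),
    close: () => socket.end(),
  };
}
```

On the C side the socket would have to be put in non-blocking mode (for example `O_NONBLOCK` via `fcntl`) or checked with `poll`/`select` using a zero timeout, so that the real-time loop reads new data only when some is available and otherwise keeps using the last values.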
Can I have a value automatically pushed onto a website (using AJAX = jQuery?) when something is done on the server side and Python is ready to send it, rather than just in response to a request by the website?
How do I make jQuery ready to accept this ad hoc data?
Tutorial I'm learning from:
http://flask.pocoo.org/docs/patterns/jquery/#the-html
Walk's answer is correct, but these days there is another option. Web Sockets can push data from a server to a browser. For some languages (notably Node.js) there are sophisticated libraries for handling web sockets and these often deal with fallbacks for older browsers.
Python has a number of libraries depending on what you need. These two are some of the most popular:
https://github.com/abourget/gevent-socketio
https://github.com/stephenmcd/django-socketio
You can have jQuery make the AJAX request and wait while the server processes it and returns the response, but you risk the request timing out. This is optimal if the responses are quick to return.
Another method would be to have a timed script that polls the server for updates. The server places the updates for a specific session in, for instance, a database or memory, and returns them when polled. This is optimal if it's a long-running process.
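The polling variant can be sketched as follows. `pollForResult` and the shape of the status object (`{ done, value }`) are assumptions for this example, and `fetchStatus` is whatever function performs the actual request (a jQuery `$.getJSON` call wrapped in a promise, or `fetch`).

```javascript
// Polls until the server reports a finished result or the attempts run out.
// `fetchStatus` is a hypothetical function returning a promise of { done, value }.
async function pollForResult(fetchStatus, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status.done) return status.value;
    if (attempt < maxAttempts) {
      // Wait before asking again so the server is not flooded.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`no result after ${maxAttempts} attempts`);
}
```

In a browser this might be called as `pollForResult(() => fetch('/status').then((r) => r.json()))`, where `/status` is a made-up endpoint.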
I'd like some opinions on the practical implications of moving processing that would traditionally be done on the server to be handled instead by the client in a node.js web app.
Example case study:
The user uploads a CSV file containing a year's worth of their bank statement entries. We want to parse the file, categorise each entry and calculate cumulative values for each category so that we can store the newly categorised statement in a db and display spending analysis to the user.
The entries are categorised by matching strings in the descriptions. There are many categories and many entries and it takes a fair amount of time to process.
In our node.js server, we can happily free up the event loop whilst waiting for network responses and so on, but if there is any data crunching or similar processing, the server will be blocked from responding to requests, and this seems unavoidable.
Traditionally, the CSV file would be passed to the server, the server would process, save in db, and send back the output of the processing.
It seems to make sense in our single threaded node.js server that this processing is handled by the browser, and the output displayed and sent to server to be stored. Of course the client will have to wait while this is done, but their processing will not be preventing the server from responding to requests from other clients.
I'm interested to see if anyone has experience building apps using this model.
So, the question is: are there any issues in getting browsers rather than the server to handle, wherever possible, any processing that will block the event loop? Is this a good/sensible/viable approach to node.js application development?
I don't think trusting client processed data is a good idea.
Instead you should look into creating a work queue that a separate process listens on, separating the CPU intensive tasks from your node.js process handling HTTP requests.
My proposed data flow would be:
HTTP upload request
App server (save raw file somewhere the worker process can access)
Notification to 'csv' work queue
Worker processes uploaded csv file.
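The last step of the flow above might look something like this. It uses a `worker_threads` worker as a stand-in for the separate worker process, and it skips the queue itself, so treat it as a sketch of the hand-off only. `categoriseInWorker`, the rule format (`{ match, category }`) and the entry format (`{ description, amount }`) are assumptions made for the example.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source, inlined so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const totals = {};
  for (const { description, amount } of workerData.entries) {
    const rule = workerData.rules.find((r) => description.includes(r.match));
    const category = rule ? rule.category : 'uncategorised';
    totals[category] = (totals[category] || 0) + amount;
  }
  parentPort.postMessage(totals);
`;

// Runs the categorisation off the main thread and resolves with the totals.
function categoriseInWorker(entries, rules) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { entries, rules },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

While the worker is busy, the main thread keeps serving HTTP requests. The request handler can `await categoriseInWorker(...)` and still reply with the finished result.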
Although perfectly possible, simply shifting the processing to the client machine does not solve the basic problem.
Now the client's event loop is blocked, preventing the user from interacting with the browser. Browsers tend to detect this problem and stop execution of the page's script altogether. Something your users will certainly hate.
There is no way around either delegating or splitting up the work-load.
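Splitting up the workload can be sketched like this in Node: process a slice, yield to the event loop with `setImmediate`, then continue. `processInChunks` and the default slice size of 500 are made up for this example. In a browser, `setTimeout(fn, 0)` or a Web Worker would play the same role, since `setImmediate` is not generally available there.

```javascript
// Processes `items` in slices and yields to the event loop between slices,
// so pending requests and timers get a turn while the work is in progress.
function processInChunks(items, handleItem, chunkSize = 500) {
  return new Promise((resolve, reject) => {
    const results = [];
    let index = 0;

    function runSlice() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) {
          results.push(handleItem(items[index]));
        }
      } catch (err) {
        reject(err);
        return;
      }
      if (index < items.length) {
        setImmediate(runSlice); // let other callbacks run first
      } else {
        resolve(results);
      }
    }

    runSlice();
  });
}
```

The total CPU time is unchanged; what improves is that no single stretch of work holds the event loop for long.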
Using a second process (for example a 2nd node instance) for doing the number crunching server-side has the added benefit of allowing the operating system to use a 2nd CPU core. Ideally you run as many Node instances as you have CPU cores in the server and balance your work-load between them. Have a look at the diode module for some inspiration on how to implement multi-process communication in node.
I have a need to send alerts to a web-based monitoring system written in RoR. The brute force solution is to frequently poll a lightweight controller with javascript. Naturally, the downside is that in order to get a tight response time on the alerts, I'd have to poll very frequently (every 5 seconds).
One idea I had was to have the AJAX-originated polling thread sleep on the server side until an alert arrived on the server. The server would then wake up the sleeping thread and get a response back to the web client that would be shown immediately. This would have allowed me to cut the polling interval down to once every 30 seconds or every minute while improving the time it took to alert the user.
One thing I didn't count on was that mongrel/rails doesn't launch a thread per web request as I had expected it to. That means that other incoming web requests block until the first thread's sleep times out.
I've tried tinkering around with calling "config.threadsafe!" in my configuration, but that doesn't seem to change the behavior to a thread per request model. Plus, it appears that running with config.threadsafe! is a risky proposition that could require a great deal more testing and rework on my existing application.
Any thoughts on the approach I took or better ways to go about getting the response times I'm looking for without the need to deluge the server with requests?
You could use Rails Metal to improve the controller performance or maybe even separate it out entirely into a Sinatra application (Sinatra can handle some serious request throughput).
Another idea is to look into a push solution using Juggernaut or similar.
One approach you could consider is to have (some or all of) your requests create deferred monitoring jobs in an external queue which would in turn periodically notify the monitoring application.
What you need is Juggernaut, which is a Rails plugin that allows your app to initiate a connection and push data to the client. In other words, the client keeps a real-time connection to the server, with the advantage of instant updates.
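For comparison, the "hold the request until an alert arrives" idea from the question is straightforward in an event-driven runtime, because a held request does not occupy a thread. The sketch below is plain JavaScript for illustration only. It is not Rails code and not Juggernaut's API, and `waitForAlert` and `publishAlert` are hypothetical names.

```javascript
const waiters = new Set();

// Called by the long-poll request handler: resolves with the alert,
// or with null if nothing arrives before the timeout.
function waitForAlert(timeoutMs = 30000) {
  return new Promise((resolve) => {
    const waiter = (alert) => {
      clearTimeout(timer);
      waiters.delete(waiter);
      resolve(alert);
    };
    const timer = setTimeout(() => waiter(null), timeoutMs);
    waiters.add(waiter);
  });
}

// Called when the server learns of an alert: wakes every held request.
function publishAlert(alert) {
  for (const waiter of [...waiters]) waiter(alert);
}
```

A handler would `await waitForAlert()` and send the result. On `null` the client simply reconnects, which gives the 30-second polling interval together with near-immediate delivery.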