Here is what I'm doing (using this: https://firebase.google.com/docs/functions/task-functions):
Launch an array of callable functions from the site (up to 500 cloud functions are launched after each user clicks a button); the same cloud function is called 500 times.
They get added to a queue to be processed at the desired rate.
Each of the functions has the same task:
- Get a specific file from an API call (which takes some time)
- Download it to firebase storage (also not instant)
- Finally, update the Firestore database accordingly
Here is my issue:
This works fine with one user; however, with 2 or more users at the same time it does not scale as desired.
The second user has to wait for the first user's 500 cloud functions to complete before their own 500 functions can start running (which can take 30 min), since they are all added to the same queue.
Edit: As the first comment said, the second user does not actually have to wait for all 500 of the first user's functions to complete before some of their own proceed. The point, however, is that the two users' work conflicts (increasing the first user's processing time), and it will conflict even more if yet another user starts their process as well.
So my questions are:
Is there a way to have a queue specific to each user somehow?
If not, how should I approach this using cloud functions? Is this possible?
If not possible with cloud functions, what would you advise?
Any help will be appreciated
Edit: Possible solutions I'm thinking of so far:
1- Minimize the time each function takes and increase the number of functions that can run in parallel, without exceeding the API's rate limits
2- Handle all the work in one big function per user call (staying under the 9 min limit if possible), i.e. a loop over the 500 items inside a single function instead of launching 500 cloud functions
3- Others?
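Option 2 can be sketched as a simple promise pool: one function invocation per user iterates over that user's own items with bounded concurrency, so each user's work is isolated from everyone else's. This is a minimal, self-contained sketch; `worker` is a hypothetical placeholder standing in for the per-file work (API call, upload to Storage, Firestore update).

```javascript
// Minimal promise pool: process `items` with at most `limit` tasks in flight.
// `worker` is any async function; here it stands in for the per-file work
// (API call, upload to Storage, Firestore update).
async function processWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    while (next < items.length) {
      const i = next++; // claim the next index (safe: no await before this)
      results[i] = await worker(items[i], i);
    }
  }

  // Start `limit` runners that all pull from the shared index.
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, runner)
  );
  return results;
}
```

Inside a single function invocation per user, something like `processWithLimit(files, 10, downloadAndStore)` keeps each user's 500 items independent of other users' queues, at the cost of fitting all the work inside one invocation's time limit.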
Related
I have a Firestore project that needs to be updated automatically without user interaction, but I do not know how to go about it; any help would be appreciated. Take a look at the JSON below to understand better:
const party = {
  id: 'bgvrfhbhgnhs',
  isPrivate: 'true',
  isStarted: false,
  created_At: '2021-12-26T05:20:29.000Z',
  start_date: '2021-12-26T02:00:56.000Z'
}
I want to update the isStarted field to true once the current time is equal to start_date
I think you will need Firebase Cloud Functions, although I don't understand exactly what you mean.
With Cloud Functions, you can automatically run code (add, delete, update, anything) on Google's servers without any application or user interaction.
For example, in line with your example, it can automatically set isStarted to true when the start_date time is reached. If you want to build a system that does not require user interaction and should work automatically, you should definitely use Cloud Functions; otherwise, you cannot do this on the application side.
For more info, visit Cloud Functions.
OK, I managed to find a workaround for updating my documents automatically without user interaction, since the Google billing service won't accept my card to enable Cloud Functions for my project. I tried what I could to make my code work, and I don't know if other people would follow my idea or whether it would solve similar issues.
What I did was create an API endpoint in my Next.js app, after installing the Firebase Admin SDK, to fetch and update documents: I fetched all documents, converted the start_date field of each document to a time, and checked for documents whose start date is less than or equal to the current date; after getting such a document, I ran a Firestore update on it.
Though this only runs when a request is made to my domain.com/api/update-parties, and never runs again on its own.
To make it run at scheduled intervals, I signed up for a free-tier account at https://www.easycron.com and added my API endpoint to EasyCron so it makes a request to my endpoint every minute. When the request hits my endpoint, it runs my code like any other serverless function 😜. Easy peasy.
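The core of that endpoint, deciding which parties are due, is just a date comparison. Here is a minimal sketch of that filtering step only; the Admin SDK fetch and update calls are omitted, and `parties` is assumed to be the already-fetched documents:

```javascript
// Given fetched party documents, return the ids of those whose start_date
// has passed and which have not started yet. Note start_date is an ISO
// string, so it must be parsed before comparing.
function duePartyIds(parties, now = new Date()) {
  return parties
    .filter((p) => !p.isStarted && new Date(p.start_date) <= now)
    .map((p) => p.id);
}
```

In the real endpoint, each returned id would then get an update along the lines of `doc(id).update({ isStarted: true })`.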
This question might be a duplicate, but I am still not getting the answer. I am fairly new to Node.js, so I might need some help. Many have said that Node.js is perfectly able to handle incoming requests asynchronously, but the code below shows that if multiple requests hit the same endpoint, say /test3, the callback function will:
Print "test3"
Call setTimeout() to prevent blocking of event loop
Wait for 5 seconds and send a response of "test3" to the client
My question here is: if client 1 and client 2 call the /test3 endpoint at the same time, and assuming client 1 hits the endpoint first, does client 2 have to wait for client 1 to finish before entering the event loop?
Can anybody here tell me whether it is possible for multiple clients to call a single endpoint and run concurrently rather than sequentially, something like a one-thread-per-connection analogy?
Of course, if I call another endpoint such as /test1 or /test2 while the code is still executing for /test3, I still get an immediate response from /test2, which is "test2".
const express = require("express");
const app = express();

app.get("/test1", (req, res) => {
  console.log("test1");
  setTimeout(() => res.send("test1"), 5000);
});

app.get("/test2", async (req, res, next) => {
  console.log("test2");
  res.send("test2");
});

app.get("/test3", (req, res) => {
  console.log("test3");
  setTimeout(() => res.send("test3"), 5000);
});

app.listen(3000);
For those who have visited: it has nothing to do with blocking of the event loop.
I have found something interesting. The answer to the question can be found here.
When I was using Chrome, the requests kept getting blocked after the first request. However, with Safari, I was able to hit the endpoint concurrently. For more details, look at the following link:
GET requests from Chrome browser are blocking the API to receive further requests in NODEJS
Run your application in cluster mode; look up PM2.
This question needs more details to be answered and is clearly an opinion-based question, but just because it rests on a strawman argument I will answer it.
First of all, we need to define "run concurrently". It is ambiguous: if we assume the literal meaning, in strict theory nothing RUNS CONCURRENTLY.
A CPU core can only carry out one instruction at a time.
The speed at which the CPU can carry out instructions is called the clock speed. This is controlled by a clock. With every tick of the clock, the CPU fetches and executes one instruction. The clock speed is measured in cycles per second, and 1 cycle per second is known as 1 hertz. This means that a CPU with a clock speed of 2 gigahertz (GHz) can carry out 2,000 million (two billion) cycles per second.
CPU running multiple tasks "concurrently":
Yes, you're right that nowadays computers and even cell phones come with multiple cores, which means the number of tasks running at the same time depends on the number of cores. But if you ask any expert (such as this Associate Staff Engineer, AKA me), they will tell you that you will very rarely find a server with more than one core. Why would you spend 500 USD on a multi-core server when you can spawn a whole bunch of ...nano (or whatever option is available in the free trial) instances with Kubernetes?
Another thing: why would you configure Node to be in charge of the routing? Let Apache and/or nginx worry about that.
As you mentioned, there is one thing called the event loop, which is a fancy way of naming a FIFO queue data structure.
So in other words: no, Node.js, like any other programming language out there, will not run requests truly concurrently on a single core.
But it definitely depends on your infrastructure.
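Whatever the browser does, on the server the two timers in the original code really do overlap: starting two 5-second `setTimeout`s at (almost) the same time finishes in roughly 5 seconds total, not 10. A quick self-contained check of that behavior, using shorter delays:

```javascript
// Two "requests" that each wait `ms` milliseconds, started together.
// If the event loop serialized them, the total time would be ~2 * ms;
// because timers don't block the loop, it is ~ms.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function overlappingTimers(ms) {
  const start = Date.now();
  await Promise.all([delay(ms), delay(ms)]);
  return Date.now() - start;
}
```

`overlappingTimers(5000)` resolves after roughly 5 seconds, which is what happens when two clients hit /test3 simultaneously from different browsers (or from curl).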
I currently have a Node.JS server set up that is able to read and write data from a FireBase database when a request is made from a user.
I would like to implement time-based events that result in an action being performed at a certain date or time. The key thing here, though, is that I want the freedom to do this down to the second (for example, write a message to the console after 30 seconds have passed, or on Friday the 13th at 11:30am).
A way to do this would be to store the date/time an action needs to be performed in the database, then read from the database every second and compare the current date/time with the stored events so we know if an action needs to be performed at this moment. As you can imagine, though, this would be a lot of unnecessary calls to the database and really feels like a poor way to implement this system.
Is there a way I can stay synced with the database without having to call every second? Perhaps I could then store a version of the events table locally and update this when a change is made to the database? Would that be a better idea? Is there another solution I am missing?
Any advice would be greatly appreciated, thanks!
EDIT:
How I currently initialise the database:
firebase.initializeApp(firebaseConfig);
var database = firebase.database();
How I then get data from the database:
await database.ref('/').once('value', function(snapshot){
  snapshot.forEach(function(childSnapshot){
    if(childSnapshot.key === userName){
      userPreferences = childSnapshot.val().UserPreferences;
    }
  })
});
The Firebase once() API reads the data from the database once, and then stops observing it.
If you instead use the on() API, it will continue observing the database after getting the initial value, and call your code whenever the database changes.
It sounds like you're looking to develop an application for scheduling. If that's the case you should check out node-schedule.
Node Schedule is a flexible cron-like and not-cron-like job scheduler
for Node.js. It allows you to schedule jobs (arbitrary functions) for
execution at specific dates, with optional recurrence rules. It only
uses a single timer at any given time (rather than reevaluating
upcoming jobs every second/minute).
You then can use the database to keep a "state" of the application so on start-up of the application you read all the upcoming jobs that will be expected and load them into node-schedule and let node-schedule do the rest.
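The "single timer" approach node-schedule describes can be illustrated with nothing but `setTimeout`: keep the jobs sorted, arm one timer for the soonest job, and re-arm after it fires. This is an illustrative toy sketch of the idea, not node-schedule's actual implementation:

```javascript
// Toy one-timer scheduler: runs each job's callback at its `at` time
// (epoch ms), keeping only a single pending setTimeout at any moment.
function makeScheduler() {
  const jobs = []; // entries of the form { at, fn }
  let timer = null;

  function arm() {
    if (timer) clearTimeout(timer);
    timer = null;
    if (jobs.length === 0) return;
    jobs.sort((a, b) => a.at - b.at); // soonest job first
    const wait = Math.max(0, jobs[0].at - Date.now());
    timer = setTimeout(() => {
      const job = jobs.shift();
      timer = null;
      job.fn();
      arm(); // re-arm for the next soonest job
    }, wait);
  }

  return {
    schedule(at, fn) {
      jobs.push({ at, fn });
      arm();
    },
  };
}
```

On start-up you would read the upcoming jobs from the database and `schedule()` each one, exactly as the answer suggests, instead of polling every second.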
The Google Cloud solution for scheduling a single item of future work is Cloud Tasks. Firebase is part of Google Cloud, so this is the most natural product to use. You can use this to avoid polling the database by simply specifying exactly when some Cloud Function should run to do the work you want.
I've written a blog post that demonstrates how to set up a Cloud Task to call a Cloud Function to delete a document in Firestore with an exact TTL.
I'm streaming and processing tweets in Firebase Cloud Functions using the Twitter API.
In my stream, I am tracking various keywords and Twitter users, so the influx of tweets is very high; a new tweet is often delivered before I have processed the previous one, which leads to lapses where a new tweet sometimes does not get processed.
This is how my stream looks:
...
const stream = twitter.stream('statuses/filter', {track: [various, keywords, ..., ...], follow: [userId1, userId2, userId3, userId3, ..., ...]});
stream.on('tweet', (tweet) => {
processTweet(tweet); // This takes time because there are multiple network requests involved, and sometimes recursive calls depending on the tweet's properties.
})
...
processTweet(tweet) essentially compiles threads from Twitter, which takes time depending on the length of the thread, sometimes a few seconds. I have optimised processTweet(tweet) as much as possible so that it compiles threads reliably.
I want to run processTweet(tweet) in parallel and queue the tweets that arrive while one is being processed, so that processing runs reliably, as the Twitter docs specify:
Ensure that your client is reading the stream fast enough. Typically you should not do any real processing work as you read the stream. Read the stream and hand the activity to another thread/process/data store to do your processing asynchronously.
Help would be very much appreciated.
This twitter streaming API will not work with Cloud Functions.
Cloud Functions code can only be invoked in response to incoming events, and the code may only run for up to 9 minutes max (default 60 seconds). After that, the function code is forced to shut down. With Cloud Functions, there is no way to continually process some stream of data coming from an API.
In order to use this API, you will need to use some other compute product that allows you to run code indefinitely on a dedicated server instance, such as App Engine or Compute Engine.
Background
I have a Firebase Cloud Function which sometimes can take a while to complete (20-60 seconds).
It gets triggered by a write to Firebase, which means that it starts processing at the onCreate event.
Problem
I have been doing some scripted testing by creating a new record in Firebase every N seconds, but it seems that if N is less than 20 seconds, the next onCreate trigger just doesn't fire.
In other words I end up in a situation like this:
Firebase:
record1
record2
record3
record4
Results written by the triggered function to another node in Firebase:
result-from-record1
...
record2, record3, and record4 do not seem to trigger the function again.
Homework
I have re-checked Firebase documentation, but I cannot seem to find any information that explains this case.
There is some information about quotas for connected users, but it's only about connected users, not about the same triggers firing many times before the previously triggered function completes.
Questions
What is the default behavior of Firebase triggered functions in case it gets triggered while the previously triggered function is still running?
Is there any way to maybe cancel the running function if it gets triggered by a new onWrite?
Is there any queue of those triggered and running functions? (this queue doesn't seem to be the one)
What is the default behavior of Firebase triggered functions in case it gets triggered while the previously triggered function is still running?
There is no guarantee about how functions are invoked: they could happen in sequence on a single server instance, or they could run in parallel on multiple server instances. The order of invocation of functions is also not guaranteed.
Is there any way to maybe cancel the running function if it gets triggered by a new onWrite?
No.
Is there any queue of those triggered and running functions? (this queue doesn't seem to be the one)
There is no visible queue. Internally, Cloud Functions is using pubsub to manage the stream of events emitted by the database, but this is an implementation detail, and you have no direct control over how it works.
As for why your function doesn't seem to execute when you expect - there's not enough detail in your question to make a guess. Without seeing actual code, as well as the specific steps to take to reproduce the issue, it's not possible to say.
You might want to watch my video series on how Cloud Functions works in order to better understand its behavior.