Node.js Twitter API cursors

I'm using npm-twit to get followers of a specific account.
The Twitter API returns up to 5000 results from a single GET request.
If the user I'm querying has over 5000 followers a "next_cursor" value is returned with the data.
To get the next 5000 results, I need to re-run the GET function, passing it the "next_cursor" value as an argument. I just can't seem to work out how to do it.
I was thinking a while loop, but I can't reset the global variable, I think because of scope:
var cursor = -1
while ( cursor != 0 ) {
T.get('followers/ids', { screen_name: 'twitter' }, function (err, data, response) {
// Do stuff here to write data to a file
cursor = data["next_cursor"];
})
}
Obviously I'm not a JS genius, so any help would be much appreciated.

The issue you are having is due to Node.js being asynchronous. The while loop never hands control back to the event loop, so the callback that updates cursor never gets a chance to run.
T.get('followers/ids', { screen_name: 'twitter' }, function getData(err, data, response) {
  // Do stuff here to write data to a file
  if (data['next_cursor'] > 0) T.get('followers/ids', { screen_name: 'twitter', next_cursor: data['next_cursor'] }, getData);
});
Please note:
I gave a name to the internal callback function. That is so that we can recursively call it from the inside.
The loop is replaced with a recursive callback.
If there is a next_cursor data, then we call T.get using the same function getData.
Be aware that the "Do stuff here" code runs once per page, that is, once for every cursor. Because each request is issued from the previous request's callback, the pages are processed in order.
If you do not like the idea of recursive callbacks, you can avoid it by:
Finding out beforehand all the next_cursor's if possible, and generate requests using for loop.
Alternatively, use asynchronous-helper modules like Async (though for learning purposes, I would avoid modules unless you are fluent in the concept already).
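
On Node versions that support async/await, the same cursor-following logic can also be written as an ordinary loop. The following is only a minimal sketch and does not call the Twitter API: fetchPage and the pages object are hypothetical stand-ins serving three fake pages, and in real code fetchPage would wrap the T.get call.

```javascript
// Hypothetical stand-in for a paginated API: three fake pages keyed by
// cursor, with next_cursor === 0 marking the last page.
const pages = {
  '-1': { ids: [1, 2], next_cursor: 10 },
  '10': { ids: [3, 4], next_cursor: 20 },
  '20': { ids: [5], next_cursor: 0 }
};

// Hypothetical helper: resolves with the page for the given cursor.
function fetchPage(cursor) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(pages[String(cursor)]), 10);
  });
}

// Follow next_cursor until it is 0, collecting ids along the way.
async function fetchAllIds() {
  const ids = [];
  let cursor = -1;
  while (cursor !== 0) {
    // Wait for this page before asking for the next one.
    const page = await fetchPage(cursor);
    ids.push(...page.ids);
    cursor = page.next_cursor;
  }
  return ids;
}

fetchAllIds().then((ids) => console.log(ids));
```

Unlike the while loop in the question, each await hands control back to the event loop, so the response can arrive and update cursor before the next iteration.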

Consider testing with some 5K+ account.
const Twit = require('twit')
const T = new Twit(tokens) // tokens: your API credentials object

// Collects follower IDs page by page, following next_cursor until it is 0.
function getFollowers (screenName, followers = [], cur = -1) {
  return new Promise((resolve, reject) => {
    T.get('followers/ids', { screen_name: screenName, cursor: cur, count: 5000 }, (err, data, response) => {
      if (err) {
        reject(err)
      } else {
        cur = data.next_cursor
        followers.push(data.ids)
        if (cur > 0) {
          return resolve(getFollowers(screenName, followers, cur))
        } else {
          return resolve([].concat(...followers))
        }
      }
    })
  })
}

async function getXaqron () {
  let result = await getFollowers('xaqron')
  return result
}

// Log the resolved IDs rather than the pending promise.
getXaqron()
  .then((ids) => console.log(ids))
  .catch((err) => {
    console.log(err) // e.g. Rate limit exceeded
  })

I struggled with this one. Everything seemed to work, but data['next_cursor'] never changed!
Code should be like this:
T.get('followers/ids', { screen_name: 'twitter' }, function getData(err, data, response) {
  // Do stuff here to write data to a file
  if (data['next_cursor'] > 0) T.get('followers/ids', { screen_name: 'twitter', cursor: data['next_cursor'] }, getData);
});
Parameter for Twit isn't "next_cursor", it's just "cursor" ;)

Related

Possible race condition with cursor when using Promise.all

In the project that I am working on, built using nodejs & mongo, there is a function that takes in a query and returns set of data based on limit & offset provided to it. Along with this data the function returns a total count stating all the matched objects present in the database. Below is the function:
// options carry the limit & offset values
// mongoQuery carries a mongo matching query
function findMany(query, options, collectionId) {
const cursor = getCursorForCollection(collectionId).find(query, options);
return Promise.all([findManyQuery(cursor), countMany(cursor)]);
}
Now the problem with this is that sometimes, when I give a large limit size, I get an error saying:
Uncaught exception: TypeError: Cannot read property '_killCursor' of undefined
At first I thought I might have to increase the pool size in order to fix this issue but after digging around a little bit more I was able to find out that the above code is resulting in a race condition. When I changed the code to:
function findMany(query, options, collectionId) {
  const cursor = getCursorForCollection(collectionId).find(query, options);
  return findManyQuery(cursor).then((dataSet) => {
    return countMany(cursor).then((count) => {
      return Promise.resolve([dataSet, count]);
    });
  });
}
Everything started working perfectly fine. My understanding of Promise.all was that it takes an array of promises and resolves them one after the other. If the promises are executed one after the other, how can the Promise.all code result in a race condition while chaining the promises does not?
I am not able to wrap my head around it. Why is this happening?

Since I have very little information to work with, I made an assumption about what you want to achieve and came up with the following demonstration of how Promise.all() should be used. Promise.all does not run the promises one after the other: the operations behind them are already in flight when the array is built, and they may complete in any order. For this reason, no promise may depend on the order in which the others execute.
// A simple function to simulate findManyQuery for demo purposes
function findManyQuery(cursors) {
return new Promise((resolve, reject) => {
// Do your checks and run your code (for example)
if (cursors) {
resolve({ dataset: cursors });
} else {
reject({ error: 'No cursor in findManyQuery function' });
}
});
}
// A simple function to simulate countMany for demo purposes
function countMany(cursors) {
return new Promise((resolve, reject) => {
// Do your checks and run your code (for example)
if (cursors) {
resolve({ count: cursors.length });
} else {
reject({ error: 'No cursor in countMany' });
}
});
}
// A simple function to simulate getCursorForCollection for demo purposes
function getCursorForCollection(collectionId) {
/*
Simulating the returned cursor using an array of objects
and the Array filter function
*/
return [{
id: 1,
language: 'Javascript',
collectionId: 99
}, {
id: 2,
language: 'Dart',
collectionId: 100
},
{
id: 3,
language: 'Go',
collectionId: 100
}, {
id: 4,
language: 'Swift',
collectionId: 99
}, {
id: 5,
language: 'Kotlin',
collectionId: 101
},
{
id: 6,
language: 'Python',
collectionId: 100
}].filter((row) => row.collectionId === collectionId)
}
function findMany(query = { id: 1 }, options = [], collectionId = 0) {
/*
First I create a function to simulate the assumed use of
query and options parameters just for demo purposes
*/
const filterFunction = function (collectionDocument) {
return collectionDocument.collectionId === query.id && options.indexOf(collectionDocument.language) !== -1;
};
/*
Since I am working with arrays, I replaced find function
with filter function just for demo purposes
*/
const cursors = getCursorForCollection(collectionId).filter(filterFunction);
/*
Using Promise.all([]). NOTE: You should pass the result of the
findManyQuery() to countMany() if you want to get the total
count of the resulting dataset
*/
return Promise.all([findManyQuery(cursors), countMany(cursors)]);
}
// Consuming the findMany function with test parameters
const query = { id: 100 };
const collectionId = 100;
const options = ['Javascript', 'Python', 'Go'];
findMany(query, options, collectionId).then(result => {
console.log(result); // Result would be [ { dataset: [ [Object], [Object] ] }, { count: 2 } ]
}).catch((error) => {
console.log(error);
});
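
To make the ordering point concrete, here is a small sketch, independent of Mongo, that records when each operation starts and finishes. The task helper and the started and finished arrays are hypothetical names introduced only for this illustration.

```javascript
const started = [];
const finished = [];

// Hypothetical helper: the work begins as soon as task() is called,
// and the returned promise resolves after `ms` milliseconds.
function task(name, ms) {
  started.push(name);
  return new Promise((resolve) => setTimeout(() => resolve(name), ms))
    .then((value) => {
      finished.push(value);
      return value;
    });
}

// Both tasks are already running before Promise.all sees them.
const all = Promise.all([task('slow', 50), task('fast', 10)]).then((results) => {
  console.log('started: ', started);  // order of creation
  console.log('finished:', finished); // order of completion
  console.log('results: ', results);  // order of the input array
  return results;
});
```

Both tasks start before either finishes, and the faster one finishes first, yet the results array still follows the order of the input array. Two operations that share one resource, such as a single cursor, can therefore interfere with each other when passed to Promise.all together.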

There are ways to write this function in a "pure" way for scalability and testing.
So here's your concern:
In the project that I am working on, built using nodejs & mongo, there is a function that takes in a query and returns set of data based on limit & offset provided to it. Along with this data the function returns a total count stating all the matched objects present in the database.
Note: You'll need to take care of edge cases.
const Model = require('path/to/model');
function findManyUsingPromise(model, query = {}, offset = 0, limit = 10) {
  return new Promise((resolve, reject) => {
    model.find(query, (error, data) => {
      if (error) {
        // Return so that resolve() below is not reached with undefined data
        return reject(error);
      }
      resolve({
        data,
        total: data.length || 0
      });
    }).skip(offset).limit(limit);
  });
}
// Call function
findManyUsingPromise(Model, {}, 0, 40).then((result) => {
// Do something with result {data: [object array], total: value }
}).catch((err) => {
// Do something with the error
});
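
Note that in the function above, total is data.length, which is the size of the returned page rather than the number of all matching documents that the question asks for. The sketch below shows the intended shape of the result using a plain in-memory array; docs and findPage are hypothetical stand-ins, and with a real database the total would come from a separate count query.

```javascript
// Hypothetical in-memory stand-in for a collection of 25 documents.
const docs = Array.from({ length: 25 }, (_, i) => ({
  id: i + 1,
  even: (i + 1) % 2 === 0
}));

// Returns one page of matches plus the count of ALL matches.
function findPage(predicate, offset = 0, limit = 10) {
  const matches = docs.filter(predicate);
  return Promise.resolve({
    data: matches.slice(offset, offset + limit),
    total: matches.length
  });
}

findPage((doc) => doc.even, 0, 5).then(({ data, total }) => {
  console.log(`page size: ${data.length}, total matches: ${total}`);
});
```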

Multiple Mongoose Calls Within a For Each Loop

I am reading a JSON object and looping through each item. I am first checking to see if the item already exists in the database and if so I want to log a message. If it doesn't already exist I want to add it.
This is working correctly. However, I would like to add a callback or finish the process with process.exit();
Because the mongoose calls are asynchronous, I can't put it at the end of the for loop: the calls haven't finished at that point.
What's the best way to handle this?
function storeBeer(data) {
data.forEach((beer) => {
let beerModel = new Beer({
beer_id: beer.id,
name: beer.name,
image_url: beer.image_url
});
Beer.findOne({
'name': beer.name
}).then(function (result) {
if (result) {
console.log(`Duplicate ${result.name}`)
} else {
beerModel.save(function (err, result) {
console.log(`Saved: ${result.name}`)
});
}
});
});
}
Is there anything I should read up on to help solve this?

One means of managing asynchronous resources is through Promise objects, which are first-class objects in Javascript. This means that they can be organized in Arrays and operated on like any other object. So, assuming you want to store these beers in parallel like the question implies, you can do something like the following:
function storeBeer(data) {
  // This creates an array of Promise objects, one per beer. The lookups
  // and saves they represent run concurrently.
  const promises = data.map((beer) => {
    let beerModel = new Beer({
      beer_id: beer.id,
      name: beer.name,
      image_url: beer.image_url
    });
    return Beer.findOne({
      'name': beer.name
    }).then(function (result) {
      if (result) {
        console.log(`Duplicate ${result.name}`)
      } else {
        // Return the save promise so Promise.all also waits for the write.
        return beerModel.save().then(function (saved) {
          console.log(`Saved: ${saved.name}`)
        });
      }
    });
  });
  return Promise.all(promises);
}
Don't forget that the storeBeer function is now asynchronous, and will need to be utilized either through a Promise chain or through async/await.
For example, to add process exit afterwards:
async function main() {
  const beers = [ ... ];
  await storeBeer(beers);
  process.exit(0);
}
You can also modify the above example to invoke the storeBeer function within a try / catch block to exit with a different error code based on any thrown errors.
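
A minimal sketch of that try / catch variant follows. The storeBeer here is a hypothetical stand-in that rejects on an empty list so the error path can be exercised, and run is a name introduced only for this example.

```javascript
// Hypothetical stand-in for storeBeer: rejects when given an empty list.
function storeBeer(beers) {
  if (beers.length === 0) {
    return Promise.reject(new Error('nothing to store'));
  }
  return Promise.all(beers.map((beer) => Promise.resolve(beer.name)));
}

// Returns an exit code instead of exiting, so the caller decides when to stop.
async function run(beers) {
  try {
    await storeBeer(beers);
    return 0;
  } catch (err) {
    console.error(`Failed: ${err.message}`);
    return 1;
  }
}

run([{ name: 'Example Ale' }]).then((code) => {
  // Setting exitCode lets pending output flush before the process ends.
  process.exitCode = code;
});
```

Setting process.exitCode instead of calling process.exit() is a design choice in this sketch: the process still ends once nothing is pending, but buffered output is not cut off.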

How do I get the args out of this twitter api call

I made a twitter bot and it's working. But it's a bunch of nested logic that I would like to refactor into functions.
I have this twitter API call and I want to return the reply parameter,
T.get('trends/place', { id: '23424977' }, function(err, reply) {
// THE WHOLE APP IS BASICALLY IN HERE
{
It won't let me name the function like
T.get('trends/place', { id: '23424977' }, function getTrends(err, reply) {
// THE WHOLE APP IS BASICALLY IN HERE
{
I messed around with some other ideas but no luck.
The whole bot is here https://glitch.com/edit/#!/trending-mishap?path=server.js

As best as I can understand the question, the issue is that you want to separate out the code inside the callback into separate functions. That's fine, nothing prevents your doing that.
Here's a rough example:
T.get('trends/place', { id: '23424977' }, getTrends);
function getTrends(err, reply) {
if (err) {
handleError(err);
return;
}
doSomethingWith(reply);
}
function doSomethingWith(reply) {
// ...
}
etc., etc.

Move your function out of the .get parameters and then call it in the .get callback, passing it the reply.
var yourSpecialFunction = function(values) {
// do things here
};
T.get('trends/place', { id: '23424977' }, function(err, reply) {
if (err) {
// handle the error
} else {
yourSpecialFunction(reply);
}
});
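
If the goal is to get the reply out as a value rather than pass it along to another callback, one option is to wrap the call in a promise. The sketch below uses Node's util.promisify on a hypothetical get function that mimics a Node-style callback API; it is not the Twitter client itself, and wrapping a method of a real client object would typically also require binding it to that object.

```javascript
const { promisify } = require('util');

// Hypothetical stand-in for a Node-style callback API such as T.get.
function get(path, params, callback) {
  setTimeout(() => callback(null, { path, params, trends: ['a', 'b'] }), 10);
}

// promisify turns (args..., callback(err, value)) into a promise-returning function.
const getAsync = promisify(get);

async function getTrends() {
  const reply = await getAsync('trends/place', { id: '23424977' });
  return reply.trends;
}

getTrends().then((trends) => console.log(trends));
```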

Execute Multiple async operation at gets notified when all of them completes (Typescript)

I have the following data structure in my firebase
- ActionSheet
  - PendingApproval
    - SomeKey1
      - user: 1
      - data: 'walk the dog'
    - SomeKey2
      - user: 2
      - data: 'brush the cat'
  - Approved
    - SomeKey3
      - user: 1
      - data: 'feed fish'
I want to download all data from user 1 in this structure such that I can display to the user to 'walk the dog' and 'feed fish'
At the moment this is how I do it. I am "chaining" the calls such that the 2nd call does not start until the first one finishes.
ngOnInit() {
this.databaseService.searchCurrentUserPendingApproval()
.first()
.subscribe(results => {
for (let result of results) {
this.mySubmissions.push(result);
}
this.databaseService.searchCurrentUserApprovedAction()
.first()
.subscribe(results => {
for (let result of results) {
this.mySubmissions.push(result);
}
// do some logic to the this.mySubmissions array at this point
doSomething(this.mySubmissions);
}, error => {
console.log('Error download Actioned Timesheet', error)
})
}, error => {
console.log('error retreiving pending submission from firebase', error)
})
}
searchCurrentUserPendingTimesheets(key: string) {
let path = '/ActionSheet/PendingApproval' + key;
return this.af.database.list(path, {
query: {
orderByChild: 'user',
equalTo: 1
}
})
}
// search for own timesheet in firebase
searchCurrentUserActionedTimesheets(key: string) {
let path = '/ActionSheet/Approved' + key;
return this.af.database.list(path, {
query: {
orderByChild: 'user',
equalTo: 1
}
})
}
This works. However, it is a terrible way of doing it, and it would be slow for a larger "chain".
As the two queries do not depend on each other, a better way would be to query the two branches at the same time and get notified when both operations complete. (It doesn't matter which one comes back first, as I will sort the array at the end.)
In other words, I want to run doSomething(this.mySubmissions) once all of the async calls mentioned have completed. Is there a proper TypeScript way to achieve this?
Note: When I was working with Swift, I was able to achieve this with dispatch_group_enter/dispatch_group_leave.

You can try with forkJoin as below:
Observable.forkJoin(
  this.databaseService.searchCurrentUserPendingApproval().first(),
  this.databaseService.searchCurrentUserApprovedAction().first()
).subscribe(([pendingResults, approvedResults]) => {
  this.mySubmissions = this.mySubmissions.concat(pendingResults, approvedResults);
  doSomething(this.mySubmissions);
}, error => console.log('Error download Actioned Timesheet', error))

You can wait for multiple observables using forkJoin.

RxJS Observable fire onCompleted after a number of async actions

I'm trying to create an observable that produces values from a number of asynchronous actions (http requests from a Jenkins server), that will let a subscriber know once all the actions are completed. I feel like I must be misunderstanding something because this fails to do what I expect.
'use strict';
let Rx = require('rx');
let _ = require('lodash');
let values = [
{'id': 1, 'status': true},
{'id': 2, 'status': true},
{'id': 3, 'status': true}
];
function valuesObservable() {
return Rx.Observable.create(function(observer) {
_.map(values, function(value) {
var millisecondsToWait = 1000;
setTimeout(function() { // just using setTimeout here to construct the example
console.log("Sending value: ", value);
observer.onNext(value)
}, millisecondsToWait);
});
console.log("valuesObservable Sending onCompleted");
observer.onCompleted()
});
}
let observer = Rx.Observer.create((data) => {
console.log("Received Data: ", data);
// do something with the info
}, (error) => {
console.log("Error: ", error);
}, () => {
console.log("DONE!");
// do something else once done
});
valuesObservable().subscribe(observer);
Running this, I get output:
valuesObservable Sending onCompleted
DONE!
Sending value: { id: 1, status: true }
Sending value: { id: 2, status: true }
Sending value: { id: 3, status: true }
While what I would like to see is something more like:
Sending value: { id: 1, status: true }
Received Data: { id: 1, status: true }
Sending value: { id: 2, status: true }
Received Data: { id: 2, status: true }
Sending value: { id: 3, status: true }
Received Data: { id: 3, status: true }
valuesObservable Sending onCompleted
DONE!
I don't actually care about the order of the items in the list, I would just like the observer to receive them.
I believe what is happening is that Javascript asynchronously fires the timeout function, and proceeds immediately to the observer.onCompleted() line. Once the subscribing observer receives the onCompleted event (is that the right word?), it decides that it's done and disposes of itself. Then when the async actions complete and the observable fires onNext, the observer no longer exists to take any actions with them.
If I'm right about this, I'm still stumped about how to make it behave in the way I would like. Have I stumbled into an antipattern without realising it? Is there a better way of approaching this whole thing?
Edit:
Since I used setTimeout to construct my example, I realised I can use it to partially solve my problem by giving the observable a timeout.
function valuesObservable() {
return Rx.Observable.create(function(observer) {
let observableTimeout = 10000;
setTimeout(function() {
console.log("valuesObservable Sending onCompleted");
observer.onCompleted();
}, observableTimeout);
_.map(values, function(value) {
let millisecondsToWait = 1000;
setTimeout(function() {
console.log("Sending value: ", value);
observer.onNext(value)
}, millisecondsToWait);
});
});
}
This gets me all of the information from the observable in the order I want (data, then completion) but depending on the choice of timeout, I may either miss some data or have to wait a long time for the completion event. Is this just an inherent problem of asynchronous programming that I have to live with?

Yes there is a better way. The problem right now is that you are relying on time delays for your synchronization when in fact you can use the Observable operators to do so instead.
The first step is to move away from directly using setTimeout. Instead, use timer:
Rx.Observable.timer(waitTime);
Next you can lift the values array into an Observable such that each value is emitted as an event by doing:
Rx.Observable.from(values);
And finally you would use flatMap to convert those values into Observables and flatten them into the final sequence. The result being an Observable that emits each time one of the source timers emits, and completes when all the source Observables complete.
Rx.Observable.from(values)
.flatMap(
// Map the value into a stream
value => Rx.Observable.timer(waitTime),
// This function maps the value returned from the timer Observable
// back into the original value you wanted to emit
value => value
)
Thus the complete valuesObservable function would look like:
function valuesObservable(values) {
  return Rx.Observable.from(values)
    .flatMap(
      value => Rx.Observable.timer(waitTime),
      value => value
    )
    .do(
      x => console.log('Sending value:', x),
      null,
      () => console.log('Sending values completed')
    );
}
Note that the above works just as well without the demo stream. If you had real HTTP streams, you could even simplify by using merge (or concat to preserve order):
Rx.Observable.from(streams)
  .flatMap(stream => stream);
// OR
Rx.Observable.from(streams).mergeAll();
// Or simply
Rx.Observable.merge(streams);

The best way to construct an observable is to use the existing primitive and then a combination of the existing operators. This avoids a few headaches (unsubscription, error management etc.). Then Rx.Observable.create is certainly useful when nothing else fits your use case. I wonder if generateWithAbsoluteTime would fit.
Anyway, the issue you run into here is that you complete your observer before you send it any data. So basically you need to come up with a better completion signal. For example:
complete x seconds after last value emitted if no new value is emitted
complete when a value is equal to some 'end' value
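
A third possible completion signal, when the number of async actions is known up front, is to count the outstanding actions and complete when the count reaches zero. The sketch below shows the idea in plain JavaScript without Rx; emitAll is a hypothetical helper, and setTimeout stands in for the real HTTP requests.

```javascript
// Emit each value from its own async action and signal completion only
// after the last outstanding action has reported.
function emitAll(values, onNext, onCompleted) {
  let pending = values.length;
  if (pending === 0) {
    onCompleted();
    return;
  }
  values.forEach((value) => {
    setTimeout(() => { // stand-in for an HTTP request
      onNext(value);
      pending -= 1;
      if (pending === 0) onCompleted();
    }, 10);
  });
}

const log = [];
const done = new Promise((resolve) => {
  emitAll(
    [{ id: 1 }, { id: 2 }, { id: 3 }],
    (value) => log.push(`data ${value.id}`),
    () => {
      log.push('done');
      resolve(log);
    }
  );
});

done.then((entries) => console.log(entries));
```

Here completion is tied to the work itself rather than to a guessed timeout, so no data is missed and there is no needless waiting.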

With thanks to @paulpdaniels, this is the final code that did what I wanted, including the calls to Jenkins:
'use strict';
let Rx = require('rx');
let jenkinsapi = require('jenkins'); // https://github.com/silas/node-jenkins/issues
let jenkinsOpts = {
"baseUrl": "http://localhost:8080",
"options": {"strictSSL": false},
"job": "my-jenkins-job",
"username": "jenkins",
"apiToken": "f4abcdef012345678917a"
};
let jenkins = jenkinsapi(JSON.parse(JSON.stringify(jenkinsOpts)));
function jobInfoObservable(jenkins, jobName) {
// returns an observable containing a single list of builds for a given job
let selector = {tree: 'builds[number,url]'};
return Rx.Observable.fromNodeCallback(function(callback) {
jenkins.job.get(jobName, selector, callback);
})();
}
function buildIDObservable(jenkins, jobName) {
// returns an observable containing a stream of individual build IDs for a given job
return jobInfoObservable(jenkins, jobName).flatMap(function(jobInfo) {
return Rx.Observable.from(jobInfo.builds)
});
}
function buildInfoObservable(jenkins, jobName) {
// returns an observable containing a stream of http response for each build in the history for this job
let buildIDStream = buildIDObservable(jenkins, jobName);
let selector = {'tree': 'actions[parameters[name,value]],building,description,displayName,duration,estimatedDuration,executor,id,number,result,timestamp,url'};
return buildIDStream.flatMap(function(buildID) {
return Rx.Observable.fromNodeCallback(function(callback) {
jenkins.build.get(jobName, buildID.number, selector, callback);
})();
});
}
let observer = Rx.Observer.create((data) => {
console.log("Received Data: ", data);
// do something with the info
}, (error) => {
console.log("Error: ", error);
}, () => {
console.log("DONE!");
// do something else once done
});
buildInfoObservable(jenkins, jenkinsOpts.job).subscribe(observer);
By relying on the Rx built-in operators I managed to avoid messing about with timing logic altogether. This is also much cleaner than nesting multiple Rx.Observable.create statements.
