How to delay a function like this? [duplicate] - javascript

Using the Google Geocoder v3, if I try to geocode 20 addresses, I get an OVER_QUERY_LIMIT unless I time them to be ~1 second apart, but then it takes 20 seconds before my markers are all placed.
Is there any other way to do it, other than storing the coordinates in advance?

No, there is not really any other way: if you have many locations and want to display them on a map, the best solution is to:
fetch the latitude+longitude, using the geocoder, when a location is created,
store those in your database, alongside the address,
and use those stored latitude+longitude when you want to display the map.
This is, of course, assuming that you create/modify locations far less often than you view them.
Yes, it means you'll have to do a bit more work when saving the locations -- but it also means:
You'll be able to search by geographical coordinates,
i.e. "I want a list of points that are near where I am now".
Displaying the map will be a lot faster,
even with more than 20 locations on it.
Oh, and, also (last but not least): this will work ;-)
You will be less likely to hit the limit of X geocoder calls in N seconds,
and less likely to hit the limit of Y geocoder calls per day.
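For illustration, here is a minimal sketch of that save-time flow. The geocoder call is the standard Maps v3 API; onLocationCreated and db.saveLocation are hypothetical stand-ins for your own creation hook and persistence layer:
var geocoder = new google.maps.Geocoder();

function onLocationCreated(address) {
  geocoder.geocode({ address: address }, function (results, status) {
    if (status === google.maps.GeocoderStatus.OK) {
      var loc = results[0].geometry.location;
      // Store the coordinates next to the address, so the map page
      // can read them straight from the database later.
      db.saveLocation({ address: address, lat: loc.lat(), lng: loc.lng() });
    }
  });
}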

You actually do not have to wait a full second between each request. I found that if I wait 200 milliseconds between requests, I am able to avoid the OVER_QUERY_LIMIT response and the user experience is passable. With this solution you can load 20 items in 4 seconds.
$(items).each(function(i, item){
  setTimeout(function(){
    geoLocate("my address", function(myLatlng){
      ...
    });
  }, 200 * i);
});

Unfortunately this is a restriction of the Google Maps service.
I am currently working on an application using the geocoding feature, and I'm saving each unique address on a per-user basis. I generate the address information (city, street, state, etc.) based on the information returned by Google Maps, and then save the lat/long information in the database as well. This prevents you from having to re-geocode things, and gives you nicely formatted addresses.
Another reason you want to do this is because there is a daily limit on the number of addresses that can be geocoded from a particular IP address. You don't want your application to fail for a person for that reason.

I'm facing the same problem trying to geocode 140 addresses.
My workaround was adding a usleep(100000) between each loop iteration of the geocoding requests. If the status of a request is OVER_QUERY_LIMIT, the usleep is increased by 50000 and the request is repeated, and so on.
And of course all received data (lat/long) are stored in an XML file so the requests don't run every time the page loads.
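The same incremental back-off idea, sketched in client-side JavaScript rather than PHP. Here geocodeOne is a hypothetical helper that wraps a single geocoder call in a promise and rejects with the returned status:
function geocodeWithBackoff(address, delayMs) {
  delayMs = delayMs || 100;
  return geocodeOne(address).catch(function (status) {
    if (status === "OVER_QUERY_LIMIT") {
      // Wait, then retry with a delay that grows by 50ms on each failure.
      return new Promise(function (resolve) {
        setTimeout(resolve, delayMs);
      }).then(function () {
        return geocodeWithBackoff(address, delayMs + 50);
      });
    }
    throw status;
  });
}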

EDIT:
Forgot to say that this solution is in pure JS; the only thing you need is a browser that supports promises: https://developer.mozilla.org/it/docs/Web/JavaScript/Reference/Global_Objects/Promise
For those who still need to accomplish this, I've written my own solution that combines promises with timeouts.
Code:
/*
  class: Geolocalizer
  - Handles location triangulation and calculations.
  -- Returns various prototypes to fetch position from strings or coords or dragons or whatever.
*/
var Geolocalizer = function () {
  this.queue = [];    // queue handler.
  this.resolved = [];
  this.geolocalizer = new google.maps.Geocoder();
};

Geolocalizer.prototype = {
  /*
    #fn: Localize
    #scope: resolve single or multiple queued requests.
    #params: <array> needles
    #returns: <deferred> object
  */
  Localize: function (needles) {
    var that = this;
    // Enqueue the needles.
    for (var i = 0; i < needles.length; i++) {
      this.queue.push(needles[i]);
    }
    // Return a promise and resolve it after every element has been fetched
    // (either with success or failure), then reset the queue.
    return new Promise(function (resolve, reject) {
      that.resolveQueueElements().then(function (resolved) {
        resolve(resolved);
        that.queue = [];
        that.resolved = [];
      });
    });
  },

  /*
    #fn: resolveQueueElements
    #scope: resolve queue elements.
    #returns: <deferred> object (promise)
  */
  resolveQueueElements: function (callback) {
    var that = this;
    return new Promise(function (resolve, reject) {
      // Loop over the queue and resolve each element.
      // Prevent OVER_QUERY_LIMIT by spacing the requests one second apart.
      (function loopWithDelay(such, queue, i) {
        console.log("Attempting the resolution of " + queue[i - 1]);
        setTimeout(function () {
          such.find(queue[i - 1], function (res) {
            such.resolved.push(res);
          });
          if (--i) {
            loopWithDelay(such, queue, i);
          }
        }, 1000);
      })(that, that.queue, that.queue.length);

      // Check every second whether the queue has been cleared.
      var it = setInterval(function () {
        if (that.queue.length == that.resolved.length) {
          resolve(that.resolved);
          clearInterval(it);
        }
      }, 1000);
    });
  },

  /*
    #fn: find
    #scope: resolve an address from a string
    #params: <string> s, <fn> callback
  */
  find: function (s, callback) {
    this.geolocalizer.geocode({
      "address": s
    }, function (res, status) {
      if (status == google.maps.GeocoderStatus.OK) {
        var r = {
          originalString: s,
          lat: res[0].geometry.location.lat(),
          lng: res[0].geometry.location.lng()
        };
        callback(r);
      } else {
        callback(undefined);
        console.log(status);
        console.log("could not locate " + s);
      }
    });
  }
};
Please note that this is just part of a bigger library I wrote to handle Google Maps stuff, hence the comments may be confusing.
Usage is quite simple; the approach, however, is slightly different: instead of looping and resolving one address at a time, you pass an array of addresses to the class and it handles the search by itself, returning a promise which, when resolved, yields an array containing all the resolved (and unresolved) addresses.
Example:
var myAmazingGeo = new Geolocalizer();
var locations = ["Italy", "California", "Dragons are thugs...", "China", "Georgia"];
myAmazingGeo.Localize(locations).then(function (res) {
  console.log(res);
});
Console output:
Attempting the resolution of Georgia
Attempting the resolution of China
Attempting the resolution of Dragons are thugs...
Attempting the resolution of California
ZERO_RESULTS
could not locate Dragons are thugs...
Attempting the resolution of Italy
The whole magic happens here:
(function loopWithDelay(such, queue, i) {
  console.log("Attempting the resolution of " + queue[i - 1]);
  setTimeout(function () {
    such.find(queue[i - 1], function (res) {
      such.resolved.push(res);
    });
    if (--i) {
      loopWithDelay(such, queue, i);
    }
  }, 750);
})(that, that.queue, that.queue.length);
Basically, it loops over the items with a delay of 750 milliseconds between each of them, so every 750 milliseconds an address is looked up.
I've done some further testing and found that even at 700 milliseconds I would sometimes get the QUERY_LIMIT error, while at 750 I had no issues at all.
In any case, feel free to lower the 750 above if you find that a shorter delay is safe for you.
Hope this helps someone in the near future ;)

I have just tested Google Geocoder and got the same problem as you have.
I noticed I only get the OVER_QUERY_LIMIT status once every 12 requests, so I wait for 1 second (that's the minimum delay to wait).
It slows down the application, but less than waiting 1 second between every request.
info = getInfos(getLatLng(code)); // In here I call Google API
record(code, info);
generated++;
if (generated % interval == 0) {
  holdOn(delay); // Every x requests, I sleep for 1 second
}
With the basic holdOn method:
private void holdOn(long delay) {
  try {
    Thread.sleep(delay);
  } catch (InterruptedException ex) {
    // ignore
  }
}
Hope it helps
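For anyone doing the same from Node.js, the same "pause every N requests" pattern can be sketched with async/await. Here getInfos, getLatLng and record are hypothetical stand-ins matching the pseudo-code above:
async function generateAll(codes, interval, delay) {
  for (let i = 0; i < codes.length; i++) {
    const info = await getInfos(getLatLng(codes[i])); // the call that hits the Google API
    record(codes[i], info);
    if ((i + 1) % interval === 0) {
      // Every `interval` requests, sleep for `delay` milliseconds.
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}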

This worked well for me, after intermittent trial and error over the past couple of days. I am using React instant-search hooks via Algolia, with Next.js and Sanity, for a new jobs site for a large company.
Postal code is a facet defined in the Algolia index for filtering/sorting/query matching. In another script file, I map out all of these facets (postal code, city, etc.). Now that I have the 100 returned facet values, they can be mapped out by iterating through a mapped asynchronous import, with the lat/lng coords matched to the corresponding zip codes that define a job posting (there are ~2500 postings but only ~100 zip codes to narrow down the coordinates of).
import * as dotenv from "dotenv";
dotenv.config();
import {
  googleNetwork,
  axiosConfig as googleAxiosConfig
} from "../utils/google-axios";
import JSONData from "../../public/data/postalCode/2022/05/26.json";
import fs from "fs";
import { join } from "path";
import type { GeneratedGeolocData } from "../types/algolia";
import { timezoneHelper } from "../utils/timezone-helper";
import { Unenumerate } from "../types/helpers";

let i = 0;

const getGeoCode = (
  record: Unenumerate<typeof JSONData.postalCodes.facetHits>
) =>
  function () {
    return JSONData.postalCodes.facetHits.map(async (data = record, u) => {
      const googleBase = process.env.NEXT_PUBLIC_GOOGLE_MAPS_BASE_PATH ?? "";
      const googleApiKey =
        process.env.NEXT_PUBLIC_TAKEDA_JOBS_GOOGLE_SERVICES ?? "";
      const params: (string | undefined)[][] = [
        ["address", data.value],
        ["key", googleApiKey]
      ];
      const query = params
        .reduce<string[]>((arr, [k, v]) => {
          if (v) arr.push(`${k}=${encodeURIComponent(v)}`);
          return arr;
        }, [])
        .join("&");
      return await googleNetwork("GET")
        .get(`${googleBase}geocode/json?${query}`, googleAxiosConfig)
        .then(dat => {
          const geoloc = dat.data as GeneratedGeolocData;
          const {
            [0]: Year,
            [2]: Month,
            [4]: Day
          } = new Date(Date.now())
            .toISOString()
            .split(/(T)/)[0]
            .split(/([-])/g);
          const localizedTimestamp = timezoneHelper({
            dateField: new Date(Date.now()),
            timezone: "America/Chicago"
          });
          // Stagger the file writes by one second to stay under the rate limit.
          return setTimeout(
            () =>
              fs.appendFileSync(
                join(
                  process.cwd(),
                  `public/data/geoloc/${Year}/${Month}/${Day}-${[i]}.json`
                ),
                JSON.stringify(
                  {
                    generated: localizedTimestamp,
                    _geoloc: {
                      postalCode: data.value,
                      geolocation: geoloc
                    }
                  },
                  null,
                  2
                )
              ),
            1000
          );
        });
    });
  };

// Invoke once -- the .map above already iterates over every facet hit.
getGeoCode(JSONData.postalCodes.facetHits[i])();
It took a lot less time than anticipated -- under 4 seconds for 100 unique results to generate
Context on the Unenumerate type -- Unenumerate strips the internal repeating unit within an array:
type Unenumerate<T> = T extends Array<infer U> ? U : T;

Related

Recursive Steam API Call does not terminate

I am calling an API endpoint for one of Steam's games through their web api using axios and promises in Node.js. Each JSON response from the endpoint returns 100 match objects, of which only about 10 to 40 (on average) are of interest to my use case. Moreover, I have observed that the data tends to be repeated if called many times within, say, a split second.
What I am trying to achieve is to get 100 match_ids (not whole match objects) that fit my criteria into an array by continuously (recursively) calling the API until I have 100 unique match_ids that serve my purpose.
I am aware that calling the endpoint within a loop is naive and exceeds the call limit of 1 request per second set by their web API. This is why I've resorted to recursion, to ensure that each promise is resolved and the array filled with match_ids before proceeding. The issue I am having is that my code does not terminate, and at each stage of the recursive calls the values are the same (e.g. the last match id, the accumulated array, etc.).
function makeRequestV2(matchesArray, lastId) {
  // base case
  if (matchesArray.length >= BATCH_SIZE) {
    console.log(matchesArray);
    return;
  }
  steamapi
    .getRawMatches(lastId)
    .then(response => {
      const matches = response.data.result.matches;
      // get the last id of fetched chunk (before filter)
      const lastIdFetched = matches[matches.length - 1].match_id;
      console.log(`The last Id fetched: ${lastIdFetched}`);
      let filteredMatches = matches
        .filter(m => m.lobby_type === 7)
        .map(x => x.match_id);
      // removing potential dups
      matchesArray = [...new Set([...matchesArray, ...filteredMatches])];
      // recursive api call
      makeRequestV2(matchesArray, lastIdFetched);
    })
    .catch(error => {
      console.log(
        "HTTP " + error.response.status + ": " + error.response.statusText
      );
    });
}
makeRequestV2(_matchIds);
// this function lies in a different file where the axios call happens
module.exports = {
  getRawMatches: function(matchIdBefore) {
    console.log("getRawMatches() executing.");
    let getURL = `${url}${config.ENDPOINTS.GetMatchHistory}/v1`;
    let parameters = {
      params: {
        key: `${config.API_KEY}`,
        min_players: `${initialConfig.min_players}`,
        skill: `${initialConfig.skill}`
      }
    };
    if (matchIdBefore) {
      parameters.start_at_match_id = `${matchIdBefore}`;
    }
    console.log(`GET: ${getURL}`);
    return axios.get(getURL, parameters);
  }
}
I'm not exceeding the request limits and all that, but the same results keep coming up.
BATCH_SIZE is 100 and _matchIds = [].
I would start by replacing the line:
matchesArray = [...new Set([...matchesArray, ...filteredMatches])];
with this one:
filteredMatches.filter(item => matchesArray.indexOf(item) === -1).forEach(item => {
  matchesArray.push(item);
});
What you were doing was effectively replacing the matchesArray variable inside your function with a new reference: the variable you passed in as a function parameter from outside was no longer the same variable inside the function. If you use matchesArray.push, you do not change the reference, and the variable in the outer scope is accurately updated - just as you intended.
This is the reason why _matchIds remains empty: on each call to makeRequestV2, the inner variable matchesArray becomes 'detached' from the outer scope when the assignment statement executes, and although it gets populated, the outer-scoped variable still points to the original reference and stays untouched.
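The difference is easy to see in isolation:
function reassign(arr) {
  arr = arr.concat([1]); // rebinds the local parameter only
}

function mutate(arr) {
  arr.push(1); // mutates the array the caller also sees
}

var outer = [];
reassign(outer);
console.log(outer.length); // 0 -- the caller's array is untouched
mutate(outer);
console.log(outer.length); // 1 -- push changed the shared array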

Looping through an array asynchronously changes values for different indexes

I have an asynchronous loop that sends the index value to save the results. However, for some reason, the first time through the loop it records the value properly. The second time through, it changes the value from the first iteration as well as the value in the second, and so forth and so on. It seems to do this inconsistently depending on the input.
Below is the code
promises[address_key] = addresses[address_key].nearest.map(function(currentValue, charger_key) { // This loops through items creating promises
  return new Promise(function(resolve) {
    addresses[address_key].nearest[charger_key].directions = get_driving_directions(addresses[address_key].nearest[charger_key]);
    addresses[address_key].nearest[charger_key].map_img = get_static_map_url(addresses[address_key].nearest[charger_key]);
    get_driving_distance(address_key, charger_key, resolve); // this calls the get_driving_distance function below
  });
});

/* more code here */

// Calls distance matrix from Google's api.
function get_driving_distance(address_key, charger_key, resolve) {
  var distance_matrix = new google.maps.DistanceMatrixService();
  distance_matrix.getDistanceMatrix( // calls distance matrix in Google API
    {
      origins: [addresses[address_key].address],
      destinations: [new google.maps.LatLng(addresses[address_key].nearest[charger_key].latitude, addresses[address_key].nearest[charger_key].longitude)],
      unitSystem: google.maps.UnitSystem.IMPERIAL,
      travelMode: 'DRIVING'
    }, process_distance_matrix(address_key, charger_key, resolve) // calls process_distance_matrix function sending over variables necessary.
  );
}

// processes data from the distance matrix in the function get_driving_distance
function process_distance_matrix(address_key, charger_key, callback) {
  return function(response, status) {
    if (response.rows[0].elements[0].status == 'OK') {
      console.log("response", response, 'status', status, 'address_key', address_key, 'charger_key', charger_key);
      console.log("Records before:", addresses[0].nearest[0].distance, response.rows[0].elements[0].distance.text, 'address_key', address_key, 'charger_key', charger_key);
      // Update the global variable with the distance data. This is recording data in wrong fields.
      addresses[address_key].nearest[charger_key].distance = {
        'text': response.rows[0].elements[0].distance.text,
        'number': response.rows[0].elements[0].distance.value / 1609.344,
        'over_limit': (max_driving_distance ? (response.rows[0].elements[0].distance.value / 1609.344 > max_driving_distance) : false)
      };
      console.log("Records after:", addresses[0].nearest[0].distance, response.rows[0].elements[0].distance.text, 'address_key', address_key, 'charger_key', charger_key);
    } else {
      display_error(address_key + ') ' + addresses[address_key].address + ' - Error getting driving distance');
      addresses[address_key].errors.push('Error getting driving distance');
      progress_status.error++;
    }
    callback();
  }
}
The rest of the code is too much to post, so here is a link to the rest of the code: https://jsfiddle.net/RobertoMejia/cqyyLh27/
The original loop is a for loop on line 68 of the fiddle. It loops through addresses and passes address_key to refer to the global object.
There is a second loop on line 183. It is a .map loop that runs through chargers and passes charger_key to refer to the global object.
Notice the console.logs in the middle of the function. Those are there to show how the variable changes where it shouldn't: they display the object in question before and after the assignment each time, along with the address key and the charger key at the time of execution.
Any help would be appreciated.
I think the problem is here in process_addresses:
addresses[address_key].nearest = charger_data.sort( sort_by_closest( addresses[address_key].geo ) );
// Takes the top results based on the number of results wanted.
addresses[address_key].nearest = addresses[address_key].nearest.slice(0, $('#number_of_results').val() );
If the same charger is near multiple addresses, that charger will be in the nearest array for all of them. So when you add the driving directions to the charger from one address, you're replacing the directions from the previous address.
There are two solutions:
The simplest is to clone the charger objects before putting them into the nearest array.
addresses[address_key].nearest = addresses[address_key].nearest.slice(0, $('#number_of_results').val()).map(function(charger) {
  return $.extend({}, charger);
});
Another way is to use a different object to hold the directions, and make the charger a property of it:
addresses[address_key].nearest = addresses[address_key].nearest.slice(0, $('#number_of_results').val()).map(function(charger) {
  return { charger: charger };
});
This is an application of Wheeler's famous aphorism: "All problems in computer science can be solved by another level of indirection"
Modified fiddle (using the first solution)

How to handle Shopify's API call limit using microapps Node.js module

I have been banging my head trying to find an answer for this and I just can't figure it out. I am using a Node.js module for the Shopify API by microapps. I have a JSON object containing a list of product ids and skus that I need to update, so I loop through the file and call a function that calls the API. Shopify's API limits calls to it and sends a response header with the value remaining. This node module provides an object containing the limits and usage. My question is: based on the code below, how can I add a setTimeout or similar when I am reaching the limit? Once you make your first call it will return the limits object like this:
{
  remaining: 30,
  current: 10,
  max: 40
}
Here is what I have, without respecting the limits, as everything I tried fails:
const products = JSON.parse(fs.readFileSync('./skus.json', 'utf8'));

for (var i = 0; i < products.length; i++) {
  updateProduct(products[i]);
}

function updateProduct(product) {
  shopify.productVariant.update(product.id, { sku: product.sku })
    .then(result => cb(shopify.callLimits.remaining))
    .catch(err => console.error(err.statusMessage));
}
I know I need to implement some sort of callback to check if the remaining usage is low and then wait a few seconds before calling again. Any help would be greatly appreciated.
I would use something to limit the execution rate of the function used by shopify-api-node (Shopify.prototype.request) to create the request, for example https://github.com/lpinca/valvelet.
The code below is not tested but should work. It should respect the limit of 2 calls per second.
var Shopify = require('shopify-api-node');
var valvelet = require('valvelet');
var products = require('./skus');

var shopify = new Shopify({
  shopName: 'your-shop-name',
  apiKey: 'your-api-key',
  password: 'your-app-password'
});

// Prevent the private shopify.request method from being called more than twice per second.
shopify.request = valvelet(shopify.request, 2, 1000);

var promises = products.map(function (product) {
  return shopify.productVariant.update(product.id, { sku: product.sku });
});

Promise.all(promises).then(function (values) {
  // Do something with the responses.
}).catch(function (err) {
  console.error(err.stack);
});
Try making use of the autoLimit option, for example:
import Shopify from 'shopify-api-node';

const getAutoLimit = (plan?: string) => {
  if (plan === 'plus') {
    return { calls: 4, interval: 1000, bucketSize: 80 };
  } else {
    return { calls: 2, interval: 1000, bucketSize: 40 };
  }
};

const shopify = new Shopify({
  shopName: process.env.SHOPIFY_SHOP_NAME!,
  apiKey: process.env.SHOPIFY_SHOP_API_KEY!,
  password: process.env.SHOPIFY_SHOP_PASSWORD!,
  apiVersion: '2020-07',
  autoLimit: getAutoLimit(process.env.SHOPIFY_SHOP_PLAN),
});

export default shopify;
According to the library's documentation:
autoLimit - Optional - This option allows you to regulate the request rate in order to avoid hitting the rate limit. Requests are limited using the token bucket algorithm. Accepted values are a boolean or a plain JavaScript object. When using an object, the calls property and the interval property specify the refill rate, and the bucketSize property the bucket size. For example { calls: 2, interval: 1000, bucketSize: 35 } specifies a limit of 2 requests per second with a burst of 35 requests. When set to true, requests are limited as specified in the above example. Defaults to false.
And this is the version I tried: "shopify-api-node": "^3.3.2"
Regarding the rate limits, refer to Shopify's documentation.
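For intuition, here is a minimal sketch of the token-bucket idea the documentation describes (not the library's actual implementation): calls tokens are added every interval milliseconds up to bucketSize, and each request consumes one token.
function tokenBucket(calls, interval, bucketSize) {
  let tokens = bucketSize;
  setInterval(() => {
    tokens = Math.min(bucketSize, tokens + calls);
  }, interval);
  return {
    take() {
      if (tokens > 0) {
        tokens--;
        return true;  // OK to send a request now
      }
      return false;   // caller should wait and retry
    }
  };
}

const bucket = tokenBucket(2, 1000, 40); // 2 calls/second with a burst of 40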
Try this:
const Shopify = require("shopify-api-node");

const waitonlimit = 2;
let calllimitremain = 40;

const shopify = new Shopify({
  shopName: process.env.SHOPIFY_URL,
  apiKey: process.env.SHOPIFY_KEY,
  password: process.env.SHOPIFY_PWD,
  autoLimit: true,
});

shopify.on("callLimits", (limits) => {
  calllimitremain = limits.remaining;
  if (limits.remaining < 10) {
    console.log(limits);
  }
});

exports.update = async () => {
  // Run this before the update: issue a cheap request and wait until the bucket refills.
  while (calllimitremain <= waitonlimit) {
    await shopify.product.list({ limit: 1, fields: "id, title" });
    console.log(`Waiting for bucket to fill: ${calllimitremain}`);
  }
  // update
  await shopify.productVariant.update(
    onlineVariantId,
    { compare_at_price: price, price: promo }
  );
};
If you look at Shopify's code, their GitHub repository has a CLI, and that CLI deals with the limits. You can quickly learn how Shopify handles these limits by looking at their code.
Since their code is in Ruby, it is pretty easy to digest. It should not take a skilled JS programmer more than a few minutes to see how to deal with the limits based on this code, even abstracting from Ruby.
So my suggestion is to read that Shopify code, then morph your JS code to match the same pattern.

Assemble paginated ajax data in a Bacon FRP stream

I'm learning FRP using Bacon.js, and would like to assemble data from a paginated API in a stream.
The module that uses the data has a consumption API like this:
// UI module, displays unicorns as they arrive
beautifulUnicorns.property.onValue(function(allUnicorns){
  console.log("Got " + allUnicorns.length + " Unicorns");
  // ... some real display work
});
The module that assembles the data requests sequential pages from an API and pushes onto the stream every time it gets a new data set:
// beautifulUnicorns module
var curPage = 1
var stream = new Bacon.Bus()
var property = stream.toProperty()
property.onValue(function(){}) // You have to add an empty subscriber, otherwise future onValues will not receive the initial value. https://github.com/baconjs/bacon.js/wiki/FAQ#why-isnt-my-property-updated
var allUnicorns = [] // !!! stateful list of all unicorns ever received. Is this idiomatic for FRP?
var getNextPage = function(){
  /* get data for subsequent pages.
     Skipping for clarity */
}
var gotNextPage = function (resp) {
  Array.prototype.push.apply(allUnicorns, resp) // just adds the responses to the existing array reference
  stream.push(allUnicorns)
  curPage++
  if (curPage <= pageLimit) { getNextPage() }
}
How do I subscribe to the stream in a way that gives me a full list of all unicorns ever received? Is this flatMap or similar? I don't think I need a new stream out of it, but I don't know. I'm sorry, I'm new to the FRP way of thinking. To be clear, assembling the array works; it just feels like I'm not doing the idiomatic thing.
I'm not using jQuery or another ajax library for this, so that's why I'm not using Bacon.fromPromise.
You may also wonder why my consuming module wants the whole set instead of just the incremental update. If it were just appending rows, that could be OK, but in my case it's an infinite scroll and it should draw data only if both: 1. data is available, and 2. the area is on screen.
This can be done with the .scan() method. You will also need a stream that emits the items of one page; you can create it with .repeat().
Here is a draft of the code (sorry, not tested):
var itemsPerPage = Bacon.repeat(function(index) {
  var pageNumber = index + 1;
  if (pageNumber < PAGE_LIMIT) {
    return Bacon.fromCallback(function(callback) {
      // your method that talks to the server
      getDataForAPage(pageNumber, callback);
    });
  } else {
    return false;
  }
});

var allItems = itemsPerPage.scan([], function(allItems, itemsFromAPage) {
  return allItems.concat(itemsFromAPage);
});

// Here you go
allItems.onValue(function(allUnicorns){
  console.log("Got " + allUnicorns.length + " Unicorns");
  // ... some real display work
});
As you noticed, you also won't need the .onValue(function(){}) hack, or the curPage external state.
Here is a solution using flatMap and fold. When dealing with the network, you have to remember that the data can come back in a different order than you sent the requests - that's why the combination of fold and map.
var pages = Bacon.fromArray([1, 2, 3, 4, 5])

var requests = pages.flatMap(function(page) {
  return doAjax(page)
    .map(function(value) {
      return {
        page: page,
        value: value
      }
    })
}).log("Data received")

var allData = requests.fold([], function(arr, data) {
  return arr.concat([data])
}).map(function(arr) {
  // I would normally write this as a one-liner
  var sorted = _.sortBy(arr, "page")
  var onlyValues = _.pluck(sorted, "value")
  var inOneArray = _.flatten(onlyValues)
  return inOneArray
})

allData.log("All data")

function doAjax(page) {
  // This would actually be Bacon.fromPromise($.ajax...)
  // Math.random to simulate the fact that requests can return out of order
  return Bacon.later(Math.random() * 3000, [
    "Page" + page + "Item1",
    "Page" + page + "Item2"])
}
http://jsbin.com/damevu/4/edit

Instagram API - paging through 'new' posts

So I'm using Node.js and the module instagram-node-lib to download metadata for Instagram posts. I have a couple of hashtags that I want to search for, and I want to download all existing posts (handling request failure during pagination) as well as monitor all new posts.
I have managed to crack the first part - downloading all existing posts and handling failure (I noticed that sometimes the Instagram API would just fail on me, so I've added redundancy to remember the last successful page I downloaded and attempt again from that point). For anyone who is interested, here is my code (note: I use Postgres to save the posts, and I've abbreviated/obfuscated some of the code for ease of reading and for commercial purposes). Apologies for the length of the code, but I think it will come in useful to someone:
var db = new (require('./postgres'))
  ,api = require("instagram-node-lib")
  ;

var HASHTAGS = ["fluffy", "kittens"] //this is just an example!
  ,CLIENT_ID = "YOUR_CLIENT_ID"
  ,CLIENT_SECRET = "YOUR_CLIENT_SECRET"
  ,HOST = "https://api.instagram.com"
  ,PORT = 443
  ,PATH = "/v1/media/popular?client_id=" + CLIENT_ID
  ;

var hashtagIndex = 0
  ,settings
  ;

/**
 * Initialise the module for use
 */
exports.initialise = function(){
  api.set("client_id", CLIENT_ID);
  api.set("client_secret", CLIENT_SECRET);
  if( !settings){
    settings = {
      hashtags: []
    };
    for( var i in HASHTAGS){
      settings.hashtags[i] = {
        name: HASHTAGS[i],
        maxTagId: null,
        minTagId: null,
        nextMaxTagId: null,
      };
    }
  }
  // console.log(settings);
  db.initialiseSettings(); //I haven't included the code for this - it basically just loads settings from the database, overwriting the defaults above if they exist, otherwise it creates them using the above object. I store the settings as a JSON object in the DB and parse them on load
  execute();
}

function execute(){
  var params = {
    name: HASHTAGS[hashtagIndex],
    complete: function(data, pagination){
      var hashtag = settings.hashtags[hashtagIndex];
      //from scratch
      if( !hashtag.maxTagId){
        console.log('Downloading old posts from scratch');
        getOldPosts();
      }
      //still loading old (previously failed)
      else if( hashtag.nextMaxTagId){
        console.log('Downloading old posts from last saved position');
        getOldPosts(hashtag.nextMaxTagId);
      }
      //new posts only
      else {
        console.log('Downloading new posts only');
        getNewPosts(hashtag.minTagId);
      }
    },
    error: function(msg, obj, caller){
      apiError(msg, obj, caller);
    }
  };
  api.tags.info(params);
}

function getOldPosts(maxTagId){
  console.log();
  var params = {
    name: HASHTAGS[hashtagIndex],
    count: 100,
    max_tag_id: maxTagId || undefined,
    complete: function(data, pagination){
      console.log(pagination);
      var hashtag = settings.hashtags[hashtagIndex];
      //reached the end
      if( pagination.next_max_tag_id == hashtag.maxTagId){
        console.log('Downloaded all posts for #' + HASHTAGS[hashtagIndex]);
        hashtag.nextMaxTagId = null; //reset nextMaxTagId - that way next time we execute the script we know to just look for new posts
        saveSettings(function(){
          next();
        }); //Another function I haven't included - it just saves the settings object, overwriting what is in the database. Once saved, it executes the next() function
      }
      else {
        //from scratch
        if( !hashtag.maxTagId){
          //these values will be saved once all posts in this batch have been saved. We set these only once, meaning that we have a baseline to compare to - enabling us to determine if we have reached the end of pagination
          hashtag.maxTagId = pagination.next_max_tag_id;
          hashtag.minTagId = pagination.min_tag_id;
        }
        //if there is a failure then we know where to start from - this is only saved to the database once the posts are successfully saved to the database
        hashtag.nextMaxTagId = pagination.next_max_tag_id;
        //again, another function not included. It saves the posts to the database, then updates the settings. Once they have completed, we get the next page of data
        db.savePosts(data, function(){
          saveSettings(function(){
            getOldPosts(hashtag.nextMaxTagId);
          });
        });
      }
    },
    error: function(msg, obj, caller){
      apiError(msg, obj, caller);
      //keep calm and try again - this is our failure redundancy
      execute();
    }
  };
  var posts = api.tags.recent(params);
}

/**
 * Still to be completed!
 */
function getNewPosts(minTagId){
}

function next(){
  if( hashtagIndex < HASHTAGS.length - 1){
    console.log("Moving onto the next hashtag...");
    hashtagIndex++;
    execute();
  }
  else {
    console.log("All hashtags processed...");
  }
}
OK, so here is my dilemma about solving the next piece of the puzzle - downloading new posts (in other words, only those new posts that have come into existence since I last downloaded all the posts). Should I use Instagram subscriptions, or is there a way to implement paging similar to what I've already used? I'm worried that if I use the former solution and there is a problem with my server that takes it down for a period of time, then I will miss out on some posts. I'm also worried that if I use the latter solution then it might not be possible to page through the records, because the Instagram API may be set up to enable forward paging rather than backward paging.
I've attempted to post questions in the Google Instagram API Developers Group a couple of times and none of my messages seem to be appearing in the forum, so I thought I'd resort to trusty Stack Overflow.
