Fetch list of 50,000 most subscribed channels - javascript

I'm trying to figure out a way to grab the top 50,000 most subscribed YouTube channels using JavaScript. They only need to be grabbed once and will be stored in a file that feeds an autocomplete input on a webpage.
I've gotten close to fetching the top 50 by calling search:list (/youtube/v3/search) with the parameters maxResults=50, order=viewCount, part=snippet, type=channel, fields=nextPageToken,items(snippet(channelId,title))
This returns:
{
"nextPageToken": "CDIQAA",
"items": [{
"snippet": {
"channelId": "UC-9-kyTW8ZkZNDHQJ6FgpwQ",
"title": "Music"
}
},{
"snippet": {
"channelId": "UC-lHJZR3Gqxm24_Vd_AJ5Yw",
"title": "PewDiePie"
}
},{
"snippet": {
"channelId": "UCVPYbobPRzz0SjinWekjUBw",
"title": "Анатолий Шарий"
}
},{
"snippet": {
"channelId": "UCam8T03EOFBsNdR0thrFHdQ",
"title": "VEGETTA777"
}
},...
Then all I'd have to do is repeat that request, following the nextPageToken each time, for 1,000 pages in total to get a list of the top 50,000.
Unfortunately, sorting by relevance, rating, viewCount, or nothing at all does not yield the 50 most subscribed channels, and according to the documentation there is no way to order results by subscriber count, so it seems I am stuck.

Just before you write each batch of 50 results to the file (or database), you can make one more API call: take the channelId field from every result, join the IDs with commas, and pass them to Channels: list.
For example, you can use the following parameters
(these are the IDs from your example above):
part=statistics
id=UC-9-kyTW8ZkZNDHQJ6FgpwQ,UC-lHJZR3Gqxm24_Vd_AJ5Yw,UCVPYbobPRzz0SjinWekjUBw,UCam8T03EOFBsNdR0thrFHdQ
Each item in the result will look something like this:
{
"kind": "youtube#channel",
"etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/MG6zgnd09mqb3nAdyRnPDgFwfkE\"",
"id": "UC-lHJZR3Gqxm24_Vd_AJ5Yw",
"statistics": {
"viewCount": "15194203723",
"commentCount": "289181",
"subscriberCount": "54913094",
"hiddenSubscriberCount": false,
"videoCount": "3175"
}
}
You can then read subscriberCount from the result for each channel.
I know this does not sort the 50 results as you write them to the file,
but it lets you sort by subscriber count later, when you read the file for your autocomplete input.
I didn't find any other way to sort results by subscriber count, so maybe this is helpful.
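As a rough sketch of the approach above, the helpers below build a Channels: list URL for a batch of IDs and sort the returned items by subscriberCount. The function names and the `apiKey` argument are placeholders, the batch size of 50 is an assumption that mirrors the page size used in the question, and the item shape is taken from the sample response shown earlier.

```javascript
// Sketch only: helper names are illustrative, not part of any YouTube client library.
const CHANNELS_ENDPOINT = "https://www.googleapis.com/youtube/v3/channels";

// Build one Channels: list URL for a batch of channel IDs.
function buildChannelsUrl(channelIds, apiKey) {
  const params = new URLSearchParams({
    part: "statistics",
    id: channelIds.join(","), // comma-delimited IDs, as in the example above
    key: apiKey,              // placeholder for your own API key
  });
  return `${CHANNELS_ENDPOINT}?${params}`;
}

// Split a long list of IDs into batches (assumed batch size: 50).
function chunk(list, size) {
  const batches = [];
  for (let i = 0; i < list.length; i += size) {
    batches.push(list.slice(i, i + size));
  }
  return batches;
}

// Sort channel items by subscriberCount, highest first.
// Counts arrive as strings; a missing count (hidden subscribers) is treated as 0.
function sortBySubscribers(channels) {
  const count = (channel) => Number(channel.statistics.subscriberCount || 0);
  return [...channels].sort((a, b) => count(b) - count(a));
}
```

Each URL can then be fetched in turn, the items collected into one array, and that array passed through `sortBySubscribers` before it is written to the file.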

The idea is to run a server-side script that makes the REST API calls in a loop and writes the results to a .json file. For example, you can create a PHP script that calls the Google API, fetches the first 50 results, and uses file write operations to save them. Run that PHP script as a cron job at whatever interval you choose to keep the results fresh.
Issue the cURL request in a loop, following the next page each time to fetch 50 results per call, and save everything to a temporary JSON file. Once all results are fetched, replace the old JSON file with the newly created temporary file. This regenerates the JSON file at regular intervals and picks up any changes in the data.
The point of the temporary file is to keep AJAX requests from waiting or slowing down because of constant reads and writes on the same file. Once the temporary file is written, simply use a move command to replace the actual file.
Make sure you set cache-control headers on the AJAX responses so that clients pick up the fresh data.
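The answer above describes a PHP script; since the question asks about JavaScript, here is the same write-to-temp-then-move idea as a minimal Node.js sketch. The function name and the temporary file naming scheme are assumptions made for illustration.

```javascript
const fs = require("fs");
const path = require("path");

// Write `data` as JSON to `targetPath` without ever exposing a half-written file.
function writeJsonAtomically(targetPath, data) {
  // Keep the temp file in the same directory so the rename stays on one filesystem.
  const tempPath = path.join(
    path.dirname(targetPath),
    `.${path.basename(targetPath)}.tmp` // hypothetical naming scheme
  );
  fs.writeFileSync(tempPath, JSON.stringify(data));
  // Readers see either the old file or the new one, never a partial write.
  fs.renameSync(tempPath, targetPath);
}
```

A scheduled job would collect all pages in memory (or append them to the temp file) and call this once at the end, so the file served to the autocomplete is only swapped when the full result set is ready.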


How can I retrieve popular users from GitHub?

I am trying to get a list of popular repos and users on GitHub.
Their API has an example for finding users by criteria sent in the q query param. This parameter is required, but I am not sure how to send it as 'empty'.
The query should list users sorted by followers. I am close, but I am not sure what to send in q:
`https://api.github.com/search/users?q=${WHAT_WHOULD_GO_HERE}&sort=followers&order=desc`
Just for reference, I was also trying to get popular repos and this is possible with the following query and it works just fine:
curl https://api.github.com/search/repositories\?q\=stars:\>1+language:javascript\&sort\=stars\&order\=desc\&type\=Repositories
You can run a query against the GitHub API that specifies a follower limit, a repository language, and a page. If you configure the query correctly, you will get what you want.
Sample query:
`https://api.github.com/search/users?q=repos:followers:<1000&language:javascript&page=1&per_page=100`
For example, I can fetch all users with more than 2,000 followers, which is one way of finding popular users:
`https://api.github.com/search/users?q=repos:followers:%3E2000&language:javascript&page=1&per_page=100`
Response
{
"total_count": 321,
"incomplete_results": false,
"items": [
{
"login": "vim-scripts",
"id": 443562,
"node_id": "MDQ6VXNlcjQ0MzU2Mg==",
"avatar_url": "https://avatars0.githubusercontent.com/u/443562?v=4",
"gravatar_id": "",
"url": "https://api.github.com/users/vim-scripts",
...
}
After fiddling around I got the answer:
curl https://api.github.com/search/users\?q\=followers:\>1000\&page\=1\&per_page\=10\&sort\=followers\&order\=desc
The query is based on GitHub's own popular-users list, which has some clues in its URL. The query above returns exactly the same result as:
https://github.com/search?o=desc&q=followers%3A%3E%3D1000&ref=searchresults&s=followers&type=Users
The q query param needs only this:
followers:>1000
Plus the sorting described in the question:
sort: followers (sort by follower count)
order: desc (descending)
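The accepted query can be assembled in JavaScript roughly as follows. This is a small sketch: the function name and its default values are illustrative, while the query parameters are the ones used in the curl command above.

```javascript
// Build the GitHub user search URL for users above a follower threshold,
// sorted by follower count in descending order.
function buildPopularUsersUrl({ minFollowers = 1000, page = 1, perPage = 10 } = {}) {
  const params = new URLSearchParams({
    q: `followers:>${minFollowers}`, // the only qualifier q needs here
    sort: "followers",
    order: "desc",
    page: String(page),
    per_page: String(perPage),
  });
  // URLSearchParams percent-encodes ":" and ">" so no manual escaping is needed.
  return `https://api.github.com/search/users?${params}`;
}
```

Letting `URLSearchParams` do the encoding avoids the backslash escaping that the shell version needs.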

Possible ways to send huge amount of data to PHP server

I have a multi-step form in a project that handles a lot of data. To prevent errors during creation, all information is stored client-side and sent to the server only at the end.
The information sent to the server looks like this:
{
name: "project1",
decription: "lot of text",
schedule:[{weekDay:1, startHour:"09:00", endHour:"15:00"}, ...]
tasks:["task1", "task2"... until 20/30],
files:[{file1}, {file2}, ...],
services:[{
name: "service1",
decription: "lot of text",
schedule:[{weekDay:1, startHour:"09:00", endHour:"15:00"}, ...]
tasks:["task1", "task2"... until 20/30],
files:[{file1}, {file2}, ...],
jobs:[{
name: "job1",
decription: "lot of text",
schedule:[{weekDay:1, startHour:"09:00", endHour:"15:00"}, ...]
tasks:["task1", "task2"... until 20/30],
files:[{file1}, {file2}, ...]
},{
name: "job2",
}
]
...
},{
name:"service2",
...
}
}
And so on.
This is a heavily reduced example; in a real environment there will be 1 project with about 10-15 services, each one with 4-5 jobs.
I have been able to process everything with about 15 items in the last level. Now I'm trying to preprocess the data to delete objects the server does not need before sending, and with that I expect to be able to send over 50 items in the last level without triggering "max_input_variables exceeded xxx" server side. Even so, it will be very close to the limit in some cases.
I'm thinking about changing the way I send and receive data, but I'm not sure my guesses are even correct.
Before anyone suggests a JSON request to avoid the input variables error: the request has to be multipart/form-data in order to send files.
That said, my guesses were the following:
Put all the data as JSON in a single variable and keep the files in separate variables ( formData would look like {project:{hugeJSON}, files:[file1, file2], services:[{files:[...]}, {files:[...]}] } )
Send partial data to the server while the form is being filled in and store it somewhere (a tmp file would be my best bet), then send only the main form information in the last step.
Probably a stupid guess, but is there something like sending chunked data? Ideally, I would like to show the user a loading bar saying "Creating project--> Saving Service nº1 --> Generating Docs for Service 1..." I think I could achieve this by making my server-side script generate a chunked response, but I'm not sure about that.
Any help that points me in the right direction would be really appreciated.
Thank you in advance.
Once you are finished filling your object, you should stringify it and send it to the server as a post parameter.
Once you receive it serverside, you can parse JSON and continue working.
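One way to combine this with the multipart/form-data requirement from the question is to send the whole structure as a single JSON string field and attach the files as separate fields. The sketch below is an assumption about how that could look: the field names `project` and `files[]`, the `fileIndex` placeholder, and both function names are hypothetical.

```javascript
// Pull file-like values out of the nested object and replace each one with a
// small placeholder, so everything else can travel as one JSON string.
function splitPayload(project, isFile) {
  const files = [];
  const json = JSON.stringify(project, (key, value) => {
    if (isFile(value)) {
      files.push(value);
      return { fileIndex: files.length - 1 }; // hypothetical placeholder shape
    }
    return value;
  });
  return { json, files };
}

// Browser-side usage: one text field for the data, one field per file.
function toFormData(project) {
  const { json, files } = splitPayload(project, (value) => value instanceof Blob);
  const formData = new FormData();
  formData.append("project", json); // a single input variable on the server
  files.forEach((file) => formData.append("files[]", file));
  return formData;
}
```

On the PHP side, the `project` field arrives as one POST variable that can be decoded with `json_decode`, and each `fileIndex` placeholder points at the matching entry among the uploaded files.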

Detect when no previous posts available in Facebook Graph posts edge?

I'm accessing the Facebook Graph API for posts and am trying to figure out the pagination handling. I understand the use of paging.next and paging.previous properties of the results but I'd like to know when there are actually previous results. Particularly, when I make the first 'posts' call, I get back a paging.previous url even though there are no previous values. Upon calling that url I get a response with no results.
For example, calling "168073773388372/posts?limit=2" returns the following:
{
"data": [
{
"story": "Verticalmotion test added a new photo.",
"created_time": "2015-12-02T17:04:56+0000",
"id": "168073773388372_442952469233833"
},
{
"message": "http://www.youtube.com/watch?v=QD2Rdeo8vuE",
"created_time": "2013-12-16T23:19:30+0000",
"id": "168073773388372_184840215045061"
}
],
"paging": {
"previous": "https://graph.facebook.com/v2.6/168073773388372/posts?limit=2&format=json&since=1449075896&access_token=****&__paging_token=enc_AdA69SApv4VoBZB0PPZA7W5EivCYQal8KMFmRNkyhr8ZBk4w0YmFEQUJWV3JZBS70ihyMpbqieQaERhY50enqNCMBuIZATadeopYj8xPvQL7Y8KueaQZDZD&__previous=1",
"next": "https://graph.facebook.com/v2.6/168073773388372/posts?limit=2&format=json&access_token=****&until=1387235970&__paging_token=enc_AdAVMaUlPmpxjBmq5ZClVdNpFp7f9MyMFWjE7ygqsMLW7zvSx3eGHLkfwDxdCx0uO3ooAZCKDmCwMWHZA9RNyxkYUPJyjMtO3kynKm5uF2PhoPZB2gZDZD"
}
}
How can I tell if it's the first set of results?
From tidbits scattered around the documentation and web, it seems like the previous url shouldn't be there.
I don't think it matters because I get the same results in the Graph Explorer but I'm using OpenFB to access the API.
You can request the results in reverse order and then take the first result:
https://developers.facebook.com/docs/graph-api/using-graph-api
Ordering
You can order certain data sets chronologically. For example you may sort a photo's comments in reverse chronological order using the key reverse_chronological:
GET graph.facebook.com
/{photo-id}?
fields=comments.order(reverse_chronological)
order must be one of the following values:
*chronological*
*reverse_chronological*

Best practice creating a key/value map dev/prod node.js

I have a Node.js app, APP-A, that communicates with another C# app, APP-B, using APP-B's API. APP-B has a RESTful API that returns JSON. Other than a few standard fields e.g., name, description, APP-B's keys are defined when the user creates the field in the system. The resulting JSON looks like this:
{
"name": "An example name",
"description": "Description for the example",
"cust_fields": {
"cust_123": "Joe Bloggs",
"cust_124": "Essex"
}
}
I have two instances of APP-B, a dev and prod environment, which are separate installations. As a result, the JSON from the prod environment is as above, and the JSON from the dev environment looks like this:
{
"name": "An example name",
"description": "Description for the example",
"cust_fields": {
"cust_782": "Joe Bloggs",
"cust_793": "Essex"
}
}
This is dealt with in APP-A (the Node.js app) by having a JSON map like this:
{
"name": "name",
"description": "description",
"cust_fields": {
"full_name": "cust_123",
"city": "cust_124"
}
}
Which is loaded like this:
var map;
switch(env) {
case 'dev':
map = require('../env/dev/map.json');
break;
case 'prod':
map = require('../env/prod/map.json');
break;
};
module.exports = {
name: map.name,
description: map.description,
cust_fields: {
full_name: map.cust_fields.full_name,
city: map.cust_fields.city,
}
}
So I am wondering, is there a better way of dealing with this? I don't see a way around creating some kind of manual relationship between the key names across prod and dev, as there is no way to find out which field corresponds to which, but it seems like a lot of work.
Thanks for reading.
Update:
I have created a jsFiddle to better illustrate my question: http://jsfiddle.net/7k9k03o6.
If the mapping is unavoidable and everything is done manually right now, the next step would be to automate the building of those lookup maps through some persistent storage, i.e. a database.
The general flow would be:
When APP-B creates a new form, the field information is stored in the database with all the identifying information. You could store production and dev data in the same DB (distinguished by a flag), but more likely they would be separate databases. The structure might be customerId, formId, fieldName, fieldMapping, fieldValue, isProduction --> 123, 2, 'cust_124', 'city', 'Essex', true
When APP-A needs a field listing, it queries the DB for the relevant fields. "Find the mapping for customer X and form Y in production" --> WHERE custId = 123 AND formId = 2 AND isProduction = true would yield a list of fields and their mapping values (which you would post-process/reduce into the mapping you need).
Automating this leaves less manual work for you, and you are less likely to miss or forget a mapping than with a hand-written file.
It adds a little work to the server processing, as you'll need the field mapping from the DB every time a request is processed. (You could back off a bit and run one big query each time a customer is loaded, or even only when the server starts; it depends on how dynamic these custom fields are.) You would also have to map the DB results into a listing that is usable for your purposes.
Depending on how many customers and custom forms you are tracking, an automated process will save you a lot of time and avoid many of the mistakes that come with hand-generated files.
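The post-processing step mentioned above ("reduce into the mapping you need") might look like the following sketch. It assumes each DB row carries `fieldName` (the raw key such as `cust_124`) and `fieldMapping` (the friendly name such as `city`), as in the example structure; both function names are made up for illustration.

```javascript
// Reduce DB rows into a lookup of friendly name -> raw APP-B key.
function buildFieldMap(rows) {
  return rows.reduce((map, row) => {
    map[row.fieldMapping] = row.fieldName;
    return map;
  }, {});
}

// Translate one APP-B record into an object keyed by friendly names.
function translateCustomFields(record, fieldMap) {
  const result = {};
  for (const [friendlyName, rawKey] of Object.entries(fieldMap)) {
    result[friendlyName] = record.cust_fields[rawKey];
  }
  return result;
}
```

With this in place the rest of APP-A only ever sees `full_name` and `city`, and the dev/prod difference is confined to which rows the query returns.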

Sequential array issue

Back end code - PHP
Front end - Angular/JavaScript
I am experimenting with a preferential search on my website. Users are mapped to friends, and each user can post content that can be "liked". My idea for the search was to count how many of the user's friends have "liked" each resource on the site and sort the results from highest to lowest. I have the main chunk of this working (the back-end code), and it returns an object that looks like:
{"results":
{"post":
{"9": {"message" : "blah9"},
"1": {"message" : "blah"}}
}
}
The number is the id of the post (a side note: I'm using it to refresh something elsewhere on the site). My problem is that when I console.log() this, it changes to:
{"results":
{"post":
{"1": {"message" : "blah"},
"9": {"message" : "blah9"}}
}
}
This makes the sorting code kind of useless. Is there any way I can stop this from happening?
$http.post('php/router.php', {'request' : 'search', 'page': 'Search', 'searchString': searchString}).success(function(data) {
console.log(data.results.post);
});
Let the JavaScript side do the sorting and remove the sort from your PHP entirely. Have the PHP handle only the pagination of the set (1 to 10, 11 to 20, etc.) and let the JavaScript order each page of results (the chunks of 10 in this example).
You will probably still need some kind of sort on the PHP side if you have a ton of results to chunk up, but the JS can certainly sort each chunk that is sent to the client.
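The reordering happens because JavaScript objects enumerate integer-like keys such as "1" and "9" in ascending numeric order, regardless of the order in the JSON text, so an object cannot carry the ranking. A minimal client-side sketch is to turn the object into an array and sort that. The `friendLikes` field is an assumption here: the response in the question only shows `message`, so the PHP would need to include whatever score it sorted on.

```javascript
// Convert the id-keyed object into an array and sort by a score field.
// `friendLikes` is a hypothetical per-post count supplied by the back end.
function toSortedPosts(postsById, scoreKey = "friendLikes") {
  return Object.entries(postsById)
    .map(([id, post]) => ({ id, ...post })) // keep the post id alongside its data
    .sort((a, b) => b[scoreKey] - a[scoreKey]); // highest score first
}
```

Inside the `$http.post` callback this would be called as `toSortedPosts(data.results.post)`. Arrays preserve their order, so returning a JSON array from PHP in the first place would also keep the server-side sort intact.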
