Using JavaScript for scanning Wikipedia articles - javascript

I recently saw http://www.histography.io/ - a system that uses HTML, CSS and JavaScript to scan Wikipedia articles when you hover over a point, grab the article and the related YouTube video, and display them to you.
I've been exploring the system for the past two hours but can't seem to find how it fetches the large amount of data it uses.
Any pointers to the technique or functions used to fire the events in JS would be very helpful.

In terms of how the videos are being rendered, it looks like there is a large manifest file with video IDs, which I presume correspond to YouTube video IDs. Refer to: http://www.histography.io/int.js
The main scripts are uglified so it's hard to tell you exactly what each function is doing.
For future reference, I'd suggest checking out the Network tab in dev tools to get a better understanding of where a site's resources are coming from.
Also, when you request a wiki page it makes a request to a server endpoint, /wiki_page.php, which takes the following inputs (a quick sketch of replicating that request follows the list):
link
title
year
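For example, a minimal sketch of replicating that request from the site's own browser console (so cross-origin restrictions don't get in the way); the parameter values below are hypothetical and the response format isn't documented, so treat it as exploratory:

// Hypothetical parameter values; the real ones would come from the
// manifest (int.js) entries the page actually uses.
const params = new URLSearchParams({
  link: '/wiki/Moon_landing',
  title: 'Moon landing',
  year: '1969'
});

fetch('http://www.histography.io/wiki_page.php?' + params.toString())
  .then(response => response.text()) // response format unknown; treat as text
  .then(body => console.log(body))
  .catch(err => console.error(err));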

Related

How to embed a self updating photo album that fetches images from Google Drive

So first off, pardon me for being really noobish at this. I'm trying to build a site from an HTML5 template for a small music company I work with, for fun and educational reasons. I have very limited knowledge of web development, but I am able to edit the template and have done quite a bit of reading to try to understand how everything works. I find that much easier than building from scratch, which could take months to do properly. (Not to say I don't want to learn, but I'd like to get this done quickly if I can.)
So let me get right to it, and I'll try to be short. The company is a music promotion company. They need the following pages: about, team, in-house musician bios, blog, contact form and events. I have pretty much everything figured out (the blog and musicians currently need to be properly finished in terms of content and CSS), except for the events page.
Currently it's a static album that pulls images link by link from a Google Drive folder instead of the web server (for ease of access). The issue with this is that every time an event is outdated or new ones need to be added, I have to manually update the index file and change the links, which is time-consuming and unnecessary. I want this site to be fairly autonomous, so that I don't have to log into the server every few days and change the index file over and over. I hope you understand where I am coming from.
So, my idea is to use the same Drive folder, but have the site automatically pull the images, properly resize them (if possible) and show them on the page. I haven't found an easy solution to this yet, and with my lack of web experience, I don't think I can write this myself. I know self-updating albums that pull from the server exist and can be found online, but those still require FTP access, which is not bad, but can be improved on.
So the process, I think, is as follows (a rough sketch based on these steps appears after the list):
Fetch the images from the shared Drive folder via the Drive API
Fetch each individual image link
Insert each link dynamically into an array/table
Have each row/column resize the image
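A minimal sketch of that flow, assuming the folder is shared publicly and listed through the Drive v3 files endpoint with an API key; the folder ID, API key, container element ID and the direct-link form for the images are all placeholders/assumptions to verify against your own setup:

// List a publicly shared folder with the Drive v3 API and append the images.
const FOLDER_ID = 'YOUR_FOLDER_ID'; // placeholder
const API_KEY = 'YOUR_API_KEY';     // placeholder
const listUrl = 'https://www.googleapis.com/drive/v3/files'
  + '?q=' + encodeURIComponent("'" + FOLDER_ID + "' in parents")
  + '&fields=files(id,name,mimeType)'
  + '&key=' + API_KEY;

fetch(listUrl)
  .then(res => res.json())
  .then(data => {
    const album = document.getElementById('events-album'); // placeholder container
    (data.files || [])
      .filter(file => file.mimeType && file.mimeType.indexOf('image/') === 0)
      .forEach(file => {
        const img = document.createElement('img');
        // One commonly used direct-link form for publicly shared Drive files;
        // confirm it works with your sharing settings.
        img.src = 'https://drive.google.com/uc?export=view&id=' + file.id;
        img.alt = file.name;
        img.style.maxWidth = '100%'; // let CSS handle the resizing
        album.appendChild(img);
      });
  })
  .catch(err => console.error('Drive listing failed:', err));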
Here is what the site I am working on currently looks like: http://rushone2010.x10host.com/ocml
And here is their current site: http://www.ocmusicleague.com/
The crux is automatically fetching the images whenever they are added to or removed from the Drive folder, without server access. Sort of reminds me of this:

How to extend HTML of an existing site via JavaScript or similar

I want to add a bit of extra HTML to an existing site based on a REST API call response.
Specifically, www.arbookfind.com lets you search for kids' school books with an "AR" test. (My son has to read a certain number of books at a level.) It has a link to amazon.com if you find a book you want to buy. However, I would like to know whether the book is available for Kindle (most are not). Right now I have to click the Amazon link, check the page, go back and try the next one - it can take 10 tries to find one available on the Kindle. Painful!
I was after ideas on the easiest way to do this. That is, without touching the arbookfind.com web site, can I add some JavaScript (jQuery) to all the returned HTML pages? The JavaScript will look in the returned page for each book, fire off an Amazon ItemSearch query (?) to see if it is available on Kindle, then inject an HTML link to the Kindle book on Amazon. I can learn how to write the JavaScript - I am just after some pointers for the easiest way to augment the current site.
That way I can use the current arbookfind.com site to find a book, but it is faster for me to identify which books are available on Kindle without manually trying each link by hand.
E.g. a web browser plugin that runs some JavaScript on each returned page? A Varnish proxy with some smart logic to fiddle with pages on the way through? A PHP app acting as a proxy server? Thanks!
Maybe you want something like the Chrome extension Tampermonkey.
It lets you add and manage userscripts for websites, i.e. JavaScript "snippets" that are injected into pages matching specific URL patterns.
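A minimal userscript sketch of that idea: the @match pattern and the selector for finding Amazon links on arbookfind.com are assumptions about the page markup, and the Kindle check itself is left as a stub because the Amazon Product Advertising API requires signed, authenticated requests:

// ==UserScript==
// @name         AR BookFind Kindle helper (sketch)
// @match        *://www.arbookfind.com/*
// @grant        GM_xmlhttpRequest
// ==/UserScript==
(function () {
  'use strict';

  // Assumption: result rows contain links to amazon.com; adjust the selector
  // against the real page markup.
  var amazonLinks = document.querySelectorAll('a[href*="amazon.com"]');

  amazonLinks.forEach(function (link) {
    checkKindleAvailability(link.href, function (isAvailable) {
      if (isAvailable) {
        var note = document.createElement('span');
        note.textContent = ' [Kindle]';
        link.parentNode.insertBefore(note, link.nextSibling);
      }
    });
  });

  function checkKindleAvailability(productUrl, callback) {
    // Stub: a real check would request the Amazon page (GM_xmlhttpRequest
    // works cross-origin in userscripts) or make a signed Product Advertising
    // API call, then look for a Kindle edition. This sketch always reports false.
    callback(false);
  }
})();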

Unable to see complete scraped web page in Google Apps Script logs

A few weeks ago I started learning JavaScript and the Google Apps Script API, specifically in regard to spreadsheets. I have been trying to make a spreadsheet that fetches web pages and pulls stats about my friends for the game League of Legends. However, I have been running into a problem with the site I want to use, which is basically the only free LoL stats site that updates frequently. I'm not familiar at all with web development, but it seems that when I try to access a page on lolking.net (for example http://www.lolking.net/summoner/na/60783) with Google's UrlFetchApp.fetch(), it does not load the dynamic page. So instead of the final source, I get this, which doesn't help me. Is there an easy way around this or would I simply have to use another website?
Thanks for the info! Although it turns out I was mistaken. The UrlFetchApp was indeed returning the full source code, but I was using GAS's Logger to view the text. It seems the Logger has a length limit, so when I searched for the stats I wanted they weren't there simply because the source code got truncated. So, due to an oversight on my part, I never had a problem in the first place. For other people reading this question, in the end I have no idea how UrlFetchApp works with dynamic pages using client-side JS (you'd probably want to talk to the poster below or post a new question).
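For reference, one way around the truncated Logger view is to log the fetched source in chunks (or write it into a sheet cell); a minimal sketch, with the chunk size chosen arbitrarily:

function logFullPage() {
  // UrlFetchApp returns the raw HTML exactly as the server serves it.
  var html = UrlFetchApp.fetch('http://www.lolking.net/summoner/na/60783').getContentText();

  // The Logger view cuts off long strings, so log in fixed-size chunks.
  var chunkSize = 5000;
  for (var i = 0; i < html.length; i += chunkSize) {
    Logger.log(html.substring(i, i + chunkSize));
  }
}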
You are getting the raw HTML page with the client-side JS included. That won't work from any system, not just GAS. You need to debug that page's JS and find where it makes an AJAX call to get the data you want.
Then do the same from your GAS. It might not work if the call is authenticated, etc.
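A minimal sketch of that approach; the endpoint URL below is purely hypothetical and stands in for whatever AJAX call you actually find in the page's network traffic, and the response is assumed to be JSON:

function fetchSummonerStats() {
  // Hypothetical endpoint discovered via the browser's Network tab; the real
  // URL and response shape come from debugging the page.
  var url = 'http://www.lolking.net/ajax/summoner-stats?id=60783'; // assumption
  var response = UrlFetchApp.fetch(url);

  // Adjust the parsing if the endpoint returns HTML fragments instead of JSON.
  var data = JSON.parse(response.getContentText());
  Logger.log(data);
}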

Getting most recent youtube video links for a user using API

So, I've been reading the Youtube API--I'm interested in showing the three most recent videos uploaded by a user. But, as I've never navigated an API or done this kind of work before, I'm a bit confused by what exactly the API here is trying to tell me. What I DO understand is that if I enter a URL like the following:
https://gdata.youtube.com/feeds/api/users/aosjeff/uploads
Then I'll get a ton of information in a kind of list. What I don't understand is how to navigate that list in HTML, and make it return a link to the most recent video (or second most recent, etc) so that I can embed that video into the page. Can anyone explain this to me? Really appreciate the help!
Note: I'm working within site building software that will not allow me to use PHP or reference .php files.
Simon
You should look at parsing the XML data with PHP. This is probably the easiest way of doing things. There's a beginner's tutorial here:
http://www.kirupa.com/web/xml_php_parse_beginner.htm
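Since the question notes that PHP isn't an option, here is an alternative client-side sketch that parses the same feed with JavaScript's DOMParser. It assumes the feed is a standard Atom feed as the old gdata v2 API returned (that API has since been retired) and that the feed is reachable from the browser (cross-origin restrictions may apply), so treat it as illustrative:

// Pull the links of the three newest uploads from the Atom feed.
fetch('https://gdata.youtube.com/feeds/api/users/aosjeff/uploads')
  .then(res => res.text())
  .then(xmlText => {
    var doc = new DOMParser().parseFromString(xmlText, 'application/xml');
    var entries = doc.getElementsByTagName('entry');
    for (var i = 0; i < Math.min(3, entries.length); i++) {
      var links = entries[i].getElementsByTagName('link');
      for (var j = 0; j < links.length; j++) {
        // rel="alternate" points at the watch page for the video.
        if (links[j].getAttribute('rel') === 'alternate') {
          console.log(links[j].getAttribute('href'));
        }
      }
    }
  })
  .catch(err => console.error(err));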

Factoring in SEO on a Flash Site

There have been many debates about this topic already here, but none of them fully answered my question so I figured I would pose it and hope I get one or two decent answers.
We're planning on relaunching our company website in the next few months. Our current site, for the most part, is text-driven, and because of this we rank very well on Google, Yahoo, and Bing for our primary keywords. We want to increase the "wow factor" of the site a bit (we're an interactive agency) but still maintain most of our search engine footing. The option of using Flash, AJAX, and other technologies that are not considered search engine friendly has come up numerous times in our meetings, and each time we have to evaluate what kind of impact it would have on us from an SEO perspective.
Assuming a good portion of the site content will be encapsulated within a Flash (swf) file, what would be the best course of action for maintaining current rankings? I've read numerous times that Google indexes Flash files but I am unsure as to what extent. Further, is there a method of telling Google not to index a Flash file (through a variable or otherwise)?
Finally, I had an idea that seemed sound in theory and wanted to put it out into the world and see what type of feedback I receive on it:
Again, assuming the whole page is in a Flash file living on index.html, would it be possible to build out the site as normal (set up a logical directory structure, add content to static pages within said structure, etc.), specify paths to those static pages in a Google XML Sitemap file, and have the spiders crawl only those pages (which are rich in content) while the user experiences some concoction of Flash/JavaScript/AJAX/etc.? If this works, what would be the pros/cons of this solution? Thanks for bearing with me on this slightly off-kilter question.
Looking at Google's own documentation, they have made impressive strides in indexing Flash-based web pages. The only limitations I found from reading the article are in these three areas:
Googlebot does not execute some types of JavaScript. So if your web page loads a Flash file via JavaScript, Google may not be aware of that Flash file, in which case it will not be indexed.
We currently do not attach content from external resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will separately index that resource, but it will not yet be considered to be part of the content in your Flash file.
While we are able to index Flash in almost all of the languages found on the web, currently there are difficulties with Flash content written in bidirectional languages. Until this is fixed, we will be unable to index Hebrew language or Arabic language content from Flash files.
By the sounds of it, none of those three limitations will be a problem for you. Based on this document, Flash sounds like a viable option.
Adobe has also been working on its end to make SWFs more search engine friendly. So with the combined efforts of Adobe and Google/Yahoo, even if you take a dip in ranking, within a year or two the search algorithms will be better than they are now.
As far as preventing indexing goes, you should be able to add a simple
User-agent: *
Disallow: /directory/
Disallow: /directory/page.html
to your robots.txt file.
Andrew,
I've had to deal with this sort of thing a few times and I'd recommend maintaining both a Flash site (for users) and an HTML site (for search engines). Here's how you do it:
With whatever server-side stuff you're using, set up some kind of switch that determines whether a particular request is for HTML or for whatever your Flash movie consumes (XML, JSON, another SWF, whatever). Every page on your site should be able to return both HTML and whatever you choose to feed your Flash movie. A query string parameter like "requestType=Flash" will work just fine.
Put all of the content in your HTML pages in a div tag and make the div invisible with CSS. Use SWFObject to check whether the requesting browser supports Flash and, if it does, have SWFObject replace your HTML content with your Flash movie. Search engine spiders will ignore your scripts and simply crawl your HTML pages, and if you'd like to show the HTML to users whose browsers don't support Flash (like mobile browsers), just make the HTML content visible after SWFObject has determined that the browser doesn't support Flash.
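A minimal sketch of that replacement step with SWFObject 2.x; the SWF path, element ID, dimensions and required Flash version are placeholders, and it assumes a div with id="content" holding the full crawlable HTML and swfobject.js loaded on the page:

// swfobject.embedSWF(swfUrl, idOfElementToReplace, width, height, minVersion)
// replaces the div's contents only when the required Flash version is present;
// otherwise the HTML stays in place for spiders and non-Flash browsers.
swfobject.embedSWF("site.swf", "content", "960", "600", "9.0.0");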
Once your Flash movie has loaded, have it request whatever data it needs from the server using the same URL of the page that it was loaded on, but with the addition of the switch variable above.
Handle navigation from that point on with SWFAddress. When a user clicks a button to request a new page, pass the request through SWFAddress first, which will update the browser history using the hash mark trick, and then have your Flash movie make its request to the server.
I'm currently working on a site for a friend that uses this technique here (I should note, to protect my pride, that the site is still very much a work in progress):
http://www.casabarbuenosaires.com/
A browser request to any page on the site will first return the HTML representation of that page (you can view source in your browser to see that). SWFObject then replaces the HTML content with a Flash movie that loads a custom XML description of the same page which the Flash movie then constructs and displays.
I've worked on sites in the past that have used this technique and gotten excellent search engine results. Since you don't need to worry too much about what your HTML site looks like to humans, you can focus solely on what it looks like to search engines.
Another added benefit of building your site this way is that you are compelled to separate your site's content/copy from its visual representation. Throwing your entire site into a single SWF is generally NOT a good way to do that. It's much easier to maintain (or re-skin or scrap) a site when your content isn't all mixed up with your code.
Hope this helps,
Scott
