I'm using the IMPORTXML function in Google Sheets to import the latest visa numbers from the Canadian government. I can import everything on the web page except the data/numbers, which I think could be down to JavaScript.
I just need a way to pull that data into the spreadsheet, but I'm not experienced with JS.
Here's the website and here's my query:
=IMPORTXML("https://www.cic.gc.ca/english/work/iec/selections.asp?country=au&cat=wh",
"//div[@class='col-md-8']")
Here is an example sheet.
Unfortunately, the Google Sheets IMPORTXML formula can only read the static HTML source of the page, so it cannot read any dynamically inserted element (as you guessed, the visa number is inserted dynamically by a JavaScript script).
If you inspect the page source of this site in your browser, you will see that numbers such as Candidates in the pool are not there, and therefore IMPORTXML cannot reach them.
To get them you will need another web scraping technique (for example, a library such as Scrapy).
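One way to check this for yourself is to fetch the raw source from Apps Script and test whether the text you want is in it. This is only a minimal sketch: `isInStaticHtml` and `checkPage` are hypothetical helper names, and it assumes `UrlFetchApp.fetch` returns the page as sent by the server, without running any of its scripts.

```javascript
// Returns true when `needle` appears in the raw (unrendered) HTML text.
function isInStaticHtml(html, needle) {
  return html.indexOf(needle) !== -1;
}

// Hypothetical helper: run it from the Apps Script editor only,
// because UrlFetchApp does not exist outside Apps Script.
function checkPage() {
  var url = "https://www.cic.gc.ca/english/work/iec/selections.asp?country=au&cat=wh";
  var html = UrlFetchApp.fetch(url).getContentText();
  // Logs false if the text is only inserted later by a script.
  Logger.log(isInStaticHtml(html, "Candidates in the pool"));
}
```

If the check logs false for a value you can see in the browser, IMPORTXML will not see it either.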
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)
Related
Hey so currently working on my first personal project so bear with the questions!
Currently trying to create a JavaScript program that will parse info from Google Forms to produce slides displaying the info. So far from my research, the best way I've found to facilitate this process is Google's Apps Script editor. However, I was wondering if I can run this code by requesting it from a different JavaScript (or maybe even Java) program that I will write in WebStorm. If I can't do this, what is the best way to utilize the Google Apps Script editor?
Thanks!
Google Apps Script is just JavaScript with extra built-in APIs (like SpreadsheetApp, FormApp, etc.).
It also has a UrlFetchApp API.
So you can run code like this:
// The code below logs the HTML code of the Google home page.
var response = UrlFetchApp.fetch("http://www.google.com/");
Logger.log(response.getContentText());
As such, if you want to provide JavaScript from elsewhere, you could fetch it and then eval it on the Google Apps Script side. (but we all know how tricky eval can get)
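A rough sketch of that fetch-and-eval idea follows. `compileSource` and `runRemoteScript` are hypothetical names, the URL is a placeholder, and it assumes the Apps Script runtime allows `new Function`, which behaves much like eval.

```javascript
// Compiles JavaScript source text into a callable function.
// This is eval in disguise, so only use it on code you control.
function compileSource(sourceText) {
  return new Function(sourceText);
}

// Hypothetical helper for the Apps Script editor; the URL is a placeholder.
function runRemoteScript() {
  var source = UrlFetchApp.fetch("https://example.com/my-script.js").getContentText();
  return compileSource(source)();
}
```

The fetched code runs with whatever permissions your script has, which is a large part of why eval gets tricky.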
Another option is to write your own server side using Google App Engine (or any other framework), then use Google's OAuth to authorize your app to fetch data from the form.
Slides and Google Apps Script
You might like to take a look at the add-on "Slides Merge" by Bruce McPherson. I've never used it, but it sounds like it might work for you.
Getting information from Google Forms is a snap with Google Apps Script, since you can link the form right up to a spreadsheet. The Google Apps Script documentation is really quite good these days. Here's the documentation link. Google Apps Script is loosely based on JavaScript 1.6. If you're already a programmer, my guess is that you'll have few problems learning to use it. In my experience the most difficult thing was dealing with the arrays of arrays produced by the getValues() method of ranges in Google Apps Script, and I made a short video that might be of some help to you.
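To make the arrays of arrays a little easier to work with, something like the sketch below might help. `rowsToObjects` and `readResponses` are hypothetical names, and it assumes the first row of the linked sheet holds the column headers.

```javascript
// Turns [[header...], [row...], ...] into an array of objects keyed by header.
function rowsToObjects(values) {
  var headers = values[0];
  return values.slice(1).map(function (row) {
    var record = {};
    headers.forEach(function (header, i) {
      record[header] = row[i];
    });
    return record;
  });
}

// Hypothetical usage in Apps Script, reading the first sheet of the
// spreadsheet the script is attached to.
function readResponses() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheets()[0];
  return rowsToObjects(sheet.getDataRange().getValues());
}
```

Each form response then becomes one object, so a slide can be filled from `record["Some header"]` instead of from column indexes.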
I also have a script that I wrote in Google Apps Script that produces a sheet show that is a slide show inside of a spreadsheet.
I've found that using the Script Editor is pretty easy. There's some documentation in the support section of the documentation. It can be a bit buggy at times but overall I think it's a pretty good tool.
I am curious about saving space and editing code in real time. Google Docs don't take up any space, so I'm curious to see if it's possible, and with that, could there be extensions that color code Google Docs? I want to start off by saying I know this is a funky way to do it, but I'm curious to see if it's possible.
The way I imagine it working is like this:
Google doc
If you look at the document, you see that it is named "script.js" and the text inside is printing hello world.
So then in the html page it could be something like
<script type=text/javascript src="https://docs.google.com/document/d/1-SCNoJSQlMGJh-hmBLIwlyh_4eM9IisJqspARMMNKg0?plaintext"></script>
(or plainhtml)
I have honestly no idea what the syntax could be of how to do it, but i hope i can get the point across.
And then for Google sheets, it could be something like
<script type=text/javascript src="https://docs.google.com/spreadsheets/d/1-cUUw0KJ8k87hTrPWhERRNyr8r-_hZn5tW5sbPsSiLc?plaintext&c=a&r=1"></script>
That would retrieve "hello world" from the Google sheet.
You could also go as far as doing this in the main html page:
<script type="text/javascript">
var sheet = "https://docs.google.com/spreadsheets/d/1-cUUw0KJ8k87hTrPWhERRNyr8r-_hZn5tW5sbPsSiLc"
// then getting the spot (c,4) where column is "c", and row is "r".
document.write(sheet+"?c=c&r=4");
</script>
Any ideas?
Edit
It looks like the only way to get the raw text of the Google Doc would be to use the Drive API. But this is no good for your use case, as it must be authenticated even when the file is publicly accessible. That leaves the possibly awkward solution of downloading the file with JavaScript and converting the text into functions.
Looks like this might be possible. However, you will have to convert the doc from an editable state to a raw text state. You can take a look at this link for more info on that:
The URL for a document's raw text? (outdated)
But why not just use GitHub to host the code (or another code-oriented service)? GitHub will even host your entire website (with restrictions).
If you are looking for code collaboration, Google Docs is horrible for code formatting. I might point you towards this question for options other than Google Docs.
Using Google Sheets as a database might be more feasible. I would take a look at this link. But there are now several free-tier database-as-a-service platforms with much richer features.
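If you did manage to get the sheet's contents as CSV text (how you obtain that text is an assumption here and is not shown), looking up a single cell like the `c=c&r=4` idea in the question could be sketched as below. `cellAt` is a hypothetical helper, and the parsing is deliberately naive: it ignores quoted fields and only handles columns a to z.

```javascript
// Returns the cell at a column letter (a-z) and 1-based row from simple CSV text.
// Assumes no quoted fields and no commas inside cells.
function cellAt(csvText, columnLetter, rowNumber) {
  var rows = csvText.split(/\r?\n/).map(function (line) {
    return line.split(",");
  });
  var columnIndex = columnLetter.toLowerCase().charCodeAt(0) - "a".charCodeAt(0);
  var row = rows[rowNumber - 1];
  if (!row || columnIndex < 0 || columnIndex >= row.length) {
    return undefined; // outside the data
  }
  return row[columnIndex];
}
```

For real data a proper CSV parser would be safer, since cells can contain commas and line breaks.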
In the end it might be a cool hack, but ultimately painful for real use cases when compared to the alternatives.
I'm trying to get stock quotes from Yahoo Finance or Google Finance.
I know the Google Sheets IMPORTXML function can basically do this, but only into a spreadsheet. I want to get the price of the stock and then have a program decide to buy / sell / anything else.
So I'm trying to find the JavaScript source code for the IMPORTXML function. I think it's JavaScript but I haven't actually found it so I don't actually know. Also if you know of another way to accomplish this I'm open to it!
A few weeks ago I started learning Javascript and the Google Apps Script API, specifically in regard to spreadsheets. I have been trying to make a spreadsheet that fetches web pages and pulls stats about my friends for the game League of Legends. However, I have been running into a problem with the site I want to use, which is basically the only free LoL stats site that updates frequently. I'm not familiar at all with web development, but it seems when I try to access a page on lolking.net, for example http://www.lolking.net/summoner/na/60783 with Google's UrlFetchApp.fetch() it does not load the dynamic page. So instead of the final source, I get this which doesn't help me. Is there an easy way around this or would I simply have to use another website?
Thanks for the info! Although it turns out I was mistaken. UrlFetchApp was indeed returning the full source code, but I was using GAS's Logger to view the text. It seems the Logger has a length limit, so when I searched for the stats I wanted they weren't there, simply because the source code got truncated. So, due to an oversight on my part, I never had a problem in the first place. For other people reading this question: in the end I have no idea how UrlFetchApp works with dynamic pages using client-side JS (you'd probably want to talk to the poster below or post a new question).
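For anyone hitting the same truncation, one way around it is to search the full text in code and log only a small window around the match. A minimal sketch, where `snippetAround` and `logStats` are hypothetical names and "stats" is just a placeholder search term:

```javascript
// Returns up to `radius` characters on each side of the first match,
// or null when `needle` does not occur in `text`.
function snippetAround(text, needle, radius) {
  var index = text.indexOf(needle);
  if (index === -1) {
    return null;
  }
  var start = Math.max(0, index - radius);
  var end = Math.min(text.length, index + needle.length + radius);
  return text.substring(start, end);
}

// Hypothetical usage in the Apps Script editor.
function logStats() {
  var html = UrlFetchApp.fetch("http://www.lolking.net/summoner/na/60783").getContentText();
  Logger.log(html.length); // size of the full response
  Logger.log(snippetAround(html, "stats", 200));
}
```

Logging the length first makes it obvious when the Logger output is shorter than what was actually fetched.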
You are getting the raw HTML page with the client-side JS included, not the result of running it. That won't work from any system, not just GAS. You need to debug that page's JS and find where it makes an AJAX call to get the data you want.
Then do the same from your gas. Might not work if the call is authenticated etc.
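Once you have found that call, repeating it from GAS might look roughly like this. The endpoint URL and the "wins" key are placeholders (the real ones have to be read from the browser's network tab), and `readField` and `fetchStat` are hypothetical names.

```javascript
// Parses a JSON response body and returns the value stored under `key`.
function readField(jsonText, key) {
  var data = JSON.parse(jsonText);
  return data[key];
}

// Hypothetical usage in the Apps Script editor.
function fetchStat() {
  var response = UrlFetchApp.fetch("https://example.com/api/summoner/60783", {
    muteHttpExceptions: true // return error responses instead of throwing
  });
  if (response.getResponseCode() !== 200) {
    return null; // for example, the call needs authentication
  }
  return readField(response.getContentText(), "wins");
}
```

This assumes the call returns JSON; if it returns HTML or something else, the parsing step would need to change.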
A couple of Google questions:
1 - Is there ANY chance that Google will "see" text retrieved using AJAX?
The user selects from a chain of select boxes and some text from the DB is displayed.
2 - If I change the page title using JavaScript, outside the HEAD area, will Google index the modified title?
Sorry if these are trivial questions, and thanks for any reply.
have a nice day :-)
What Google sees is what you see when you disable JavaScript in your browser. So the answer to both your questions is no.
The correct way to have all the data of your site indexed is to degrade gracefully inside <noscript> tags. For example, you could offer an interface to browse all the content of your database, using list and sublists of requests that point to proper result pages, that are well integrated in your site.
Warning note: your content must really be a noscript version of your site. If you create a special site, it becomes cloaking, which is forbidden.
Update: Since 2014, Google seems to support everything you can think of (including JavaScript and AJAX).
Try using seo-browser.com or the Lynx browser to see how Google sees your site.
Also see this answer on Googlebot doesn't see jquery generated content and/or this document by Google, on ways you can have your AJAX content spidered.