Extracting a specific set of values out of an HTML table

Extracting a specific set of values out of an HTML table - javascript

I'm in the process of teaching myself Javascript and I'm having a little trouble understanding something.
I'm trying to extract every one of the "Title" and "Instructor" values from this class registration page to make an enhanced scheduling tool for myself. However, in the examples I'm looking at, they all use the "getElementsByClassName(class)" and "getElementsById(id)" to extract specific information from an HTML table. When I look at the page source in chrome, I am not able to find either a unique class name or id to specify in these calls.
Would someone mind pointing me in the right direction? Am I using the page source code correctly or is there a better way of doing things?
EDIT: Here's the html of the page in question
view-source:https://admin.wwu.edu/pls/wwis/wwsktime.ListClass

You can use querySelectorAll to use CSS selectors.
document.querySelectorAll("tr>td:nth-child(3)") and document.querySelectorAll("tr>td:nth-child(8)") will give you all Title and Instructor elements
Here's a jsfiddle of it https://jsfiddle.net/n1fuo87p/

No you're not really doing anything wrong, but unfortunately the creators of the web page haven't made use of classes and ids in a way that will make them useful to you.
I'd recommend creating a Google Sheet to import the table. (See the importHTML function in Google sheets.) Then I'd retrieve the data as JSON and work with it that way. IMO you'll learn more valuable skills working with JSON than you will parsing HTML too. This article will take you through getting JSON out of your Google sheet: http://ctrlq.org/code/20004-google-spreadsheets-json

Related

Get data from another HTML page

I am making an on-line shop for selling magazines, and I need to show the image of the magazine. For that, I would like to show the same image that is shown in the website of the company that distributes the magazines.
For that, it would be easy with an absolute path, like this:
<img src="http://www.remotewebsite.com/image.jpg" />
But, it is not possible in my case, because the name of the image changes everytime there is a new magazine.
In Javascript, it is possible to get the path of an image with this code:
var strImage = document.getElementById('Image').src;
But, is it possible to use something similar to get the path of an image if it is in another HTML page?

Assuming that you know how to find the correct image in the magazine website's DOM (otherwise, forget it):
the magazine website must explicitly allow clients showing your website to fetch their content by enabling CORS
you fetch their HTML -> gets you a stream of text
parse it with DOMParser -> gets you a Document
using your knowledge or their layout (or good heuristics, if you're feeling lucky), use regular DOM navigation to find the image and get its src attribute
I'm not going to detail any of those steps (there are already lots of SO answers around), especially since you haven't described a specific issue you may have with the technical part.

You can, but it is inefficient. You would have to do a request to load all the HTML of that other page and then in that HTML find the image you are looking for.
It can be achieved (using XMLHttpRequest or fetch), but I would maybe try to find a more efficient way.

What you are asking for is technically possible, and other answers have already gone into the details about how you could accomplish this.
What I'd like to go over in this answer is how you probably should architect this given the requirements that you described. Keep in mind that what I am describing is one way to do this, there are certainly other correct methods as well.
Create a database on the server where your app will live. A simple MySQL DB will work, but you could use anything. Create a table called magazine, with a column url. Your code would pull the url from this DB. Whenever the magazine URL changes, just update the DB and the code itself won't need to be changed.
Your front-end code needs some sort of way to access the DB. One possible solution is a REST API. This code would query the DB for the latest values (in your case magazine URLs), and make them accessible to your web page. This could be done in a myriad of different languages/frameworks, here's a good tutorial on doing something like this in Node.js and express (which is what I'd personally use).
Finally, your front-end code needs to call your REST API to get the updated URLs. This needs to be done with some kind of JavaScript based language. jQuery would make this really easy, something like this:
$(document).ready(function() {
$.Get("http://uri_to_your_rest_api", function(data) {
$("#myImage").attr("scr", data.url);
}
});
Assuming you had HTML like this:
<img id="myImage" src="">
And there you go - You have a webpage that pulls the image sources dynamically from your database.
Now if you're just dipping your toes into web development, this may seem a bit overwhelming. But I promise you, in the long run it'll be easier then trying to parse code from an HTML page :)

Sharepoint - How to: dynamic Url for Note on Noteboard

I'm quite new to SharePoint (about 1 week into it actually) and I'm attempting to mirror certain functionality that my company has with other products. Currently I'm working on how to duplicate the tasking environment in Box.com. Essentially it's just an email link that goes to a webpage where users can view an image and comments related to that image side by side.
I can dynamically load the image based on url parameters using just Javascript so that part is not a problem. As far as the comments part goes I've been trying to use a Noteboard WebPart, and then my desire is to have the "Url for Note" property to change dependent on the same URL parameter. I've looked over the Javascript Object Model and Class Library on MSDN but the hierarchy seems to stop at WebPart so I'm not finding anything that will allow me to update the Url for Note property.
I've read comments saying that there's a lot of exploration involved with this so I've tried the following:
-loading the javascript files into VisualStudio to use intellisense for looking up functions and properties in the SP.js files.
-console.log() on: WebPartDefinitionCollection, WebPartDefinition, WebPart, and methods .get_objectData(), get_properties() on all the previous
-embedding script in the "Builder" on the Url for Note property (where it says "click to use Builder" - I'm still not sure what more this offers than just a bigger textbox to put in the URL path)
I'm certain I've missed something obvious here but am gaining information very slowly now that I've exhausted the usual suspects. I very much appreciate any more resources or information anyone has and am willing to accept that I may be approaching this incorrectly if someone has accomplished this before.
Normally I'd keep going through whatever info I could find but I'm currently on a trial period and start school back up again soon so I won't have as much time with it. Apologies if this seems impatient, I'm just not sure where else to look at the moment.

Did you check out the API libraries like SPServices or SharepointPlus? They could help you doing what you want...
For example with SharepointPlus you could:
Create a Sharepoint List with a "Note" column and whatever you need to record
When the user goes to the page with the image you just show a TEXTAREA input with a SAVE button
When the user hits the SAVE button it will save the Note to the related list using $SP().list("Your list").add()
And you can easily retrieve the information (to show them to the user if he goes back to the page) with $SP().list("Your list").get()
If I understood your problem, that way it may be easier for you to deal with a customized page :-)

JavaScript: Using OmniGrid with loadData

I'm trying to work with the nice OmniGrid control.
Everything is great when I'm setting a URL for an ASP.NET handler that returns an answer.
My problem starts when I'm trying to use the data provider.
I was breaking my fingers trying to find a piece of information of how to use the data provider (can't find the appropriate format of the table content result).
Anyone familiar with such tutorial/example?

My I know how you are trying to integrate it in your pages? and what kind of backed u have to bind?

How do I extract data from a website using javascript.

Hi complete newbie here so bear with me. Seems like a simple job but I can't seem to find an easy way to do this.
So I need to extract a particular text from a webpage "www.example.com/index.php". I know that the text would be available in p tag with certain id. How do I extract this data out using javascript?
What I'm trying currently is that I have my javascript file (trying.js) on my computer with the following code:
$(document).ready(function () {
$.get("www.example.com/index.php", function(data) {
console.log(data)
}) ;
});
and a html that runs the javascript file.
When I open this html page with firefox it doesn't show me anything in console. How do I get the website's data? Am I on the correct track here? Is there a better way to do this?

What you're looking for is a page scraper. Javascript can't pull it off because it can only gather data from the domain you're on.
You could build it in Ruby, for example, and use one of the many existing gems for this sort of task, like https://github.com/assaf/scrapi or http://nokogiri.org/

Please take a look at Can Javascript read the source of any web page?
There are multiple ways discussed. Hope it helps you.

Does google robot index text from javascript document.write()?

Lets say I have this:
<script type="text/javascript">
var p = document.getElementById('cls');
p.firstChild.nodeValue = 'Some interesting information';
</script>
<div id="cls"> </div>
So, google robots will index text Some interesting information or not?
Thanks!

AFAIK, google robot will now indexing AJAX and Javascript stuff.For reference please follow:
http://www.submitshop.com/2011/11/03/google-bot-now-indexing-ajax-javascript
Get google to index links from javascript generated content

Update
SearchEngine watch has recently mentioned that Google bot has been improvised to read JavaScript, to quote exactly
it can now read and understand certain dynamic comments implemented
through AJAX and JavaScript. This includes Facebook comments left
through services like the Facebook social plugin.

We've had a need to hide pieces of information on pages from GoogleBot. As the information wasn't extremely sensitive, we've used document.write()-s to avoid searchbots indexing content in question.
Later in 2011 Q3 I've found that GoogleBot did index the scripted content, so I'm pretty sure now that Google is indexing much more than just fetching URLs from content, even though it's really not documented anywhere deeply.

Google doesn't index the JavaScript code or the generated content. You will only see it in the cache because the cached page consists of the complete file including the JavaScript code and your browser renders it. Google does scan JavaScript for URLs to crawl, so if the code is pulling content from an external file via Ajax, etc., there's a chance that the external file will also be indexed, but separate from the parent page. If you want the content to be indexed, it's got to be in plain HTML. Good luck!

Develop Reference

JavaScript is the programming language of the Web.