Does Google's robot index text written by JavaScript document.write()?

Let's say I have this:
<div id="cls"> </div>
<script type="text/javascript">
// The script comes after the div, so getElementById can find it.
var p = document.getElementById('cls');
p.firstChild.nodeValue = 'Some interesting information';
</script>
So, will Google's robots index the text "Some interesting information" or not?
Thanks!

AFAIK, the Google robot now indexes AJAX and JavaScript content. For reference, see:
http://www.submitshop.com/2011/11/03/google-bot-now-indexing-ajax-javascript
Get google to index links from javascript generated content

Update
Search Engine Watch recently reported that Googlebot has been improved to read JavaScript. To quote exactly:
it can now read and understand certain dynamic comments implemented
through AJAX and JavaScript. This includes Facebook comments left
through services like the Facebook social plugin.

We needed to hide some pieces of information on our pages from Googlebot. As the information wasn't very sensitive, we used document.write() calls to keep search bots from indexing the content in question.
Later, in Q3 2011, I found that Googlebot did index the scripted content, so I'm now fairly sure Google does much more than just extract URLs from scripts, even though this isn't documented in any depth.

Google doesn't index the JavaScript code or the generated content. You will only see it in the cache because the cached page consists of the complete file including the JavaScript code and your browser renders it. Google does scan JavaScript for URLs to crawl, so if the code is pulling content from an external file via Ajax, etc., there's a chance that the external file will also be indexed, but separate from the parent page. If you want the content to be indexed, it's got to be in plain HTML. Good luck!
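To make "Google does scan JavaScript for URLs" more concrete, here is a rough sketch of that kind of scan: it pulls quoted string literals that look like absolute or root-relative URLs out of script source. This is a hypothetical heuristic (the name `extractUrlLiterals` is made up for illustration), not a description of how Googlebot actually works.

```javascript
// Hypothetical heuristic: find quoted string literals that look like URLs.
// Matches "http://...", "https://..." or root-relative "/..." inside ' or " quotes.
function extractUrlLiterals(jsSource) {
  const pattern = /(["'])((?:https?:\/\/|\/)[^"'\s]*)\1/g;
  const urls = [];
  for (const match of jsSource.matchAll(pattern)) {
    urls.push(match[2]); // group 2 is the URL without its surrounding quotes
  }
  return urls;
}

const script = `
  var feed = "http://www.example.com/data.json";
  $.get('/comments.html', render);
  document.write('blue fish');
`;
console.log(extractUrlLiterals(script));
// [ 'http://www.example.com/data.json', '/comments.html' ]
```

Note that the text passed to document.write() is not picked up here, which matches the answer's point: a URL scan finds other files to crawl, not the generated content itself.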

Related

How to scrape 'src' or 'href' value when it uses Javascript?

Perhaps this is a simple solution, but I'm just really stuck on this one.
Say you pull the value of 'href' from a webpage using BeautifulSoup, for example:
soup.find("a", {"id" : "home-page"})['href']
How would you do this if the element looked like this:
<a id="main_lnkWool" class="WhiteLinkText Canela-Medium-Web" href="javascript:__doPostBack('ctl00$main$lnkWool','')">Wool</a>
i.e. when the URL comes from a JavaScript call rather than a plain link?
I can see the jquery.js file the site is using; I'm just not sure how to put all the pieces together to get the URL. All I'm trying to do is use requests to scrape the URLs of certain product ranges.
Here is a link for reference: https://www.kersaintcobb.co.uk/home
The links I'm trying to extract are under the tab 'Our Products'.
I know there are only 6 pages in total, and yes, I could just copy and paste them at this point lol! But I need an answer anyway, as I've run into the same problem on other projects, so it would really help me out to know how to solve it.
Thank you :)
Maybe not the best approach, but with JS-heavy sites what I have been able to do is use a webdriver, which is a web browser you can control from code (you can also run it headless, i.e. hidden from sight). Wait until the page loads, then pass the page source to BS4. For more info: https://chromedriver.chromium.org/getting-started
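If you would rather stay with plain HTTP requests than drive a browser: a `javascript:__doPostBack('ctl00$main$lnkWool','')` link is an ASP.NET WebForms postback, meaning a click submits the page's form with hidden fields such as `__EVENTTARGET`, `__EVENTARGUMENT` and `__VIEWSTATE`. The sketch below (in JavaScript, like the rest of this page) parses the target out of the href and builds the form body you would POST back to the same page. It assumes you have already scraped the page's hidden inputs into an object; the helper names are hypothetical, and whether a given site accepts a replayed form (event validation, cookies) is not guaranteed.

```javascript
// Parse "javascript:__doPostBack('target','argument')" into its two arguments.
function parseDoPostBack(href) {
  const match = /__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)/.exec(href);
  return match ? { target: match[1], argument: match[2] } : null;
}

// Build an application/x-www-form-urlencoded body that mimics the postback.
// `hiddenFields` is assumed to hold the page's hidden inputs
// (e.g. __VIEWSTATE, __EVENTVALIDATION) scraped from the original response.
function buildPostBackBody(hiddenFields, postBack) {
  const params = new URLSearchParams(hiddenFields);
  params.set("__EVENTTARGET", postBack.target);
  params.set("__EVENTARGUMENT", postBack.argument);
  return params.toString();
}

const postBack = parseDoPostBack("javascript:__doPostBack('ctl00$main$lnkWool','')");
console.log(postBack); // { target: 'ctl00$main$lnkWool', argument: '' }
console.log(buildPostBackBody({ __VIEWSTATE: "abc" }, postBack));
```

The response to that POST is the page the link would have led to, which you can then parse with BeautifulSoup as usual.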

Get data from another HTML page

I am making an online shop for selling magazines, and I need to show an image of each magazine. For that, I would like to show the same image that is shown on the website of the company that distributes the magazines.
For that, it would be easy with an absolute path, like this:
<img src="http://www.remotewebsite.com/image.jpg" />
But that is not possible in my case, because the image name changes every time there is a new magazine.
In JavaScript, it is possible to get the path of an image with this code:
var strImage = document.getElementById('Image').src;
But is it possible to use something similar to get the path of an image that is on another HTML page?
Assuming that you know how to find the correct image in the magazine website's DOM (otherwise, forget it):
the magazine website must explicitly allow pages from your website to fetch its content by enabling CORS
you fetch their HTML -> gets you a stream of text
parse it with DOMParser -> gets you a Document
using your knowledge of their layout (or good heuristics, if you're feeling lucky), use regular DOM navigation to find the image and get its src attribute
I'm not going to detail any of those steps (there are already lots of SO answers around), especially since you haven't described a specific issue you may have with the technical part.
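A minimal browser-side sketch of those steps, assuming the remote site sends CORS headers and that some CSS selector identifies the image (the `#cover` selector and the function names are hypothetical). It reads the raw `src` with `getAttribute` and resolves it against the remote page's URL, because a relative path in the fetched HTML refers to their site, not yours.

```javascript
// Resolve a possibly relative src against the page it came from.
function resolveImageUrl(rawSrc, pageUrl) {
  return new URL(rawSrc, pageUrl).href;
}

// Browser-only: relies on fetch and DOMParser.
async function fetchRemoteImageSrc(pageUrl, selector) {
  const response = await fetch(pageUrl); // rejects unless the remote site allows CORS
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const html = await response.text(); // step 1: HTML as text
  const doc = new DOMParser().parseFromString(html, "text/html"); // step 2: Document
  const img = doc.querySelector(selector); // step 3: navigate to the image
  const rawSrc = img ? img.getAttribute("src") : null;
  return rawSrc ? resolveImageUrl(rawSrc, pageUrl) : null;
}

// Usage (hypothetical URL and selector):
// fetchRemoteImageSrc("http://www.remotewebsite.com/", "#cover")
//   .then((src) => { if (src) document.getElementById("Image").src = src; });
```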
You can, but it is inefficient. You would have to make a request to load all the HTML of that other page and then find the image you are looking for in that HTML.
It can be done (using XMLHttpRequest or fetch), but I would try to find a more efficient way.
What you are asking for is technically possible, and other answers have already gone into the details about how you could accomplish this.
What I'd like to go over in this answer is how you probably should architect this given the requirements you described. Keep in mind that what I am describing is one way to do this; there are certainly other correct methods as well.
Create a database on the server where your app will live. A simple MySQL DB will work, but you could use anything. Create a table called magazine, with a column url. Your code would pull the url from this DB. Whenever the magazine URL changes, just update the DB and the code itself won't need to be changed.
Your front-end code needs some way to access the DB. One possible solution is a REST API. This code would query the DB for the latest values (in your case, magazine URLs) and make them accessible to your web page. This could be done in a myriad of languages/frameworks; here's a good tutorial on doing something like this in Node.js and Express (which is what I'd personally use).
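As a dependency-free sketch of such an API, here is a request handler for Node's built-in `http` module instead of Express. The in-memory `magazines` array is a hypothetical stand-in for the MySQL `magazine` table, and the `/api/magazine` route is an assumption; a real version would query the database instead.

```javascript
// Hypothetical stand-in for rows of the `magazine` table (column `url`).
const magazines = [
  { id: 1, url: "http://www.remotewebsite.com/issue-41.jpg" },
  { id: 2, url: "http://www.remotewebsite.com/issue-42.jpg" },
];

// Return the newest magazine's URL as JSON shaped like {"url": "..."}.
function latestMagazineJson(rows) {
  const latest = rows.reduce((a, b) => (b.id > a.id ? b : a), rows[0]);
  return JSON.stringify({ url: latest ? latest.url : null });
}

// GET /api/magazine returns the latest URL; anything else is a 404.
function handleRequest(req, res) {
  if (req.method === "GET" && req.url === "/api/magazine") {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(latestMagazineJson(magazines));
  } else {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("Not found");
  }
}

// To serve it with Node's built-in http module:
// require("http").createServer(handleRequest).listen(3000);
```

If the web page and the API are served from different origins, the response would also need an `Access-Control-Allow-Origin` header for the browser to accept it.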
Finally, your front-end code needs to call your REST API to get the updated URLs. This needs to be done with some kind of JavaScript based language. jQuery would make this really easy, something like this:
$(document).ready(function() {
  // Ask the REST API for the latest data, e.g. {"url": "..."}
  $.get("http://uri_to_your_rest_api", function(data) {
    $("#myImage").attr("src", data.url);
  });
});
Assuming you had HTML like this:
<img id="myImage" src="">
And there you go - You have a webpage that pulls the image sources dynamically from your database.
Now, if you're just dipping your toes into web development, this may seem a bit overwhelming. But I promise you, in the long run it'll be easier than trying to parse code from an HTML page :)

Loading external content so that it is searchable by Google for SEO purposes

I'm working on a project where we'd like to load external content onto a customer's site. The main requirements are that the customer should have as simple an include as possible (like a one-line link, similar to DoubleClick) and preferably should not have to deal with any server-side language. The two proposed ways of doing this were an iframe or a JavaScript file that writes out the content with document.write.
We looked more at the latter, since it seemed more legible and simpler for the end user: a single line of JavaScript. We have since run into the reality that Google indexes this unpredictably. I have read most of the posts on this topic regarding JavaScript and indexing (for example http://www.seroundtable.com/google-ajax-execute-15169.html, https://twitter.com/mattcutts/status/131425949597179904). Currently we have (for example):
<html>
<body>
<div class='main-container'>
<script src='http://www.other.com/page.js'></script>
</div>
</body>
</html>
and
// at http://www.other.com/page.js
document.write('blue fish and green grass');
but judging by 'Fetch as Google' in Google's Webmaster Tools, Google indexes this type of content only sometimes. Since it does sometimes work, I know this indexing is possible. More specifically, if we isolate our content as above and remove extraneous content, it is indexed every time (as opposed to the EXACT SAME JavaScript in a regular customer HTML page). If our content is in a customer's HTML file, it doesn't seem to get indexed.
What would be a better option to ensure that Google indexes the content (loading it from a remote file isn't any better)? One idea I have tried / come across is to load the remote file server-side, for example in PHP, something like:
echo file_get_contents('http://www.other.com/page');
This is obviously blocking but possibly not a deal-breaker.
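The same server-side include idea, sketched in Node.js rather than PHP (it assumes Node 18+ for the global `fetch`; `fetchFragment` and `renderCustomerPage` are hypothetical helper names). Because the fragment ends up in the HTML the server sends, a crawler sees it without running any JavaScript; the cost is the same blocking request mentioned above, which a cache on the customer's server would soften.

```javascript
// Fetch the remote fragment on the server (Node 18+ provides a global fetch).
async function fetchFragment(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Fragment request failed: ${response.status}`);
  }
  return response.text();
}

// Place the fragment inside the customer's container, where the script tag was.
function renderCustomerPage(fragmentHtml) {
  return [
    "<html>",
    "<body>",
    "<div class='main-container'>",
    fragmentHtml,
    "</div>",
    "</body>",
    "</html>",
  ].join("\n");
}

// Usage: fetchFragment("http://www.other.com/page").then(renderCustomerPage);
```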
Given the above requirements, would there be any other solution?
thx
This is a common problem and I've created a JS plugin that you can use to solve this.
Url: https://github.com/kubrickology/Logical-escaped_fragment
Make sure to use the __init() function instead of the standard DOM-ready functions, and then you can be sure Google is able to index the content.

Sharepoint - How to: dynamic Url for Note on Noteboard

I'm quite new to SharePoint (about 1 week into it actually) and I'm attempting to mirror certain functionality that my company has with other products. Currently I'm working on how to duplicate the tasking environment in Box.com. Essentially it's just an email link that goes to a webpage where users can view an image and comments related to that image side by side.
I can dynamically load the image based on URL parameters using just JavaScript, so that part is not a problem. As far as the comments part goes, I've been trying to use a Noteboard WebPart, and my goal is to have the "Url for Note" property change depending on the same URL parameter. I've looked over the JavaScript Object Model and Class Library on MSDN, but the hierarchy seems to stop at WebPart, so I'm not finding anything that will allow me to update the Url for Note property.
I've read comments saying that there's a lot of exploration involved with this so I've tried the following:
-loading the JavaScript files into Visual Studio to use IntelliSense for looking up functions and properties in the SP.js files.
-console.log() on: WebPartDefinitionCollection, WebPartDefinition, WebPart, and methods .get_objectData(), get_properties() on all the previous
-embedding script in the "Builder" on the Url for Note property (where it says "click to use Builder" - I'm still not sure what more this offers than just a bigger textbox to put in the URL path)
I'm certain I've missed something obvious here but am gaining information very slowly now that I've exhausted the usual suspects. I very much appreciate any more resources or information anyone has and am willing to accept that I may be approaching this incorrectly if someone has accomplished this before.
Normally I'd keep going through whatever info I could find but I'm currently on a trial period and start school back up again soon so I won't have as much time with it. Apologies if this seems impatient, I'm just not sure where else to look at the moment.
Did you check out API libraries like SPServices or SharepointPlus? They could help you do what you want...
For example with SharepointPlus you could:
Create a Sharepoint List with a "Note" column and whatever you need to record
When the user goes to the page with the image you just show a TEXTAREA input with a SAVE button
When the user hits the SAVE button it will save the Note to the related list using $SP().list("Your list").add()
And you can easily retrieve the information (to show them to the user if he goes back to the page) with $SP().list("Your list").get()
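To tie each note to the right image, the key could come from the same URL parameter that already drives the image. A small sketch, where the parameter name `img` and the list columns `ImageId` and `Note` are hypothetical; the returned object is the kind of column/value map you would hand to `$SP().list("Your list").add()` (check the SharepointPlus documentation for the exact options).

```javascript
// Read the image identifier from the page URL, e.g. ...?img=1234
function getImageId(pageHref, paramName = "img") {
  return new URL(pageHref).searchParams.get(paramName);
}

// Build the list item for one note; column names are hypothetical.
function buildNoteItem(imageId, noteText) {
  if (!imageId) throw new Error("Missing image id in the URL");
  return { ImageId: imageId, Note: noteText.trim() };
}

const item = buildNoteItem(
  getImageId("https://intranet.example.com/sites/review/Pages/view.aspx?img=1234"),
  "  Please crop the left edge. "
);
console.log(item); // { ImageId: '1234', Note: 'Please crop the left edge.' }
```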
If I understood your problem, that way it may be easier for you to deal with a customized page :-)

Using JavaScript to "Create" a Microsoft Word Document

I would like to dynamically create a document using JavaScript and then open that document in Microsoft word. Is this possible? Here is my current code:
<html>
<head>
<title></title>
<script src="js/jquery-1.4.4.js" type="text/javascript"></script>
</head>
<body>
<div id="myDiv">The quick brown fox jumped lazily over the dead log.</div>
<script type="text/javascript">
var printWindow = window.open("", "Print", "width=800,height=400,scrollbars=0");
// outerHTML is a DOM property, so read it from the element itself.
var printAreaHtml = $("#myDiv")[0].outerHTML;
printWindow.document.open("text/html", "replace");
printWindow.document.writeln("<html><head>");
printWindow.document.writeln("<meta HTTP-EQUIV='Content-Type' content='application/vnd.ms-word'>");
printWindow.document.writeln("<meta HTTP-EQUIV='Content-Disposition' content='attachment;filename=print.doc'>");
printWindow.document.writeln("</head>");
printWindow.document.writeln("<body>");
printWindow.document.write(printAreaHtml);
printWindow.document.writeln("</body>");
printWindow.document.writeln("</html>");
printWindow.document.close();
// printWindow.print();
</script>
</body>
</html>
I'm not sure exactly what you are trying to do in your code above, but here is some information I found about accessing a Word document and a table within it:
Microsoft Word Object Model
This object model is part of Microsoft Word (not Javascript) and it lets you "automate" word remotely from other programs (not just web pages, but any computer program).
It is primarily designed for Visual Basic, but can be accessed by Javascript from a web page - see para 2 below.
However, it is a bit trickier to use through JavaScript, particularly because you cannot use Visual Basic constants; you need to refer to them by value. If you research this further, you will soon see what I mean.
So where can you find out about this Object Model?
It is all there in the Word help files if you look for it.
If you look in the Word help, under programming information, you will find the Microsoft Word Visual Basic Programming Reference.
The Word object model lets you do the things you will need to solve your problem, like:
Open Word
Open a Document in Word
Access the collection of Tables in that ActiveDocument.
Access the Rows and Cells of a given Table.
How do you access this from Javascript?
I think this can only be done through Internet Explorer (and perhaps Opera).
Here you need to learn about ActiveXObjects.
ActiveXObjects (if you do not know) are separate computer programs which enable additional functionality. There are lots of ActiveX objects on the internet.
When you install Word, this also installs an ActiveX object for automating word, giving you access to the Word Object Model.
So in JavaScript, let's open up a new instance of Word:
var oApplication=new ActiveXObject("Word.Application");
oApplication.Visible=true; // "Visible" is in the Word Object Model
There you have it.
Then, if you want to open your file and get the table:
oApplication.Documents.Open("myfilename");
var oDocument=oApplication.ActiveDocument;
var oTable=oDocument.Tables(1);
And now I leave it to you to keep going with the rest.
EDIT: this wasn't possible when the question was asked but in 2017 it is. See link from comment by jrm - http://www.effectiveui.com/blog/2015/02/23/generating-a-downloadable-word-document-in-the-browser/
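One common client-side approach of this kind, sketched under assumptions: wrap the HTML in a document that declares the Office namespaces, put it in a `Blob` typed `application/msword`, and trigger a download through an `<a download>` link. Word will open such an HTML-in-a-.doc file, but it is not a real binary or OOXML Word document. The helper names and the `print.doc` default are illustrative, and the download part runs only in a browser.

```javascript
// Wrap an HTML fragment so Word treats it as a document when opened.
function buildWordHtml(bodyHtml, title = "Document") {
  return (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office" ' +
    'xmlns:w="urn:schemas-microsoft-com:office:word" ' +
    'xmlns="http://www.w3.org/TR/REC-html40">' +
    `<head><meta charset="utf-8"><title>${title}</title></head>` +
    `<body>${bodyHtml}</body></html>`
  );
}

// Browser-only: offer the document as a download instead of opening a window.
function downloadAsWord(bodyHtml, fileName = "print.doc") {
  // The leading BOM helps Word detect UTF-8.
  const blob = new Blob(["\ufeff", buildWordHtml(bodyHtml)], {
    type: "application/msword",
  });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = fileName;
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
  // Release the object URL once the click has been handled.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}

// Usage with the question's markup:
// downloadAsWord(document.getElementById("myDiv").outerHTML);
```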
Browsers place some serious restrictions on JavaScript which will prevent you from creating a downloadable file. See this related question:
Create a file in memory for user to download, not through server
I don't believe that this idea will work. You need to create the Word file with a server-side language. For example PHP: http://www.webcheatsheet.com/php/create_word_excel_csv_files_with_php.php
You cannot get this working on the client side alone. The main thing is that you need to send the response with Word headers rather than as HTML. So I would suggest using server-side scripting, as Max suggested, and preferably an .htaccess file if you are using an Apache server, so that these files can also be named .doc.
Let's assume your PHP file needs to create a .doc file from some passed argument, say id. So you want file_.doc to point to file.php?id=; try the following rewrite rule so that the browser also recognizes the file by its extension:
RewriteRule ^file_(.*)\.doc$ file.php?id=$1
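The essential part of this answer is the response headers. Sketched in Node.js instead of PHP (the `sendAsWordDocument` name and the `document_<id>.doc` file name are hypothetical), a handler only needs a Word content type and an attachment disposition; the body can be HTML, which Word will open.

```javascript
// Send HTML with headers that make the browser download it as a .doc file.
function sendAsWordDocument(res, id, bodyHtml) {
  // Keep only safe characters so the id cannot break the header value.
  const safeId = String(id).replace(/[^A-Za-z0-9_-]/g, "");
  res.writeHead(200, {
    "Content-Type": "application/msword",
    "Content-Disposition": `attachment; filename="document_${safeId}.doc"`,
  });
  res.end(`<html><body>${bodyHtml}</body></html>`);
}

// Usage with Node's http module:
// require("http").createServer((req, res) => {
//   const id = new URL(req.url, "http://localhost").searchParams.get("id");
//   sendAsWordDocument(res, id, "<p>Hello from the server</p>");
// }).listen(3000);
```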
If you need server-side document generation and your server is running Java, take a look at this:
https://github.com/leonardoanalista/java2word/
This is absolutely possible. Googoose is a jQuery plugin that I wrote to handle a lot of the more complicated conversions. It's still fairly new, but there appear to be a few other attempts at this, so you could check those out. Here is the best documentation I've found so far that actually explains this process http://sebsauvage.net/wiki/doku.php?id=word_document_generation. If you're interested check out the examples in Googoose.
Sometimes we cannot use a server-side app or ActiveX to create an Office document, for example in a PhoneGap mobile app that uses only client-side JavaScript.
The only way I have found so far is to use the Word binary file format or OOXML:
http://msdn.microsoft.com/en-us/library/hh643138(v=office.12)
Some say it's much easier to create an RTF file, and I agree with them.
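For the RTF route, a minimal sketch that builds an RTF string in plain JavaScript, with no server or ActiveX involved. It escapes the characters RTF treats specially (`\`, `{`, `}`), writes non-ASCII characters as `\uN?` escapes (N is a signed 16-bit value, `?` the fallback for old readers), and turns line breaks into paragraphs. The font table and 12pt size (`\fs24`, in half-points) are arbitrary choices, and how the resulting string gets saved depends on the app (in a browser, a `Blob` typed `application/rtf`).

```javascript
// Escape text for an RTF body.
function escapeRtf(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const code = text.charCodeAt(i); // UTF-16 code unit, as RTF \u expects
    if (ch === "\\" || ch === "{" || ch === "}") {
      out += "\\" + ch;
    } else if (ch === "\n") {
      out += "\\par\n";
    } else if (code > 127) {
      out += "\\u" + (code > 32767 ? code - 65536 : code) + "?";
    } else {
      out += ch;
    }
  }
  return out;
}

// Wrap escaped text in a minimal RTF document.
function buildRtf(text) {
  return (
    "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}" +
    "\\f0\\fs24 " + escapeRtf(text) + "\\par\n}"
  );
}

const rtf = buildRtf("The quick brown fox jumped lazily over the dead log.");
// In a browser: new Blob([rtf], { type: "application/rtf" })
```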
