I have code like this in the root page (e.g. http://www.example.com):
<div ng-repeat="url in urls">
<a ng-href="{{url}}">{{url}}</a>
</div>
Each url is just a SEO-friendly-sub-url (e.g. http://www.example.com/pages/hello-world)
I am aware that Google now executes JavaScript (http://googlewebmastercentral.blogspot.no/2014/05/understanding-web-pages-better.html), but I still can't get those sub-URLs indexed (only the root page is indexed).
How can I make them indexable?
If possible, the solution should not depend on third-party services such as seo4ajax or prerender.io.
Not too long ago we ran into the same issue. We used several techniques so Google could index our website better, but honestly, none of them worked well. We ended up ditching our front-end framework and moving to Rails, and now we are quite happy with Rails Turbolinks.
Here are a few of our workarounds; maybe they will be helpful for you.
First, we created a sitemap and submitted it to Google via Webmaster Tools.
When Google crawls the website, it sends something like Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) as the user agent (notice "Googlebot", that's important). We created a middleware that checks whether each request is coming from a search engine or not. You can get the list of such user agents from Google (http://www.useragentstring.com/pages/useragentstring.php may help too). For crawlers, we returned a static page that contains essentially the same data as the web page, just without the dynamic behaviour.
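A minimal sketch of that kind of middleware, written here in Express rather than whatever stack the original team used; the bot regex and the serveStaticSnapshot() helper are illustrative assumptions, not their actual code:
// Hypothetical Express middleware: serve a pre-rendered static page to crawlers,
// and the normal dynamic page to everyone else.
const BOT_UA = /googlebot|bingbot|yandexbot|baiduspider|duckduckbot/i;

function crawlerMiddleware(req, res, next) {
  const userAgent = req.headers["user-agent"] || "";
  if (BOT_UA.test(userAgent)) {
    // serveStaticSnapshot() is a placeholder for however you generate or look up
    // the static HTML version of the requested page.
    return res.send(serveStaticSnapshot(req.path));
  }
  next(); // human visitors get the regular JavaScript-driven page
}

// app.use(crawlerMiddleware);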
You might be interested in https://github.com/prerender/prerender_rails
You should create a server-side page for AJAX crawling, handling "_escaped_fragment_" in your rewrite rules.
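For illustration, under the AJAX crawling scheme the crawler rewrites #! URLs into a ?_escaped_fragment_= query string; a rough sketch of handling it, assuming an Express app and a hypothetical renderSnapshot() helper:
// Sketch only: when the crawler requests ?_escaped_fragment_=..., return a
// pre-rendered HTML snapshot; otherwise serve the normal JavaScript app.
app.use((req, res, next) => {
  if (req.query._escaped_fragment_ !== undefined) {
    return res.send(renderSnapshot(req.path, req.query._escaped_fragment_));
  }
  next();
});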
Related
Apologies if this is a roundabout way of asking this question, but I am a little confused about how the web and javascript work.
What I want to do: execute JavaScript on all pages in a list of URLs I have found (specifically, use jQuery to pull info from them).
Problem: I can't execute JavaScript on these pages because they aren't mine and don't send the Access-Control-Allow-Origin header, so I can't load them (with AJAX) in order to use jQuery on them.
BUT Google Chrome can both load pages and execute JavaScript on them (with its developer console). So if I wanted to, I could go to each page, open the developer console, and pull the information from there. If there's nothing stopping Chrome from accessing these, then why am I stopped? And is there a way around this?
Thank you, and I hope my description makes sense. I've been researching this for a while but have found nothing that explains why CORS seems so inconsistent.
I could go to each page, open the developers console, and pull the information from there. If there's nothing stopping Chrome from accessing these, then why am I stopped?
You're not stopped. You, the human at the keyboard, can do exactly as you say, by visiting each page as a top-level page.
What is stopped -- happily -- is any and all scripts on the Web you happen to run having the same level of visibility that you do. Based on your cookies and your network topology, you have a unique view into the Web. You can see your home router's control interface (on 192.168.1.1 or similar). You can see any local web server you're running on 127.0.0.1. No one else can see these. If the same-origin policy were not in place, then any script that you loaded on the Web could inspect these.
And, is there a way around this?
If you have some scripts that you trust absolutely (hopefully a significant subset of "all scripts that exist on the Web") that you want to be able to bypass the same-origin policy and see your full, cross-domain view of the Web, you could load them as an extension, which can act with elevated permissions beyond the abilities of normal web pages. (See How does Same Origin Policy apply to browser extensions?)
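As a rough illustration (not from the linked answer), a Chrome extension whose manifest requests host permissions for the sites in question can fetch them cross-origin from its background script and hand the HTML to a content script:
// content-script.js: ask the background script for a cross-origin page
chrome.runtime.sendMessage({ url: "https://example.com/some/page" }, (response) => {
  console.log(response.html); // parse with jQuery/DOMParser as needed
});

// background.js: allowed to fetch cross-origin because of the extension's host permissions
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  fetch(message.url)
    .then((r) => r.text())
    .then((html) => sendResponse({ html }));
  return true; // keep the message channel open for the asynchronous reply
});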
I'm going to assume that you are looking to grab data from these pages that aren't yours and store it somewhere. I have done this before with cURL in PHP. If you are looking to display these sites for users to interact with in a different way, but starting from a page that is yours, you may be able to render these pages by grabbing the source HTML with cURL and serving it as a sort of proxy.
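A rough sketch of the same idea in Node rather than PHP/cURL (the target URL is a placeholder); the point is simply that the fetch happens on your server, where the same-origin policy doesn't apply:
// Crude proxy: fetch the remote page server-side and pass its HTML through.
// Requires Node 18+ for the built-in fetch; sanitize the output before real use.
const http = require("http");

http.createServer(async (req, res) => {
  const target = "https://example.com/"; // the page you want to scrape or mirror
  const html = await (await fetch(target)).text();
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end(html);
}).listen(3000);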
I've used this tutorial for something similar: https://www.youtube.com/watch?v=_kQN-3aNCeI. Hopefully this gives you a start. I think you should be a bit more detailed in your question, though, to get more help.
Quite a few of the sites used by the schools I work in have user accounts to protect the content from people who haven't paid for it, which means that the users (aged 5+) have to type in some pretty weird usernames/passwords before they can do their work.
I was wondering if it possible to use Javascript to create a page that would let me do something along the lines of:
Fetch the Login Page
Fill out the form
Submit It
Redirect the user to the site
1-3 would happen in the background without the user seeing it.
In most cases these accounts are shared and the details are on displays etc... in the classrooms so there is no issue with the details being publicly accessible.
I have used Mechanize in ruby before and would imagine a solution like it but running client side.
I know that some inspection of the target site will be needed but once I have an in-principle example I should be able to tailor it to each site later.
If you have a standardized browser, you should consider building a plugin for that browser; that's the easiest way to interact with the web pages. Otherwise you'll run into issues with anti-CSRF protections and cross-domain policies.
As for the language, Chrome extensions are written in JavaScript and are pretty easy to build. For the other browsers I don't know.
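As a sketch of what such an extension's content script could do on the login page (the selectors and credentials here are hypothetical and would need to be tailored to each site):
// content script: fill in the shared classroom credentials and submit the form
const form = document.querySelector("form#login"); // adjust the selector per site
if (form) {
  form.querySelector("input[name='username']").value = "class-account";
  form.querySelector("input[name='password']").value = "shared-password";
  form.submit(); // the site then redirects the pupil onwards as normal
}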
I have created a web page, but my friends or colleagues always copy the source code and all the data easily, so is there any way to hide the "view page source" option in the browser?
As a rule, if you are putting information on another user's computer (whether because you made a document or they viewed your webpage), you really can't control what they do with it.
This is an issue that larger companies deal with often. Have you heard of DRM? It's a mechanism that companies use to try to control how people connect to their services and use their content, and in general to exert control over their data while it's on your system.
Now, a web page is a relatively simple container for holding information. You expressed an urge to prevent your friends from copying the source code. You could try to encrypt it, but if it's using local data to decrypt itself, there still isn't going to be anything that stops them from just copying what's in the View Source window and running it again (even if they can't really read it).
I'd suggest that you don't worry about it. If what you have on your page is so important that others shouldn't be able to see it, don't put it on a webpage.
Finally, Google doesn't much care that you're able to view the source of their home page. Why not? Because the value of the search engine isn't in what the home page looks like, but in the data on the back-end that you don't have direct access to. The value is in the algorithms that execute on the server when you hit that Google Search button, query that data and return the information you're looking for. There's very little relative value in the generated HTML that you see in the page. Take a leaf from their book and don't stress about people copying your HTML.
No, there isn't any way to do it. However, you can disable right-clicking in the browser via JavaScript, but they can still use keyboard shortcuts to open the developer tools (F12 in Chrome) and see the source. You cannot hide HTML or JavaScript from the client, but maybe you can make it harder to read.
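For what it's worth, the usual right-click-blocking snippet looks like this; it only hides the context menu and does nothing to protect the source:
// Blocks the right-click context menu only; Ctrl+U, F12 and the network tab still work.
document.addEventListener("contextmenu", (event) => event.preventDefault());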
No. Your HTML output is in the user's realm. Even if there were a way to disable view source in one client, a user could simply use a different one.
Always assume that your site's HTML is fully available to end users.
Yes and no. You can definitely make HTML and JS harder to interpret by obfuscating your code - that is, taking your code and making it look confusing. Here is a tool that can do that: http://www.colddata.com/developers/online_tools/obfuscator.shtml
However, these things are all still code, and obfuscated code can be deobfuscated through any number of methods. If you post a song to the internet, even if people cannot find the MP3, they can simply record their speakers. If you upload an image and prevent users from downloading it, they can take a screenshot or use their camera. In order for HTML and JavaScript to work, they have to be interpreted by the user's computer, and even if you do find a way to disable "View Source" there are other ways in, like a DOM inspector (F12 in IE/Chrome, Ctrl+Shift+K in Firefox).
As a workaround, use copyright, warn your users they will be punished if they copy your code, and put watermarks, labels and logos over any MP3s or images you don't want stolen. In the end, disabling right-clicking (which is also possible, see How do I disable right click on my web page?) or disabling selection (also possible) does nothing, because there is more than one way to get your code, like searching through temporary internet files.
However, you ask "what if I want a site where my users can log in and I need security? How can I make it so nobody can see my code then? Doesn't it have to be secure and not out in the open?"
And the answer is, yes, it needs to be secure. That's what server-side languages, like PHP, are for. PHP does all the work on the server itself so the user cannot see it. PHP works like pre-rendering: rather than running in real time in the browser, it does all the work on the server before the page is sent, so the user's computer doesn't have to, keeping that code safe. The code is never put onto the user's computer, because the user's computer doesn't need it. SSL/TLS is often paired with server-side code so the data travelling between the server and the user can't be read or tampered with in transit.
But HTML and JavaScript have to run in real time on the user's computer, so disabling View Source is useless: there are many, many ways users could get around it, even if View Source is disabled, and even if right-clicking is disabled.
If your code doesn't need to be secure, however, I'd recommend you consider keeping it open source. :)
Okay, so I'm trying to figure something out. I am in the planning stages of a site, and I want to implement "fetch data on scroll" via jQuery, much like Facebook and Twitter, so that I don't pull all data from the DB at once.
But I have some concerns regarding SEO: how will Google be able to see all the data? Because the page fetches more data automatically when the user scrolls, I can't include any links in the style of "go to page 2"; I want Google to just index that one page.
Any ideas for a simple and clever solution?
Put links to page 2 in place.
Use JavaScript to remove them if you detect that your autoloading code is going to work.
Progressive enhancement is simply good practice.
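A jQuery-flavoured sketch of that approach (the element IDs and the markup of the "next page" link are assumptions, not a prescribed structure):
// Keep a real <a id="next-page" href="/items?page=2">Next page</a> in the markup
// for crawlers and no-JS users, then take over with infinite scroll when JS runs.
$(function () {
  var $next = $("#next-page").hide(); // crawlers never execute this line

  $(window).on("scroll", function () {
    var nearBottom =
      $(window).scrollTop() + $(window).height() >= $(document).height() - 200;
    if (nearBottom && $next.attr("href")) {
      var url = $next.attr("href");
      $next.removeAttr("href"); // avoid firing the same request repeatedly
      $.get(url, function (html) {
        $("#items").append(html); // append the next page's partial
        // in a real app, point $next's href at the following page here
      });
    }
  });
});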
You could use PHP (or another server-side script) to detect the user agent of webcrawlers you specifically want to target such as Googlebot.
In the case of a webcrawler, you would have to use non-JavaScript-based techniques to pull down the database content and lay out the page. I would recommend not paginating the search-engine-targeted content, assuming that you are not paginating the "human" version. The URLs discovered by the webcrawler should be the same as those your (human) visitors will visit. In my opinion, the page should only deviate from the "human" version by having more content pulled from the DB in one go.
A list of webcrawlers and their user agents (including Google's) is here:
http://www.useragentstring.com/pages/Crawlerlist/
And yes, as stated by others, don't rely on JavaScript for content you want to see in search engines. In fact, it is quite frequently used where a developer doesn't want something to appear in search engines.
All of this comes with the rider that it assumes you are not paginating at all. If you are, then you should use a server-side script to paginate your pages so that they are picked up by search engines. Also, remember to put sensible limits on the amount of your DB that you pull for the search engine. You don't want it to time out before it gets the page.
Create a Google webmaster tools account, generate a sitemap for your site (manually, automatically or with a cronjob - whatever suits) and tell Google webmaster tools about it. Update the sitemap as your site gets new content. Google will crawl this and index your site.
The sitemap will ensure that all your content is discoverable, not just the stuff that happens to be on the homepage when the googlebot visits.
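A minimal Node sketch of what such a sitemap generator could look like (in practice the URL list would come from your database, and a cron job would rerun the script as content is added):
// Writes a bare-bones sitemap.xml; point Google Webmaster Tools at the result.
const fs = require("fs");

const urls = [
  "https://www.example.com/",
  "https://www.example.com/items/1",
  "https://www.example.com/items/2",
];

const entries = urls.map((u) => `  <url><loc>${u}</loc></url>`).join("\n");
const sitemap =
  '<?xml version="1.0" encoding="UTF-8"?>\n' +
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
  entries + "\n</urlset>\n";

fs.writeFileSync("sitemap.xml", sitemap);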
Given that your question is primarily about SEO, I'd urge you to read this post from Jeff Atwood about the importance of sitemaps for Stackoverflow and the effect it had on traffic from Google.
You should also add paginated links that get hidden by your stylesheet and are a fallback for when your endless-scroll is disabled by someone not using javascript. If you're building the site right, these will just be partials that your endless scroll loads anyway, so it's a no-brainer to make sure they're on the page.
I'm just looking for clarification on this.
Say I have a small web form, a 'widget' if you will, that gets data, does some client side verification on it or other AJAX-y nonsense, and on clicking a button would direct to another page.
If I wanted this to be an 'embeddable' component, so other people could stick this on their sites, am I limited to basically encapsulating it within an iframe?
And are there any limitations on what I can and can't do in that iframe?
For example, the button that would take you to another page - this would load the content in the iframe? So it would need to exist outwith the iframe?
And finally, if the button the user clicked were to take them to an https page to verify credit-card details, are there any specific security no-nos that would stop this happening?
EDIT: For an example of what I'm on about, think about embedding either Google Maps or Multimap on a page.
EDIT EDIT: Okay, I think I get it.
There are two ways.
One: embed it in an iframe, but this is limited.
Two: create a JavaScript API and ask the consumer to link to it, but this is a lot more complex for both the consumer and the creator.
Have I got that right?
Thanks
Duncan
There are plus points for both methods. I, for one, wouldn't use another person's JavaScript on my page unless I was absolutely certain I could trust the source. It's not hard to make a malicious script that submits the values of all input boxes on a page. If you don't need access to the contents of the page, then using an iframe would be the best option.
Buttons and links can be "told" to navigate the top or parent frame using the target attribute, like so:
<a href="http://some.url/with/a/page" target="_parent">This is a link</a>
<form action="http://some.url/with/a/page" target="_parent"><button type="submit">This is a button</button></form>
In this situation, since you're navigating away from the hosting page, the same-origin-policy wouldn't apply.
In similar situations, widgets are generally iframes placed on your page. iGoogle and Windows Live Gadgets (to my knowledge) are hosted in iframes, and for very good reason - security.
If you are using AJAX I assume you have a server written in C# or Java or some OO language.
It doesn't really matter what language; only the syntax will vary.
Either way, I would advise against the iframe methods.
They open up way too many holes and problems; for example, HTTP content inside an HTTPS page (or vice versa) in an iframe will show a mixed-content warning.
So what do you do?
Do a server-side call to the remote site
Parse the response appropriately on the server
Return via AJAX what you need
Display returned content to the user
You know how to do the AJAX part; just add a server-side call to the remote site.
Java:
// Opens a server-side connection to the remote site; read conn.getInputStream() to get the HTML
URL url = new URL("http://www.WEBSITE.com");
URLConnection conn = url.openConnection();
or
C#:
// Server-side request to the remote site; read res.GetResponseStream() to get the HTML
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create("http://www.WEBSITE.com");
WebResponse res = req.GetResponse();
I think you want to get away from using inline frames if possible. Although they are sometimes useful, they can cause issues with navigation and bookmarking. Generally, if you can do it some other way than an iframe, that is the better method.
Given that you make an AJAX reference, a JavaScript approach would probably be the best bet, i.e. embed what you need to do in script tags. Note that this is how Google embeds things such as Google Analytics and Google Ads. It also has the benefit of being pullable from a URL hosted by you, so you can update the code and, voilà, it is active in all the web pages that use it. (Google usually uses version numbers as well, so they don't switch everyone over when they make changes.)
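To illustrate the pattern (the URL and element names below are placeholders): the consumer drops a single script tag into their page, and your hosted script injects the widget markup when it runs:
// Consumer's page includes:  <script src="https://widgets.example.com/widget.js" async></script>
// widget.js (hosted by you) then builds the widget in place:
(function () {
  var container = document.createElement("div");
  container.id = "my-widget"; // hypothetical id
  container.innerHTML =
    '<form action="https://widgets.example.com/submit" method="post">' +
    '<input type="text" name="query"><button type="submit">Go</button></form>';
  document.body.appendChild(container); // or insert it next to the script tag instead
})();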
Re the credit-card scenario, JavaScript is bound by the same-origin policy. For clarification, see http://en.wikipedia.org/wiki/Same_origin_policy
Added: Google Maps works in the same way, with some caveats such as a user/site key that explicitly identifies who is using the code.
Look into using something like jQuery and creating a "plugin" for your component. It's just one way, and just a thought, but if you want to share the component for other folks to use, this is one of the things that can be done.
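A bare-bones sketch of what such a jQuery plugin could look like (the plugin name and options are made up for illustration):
// Defines $.fn.myWidget so consumers can call $("#somewhere").myWidget({...});
(function ($) {
  $.fn.myWidget = function (options) {
    var settings = $.extend({ greeting: "Hello" }, options);
    return this.each(function () {
      $(this).append("<p>" + settings.greeting + " from the widget</p>");
    });
  };
})(jQuery);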