My site's content is all loaded dynamically.
I have written a few JS functions that change the content based on the URL received: if someone goes to www.mysite.com/#1056, the content for that item is loaded.
function getLocationHash() {
    // check if there is a location hash in the address bar; if so, process it
    if (window.location.hash != '') {
        processURL();
    }
}
Then it calls the processURL function:
function processURL() {
    if (window.location.hash != '') {
        var urlHash = window.location.hash;
        // if it's a catalog item, it has a number above 1000
        var itemId = parseInt(urlHash.substring(1), 10); // strip the leading '#' before comparing
        if (itemId > 1000) {
            getDetail(urlHash);
        }
    }
}
This works fine for history or for jumping straight to a URL on the site. However, other sites cannot follow it: if I enter www.mysite.com/#1056 into a Facebook status, FB scrapes only the www.mysite.com index page and does not follow through to the end of the JS. Is this because the JS is looking for the 'window' property?
Same thing with Google's crawling: I set up a sitemap with all of the hashed URLs, but Google only crawls the index page.
So the question is: How do I take what I have here and properly format a URL that other services like Facebook and Google can "see"?
Any tips would be much appreciated.
The # indicates the start of the fragment identifier. It is how you link to part of a page.
It is frequently abused to be read by JavaScript to load different content via Ajax, but that only works if the client runs the JS.
The scrapers used by Google and Facebook don't run JS.
1. Stop using fragment identifiers to load content.
2. Use real URLs instead.
3. Have the server deliver complete pages for those URLs.
4. Apply your Ajax changes using the history API, updating the URI to match the one that, loaded directly, would produce the page you are creating with JS (see the sketch below).
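As a sketch of that last step, assuming jQuery and purely illustrative names for the link class, content container, and URLs (the server must also be able to render /catalog/1056 and friends on its own):
<script>
// Progressive enhancement: the links point at real URLs the server can render,
// and JS intercepts clicks to swap only the changed region without a full reload.
$(document).on('click', 'a.catalog-link', function (e) {
    e.preventDefault();
    var url = $(this).attr('href'); // e.g. /catalog/1056, a real server-rendered page
    $('#content').load(url + ' #content > *'); // fetch the page, keep just #content
    history.pushState(null, '', url); // the address bar now shows a real URL
});

// When the user hits back/forward, re-render whatever the URL now points at
$(window).on('popstate', function () {
    $('#content').load(location.pathname + ' #content > *');
});
</script>
Because every URL in the history is a real page, scrapers and crawlers that never run JS still get complete content.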
These are the solutions I discovered when I researched this.
For crawling there is the 'hashbang' approach, as described in Google's documentation: https://developers.google.com/webmasters/ajax-crawling/docs/learn-more?hl=nl
And for linking on Facebook you can, for example, use HTML5 pushState:
http://badassjs.com/post/840846392/location-hash-is-dead-long-live-html5-pushstate
I have created a page for a web banner at http://example.com/banner; I'm sending this link to publisher websites and paying them to run it.
However, some publishers run it and some do not, and I'd like to find out which parent URLs requested this page, i.e. where each click came from. Generally, they put this URL in an iframe to serve it.
(Many pages don't pass a referrer.)
I've tried different approaches with JS and PHP, but as you might guess I'm getting http://example.com/banner as the parent URL.
Is there a way to know the parent URL from a different domain with PHP, JS or any other piece of code? I have a list of publishers, but I also need to know which other websites are running the banner besides those.
To make it clearer, here is a schema:
MY PAGE WITH BANNER > MY PUBLISHER WEBSITE > USER VISITING THE PUBLISHER
I don't want the IP of the user visiting my publisher's website, or my own page's URL. I want to see the URL of my publisher's website, which sits in between.
Since this is my web server I can read access logs, error logs etc. without issues.
I'm open to any suggestions.
Thanks!
You could try this: host a JavaScript file on your server.
Then the publishers would place the script anywhere they want the banner to appear:
<script src="//yoursite.com/banner.js"></script>
You could use params in that URL to serve custom JS.
Fundamentally, the code would then look something like the following, which injects the banner into the DOM wherever the script is placed. You get the site's URL from window.location.href and send it as a param when requesting the image. (You could also use cookies, etc.)
<script>
    // inject an anchoring element at the point where this script runs
    document.write('<div class="banner_ad"></div>');
    // find every anchoring element on the page
    var parentDiv = document.getElementsByClassName("banner_ad");
    // loop over the anchoring elements
    for (var i = 0, len = parentDiv.length; i < len; i++) {
        // create the img/banner, notice the site param
        var banner = document.createElement("img");
        banner.src = 'http://via.placeholder.com/350x150?site=' + encodeURIComponent(window.location.href);
        // create the link
        var link = document.createElement('a');
        link.appendChild(banner);
        link.setAttribute('title', 'Ads by Foobar');
        link.setAttribute('href', 'http://example.com');
        // inject the link and img into the anchoring element
        parentDiv[i].appendChild(link);
    }
</script>
Then server-side, look for the $_GET['site'] param.
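If the collecting side happened to be Node rather than PHP, a minimal equivalent might look like this (a sketch using only core modules; the log file name and port are arbitrary):
// sketch: log which site requested the banner
var http = require('http');
var url = require('url');
var fs = require('fs');

http.createServer(function (req, res) {
    var query = url.parse(req.url, true).query;
    if (query.site) {
        // one line per hit: timestamp plus the publisher page that embedded the banner
        fs.appendFile('banner-hits.log', new Date().toISOString() + ' ' + query.site + '\n', function () {});
    }
    // ...then serve the actual banner image
    res.end();
}).listen(8080);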
It's not foolproof, nothing is.
I'm working with a client who's stuck using a widget-based website and needs a custom-built web app. We're hosting it off-site. I need to pull product information from their main site to populate the off-site page.
I wrote this to pull the H1 content from their main site; it will serve as the link to the off-site page.
<a id="specialURL" href="#">BUILD LEASE</a>
<script type="text/javascript">
    $(document).ready(function() {
        var url = $('h1').html();
        url = url.replace(/\s+/g, '-').toLowerCase() + '.html';
        console.log(url);
        $('#specialURL').attr('href', 'http://joethemovie.com/' + url);
    });
</script>
Now I need to turn the URL content back into an h1 on the off-site page.
Also, if possible, does anyone know how I could store price information in the URL?
Such as http://example.com/mainSiteH1&price=12345
Then, same as the H1, display it on the off-site page.
Thanks!
Anything you're sticking in a URL should be URI-encoded. That will take care of the spaces too, so the regex isn't necessary:
$('#specialURL').attr('href', 'http://joethemovie.com/?h1=' + encodeURIComponent(url) +"&price=" + encodeURIComponent("about tree fiddy"));
Then you can get that junk from the URL on the server (PHP's $_GET, for example), or you can pull those parameters out with JavaScript, as sketched below.
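In current browsers, URLSearchParams handles the JavaScript side; a sketch, assuming the off-site page already contains an empty h1:
<script>
// read h1 and price back out of the query string on the off-site page
var params = new URLSearchParams(window.location.search);
document.querySelector('h1').textContent = params.get('h1');
console.log('price:', params.get('price')); // "about tree fiddy"
</script>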
I want to create a tracking script. Something similar to Google Analytics, but very basic.
The requirements are:
a simple JS script, like Google Analytics uses
most of the logic kept inside the JS file loaded by the main site
collect the information in PHP and store it
What I can't figure out yet is what the ways to do this are. From what I see, Google loads a gif file, stores the information, and parses the logs. If I do something similar, sending the data to a PHP file, the Ajax cross-site policy will stop me, from what I remember.
Not sure how Google Analytics does things, but the way to circumvent the cross-site policy is, simply, don't make an Ajax call. Suppose you used JavaScript and now have a hash with your visitor's data:
var statsPage = 'http://mysite/gather_stats.php';
var stats = {
    page: location.href,
    browser: 'ie',
    ip: '127.0.0.1',
    referral: 'google.com?search=porn'
};
var params = $.param(stats); // serializes it, see https://api.jquery.com/jQuery.param/
Now you only have to make a GET request to your PHP page with this string as a parameter. Don't use Ajax, though; simply use the URL as an img src:
$('<img>', {
    src: statsPage + '?' + params
}).appendTo('body').remove();
You could use a script tag the same way, but pay attention: anything the PHP stats page returns will be executed as JavaScript (which is exactly how JSONP works).
Bear in mind that some limits apply to GET string length.
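For what it's worth, the beacon doesn't need jQuery either; here is a sketch of the same thing in plain JavaScript, serializing the hash by hand (statsPage and stats as defined above):
<script>
// serialize the stats object into a query string by hand
var pairs = [];
for (var key in stats) {
    if (stats.hasOwnProperty(key)) {
        pairs.push(encodeURIComponent(key) + '=' + encodeURIComponent(stats[key]));
    }
}

// fire-and-forget: creating the image makes the browser hit your PHP page
var beacon = new Image();
beacon.src = statsPage + '?' + pairs.join('&');
</script>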
I have the following function that activates when I click on some links:
function showPage(page) {
    var History = window.History;
    History.pushState(null, null, page);
    $("#post-content").load(page + ".php");
}
The content of the page updates and the URL changes. However, I know I'm surely doing something wrong: for example, when I refresh the page it gives me a Page Not Found error, and the link of the new page can't be shared, for the same reason.
Is there any way to resolve this?
It sounds like you're not routing your dynamic URLs to your main app. Unless page refers to a physical file on your server, you need to be doing some URL rewriting server-side if you want those URLs to work for anything other than simply being placeholders in your browser history. If you don't want to mess with the server side, you'll need to use another strategy, like hacking the URL with hashes. That way the server is still always serving your main app page, and then the app page reads the URL add-on stuff to decide what needs to be rendered dynamically.
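As a sketch of the server-side rewriting, assuming a Node/Express server purely for illustration (with Apache/PHP the same idea is usually done with mod_rewrite), the point is that every page URL returns a real document:
// Express sketch: any /pages/* URL answers with a real document,
// so refreshing or sharing the link no longer 404s
var express = require('express');
var path = require('path');
var app = express();

app.get('/pages/:page', function (req, res) {
    // either render the content for req.params.page server-side,
    // or serve the main app shell and let its JS read the URL
    res.sendFile(path.join(__dirname, 'index.html'));
});

app.listen(3000);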
You need to stop depending on JavaScript to build the pages.
The server has to be able to construct them itself.
You can then progressively enhance with JavaScript (pushState + Ajax) to transform the previous page into the destination page without reloading all the shared content.
Your problem is that you've done the "enhance" bit before building the foundations.
I have recently read Google's Making AJAX Applications Crawlable as I was wondering how to correctly prepare my dynamic site, which uses hashbang navigation, for SEO.
I understand now that for mysite.com/#!/foobar I should serve an equivalent html snapshot at mysite.com/?_escaped_fragment_=foobar.
I just want to know whether Google then indexes my page as http://example.com/#!/foobar
or whether it uses the escaped_fragment URL. I'm assuming (but would like to be sure) that it will use my hashbang URL in the search results, with the indexed content taken from the escaped_fragment page.
Some confirmation would help me sleep better. Thanks!
By default, Google will create an escaped_fragment URL for your page. That could end up looking ugly.
You should 301-redirect the escaped_fragment URL to a page with a prettier URL.
Say your server gets a URL request from Googlebot or any hashbang-compliant crawler, such as "targetPage?_escaped_fragment_=command=play%26id=4ee7af".
You should have your targetPage accept targetPage?_escaped_fragment_= .... and issue a 301 redirect to itself as "targetPage?command=play&id=4ee7af" (or any other pretty URL, as long as it is to the same page).
If you were using J2EE, you could create a servlet filter to intercept and 301-redirect to the cleaner URL of the same page.
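Outside J2EE the same filter is easy to sketch; for example with Node/Express (Express and the /targetPage route are assumptions for illustration):
var express = require('express');
var app = express();

app.get('/targetPage', function (req, res, next) {
    var fragment = req.query._escaped_fragment_; // Express has already URL-decoded it
    if (fragment !== undefined) {
        // 301 to the prettier URL on the same page,
        // e.g. /targetPage?command=play&id=4ee7af
        return res.redirect(301, '/targetPage?' + fragment);
    }
    next(); // a normal request: serve the page as usual
});

app.listen(3000);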