How to catch a certain part of a website?

How to catch a certain part of a website? - javascript

Suppose that we have a website. We want to show a specified part of this site in another site, like a table of data that shows latest news, and we want to show this part in our website with javascript.
Is this possible? Are there any more information needed?
We all know that with this code:
<iframe src="http://www.XYZ.com">
</iframe>
we can load all website, but how to load a specific part of a website?

I think your best bet is JQuery.load(), but I'm not up to speed on whether there are crossdomain problems with that approach. • Just checked and there are.
You can use PHP to load the output of a page with file_get_contents() or similar and then scrape out what you want for your own output.
Example:
$str = file_get_contents("http://ninemsn.com.au/");
// Reallly shouldn't use regex with HTML.
$region = '/(\<div id\=\"tabloid\_2\" class\=\"tabloids\">)(.*)(\<\/div\>)/';
preg_match($region, $str, $matches);
echo $matches[0];
Finally, something to keep in the back of your mind is that many of the larger websites that developers may want to get content from for their own website offer APIs to easily and efficiently obtain information from their site. YouTube, Twitter and a number of photo sharing sites are good examples of these. JSON and XML are probably the most common data formats that you will receive from these APIs.
Here are some examples of APIs that produce usable JSON:
YouTube video feed: https://gdata.youtube.com/feeds/api/users/Martyaced/uploads?alt=json
Twitter feed: http://api.twitter.com/1/statuses/user_timeline.json?screen_name=Martyaced

Related

AngularJS - SEO - S3 Static Pages

My application uses AngularJS for frontend and .NET for the backend.
In my application I have a list view. On clicking each list item, It will fetch a pre rendered HTML page from S3.
I am using angular state.
app.js
...
state('staticpage', {
url: "/staticpage",
templateUrl: function (){
return 'http://xxxxxxx.cloudfront.net/staticpage/staticpage1.html';
},
controller: 'StaticPageCtrl',
title: 'Static Page'
})
StaticPage1.html
<div>
Hello static world 1!
<div>
How do I do SEO here?
Do I really need to do HTML snapshot using PanthomJS or so.

Yes PhantomJS would do the trick or you can use prerender.io with that service you can just use their open source renderer and have your own server.
Another way is to use _escaped_fragment_ meta tag
I hope this helps, if you have any questions add comments and I will update my answer.

Do you know that google renders html pages and executes javascript code in the page and does not need any pre-rendering anymore?
https://webmasters.googleblog.com/2014/05/understanding-web-pages-better.html
And take a look at these :
http://searchengineland.com/tested-googlebot-crawls-javascript-heres-learned-220157
http://wijmo.com/blog/how-to-improve-seo-in-angularjs-applications/

My project front-end also has biult on top of Angular and I decieded to solve SEO issue like this:
I've created an endpiont for all search engines (SE) where all the requests go with _escaped_fragment_ parameter;
I parse a HTTP Request for _escaped_fragment_ GET parameter;
I make cURL request with parsed category and article parameters and get the article content;
Then I render a simpliest (and seo friendly) template for SE with the article content or throw a 404 Not Found Exception if article does not exists;
In total: I do not need to prerender some html pages or use prrender.io, have a nice user interface for my users and Search Engines index my pages very well.
P.S. Do not forget to generate sitemap.xml and include there all urls (with _escaped_fragment_) wich you want to be indexed.
P.P.S. Unfortunately my project's back-end has built on top of php and can not show you suitable example for you. But if you want more explanations do not hesitate to ask.

Firstly you can not assume anything.
Google does say that there bots can very well understand javascript application but that is not true for all scenarios.
Start from using crawl as google feature from the webmaster for your link and see if page is rendered properly. If yes, then you need not read further.
In case, you see just your skeleton HTML, this is because google bot assumes page load complete before it actually completes. To fix this you need an environment where you can recognize that a request is from a bot and you need to return it a prerendered page.
To create such environment, you need to make some changes in code.
Follow the instructions Setting up SEO with Angularjs and Phantomjs
or alternatively just write code in any server side language like PHP to generate prerendered HTML pages of your application.
(Phantomjs is not mandatory)
Create a redirect rule in your server config which detects the bot and redirects the bot to prerendered plain html files (Only thing you need to make sure is that the content of the page you return should match with the actual page content else bots might not consider the content authentic).
It is to be noted that you also need to consider how will you make entries to sitemap.xml dynamically when you have to add pages to your application in future.
In case you are not looking for such overhead and you are lacking time, you can surely follow a managed service like prerender.
Eventually bots will get matured and they would understand your application and you will say goodbye to your SEO proxy infrastructure. This is just for time being.

At this point in time, the question really becomes somewhat subjective, at least with Google -- it really depends on your specific site, like how quickly your pages render, how much content renders after the DOM loads, etc. Certainly (as #birju-shaw mentions) if Google can't read your page at all, you know you need to do something else.
Google has officially deprecated the _escaped_fragment_ approach as of October 14, 2015, but that doesn't mean you might not want to still pre-render.
YMMV on trusting Google (and other crawlers) for reasons stated here, so the only definitive way to find out which is best in your scenario would be to test it out. There could be other reasons you may want to pre-render, but since you mentioned SEO specifically, I'll leave it at that.

If you have a server-side templating system (php, python, etc.) you can implement a solution like prerender.io
If you only have AngularJS-only files hosted on a static server (e.g. amazon s3) => Have a look at the answer in the following post : AngularJS SEO for static webpages (S3 CDN)

yes you need to prerender the page for the bots, prrender.io
can be used and your page must have the
meta tag
<meta name="fragment" content="!">

Embed XML/RSS feed into website

I'm trying to embed these pages into my website.
http://globalnews.ca/regina/feed/
http://weather.gc.ca/rss/city/sk-32_e.xml
I'm not sure exactly how these pages are supposed to work. Are these pages only used for scraping data? I'm not sure how to get the information from these XML/RSS feeds?

After some more search I discovered that I need to parse them XML in some fashion. I am using PHP so my first step is to use
simplexml_load_file('http://weather.gc.ca/rss/city/sk-32_e.xml');
and then read it from an array.

Use Google Feed API. It's pretty simple to get to work, and only needs Javascript.
https://developers.google.com/feed/v1/devguide#getting-started

PHP HttpRequest to create a web page - how to handle long response times?

I am currently using javascript and XMLHttpRequest on a static html page to create a view of a record in Zotero. This works nicely except for one thing: The page html title.
I can of course also change the <title>...</title> tag, but if someone wants to post the view to for example facebook the static title on the web page will be shown there.
I can't think of any way to fix this with just a static page with javascript. I believe I need a dynamically created page from a server that does something similar to XMLHttpRequest.
For PHP there is HTTPRequest. Now to the problem. In the javascript version I can use asynchronous calls. With PHP I think I need synchronous calls. Is that something to worry about?
Is there perhaps some other way to handle this that I am not aware of?
UPDATE: It looks like those trying to answer are not at all familiar with Zotero. I should have been more clear. Zotero is a reference db located at http://zotero.org/. It has an API that can be used through XMLHttpRequest (which is what I said above).
Now I can not use that in my scenario which I described above. So I want to call the Zotero server from my server instead. (Through PHP or something else.)
(If you are not familiar with the concepts it might be hard to understand and answer the question. Of course.)
UPDATE 2: For those interested in how Facebook scraps an URL you post there, please test here: https://developers.facebook.com/tools/debug
As you can see by testing there no javascript is run.

Sorry, im not sure if i understand what you are trying to ask, are you just wanting to change the pages title?
Why not use javascript?
document.title = newTitle

Facebook expects the title (or opengraph :title tags) to be present when it fetches the page. It won't execyte any JavaScript for you to fill in the blanks.
A cool workaround would be to detect the Facebook scraper with PHP by parsing the User Agent string, and serving a version of the page with the information already filled in by PHP instead of JavaScript.
As far as I know, the Facebook scraper uses this header for User Agent: "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
You can check to see if part of that string is present in the header and load the page accordingly.
if (strpos($_SERVER['HTTP_USER_AGENT'], 'facebookexternalhit') !== false)
{
//synchronously load the title and opengraph tags here.
}
else
{
//load the page normally
}

Creating a custom widget using django for use on external sites

I have a new site that I am putting together and part of it has statistics for the site's users. I would like to create a widget that others can use on another website by invoking javascript that reads data from my server and shows that statistics for a given user, but I am having a hard time finding specific tutorials that covers this in django.
I have seen the link at Alex Maradon's site [0], but it looks to me like that is passing html back to the widget and I am having a hard time figuring out how to do this using something like xml.
Are there any django apps for doing this or does anyone know of good how-tos?
[0] http://alexmarandon.com/articles/web_widget_jquery/

This is not matter of Django, you can solve this by using the most common solution. Javascript.
Give your users this to put on their websites.
<script type="text/javascript" src="http://mysite.com/widget/user/124546465"></script>
On a django view, render the next template:
(function(){
document.write('<div class="mysite-userprofile">');
document.write('My visits are {{total_visits}}<br />')
document.write('</div>') })()
)
So on your view, you may have something like this, the mimetype is important
def total_visits(request, user_id):
user = get_object_or_404(User, id = user_id)
total_visits = Visits.objects.filter(user:user).total_visits() #this is a method to count, you may have to write your own logic
context = {'total_visits': total_visits}
render_to_response('widget_total_visits.html', context, mimetype='text/javascript')
What can you do next?
User settings, like this.
<script type="text/javascript">
mysite_options = {
'just_friends': True,
'theme': 'bluemarine,
'realtime': True
}
</script>
<script type="text/javascript" src="http://mysite.com/widget/user/124546465"></script>
So on your template, you can use the variables set before include the script on the web site of your user, a simple stuff.
Later, you can use POST method, to gather information from the user clients. For stats.
And of course make it Ajax!
I hope this give you a path to follow

Obey the one rule: Keep It Simple Silly!
I know it may be very web 1.0 but an iframe really is your best friend in this situation. A simple piece of code such as <script>document.write("<iframe src='yoursite.com/userwidget/" + username + "' height='30' width='150' />");</script> to inject an iframe at load time will save you a crapload of time writing jsonp async code and dom manipulators and making sure that all the elements you inject onto their page will be styled correctly on every different website and worrying about origin policies.
if you have your code plug in an iframe pointed at you page then:
The origin is you! you don't have to worry about it.
You can use django templates instead of js to construct the widget being shown.
their CSS won't mess with your presentation.
their js can't easily manipulate your stats ;)
This is exactly what iframes were designed to do.

Ajax page part load and Google

I have some div on page loaded from server by ajax, but in the scenario google and other search engine don't index the content of this div. The only solution I see, it's recognize when page get by search robot and return complete page without ajax.
1) Is there more simple way?
2) How distinguish humans and robots?

You could also provide a link to the non-ajax version in your sitemap, and when you serve that file (to the robot), you make sure to have included a canonical link-element to the "real" page you want users to see:
<html>
<head>
[...]
<link rel="canonical" href="YOUR_CANONICAL_URL_HERE" />
[...]
</head>
<body>
[...]
YOUR NON_AJAX_CONTENT_HERE
</body>
</html>
edit: if this solution is not appropriate (some comments below points out that that this solution is non-standard and only supported by the "big-three"), you might have to re-think whether you should make the non-ajax version the standard solution, and use JavaScript to hide/show the information instead of fetching it via AJAX. If it is business critical information that is fetched, you have to realize that not all users have JavaScript enabled, and thus they won't be able to see this information. A progressive enhancement approach might be more appropriate in this case.

Google gets antsy if you are trying to show different things to you users than to crawlers. I suggest simply caching your query or whatever it is that needs AJAX and then using AJAX to replace only what you need to change. You still haven't really explained what's in this div that only AJAX can provide. If you can do it without AJAX then you should be, not just for SEO but for braille readers, mobile devices and people without javascript.

You can specify a sitemap in your robots.txt. That sitemap should be a list of your static pages. You should not be giving to Google a different page at the same URL, so you should have a different URL with static and dynamic content. Typically, the static URL is .../blog/03/09/i-bought-a-puppy and dynamic URL is something like .../search/puppy.

Develop Reference

JavaScript is the programming language of the Web.

How to catch a certain part of a website? - javascript

Related

AngularJS - SEO - S3 Static Pages

Embed XML/RSS feed into website

PHP HttpRequest to create a web page - how to handle long response times?

Creating a custom widget using django for use on external sites

Ajax page part load and Google

Categories

Resources