I want to fetch this site
https://www.film-fish.com/modern-mindless-action
to fetch the IMDB IDs of all movies listed there.
The problem is that the page only loads the full list of movies as you scroll down, so a simple wget doesn't work.
Even if I scroll to the bottom of the page and view the source code, I do not see the last movie in the list (Hard Kill (2020)).
So the problem seems to be that the content is being created via JavaScript.
Does anybody have a tip on how to achieve that?
Indeed, executing JavaScript is beyond the scope of GNU Wget. You would need a browser automation tool. If you know some Node.js or JavaScript, I suggest taking a look at the PhantomJS Quick Start and Page Automation guides. Take a look at the first example in the second link; you should be able to rework it to your needs, i.e. instruct the page to scroll down using JavaScript, then extract what you need using JavaScript.
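Once a browser automation tool has scrolled the page and saved the rendered HTML, the extraction step can be done in any language. Here is a minimal Python sketch of that last step only. It assumes the rendered page links each movie to an imdb.com/title/tt... URL, which I have not verified for this site; the function name and the sample IDs are made up for illustration.

```python
import re
from typing import List

# IMDb title IDs are "tt" followed by at least 7 digits.
# Assumption: the rendered page contains links of the form imdb.com/title/tt...
IMDB_ID_PATTERN = re.compile(r"imdb\.com/title/(tt\d{7,})")


def extract_imdb_ids(rendered_html: str) -> List[str]:
    """Return the unique IMDb IDs found in the HTML, in first-seen order."""
    seen = {}
    for match in IMDB_ID_PATTERN.finditer(rendered_html):
        # dict keys keep insertion order, so duplicates are dropped
        # without reordering the list.
        seen.setdefault(match.group(1), None)
    return list(seen)


if __name__ == "__main__":
    # Hypothetical snippet standing in for the HTML saved by the browser tool.
    sample = (
        '<a href="https://www.imdb.com/title/tt1234567/">First</a>'
        '<a href="https://www.imdb.com/title/tt7654321/">Second</a>'
    )
    print(extract_imdb_ids(sample))
```

If the site stores the IDs somewhere other than in links (for example in a data attribute), the pattern would need to be adjusted accordingly.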
I have set up a dynamic competition page where the query string determines what content you see.
For example, http://nectarfinance.com.au/dc=korinadrogan will show Korina's content, while no query string will show generic head office content.
The site (as is) is loading slowly, and I know it is happening because of the dynamic Facebook 'like and share' scripts on the page.
I was wondering if there is any way to minify these scripts into one? Or is there any way to reduce the load time of these scripts, or reduce their size?
I'm not sure how to work around it as the files are externally hosted by Facebook.
I'll post the GTMetrix report in the answer below, as I can't post two links.
Thanks for your help
I am using Facebook's graph API to pull images posted to a company FB page and putting them on a web page. Some images show up fine, some don't show up at all. Looking at the page's source code I can see that all the images that are working have a URL that begins: graph.facebook.com/... and the images that won't load begin with: facebook.com/ads/images/...
Suspecting that the /ads/ part of the URL was triggering Adblock to block the images, I disabled Adblock for the page and the missing images appeared.
Given that it is impractical to expect all visitors not to use Adblock, I'm wondering how I can fix this issue.
Thanks!
It's the way filters are implemented inside Adblock. For them, the API and image URL, along with the IP lookup, somehow translate to an ad which should be blocked. They use a lot of parameters to decide whether something is an ad, such as image dimensions, filename, social media links, behaviour, etc. So the fix would simply be to correct the way Adblock works, or to improve its ad detection for this case.
Have a look at this search, where you can see that only my main page is indexed.
But why don't Google and other search engines index arda-maps.org/about/ and the other subpages? Is my deep linking done in the wrong way? Do the search engines need more time? If they do need more time, why is the forum, which came very late, already indexed?
Also, when you click the links, I load the "subpages" by hiding and showing layers. Maybe it's because of that?
I didn't see an index, follow tag in your HTML code. It's better to have it:
<meta name="robots" content="index, follow">
Also you can do two more things. Go to GWT > Crawl > Fetch as Google and submit some of your pages. Also click on the Sitemaps button in the left menu and submit your sitemap.
You can also share pages from your site on Twitter or Google+; everything posted there is indexed very fast.
Wish you luck,
Kasmetski
You don't seem to have a robots.txt. It is always a good idea to implement one. This might explain your issue, because when Google does not find one it stops crawling websites. Check for warnings or error messages in Google Webmaster Tools.
I have also seen that site:arda-maps.org returns URLs with www. You should implement a redirect from www to non-www URLs.
Keep in mind that the site command does not return all indexed pages.
Your About page does not have a NOINDEX tag which is good. I have noticed you have a sitemap.xml and that your About page is in there. If the issue persists, this probably means Google thinks your page is not worth indexing.
I am currently working on a new feature to allow users to select the thumbnail they would like to use when sharing a page on Facebook. The user should be able to use the Facebook widgets like the send dialog or share buttons, as well as simply cutting and pasting the URL into their update status dialog on Facebook.
I have read much of the documentation, which seems to indicate that I simply need to add multiple og:image tags in the page being shared. I have done this and run the page through the linter so the cache gets updated.
When passing the page to sharer.php directly, effectively removing any of my client-side code and letting the dialog present what it is scraping, I am seeing 3 images from the page available.
I am not sure what I am doing wrong here.
Here is the linter result, the graph object, the sharer.php link and the page. Anyone have ideas of what I could be doing incorrectly?
I have confirmed that at least the og:title tag is being respected by the share dialog. I have also tested the size of the images, and included file extensions as suggested below.
I know this works because BuzzFeed has the exact functionality I am going for. I have reduced my example down to only the core pieces I think should work. You can find the full source here.
Could it be the XML namespace in the top HTML tag?
In the BuzzFeed article, it's:
xmlns:og="http://opengraphprotocol.org/schema/"
In your page it's:
xmlns:og="http://ogp.me/ns#"
In the BuzzFeed article, the content attributes of the og:image tags point to named .jpg files, whereas your links do not have a filename or extension at the end.
It may be required to include a filename in the links, especially if it's basing image detection on the file extension.
EG:
Buzzfeed:
<meta property="og:image" content="http://s3-ak.buzzfeed.com/static/campaign_images/webdr02/2013/3/18/11/10-lifechanging-ways-to-make-your-day-more-effici-1-2774-1363621197-4_big.jpg" />
Yours:
<meta property="og:image" content="http://statics.stage3.cheezdev.com/mediumSquare/3845/4AC356E3/1"/>
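To check a page for this quickly, here is a small Python sketch that lists the og:image URLs whose path does not end in a common image extension. Whether Facebook's scraper actually relies on the extension is only my guess, as said above; the class name, function name, and extension list are made up for this example.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: these are the extensions worth checking for.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


class OgImageParser(HTMLParser):
    """Collects the content of every <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:image" and attributes.get("content"):
            self.images.append(attributes["content"])


def og_images_missing_extension(html):
    """Return the og:image URLs whose path has no known image extension."""
    parser = OgImageParser()
    parser.feed(html)
    missing = []
    for url in parser.images:
        extension = posixpath.splitext(urlparse(url).path)[1].lower()
        if extension not in IMAGE_EXTENSIONS:
            missing.append(url)
    return missing
```

Feeding it the page source gives a list of the og:image URLs to rename or re-serve with a filename, if the extension theory holds.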
After some tests, I guess it's a caching issue.
Looks like the sharer is caching the graph, using the og:url as a key, so that different querystrings in the sharer won't bypass the cache, if they do not impact the og:url value.
Obviously, the debug tool doesn't use such a cache.
If I'm right (this is just a guess), you can either wait for the cache entry to expire or try a different og:url. Moreover, to make testing easier, keep the new og:url equal to the new page location.
So funny story, I'm a developer at BuzzFeed and came across this while trying to figure out why our share dialogs suddenly stopped showing the thumbnail picker.
It looks like Facebook disabled the functionality. It briefly made a reappearance on 1/14/2014 but they introduced a bug that prevented sharing from any pages with multiple og:image tags defined. (See: https://developers.facebook.com/bugs/1393578360896606/)
They fixed the bug, but as of 1/22/2014 it still looks like the thumbnail picker is disabled.
The Sharer.php script on the Facebook site doesn't support all the OG tags as far as I know. The images are scraped from the page content itself, so if you want your three images to appear on the Sharer.php script, include them in your content.
Sharer.php has been officially deprecated by Facebook, so I wouldn't be surprised if certain functionality does not work with it. While it still works, it was always the simplest option and I'm guessing they never built the link image scraping from the og items into it.
I was able to find this article, which shows one way that you can specify exactly what images are available to the sharer.php share page. You can specify one (or multiple) images to share with a URL structure like the following:
http://www.facebook.com/sharer.php?s=100
&p[url]=http://bit.ly/myelection
&p[images][0]=http://election.gv.my/assets/vote.png
&p[title]=My customized title
&p[summary]=My customized summary
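If you are generating these links in code, the parameters should be URL-encoded rather than pasted in raw. Here is a minimal Python sketch that builds a link with the structure above. The function name is made up, and given that sharer.php is deprecated I can't promise Facebook still honours the p[...] parameters.

```python
from urllib.parse import urlencode


def build_sharer_url(page_url, title, summary, image_urls):
    """Build a sharer.php link using the p[...] parameters shown above."""
    params = [("s", "100"), ("p[url]", page_url)]
    # One p[images][N] entry per image, numbered from 0.
    for index, image_url in enumerate(image_urls):
        params.append((f"p[images][{index}]", image_url))
    params.append(("p[title]", title))
    params.append(("p[summary]", summary))
    # urlencode percent-encodes the brackets and URLs and turns spaces into "+".
    return "http://www.facebook.com/sharer.php?" + urlencode(params)


if __name__ == "__main__":
    print(
        build_sharer_url(
            "http://bit.ly/myelection",
            "My customized title",
            "My customized summary",
            ["http://election.gv.my/assets/vote.png"],
        )
    )
```

Passing more than one entry in image_urls adds p[images][1], p[images][2], and so on.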