NextJS Dynamic Pages cannot be crawled

I'm using NextJS with ExpressJS as the server.
I have already implemented custom routes following the example in the Next.js documentation (https://nextjs.org/docs#custom-routes-using-props-from-url).
I am also using getInitialProps for server-side rendering.
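For reference, the setup looks roughly like this (the /post route, the page name, and the port are placeholders; the pattern is the custom-server one from the docs):

// server.ts: a sketch of the custom Express server
import express from 'express';
import next from 'next';

const dev = process.env.NODE_ENV !== 'production';
const app = next({ dev });
const handle = app.getRequestHandler();

app.prepare().then(() => {
  const server = express();

  // Dynamic route: map /post/:id onto pages/post.tsx, passing the id as a query param
  server.get('/post/:id', (req, res) => app.render(req, res, '/post', { id: req.params.id }));

  // Everything else falls through to the default Next.js handler
  server.get('*', (req, res) => handle(req, res));

  server.listen(3000);
});

// pages/post.tsx: a placeholder dynamic page rendered server-side via getInitialProps
import React from 'react';
import { NextPage } from 'next';

const Post: NextPage<{ id: string }> = ({ id }) => <article>Post {id}</article>;

Post.getInitialProps = async ({ query }) => {
  // query.id comes from the custom route above; real data fetching would go here
  return { id: String(query.id) };
};

export default Post;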
I also ran Screaming Frog SEO Spider against the site to test whether it can crawl my dynamic pages; it can't, it only picks up the static pages.
I don't know if I'm doing something wrong, since I just followed the documentation for custom routes.
I really want crawlers to reach my dynamic pages, because this affects the SEO of our website.
Thanks

There is a common SEO recommendation not to build purely dynamic websites.
I am not an expert in NextJS or ExpressJS, but in general most crawlers don't like dynamic websites: to crawl a dynamic website they need to execute JavaScript, which takes time and resources. As far as I know, Google can render JavaScript, so it is possible that Googlebot crawls your website successfully. Still, if SEO matters, do not build a pure SPA.
About Screaming Frog SEO Spider: as far as I know it can also render pages with Chromium, like Googlebot does; check its documentation for how to enable JavaScript rendering.

For my project, I added a sitemap.xml.tsx page that lets the Google crawler see all of the available pages. For this to work, you have to be able to retrieve every dynamic page that you want crawled and then generate the sitemap from them.
I would follow along with the example given here: https://dev.to/timrichter/dynamic-sitemap-with-next-js-41pe on how to correctly implement the sitemap.
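As a rough sketch of that approach (the API endpoint, routes, and fields below are placeholders, and it assumes a server-side fetch is available, e.g. Node 18+ or a polyfill):

// pages/sitemap.xml.tsx: build the sitemap from every dynamic page you can enumerate
import { NextPage } from 'next';

const toUrl = (host: string, route: string) => `<url><loc>https://${host}${route}</loc></url>`;

const createSitemap = (host: string, routes: string[]) => `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${routes.map((route) => toUrl(host, route)).join('')}
</urlset>`;

// The page renders nothing; on the server it writes the XML straight to the response
const Sitemap: NextPage = () => null;

Sitemap.getInitialProps = async ({ req, res }) => {
  // Placeholder endpoint: fetch every dynamic entity you want crawled
  const articles: { id: string }[] = await fetch('https://api.example.com/articles').then((r) => r.json());
  const routes = ['/', '/about', ...articles.map((a) => `/articles/${a.id}`)];

  if (req && res) {
    res.setHeader('Content-Type', 'text/xml');
    res.write(createSitemap(req.headers.host ?? 'example.com', routes));
    res.end();
  }
  return {};
};

export default Sitemap;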

Related

Should I use Next.js to allow the application to be indexed by Google?

I created a web application for a family business using React a few months ago, but the website is only accessible to people who know the exact URL. It uses a Firebase backend and a React frontend.
I've used Google's crawler-check tool, and it reports that the crawlers are able to access the website, with a screenshot of the page. However, it is not indexed in Google search results.
I've read that SSR with Next.js is a possible solution to this, but I'm not really sure what that means. How can I get the website to show up near the top of the search results when the business name is searched on Google? Should I use Next.js over React for something like this?
Welcome to the massive world of Search Engine Optimization.
There are many ways to push your website towards the top of the rankings on Google. To name a few:
A readable domain: look up how to put your website behind jtsapebusiness.com instead of weird-animal-123.firebase.etc.io.
Serving certain files, like robots.txt, that Google specifically looks for.
Having meta tags.
Having specific meta tags for each page (Next.js is great for this; see the sketch at the end of this answer).
Render time (server-side rendering is also great for this, but if your React app is small enough the performance difference shouldn't really matter, to be honest).
Page accessibility: Google can crawl single-page apps a lot better than it used to, but serving each page individually via server-side rendering has a lot of perks.
How often your page is searched for and clicked on (tell your friends and family to search for you on Google and click on your website).
These are just a few! Reading more about search engine optimization will help you come up with even more questions. When and if you do switch to Next.js, you will still be using React; you will just be writing it a little differently to fit a more server-side pattern.
The choice between "React or Next" will not matter much on its own. If you wanted to maximize your chances of ranking higher, I would go with Next, but I wouldn't want you to pick up a whole new technology when you already have the React app built. Instead, you can just add some search engine optimization sprinkles on top (some examples listed above).
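For the per-page meta tags point above, a minimal sketch using next/head (the page, title, and description are made up):

// pages/about.tsx: per-page meta tags with next/head
import Head from 'next/head';

export default function About() {
  return (
    <>
      <Head>
        <title>About | Family Business</title>
        <meta name="description" content="Who we are, what we do, and how to reach us." />
      </Head>
      <main>
        <h1>About us</h1>
      </main>
    </>
  );
}

Each page gets its own <Head>, so crawlers see a distinct title and description per URL.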

How do I scrape data generated with JavaScript using BeautifulSoup?

I'm trying to migrate some comments from a blog using web scraping with Python and BeautifulSoup. The content I'm looking for isn't in the HTML itself and seems to have been generated in a script tag (which I can't find). I've seen some answers regarding this, but most of them are specific to a particular problem and I can't seem to figure out how to apply them to my site. I'm just trying to scrape comments from pages like this one:
http://www.themasterpiececards.com/famous-paintings-reviewed/bid/92327/famous-paintings-duccio-s-maesta
I've also tried Selenium, but I'm using a Cloud9-based IDE currently and it doesn't seem to support web drivers.
I apologize if I botched any of the lingo, I'm pretty new to programming. If anyone has any tips, that would be helpful. Thanks!
You have many ways to scrape such content. One would be to find out how comments are loaded on this website: a quick look in the Chromium developer tools shows that comments for the page you mention are loaded via a separate API call.
This may not be a suitable approach for you, as you may not be able to construct that URL for every different page.
Another, more reliable way would be to render such JS content with a headless browser. For ease of implementation I would suggest using Scrapy with Splash; Splash is a rendering service that executes the JavaScript and returns the rendered content for your requests.

React / Express - How will server-side rendering work with my dynamic pages?

So I am building an article-based app using React and Express.
My app consists of 3 static pages and 1 dynamic article page.
At the moment, my articles' data comes from an RSS feed. The dynamic article page displays different articles depending on which RSS item is passed to it through props.
My question is:
How will SSR work with search engine crawlers so that they know my articles exist? If I were to search for "My Site Article Foo" or "My Site Article Bar", how would they know that those different articles existed?
Because as it stands, the article's urls would be like so:
www.mySite.com/articles?articleId=1
www.mySite.com/articles?articleId=2
www.mySite.com/articles?articleId=3
Even if I was not using an RSS feed, and simply using a database, how does this concept work?
Any help or advice is appreciated, Thank you in advance.
PS. I was not sure if this was the correct Stack Exchange site to ask on; if there is a better-suited one, please let me know so I can move this.
Your pages should be indexed by crawlers if you have set up server-side rendering. Crawlers like Google's can now follow query parameters.
You can use the 'Fetch and Render' method of Google's Search Console to see the contents of your JS-based page.
https://webmasters.googleblog.com/2014/10/updating-our-technical-webmaster.html
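As a very rough sketch of what server-side rendering the dynamic article page can look like with Express (ArticlePage, getArticleById, and the route below are placeholders for however you load your RSS or database data):

// server.tsx: render the article page on the server for any articleId
import express from 'express';
import React from 'react';
import { renderToString } from 'react-dom/server';
import ArticlePage from './src/ArticlePage';      // placeholder component
import { getArticleById } from './src/articles';  // placeholder RSS/DB lookup

const server = express();

server.get('/articles', async (req, res) => {
  const article = await getArticleById(String(req.query.articleId));
  const html = renderToString(<ArticlePage article={article} />);

  // The crawler receives fully rendered HTML with a unique title per article
  res.send(`<!doctype html>
<html>
  <head><title>${article.title} | My Site</title></head>
  <body><div id="root">${html}</div></body>
</html>`);
});

server.listen(3000);

Because every articleId produces a complete HTML document, a crawler that discovers the URLs (for example via a sitemap) can index each article separately.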

SEO and crawling: UI-Router ui-sref VS ng-click

After looking around a bit, I came to no conclusion about this matter: do Google and other search engines crawl pages that are only accessible through ng-click, without an anchor tag? Or does an anchor tag always need to be present for crawling to work successfully?
I have to build various elements which link to other pages in a generic way and ng-click is the best solution for me in terms of flexibility, but I suppose Google won't "click" those elements since they have no anchor tag.
Besides the obvious ui-sref with an anchor tag, I have thought about other solutions like:
<a ng-click = 'controller.changeToLink()'>Link name</a>
Although I am not sure if this is a good practice either.
Can someone please clarify this issue for me? Thanks.
Single-page applications are in general very SEO-unfriendly; ng-click not being followed is the least of the problems.
The application does not get rendered server-side, so search engine crawlers have a hard time properly indexing the content.
According to Google's latest recommendations, its crawler can render and index most dynamic content.
The way it works is that the crawler waits for the JavaScript to kick in and render the application, and only indexes after the content is injected into the page. This process is not 100% reliable, and until recently single-page applications could not compete with statically rendered pages.
This is the main reason why many sites use SPA techniques only for parts like their menu system, where it makes for a much better user experience than full page reloads; as a whole, single-page apps are not SEO-friendly.
This is slowly changing, as Angular Universal, Ember FastBoot and React now add the possibility to render an SEO-friendly page server-side and still have it take over as an SPA on the client side.
I think your best bet to improve your SEO is to submit a sitemap file to Google using their webmaster tools. This will let Google know about the pages that you reach via ng-click.
Note that this only has a chance of working if you are using HTML5 mode for the router and not hash URLs (URLs using #), as Google does not index those.
In general it's very hard to get good SEO for an Angular 1 app, and that's why it's mostly not used for public indexable content. The sweet spot of AngularJS is building the private "dashboard" section of your app, which users can access after logging in.
Try using prerender.io to prerender these Angular pages: it filters out bot requests and serves the prerendered pages from its page cache.
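A minimal sketch of wiring prerender.io into an Express server with the prerender-node middleware (the token and static directory are placeholders; see the prerender-node docs for the full options):

// server.ts: serve prerendered HTML to bots, the normal SPA to everyone else
import express from 'express';
import prerender from 'prerender-node';

const app = express();

// prerender-node inspects the user agent and, for known bots, proxies the request
// to the prerender service, which returns cached, fully rendered HTML
app.use(prerender.set('prerenderToken', 'YOUR_PRERENDER_TOKEN'));

// Everyone else gets the usual Angular SPA assets
app.use(express.static('dist'));

app.listen(3000);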

How do you make a single-page-application (SPA) searchable from SharePoint?

We are working on a single-page application (SPA) developed in ASP.NET MVC using Knockout and a wealth of other libraries. Routing will be done on the front end; maybe we will use crossroads.js. A lot of information is presented in virtual grids using slickgrid.js. All data is fetched from the backend using AJAX.
Now, if you want to crawl and index such a site from SharePoint, how would you go about that? If you just load the main page with no JavaScript, it is almost empty.
Update
After more investigation into this issue, I have concluded that there are at least two possible solutions to this kind of problem.
Possible solution 1: Render HTML
This approach would involve detecting that an SP crawler is crawling your site and then returning static HTML pages to the crawler. PhantomJS could possibly be used for this. There are, however, several uncertain aspects to this solution, and I suspect it would involve a lot of work.
Possible solution 2: Import data into SharePoint
As so clearly described by Josh below, you could import the data that you want crawlable into SharePoint. SharePoint can then be configured to crawl that data, and the data is no longer JavaScript-dependent because it lives inside SharePoint. I think this is the best and easiest solution, and I will mark Josh's answer as the accepted answer.
While I see that this question is getting a lot of close requests, I did run across this very question on a previous project. SharePoint won't be able to index the page that the data is manipulated on via your SPA, but inside SP you can connect external data sources into the search service, thus exposing the data behind the SPA. You would then write custom search results tied to the content type of the exposed data in order to make the results a bit friendlier than just a data row. You might create an entry point in your SPA that can take in a URL with a parameter, so you can send the user from the search results to the SPA in one shot.
There are a lot of interconnected concepts to this solution, so I'd suggest looking into connecting external data sources and adding them to the crawled index of SP. Then, create a content type out of the exposed objects. And finally, add a custom search result template for the content type. MSDN will be your friend on this and so will your SP administrator.
