Vanilla JavaScript version of Google's UrlFetchApp - javascript

I'm working on an HTML file in a Google Apps Script project and I want it to be able to retrieve the HTML content of a web page, extract a snippet, and paste the snippet into the page. I've tried using fetch() and a couple of its options (mostly the CORS modes), but I either get nothing back or an error that says "No 'Access-Control-Allow-Origin' header is present on the requested resource." A workaround I found was using google.script.run.withSuccessHandler() to return the HTML content via UrlFetchApp.fetch(url).getContentText(). The problem is that this is time-consuming, and I want to optimize it as much as possible. Does anyone have any suggestions on how to make this work, or will I be forced to stick with my workaround?
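
For reference, the workaround looks roughly like this. It is only a sketch: loadSnippet, getPageHtml, and the #snippet element are illustrative names, and the substring call stands in for the real extraction logic.

// Client side (inside the HtmlService page): call the server over
// google.script.run and handle the returned HTML asynchronously.
function loadSnippet(url) {
  google.script.run
    .withSuccessHandler(function (html) {
      // Extract the snippet client-side and paste it into the page.
      var snippet = html.substring(0, 200); // placeholder extraction logic
      document.getElementById('snippet').textContent = snippet;
    })
    .withFailureHandler(function (err) {
      console.error(err);
    })
    .getPageHtml(url);
}

// Server side (Code.gs): UrlFetchApp runs on Google's servers, so the
// target site's missing CORS headers don't matter here.
function getPageHtml(url) {
  return UrlFetchApp.fetch(url).getContentText();
}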

Using "Vanilla JavaScript" in a Google Apps Script web application might not work to
retrieve the html content of a web page, extract a snippet, and paste the snippet to the page.
The above because the client-side code of Google Apps Script web applications is embedded in an iframe tag that can't be modified from server-side code. In other words, Google Apps Script is not universally suitable web development platform. One alternative is to use another platform to create your web application, i.e. GitHub Pages, Firebase, etc.
Related
Where is my iframe in the published web application/sidebar?

Writing a Chrome extension to block websites

I am trying to implement a Chrome browser extension.
The extension should take the web content (HTML and JavaScript) of any user-requested website and initially block that content from displaying; no JavaScript should be executed at this point.
The extension should then send the entire web content to my Python Flask web application and wait for a response. Based on that response, the web content should either be allowed and displayed and executed normally, or be disallowed, loading a premade disallow.html file instead.
I know how to implement the Python web application, and no further discussion of the logic inside that application is needed to answer this question. The part I'm not sure about yet is blocking the content and then allowing or disallowing it based on my application's decision.
Any help would be highly appreciated.
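
One possible shape for the blocking part, sketched with the Manifest V2 chrome.webRequest API. The Flask endpoint http://localhost:5000/check, the check.html/check.js pair, and the allowedUrls bookkeeping are assumptions for illustration, not a finished design, and check.html/disallow.html may need to be listed under web_accessible_resources.

// background.js (Manifest V2, with "webRequest", "webRequestBlocking" and
// "<all_urls>" permissions): intercept top-level navigations and hand them
// to an extension page that asks the Flask app for a verdict.
var allowedUrls = {}; // URLs the Flask app has already approved

chrome.webRequest.onBeforeRequest.addListener(
  function (details) {
    if (allowedUrls[details.url]) {
      return {}; // already approved -- load it normally
    }
    // Block the page for now by redirecting to a local "checking" page.
    return {
      redirectUrl: chrome.runtime.getURL('check.html') +
        '?url=' + encodeURIComponent(details.url)
    };
  },
  { urls: ['<all_urls>'], types: ['main_frame'] },
  ['blocking']
);

chrome.runtime.onMessage.addListener(function (msg) {
  if (msg.type === 'allow') { allowedUrls[msg.url] = true; }
});

// check.js (runs in check.html): fetch the page's HTML without rendering it,
// send it to the Flask app (assumed to answer {"allowed": true/false}), then
// either load the original URL or the premade disallow.html.
var url = new URLSearchParams(location.search).get('url');
fetch(url)
  .then(function (r) { return r.text(); })
  .then(function (html) {
    return fetch('http://localhost:5000/check', { // assumed Flask endpoint
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: url, content: html })
    });
  })
  .then(function (r) { return r.json(); })
  .then(function (verdict) {
    if (verdict.allowed) {
      chrome.runtime.sendMessage({ type: 'allow', url: url });
      location.href = url; // now load and execute it normally
    } else {
      location.href = chrome.runtime.getURL('disallow.html');
    }
  });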

What is causing the site to call Facebook for scripts and images?

We have an ASP.NET MVC website that contains no JavaScript file or image referenced from Facebook.net or Facebook.com.
Yet the Firefox developer tools show that these calls are happening and causing the site to load slowly.
How do I find out what is causing these calls?
These kinds of requests are usually connected to Facebook Like/Share/Follow buttons (Facebook plugins in general) embedded in the webpage.
Based on the few requests you shared, I would say that you've added a Facebook Page/Group plugin to your webpage.
In the end it was coming from Google Tag Manager JavaScript. Even though there were no hard-coded references to Facebook in the project, they were configured in the Google Tag Manager admin console.
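
To illustrate why nothing shows up in the project's own code: the tag manager container injects whatever tags are configured in its admin console at runtime, roughly like this (a simplified sketch, not the actual Google Tag Manager or Facebook code):

// Simplified sketch of what a configured tag does once gtm.js has loaded:
// it creates a <script> element pointing at the third-party host (the
// Facebook Pixel loader is used here purely as an illustrative example).
(function () {
  var s = document.createElement('script');
  s.async = true;
  s.src = 'https://connect.facebook.net/en_US/fbevents.js';
  document.head.appendChild(s);
})();

The developer tools' Network panel shows which script initiated each request, which is how such calls can be traced back to the tag manager.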

How to scrape dynamic website - using python scrapy?

I can scrape a static website using Scrapy; however, another website that I'm trying to scrape has two sections in its HTML, namely "head" and "body onload", and the information that I need is in the "body onload" part. I believe that content is loaded after the HTML is requested, and thus the website is dynamic. Is this doable using Scrapy? What additional tools do I need?
Check out scrapy_splash; it hooks Scrapy up to a rendering service (Splash) that will allow you to crawl JavaScript-based websites.
You can also create your own downloader middleware and use Selenium with PhantomJS (example). The downside of this technique is that you lose the concurrency provided by Scrapy.
Anyway, I think Splash is the best way to do this.
Hope this helps.

How to extract the dynamically generated HTML from a website

Is it possible to extract the HTML of a page as it is shown in the HTML panel of Firebug or in the Chrome DevTools?
I have to crawl a lot of websites, but sometimes the information is not in the static source code: JavaScript runs after the page is loaded and creates new HTML content dynamically. If I then extract the source code, that content is not there.
I have a web crawler built in Java to do this, but it's using a lot of old libraries. Therefore, I want to move to a Rails/Ruby solution for learning purposes. I already played a bit with Nokogiri and Mechanize.
If the crawler is able to execute JavaScript, you can simply get the dynamically created HTML structure using document.firstElementChild.outerHTML.
Nokogiri and Mechanize are currently not able to parse JavaScript. See
"Ruby Nokogiri Javascript Parsing" and "How do I use Mechanize to process JavaScript?" for this.
You will need another tool like WATIR or Selenium. Those drive a real web browser, and can thus handle any JavaScript.
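
For illustration, here is that idea with Selenium's JavaScript (Node.js) bindings; the Ruby bindings for Selenium/Watir follow the same load-the-page-then-read-the-DOM pattern. This is a sketch and assumes the selenium-webdriver package and a chromedriver install.

// Drive a real browser, let the page's JavaScript run, then read the
// dynamically generated DOM via the expression mentioned above.
const { Builder } = require('selenium-webdriver');

async function fetchRenderedHtml(url) {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get(url);
    // The DOM now includes anything the page's scripts created.
    return await driver.executeScript(
      'return document.firstElementChild.outerHTML;'
    );
  } finally {
    await driver.quit();
  }
}

fetchRenderedHtml('https://example.com').then(function (html) {
  console.log(html);
});
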
You can't fetch the records coming from the database side; you can only fetch the HTML code, which is static.
The JavaScript must be requesting the records from the database with a query request that can't be fetched by the crawler.

Twitter Cards using Backbone's HTML5 History

I'm working on a web app which uses Backbone's HTML5 History option. In order to avoid having to code everything on the client and on the server, I'm using this method to route every request to index.html
I was wondering if there is a way to get Twitter Cards to work with this setup, since currently Twitter can't read the page because everything is loaded dynamically with JavaScript.
I was thinking about using the User-Agent header to detect whether the request comes from Twitterbot, and if it does, serving a static version of the page with the required meta tags. Would this work?
Thanks.
Yes.
At one job we did this for all the SEO/search/Facebook handling.
We would sniff the user-agent, and if it belonged to one of the following crawlers
Facebook Open Graph
Google
Bing
Twitter
Yandex
(a few others I can't remember)
we would redirect to a special page that was written to dump all the relevant data about the page for SEO purposes into a nicely formatted (but completely unstyled) page.
This allowed us to retain our google index position and proper facebook sharing even though our site was a total single-page app in backbone.
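
A minimal sketch of that crawler check, assuming an Express server sits in front of the Backbone app; the bot pattern and the inline placeholder HTML are illustrative, not taken from the setup described above.

// Express middleware sketch: if the User-Agent looks like a known crawler,
// serve a pre-rendered page carrying the Twitter Card / Open Graph meta tags;
// otherwise fall through to the normal single-page index.html.
const express = require('express');
const path = require('path');
const app = express();

const BOT_PATTERN = /twitterbot|facebookexternalhit|googlebot|bingbot|yandex/i;

app.use(function (req, res, next) {
  const ua = req.headers['user-agent'] || '';
  if (BOT_PATTERN.test(ua)) {
    // In a real app this HTML would be built from the data for req.path.
    res.send(
      '<!doctype html><html><head>' +
      '<meta name="twitter:card" content="summary">' +
      '<meta name="twitter:title" content="Example title">' +
      '<meta name="twitter:description" content="Example description">' +
      '</head><body></body></html>'
    );
  } else {
    next();
  }
});

// Every other request falls back to the client-side-routed app.
app.get('*', function (req, res) {
  res.sendFile(path.join(__dirname, 'index.html'));
});

app.listen(3000);
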
Yes, serving a specific page to Twitterbot with the right metadata markup will work.
You can test your results while developing using the Cards preview tool:
https://dev.twitter.com/docs/cards/preview (with your static URL or just the tags).
