Service worker alter cached response body [duplicate]

This question already has answers here:
DOM manipulation inside web worker
(3 answers)
Is there a way to create out of DOM elements in Web Worker?
(10 answers)
Parsing XML in a Web Worker
(2 answers)
Closed last year.
I have a service worker that serves up a cached response if the network is down.
The HTML of the page is in response.body. I want to alter it to add a banner alerting the user that this page is not live data but a cached copy.
Is there a way to alter the page with DOM manipulation, e.g.
const response = await cache.match(request);
const body = await response.text();
document = buildDocument(body); // <- !!! Imaginary desired function
const banner = document.getElementById('banner');
banner.innerHTML = "This is a cached page!";
response.body = document.innerHTML;
return response;
At the moment I guess one might have to settle for a plain string replace on the text:
adjusted = body.replace('<span id="banner"></span>', '<span id="banner">This is a cached page!</span>')
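For what it's worth, a minimal sketch of how that replace could be wired into the worker, assuming a fetch handler that falls back to the cache (DOMParser is not available inside a service worker, so the body is edited as plain text, and a Response body cannot be reassigned, so a new Response is built from the adjusted string):
// Sketch only: assumes the request was cached earlier by the same worker.
self.addEventListener('fetch', (event) => {
  event.respondWith(
    fetch(event.request).catch(async () => {
      const cached = await caches.match(event.request);
      const body = await cached.text();
      const adjusted = body.replace(
        '<span id="banner"></span>',
        '<span id="banner">This is a cached page!</span>'
      );
      // Build a new Response with the adjusted HTML and the original headers/status.
      return new Response(adjusted, {
        status: cached.status,
        statusText: cached.statusText,
        headers: cached.headers
      });
    })
  );
});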

Related

Get only HTML <head> from URL

My question is similar to this one about Python, but, unlike it, mine is about JavaScript.
1. The problem
I have a large list of Web Page URLs (about 10k) in plain text;
For each page URL (or for the majority of them) I need to find some metadata and a title;
I want to NOT LOAD full pages, only load everything before </head> closing tag.
2. The questions
Is it possible to open a stream, load some bytes and, upon getting to the </head>, close stream and connection? If so, how?
Python's urllib.request.Request.read() has a "size" argument in number of bytes, but JS's ReadableStreamDefaultReader.read() does not. What should I use in JS as an alternative?
Will this approach reduce network traffic, bandwidth usage, CPU and memory usage?
Answer for question 2:
Try using node-fetch's fetch(url, { size: 200 })
https://github.com/node-fetch/node-fetch#fetchurl-options
I don't know whether there is a method that returns only the head element of a response, but you can load the entire HTML document and then parse the head out of it, even though this might not be as efficient as other approaches. I made a basic app using axios and cheerio to get the head element from an array of URLs. I hope this might help someone.
const axios = require("axios")
const cheerio = require("cheerio")
const URLs = ["https://stackoverflow.com/questions/73191546/get-only-html-head-from-url"]
for (let i = 0; i < URLs.length; i++) {
  axios.get(URLs[i])
    .then(html => {
      const document = html.data
      // get the start index and the end index of the head
      const startHead = document.indexOf("<head>")
      const endHead = document.indexOf("</head>") + 7
      // get the head as a string
      const head = document.slice(startHead, endHead)
      // load cheerio
      const $ = cheerio.load(head)
      // get the title from the head which is loaded into cheerio
      console.log($("title").html())
    })
    .catch(e => console.log(e))
}
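As for question 1 (stopping the download once </head> has been seen), here is a hedged sketch using the browser Fetch API and a ReadableStream reader; the fetchHead helper name is hypothetical, and cross-origin pages would still need CORS to allow the request:
// Hypothetical helper: stream the response and cancel it once </head> appears.
async function fetchHead(url) {
  const response = await fetch(url);
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let html = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    html += decoder.decode(value, { stream: true });
    const end = html.indexOf("</head>");
    if (end !== -1) {
      await reader.cancel(); // stop downloading the rest of the page
      return html.slice(0, end + "</head>".length);
    }
  }
  return html; // no </head> found; return whatever arrived
}
Cancelling the reader releases the connection early, so this should cut bandwidth on large pages, though how much CPU and memory it saves depends on how early the head ends.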

Access a page's HTML [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Is it possible to take a link and access its HTML code through that link? For example, I would like to take a link from Amazon, put it within my own HTML code, use JavaScript's getElementsByClassName to get the price from that page, and display it back in my HTML.
It is possible. You could do a GET request to the Amazon page, which will give you the HTML in the response; from there you'll have a string that you'll need to parse. Last time, I used the node module jsdom to do that.
In more detail:
HTTP is the protocol we use to request data from the server; I've written an explanatory Node.js script:
const https = require('https');
const JSD = require('jsdom');
const { JSDOM } = JSD;
const zlib = require('zlib');

// The http get request
https.get('https://www.amazon.com', (response) => {
  let html = '';
  // we need this because amazon is tricky and gzip-encodes the response so it is smaller, hence faster to send
  let gunzip = zlib.createGunzip();
  response.pipe(gunzip);
  // we need this to get the full html page; since it is too big to send in one piece, amazon divides it into chunks
  gunzip.on('data', (chunk) => {
    html += chunk.toString();
  });
  // when the transmission has finished we can do whatever we want with it
  gunzip.on('end', () => {
    let amazon = new JSDOM(html);
    console.log(amazon.window.document.querySelector('html').innerHTML);
  });
});
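To connect this back to the question: once the JSDOM document exists (inside the 'end' handler above), the price can be read with the usual DOM APIs. A hedged one-liner, where 'a-price' is only a hypothetical class name that would need to be checked against Amazon's actual markup:
// Hypothetical: the real class name on Amazon's price element may differ.
const price = amazon.window.document.getElementsByClassName('a-price')[0];
console.log(price ? price.textContent : 'price element not found');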

Get the rendered HTML from a fetch in javascript [duplicate]

This question already has answers here:
How can I dump the entire Web DOM in its current state in Chrome?
(4 answers)
Closed 3 years ago.
I'm trying to fetch a table from a site that needs to be rendered, which causes my fetched data to be incomplete. The body is empty, I guess because the scripts haven't run yet.
Initially I wanted to fetch everything in the browser, but I can't do that since the CORS header isn't set and I don't have access to the server.
Then I tried a server approach using node.js together with node-fetch and JSDOM. I read the documentation and found the option { pretendToBeVisual: true }, but that didn't change anything. I have a simple code sample posted below:
const fetch = require('node-fetch');
const jsdom = require("jsdom");
const { JSDOM } = jsdom;

let tableHTML = fetch('https://www.travsport.se/uppfodare/visa/200336/starter')
  .then(res => res.text())
  .then(body => {
    console.log(body)
    const dom = new JSDOM(body, { pretendToBeVisual: true })
    return dom.window.document.querySelector('.sportinfo_tab table').innerHTML
  })
  .then(table => console.log(table))
I expect the output to be the HTML of the table, but as of now I only get the metadata and scripts in the response, which makes the code crash when extracting innerHTML.
Why not use google-chrome headless ?
I think the site you quote does not work with --dump-dom, but you can activate --remote-debugging-port=9222 and do whatever you want, as described in https://developers.google.com/web/updates/2017/04/headless-chrome
Another useful reference:
How can I dump the entire Web DOM in its current state in Chrome?
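If driving headless Chrome by hand feels heavy, here is a hedged sketch of the same idea using puppeteer (an assumption on my part, not part of the original answer; the selector is taken from the question):
const puppeteer = require('puppeteer'); // npm i puppeteer

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Let the page's own scripts run and fetch their data before reading the DOM.
  await page.goto('https://www.travsport.se/uppfodare/visa/200336/starter', { waitUntil: 'networkidle0' });
  const tableHTML = await page.$eval('.sportinfo_tab table', el => el.innerHTML);
  console.log(tableHTML);
  await browser.close();
})();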

Load an external javascript file as a node process [duplicate]

This question already has answers here:
Load and execute external js file in node.js with access to local variables?
(6 answers)
Closed 4 years ago.
I'm writing integration tests for PureScript FFI bindings to Google's Maps API.
The problem is that Google's code is meant to be loaded externally with a <script> tag in the browser, not downloaded and run in a node process. What I've got now will download the relevant file as gmaps.js, but I don't know what to do to actually run the file.
exports.setupApiMap = function() {
  require('dotenv').config();
  const apiKey = process.env.MAPS_API_KEY;
  const gmaps = "https://maps.googleapis.com/maps/api/js?key=" + apiKey;
  require('download')(gmaps, "gmaps.js");
  // what now???
  return;
};
For my unit tests, I must later be able to run new google.maps.Marker(...). Then I can check that my setTitle, getTitle etc. bindings are working correctly.
This is a duplicate of this question. The correct code was:
exports.setupApiMap = async function() {
  require('dotenv').config();
  const apiKey = process.env.MAPS_API_KEY;
  const gmaps = "https://maps.googleapis.com/maps/api/js?key=" + apiKey;
  await require('download')(gmaps, __dirname);
  const google = require('./js');
  return;
};
The key was to download to __dirname before using require. That said, my specific use case didn't work, since Google's Maps API code just can't be run in a node process; it must be run in a browser.

How to read the generated source (html with DOM changes) of a webpage within javascript?

I want to read a webpage programmatically (with JavaScript/Angular) and search for some elements inside it. What I have so far is:
$http.get('http://.....').success(function(data) {
  var doc = new DOMParser().parseFromString(data, 'text/html');
  var result = doc.evaluate('//div[@class = \'xx\']/a', doc, null, XPathResult.STRING_TYPE, null);
  $scope.all = result.stringValue;
});
So in the example I can read the value of any HTML element.
Unfortunately, the page I want to read uses some JavaScript, and the source code (HTML) returned is just a part of the full HTML (including DOM changes) that the browser eventually shows. So the HTML returned from the HTTP GET does not necessarily contain the elements I need.
Is there a way of getting the entire HTML after the JavaScript has run?
Edit: Yes, the page is from another domain, and the provided API does not give me the info I need.
