Access a page's HTML [closed] - javascript

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Is it possible to take a link and access its HTML through that link? For example, I would like to take an Amazon link, put it within my own HTML page, use JavaScript's getElementsByClassName to get the price from that page, and display it in my own HTML.

It is possible. You could make a GET request to the Amazon page; the response will give you the HTML as a string, which you then need to parse. Last time I did this, I used the node module jsdom.
In more detail:
HTTP is the protocol we use to request data from a server. Here is an explanatory Node.js script:
const https = require('https');
const { JSDOM } = require('jsdom');
const zlib = require('zlib');

// The HTTP GET request
https.get('https://www.amazon.com', (response) => {
  let html = '';
  // Amazon gzip-encodes the response so it is smaller, hence faster to send,
  // so we have to decompress it before we can read it
  const gunzip = zlib.createGunzip();
  response.pipe(gunzip);
  // The full page is too big to send at once, so Amazon divides it into chunks
  gunzip.on('data', (chunk) => {
    html += chunk.toString();
  });
  // When the transmission has finished we can do whatever we want with it
  gunzip.on('end', () => {
    const amazon = new JSDOM(html);
    console.log(amazon.window.document.querySelector('html').innerHTML);
  });
});


JS - how to batch multiple dynamic imports into one? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have two separate but dependent dynamic imports requested from my JS application at virtually the same time. How can I avoid making two import calls, or batch them into one?
You can combine the power of URL.createObjectURL() and dynamic imports to import multiple files in one HTTP call.
Server setup
You'll obviously need some sort of API that can return multiple files in one call, i.e. the server must somehow send multiple files in one HTTP response. The exact syntax can vary, but for this example I'm using GET /a.js+b.js, which returns a string.
Example: 16 24;export default 3export default [2, 3, 5]. This response holds two files, one 16 characters long and one 24 characters long. The numbers before the ; are metadata giving the length of each file's contents. You could instead put the metadata in a header, but this example uses the ; to separate the metadata from the contents.
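To make that format concrete, here is a tiny encoder for it (a hypothetical server-side helper of my own; the answer never shows the server code):

```javascript
// Join several module sources into the "len1 len2;contents" wire format
// described above: space-separated lengths, a semicolon, then the raw
// file contents concatenated in the same order.
function packFiles(sources) {
  const header = sources.map(src => src.length).join(' ');
  return header + ';' + sources.join('');
}

console.log(packFiles(['export default 3', 'export default [2, 3, 5]']));
// → "16 24;export default 3export default [2, 3, 5]"
```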
Client side code
I created a function called importMultiple (backed by a fetchMultiple helper), which is like a multi-file dynamic import: it returns a Promise<Array<Promise< the data exported by the files >>>.
// I created a syntax where it goes
// {length of file 1} {length of file 2} {...lengths of files};(no \n)
// {contents of file 1} {contents of file 2} {...contents of files}
const mockServerResponses = new Map()
  .set('a.js+b.js', '16 24;export default 3export default [2, 3, 5]')

// The real thing would fetch the files from the server
const fetchMultiple = async (...urls) =>
  mockServerResponses.get(urls.join('+'))

// You could probably optimize this function to load the first script as
// soon as it is streamed, so that it runs while the second one is still
// being streamed.
const importMultiple = async (...urls) => {
  const result = await fetchMultiple(...urls)
  const semi = result.indexOf(';')
  const lengths = result.slice(0, semi).split(' ').map(str => parseInt(str))
  const rawContents = result.slice(semi + 1)
  let currentIndex = 0
  const contents = []
  for (const length of lengths) {
    contents.push(rawContents.slice(currentIndex, currentIndex + length))
    currentIndex += length
  }
  return contents.map(content =>
    import(URL.createObjectURL(new Blob(
      [content],
      { type: 'application/javascript' }
    ))))
}

importMultiple('a.js', 'b.js')
  .then(promises => Promise.all(promises))
  .then(console.log)
In case the snippet stops working (like a Content Security Policy change), here is the link to the repl: https://replit.com/#Programmerraj/dynamic-import-url#script.js.
Optimizations
What could make the example above slow is that it waits for all of the files to be fetched before loading any of them. Since the files are streamed in order (file 1, file 2, ...), faster code would load file 1 as soon as it is available and load the remaining files as they finish downloading.
I didn't implement this streaming optimization because I didn't set up a server that streams a response, but you could for maximum efficiency.
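As a sketch of that optimization (assuming the same lengths;contents format as above; createFrameParser is a name I made up), an incremental parser can hand each file to the loader the moment its last byte arrives:

```javascript
// Incremental parser for the "len1 len2;file1file2" format used above.
// feed() accepts arbitrarily sized chunks; onFile fires as soon as a
// file's full contents have arrived, without waiting for later files.
function createFrameParser(onFile) {
  let buffer = ''
  let lengths = null
  let index = 0
  return function feed(chunk) {
    buffer += chunk
    if (lengths === null) {
      const semi = buffer.indexOf(';')
      if (semi === -1) return // header not complete yet
      lengths = buffer.slice(0, semi).split(' ').map(Number)
      buffer = buffer.slice(semi + 1)
    }
    while (index < lengths.length && buffer.length >= lengths[index]) {
      onFile(buffer.slice(0, lengths[index]), index)
      buffer = buffer.slice(lengths[index])
      index += 1
    }
  }
}

// Demo: file 0 is emitted as soon as its 16 bytes are in, before the
// second chunk (holding the rest of file 1) has even arrived.
const demoFeed = createFrameParser((src, i) => console.log('file', i, 'ready:', src))
demoFeed('16 24;export default 3exp')
demoFeed('ort default [2, 3, 5]')
```

Each completed source string could then go straight into import(URL.createObjectURL(new Blob([src], { type: 'application/javascript' }))) while the rest of the response is still streaming.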

Get the rendered HTML from a fetch in javascript [duplicate]

This question already has answers here:
How can I dump the entire Web DOM in its current state in Chrome?
(4 answers)
Closed 3 years ago.
I'm trying to fetch a table from a site that needs to be rendered first. That causes my fetched data to be incomplete: the body is empty, I guess because the scripts haven't been run yet.
Initially I wanted to fetch everything in the browser, but I can't do that since the CORS header isn't set and I don't have access to the server.
Then I tried a server-side approach using Node.js together with node-fetch and JSDOM. I read the documentation and found the option { pretendToBeVisual: true }, but that didn't change anything. My code is posted below:
const fetch = require('node-fetch');
const jsdom = require('jsdom');
const { JSDOM } = jsdom;

const tableHTML = fetch('https://www.travsport.se/uppfodare/visa/200336/starter')
  .then(res => res.text())
  .then(body => {
    console.log(body)
    const dom = new JSDOM(body, { pretendToBeVisual: true })
    return dom.window.document.querySelector('.sportinfo_tab table').innerHTML
  })
  .then(table => console.log(table))
I expect the output to be the HTML of the table, but as of now I only get the metadata and scripts in the response, so the code crashes when extracting innerHTML.
Why not use headless Chrome?
I think the site you quote does not work with --dump-dom, but you can activate --remote-debugging-port=9222 and do whatever you want with it, as described in https://developers.google.com/web/updates/2017/04/headless-chrome
Another useful reference:
How can I dump the entire Web DOM in its current state in Chrome?

java scraping data hidden in html and script [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I would like Java to display a particular line of a webpage. This line is a src link to a .jpg on a server, but Jsoup methods or InputStreamReader methods cannot get the line, because it is only generated when a pin on a map is clicked.
Here is the site:
https://webgispu.wigeogis.com/kunden/omvpetrom/client/map.php?BRAND=OMV&LNG=SI&CTRISO=SVN&MODE=NEXTDOOR&VEHICLE=CAR
It displays the data for one gas station at a time, in a frame that opens only when you click on a pin in the map. What's more, the src link to the .jpg with the gas price changes every two hours. I would like my program to get to those .jpgs, but I don't know how. When I use InputStreamReader to get the HTML of this site, I cannot figure out where to go next.
Here is the line of code (an img tag) I am looking for ('tmp2C31' in the example changes every 2 hours):
<img src="https://webgispu.wigeogis.com/temp/tmp2C31.tmp.png" alt="" title="" style="margin-bottom:5px;display:block;" class="preisImageClass">
Please have a look at the link above and suggest which classes and methods I should adopt in my program. I have already read about OCR, so there is no need to explain getting data from the .jpgs.
Thanks
I think what you're looking for is an HTML parser. In my opinion, the best parser is jsoup.
From the site:
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.
With this, you can specify what you want to display on your program straight from the html document.
This code will write the page's HTML to an html.txt file:
import java.io.*;
import java.net.URL;

public class TestOMV {
  public void htmlToTxt(String startSite) throws Exception {
    URL u = new URL(startSite);
    InputStream is = u.openStream();
    InputStreamReader isr = new InputStreamReader(is);
    BufferedReader br = new BufferedReader(isr);
    BufferedWriter bw = new BufferedWriter(new FileWriter("html.txt"));
    String code;
    while ((code = br.readLine()) != null) {
      bw.write(code);
      bw.newLine();
    }
    bw.close();
    br.close();
    isr.close();
    is.close();
  }

  public static void main(String[] args) throws Exception {
    TestOMV a = new TestOMV();
    a.htmlToTxt(
        "https://webgispu.wigeogis.com/kunden/omvpetrom/client/map.php?BRAND=OMV&LNG=SI&CTRISO=SVN&MODE=NEXTDOOR&VEHICLE=CAR");
  }
}

Loading contents from another page into a variable as a string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Is it possible to get the HTML of a div from another page into a variable as a string, so I can run a regex search on it to find a specific number?
If that other page is in your own domain:
jsBin demo
$('<div />').load("otherpage.html", function(data){
  var num = /\d+/.exec( $(this).find("#number").text() );
  console.log( num ); // ["45"]
});
Note: the above presumes the desired number is somewhere inside the #number element. The regex /\d+/ extracts the first run of digits from that element's text.
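The same idea without jQuery, as a rough sketch (the markup and the #number id are placeholders; on a real page you'd want a proper DOM parser rather than a regex over raw HTML):

```javascript
// Extract the first run of digits from the text of the element with the
// given id. Crude by design: one regex grabs the element's text content,
// then /\d+/ pulls the number out of it.
function extractNumber(html, id) {
  const el = html.match(new RegExp('id="' + id + '"[^>]*>([^<]*)<'));
  if (!el) return null;
  const digits = /\d+/.exec(el[1]);
  return digits ? parseInt(digits[0], 10) : null;
}

console.log(extractNumber('<span id="number">price: 45 kr</span>', 'number')); // → 45
console.log(extractNumber('<p>no number here</p>', 'number'));                 // → null
```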
If the page is not on your domain:
jQuery load external site page
you'll first need to get that page's content server-side, e.g. using PHP with file_get_contents. Once the desired content is on your own server you will no longer run into security issues, and you can then respond to your AJAX call with the grabbed content.
If you're loading content from the same domain, the answer #lawrence overflow linked to in his comment will do the trick: Load content from external page into another page using Ajax/jQuery
JS with jQuery:
$(document).ready(function() {
  $("#main").load('sourcePage.html #content');
});
Otherwise, you'll need to use server-side technology. Here's a Node.js server that proxies for another site:
JS (Node and Express):
var request = require('request');
var express = require('express');
var app = express();

// Put the source URL here:
var URL = 'http://www.nytimes.com';

app.get('/', function (req, res) {
  res.type('.html');
  request(URL, function (err, response, body) {
    res.send(body);
  });
});

var server = app.listen(3000, function () {
  var host = server.address().address;
  var port = server.address().port;
  console.log('listening at http://%s:%s', host, port);
});

The Right Way? Including JS in both NodeJS and HTML [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Currently I have an object that is used for passing messages from the client to the server.
var JSONMessage = function() {
  this.sender = "";
  this.method = "";
  this.arguments = "";
}
I want this object to be available both to the server (Node.js) and to the client (browser). Currently I am doing the following below the above object.
if (typeof module === 'undefined') {
  console.log("must be client side!");
} else {
  module.exports = JSONMessage;
}
And in the Node.js file I do the following:
var JSONMessage = require('./public/js/message');
while in the HTML I can simply include the JS file.
My question is: is this the best way of sharing code between Node and the browser?
You should check out the UMD (Universal Module Definition) patterns hosted in this GitHub repo:
https://github.com/umdjs/umd
What you are doing will work, but you can eliminate the guesswork by using one of the patterns from that repo.
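For reference, the simplest of those patterns applied to the question's object looks roughly like this (a sketch of the CommonJS-or-browser-global branch only; the repo's templates also cover AMD):

```javascript
var JSONMessage = function () {
  this.sender = "";
  this.method = "";
  this.arguments = "";
};

if (typeof module !== 'undefined' && module.exports) {
  // Node/CommonJS: require('./message') returns the constructor
  module.exports = JSONMessage;
} else {
  // Browser: a plain <script> tag defines a global JSONMessage
  (typeof globalThis !== 'undefined' ? globalThis : window).JSONMessage = JSONMessage;
}
```

This keeps a single source file working in both environments without the console.log fallback branch.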
