Download external website with nodejs (including resources) - javascript

I want to download a webpage with Node.js, so that I end up with an offline copy of the static page. It has to download the resources (styles, JavaScript files, images, etc.) and update the references to point to the local copies.
In any case I want an offline page that once opened looks exactly like the real page. Just like what happens when I choose file->save in a web browser.
Basically I want to replicate the function of
wget --page-requisites
(Although this does not download css and images properly)
The background is that I want to execute JavaScript on an external website. This is (rightly) not possible due to cross-domain policies. To work around this, I just want to download the website, host it statically myself, run my JavaScript analysis code against it, and then delete it.

I'm sort of spit-balling on a solution that could work for this:
A package like jsdom could be used to grab the source URLs of all the page's script, link, img, etc. elements. You could then GET each of those resources, save it to your local environment, and replace its src (or href, for link elements) attribute with a new URL that points to your local copy. Then you could stringify the resulting HTML and save that as well. Finally, just serve the containing directory statically in Node.
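To make that idea a bit more concrete, here is a rough sketch. To stay dependency-free it uses a regular expression in place of a real DOM parser such as jsdom, so it will miss edge cases (unquoted attributes, url() references inside CSS, link elements that are not resources). The function names, the assets/ folder and the naming scheme are all made up for illustration, and savePage assumes a Node version with a global fetch (18 or later).

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Sketch only: a regex stands in for a real DOM parser such as jsdom.
// Matches quoted src/href attributes on <script>, <link> and <img> tags.
const RESOURCE_ATTR = /<(script|link|img)\b[^>]*?\s(src|href)=(["'])(.*?)\3/gi;

// Collect the absolute URLs of the resources referenced by the page.
function collectResourceUrls(html, pageUrl) {
  const urls = new Set();
  for (const match of html.matchAll(RESOURCE_ATTR)) {
    urls.add(new URL(match[4], pageUrl).href);
  }
  return [...urls];
}

// Hypothetical naming scheme: an index prefix plus the URL's base name.
function localNameFor(url, index) {
  const base = path.posix.basename(new URL(url).pathname) || 'resource';
  return `assets/${index}-${base}`;
}

// Replace each known resource reference with its local path.
function rewriteHtml(html, pageUrl, mapping) {
  return html.replace(RESOURCE_ATTR, (tag, _name, _attr, quote, ref) => {
    const local = mapping.get(new URL(ref, pageUrl).href);
    if (!local) return tag; // unknown resource: leave the tag untouched
    // The match ends with the reference plus its closing quote.
    return tag.slice(0, tag.length - ref.length - 1) + local + quote;
  });
}

// Download the page and its resources into outDir (assumes global fetch).
async function savePage(pageUrl, outDir) {
  const html = await (await fetch(pageUrl)).text();
  const mapping = new Map();
  await fs.mkdir(outDir, { recursive: true });
  for (const [index, url] of collectResourceUrls(html, pageUrl).entries()) {
    const localName = localNameFor(url, index);
    const target = path.join(outDir, localName);
    const response = await fetch(url);
    await fs.mkdir(path.dirname(target), { recursive: true });
    await fs.writeFile(target, Buffer.from(await response.arrayBuffer()));
    mapping.set(url, localName);
  }
  await fs.writeFile(path.join(outDir, 'index.html'), rewriteHtml(html, pageUrl, mapping));
}
```

Calling savePage with the page's URL and an output folder should leave an index.html plus an assets/ folder that can be served statically. Resources pulled in from inside stylesheets (fonts, background images) are not followed here.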
Maybe just running wget --page-requisites from within node is the easiest solution?
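If you go the wget route, something along these lines might do. It assumes wget is installed and on the PATH; mirrorPage and wgetArgs are just names I picked. The extra flags are standard wget options that may help with the CSS and image problems mentioned in the question, but I have not checked them against that particular site.

```javascript
const { execFile } = require('node:child_process');

// Build the wget argument list; kept separate so it can be inspected.
function wgetArgs(url, outDir) {
  return [
    '--page-requisites',            // fetch the CSS, scripts and images the page needs
    '--convert-links',              // rewrite references to point at the local copies
    '--adjust-extension',           // add .html/.css extensions where they are missing
    '--span-hosts',                 // allow requisites hosted on other domains (e.g. CDNs)
    `--directory-prefix=${outDir}`, // save everything under outDir
    url,
  ];
}

// Run wget and resolve with its log output (wget logs to stderr).
function mirrorPage(url, outDir) {
  return new Promise((resolve, reject) => {
    execFile('wget', wgetArgs(url, outDir), (error, _stdout, stderr) => {
      if (error) reject(error);
      else resolve(stderr);
    });
  });
}
```

Note that wget exits with a non-zero status when any single resource fails to download, so a rejected promise does not necessarily mean that nothing was saved.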
I'll be interested to know what the final solution to this is. Hopefully something I said helps.


Open HTML with CSS in JupyterLab tab with full formatting

I create an HTML document using Sphinx. When I click on the index.html file it opens a browser and looks like this. The look depends on some .CSS and .JS files being executed:
If I open the same file from the JupyterLab file browser, it opens in a tab but looks much worse: the .CSS and .JS are not applied, and images are not displayed. It looks like this:
Is there a way to get JupyterLab to apply the .CSS, execute the .JS, and pass through any images linked in the text? JupyterLab is running on a remote server, so I don't have the option of having it create a new browser process on my local machine, because the files are remote.
Using JupyterLab within JupyterHub (old school install with conda, no docker and such)
I've been stuck at this HTML Preview issue for a few weeks.
I have the very same use case as you (Sphinx stuff for a team to work on their docs).
So far, no luck.
It may or may not work (I'm not sure what it depends on) if I'm using JupyterLab from the browser on the hypervisor hosting JupyterHub itself.
It won't work if I'm using JupyterLab from the browser on my client machine.
I tried to mess around with the c.NotebookApp.allow_remote_access = True parameter, with no luck:
tried to put it in my profile ~/.jupyter/
tried to add it to general config file /path/to/conf/
=> Not sure of the right way to set this option on JupyterLab's JupyterHub install, nor if it's even a relevant option...
Well, security wise, it's not, that's a given (^^'), but Preview HTML is an important feature for Sphinx users, hope someone can help with this...
I also looked into the nginx config, but you get the issue with or without the reverse proxy anyway...

static webpage change file

I'm making a simple website in GitHub Pages. I have a text file in the /docs folder (I can move it, though) and I want to change its content through index.html. I found a lot of back-end solutions, but GitHub Pages allows static webpages only. Is there a way to do this in a static webpage and, if so, how do I do it in JavaScript?
Since these are static pages, you can't rewrite the hosted file from the front-end client. To update the text file's contents, you'll unfortunately need to do it through the GitHub interface or as a commit to your repository.
Changing content on the server requires code that runs on the server.
If you could do it with client side code, then every website would rapidly become defaced.
The closest you could do would be to store data on the client (e.g. via localstorage) and then have a script on the page read that data and edit the DOM locally. Obviously, changing the data would change it only for the particular browser and not for all visitors.
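A minimal sketch of that localStorage idea, assuming the page holds the served text in some element. The key name and function names are made up; the storage object is passed in so the logic can be tried outside a browser too.

```javascript
const STORAGE_KEY = 'docs-text-override'; // hypothetical key name

// Save the visitor's edited text. It lives in this browser only.
function saveText(storage, text) {
  storage.setItem(STORAGE_KEY, text);
}

// Return the locally stored text, or the text served by the site if none exists.
function loadText(storage, servedText) {
  const stored = storage.getItem(STORAGE_KEY);
  return stored === null ? servedText : stored;
}

// Browser wiring: show the stored text in place of the served one.
function showText(element, servedText) {
  element.textContent = loadText(window.localStorage, servedText);
}
```

In the page you would call saveText(window.localStorage, newText) when the user edits, and showText(someElement, originalText) on load. Other visitors, and other browsers of the same visitor, still see the original file.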
There are two answers to your question:
Technically, it is possible to change files on GitHub from a script:
GitHub's API allows you to update files through an HTTP request.
You could use Javascript to modify the contents of a file, and then send a request to GitHub's API to update that file. There are a few libraries that make it really easy to work with the API, but from here you have to figure it out yourself.
Here is the documentation for this:
Conceptually, it sounds like you are doing something wrong. Static webpages are called static because nothing changes. If you want to have dynamic content, you should really look into other solutions.
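For the "technically possible" route, here is a rough sketch of what the request to GitHub's REST API for file contents could look like. buildUpdateRequest and its parameter names are my own. The sha is the blob SHA of the existing file, which you would first read with a GET on the same URL, and the token is a personal access token with write access to the repository. Be aware that any token embedded in a public page can be read by every visitor, which is a strong argument against doing this on a real site.

```javascript
// Encode text as base64 via UTF-8 bytes (works in browsers and recent Node).
function toBase64(text) {
  const bytes = new TextEncoder().encode(text);
  return btoa(String.fromCharCode(...bytes));
}

// Build the URL and fetch options for updating one file in a repository.
function buildUpdateRequest({ owner, repo, filePath, text, sha, token, message }) {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/contents/${filePath}`,
    options: {
      method: 'PUT',
      headers: {
        Accept: 'application/vnd.github+json',
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        message,                 // commit message
        content: toBase64(text), // new file contents, base64-encoded
        sha,                     // blob SHA of the file being replaced
      }),
    },
  };
}

// Send the update; resolves with the parsed JSON response.
async function updateFile(params) {
  const { url, options } = buildUpdateRequest(params);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`GitHub API returned ${response.status}`);
  return response.json();
}
```

Each successful call creates a commit, and GitHub Pages then has to rebuild before the change becomes visible, so this is not an instant edit either.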

How to manipulate a local text file from an HTML page

I've generated an HTML file that sits on my local disk, and which I can access through my browser. The HTML file is basically a list of links to external websites. The HTML file is generated from a local text file, which is itself a list of links to the remote sites.
When I click on one of the links in the HTML document, as well as the browser loading the relevant site (in a new tab), I want to remove the site from the list of sites in the local text file.
I've looked at Javascript, Flask (Python), and CherryPy (Python), but I'm not sure these are valid solutions.
Could someone advise on where I should look next? I'd prefer to do this with Python somehow, because it's what I'm familiar with, but I'm open to anything.
Note that I'm running on a Linux box.
First, Javascript cannot modify the local filesystem from the context of a webpage. To allow that would be a massive security concern.
Any server-side web framework can do this, and Flask is a great one to use because it's so lightweight. The general steps you would want to take are:
When / is requested, load the list of links.
Change each link to point to /goto?line=<line_number>.
Display the list to the user.
Then when you click a link:
When /goto is requested, load the list of links.
Remove the line with that number from the list.
Save the list of links.
Return status code 302, with the real URL as the Location header.
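The steps above can be sketched roughly as follows. Note that this is not Flask: it uses Node's built-in http module only so that the sketch runs without extra packages, and the same routes map directly onto two Flask view functions. The function names and the zero-based line numbering are my own choices.

```javascript
const fs = require('node:fs');
const http = require('node:http');

// Read the link list: one URL per line, blank lines skipped.
function loadLinks(file) {
  return fs.readFileSync(file, 'utf8').split('\n').filter((line) => line.trim() !== '');
}

function saveLinks(file, links) {
  fs.writeFileSync(file, links.map((link) => link + '\n').join(''));
}

function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Steps for "/": point each link at /goto?line=<line_number>.
function renderIndex(links) {
  const items = links.map(
    (url, i) => `<li><a href="/goto?line=${i}" target="_blank">${escapeHtml(url)}</a></li>`
  );
  return `<ul>${items.join('')}</ul>`;
}

// Steps for "/goto": remove the numbered line, save, and return the real URL.
function takeLink(file, lineNumber) {
  const links = loadLinks(file);
  if (!Number.isInteger(lineNumber) || lineNumber < 0 || lineNumber >= links.length) {
    return null;
  }
  const [target] = links.splice(lineNumber, 1);
  saveLinks(file, links);
  return target;
}

// Wire both routes into a server; the caller decides when to listen.
function createServer(file) {
  return http.createServer((req, res) => {
    const url = new URL(req.url, 'http://localhost');
    if (url.pathname === '/') {
      res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
      res.end(renderIndex(loadLinks(file)));
    } else if (url.pathname === '/goto') {
      const line = Number.parseInt(url.searchParams.get('line') ?? '', 10);
      const target = takeLink(file, line);
      if (target === null) {
        res.writeHead(404);
        res.end('No such line');
        return;
      }
      res.writeHead(302, { Location: target }); // redirect to the real site
      res.end();
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}
```

Something like createServer('links.txt').listen(8000) would start it. One caveat with line numbers: once a line is removed, the numbers in an already-open list page are stale, so the page should be reloaded after each click (or the links keyed by something more stable than their position).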
There are many ways to do this.
Here are the 3 easiest:
1. Use JavaScript.
2. Install WampServer or similar and use PHP to modify the file.
3. Don't use the browser to delete the link; instead, use a bat file that opens the browser and removes the link from the text file.

Loading a local image with JavaScript/jQuery where I know the location, but not the file name

The new version of Linux Mint allows HTML 5 login window themes -- I'm trying to write one that will grab each user's wallpaper. These wallpapers are located in the folder /home/#USER#/.cache/wallpaper/, however the file name is not consistent and I need a programmatic way of determining it. Once I know the filename, the login screen will display the image correctly using the file:///.. format.
I don't have any tools other than client-side HTML/CSS/JavaScript[/jQuery/etc] available to me. Is there any way I can grab the file names in that directory, so that I can grab the wallpaper image?
EDIT: Figured it out! The browsers won't allow access to the file:/// resources at all, but the mdm-theme-emulator will.
It looks like these files are located on the client machine, in which case you would not be able to access them using jQuery. Javascript does not have access to the local file system.
If you are sending the request through a server, you'd be able to use the server-side code (ASP.NET, PHP, etc.) to loop through the filenames

Does Javascript support the ability to get a directory listing?

I want to upload a bunch of image files to a directory that I've set up on my ISP's free hosting service. It's something like
I want my Javascript code to be able to get a directory listing and then preload whatever it finds.
But is getting such a thing even possible? My impression is not.
I suspect I will have to instead rename my files to 00000.jpg and upward, and attempt to detect what files are there using try.
FYI, I know that my ISP does not support using FTP protocol to get a directory listing.
Thanks for any help.
Under the assumption that your JavaScript code is code on your pages and not code on your server, then no, there's no API provided for JavaScript in a web browser other than a server-side API accessible via HTTP that you would create yourself. If the directory full of files is on the server, then it's going to have to be some server-side code that delivers the directory listing anyway. You could write such code in the server-side programming environment of your choice (including a server-side JavaScript solution, if that's what you want and if such a thing is possible at your ISP). As Pekka notes, it may be possible to simply enable directory browsing in your server, though that's generally a fairly low-level service that will deliver some sort of HTML page to you, and parsing through that might be somewhat painful (compared to what you could get from a tailor-made service).
Another, simpler thing you could do would be to upload a manifest file along with the other image files. In other words, create the directory listing in some easy-to-digest form, and maintain it separately as a simple file to be fetched.
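As a sketch of the manifest idea: a small Node script run locally before uploading could write the list of image files, and the page could then fetch that list and preload each entry. The file name manifest.json, the extension list and the function names are assumptions for illustration.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const IMAGE_EXTENSIONS = new Set(['.jpg', '.jpeg', '.png', '.gif']);

// Run locally before uploading: list the image files in a directory.
function buildManifest(dir) {
  return fs.readdirSync(dir)
    .filter((name) => IMAGE_EXTENSIONS.has(path.extname(name).toLowerCase()))
    .sort();
}

// Write the list next to the images so it gets uploaded with them.
function writeManifest(dir) {
  const files = buildManifest(dir);
  fs.writeFileSync(path.join(dir, 'manifest.json'), JSON.stringify(files, null, 2));
  return files;
}

// Browser side: fetch the manifest and preload every listed image.
// baseUrl is the directory URL on the server and must end with a slash.
async function preloadFromManifest(baseUrl) {
  const response = await fetch(new URL('manifest.json', baseUrl));
  const files = await response.json();
  return files.map((name) => {
    const image = new Image();
    image.src = new URL(name, baseUrl).href;
    return image;
  });
}
```

The manifest has to be regenerated and re-uploaded whenever the set of images changes, which is the price for not needing any server-side code.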
JavaScript does not support directory listing directly, but you can create a PHP file that dumps the directory contents and request it via AJAX.

