Update all href anchor links in folder of HTML files - javascript

I am using Gatsbyjs to generate a static site; this outputs a folder of static HTML files.
I have a requirement to host these HTML files on Microsoft SharePoint, which requires the .html files to be renamed to .aspx in order for them to run.
I have a post-build script which renames all .html files to .aspx (this works nicely).
However, all the generated links point to the folder:
link
In order for this to work on SharePoint, I need to update every href in each HTML file to point to the index.aspx file in each folder:
link
What's the best way to do this after the build? Ideally, I'd like to include it in my post-build script. Can this be achieved with webpack, or am I better off using something like JSDOM to loop through each file and update each of the links?

You are probably better off using cheerio, which is lighter than jsdom and supports most of the jQuery syntax.
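const fs = require("fs");
const cheerio = require("cheerio");
const $ = cheerio.load(fs.readFileSync("input.html", "utf8"));
$('a[href="folder"]').attr("href", "/folder/index.aspx"); // rewrite matching links
fs.writeFileSync("input.html", $.html()); // $.html() serializes the whole document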
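To run the same rewrite over every page as part of the post-build script, something like the following Node sketch could work. It assumes the pages sit under Gatsby's default public folder, that the .html to .aspx rename has already run, that internal links are root-relative (for example /about/ or /about), and that href attributes are double-quoted, as they are in generated markup. To stay dependency-free it swaps cheerio for a regular expression; with less regular markup, running the cheerio version above over each file is the safer choice. The function names are illustrative, not from any library.

```javascript
const fs = require("fs");
const path = require("path");

// Map a folder-style link to its SharePoint page:
//   "/about/" or "/about"  ->  "/about/index.aspx"
// Links that are not root-relative (external URLs, "#anchors", "mailto:")
// or that already point at a file (have an extension) are left unchanged.
function toAspxHref(href) {
  if (!href.startsWith("/") || href.startsWith("//")) return href;
  const cut = href.search(/[?#]/); // keep any ?query or #hash at the end
  const pathname = cut === -1 ? href : href.slice(0, cut);
  const suffix = cut === -1 ? "" : href.slice(cut);
  if (pathname.endsWith("/")) return pathname + "index.aspx" + suffix;
  if (path.posix.extname(pathname) !== "") return href; // e.g. /styles.css
  return pathname + "/index.aspx" + suffix;
}

// Rewrite every double-quoted href attribute in an HTML string.
// Requiring whitespace before "href" stops it matching e.g. data-href.
function rewriteHrefs(html) {
  return html.replace(/(\s)href="([^"]*)"/g,
    (_, ws, href) => `${ws}href="${toAspxHref(href)}"`);
}

// Recursively list every .aspx file under a directory.
function listAspxFiles(dir) {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) return listAspxFiles(full);
    return entry.name.endsWith(".aspx") ? [full] : [];
  });
}

// Post-build step: rewrite the links in place in every page under outputDir.
function fixLinks(outputDir) {
  for (const file of listAspxFiles(outputDir)) {
    const html = fs.readFileSync(file, "utf8");
    fs.writeFileSync(file, rewriteHrefs(html));
  }
}
```

To hook it in, one option is a small script (hypothetical name fix-links.js) that ends with fixLinks("public"), chained after the rename in the postbuild npm script, e.g. "postbuild": "node rename.js && node fix-links.js" (both file names hypothetical); npm runs postbuild automatically after npm run build.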

Related

Loop to load all existing HTML files into one content

How can I load all HTML files in a folder into one element? I have about 125 HTML files in a folder src and need to load them into an element with the class content.
$(".content").load("src/1.html");
Is this doable on the client side using JavaScript?
I believe you need to use PHP to get the list of files in a specific folder of your root directory.
http://php.net/manual/en/function.scandir.php
If you read this, you will be able to get the folder's file array using PHP, and then you can use JS or jQuery to do the rest.
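If the files really are named 1.html through 125.html, as the src/1.html example suggests, a directory listing is not strictly needed: the client can generate the names itself. A minimal sketch under that assumption, using the browser's fetch instead of jQuery's .load(). It assumes the page is served over HTTP, since browsers typically block fetch requests to file:// URLs. htmlFileUrls and loadAllHtml are hypothetical names, and the fetch function is a parameter only so it can be swapped out for testing.

```javascript
// Build the list of file names: src/1.html ... src/<count>.html
function htmlFileUrls(count, dir = "src") {
  return Array.from({ length: count }, (_, i) => `${dir}/${i + 1}.html`);
}

// Fetch every file and return their HTML joined in numeric order.
async function loadAllHtml(count, fetchFn = fetch) {
  const texts = await Promise.all(
    htmlFileUrls(count).map(async (url) => {
      const res = await fetchFn(url);
      if (!res.ok) throw new Error(`Failed to load ${url}: ${res.status}`);
      return res.text();
    })
  );
  return texts.join("\n");
}

// In the browser:
// loadAllHtml(125).then((html) => {
//   document.querySelector(".content").innerHTML = html;
// });
```

Promise.all returns results in the order of its input array, so the files end up in numeric order even though the requests may finish in any order.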

collect all the js css and img resources used in a html file

I want to write an npm package to localize an HTML URL:
1. Using the HTML URL, download the HTML page.
2. Parse the HTML file, extract all the JS, CSS and image files used in it, and localize these resources.
3. If these JS, CSS and image files use external resources themselves, localize those resources too. For example, extract background images from the CSS.
The first and second requirements are easy to meet. But I have no idea about the last one.
I can parse all the CSS files and localize the resources used in them. But how can I parse the JS files?
For example:
If the JS adds a 'script src = XXX' tag to the HTML DOM, how can I extract the src?
I think I would try to use a headless browser to catch every network call instead of trying to parse the code.
I haven't used it personally, but PhantomJS seems to fit the bill.
It can load a web page, execute the scripts and CSS just as a normal request would, and then run your own code once the page has loaded.
The network monitoring features are probably what you'll want to use.
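For the CSS part of step 3 (for example, background images), one approach is to pull every url(...) reference out of a stylesheet and resolve it against the stylesheet's own URL, because relative URLs in CSS are resolved relative to the stylesheet, not the HTML page. This is a regex-based sketch: it does not handle @import rules written without url(...) or escaped characters, and it does not help with scripts that inject tags at runtime, which is what the headless-browser approach above addresses. cssResourceUrls is a hypothetical name.

```javascript
// Extract every url(...) reference from a stylesheet and resolve it
// against the stylesheet's own URL. Quoted and unquoted forms are handled;
// data: URIs are skipped because there is nothing to download.
function cssResourceUrls(cssText, cssUrl) {
  const pattern = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
  const found = new Set(); // de-duplicate, keep first-seen order
  for (const match of cssText.matchAll(pattern)) {
    const ref = match[2].trim();
    if (ref === "" || ref.startsWith("data:")) continue;
    found.add(new URL(ref, cssUrl).href);
  }
  return [...found];
}
```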

Using external JavaScript files in a WinForms/WPF WebBrowser control

I found a cool feature on a website, implemented in JavaScript, and I'd like to use it as-is in my desktop application (for personal use).
During my experiments I managed to generate custom HTML on the fly and feed it to the browser using webBrowser1.DocumentText = [my generated HTML].
I've managed to put some inline JavaScript into the HTML, and hook it up via a ScriptManager so that I can call the JavaScript from my C# code, pass a value to it, and get a return value.
But the feature I'm trying to use is a bit more complicated: it's no less than 10 JavaScript files. 2 of them are referenced directly in the web page the usual way <script src="/js/script1.js" type="text/javascript"></script>
The other 8 are loaded in one of the scripts:
var elem = document.createElement("script");
elem.type = "text/javascript";
elem.src = "/js/" + filename;
document.body.appendChild(elem);
These 8 files are in fact data files, even though the data is represented in JavaScript. They're pretty large, over 1MB each. Stuffing it all into the HTML file seems quite stupid. Also, the script that loads the data creates a "file map" and further refers to the data based on which file it's in:
var fileMap = [
[/[\u0020-\u00ff]/, 'file1.js'],
[/[\u3000-\u30ff]/, 'file2.js'],
[/[\u4e00-\u5dff]/, 'file3.js'],
...
I don't want to resort to modifying the JavaScript, because it's not exactly my strong point. So the browser needs to "see" the js files in order to be able to use them. I thought of creating the file structure locally, and navigating the browser there. But I don't want any loose files in my solution. I'd like to have everything embedded if possible. And I doubt I can get the browser to navigate to an embedded resource, and see other embedded resources as files. Any idea how I could get around this?
EDIT:
I've tried to do it with local files. No luck. I get the HTML to load properly, but when I try to invoke a JavaScript call, nothing happens. I tried pointing the browser to those js files, to make sure they're there. They are. I tried an element with src attribute pointing to an image in the same subfolder as the script files. It gets rendered. It's as if external js files refuse to load.
I had a similar need as your scenario and I addressed it using two key points embedded in two other Stack Overflow answers. As noted by SLaks' answer here the first key is using the syntax file:/// as the prefix for an absolute path to external files. The second is using .Replace("\\", "/") for an absolute file path as listed in Adam Plocher's answer and one of his follow-up comments here.
In short, the final output for each external file in an HTML page will look something like:
<link href="file:///c:/users/david/myApp/styles/site.css" rel="stylesheet" type="text/css">
or
<script src="file:///c:/users/david/myApp/scripts/JavaScript1.js"></script>
Using the format in the samples above in my HTML file resulted in the WebBrowser control loading external CSS, image or script files.
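The two rules above (prefix file:///, use forward slashes) amount to a small path-to-URL conversion. It is shown here as a JavaScript helper for illustration only; the answer itself does this in C#. windowsPathToFileUrl is a hypothetical name, and the encodeURI call is an addition not in the answer, so that paths containing spaces still produce a valid URL.

```javascript
// Turn an absolute Windows path into the file:/// form described above:
// backslashes become forward slashes and "file:///" is prefixed.
function windowsPathToFileUrl(winPath) {
  const forward = winPath.replace(/\\/g, "/");
  // encodeURI (an extra step, not in the answer) escapes spaces and
  // non-ASCII characters but leaves ":" and "/" intact.
  return "file:///" + encodeURI(forward);
}
```

encodeURI leaves "#" and "?" untouched, so paths containing those characters would need extra handling.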
The details and solving the scenario in the question
In womd's answer in the first referenced SO answer above, he used the method System.IO.File.ReadAllText() to load script files and embedded the text of the script files into the <head> tag. As you indicated in your question, loading script files directly into the HTML page is not what you're looking to do.
The solution below uses the same System.IO.File.ReadAllText() method but loads the text of the HTML page instead. The premise works similarly to the Razor View Engine in ASP.NET.
The main idea in the solution below involves adding a temporary string in an HTML page that will be loaded into the WebBrowser control and then replacing this temporary string in a C# method in my app just before the HTML page is set to be loaded into the WebBrowser control.
Here are the basic steps to my solution:
Add a temporary string for each external reference in the HTML file.
Declare a variable for the absolute path in a script tag within the HTML file. This step is not necessary unless you're going to use the absolute path elsewhere within your JavaScript code. Your scenario involves delay loading external script files via JavaScript code so this step was necessary.
Modify the src property in the JavaScript code that delay loads the other script files with the absolute path variable.
Add a method to your app that loads the HTML page file as a text string and then replaces all instances of the temporary string with an absolute path prefixed with 'file:///'. The absolute path should use forward slashes.
Set the 'DocumentText' property on the WebBrowser control to the updated HTML.
Set the 'Copy to Output Directory' of each external file in your project to 'Copy always' or 'Copy if newer'. This step may not be necessary if you have a fixed location to your external files and that location is not within the build or publish directory used by Visual Studio.
The following are the details for each step. I added a lot of detail that you can skip. I was verbose to reduce any confusion since the steps make changes to several places in the project.
1. Using a temporary string
I used the string "/ReplaceWithAbsolutePath/" but you can use any distinct text. Each reference to an external file in the HTML page looks like:
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
<link href="/ReplaceWithAbsolutePath/styles/site.css" rel="stylesheet" type="text/css">
<script type="text/javascript">
var absolutePath = "/ReplaceWithAbsolutePath/";
</script>
</head>
<body>
<p>My web page</p>
<script src="/ReplaceWithAbsolutePath/scripts/JavaScript1.js"></script>
</body>
</html>
2. Declare absolute path variable
Note that in the above HTML page I listed a <script> tag with the declared variable 'absolutePath' set to the temporary string. (In the HTML page above the variable is added as a global variable, which is not necessarily best practice. You can declare the variable within a namespace object instead of in the global namespace.)
3. Modify the delay load script to include absolute path variable
Add the 'absolutePath' variable to your JavaScript file that delay loads other JavaScript files containing your data.
elem.src = absolutePath + "js/" + filename;
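For reference, here is how the question's file map and the delay-load step could fit together once the absolute path is available. It is a sketch: the map contains only the three entries shown in the question (the rest are elided there), and dataFileFor, dataScriptUrl and loadDataFile are hypothetical helper names, not functions from the asker's scripts. Trimming trailing slashes before joining avoids a doubled slash, since the replaced absolutePath already ends in "/".

```javascript
// The three file-map entries shown in the question (the rest are elided).
const fileMap = [
  [/[\u0020-\u00ff]/, "file1.js"],
  [/[\u3000-\u30ff]/, "file2.js"],
  [/[\u4e00-\u5dff]/, "file3.js"],
];

// Find which data file covers a character; null if no entry matches.
// The patterns have no "g" flag, so .test() keeps no state between calls.
function dataFileFor(ch) {
  const entry = fileMap.find(([pattern]) => pattern.test(ch));
  return entry ? entry[1] : null;
}

// Join the absolute path and the js/ folder without doubling the slash.
function dataScriptUrl(absolutePath, filename) {
  return absolutePath.replace(/\/+$/, "") + "/js/" + filename;
}

// Browser-only: inject the data script, as the question's loader does.
function loadDataFile(absolutePath, filename) {
  const elem = document.createElement("script");
  elem.type = "text/javascript";
  elem.src = dataScriptUrl(absolutePath, filename);
  document.body.appendChild(elem);
}
```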
4. C# method to replace all temporary string instances
Within your project add the following line to your form load event handler or place this line somewhere in your initialization of the WebBrowser control.
webBrowser1.DocumentText = GetUpdatedHtmlWithAbsolutePaths("/ReplaceWithAbsolutePath/", "HTMLPage1.html");
Add the following method to your code. If you place it in a different class from the one containing the line above, update that call to go through an instance of that class.
// The result of this method will look like the following example:
// <script src="file:///c:/users/david/documents/myApp/scripts/JavaScript1.js"></script>
public string GetUpdatedHtmlWithAbsolutePaths(string tempPathString, string htmlFilename)
{
// Get the application's base directory
// stackoverflow.com/questions/674857/should-i-use-appdomain-currentdomain-basedirectory-or-system-environment-current
// Note that the 'BaseDirectory' property returns a string with a trailing backslash ('\')
string baseDirectory = AppDomain.CurrentDomain.BaseDirectory;
// Read all of the HTML text from the HTML page file
string html = System.IO.File.ReadAllText(System.IO.Path.Combine(baseDirectory, htmlFilename));
// Replace each '\' with '/' so the path can be used in a file:/// URL
string appDirectory = baseDirectory.Replace("\\", "/");
// Replace all '/ReplaceWithAbsolutePath/' strings within the HTML text with
// the absolute path on the local machine
html = html.Replace(tempPathString, "file:///" + appDirectory);
return html;
}
5. Set the DocumentText property of the WebBrowser control
I added the initialization of the WebBrowser control in the form load event handler but you can, of course, add the line that sets the DocumentText property wherever you initialize your WebBrowser control.
private void Form1_Load(object sender, EventArgs e)
{
// Set the document text of the web browser control with the updated HTML
webBrowser1.DocumentText = GetUpdatedHtmlWithAbsolutePaths("/ReplaceWithAbsolutePath/", "HTMLPage1.html");
}
6. Set the 'Copy to Output Directory' of each external file
Take a look at the answer posted by Matthew Watson in this Stack Overflow question if you want your external files included in your solution/project file structure.
You can add files to your project and select their properties: "Build Action" as "Content" and "Copy to output directory" as "Copy Always" or Copy if Newer (the latter is preferable because otherwise the project rebuilds fully every time you build it).
Then those files will be copied to your output folder.
This is better than using a post build step because Visual Studio will know that the files are part of the project. (That affects things like ClickOnce applications which need to know what files to add to the clickonce data.)
In short, add the external file to your project. You can add the external file to any subfolder in your project. (In Visual Studio 2013 or 2015 -- I don't have VS2012) Right-click on the external file in Solution Explorer and select Properties from the context menu. The Properties pane will be displayed. In the Properties pane, change the setting for 'Copy to Output Directory' to 'Copy always' or 'Copy if newer'.
Use View Source to verify absolute path strings
Run your project and it should load your external files in the WebBrowser control. Assuming you have not set webBrowser1.IsWebBrowserContextMenuEnabled = false; in code or in the Properties pane for the WebBrowser control, you can right-click on the WebBrowser control while your form is running. Click 'View Source' from the context menu and check the paths to your external resources in the View Source window.

HTML/Javascript: Enabling folder access from a subdirectory

I have a simple HTML file with some JavaScript that I would like to run locally (as opposed to deploying to a server). It is embedded inside a larger project whose file structure I would like to maintain. For example, the structure is something like this:
project level folder > src folder containing folders & files I would like to probe
> separate, non-project util folder > HTML & JS files I would like to run against src
I am aware that certain browsers do not allow this for security reasons (as pointed out here), but since I control all of the files - is there a way for the src folder/files to somehow indicate that they will allow the 'separate, non-project util folder' to access them? Maybe some kind of project-specific settings somewhere? I am aware that this can be done in server settings, but as I mentioned above I'd like to be able to run it locally without the need for a server.
The JavaScript that is attempting to access the src files uses RequireJS, in case that helps.
Here is what I ended up doing:
I wasn't able to provide full access exactly this way. Instead, I set up a dummy HTML page in the project level folder that clicks a link on itself to redirect to the HTML file located in the separate, non-project util folder. This let me keep everything except that one very small file separate, without any file-access issues.

How to save whole page as one file so that it works offline (including external javascript)?

I need to be able to save a page on my website to my hard drive, so that I can use it both online and offline. The thing is, the page references JavaScript and CSS files outside its own folder. It is very important that I can save the whole page as one .html file, so that all the JavaScript and CSS code from the external files is in that file as well.
Is there a way to do this?
This should be done programmatically.
As you request, this can be done programmatically with, say, Python.
The pattern of the code would look like this:
Ask the user to paste the URL into a box
wget or curl the page and use a regex to find where the included code is
OR: use an SGML/HTML parsing library to interact with the HTML tags directly
Put all the linked CSS, JS etc. files in a list
Fetch the contents of all the linked files and put them in a list
Rebuild the HTML source code and strip out the tags that reference the external files
Now insert the linked files' contents into the page, like this:
newHeaderContent = ''
for content in linkedFilesArray:
    newHeaderContent = newHeaderContent + content
newHTML = firstHTMLCode + newHeaderContent + lastHTMLCode
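A JavaScript sketch of the rebuild step, for comparison: instead of splitting the page into two halves, it replaces each external <link href> and <script src> tag with an inline <style> or <script> block. It assumes the file contents have already been fetched into an object keyed by the exact href/src value used in the page, and that attributes are double-quoted. inlineResources is a hypothetical name. A replacer function is used so that "$" sequences in the fetched code are inserted literally.

```javascript
// Escape text so it can be used literally inside a RegExp.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Inline linked stylesheets and scripts into a single HTML string.
// files maps each href/src value, exactly as written in the page,
// to the contents fetched for it.
function inlineResources(html, files) {
  let out = html;
  for (const [ref, content] of Object.entries(files)) {
    const r = escapeRegExp(ref);
    // <link ... href="ref" ...>  ->  <style>content</style>
    const linkTag = new RegExp(`<link\\b[^>]*\\bhref="${r}"[^>]*>`, "g");
    out = out.replace(linkTag, () => `<style>\n${content}\n</style>`);
    // <script ... src="ref" ...></script>  ->  <script>content</script>
    const scriptTag = new RegExp(`<script\\b[^>]*\\bsrc="${r}"[^>]*>\\s*</script>`, "g");
    out = out.replace(scriptTag, () => `<script>\n${content}\n</script>`);
  }
  return out;
}
```

An inlined script that contains the literal text </script> would end the block early, so such content needs escaping first.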
