Parsing and Modifying local HTML with Node.js

I kind of have a weird question that's probably very basic. I'm pretty new to web development and front-end stuff in general.
Premise
First of all, my only other real experience with JS is using Node.js to make a Discord bot and using D3 to make some basic GUI stuff.
I'm trying to make a GitHub pages wiki for a game I play. In the game, there are units, which I've encoded as JSON files, as well as a folder of images. All I really want is a basic index.html page with links to HTML pages for all the units. But since units are constantly being added, I am constantly updating the JSON so I want to be able to generate the HTML files for the units as well as updating index.html automatically.
The Method
Since all I really know is node.js, I've been trying to use that to make the HTML files. What I'm trying to do right now is this hacky thing of using fs to read in the HTML files, then use jsdom to parse them into something I can use jquery to edit. After that I just want to serialize the jsdom into a string and use fs to write to the file. Right now I can successfully deserialize with jsdom and serialize the html back with fs. But I'm having trouble editing the actual HTML.
Question
First of all, I have no idea whether this is a good way to do what I've described at all. I'm still in the very early stages, so I welcome suggestions on other libraries or on completely redoing the project. The closest thing I've found to what I want to do is this GitHub snippet from 10 years ago. Unfortunately, I'm sure a lot of it is outdated, and my puny knowledge is not enough to adapt it to modern JS: https://gist.github.com/clarkdave/959811. I was wondering whether someone could do that for me, or suggest a better way of doing what I'm trying to do. You can see what I'm currently trying to do here: https://github.com/aayu3/ATBotJSONDependencies

"In order to be able to generate HTML, and auto-update webpages based on a configuration, input, and/or data, you need a template engine. You might not know it, but that is what you're building already, well, sorta. You have the right idea, using fs to read your HTML, but you're going to need to go a few steps further."
Read your HTML files, and store the read HTML as a string inside your Node.js code-base.
"sounds like you have done this already."
Build a Parsing-Engine
"You can build one, or implement a pre-built parser via a Node.js module, you can probably find a few different parsers on NPM."
Create a Library that Manipulates Strings
"You need to implement a library that can alter the HTML after it has been read. You can implement a simple library, custom for a single webpage, or you can build something that is a bit more dynamic in its use, something that you can reuse for a later purpose, which, semantically, would actually be closer to an API than a library."
Build/Implement a Compiler
"The last thing you need to implement is a "compiler". This takes the HTML that you have altered, which should be in the form of a string at this point, and compiles it back to HTML. Like the parser, you can implement a pre-built compiler, or write it yourself."
"Most people do not build their own engines, because it's not financially feasible. To build something quickly and easily, it's far more efficient to use a tool that has been pre-built, and a lot of the tools that already have template engines built into them have been built by very good open-source developers, or by companies with a lot of money (e.g. React is built by Facebook)."
There's really no shortage of options.
I have listed some alternate options you can choose from below:
"Including a tutorial about building your own template engine, which is considered by most to be the hard way. The hard way isn't always the worst way though. In this case, choosing the hard way, building your own template engine, offers an insight into the mechanics of the popular tools that contemporary web developers use."
The first is the obvious HTML Template Engine.
Right now Handlebars and Pug are really popular. A template engine is essentially a rendering tool that replaces a special variable syntax in your HTML with data that is usually stored somewhere in JSON format. If that data is dynamic, the value rendered in your page changes with the state of your data. It might sound a bit confusing at first, but I promise this is quite simple. Below is the syntax commonly used.
// Your data.
{
  animal: "My little dog, Sophie.",
  place: "Santa Rosa, Ca.",
}
// Your HTML template, or your input.
<h3>{{place}}</h3>
<p>{{animal}}</p>
// Your HTML output, which is compiled and rendered by your template engine.
<h3>Santa Rosa, Ca.</h3>
<p>My little dog, Sophie.</p>
Here is a link to an online tool built by Handlebars for experimenting with their template engine. In fact, I used that tool to create the example above.
You have two other options
For the second option, you could use a full-blown framework like React or Vue. Personally I have never used React, but I have spent some time with Vue. Vue implements the same double-curly-bracket syntax that the template engines use (which is demonstrated in the above example). Vue takes things a step further though, or it might be more correct to say Vue takes things 5-thousand and 42 steps further, which is why some people choose it over a template engine. On the flip side, it is also why many people choose not to use it (same goes for React). I am a Vanilla JS guy myself...
...and as a Vanilla lover, the third option is what I do: build a simple yet extremely powerful template engine for yourself. Though this may sound the most ludicrous and time-consuming, it's not. The frameworks will take you far more time to learn, and building a template engine for yourself only takes a few functions, though it would be quicker to use a ready-made one from option #1 above. However, building your own template engine ensures you understand what's going on under that hood of yours, and you become the mechanic of your own engine: you will be able to swap parts and make adjustments as you please. Below is a small bit of code that looks simple but has a lot going on in it.
var render = (template, data) => {
  return template.replace(/{{(.*?)}}/g, (match) => {
    return data[match.split(/{{|}}/).filter(Boolean)[0]]
  })
}
The snippet above was written by ShadowTime2000 at Hackernoon.com.
The snippet above is a great example of the render function that is at the heart of almost every JavaScript template engine (for the sake of correctness, there are exceptions, but that is irrelevant here). It came from an entire guide that is a wonderful free resource, which I suggest reading. Link is below.
Shadowtime2000's Tutorial on Template Engines
By the way, I don't know Shadowtime2000, and I am not trying to market his stuff. I really like the way he thinks. IMHO, Shadowtime2000 writes very useful tutorials/guides, which are becoming increasingly hard to find the more I learn.
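To tie this back to the original question, here is a minimal sketch of how a render function like this could generate one page per unit plus an index. The unit fields (id, name, image), both templates and the output directory are assumptions made up for illustration, not the asker's actual data layout. The render function is a variant of the one above that uses the regex capture group and leaves unknown placeholders untouched.

```javascript
const fs = require("fs");
const path = require("path");

// Replace each {{key}} with data[key]; unknown keys are left as-is.
const render = (template, data) =>
  template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : match
  );

// Hypothetical templates; in practice these could live in their own .html files.
const unitTemplate =
  '<h1>{{name}}</h1>\n<img src="{{image}}" alt="{{name}}">\n';
const linkTemplate = '<li><a href="{{id}}.html">{{name}}</a></li>';

// Write one page per unit and an index.html that links to all of them.
function buildSite(units, outDir) {
  fs.mkdirSync(outDir, { recursive: true });
  for (const unit of units) {
    const file = path.join(outDir, `${unit.id}.html`);
    fs.writeFileSync(file, render(unitTemplate, unit));
  }
  const links = units.map((unit) => render(linkTemplate, unit)).join("\n");
  fs.writeFileSync(path.join(outDir, "index.html"), `<ul>\n${links}\n</ul>\n`);
}
```

Calling buildSite(units, "docs") after each JSON update would regenerate everything. Note that this sketch does not HTML-escape the values, which is fine for data you control but not for untrusted input.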
I hope something above helps you mate,
CHEERS!

Related

JavaScript Code-Splitting with hiding the code-endpoints

Hey, I'm not really familiar with JavaScript or React, so I hope this isn't too easy a question.
I want to have a "one-page" website and change the page dynamically with AJAX requests.
For example, I have written code for four visibility levels (guest user, normal user, moderator, administrator).
If you log in to my page as an admin, you get the JS code for all levels. For example, the JSON response contains a list of URLs pointing to the JavaScript code.
If you log in as a normal user, you should get only the normal-user JS code. You already have the guest-user JS code; you got it when you entered the page.
So I guess it's clear what I want.
But how should I implement this?
Are there any ready-made solutions out there?
https://reactjs.org/docs/code-splitting.html
Maybe I have to adapt this?
And maybe there are some good bundlers I could use that do this splitting while hiding the endpoint URLs (which I only get from an AJAX request if I have the rights)?
Regards, knotenpunkt

As I said in the comments, I think that the question is very, very broad. Each of the requests is a standalone topic in its own right.
Generally speaking, I hope that this will lead you in the right direction.
You can split your code by using CommonJS or ES6 modules (read more here). That is to keep it "modular". Then, during the bundling process, other splitting techniques may be applied, but this will depend on your development environment and used tools.
Your best option for bundling would be Webpack without any doubt. However, directly dealing with Webpack or setting up a custom development environment is not an easy task. You'll certainly want to read about Create React App, which is a good place to start for a Single Page Application. It will allow you to write your code in a "modular" fashion and will bundle, split and process it automatically (uses Webpack under the hood).
Finally, access control must be enforced server-side (there is a whole other world of options there).
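As a rough sketch of what "server-side" could mean here: the server, not the client, decides which bundle URLs a role may see and fetch. The role names follow the four levels in the question; the URLs and function names are hypothetical.

```javascript
// Role levels from the question, lowest to highest privilege.
const ROLES = ["guest", "user", "moderator", "admin"];

// Hypothetical bundle URLs, one per role level.
const BUNDLES = {
  guest: "/js/guest.js",
  user: "/js/user.js",
  moderator: "/js/moderator.js",
  admin: "/js/admin.js",
};

// Return the bundle URLs a role may load: its own plus all lower levels.
function bundlesFor(role) {
  const level = ROLES.indexOf(role);
  if (level === -1) return [BUNDLES.guest]; // unknown roles get guest code only
  return ROLES.slice(0, level + 1).map((r) => BUNDLES[r]);
}

// Server-side gate: may this role fetch this bundle URL?
function canFetch(role, url) {
  return bundlesFor(role).includes(url);
}
```

The login response could return bundlesFor(role), and the route that serves the script files could call canFetch before sending anything, so that knowing a URL is not enough to get the code. On the client, the returned URLs could then be loaded on demand, for example with dynamic import().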

On-Disk Text Processing With Javascript

I have some html files that I need to do automated processing on, basically regex replaces, but also some more complex actions like copying select blocks of text from one file to another.
I want to create a series of scripts that will let me do this processing (it will need to be done more than once on different batches of files). It would be trivial to use Go for this (read the file into memory, regex, save to disk) but I am the only member of the project that's familiar with Go.
Javascript is a tad more ubiquitous, and I do have project members who are familiar with the language, so it's a better fit in that respect. If I'm not around later, someone else could edit the scripts.
Is there a simple way to write some JS scripts to do on-disk text processing? I'm looking for a cross-platform solution (OSX, Windows). Ideally, once the scripts are written, they can be executed by double-clicking an icon--there will be "not computer people" involved at some point.
Also, I'd like to be able to do some kind of alert/message box to inform the user of the success/failure of the script. (This may be a tall order, and is of secondary importance.)
What I've looked at:
Node.js was the first thing that popped into my head, because I know that it has file system access tools, and obviously regex capacity. But I've never used Node before, and based on the tutorials I've read, it seems like overkill for something this simple.
There's a whole slew of "javascript compiling" tools that you can find by googling around. Some are not cross-platform, some seem old or not actively maintained, etc. None of them caught my eye as easy to pick up and just write some JS scripts with.
Any thoughts?

Node.js is a simple solution, and with its framework you can create your script now and modify it later to fit your needs. This way you will not be locked in by someone else's code. And it is not that difficult to use.
Here is a quick tutorial on accessing files using Node.js:
http://www.sitepoint.com/accessing-the-file-system-in-node-js/
And here is a quick tutorial on using a Node module called Cheerio. It allows you to work with HTML files using "jQuery-like syntax", so you don't need to use regex.
http://maxogden.com/scraping-with-node.html
I once worked on a project for a client that required parsing through hundreds of HTML files to check and replace certain image files based on certain criteria. I wasn't familiar with Node at the time, so I read some tutorials and wrote the script in about an hour.
And as long as Nodejs' path is set, you can run it on the command line.
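To give an idea of how little code the read/modify/write cycle needs, here is a minimal sketch using only Node's built-in fs module. It does a plain string swap of one image path for another; the function name and the paths are made up for illustration, and for edits that depend on the HTML structure a parser such as Cheerio would be the more robust choice.

```javascript
const fs = require("fs");

// Read a file, swap every occurrence of one image path for another,
// and write it back only if something changed. Returns the number of swaps.
function replaceImage(file, oldSrc, newSrc) {
  const html = fs.readFileSync(file, "utf8");
  const parts = html.split(oldSrc);
  if (parts.length === 1) return 0; // nothing to replace, leave the file alone
  fs.writeFileSync(file, parts.join(newSrc));
  return parts.length - 1;
}
```

Saved in a script file, this could be run from the command line with node and called once per file in a batch.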

Some tips:
Any DOM-based HTML parser will do; it doesn't have to be a JavaScript one.
You can do it in Java using the jTidy or jsoup libraries (I've used the latter a few times). Java is a pretty simple language to learn if you know JS, and an IDE like NetBeans helps a lot, so it can be done quickly that way.
You can use PhantomJS: create some job files plus shell/batch code to run them on your files. You might need to write a generator for the job files (taking a list of files, creating a job file for each and running them).
You can use Node.js, which isn't much overkill; I'm sure no solution will be trivial.
You can create an ETL for processing with for example Pentaho ETL (which has JS embedded as one of two scripting languages... but without DOM parser - for that one you would need to use a bit of Java there and some library in way similar to this article).
You can also do that with PHP with Simple HTML DOM Parser - so you can make a service online (or on local server) that takes those html files and throws out processed ones.

First I think you underestimate the complexity. The statement
"It would be trivial to use Go for this (read the file into memory,
regex, save to disk) but I am the only member of the project that's
familiar with Go."
is probably false. Parsing HTML with RegExp is just a bad idea. (Google it and you will see why)
Second, if you can trivially write the code using RegExps in Go, you can just as easily write the same thing in Javascript. Both support RegExp and file operations. If you are unsure about the Javascript/Node.js details, I suggest writing the trivial solution in Go and then translating it into Javascript with a colleague.
Since Javascript is a scripting language, writing command line utilities in Node.js is straightforward.
Some pointers to get you started
RegExp in Javascript
Building command line apps in Node.js

What is the suggested way to use template in Chrome content scripts?

I use several templates in a Chrome content script for elements I'm adding to matched pages.
Currently I'm storing these templates as string in the script, but I wonder if there are better ways to do this.

tl;dr answer - Yes, you can store them in a <script type="text/html"> tag or load them dynamically through AJAX. Examples here (from KnockoutJS) and here. Store them in a file with the proper extension and use an empty tag with an id X, then $("#X").load("/path/to/template", function() { renderX(); })
Long answer with insightful thoughts, keep reading...
Please make sure templates/views or related GUI components of your system are in separate files (Keep reading to know why)
As a front-end engineer, I learned to keep layers as separate as possible; this helps your team better understand your code and makes it more maintainable. Separating your layers into modules and then assembling them through an "assembly" mechanism is probably one of the best practices in software engineering. Some of the benefits of this practice include:
Maintainability: Multiple developers can browse and edit single parts of your code in order to create a more robust piece of software.
Readability: With the number of languages around, you can't expect everyone to understand all the syntax of language X or Y; mixing languages in a single file is just one step toward confusing your code reviewers and making them spend more time than needed.
Accessibility: Take, for instance, html, jade, haml, smarty, twig, erb or other template files. Those files should always be named with the proper extension in order to help code editors and IDEs with syntax highlighting. A developer should only need to glance at a folder to know what those files are supposed to do. A script or bot can derive important information from just an extension.
By keeping views in separate files, other coders can view them and understand an important layer of the system without needing to understand the entire application; even collaboration gets easier when the developer just needs to review those specific files. Through bash or other scripting, even large systems with thousands of view-like files can be filtered in order to output just what needs to be reviewed.
(By now I hope I have convinced you to remove the string from your code and put it in a new file; otherwise I really need to improve my writing skills.)
So, after we have moved our template to an external file, what do we do next?
A word on Javascript and Chrome-Extensions
Javascript doesn't have a default templating feature (heck, it doesn't even have a module system, although some smart people are working on it for ECMAScript Ed. 6, yay!), which means that we need to use a templating library for that. Let's assume you are using Mustache, Underscore templates or something similar, and thus using its library to render it.
Most of those templating engines use the infamous eval, which can introduce a vulnerability into your code and which Chrome Extensions dislike a lot. The Chrome Extension Dev Team even enforced a new version of the manifest.json file in order to forbid eval, and gave developers the choice to use sandboxed pages in order to do so. Luckily for us, they decided to relax the policies a little bit, and we can continue to use it with a proper CSP definition in our manifest.json.
This means that we need to tackle two problems instead of only one: the "load the template" one and the "render the template in a way that won't freak out future versions of CSP in case the Chrome Extension Dev team change their mind" one.
Load the template
Luckily, the template can be loaded with an XMLHttpRequest (AJAX). Just point the URL at the file and you receive the template as a string, which was probably your original setup. Here's an example that I use with KnockoutJS:
$("#aiesecTemplate").load('js/libs/aiesec/aiesecTemplate.html', function() {
  ko.applyBindings(AIESECViewModel);
});
The #aiesecTemplate is a <script type="text/html"> tag that the browser won't render as part of the DOM. You can use that with other template mechanisms in order to actually assemble your DOM. If you have already a solution for this, this is probably the end of the answer and you can move on with your life. If you are wondering how do we render the code from there, keep reading.
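If you are not using jQuery, the same loading step can be sketched with fetch. This is only an illustration: the function name and the template path are made up, and in a content script the file would probably also need to be listed under web_accessible_resources in the manifest.

```javascript
// Fetch a template file and return its text.
// `fetchImpl` is injectable so the function can be exercised outside a browser.
async function loadTemplate(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Could not load template ${url}: HTTP ${response.status}`);
  }
  return response.text();
}

// In an extension, the URL would typically come from chrome.runtime.getURL, e.g.
// loadTemplate(chrome.runtime.getURL("templates/panel.html")).then(...)
// where "templates/panel.html" is a hypothetical file packaged with the extension.
```

The returned string can then be handed to whatever rendering step you use.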
Render the template
The Chrome Dev team suggests that we sandbox our rendering process, because most templating libraries are not CSP-compliant (AngularJS being an exception). Here's an excerpt of the code from the Sandbox page.
iframe.contentWindow.postMessage(message, '*');
Where iframe is a specific DOM Iframe Element from the sandbox page with a src attribute of a page that has the templating engine; message has the string template previously loaded, so after posting the message a window.addEventListener for message inside the iframe can render it without a problem. More information about sandboxing eval can be read here
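The receiving side inside the sandboxed page could look roughly like the sketch below. The message shape ({ id, template, data }) and the renderFn parameter are assumptions for illustration; renderFn stands in for whatever templating engine is loaded in the sandbox.

```javascript
// Build the "message" handler for the sandboxed page.
// `renderFn(template, data)` stands in for the templating engine loaded there.
function makeRenderHandler(renderFn) {
  return (event) => {
    const { id, template, data } = event.data;
    const html = renderFn(template, data);
    // Reply to the window that sent the request, echoing the id
    // so the sender can match the response to its request.
    event.source.postMessage({ id, html }, event.origin);
  };
}

// Inside the sandboxed page this would be wired up roughly as:
// window.addEventListener("message", makeRenderHandler(myEngine.render));
// where myEngine is a hypothetical templating library.
```

The extension page would listen for the reply with its own message listener and insert the returned HTML where it is needed.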
Conclusion
If you made it to here, awesome! My answers might not be that boring. As a last note, you might be thinking "What about AMD or RequireJS?"; to be honest I haven't tried them, but really smart people think that AMD is not the best approach. Loading through a XML HTTP Request might not be better, but in case you think it hits your performance (I have used it in my application and it doesn't) you can always use some Event Pages and Web Workers with that.

using haml/jade in kanso couchapp

Following this post, I took a look at kanso.
From this I learned that people are not afraid to load complicated modules into the database context if they need them, and that encouraged me a lot.
So I tried kanso. It gave me some trouble that suggested immaturity; however, it demonstrates great potential.
(mostly compatibility issues with npm and with node 0.6.x, and some open edge cases)
So I looked a little deeper.
I saw it comes with a template engine of its own.
But what if I want to reuse templates that are written already by another standard?
(for example - haml, or even better - jade that has also a nice text-to-text JS implementation, and a well growing jQuery plugin - same link - see end of document ).
Does anybody here know how tightly coupled the template engine is to the kanso types mechanism, and how simple it would be to use other template engines instead?
Or, what limitations do I take upon myself when trying to use my own templates?

As far as I can see in the source, DustJS (the template engine of kanso) is not pluggable (like in expressjs for example). That being said, it probably won't be that hard to plug in a different templating engine, the code doesn't seem very complicated.
You might want to add something to this issue on the GitHub page and request for a pluggable templating mechanism.

The Web 2.0 Ecosystem/Stack

Being new to front-end website development, I can understand some stuff, things like routes, ORM, etc. What I don't understand is how they all play together. My understanding is, there are a bunch of components for a website built with Pyramid/Django etc:
A templating engine: Something for you to abstract away your HTML from your code. Makes sense.
SQLAlchemy et al: An ORM. Fine.
A renderer. No idea.
JS libraries: JQuery et al:
No idea what use these are except for adding pretty effects. How does this interact with the templating engine? How does this interact with the entire framework? Can I write code for jquery in Pyramid, or do I write JS separately, plug in my JS file into my template or...?
Form templating libraries (formish, formalchemy et al): How do these relate to the big picture? where do they plug in?
Any other important components that I'm missing?
So, could someone help me out and explain the stack?

1) A templating engine: Something for
you to abstract away your HTML from
your code. Makes sense.
There are several of these available. Mako tries to utilize many common Python idioms in the templates to avoid having to learn many new concepts. Jinja2 is similar to Django, but with more functionality. Genshi is for you if you like XML-based templating.
As someone new to the whole thing, it's hard to say which is easiest to begin with unfortunately. Perhaps Jinja2.
2) SQLAlchemy et al. An ORM. Fine.
Yep.
3) A renderer. No idea.
A renderer is a Pyramid view configuration option, which tells Pyramid that if your view returns a dict, then it should be passed to the given 'renderer'. Renderers are set up to work with extension names, and Pyramid comes with several built in:
http://docs.pylonsproject.org/projects/pyramid/1.0/narr/renderers.html#built-in-renderers
In short, the renderer option merely looks at the name you pass it, finds a template engine that matches the extension (.mak, .pt, 'json', 'string', etc.), and renders the dict results with it.
In many frameworks you don't designate a renderer as configuration, but instead have some code inside the view which looks something like this:
def somefunc(request):
    return render_to_response('/some/template.mak', {})
In Pyramid, you could do the same thing with:
from pyramid.view import view_config

@view_config(renderer='/some/template.mak')
def somefunc(request):
    return {}
There are several reasons the latter is a useful capability:
When it's entirely in configuration, you can override the renderer without having to change the view code logic.
You can add multiple configurations that change the renderer based on other conditions.
Consider this example which changes the renderer based on if the HTTP request is an XHR (AJAX request that wants a JSON formatted result, instead of a general HTTP request that wants HTML spit out by the template engine).
@view_config(renderer='json', xhr=True)
@view_config(renderer='/some/template.mak')
def somefunc(request):
    # lookup some_dict_data in a db, etc.
    return some_dict_data
4) JS libraries: JQuery et al. No idea
what use these are except for adding
pretty effects. How does this interact
with the templating engine? How does
this interact with the entire
framework? Can I write code for jquery
in pyramid, or do I write JS
separately, plug in my JS file into my
template or...?
JS libraries make it easier to write Javascript. They interact in the browser with the DOM, and have no interaction with Pyramid beyond sending HTTP requests to your web application that might want JSON formatted results.
To begin with, I'd suggest ignoring Javascript entirely until you're much more familiar with HTML, the DOM tree, and getting a site that works with just HTML, CSS, and the web-application.
5) Form templating libraries (formish,
formalchemy et al) How do these relate
to the big picture? where do they plug
in?
I would highly suggest ignoring those entirely, and writing basic HTML form elements. You're new to the whole web stack, and there's really no need to jump straight to the most advanced aspects of web development without getting familiar with the basics first.
What you will need though, after writing basic forms, is you will want a form validation library that makes it easier to verify that the form which was submitted contains valid parameters. Back in the old days of PHP, people would write hundreds of lines of if/else statements that went through forms (some still do! ack!).
Nowadays we use form validation libraries which make it easy to declare what the valid parameters are for a form. I'd suggest FormEncode to begin with, as it's fairly easy to use just for validation. For Pyramid, the easiest way to get going with FormEncode is probably pyramid_simpleform:
http://packages.python.org/pyramid_simpleform/
For now, ignore the form rendering part and write the HTML form elements in the template yourself, and use pyramid_simpleform just for the easy FormEncode integration.
In short, start with just displaying HTML pages with links using view functions and templates (and use URL dispatch; it's easier to grasp than traversal for beginners). Then add forms, their HTML and validation, then add CSS to start styling things.
Next you can start with some basic Javascript with jQuery to make things move around on the page, and work your way up to interacting with the webapp via AJAX to get more data. Just don't tackle too much at once, and it should be easier to see how they fit together.

3) A renderer. No idea.
Generally a renderer takes your data/model and converts it into something that the client wants. If the client is just a browser then the renderer will usually mash your data through a template to produce HTML. If the client is some JavaScript code or a non-browser application (a desktop application, another server that is consuming your data, ...) then the renderer would usually produce JSON (or possibly XML). You can think of this as a serialization or marshalling system.
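As a language-neutral illustration of that idea, here written in JavaScript rather than Python, a renderer can be thought of as a function chosen by what the client asked for. The names and the choice of the Accept header as the switch are assumptions for this sketch, not how Pyramid does it.

```javascript
// Hypothetical renderers keyed by the format the client asked for.
const renderers = {
  json: (data) => JSON.stringify(data),
  html: (data) =>
    "<dl>" +
    Object.entries(data)
      .map(([key, value]) => `<dt>${key}</dt><dd>${value}</dd>`)
      .join("") +
    "</dl>",
};

// Choose a renderer from the request's Accept header; default to HTML.
function renderFor(acceptHeader, data) {
  const format = /application\/json/.test(acceptHeader || "") ? "json" : "html";
  return { format, body: renderers[format](data) };
}
```

The view code only produces the data; which serialization the client receives is decided separately. The HTML branch here does no escaping, so it is only suitable as an illustration.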
4) JS libraries:
These are what you use to program the user interface. The user interface may just be some pretty effects slapped on top of HTML, but it could be a lot more. Google Docs, for example, is JavaScript and a bit more than pretty effects; Cloud9 IDE would be another example of a full application built with JavaScript (thanks to Raynos for another example).
5) Form templating libraries
You can think of these as (more or less) macro systems for the template engine. If you have a data schema then you can use these things to generate template chunks and to automatically handle the server side processing of the corresponding return data.
Any other important components that I'm missing?
You can think of the modern web stack as a traditional client server system; this will probably anger some people but there's nothing radically new here except possibly the scale. The client is built with HTML and CSS for the layout and JavaScript (possibly with a toolkit) for the functionality and eye candy. The server is a web server of some sort. Communication between client and server is usually done in a combination of JSON and HTML over HTTP. You can think of web-1.0 (may deity forgive my marketing-talk terminology) as old school dumb terminals where web-2.0 is more like an X-terminal with some brains on the client.
