On-Disk Text Processing With Javascript - javascript

I have some html files that I need to do automated processing on, basically regex replaces, but also some more complex actions like copying select blocks of text from one file to another.
I want to create a series of scripts that will let me do this processing (it will need to be done more than once on different batches of files). It would be trivial to use Go for this (read the file into memory, regex, save to disk) but I am the only member of the project that's familiar with Go.
Javascript is a tad more ubiquitous, and I do have project members who are familiar with the language, so it's a better fit in that respect. If I'm not around later, someone else could edit the scripts.
Is there a simple way to write some JS scripts to do on-disk text processing? I'm looking for a cross-platform solution (OSX, Windows). Ideally, once the scripts are written, they can be executed by double-clicking an icon--there will be "not computer people" involved at some point.
Also, I'd like to be able to do some kind of alert/message box to inform the user of the success/failure of the script. (This may be a tall order, and is of secondary importance.)
What I've looked at:
Node.js was the first thing that popped into my head, because I know that it has file system access tools, and obviously regex capacity. But I've never used Node before, and based on the tutorials I've read, it seems like overkill for something this simple.
There's a whole slew of "javascript compiling" tools that you can find by googling around. Some are not cross-platform, some seem old or not actively maintained, etc. None of them caught my eye as easy to pick up and just write some JS scripts with.
Any thoughts?

Node.js is a simple solution, and with its framework you can create your script and later modify it to suit your needs. This way you will not be locked in by someone else's code. And it is not that difficult to use.
Here is a quick tutorial on accessing files using Node.js:
http://www.sitepoint.com/accessing-the-file-system-in-node-js/
And here is a quick tutorial on using a Node module called Cheerio. It allows you to access HTML files using "jQuery-like syntax", so you don't need to use regex.
http://maxogden.com/scraping-with-node.html
I once worked on a project for a client that required parsing through hundreds of HTML files to check and replace certain image files based on certain criteria. I wasn't familiar with Node at the time, so I read some tutorials and wrote the script in about an hour.
And as long as Node.js is on your path, you can run it from the command line.
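As a rough illustration of the "read the file, regex, save to disk" workflow from the question, here is a minimal sketch using only Node's built-in fs module. The rules array and the function names applyRules and processFile are hypothetical, made up for this example; swap in whatever patterns your files actually need.

```javascript
// Minimal sketch: read a file, apply regex replacements, write it back.
const fs = require("fs");

// Hypothetical example rules; each entry is [pattern, replacement].
const rules = [
  [/<b>(.*?)<\/b>/g, "<strong>$1</strong>"], // swap <b> for <strong>
  [/\s+<\/p>/g, "</p>"],                     // trim whitespace before </p>
];

// Pure string transformation, kept separate from file I/O so it is easy to test.
function applyRules(text, ruleList) {
  return ruleList.reduce(
    (out, [pattern, replacement]) => out.replace(pattern, replacement),
    text
  );
}

// Read one file, transform it, and overwrite it on disk.
// Returns true if the content changed.
function processFile(filePath, ruleList = rules) {
  const original = fs.readFileSync(filePath, "utf8");
  const updated = applyRules(original, ruleList);
  fs.writeFileSync(filePath, updated, "utf8");
  return updated !== original;
}
```

To run it over a batch, you could loop over process.argv.slice(2) and call processFile on each path, then start it with something like: node process.js a.html b.html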

Some tips:
You need some kind of DOM-based HTML parser; it does not have to be written in JS.
You can do this in Java using the jTidy or jsoup libraries (I've used the second one a few times). Java is a fairly simple language to learn if you know JS, and an IDE like NetBeans helps a lot, so this can be put together quickly.
You can use PhantomJS to create some job files and write shell/batch code to run them on some files. You might need to write a generator for the job files (taking a list of files, creating a job file for each, and running them).
You can use Node.js, which isn't much overkill; I'm sure no solution will be trivial.
You can create an ETL process with, for example, Pentaho ETL (which has JS embedded as one of two scripting languages, but without a DOM parser; for that you would need to use a bit of Java and some library, in a way similar to this article).
You can also do this in PHP with Simple HTML DOM Parser, so you can make a service online (or on a local server) that takes those HTML files and returns the processed ones.

First I think you underestimate the complexity. The statement
"It would be trivial to use Go for this (read the file into memory,
regex, save to disk) but I am the only member of the project that's
familiar with Go."
is probably false. Parsing HTML with RegExp is just a bad idea. (Google it and you will see why.)
Second, if you can trivially write the code using RegExps in Go, you can just as easily write the same thing in JavaScript. Both support RegExp and file operations. If you are unsure about the JavaScript/Node.js details, I suggest writing the trivial solution in Go and then translating it into JavaScript with a colleague.
Since JavaScript is a scripting language, writing command-line utilities in Node.js is straightforward.
Some pointers to get you started
RegExp in Javascript
Building command line apps in Node.js
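The question also mentions copying select blocks of text from one file to another. One possible sketch, assuming you control the files and can wrap each block in comment markers: the <!-- BEGIN:name --> / <!-- END:name --> convention and the function names below are invented for this example, not an existing standard.

```javascript
// Sketch: copy a marked block of text from one file into another.
const fs = require("fs");

// Escape a string so it can be used literally inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the text between the hypothetical BEGIN/END markers, or null if absent.
function extractBlock(text, name) {
  const n = escapeRegExp(name);
  const re = new RegExp(`<!-- BEGIN:${n} -->([\\s\\S]*?)<!-- END:${n} -->`);
  const m = text.match(re);
  return m ? m[1] : null;
}

// Replace the contents of the named block in text with newContent.
// A replacer function is used so "$" in newContent is inserted literally.
function replaceBlock(text, name, newContent) {
  const n = escapeRegExp(name);
  const re = new RegExp(`(<!-- BEGIN:${n} -->)[\\s\\S]*?(<!-- END:${n} -->)`);
  return text.replace(re, (_, open, close) => open + newContent + close);
}

// Copy the named block from sourcePath into the same-named block in targetPath.
function copyBlock(sourcePath, targetPath, name) {
  const block = extractBlock(fs.readFileSync(sourcePath, "utf8"), name);
  if (block === null) {
    throw new Error(`Block "${name}" not found in ${sourcePath}`);
  }
  const target = fs.readFileSync(targetPath, "utf8");
  fs.writeFileSync(targetPath, replaceBlock(target, name, block), "utf8");
}
```

Markers keep the regex away from the HTML structure itself, which sidesteps most of the "don't parse HTML with RegExp" problem for this particular task.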

Related

Parsing and Modifying local HTML with Node.js

I kind of have a weird question that's probably very basic. I'm pretty new to web development and front-end stuff in general.
Premise
First of all, my only real other experience with JS is using Node.js to make a Discord bot, as well as using D3 to make some basic GUI stuff.
I'm trying to make a GitHub pages wiki for a game I play. In the game, there are units, which I've encoded as JSON files, as well as a folder of images. All I really want is a basic index.html page with links to HTML pages for all the units. But since units are constantly being added, I am constantly updating the JSON so I want to be able to generate the HTML files for the units as well as updating index.html automatically.
The Method
Since all I really know is node.js, I've been trying to use that to make the HTML files. What I'm trying to do right now is this hacky thing of using fs to read in the HTML files, then use jsdom to parse them into something I can use jquery to edit. After that I just want to serialize the jsdom into a string and use fs to write to the file. Right now I can successfully deserialize with jsdom and serialize the html back with fs. But I'm having trouble editing the actual HTML.
Question
First of all, I have no idea whether this is a good way to do what I've described at all. I'm still in the very early stages, so I welcome suggestions on other libraries or on completely redoing the project. The closest thing I've found to what I want to do is this GitHub snippet from 10 years ago. Unfortunately, I'm sure a lot of it is outdated, and my puny knowledge is not enough to adapt it to modern JS: https://gist.github.com/clarkdave/959811. I was wondering whether someone could do that for me, or suggest a better way of doing what I'm trying to do. You can see what I'm currently trying to do here: https://github.com/aayu3/ATBotJSONDependencies
"In order to generate HTML and auto-update web pages based on configuration, input, and/or data, you need a template engine. You might not know it, but that is what you're building already, well, sort of. You have the right idea, using fs to read your HTML, but you're going to need to go a few steps further."
Read your HTML files, and store the read HTML as a string inside your Node.js code-base.
"sounds like you have done this already."
Build a Parsing-Engine
"You can build one, or implement a pre-built parser via a Node.js module, you can probably find a few different parsers on NPM."
Create a Library that Manipulates Strings
"You need to implement a library that can alter the HTML after it has been read. You can implement a simple library, custom-made for a single web page, or you can build something a bit more dynamic that you can reuse later, which, semantically, would actually be closer to an API than a library."
Build/Implement a Compiler
"The last thing you need to implement is a "compiler". This takes the HTML that you have altered, which should be in the form of a string at this point, and compiles it back to HTML. Like the parser, you can use a pre-built compiler or write it yourself."
"Most people do not build their own engines, because it's not financially feasible. To build something quickly and easily, it's far more efficient to use a tool that has been pre-built, and a lot of the tools that have template engines built into them were built by very good open-source developers or by companies with a lot of money (e.g. React is built by Facebook)."
There's really no shortage of options.
I have listed some of the options you can choose from below:
"Including a tutorial about building your own template engine, which is considered by most to be the hard way. The hard way isn't always the worst way, though. In this case, choosing the hard way, building your own template engine, offers an insight into the mechanics of the popular tools that contemporary web developers use."
The first is the obvious HTML Template Engine.
Right now Handlebars and Pug are really popular. A template engine is essentially a rendering tool that converts a special variable syntax used in your HTML into data that is stored somewhere, usually in JSON format. If that data is dynamic, the variable in your page will change depending on the state of your data. It might sound a bit confusing at first, but I promise this is quite simple. Below is the syntax commonly used.
// Your Data.
{
animal: "My little dog, Sophie.",
place: "Santa Rosa, Ca.",
}
// Your HTML template, or your input.
<h3>{{place}}</h3>
<p>{{animal}}</p>
// Your HTML output, which is compiled and rendered by your template engine.
<h3>Santa Rosa, Ca.</h3>
<p>My little dog, Sophie.</p>
Here is a link to an online tool built by Handlebars for experimenting with their template engine. In fact, I used that tool to create the example above.
You have two other options
For the second option, you could use a full-blown framework like React or Vue. Personally I have never used React, but I have spent some time with Vue. Vue implements the same double-curly-bracket syntax that the template engines use (demonstrated in the example above). Vue takes things a step further though, or it might be more correct to say Vue takes things five thousand and 42 steps further, which is why some people choose it over a template engine; on the flip side, it is also why many people choose not to use it (the same goes for React). I am a Vanilla JS guy myself...
...and as a vanilla lover, the third option is what I do: build a simple yet extremely powerful template engine for yourself. Though this may sound the most ludicrous and time-consuming, it's not. The frameworks will take you far more time to learn, and building a template engine for yourself only takes a few functions, though it would be quicker to use a ready-made one from the first option above. However, building your own template engine ensures you understand what's going on under that hood of yours, and not only that, you become a mechanic of your own engine. You will be able to swap parts and make adjustments as you please. Below is a small bit of code that looks simple but has a lot going on in it.
var render = (template, data) => {
    return template.replace(/{{(.*?)}}/g, (match) => {
        return data[match.split(/{{|}}/).filter(Boolean)[0]]
    })
}
The snippet above was written by ShadowTime2000 at Hackernoon.com.
The snippet above is a great example for demonstrating the render function that is at the heart of almost every JavaScript template engine (for the sake of correctness, there are exceptions, but that is irrelevant here). It comes from a complete guide that is a wonderful free resource, which I suggest reading. The link is below.
Shadowtime2000's Tutorial on Template Engines
By the way, I don't know Shadowtime2000, and I am not trying to market his stuff. I really like the way he thinks. IMHO, Shadowtime2000 writes very useful tutorials and guides, which are becoming increasingly hard to find the more I learn.
I hope something above helps you mate,
CHEERS!
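To tie this back to the original question (one HTML page per unit plus an index.html), here is a minimal sketch built around the same kind of render function as above. The unit fields id, name, and image, the two templates, and the function name buildSite are assumptions made for this example; the real JSON layout is not shown in the question. Note that it does no HTML escaping, so it is only suitable for data you trust.

```javascript
// Sketch: generate one page per unit and an index.html from unit data.
const fs = require("fs");
const path = require("path");

// Swap each {{key}} placeholder for data[key]; unknown keys are left as-is.
const render = (template, data) =>
  template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    key in data ? String(data[key]) : match
  );

// Hypothetical templates; in practice these could be read from .html files.
const unitTemplate = '<h1>{{name}}</h1>\n<img src="{{image}}" alt="{{name}}">\n';
const linkTemplate = '<li><a href="{{file}}">{{name}}</a></li>';

// Write <id>.html for every unit plus an index.html linking to all of them.
// Returns the index HTML so callers can inspect it.
function buildSite(units, outDir) {
  fs.mkdirSync(outDir, { recursive: true });
  const links = units.map((unit) => {
    const file = `${unit.id}.html`;
    fs.writeFileSync(path.join(outDir, file), render(unitTemplate, unit), "utf8");
    return render(linkTemplate, { file, name: unit.name });
  });
  const index = `<ul>\n${links.join("\n")}\n</ul>\n`;
  fs.writeFileSync(path.join(outDir, "index.html"), index, "utf8");
  return index;
}
```

Because every page is regenerated from the data on each run, adding a unit only means editing the JSON and re-running the script; there is no need to parse and patch the existing HTML.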

Portable Javascript Application with String to File Output

I am using Javascript wrapped in HTML to simplify the task of one of my students. Her task is to create a text file for a research project, which will act as a configuration file for the analysis software.
I decided on Javascript, because I wanted portable, transparent code, with zero dependencies (no libs, no server, no installation), yet a familiar feel from the web that is easy to get started with. However, client side Javascript appears to have its limits when it comes to handling file output.
There are multiple questions and answers on Stack Overflow that address the issue by appealing to server-side solutions, external dependencies, and the newer HTML5 download attribute.
I have considered if I should use a complementary script or batch file that reads the output, but I am not sure about how to best implement such a layer. The file is complex to generate and this is achieved using form elements.
Another idea would be to package the script as an executable. For example, a browser could be called from Java, or the HTML/JS could be converted somehow. Perhaps there is a wrapper that I am not familiar with.
This is one of those side projects that is fast to code and so I would assume that there is a go-to solution among programmers for this type of problem. On the one hand, this is a packaging problem. On the other hand, it is about some of the limitations with Javascript for projects that run without a server backbone.
How can I deliver a no-bells-and-whistles Javascript application that is local only and capable of handling file I/O?

What is the purpose of Node.js ? [eg: while implementing a graph algorithm on data set available on a server]

I have been using JS for simple front-end scripting for a while now, but am absolutely new to Node.js. After some surfing, I found out a few things about Node.js: it is fast, event-driven, uses modules, can be used on both the server and client side, can be run from the command line, etc.
As a project, the following task has been given to me:
"To develop a graph algorithm (such as minimum spanning tree) in javascript using node.js. Use the larger of the following graphs as inputs: http://snap.stanford.edu/data/ " [the link contains data from various network sites organised as nodes and edges and stored in .txt files]
Now I know how to implement a graph algorithm in a language such as C, and can even do it in JS using arrays. But I need some help regarding the "using node.js" part of the problem. What is its purpose in the problem? Which of its features should I look up?
JS was originally made to run inside a browser.
Node.js is a JavaScript runtime that you can invoke from the command line. This means you can execute files of code from the command line, as with many other languages you might already be familiar with. Beyond that, there is not much more to it in your context.
But yes, it is fast, event-based, and async, and like server-side scripting languages it has server-handling capabilities built in. That said, it can be used in non-server contexts as well, like computation in your case.
Node.js helps you run backend logic written in JavaScript.
For example, when you write backend code in PHP, you need some kind of application that receives all client requests and runs specific code to handle them. In PHP this is done via Apache Server; in Java it is done via GlassFish/JBoss/Tomcat.
Node.js is something like them, but for JavaScript code.
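For the task in the question, the "using node.js" part mostly comes down to reading the data file with fs and running plain JavaScript on it. Below is a minimal sketch of Kruskal's algorithm over an edge list. It assumes one edge per line as whitespace-separated node ids, with lines starting with "#" treated as comments; the optional third column used as a weight (defaulting to 1) is an assumption of this example, as are the function names.

```javascript
// Sketch: load an edge list with Node.js and compute a minimum spanning forest.
const fs = require("fs");

// Parse "from to [weight]" lines; blank lines and "#" comments are skipped.
function parseEdges(text) {
  const edges = [];
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const [u, v, w] = trimmed.split(/\s+/);
    edges.push({ u, v, w: w === undefined ? 1 : Number(w) });
  }
  return edges;
}

// Read the whole file into memory; very large inputs may need streaming instead.
function loadEdges(filePath) {
  return parseEdges(fs.readFileSync(filePath, "utf8"));
}

// Kruskal's algorithm using a union-find structure keyed by node id.
// Returns the chosen edges (a spanning forest if the graph is disconnected).
function minimumSpanningForest(edges) {
  const parent = new Map();
  const find = (x) => {
    if (!parent.has(x)) parent.set(x, x);
    let root = x;
    while (parent.get(root) !== root) root = parent.get(root);
    // Path compression: point every node on the path directly at the root.
    while (parent.get(x) !== root) {
      const next = parent.get(x);
      parent.set(x, root);
      x = next;
    }
    return root;
  };
  const chosen = [];
  for (const edge of [...edges].sort((a, b) => a.w - b.w)) {
    const ru = find(edge.u);
    const rv = find(edge.v);
    if (ru !== rv) {
      parent.set(ru, rv); // merge the two components
      chosen.push(edge);
    }
  }
  return chosen;
}
```

A script like this would be run from the command line with node, passing the result of loadEdges to minimumSpanningForest; nothing server-related is needed.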

Higher-level Web Page I/O with javascript (analogous to file I/O with shell scripts)?

What I am trying to do (no alternative suggestions please, I know more conventional solutions) Easily scrape data from web pages such as images without writing any site-specific code (e.g. to get the biggest image at a particular URL). This is just ONE possibility.
What I'm dreaming of (for other uses too)
I know you can scrape using many APIs that use a DOM model. But surely someone's thought of something higher level? One of the most attractive things about shell script is the data manipulation you can do with basic file I/O with basic commands: Grep plus regular expressions (awk, sed, perl) can instantly put you in touch with goldmines of file-based data. What shell scripts are to files, javascript should be to web pages. But code gets so messy when you address things by tags and attributes. Wouldn't it be wonderful if there was some kind of API like this?
# determine the biggest image by checking images[0].height etc.
$("< http://www.cnn.com/man-has-three-eyes.html").images[0].url
Has such an API ever been attempted? I'm guessing not. If not, what makes this technically unrealistic? If so, what kind of javascript frameworks are the closest thing to offering this?
(If it doesn't, I should file for trademark protection on the brand name "Scrapy Eye" or "ScrAPI" or something!)

Advice on starting JS or JQuery for file processing

My knowledge of web technologies (JS, JQ) is limited and I want to start learning them. As a starting point I want to do some file processing, because it is something I have to do for my work and was planning to do in Java. What I basically need to do is go through a list of text files (assembly files) in a folder, search for routines, and list them. This is the first step and is a trivial task in Java.
But I wanted to take this a step further and do it in the browser, so that others on my team can also use it without installing anything (and also to impress them a little bit in the process, since I'm the new guy on the team :-)).
So when I input the folder, the script will go through the files and search and will display results in a web page. Basically first page will be a list of files in the folder, and clicking a file name will take me to another page which displays the routines in that file.
Sorry to bother you with details, but what I actually want to know are:
Is this possible with JS (to search for text patterns in a file)?
Should I start with JS or JQ? (I think many would recommend starting with JS, but since this is a side project done purely in my own time, would you suggest learning JQ first, because from what I have read it's relatively simpler for a beginner?)
Or should I just do the processing in Java and then interface the results to a web page?
Any advice is appreciated.
Thank you very much.
Java and JavaScript have nothing to do with each other; jQuery is a library written to simplify the usage of JavaScript with some handy shortcuts.
I'm afraid JavaScript would not be able to parse text files, as its main usage is manipulating content inside the browser window and it is limited by various security policies.
To parse files you have to choose a server-side language.
Maybe you can use Java to deal with the file processing, and then send the result to a JS script, which will show the results to users.
JS's ability is limited.
For security reasons, JavaScript is sandboxed within the browser, and has basically no access to the local file system. From what you have described, it sounds like your best option is to use Java to process ...whatever...
This function has nothing to do with web browsing. Why is a browser the best tool for the job, anyway?
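The pattern-search part of the question is plain JavaScript; if it is run under Node.js rather than in a browser page, the sandbox described above does not apply. The following is only a sketch under loud assumptions: it guesses that a routine starts with a label such as "my_routine:" at the beginning of a line and that the files end in ".asm". Both the label pattern and the function names are hypothetical and would need adjusting to the actual assembly dialect.

```javascript
// Sketch: list routine labels found in assembly files in a folder (Node.js).
const fs = require("fs");
const path = require("path");

// ASSUMPTION: a routine is a label like "name:" at the very start of a line.
const ROUTINE_LABEL = /^([A-Za-z_.][\w.]*):/gm;

// Return the routine names found in one file's text, in order of appearance.
function findRoutines(source) {
  return Array.from(source.matchAll(ROUTINE_LABEL), (m) => m[1]);
}

// Map each matching file name in the folder to its list of routine names.
function listRoutines(folder, extension = ".asm") {
  const result = {};
  for (const name of fs.readdirSync(folder)) {
    if (path.extname(name).toLowerCase() !== extension) continue;
    const text = fs.readFileSync(path.join(folder, name), "utf8");
    result[name] = findRoutines(text);
  }
  return result;
}
```

The object returned by listRoutines could then be turned into an HTML report, which gets close to the "list of files, click through to routines" idea without teammates needing more than Node.js installed.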
