Skeleton website that will parse an external file and display it in HTML - JavaScript

I am creating a website that is meant to be a sort of user start guide, with an introduction page, an overview page, and a page containing the actual guide with step-by-step instructions for the user to follow. Since I don't have the specific content for the page yet, I'm "hard coding" the HTML with placeholder text. Going forward, I want to create a skeleton HTML framework that will parse the start guide files with JavaScript and display the information on the page in its expected place. I'm strongly considering using XML (or JSON) files, parsing them with JavaScript, and displaying them with browser DOM methods, because that's what I know. However, I'm working on a team of people who most likely won't know how to convert the start guides (which are PDFs) into well-formed XML, so I'm wondering: are there any alternative ways to do this? I know Word can save documents as HTML and XML, but it doesn't produce the attributes/tags my JavaScript would need to parse.
TLDR: Are there file formats or methods other than JSON and XML (something that non-tech-savvy people will understand) that can be parsed and processed with JavaScript to display in a skeleton website?
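For reference, the kind of parsing I have in mind is roughly the sketch below, assuming a hypothetical guide.json of the form { "title": "...", "steps": ["...", "..."] } and placeholder elements with the ids guide-title and steps already in the skeleton HTML:

// Minimal sketch: fetch a hypothetical guide.json and fill in the skeleton page.
// The file name, JSON shape, and element ids are assumptions for illustration only.
fetch('guide.json')
  .then((response) => response.json())
  .then((guide) => {
    document.getElementById('guide-title').textContent = guide.title;
    const list = document.getElementById('steps');
    guide.steps.forEach((step) => {
      const item = document.createElement('li');
      item.textContent = step;
      list.appendChild(item);
    });
  })
  .catch((err) => console.error('Could not load the guide:', err));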

Related

Converting HTML Template to pdf in JavaScript

I am currently using WeasyPrint to convert an HTML template to PDF. Now my company has decided to move from Python to JavaScript for increased user experience or something.
Is there a PDF renderer in JavaScript that can do the following:
Use data on the client. I have data that under no circumstances can be transferred over the internet. (That is the reason I cannot create the PDF on the backend or use external renderers.)
Use CSS page numbering
Use features like page headers and footers on print medium.
And of course can do the usual CSS layout.
Ideally there is something that can be used in both Python and JavaScript. But WeasyPrint works great with Python, so I can keep that if there is no "one tool to do the job everywhere".
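For comparison, a minimal client-only sketch with jsPDF, which keeps the data in the browser but does not cover CSS page numbering or @page headers/footers out of the box, so it only addresses the first requirement; the report object is made up for illustration:

// Sketch: generate the PDF entirely in the browser so the data never leaves the client.
// The report object is placeholder data; jsPDF is assumed to be installed (npm package "jspdf").
import { jsPDF } from 'jspdf';

const report = { title: 'Confidential report', lines: ['Row 1', 'Row 2'] };

const doc = new jsPDF();
doc.text(report.title, 10, 10);
report.lines.forEach((line, i) => doc.text(line, 10, 20 + i * 10));
doc.save('report.pdf'); // triggers a local download, nothing is sent over the network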

Should JS dynamically generate metadata/the whole page?

So I am going to have many pages that contain a bunch of text, which a JS and CSS file will turn into a styled, finished webpage. The text is usually going to be long, and since there are going to be many webpages, I want to keep file sizes down. Also, since I don't want to ruin quality, I have decided that my JS file is going to take the text and make a webpage out of it. Side note: what I am trying to do is make tutorial pages, so I am going to use JS to generate a lot of the things that are on every tutorial page, like the lessons list, to lower file size.
I have noticed that metadata (<head> content) usually takes up some space that JS could generate, so I thought, why don't I just generate this with JS? But then arose the problem that some browsers might not parse it, or it might be slow to parse. So I am asking here on Stack Overflow:
Should JavaScript generate metadata (and maybe almost the whole page, like remove the <head> tag completely and generate it with JS)?
It depends on your desired result.
Google has improved its SEO mechanisms to render your page before indexing it, see here:
https://developers.google.com/search/docs/guides/javascript-seo-basics
However, other bots may not do the same, such as social media crawlers like Facebook's or Twitter's that read Open Graph meta tags, or other search engines like Baidu.
If a bot doesn't render your document, the JavaScript doesn't get executed and your meta tags aren't present.
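To make that concrete, generating the metadata at runtime would look roughly like the sketch below (the tag names and values are just examples); browsers that run the script will see the tags, but any crawler that skips script execution will not:

// Sketch: create <head> metadata at runtime instead of shipping it in the HTML.
// Only clients that actually execute JavaScript will ever see these tags.
function addMeta(name, content) {
  const tag = document.createElement('meta');
  tag.setAttribute('name', name);
  tag.setAttribute('content', content);
  document.head.appendChild(tag);
}

document.title = 'Lesson 3: Variables';
addMeta('description', 'Tutorial page about variables.');
addMeta('viewport', 'width=device-width, initial-scale=1');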
Additionally, if your initial document does not reference the stylesheets or other CDN resources, it takes a bit longer for the client. Imagine the process:
With head
fetch document
fetch resources
render content
Without head
fetch document
render content
fetch resources
re-render
That's way over-simplified but it demonstrates my point.
Alternative:
If your content is that dynamic, you might consider Server-Side Rendering (SSR) or pre-rendering.
You would build your pages programmatically and store/serve them all, or build them on the server-side as they are requested.
https://developers.google.com/web/updates/2019/02/rendering-on-the-web
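For example, a minimal sketch of the "build them on the server-side as they are requested" option, assuming a Node/Express server and a made-up lessons object standing in for your real data source:

// Sketch: render the <head> and body on the server so every client and crawler gets complete HTML.
// Express and the lessons object are assumptions for illustration; swap in your own data and templates.
const express = require('express');
const app = express();

const lessons = { variables: { title: 'Variables', body: '<p>Lesson text…</p>' } };

app.get('/lessons/:slug', (req, res) => {
  const lesson = lessons[req.params.slug];
  if (!lesson) return res.status(404).send('Not found');
  res.send(`<!DOCTYPE html>
<html>
  <head>
    <title>${lesson.title}</title>
    <meta name="description" content="Tutorial page: ${lesson.title}">
    <link rel="stylesheet" href="/styles.css">
  </head>
  <body>${lesson.body}</body>
</html>`);
});

app.listen(3000);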

Attach PDF to PDF as attachment (not as a page) via JavaScript in HTML (not in Acrobat)

I would like to generate a PDF portfolio using JS from an HTML/CSS page on a local machine. I would use a PDF template file which includes a PDF portfolio navigator in SWF form. I have successfully accomplished this using C# and a command line program, but cannot identify the proper JavaScript components to do this browser-side or pseudo-server with Node.js. Basically, I am looking for something which will allow me to append a PDF to a new or existing PDF via configuration choices and an 'assemble' action using a JS or HTML button. iTextSharp provides the required PDF interaction functionality, but I cannot figure out how to run this inside an HTML page to allow configuration via the HTML/CSS DOM (i.e. checkboxes, text field descriptors, etc...). Does a library with this type of functionality exist?
So you want to create a PDF using JavaScript?
On a quick Google search, I found what appears to be a JavaScript library for creating and manipulating PDFs called jsPDF.
If you want information on how to upload files with JavaScript alone, here is an article on how to do that. It also shows you how to use the file element.
For style, I recommend using a CSS framework if you don't know much about CSS. I personally use Twitter Bootstrap for quickly prototyping things. It's quick and easy, and has good documentation. You can also use this to see how to make a form in HTML. I haven't got any good starter tutorials for HTML off the top of my head, sorry.
If you don't know much about JavaScript, then for getting the options from the form so that you can use them as configuration, I'd suggest using the jQuery framework. It'll help you get up and running quickly enough.
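For instance, pulling the configuration out of the form when the 'assemble' button is clicked might look like the sketch below; the element ids (#assemble, #title, #include-nav, #pdf-files) are made up for illustration:

// Sketch: read the configuration choices from the HTML form with jQuery.
// The ids are hypothetical; adapt them to your own markup.
$('#assemble').on('click', function () {
  const config = {
    title: $('#title').val(),                            // text field
    includeNavigator: $('#include-nav').is(':checked'),  // checkbox
    files: $('#pdf-files')[0].files,                     // <input type="file" multiple>
  };
  console.log('Assemble PDF with', config);
  // ...hand config off to whatever PDF library you end up using (e.g. jsPDF)
});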
Note, all of this shouldn't replace basic training in JavaScript and HTML/CSS. Frameworks make things simpler, but if you don't know how to do something without a framework, you're going to have a hard time with a lot of the more complicated things. This goes for every language.

Tools to retrieve static content from JSP and JS files - ANTLR?

I am trying to find out if there are any tools that will extract all the static content (for localization) from JSP and JS files. We want to automate the process of finding the static content in JSP and creating resource bundles from it.
After some analysis it seems like some form of this can be achieved using ANTLR and the XML grammar for ANTLR - http://www.antlr.org/wiki/display/ANTLR3/1.+Lexer
That is, use ANTLR with the XML grammar to parse the JSP and use StringTemplate to output to a property bundle.
Kindly let me know if somebody has attempted the same successfully. Any help or pointers are greatly appreciated. Thanks.
What you always end up discovering is that a half-baked solution is half-baked. To parse JSP or JS (for the latter you really mean HTML with embedded JS?), you need parsers that will handle JSP and HTML. An XML grammar that merely looks like HTML won't cut it; you'll just end up with parsing errors.
So, you can try to bend ANTLR's XML parser, and with sufficient effort you might succeed. But then this project turns into one of bending parsers, rather than doing localization.
Our DMS Software Reengineering Toolkit has full parsers for JSP and for JavaScript embedded in HTML, and for HTML. These parsers build full ASTs automatically and make their content available for custom output purposes.
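For the plain-JS half of the problem only, a rough sketch of pulling string literals out of a JS file with the esprima tokenizer is below (Node.js and the esprima npm package are assumed; it does not handle JSP at all, and it naively treats every string literal as static content, which is far too broad for real localization work):

// Rough sketch: collect string literals from a JS source file into key=value lines
// suitable for a property bundle. Node.js and the esprima package are assumed.
const fs = require('fs');
const esprima = require('esprima');

const source = fs.readFileSync(process.argv[2], 'utf8');
const strings = esprima
  .tokenize(source)
  .filter((token) => token.type === 'String')
  .map((token) => token.value.slice(1, -1)); // strip the surrounding quotes

strings.forEach((text, index) => {
  console.log(`string.${index}=${text}`);
});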

What is the best way to convert HTML into Excel

I have an HTML page which has a Flash chart (FusionCharts) and an HTML table. I need to convert this whole thing into Excel. The HTML table should be displayed in cells of the Excel sheet. The Flash chart can be displayed as an image.
Is there any open source API that we could use to achieve this? Could you let me know what the possible options are?
Can this be done using JavaScript alone?
The HTML table is relatively easy. You can download the page, parse the HTML (there are various HTML parsing libraries available), extract the table and convert it into CSV (which Excel can load), or directly create an Excel file, e.g. using Apache POI for Java, as suggested in another answer.
The Flash part is significantly harder. There are quite a few tools available to capture flash to an image, you'd need to use one of them. This can be tricky, as Flash might be interactive, so you'd possibly have to remote-control the Flash part so it shows the right image before capturing. Hard to tell without more info.
That said, screen-scraping (which is what you're doing) is always labour-intensive and fragile. You should really push for a better interface to get your data from, it will save loads of hassle in the long run.
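To answer the "JavaScript alone" part for the table: a minimal in-browser sketch that turns the first table on the page into CSV text is below. It does no quoting of embedded commas or newlines, so treat it as a starting point, not a finished converter:

// Sketch: convert the first HTML table on the page to CSV text in the browser.
// Naive about quoting and embedded commas; it just shows the shape of the approach.
function tableToCsv(table) {
  return Array.from(table.querySelectorAll('tr'))
    .map((row) =>
      Array.from(row.querySelectorAll('th, td'))
        .map((cell) => cell.textContent.trim())
        .join(',')
    )
    .join('\n');
}

const csv = tableToCsv(document.querySelector('table'));
console.log(csv); // or offer it as a download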
Just set the content type of the page to "application/vnd.ms-excel". If the HTML page is just a table, it will open in Excel and look perfect. You can even add background colors and font styles.
Try some of these content types
application/excel
application/vnd.ms-excel
application/x-excel
application/x-msexcel
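For example, a minimal Node sketch that serves an HTML table with one of those content types, so the browser hands it straight to Excel (the table string is placeholder data):

// Sketch: serve an HTML table with an Excel content type so the browser opens it in Excel.
// Plain Node http server; the table markup is placeholder data for illustration.
const http = require('http');

const html = `<table>
  <tr><th>Name</th><th>Sales</th></tr>
  <tr><td>Alice</td><td>120</td></tr>
</table>`;

http
  .createServer((req, res) => {
    res.writeHead(200, {
      'Content-Type': 'application/vnd.ms-excel',
      'Content-Disposition': 'attachment; filename="report.xls"',
    });
    res.end(html);
  })
  .listen(8080);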
Excel can convert HTML tables by default. The easiest way to force it to do this is to save the HTML file with an XLS extension. Excel will then open the XLS as if it were its native workbook.
There's a very good Java API, Apache POI, that would let you do that, but it's Java.
http://poi.apache.org/
If you're on Win32 you can also use Excel's COM API; there are quite a few tutorials on the net.
I cannot offer any advice on the Flash part, but I have done HTML table to Excel many times. Yes, Excel can open HTML tables, but most HTML tables out there have extraneous crap in them that makes parsing them consistently fragile.
CPAN module HTML::TableExtract is a wonderful module that allows you to focus on the non-presentation specific aspects of the table you are trying to extract. Just specify the column headings you are interested in and maybe specify the title or class of the table and you are mostly set. You might have to post process the rows returned a little, but that is considerably easier than dealing with the underlying tag soup in all its glory.
Further, for output to Excel format, stick with Spreadsheet::WriteExcel rather than the OLE interface. That way, you do not depend on having Excel installed for your program to work and things go a little faster.
Make sure you specify the data type of cells if you do not want content to be changed automatically by Excel upon opening the files (another reason I do not like sending around CSV files). Use a configuration file for formatting information so that you can change how the spreadsheet looks without having to change the program.
You can always use Excel's built-in charting facilities to replace the web site graphs.
This combination has enabled me to generate pretty good looking documents comprising several hundreds of megabytes of scraped data (with logos and image links etc) using just a few hundred lines of Perl and a couple of days' work.
What you're trying to do is fragile and difficult to maintain. You should attempt to create a CSV feed to fetch the data. All it takes is for someone to come along and modify the HTML, and your scraper will throw up on it (probably years after anyone remembers how your program works).
Try to get CSV and image data from the original source (ie, database or whatever) and build the Excel file from that.
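As a concrete example of building it from the original data, here is a browser-side sketch that turns an array of records into CSV and offers it as a download; the records array is placeholder data standing in for whatever your database actually returns:

// Sketch: build CSV straight from source data and trigger a client-side download.
// The records array is placeholder data for illustration.
const records = [
  { name: 'Alice', sales: 120 },
  { name: 'Bob', sales: 95 },
];

const header = 'Name,Sales';
const rows = records.map((r) => `${r.name},${r.sales}`);
const csv = [header, ...rows].join('\n');

const blob = new Blob([csv], { type: 'text/csv' });
const link = document.createElement('a');
link.href = URL.createObjectURL(blob);
link.download = 'report.csv';
link.click();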
I will add to SpliFF's answer that when you have your data as a CSV file, you can set the MIME type of the page to application/vnd.ms-excel, which will open the page in Excel.
