Get html form from xsd - javascript

I have a quite complex xsd file that describe some objects (it's not important, but it's the DATEX II standard)
Do you know if there is an automatic way to create an html form that act like a "wizard" to guide the user to create xml object as described in the xsd?

The answer to this depends on the intended user base, how you want your users to access your forms, and the technology stack you have in place already or you're willing to deploy.
If your users are quality control analysts, and so the intent is to have them use that generated UI to manage test cases, then a handful of commercial tools have this ability. A quick search on Google for terms such as "generate ui forms from XSD to test web services" should give you on first page the major players in this space (I won't name names to avoid conflict of interests). There are differences in how vendors approach this, that have to do with the time it takes to generate these forms from large bodies of XML Schema, which in turn translate into different degrees of usability. Given what I see in DATEX, from a complexity perspective, you may have a hard time to find a free tool for this...
If your users are rather data entry specialists, then the above are not the tools you want them to use. Without knowing much about your environment (I see your java-ee tag, but still not clear how it would relate to this task), one model could be a combination of InfoPath with SharePoint; while the process to generate the form is not fully automatic, it comes close to that. It is XSD driven, in the sense that at design time you drag and drop XSD on a design form, that allows you to build some really nice UI. Follow their competition on your particular technology stack and you may have your answer. Or, you can go to this site that lists XForms implementations; IBM's form designer, much like InfoPath, can use XML Schema for design, etc.
If this is for developers to get some XML, another alternative might also be to go with an Excel based (or SharePoint lists) approach and generate XML from that data (you give away cost to acquire something to build specific to your requirements tooling, here assuming people that are really familiar with spreadsheets instead).
Given how DATEX model looks like, you'll have to do some manual customizations anyway, if you plan to use the extensibility model, or if you choose to build different forms for different scenarios i.e. instead of one big form that'll give you all descendents for the abstract payloadPublication in some drop down, to just a specific, simple form e.g. MeasurementSiteTablePublication.

I know this is an old question, but I ran into this issue myself and couldn't find a satisfying open-source solution. I ended up writing my own implementation, XSD2HTML2XML. To my knowledge it is the most complete tool out there. It supports automatic generation of HTML forms from XSD, including the population of a form with XML data.
If you prefer an out-of-the-box solution rather than writing your own, please see my implementation, XML Schema Form Generator.

Related

OCR a scanned file and retrieve the metadata

I am using Alfresco community 6.1.
I have thousands of invoices to scan, OCR them (near 100% recognition) and retrieve the needed metadata (Partner, Invoice Number, Amount, Units,Currency,...).(All of this in Alfresco)
Based on these metadata retrieved i need to do some operations on the invoices ( Move them to appropriate folders, apply some workflows...).
As a first approche:
For the OCR I used Alfresco Simple OCR Action, but the result is not very accurate (far from 100%).
For retrieving the results I convert the PDF OCRed to a plain text file and then i search it's content using javascript with document.content ... But since the OCR is not accurate i can't tell if it's the best solution to search inside the document.
So my questions are :
How can I make the OCR results more accurate?
How to retrieve important data from the invoice? is the method i'm using good enough or very poor for such processing?
Im using pdfsandwich, and my alfresco-global.properties is:
ocr.command=/usr/bin/pdfsandwich
ocr.output.verbose=true
ocr.output.file.prefix.command=-o
ocr.extra.commands=-verbose -lang eng
ocr.server.os=linux
I'm afraid this question is off topic: https://stackoverflow.com/help/on-topic
Some input anyway:
I highly recommend to do all the ocr/classification/extraction outside / before storing the pdfs in Alfresco
The technical term for what you're looking for is: Document Capture
If you really expect to classify your scanned docs and to extract the data for inbound documents (which you can't control in structure) the solutions are quite expensive and licensed per pages/period. Market leaders are Kofax and Abbyy in that area.
If you can control the document structure / if the structure of the document is fix you could use quite cheaper solutions which use something like a dynamic template approach (depending on found ancor points, barcodes, regex matches). We use PDFmdx for this to automate qualified extraction.
Everything depends on the OCR quality. My personal opinion: the free/open source ocr components can't compete with the commercial solutions if you don't have the time, exprtise and resources to train and optimize them. Abbyy has a quite affordable CLI solution for linux (ABBYY FineReader Engine CLI for Linux) but I'm sure there are others with similar results.
There is a quite nice and simple solution called AutoOCR which is a REST-/SOAP-Service providing a generic, configurable interface to use several ocr engines and configurations as a service. We implemented an Alfresco integration to act as an Alfresco Transformer but since the Alfresco Transformer framework is deprecated I'd recommend to do the whole ocr and recognition stuff before storing the documents in Alfresco
Finally: if it is a one time approach: Try to find a service provider doing at least the ocr and maybe also the classification/extraction.
To answer your questions.
To improve OCR results you need to pre-process image. That includes noise removal, line removal, thresholding, etc. But none of them helps if the engine is not working precisely. Tesseract from version 4.0.0 is working well enough for most applications.
Your approach may work in some cases but it will not work great on a large set of invoices. I suggest using some of the invoice data extraction services. In that case, you don't need to worry about preprocessing and extraction itself. You could use:
typless
Klippa
Taggun
Using such a service can save you a lot of headaches and time.
Disclaimer: I am one of the creators of typless. Feel free to suggest edits.

What is a good example of a strategy for achieving SEO-friendliness in a javascript-heavy application?

Intro
I know this has been asked before but the questions I found were either to specific or to general to provoke the kind of answer I was looking for. The best possible answer I can imagine would be an example using backbone and the least amount of server-side logic possible (no preferred language/framework there).
Problem
I am planning an javascript/ajax-heavy (backbone + mostly-json backend) application that implements a facetted search. Take for example a facetted search of a simple shoe shop application that lets you filter color, brand and type of shoes and sort by price and size or whatever else.
Assume I am using backbone or a similar framework on the client and a json service as a backend.
What would be a good (tradeoff between effort and result) strategy to achieve seo-friendliness as well as a snappy interface?
Resources
A solution that came to my attention is Hijax by reusing client-sided templates on the server-side, as described here: http://duganchen.ca/single-page-web-app-architecture-done-right
Resources that I digested without final conclusion
http://code.google.com/intl/de-DE/web/ajaxcrawling/
https://stackoverflow.com/a/6194427/818846
http://www.quora.com/Search-Engine-Optimization-SEO/If-I-have-data-that-loads-using-json-JavaScript-will-it-get-indexed-by-Google?q=seo+javascript
The general point in SEO friendliness: It should work without JavaScript.
It's also good for accessibility, so you should do it like this, if the user does not have JavaScript enabled (like the search engine does), it will work.
If he has JavaScript enabled, (like any sane human being does), it will work with all the nifty JavaScript features you've added.
As a general usability rule of thumb: If it works, it should also work without JavaScript!
The solution of your first link sounds right. The main issue of a single page app is that you have to render your templates on both sides, the backend and the frontend. Using the Mustache or google closures template will be good solution for this.
The same solution that was used for google+, where initially the side will be rendered on the server and you load a static html page, after that the page will be rendered on the client side but with the same templates as on the server.
Also remember that the search engines follow links much more often than they (ever?) complete forms.
This problem of enabling the crawlers to see your db contents is called the "dark web," "invisible web", "deep web" or "hidden web". Blog post
So re your problem statement:
a facetted search of a simple shoe shop application that lets you filter color, brand and type of shoes and sort by price and size or whatever else.
I'd suggest that you include searches via a hierarchy of links in addition to searching via forms with select fields.
Eg, on a secondary menu include all the different brands as individual links. Then each link should lead to a list of the products sold by that brand. The trick is arrange things so that the link to an individual shoe will take you back to the first page (the rich one page app) but showing the specific shoe. -- And the page should implement the Google Ajax-crawling recommendations that you reference in the OP.

Accessibility and all these JavaScript frameworks

I've been investigating a few of the JavaScript frameworks such as Backbone.js and Batman.js for a while and whilst I really like them, I have one niggling thing that I keep coming back to. That issue is accessibility.
As a web developer I've always tried to make my websites and applications with accessibility in mind, especially using the idea of progressive enhancement.
Clearly out of the box these new JS frameworks don't gracefully degrade, so I was wondering what are other developers thoughts on this issue and what are you doing about it. After all the accessibility of a website / app isn't really an optional thing as it's part of the law in many countries.
Maybe I'm just being overly zealous on this subject, and not appreciating how far things have come in terms of accessibility.
I use a js-framework (spine.js in my case) in my latest site. Still I make sure that non-js browsers (certainly not over zealous: think SEO) can navigate my site and digest the contents.
As an example I'm going with a search-page with products being shown. Products can be paged, filtered, sorted. Of course this is an example of the generalized idea.
PREREQ: use a template-engine that can both render server-side and client-side. (I use Mustache) . This makes sure you can render models without js- through server-side templating, and render models with js through client-side templating.
Initially: render the products using server-side mustache-template. Also include a 'bootstrapJSON'-object which contains the same products in JSON-format.
Initially: all links (product-detail page, paging, sorting, filtering) are real server-side urls (no hashbang urls)
The end-result is a page which can be navigated 100% with paging, sorting, filtering without the use of JS.
all paging,sorting, filtering urls result in a request to the server, which in turn results in a new set of products being rendered. Nothing special here.
JS-enabled - on domload:
fetch the bootstrapJSON and make product-models from it (use your js-framework features to do this) .
Afterwards rerender the products using the same mustache-template but now doing it client-side. (Again using your js-framework).
Visually nothing should change (after all server-side and client-side rendering was done on same models, with same template), but at least now there's a binding between the client-side model and the view.
transform urls to hashbang-urls. (e.g: /products/#sort-price-asc ) and use your js-framework features to wire the events.
now every (filtering, paging, sorting ) url should result in a client-side state-change, which would probably result in your js-framework doing an ajax-request to the server to return new products (in JSON-format) . Rerendering this again on the client should result in your updated view.
The logic part of the code to handle the ajax-request in 6. on the server-side is 100% identical to the code used in 4. Differentiate between an ajax-call and an ordinary request and spit out the products in JSON or html (using mustache server-side) respectively.
EDIT: UPDTATE JAN 2013
Since this question/answer is getting some reasonable traction I thought I'd share some closely-related aha-moments of the last year:
Spitting out JSON and rendering it client-side with your client-side mvc of choice (steps 6. and 7. above) can be pretty costly cpu-wise. This, of course, is especially apparent on mobile-devices.
I've done some testing to return html-snippets on ajax (using server-side mustache-template rendering) instead of doing the same on the client-side as suggested in my answer above. Depending on your client-device it can be up to 10 times faster (1000ms -> 100ms) , of course your mileage may vary. (practically no code changes needed, since step 7. could already do both)
Of course, when no JSON is returned there's no way for a client-side MVC to build models, manage events, etc. So why keep a clientside MVC at all? To be honest, with even very complex searchpages in hindsight I don't have much use for client-side mvc's at all. The only real benefit to me is that they help to clearly separate out logic on the client, but you should already be doing that on your own imho. Consequently, stripping out client-side MVC is on the todo.
Oh yeah, I traded in Mustache with Hogan (same syntax, a bit more functionality, but most of all extremely performant!) Was able to do so because I switched the backend from java to Node.js (which rocks imho)
Since I'm a visually-impaired user and web developer, I'll chime in here.
These frameworks, in my experience, haven't been a problem provided the appropriate steps are taken with regard to accessibility.
Many screen readers understand JavaScript, and we as developers can improve the experience using things like HTML5's aria-live attribute to alert screen readers that things are changing, and we can use the role attribute to provide additional hints to the screenreaders.
However, the basic principle of web development with JavaScript is that we should develop the underlying site first, without JavaScript, and then use that solid, working, and tested foundation to provide better features. The use of JS should not be required to purchase a product, receive services, or get information. And some users disable JavaScript because it interferes with the way their screenreaders work.
Doing a complete Backbone.js or Knockout site from the ground up without regard for accessibility will result in something akin to "new Twitter" which fails extremely hard with many screenreaders. But Twitter has a solid foundation and so we can use other means to access the platform. Grafting Backbone onto an existing site that has a well-crafted API is quite doable, and an awful lot of fun, too.
So basically, these frameworks themselves are no more of an accessibility issue than jQUery itself - the developer needs to craft a user experience that works for everyone.
Any webpage that requires javascript in order to get the content out of it will likely be met with accessibility-related challenges. The accessibility of JavaScript frameworks is definitely an issue of contention, though really, any web application suffers drawbacks when content is provided dynamically, regardless of the framework used.
There's no silver bullet to ensure your site will be accessible, and I certainly can't account for every JavaScript framework. Here's a few thoughts about how you can prevent your site from being totally inaccessible when using JavaScript:
Follow the guidelines from WCAG 2.0 on client-side scripting, and WCAG 2.0 in general.
Avoid frameworks that require you generate the page's UI, controls and/or content entirely through javascript such as Uki.js, or ones that use their own proprietary markup, like Jo. The closer you can stick with static(-ish), semantic HTML content, the better off you'll be.
Consider using ARIA roles such as role="application" and the aria-live attribute to indicate the areas of your page which are dynamic. More and more aria roles are being supported by assistive devices as time goes by, so using these aria attributes makes sense when you can add them to your app appropriately.
In terms of JS libraries, check their source and see if they output any aria roles. They might not be perfectly accessible, but it would demonstrate they're considering assistive devices.
Wherever possible, treat JavaScript as an enhancement rather than a necessity. Try to provide alternative methods or workflows to accessing the important information that don't require dynamic page updates.
Test and validate your app with your users! Do some user testing sessions with people who use assistive devices or have other difficulties using web software. Nothing will help you prove your site is accessible more than watching real people use it.
The last point is the most important, though many try to escape it. Regardless of the technology, the fact remains that you're developing an application that people will use. No machine or theory will ever be able to perfectly validate your application as being usable, but you're not building it for machines anyway. Right? :)
Chris Blouch (AOL) and Hans Hillen (TPG) had a good presentation on this regarding jQuery, including the work they do in reviewing for accessibility. Making Rich Internet Applications Accessible Through JQuery That and another related presentation on Accessibility of HTML5 and Rich Internet Applications (http://www.paciellogroup.com/training/CSUN2012/) should be of interest to you.
My money is on choosing the most accessible framework: jQuery provides a great deal of graceful degradation or progressive enhancement fallback as well as an overall pretty good focus on accessibility. Also, indirectly I help test and review several systems that leverage jQuery (Drupal public and Intranet websites) such that defects found for accessibility are found and routed back to the project for fixes.

dynamic web forms

I'm developing a web application that allows reports to be written and viewed online. These reports will have the structure of a typical school report or annual employee appraisal report. I would like the user to be able to customise the structure of their report. For example, one school might want a report in the format
Subject Comment Score
-----------------------------
English He sucks 20%
Maths He rocks 88%
Science About average 70%
whereas another might want
Subject Grade
---------------
English A
Maths B
Science C
What I'm looking for is a way for each school to specify the format of their reports - possibly some kind of JavaScript form-building library. Such a library could be used in a page that allows the uses to build a form which would be used as a template for their reports.
As I'll need to process each report submitted on the server-side, I'll need to capture some semantics about each field. For example, it would be great if the user could specify whether the answer to each question on the report should be plain text, a numerical score, a checkbox, radio buttons, etc
Any suggestions about useful technologies for handling such "dynamic" forms would be really appreciated. XForms looks like it might be relevant, but I haven't dug into it too deeply yet.
Cheers,
Don
A very nice XForms based form builder, (LGPL) http://www.orbeon.com/
You can check out their form builder demo here: http://www.orbeon.com/ops/fr/orbeon/builder/summary/
I agree with Jeff Beck's comments and also noticed the following.
You said your target audience is non-technical and all of the solutions above are going to involve learning HTML and a complex template language, possibly a non-starter for your audience.
The solutions above also seem to need more complexity than your problem requires. MooTools, Dojo, etc. seem like overkill. XForms and XSLT even more so. Yes they'll work and give you a lot of extra functionality, but do you need the level of complexity and the issues of debugging/maintainability/training that go with those extra features?
Your regular teacher or business user probably has a basic understanding of how to enter and save files in Excel. If you can teach them how to save in CSV format and upload the form, or even better yet install a macro that will save to CSV and post it to your web site, then that's likely the only training they'll need. To get the semantics you can add a bit more training and have the first row of the report be the column names and the 2nd row be the column type. It's not elegant, but it is easy for possibly tech-challenged users to adopt, as Jeff points out.
On the server side I'd recommend the following stack:
Web server => node.js (perhaps using Chain - github.com/hassox/chain)
Data store => Redis (and node-redis)
Templating => Haml-js (github.com/creationix/haml-js)
CSV parsing => See http://purbayubudi.wordpress.com/2008/11/09/csv-parser-using-javascript/
and make sure to use the fixed version that's in the comments (for quoted commas).
Your more tech savvy users can customize the HAML without you compromising security, and HAML is pretty straightforward with a little training:
this HAML...
%body
.profile
.left.column
#date= print_date()
#address= current_user.address
.right.column
#email= current_user.email
#bio= current_user.bio
produces...
<div class="profile">
<div class="left column">
<div id="date">Thursday, October 8, 2009</div>
<div id="address">Richardson, TX</div>
</div>
<div class="right column">
<div id="email">tim#creationix.com</div>
<div id="bio">Experienced software professional...</div>
</div>
</div>
Pragmatic approach would be using google spreadsheet's feature called forms, (paid) services from wufoo or JotForm.
I would suggest to use:
a wiki engine or a plain-text to HTML converter such as Markdown to allow your users to customize the templates you provide
an HTML templating library to insert the data using the defined template
For the HTML converter, you may use John Gruber's MarkDown (in perl) on the server-side, or a Javascript port by John Fraser, Showdown.
For the HTML templating, there are many Javascript libraries available, depending on your framework of choice:
jTemplates, a plugin for jQuery
the Template API in Prototype
Wojo based on Dojo
tmpl.js for MooTools
I also designed one, bezen.template.js part of the bezen.org Javascript library
I'm in charge of XSLTForms, and it seems like a good candidate for what you want to do.
The possibilities for XSLTForms are superior to those of XForms 1.1 specification : XSLT at client-side, SVG, and others.
Dynamic forms can be developed with XForms and, in case it would not be enough for your application, XSLTForms could integrate necessary extensions.
Should be easy to build in Smalltalk with Seaside. You have a WATableReport with WATableColumns. Just build a simple editor where each school can define those columns.
I'm not sure what javascript or XForms have to do with it. As far as I know XForms is currently dead unless you can prescribe the browser.
i think if the forms are not changing too frequently, you should not provide an system for the non-tech user to mess up with the reports
rather make ur system easy for YOU to add new reports, in this case, customer send you an pdf/excel showing the format they want, and you can quickly come up with a new report
that is was i did for our accounting system which being used for several clinics, we also charge a nominal fee for each report changing (to prevent user mindlessly changing the system)

Reflective Web Application (WebIDE)

Preamble
So, this question has already been answered, but as it was my first question for this project, I'm going to continue to reference it in other questions I ask for this project.
For anyone who came from another question, here is the basic idea: Create a web app that can make it much easier to create other web applications or websites. To do this, you would basically create a modular site with "widgets" and then combine them into the final display pages. Each widget would likely have its own set of functions combined in a Class if you use Prototype or .prototype.fn otherwise.
Currently
I am working on getting the basics down: editing CSS, creating user JavaScript functions and dynamically finding their names/inputs, and other critical technical aspects of the project. Soon I will create a rough timeline of the features I wish to create. Soon after I do this, I intent to create a Blog of sorts to keep everyone informed of the project's status.
Original Question
Hello all, I am currently trying to formalize an idea I have for a personal project (which may turn into a professional one later on). The concept is a reflective web application. In other words, a web application that can build other web applications and is actively used to build and improve itself. Think of it as sort of a webapp IDE for creating webapps.
So before I start explaining it further, my question to all of you is this: What do you think would be some of the hardest challenges along the way and where would be the best place to start?
Now let me try to explain some of the aspects of this concept briefly here. I want this application to be as close to a WYSIWYG as possible, in that you have a display area which shows all or part of the website as it would appear. You should be free to browse it to get to the areas you want to work on and use a JavaScript debugger/console to ask "what would happen if...?" questions.
I intend for the webapps to be built up via components. In other words, the result would be a very modular webapp so that you can tweak things on a small or large scale with a fair amount of ease (generally it should be better than hand coding everything in <insert editor of choice>).
Once the website/webapp is done, this webapp should be able to produce all the code necessary to install and run the created website/webapp (so CSS, JavaScript, PHP, and PHP installer for the database).
Here are the few major challenges I've come up with so far:
Changing CSS on the fly
Implementing reflection in JavaScript
Accurate and brief DOM tree viewer
Allowing users to choose JavaScript libraries (i.e. Prototype, jQuery, Dojo, extJS, etc.)
Any other comments and suggestions are also welcome.
Edit 1: I really like the idea of AppJet and I will check it out in detail when I get the time this weekend. However, my only concern is that this is supposed to create code that can go onto others webservers, so while AppJet might be a great way for me to develop this app more rapidly, I still think I will have to generate PHP code for my users to put on their servers.
Also, when I feel this is ready for beta testers, I will certainly release it for free for everyone on this site. But I was thinking that out of beta I should follow a scheme similar to that of git: Free for open source apps, costs money for private/proprietary apps.
Conceptually, you would be building widgets, a widget factory, and a factory making factory.
So, you would have to find all the different types of interactions that could be possible in making a widget, between widgets, within a factory, and between multiple widget making factories to get an idea.
Something to keep on top of how far would be too far to abstract?
**I think you would need to be able to abstract a few layers completely for the application space itself. Then you'd have to build some management tool for it all. **
- Presentation, Workflow and the Data tier.
Presentation: You are either receiving feedback, or putting in input. Usually as a result of clicking, or entering something. A simple example is making dynamic web forms in a database. What would you have to store in a database about where it comes/goes from? This would probably make up the presentation layer. This would probably be the best exercise to start with to get a feel for what you may need to go with.
Workflow: it would be wise to build a simple workflow engine. I built one modeled on Windows Workflow that I had up and running in 2 days. It could set the initial event that should be run, etc. From a designer perspective, I would imagine a visio type program to link these events. The events in the workflow would then drive the presentation tier.
Data: You would have to store the data about the application as much as the data in the application. So, form, event, data structures could possibly be done by storing xml docs depending on whether you need to work with any of the data in the forms or not. The data of the application could also be stored in empty xml templates that you fill in, or in actual tables. At that point you'd have to create a table creation routine that would maintain a table for an app to the spec. Google has something like this with their google DB online.
Hope that helps. Share what you end up coming up with.
Why use PHP?
Appjet does something really similar using 100% Javascript on the client and server side with rhino.
This makes it easier for programmers to use your service, and easier for you to deploy. In fact even their data storage technique uses Javascript (simple native objects), which is a really powerful idea.

Categories

Resources