Tesseract in a specific information

Tesseract in a specific information - javascript

I want to scan a Spanish DNI ang get some information and print it in the screen. A DNI has this form: 1
And i want to take the fields DNI, Nombre and Apellidos (in the image, it would be 99999999R, CARMEN, ESPAÑOLA ESPAÑOLA).
I thought that the best way is using "cut tool" and use the OCR in the cut images. What do you think? I have to make the project in HTML/JS and I don't really know how to program this.
Thanks.

This is not an easy task and to do it, you need to do the following:
Make sure you "cut" the image precisely around the borders. This method needs to be robust to lightning conditions, low contrast situations, etc. Ideally, it should use advanced computer vision and ML techniques
Then you need to define where the individual fields are. This is also not an easy task, because the sizes and positions of the fields vary between different IDs.
In the final step, you need to have a very reliable OCR tool, one which would give you a low error rate, so that you actually have a benefit of doing this automatically, compared to just retyping all these fields manually. Although OCR seems like an easy problem today, it's still very hard, especially on ID documents which can be worn out and damaged and taken in weird lighting conditions.
My company Microblink has spent years working on ID scanning, not just for Spanish DNIs, but also for many other document types (there are more than 5000 different types in the world).
If you are interested in reading how we're doing it, here are some of the materials:
Goodbye Templates
BlinkID v5
From OCR to DeepOCR
As for the "cut tool" - we do have a feature that allows you to automatically capture the image of a document and crop it around the edges of the document. We call it "Document capture" and it's a part of our BlinkID SDK.
As for the HTML/JS - it's not clear what exactly you need, but we do have a React Native and Cordova plugins which allow you to build cross-platform mobile apps in JS, and we also have a Frontend SDK and Web API which allow you to scan documents in any browser.

Related

OCR a scanned file and retrieve the metadata

I am using Alfresco community 6.1.
I have thousands of invoices to scan, OCR them (near 100% recognition) and retrieve the needed metadata (Partner, Invoice Number, Amount, Units,Currency,...).(All of this in Alfresco)
Based on these metadata retrieved i need to do some operations on the invoices ( Move them to appropriate folders, apply some workflows...).
As a first approche:
For the OCR I used Alfresco Simple OCR Action, but the result is not very accurate (far from 100%).
For retrieving the results I convert the PDF OCRed to a plain text file and then i search it's content using javascript with document.content ... But since the OCR is not accurate i can't tell if it's the best solution to search inside the document.
So my questions are :
How can I make the OCR results more accurate?
How to retrieve important data from the invoice? is the method i'm using good enough or very poor for such processing?
Im using pdfsandwich, and my alfresco-global.properties is:
ocr.command=/usr/bin/pdfsandwich
ocr.output.verbose=true
ocr.output.file.prefix.command=-o
ocr.extra.commands=-verbose -lang eng
ocr.server.os=linux

I'm afraid this question is off topic: https://stackoverflow.com/help/on-topic
Some input anyway:
I highly recommend to do all the ocr/classification/extraction outside / before storing the pdfs in Alfresco
The technical term for what you're looking for is: Document Capture
If you really expect to classify your scanned docs and to extract the data for inbound documents (which you can't control in structure) the solutions are quite expensive and licensed per pages/period. Market leaders are Kofax and Abbyy in that area.
If you can control the document structure / if the structure of the document is fix you could use quite cheaper solutions which use something like a dynamic template approach (depending on found ancor points, barcodes, regex matches). We use PDFmdx for this to automate qualified extraction.
Everything depends on the OCR quality. My personal opinion: the free/open source ocr components can't compete with the commercial solutions if you don't have the time, exprtise and resources to train and optimize them. Abbyy has a quite affordable CLI solution for linux (ABBYY FineReader Engine CLI for Linux) but I'm sure there are others with similar results.
There is a quite nice and simple solution called AutoOCR which is a REST-/SOAP-Service providing a generic, configurable interface to use several ocr engines and configurations as a service. We implemented an Alfresco integration to act as an Alfresco Transformer but since the Alfresco Transformer framework is deprecated I'd recommend to do the whole ocr and recognition stuff before storing the documents in Alfresco
Finally: if it is a one time approach: Try to find a service provider doing at least the ocr and maybe also the classification/extraction.

To answer your questions.
To improve OCR results you need to pre-process image. That includes noise removal, line removal, thresholding, etc. But none of them helps if the engine is not working precisely. Tesseract from version 4.0.0 is working well enough for most applications.
Your approach may work in some cases but it will not work great on a large set of invoices. I suggest using some of the invoice data extraction services. In that case, you don't need to worry about preprocessing and extraction itself. You could use:
typless
Klippa
Taggun
Using such a service can save you a lot of headaches and time.
Disclaimer: I am one of the creators of typless. Feel free to suggest edits.

R & Javascript callbacks

I'm writing a UI to my R script, which asks the user some names of organisms and the location of a folder, using javascript/html that will be local (not hosted, ever).
At the moment, I have just that: a couple of text boxes that take input and pass an executable R script. Originally this UI was being written as a very user friendly option, but slowly I've realized that some nifty tricks can be added such as a textbox that completes the word for the user (so if the user misspells the name of the organism, the UI will correct the input based on the files uploaded. And this would come from a list of organisms text file that R would generate immediately once the files have been added).
Is there a way to make this more efficient? For example, retrieving plots from R (as .pngs) and updating my local webpage and being able to share a log file between R and the UI (mind you, I am aware of the potential File I/O errors)..but for the sake of brainstorming.
I'm aware of Shiny, but what I would like is a simple local UI, as I will be dealing with big data (average ~ 1 gigabyte worth of files that my script will process).
Another way to ask my question that is more to the point:
Here's an example of integrating PHP and R: http://www.r-bloggers.com/integrating-php-and-r/
I am looking to create something similar with javascript/css/html/jquery etc.
Thanks

You could definitely use nodejs (nodejs.org) for that. Take a look at https://github.com/elijah/r-node and r-node. Confusingly enough, this is two different projects with the same name. More info on the latter here: squirelove.net/r-node/doku.php
In recent years JavaScript has become one of the fastest programming languages. In one case I know of, JavaScript is faster than C++. See: benchmarksgame.alioth.debian.org/u32/performance.php?test=regexdna
Bear in mind, though, that memory is very difficult to manage in JavaScript, so you should run some sort of memory leak detection program on your code, if you plan to create long running processes.
E.I: memwatch (npmjs.org/package/memwatch) or nodeheap (npmjs.org/package/memwatch)
Good luck with your endeavors!
PS. sorry for the lack of real links. I'm apparently not allowed to post more than 2 links.

Why wouldn't you be able to use Shiny locally? You design your app on your computer and run it locally with runApp('myapp') from an R-prompt. Unless you are experienced with javascript I would give shiny another look: http://www.rstudio.com/shiny/
The example you linked to can be very easily implemented using Shiny. See link below for a tutorial on how to write the app:
http://rstudio.github.com/shiny/tutorial/#hello-shiny
To run that example locally:
install.packages('shiny')
shiny::runExample('01_hello')

I have a similar case, and shiny looked like a good idea to me. However, after I did a few first steps, I am no longer sure about this. Note that most of the examples use shiny to display results. When you get into editing some fields and using a database, things can become messy; the reactive-ness gets in the way once fields can be change by program and by the user.
As an example see https://gist.github.com/dmenne/4721235/edit. The main problem for the current state of shiny is that you must use the dynamic UI for this type of work, which kills any separation of ui and server because you have to create the ui elements in the server.
shiny is a great idea, but for anything larger with interaction it is too early now. Knowing that the amazing RStudio team is behind it, I am sure the stress should be on now.
What else is there around to make user interfaces for R? TclTk makes me shudder. I working in c# a lot, and I had been using R(D)COM for interfacing some years ago, but gave up after installation and licensing problems. There is R.DOTNet which works better now; it is the most hazzle-free installation-wise, but it is not a very active project, and tends to crash. Interfacing via RServe/RServeCLI is stable, but is too difficult to install on Windows, for example on hospital computers with their strict security issues.
And there is Qt. With the active RInside community, it would be a good choice and the interface is great. I wish however my programming skills were at the level of the RStudio-guys. The fact that even Dirk is one the proof-of-concept level (using rinside with qt in windows) is not encouraging.

Is there a way to write javascript interface requirements?

I'm the leader of my development team, consisting of:
Back end developers (PHP and API's /Frameworks, CMS backend, server side technologies)
Front-end developers (JavaScript with APIs, client side technologies)
HTML/CSS Ninjas that work closely with the graphics ppl
other staff related to development process
Lately, projects have many requirements related to user interface, lots of them comming from the graphics / creativity ppl and the user itself. The requirements sound like "when the user hovers the logo, the letters should scatter in horizontal opposite directions, and the logo should fade-in while moving up in a smooth movement animation where the speed decreases while the logo reaches the target". It's my duty to document these requirements to send them to the front-end developers.
I was wondering if there's any way to document such things in a way that's good for everyone. Lately, describing animations and such has been a pain, and the documents are good for nothing.
My enterprise is in a position where the creativity staff and the javascript staff can communicate with each other directly, but we are having trouble monitoring the process, estimating times / effort and filling up metrics.
Can any one give some idea to document such things? I'm sure it's not only happening to us... I'm loooking for an organized / structured way to make a document / whatever that I can give the javascript ppl (along with the HTML /CSS that make up the web page), that they will surely understand without even asking the creativity ppl directly, allowing them to start working immediately without further communications.

How about create primitive mockups of the intended animations, say, in PowerPoint or flash? The creativity team wouldn't need any programming skills to use those, and nothing brings graphical stuff better across then graphics. Even pictures showing the desired start and end state of animation, maybe with supporting arrows etc., would probably be much more helpful than text (or, at least, text itself).

Saving Div Content As Image On Server

I have been learning a bit of jQuery and .Net in VB. I have created a product customize tool of sorts that basically layers up divs and add's text, images etc on top of a tshirt.
I'm stuck on an important stage!
I need to be able to convert the content of the div that wraps all these divs of text and images to one flat image taking into account any CSS that has been applied to it also.
I have heard of things that I could use to screen capture the content of a browser on the server which could be possible for low res thumbs etc, but it sounds a little troublesome! and it would really be nice to create an image of high res.
I have also heard to converting the html to html5 canvas then writing that out... but looks too complicated for me to fathom and browser support is an issue.
Is this possible in .NET?
Perhaps something with javascript could be done?
Any help or guidance in the correct direction would be appreciated!
EDIT:
I'm thinking perhaps I could do with two solutions for this. Ideally I would end up with a normal res jpg/png etc for displaying on the website, But also a print ready high res file would be very desirable as well.
PostScript Printer - I have heard of it but I'm struggling to find a good resource to understand it for a beginner (especially with wiki black out). Perhaps I could create a html page from my div content and send it to print to a EPS file. Anyone know any good tutorials for this?

We did this... about 10 years ago. Interestingly, the tech available really hasn't changed too much.
update - Best Answer
Spreadshirt licenses their product: http://blog.spreadshirt.net/uk/2007/11/27/everyones-a-designer-free-designers-for-premium-partners/
Just license it. Don't do this yourself, unless you have real graphics manipulating and print production experience. I'd say in today's world you're looking at somewhere around 4,000 to 5,000 hours of dev time to duplicate what they did... And that's if you have two top tier people working on it.
Short answer: you can't do it in html.
Slightly longer answer:
It doesn't work in part because you can't screen cap the client side and get the level of resolution needed for production type printing. Modern screen resolution is usually on the order of 100 ppi. For a decent print you really need something between 3 and 6 times that density. Otherwise you'll have lots of pixelation and it will generally look like crap when it comes out.
A different Answer:
Your best bet is to leverage something like SVG (scalable vector graphics) and provide a type of drawing surface to the browser. There are several ways of doing this using Flash (Spreadshirt.com uses this) or Silverlight (not recommended). We used flash and it was pretty good.
You might be able to get away with using HTML 5. Regardless, whatever path you pick is going to be complicated.
Once the user is happy with their drawing and wants to print it out, you create the final file and run a process to convert it to Postscript or whatever format your t-shirt provider needs. The converter (aka RIP software) is going to either take a long time to develop or cost a bunch of money... pick one. (helpful hint: buy it. Back then, we spent around $20k US and it was far cheaper than trying to develop).
Of course, this ignores issues such as color matching and calibration. This was actually our primary problem. Everyone's monitor is slightly different and what looks like red on one machine is pink on another.
And for a little background, we were doing customized wrapping paper. The user added text, selected images from our library or uploaded their own, and picked a pattern. Our prints came out on large-format HP Inkjet printers (36" and 60" wide). Ultimately we spent between $200k and $300k just on dev resources to make it happen... and it did, unfortunately, the price point we had to sell at was too high for the market.

If you can use some server-side tool, check phantomjs. This is a headless webkit browser (with no gui) which can take a page's screenshot, an uses a javascript api. It should do the trick.

Send the whole div with user generated content back to server using ajax call.
Generate an HTML Document on server using 'HtmlTextWriter' class.
Then you can convert that HTML file using external tools like
(1) http://www.officeconvert.com/products_website_to_image.htm#easyhtmlsnapshot
(2) http://html-to-image.acasystems.com/faq-html-to-picture.htm
which are not free tools, but you can use them by creating new Process on server.

The best option I came across is wkhtmltopdf. It comes with a tool called wkhtmltoimage. It uses QtWebKit (A Qt port of the WebKit rendering engine) to render a web page, and converts the result to PDF or image format of your choice, all done at server side.
Because it uses WebKit, it renders everything (images, css and even javascript) just like a modern browser does. In my use case, the results have been very satisfying and are almost identical to what browsers would render.
To start, you may want to look at how to run external tools in .NET:
Execute an external EXE with C#.NET

Drawing an Interactive Diagram

My boss wants to draw the local network and then, if you click on one of the computers or roll the mouse over one, he wants to see stuff like RAM, CPU, OS, etc. This has to be done in a browser, more specifically, the intranet's wiki.
One of my coworkers suggested using flash (I am a complete noob but I assume ActionScript is what would be used?) and I think it could also be done in javascript but I dunno. Not sure what would be better.
He wants it to be extensible if possible, so adding another computer later or editing values shouldn't be too hard, though the topology shouldn't change very often. This may be up to me to code a separated way to edit it though, I dunno.
Any thoughts/recommendations?

Up until one of the recent versions of Visio (2003?) there was a very useful network discovery tool that would build the diagram. Then using the Save As HTML option there are a number of different ways to build the clickable diagram.
I'd imagine other network diagrammers can do the same.
This is the easiest way I've ever found to do what you want. The only downside I'm aware of is that the Visio discovery will send a significant number of packets; it can flood a network. However in my opinion if your network is that susceptable to load you want to know ASAP. Preferably with a job you can stop and start at will.
Don;t forget that any process you arrive at should be able to rebuild your diagram regularly, e.g. once a week or a month.

The technology you want might be called an HTML Image Map.

It might be cost-effective to purchase a monitoring system.
OpManager is pretty easy to use and provides the graphing capabilities.
Paessler is less expensive, still details, but doesn't have the graphing capabilities.

Develop Reference

JavaScript is the programming language of the Web.