moz-hocr-edit
moz-hocr-edit provides a line-by-line interface for people to proofread the results of the Optical Character Recognition (OCR) process. OCR programs are not perfect at recognizing text, so human editing is often necessary.
END OF DEVELOPMENT: As of 2016, this program has been discontinued and is no longer maintained.
Install
Latest version: moz-hocr-edit 0.4.3 (released 2010-09-16)
moz-hocr-edit is a Firefox extension, so it must be installed in the Firefox web browser. From your perspective, it operates like an ordinary web application, except that it runs on your local machine and can save files anywhere you want.
Note: In 2011, Firefox switched to a (much criticized) six-week release cycle. I do not test this add-on myself against each version of Firefox, so please let me know if you experience any issues.
License
moz-hocr-edit is free software, available under the same MPL/GPL/LGPL tri-license as Mozilla itself.
Getting Started
moz-hocr-edit can edit local hOCR files (as generated by OCRopus), and it can edit files over HTTP if the remote server supports HTTP PUT. To edit a document, browse to its location in Firefox, then click on "hOCR" in the add-on bar (bottom right corner of Firefox) and choose "Edit this hOCR document." This launches the editing system in a new tab. In recent versions of Firefox, the add-on bar is hidden by default. If you do not see it, enable it from the menu: View → Toolbars → Add-on Bar.
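For readers unfamiliar with the format, here is a minimal Python sketch of what "line-by-line" means for an hOCR file. hOCR is ordinary HTML in which each recognized line is an element with class `ocr_line` and a `title` attribute carrying its bounding box. The sample fragment, the `OcrLineParser` class, and the `parse_bbox` helper are hypothetical names made up for illustration; this is not how moz-hocr-edit itself is implemented.

```python
from html.parser import HTMLParser

# Hypothetical sample: a tiny hOCR fragment with two recognized lines.
SAMPLE_HOCR = """<html><body><div class='ocr_page' title='bbox 0 0 800 1200'>
<span class='ocr_line' title='bbox 50 100 700 140'>Concerning Beards</span>
<span class='ocr_line' title='bbox 50 160 700 200'>by Edwin Valentine Mitchell</span>
</div></body></html>"""

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title string, or None if absent."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class OcrLineParser(HTMLParser):
    """Collect (bbox, text) for every element whose class includes ocr_line."""

    def __init__(self):
        super().__init__()
        self.lines = []     # finished (bbox, text) pairs
        self._bbox = None   # bounding box of the line currently being read
        self._text = []     # text chunks of the current line
        self._depth = 0     # nesting depth inside the current line (0 = outside)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self._depth:
            self._depth += 1
        elif "ocr_line" in (attrs.get("class") or "").split():
            self._depth = 1
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.lines.append((self._bbox, "".join(self._text).strip()))

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)


parser = OcrLineParser()
parser.feed(SAMPLE_HOCR)
for bbox, text in parser.lines:
    print(bbox, text)
```

Each pair gives the region of the page image a line came from and the text the OCR program produced for it, which is the unit a proofreader corrects.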
Sample Documents
moz-hocr-edit is being used to proofread Concerning Beards, a 1930 book by Edwin Valentine Mitchell. The hOCR file of this book makes excellent sample material for editing. You can open this document in moz-hocr-edit, but you can only save it to your local filesystem, since you probably do not have write access to the Subversion repository that hosts it.
If you are trying to set up your own Subversion repository for remote editing, the key step is to enable autoversioning.
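With autoversioning enabled (in Apache's mod_dav_svn, the `SVNAutoversioning on` directive), the server turns a plain HTTP PUT into a new revision, so any client that can issue a PUT can save an edited file. The following Python sketch shows roughly what such a save request looks like. The function names and the example URL are hypothetical, and this is an illustration of the HTTP exchange rather than the add-on's own code.

```python
import urllib.request


def build_put_request(url, hocr_text):
    """Build (but do not send) an HTTP PUT carrying the edited hOCR document."""
    return urllib.request.Request(
        url,
        data=hocr_text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/html; charset=utf-8"},
    )


def save_hocr(url, hocr_text):
    """Send the PUT and return the HTTP status code.

    urlopen raises urllib.error.HTTPError on a 4xx/5xx response, for example
    when the server does not allow PUT or you lack write access.
    """
    with urllib.request.urlopen(build_put_request(url, hocr_text)) as response:
        return response.status


# Hypothetical URL; replace it with a file in a repository you can write to.
request = build_put_request("http://example.org/svn/book/page1.html", "<html></html>")
print(request.get_method(), request.full_url)
```

If the server rejects the request, the repository is probably not configured for autoversioning or the account lacks write access.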
Feedback
I am very interested in feedback or suggestions for the program. Is it easy to use? Does it increase your productivity? Have you found any bugs? What else should a future version of moz-hocr-edit do?
Please send any feedback to moz-hocr-edit@googlegroups.com. Although you need not subscribe to send a message, you are encouraged to join the group.
Hacking
The XPI download includes all source code for the latest release. If you'd like to live on the bleeding edge, you can clone the project's Git repository:
$ git clone https://github.com/garrison/moz-hocr-edit.git
Information about setting up a development environment within Firefox is available at developer.mozilla.org.
You can also browse the code on GitHub.
Links
- hOCR format – public specification of the hOCR file format
- OCRopus – OCR software
- hocr-tools – tools for manipulating hOCR documents
- hOCR discussion group – also includes links to other hOCR software
- BookLiberator – hardware + software for digitizing your books
jimgarrison.org