The document understanding research community has traditionally focused much of its effort on methods and algorithms for automated document segmentation, recognition and classification. However, these methods and algorithms are usually developed within highly specific research projects, and few initiatives have tried to combine them into a generic, modular framework. Moreover, fully automatic document processing is still restricted to a few specific applications, and the cost of manual post-correction is often underestimated.
During my stay at the Department of Informatics of the University of Fribourg, I worked on the design of a modular framework for document understanding. Our hope was that this framework would benefit both researchers and end-users by supporting the rapid prototyping and incremental development of document processing applications. Rather than designing it to replace the human, we wanted it to be cooperative, allowing the user and the system to work together. We also wanted to take advantage of XML and Web technologies (i.e., URLs, HTTP and HTML) to make the framework as open and flexible as possible.
The result of this work is a framework named Edelweiss, which is described in . A few screenshots are also available.