Since distortions in camera-captured document images significantly affect the accuracy of optical character recognition (OCR), distortion removal plays a critical role for document digitalization systems using a camera for image capturing. This paper proposes a novel framework that performs three-dimensional (3D) reconstruction and rectification of camera-captured document images. While most existing methods rely on additional calibrated hardware or multiple images to recover the 3D shape of a document page, or make a simple but not always valid assumption on the corresponding 3D shape, our framework is more flexible and practical since it only requires a single input image and is able to handle a general locally smooth document surface. The main contributions of this paper include a new iterative refinement scheme for baseline fitting from connected components of text line, an efficient discrete vertical text direction estimation algorithm based on convex hull projection profile analysis, and a 2D distortion grid construction method based on text direction function estimation using 3D regularization. In order to examine the performance of our proposed method, both qualitative and quantitative evaluation and comparison with several recent methods are conducted in our experiments. The experimental results demonstrate that the proposed method outperforms relevant approaches for camera-captured document image rectification, in terms of improvements on both visual distortion removal and OCR accuracy.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1364/JOSAA.33.002089 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!