A method is proposed for detecting a document: image data are recorded by a camera, filtered picture data are determined by a first processing unit on the basis of the recorded image data, and a camera picture is stored by a second processing unit on the basis of the filtered picture data if a stability criterion is fulfilled. A corresponding device, computer program product, and storage medium are also specified.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method for detecting a document, the method comprising: recording image data by a camera, wherein the image data comprises an image stream at a first resolution; determining, by a first processing unit, filtered picture data based, at least in part, on the image data at the first resolution, wherein the first processing unit is a graphics processing unit; determining document boundaries based at least in part on the filtered picture data; determining a frame based on the document boundaries; determining whether at least one of the frame or the image data is stable; and based on a determination that the frame or the image data is stable, storing a camera picture in a second resolution, by a second processing unit, wherein the camera picture is based, at least in part, on the filtered picture data, wherein the second resolution is higher than the first resolution.
A method for detecting a document using a camera involves capturing an image stream at a first (lower) resolution. A graphics processing unit (GPU) filters this image data. Document boundaries are then determined based on the filtered data, and a frame is defined around these boundaries. The system checks if either the frame or the original image data is stable (not blurry or shaking). If stable, the system stores a higher-resolution camera picture, based on the filtered data, using a second processing unit (e.g., CPU). This results in a clear, high-resolution image of the document.
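The control flow described above can be sketched as a small loop. This is only an illustration under stated assumptions: `detect_frame`, `is_stable`, and `capture_high_res` are hypothetical callables standing in for the GPU filtering plus boundary search, the stability check, and the high-resolution capture, and the `window` of recent detections is an assumed parameter that the claim does not specify.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A detected frame: the corner points of the document in preview coordinates.
Frame = List[Tuple[float, float]]


def capture_when_stable(
    preview_frames: Iterable[object],
    detect_frame: Callable[[object], Optional[Frame]],
    is_stable: Callable[[List[Frame]], bool],
    capture_high_res: Callable[[], object],
    window: int = 5,
) -> Optional[object]:
    """Watch low-resolution preview images and capture once the frame is stable."""
    history: List[Frame] = []
    for preview in preview_frames:
        # Hypothetical stand-in for filtering and boundary detection.
        frame = detect_frame(preview)
        if frame is None:
            history.clear()  # document lost, start collecting again
            continue
        history.append(frame)
        history = history[-window:]  # keep only the most recent detections
        if len(history) == window and is_stable(history):
            return capture_high_res()  # picture at the second, higher resolution
    return None
```

The loop only ever inspects low-resolution preview data; the expensive high-resolution capture happens once, after the stability check passes.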
2. The method of claim 1, wherein the second processing unit comprises a central processor unit.
The second processing unit, which stores the final high-resolution picture after the graphics processing unit (GPU) has filtered the image data, includes a central processing unit (CPU).
3. The method of claim 1, wherein the camera, the first graphics processing unit and the second processing unit are part of a portable terminal.
The camera, the graphics processing unit (GPU), and the second processing unit are all part of a portable terminal, such as a smartphone or tablet, so documents can be scanned on the go.
4. The method of claim 1, wherein the first resolution corresponds to the resolution of a display unit.
The first (lower) resolution, at which the image stream is captured and filtered by the graphics processing unit (GPU), corresponds to the resolution of the device's display. This allows for real-time preview and framing of the document before the high-resolution picture is stored.
5. The method of claim 1, wherein the following steps are performed with the aid of the first processing unit: performing a first filtering operation in accordance with a Canny algorithm to generate a Canny filtered texture; transmitting the Canny filtered texture to the second processing unit; performing a Hough transformation based, at least in part, on stored coordinates provided by the second processing unit; performing a second filtering operation based, at least in part, on the Hough transformation to generate a Hough filtered texture; and passing the Hough filtered texture to the second processing unit.
As part of the document detection method using a camera, the graphics processing unit (GPU) performs several filtering steps: First, it applies a Canny edge detection algorithm to create a Canny filtered texture. This texture is sent to the CPU. A Hough transform is then performed, using stored coordinates from the CPU. Finally, a second filtering operation based on the Hough transform generates a Hough filtered texture, which is then also passed to the CPU.
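To give a feel for the first filtering step, here is a simplified edge detector. It is not the full Canny algorithm: it covers only the gradient stage (a Sobel operator followed by a single threshold) and leaves out the smoothing, non-maximum suppression, and hysteresis that Canny adds. It also runs on the CPU over a plain list of rows rather than as a GPU texture operation, and the function name and `threshold` parameter are assumptions for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def sobel_edges(img: Image, threshold: float) -> List[List[int]]:
    """Return a binary edge map: 1 where the gradient magnitude reaches threshold."""
    height, width = len(img), len(img[0])
    edges = [[0] * width for _ in range(height)]
    # Border pixels are skipped because the 3x3 kernel needs all neighbours.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges
```

A sharp brightness step, such as the border between a white page and a dark table, produces a large gradient and is marked as an edge.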
6. The method of claim 5, further comprising determining, by the second processing unit, the stored coordinates by determining pixels which lie on an edge, and storing coordinates of said pixels.
The second processing unit (for example, a CPU) determines the coordinates used in the Hough transform by identifying the pixels that lie on an edge and storing their coordinates.
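Collecting those coordinates can be as simple as scanning a binary edge map. The sketch below assumes the filtered texture is already available as rows of 0/1 values, which is an assumption about the data layout and not something the claim states.

```python
from typing import List, Tuple


def edge_coordinates(edge_map: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (x, y) for every pixel that is marked as lying on an edge."""
    return [
        (x, y)
        for y, row in enumerate(edge_map)
        for x, value in enumerate(row)
        if value
    ]
```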
7. The method of claim 5, wherein, with the aid of the second filtering operation, pixels are filtered out in a Hough space where less than a threshold number of points lie on a line.
During the second filtering operation, points in Hough space are discarded if fewer than a threshold number of edge points lie on the corresponding line, so only well-supported lines remain.
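The voting and thresholding idea can be sketched as follows. This is a CPU-side illustration under assumptions, not the patented GPU implementation: each edge point votes for every line (rho, theta) that passes through it, and cells with fewer votes than `threshold` are dropped. The angular resolution `theta_steps` and the rounding of rho to whole pixels are assumed choices.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def hough_lines(
    points: Iterable[Tuple[int, int]],
    threshold: int,
    theta_steps: int = 180,
) -> Dict[Tuple[int, int], int]:
    """Vote in (rho, theta index) space and keep cells with >= threshold votes."""
    votes: Counter = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta).
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    # Second filtering step: discard weakly supported lines.
    return {cell: count for cell, count in votes.items() if count >= threshold}
```

For ten edge points on the vertical line x = 5, the cell with rho = 5 and theta index 0 collects all ten votes and survives a threshold of 10, while two unrelated points can never reach a threshold of 3.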
8. The method of claim 5 further comprising, with the aid of the filtered picture data from the second processing unit, superimposing the document boundaries, in the form of the frame, on the image data displayed on a display unit.
Using the filtered picture data from the second processing unit, the detected document boundaries are superimposed as a frame on the image data shown on the device's display.
9. The method of claim 1, wherein determining whether the frame and/or the image data are stable comprises at least one of: analyzing the image data over a prescribed period and determining if a change in the image data is below a prescribed threshold value; analyzing the frame over a prescribed period and determining if a change in the frame is below a prescribed threshold value; determining when a shaky hand movement occurs; and determining when the frame is around a document or around a specific region of the document.
The stability check in the document detection method analyzes the image data and/or the frame and uses at least one of the following: checking whether the change in the image data over a prescribed period stays below a threshold, checking whether the change in the frame over a prescribed period stays below a threshold, detecting shaky hand movements, or determining when the frame surrounds the document or a specific region of it. This determines whether the image is steady enough to capture.
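The first of these options, checking the change in the image data, can be sketched as a frame-difference test. The measure used here (mean absolute pixel difference between consecutive grayscale images) is an assumption chosen for simplicity; the claim only requires that some change in the image data stays below a prescribed threshold over a prescribed period.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def image_is_stable(frames: List[Image], threshold: float) -> bool:
    """True if every consecutive pair of frames differs by less than threshold."""
    if len(frames) < 2:
        return False  # nothing to compare yet
    for previous, current in zip(frames, frames[1:]):
        total = sum(
            abs(a - b)
            for row_prev, row_cur in zip(previous, current)
            for a, b in zip(row_prev, row_cur)
        )
        pixel_count = sum(len(row) for row in current)
        if total / pixel_count >= threshold:
            return False  # too much change between two frames
    return True
```

The caller decides the "prescribed period" by choosing how many recent frames to pass in.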
10. The method of claim 8 further comprising determining a transformed image by carrying out, based at least in part on the stored coordinates, a perspective transformation of the camera picture.
After the frame has been superimposed on the displayed image, a perspective transformation is applied to the stored camera picture, based at least in part on the stored coordinates. This corrects for distortion caused by the camera angle and produces a straightened, top-down view of the document.
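A perspective transformation of this kind is commonly expressed as a 3x3 homography computed from four point pairs. The sketch below assumes the four document corners are already known and maps them onto an upright rectangle; it shows the standard math only, with hypothetical function names, and transforms individual points rather than resampling a whole image.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("corner points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Eight homography coefficients mapping four src points onto four dst points."""
    rows: List[List[float]] = []
    rhs: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs)


def warp_point(h: Sequence[float], p: Point) -> Point:
    """Apply the homography to a single point."""
    x, y = p
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)
```

To straighten a full picture, the same mapping (or its inverse) would be applied per pixel, typically by an image library or on the GPU.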
11. The method of claim 10, wherein the transformed image is further processed by means of character recognition.
The transformed image produced by the perspective correction is further processed using optical character recognition (OCR) to extract the document's text.
12. The method of claim 11, wherein the character recognition is carried out at least partially on the first processing unit or the second processing unit.
The character recognition is carried out at least partially on the graphics processing unit (GPU) or on the second processing unit.
13. The method of claim 11, wherein the character recognition is carried out at least partially on an external processing unit which is coupled at least temporarily via a wireless or hardwired communication link.
The character recognition can be carried out at least partially on an external processing unit (e.g., a remote server) that is coupled, at least temporarily, over a wireless (e.g., Wi-Fi) or wired link.
14. A device for detecting a document, the device comprising: a camera having a first processing unit and a second processing unit, wherein the first processing unit is a graphics processing unit, wherein the camera is configured to: record image data, wherein the image data comprises an image stream at a first resolution; determine, by the first processing unit, filtered picture data, based, at least in part, on the recorded image data at the first resolution; and determine, by the second processing unit, document boundaries based at least in part on the filtered picture data; determine, by the second processing unit, a frame based on the document boundaries; determine, by the second processing unit, whether at least one of the frame or the image data is stable; and based on a determination that the frame or the image data is stable, store a camera picture at a second resolution, by the second processing unit, wherein the camera picture is based, at least in part, on the filtered picture data if a stability criterion is fulfilled, wherein the second resolution is higher than the first resolution.
A device for detecting documents includes a camera with a graphics processing unit (GPU) as the first processing unit and a second processing unit (for example, a CPU). The camera records an image stream at a lower resolution. The GPU filters the image data, and the second processing unit determines the document boundaries and derives a frame from them. The second processing unit then checks whether the frame or the image data is stable. If so, it stores a higher-resolution camera picture based on the filtered data.
15. The device of claim 14, wherein the second processing unit comprises a central processor unit.
In the document detection device, the second processing unit, which stores the final high-resolution picture, includes a central processing unit (CPU).
16. The device of claim 14, wherein the device is a portable or mobile device, in particular a tablet computer or a smartphone, with a wireless or hardwired communication interface.
The document detection device is a portable or mobile device, in particular a tablet computer or smartphone, with a wireless or wired communication interface.
17. The device of claim 14, wherein the first graphics processing unit and/or the second processing unit are/is configured to perform character recognition on the picture.
The device's graphics processing unit (GPU), its second processing unit, or both are configured to perform character recognition (OCR) on the picture.
18. The device of claim 14, wherein the device is configured to transmit the picture to a different device, it being possible for the different device to carry out document processing on the picture.
The document detection device can transmit the picture to a different device, which can then carry out document processing on it, such as more advanced OCR or document management tasks.
19. The device of claim 14, wherein the device is a portable or mobile device, including a tablet computer or a smartphone, having a wireless or hardwired communication interface.
The document detection device is a portable or mobile device, including a tablet computer or smartphone, that has a wireless or wired communication interface.
20. A non-transitory computer-readable storage medium storing computer executable instructions that, when executed by a computer, configure the computer to perform operations comprising: recording image data by a camera, wherein the image data comprises an image stream at a first resolution; determining, by a first processing unit, filtered picture data based, at least in part, on the recorded image data at the first resolution, wherein the first processing unit is a graphics processing unit; determining, by the second processing unit, document boundaries based at least in part on the filtered picture data; determining, by the second processing unit, a frame based on the document boundaries; determining, by the second processing unit, whether at least one of the frame or the image data is stable; and based on a determination that the frame or the image data is stable, storing a camera picture in a second resolution, by a second processing unit, wherein the camera picture is based, at least in part, on the filtered picture data, wherein the second resolution is higher than the first resolution.
A non-transitory computer-readable storage medium stores instructions for document detection. When executed, these instructions cause a computer to record an image stream from a camera at a lower resolution. A graphics processing unit (GPU) filters this image data. A second processing unit (for example, a CPU) then determines the document boundaries, derives a frame from them, and checks whether the frame or the image data is stable. If so, it stores a higher-resolution camera picture based on the filtered data.
21. The method of claim 1, wherein the stability criterion comprises: determining coordinates of the corners of the document based on the image data; and recognizing the image data as stable if the coordinates of the corners of the document move by less than a prescribed threshold value during a predetermined duration.
The stability criterion involves determining the coordinates of the document's corners from the image data. The image data is recognized as stable if these corner coordinates move by less than a prescribed threshold over a predetermined duration.
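A minimal sketch of this corner-based criterion follows. It assumes each observation is a timestamp plus four corner points, and it measures movement as the Euclidean distance of each corner from its position in the oldest observation; both the data layout and the distance measure are assumptions, as are the names `max_shift` and `min_duration`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Sample = Tuple[float, Sequence[Point]]  # (timestamp in seconds, four corners)


def corners_are_stable(
    history: List[Sample], max_shift: float, min_duration: float
) -> bool:
    """True if all corners stayed within max_shift for at least min_duration."""
    if len(history) < 2:
        return False
    start_time, reference = history[0]
    if history[-1][0] - start_time < min_duration:
        return False  # not observed for long enough yet
    for _, corners in history[1:]:
        for (x0, y0), (x1, y1) in zip(reference, corners):
            if math.hypot(x1 - x0, y1 - y0) >= max_shift:
                return False  # a corner moved too far
    return True
```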
22. The method of claim 21, wherein the coordinates of the corners are processed with the aid of a filter algorithm.
The corner coordinates used for the stability check are processed with a filter algorithm, which smooths out noisy measurements.
23. The method of claim 22, wherein the filter algorithm uses a lowpass filter to reduce or eliminate slight rapid changes in coordinates.
The filter algorithm uses a low-pass filter, which reduces or eliminates slight, rapid changes in the corner coordinates and so makes the stability assessment more reliable.
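One common low-pass filter is exponential smoothing, shown here for a single corner tracked over time. The claim does not name a specific filter, so this choice and the smoothing factor `alpha` are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def lowpass(samples: Sequence[Tuple[float, ...]], alpha: float) -> List[Tuple[float, ...]]:
    """Exponentially smooth a sequence of coordinates (0 < alpha <= 1)."""
    smoothed: List[Tuple[float, ...]] = []
    previous: Optional[Tuple[float, ...]] = None
    for sample in samples:
        if previous is None:
            previous = tuple(sample)  # first sample passes through unchanged
        else:
            # Blend the new measurement with the running estimate.
            previous = tuple(
                alpha * cur + (1 - alpha) * prev
                for cur, prev in zip(sample, previous)
            )
        smoothed.append(previous)
    return smoothed
```

A smaller `alpha` suppresses jitter more strongly but makes the filtered corner lag further behind real movement.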
April 1, 2014
November 21, 2017