OCR detection of checkboxes from PDF files

Abstract

Automatic information extraction from scanned images, such as exam sheets and disease cards, is of great help in many fields, including medicine and computer science. In this dissertation, we propose an algorithm that detects the positions of checkboxes and their values (checked or unchecked) by combining deep learning with other techniques such as OCR. First, we convert the PDF file into images, one per page. Each page image is then passed to our algorithm, which uses OCR to detect the checkbox regions. Next, we crop these regions into smaller images for the classification stage, where deep learning techniques assign each cropped image to the appropriate class.
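The crop-then-classify stage of this pipeline can be illustrated with a minimal sketch. It assumes the detection step has already produced pixel bounding boxes on a binarized page (1 = ink, 0 = background). It also swaps the dissertation's deep learning classifier for a simple fill-ratio threshold, which is a baseline stand-in and not the author's method. The names `Box`, `crop`, `fill_ratio`, `classify`, `draw_box`, and the 0.2 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # binarized page: 1 = ink pixel, 0 = background


@dataclass
class Box:
    """Hypothetical checkbox bounding box from the detection step (end coords exclusive)."""
    x0: int
    y0: int
    x1: int
    y1: int


def crop(image: Image, box: Box) -> Image:
    """Cut the checkbox region out of the page image for classification."""
    return [row[box.x0:box.x1] for row in image[box.y0:box.y1]]


def fill_ratio(patch: Image, margin: int = 1) -> float:
    """Fraction of ink pixels inside the box, skipping `margin` pixels of border."""
    inner = [row[margin:len(row) - margin] for row in patch[margin:len(patch) - margin]]
    total = sum(len(row) for row in inner)
    if total == 0:
        return 0.0
    return sum(sum(row) for row in inner) / total


def classify(patch: Image, threshold: float = 0.2) -> str:
    """Baseline stand-in for the deep learning classifier: threshold the fill ratio."""
    return "checked" if fill_ratio(patch) > threshold else "unchecked"


def draw_box(image: Image, box: Box, mark: bool = False) -> None:
    """Draw a square checkbox border; optionally add an 'X' mark inside it."""
    for x in range(box.x0, box.x1):
        image[box.y0][x] = 1
        image[box.y1 - 1][x] = 1
    for y in range(box.y0, box.y1):
        image[y][box.x0] = 1
        image[y][box.x1 - 1] = 1
    if mark:
        size = box.x1 - box.x0  # assumes a square box
        for i in range(1, size - 1):
            image[box.y0 + i][box.x0 + i] = 1         # main diagonal stroke
            image[box.y0 + i][box.x1 - 1 - i] = 1     # anti-diagonal stroke


# Synthetic 10x20 page with one empty and one marked 5x5 checkbox.
page: Image = [[0] * 20 for _ in range(10)]
empty, marked = Box(2, 2, 7, 7), Box(10, 2, 15, 7)
draw_box(page, empty)
draw_box(page, marked, mark=True)

results = [classify(crop(page, b)) for b in (empty, marked)]
print(results)  # ['unchecked', 'checked']
```

Ignoring the border keeps the printed frame of every box from counting as a mark. A trained classifier, as used in the dissertation, would replace `classify` to handle faint ticks, scanning noise, and partially filled boxes, which a fixed threshold cannot distinguish reliably.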
