Development of an Automated PDF Document Routing System Integrating OCR and Employee Database Matching for Targeted Personnel Notification
Abstract
Electronic document management is central to modern organizations, particularly given the widespread adoption of the Portable Document Format (PDF). However, a large share of administrative documents are scanned, image-based PDFs whose text cannot be searched or copied directly. Staff must therefore screen these documents manually and route them to the relevant personnel, which consumes substantial time and human resources. This research proposes an automated system to streamline document screening and routing. The system applies Optical Character Recognition (OCR) using Tesseract, combined with regular expressions, to extract Thai names from documents. String similarity matching against the employee database in a directory service then identifies the individuals named, and the system sends the specific files to them by email. The system is built on a microservices architecture to support scalability. In the experiments, a similarity threshold of 80% yielded the highest identification accuracy, 98.1%. The system thereby reduces errors and shortens the time required for document management workflows.
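The name extraction and matching steps described in the abstract can be illustrated with a minimal sketch. It assumes the OCR step has already produced plain text, that names follow a Thai honorific (นาย, นาง, นางสาว), and that similarity is measured with Python's `difflib.SequenceMatcher`; the abstract does not state which similarity algorithm or name pattern the authors used, so the pattern, the function names, and the sample directory below are all hypothetical. Only the 80% threshold comes from the abstract.

```python
import re
from difflib import SequenceMatcher

# Assumed pattern: a Thai honorific, then a given name and a surname.
# "นางสาว" is listed before "นาง" so the longer title is tried first.
THAI = r"[\u0E00-\u0E7F]+"
NAME_RE = re.compile(rf"(?:นางสาว|นาย|นาง)\s*({THAI})\s+({THAI})")


def extract_names(ocr_text):
    """Return 'given surname' strings found after an honorific."""
    return [f"{m.group(1)} {m.group(2)}" for m in NAME_RE.finditer(ocr_text)]


def match_employee(name, directory, threshold=0.80):
    """Return (employee_name, score) for the closest entry, or None
    if no entry reaches the threshold (0.80 mirrors the 80% setting)."""
    best_name, best_score = None, 0.0
    for candidate in directory:
        score = SequenceMatcher(None, name, candidate).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_score >= threshold:
        return best_name, best_score
    return None


def route(ocr_text, directory):
    """Return the email addresses of employees named in the OCR text."""
    recipients = []
    for name in extract_names(ocr_text):
        hit = match_employee(name, directory)
        if hit is not None:
            recipients.append(directory[hit[0]])
    return recipients


if __name__ == "__main__":
    # Hypothetical directory; a real system would query the directory service.
    directory = {
        "สมชาย ใจดี": "somchai@example.com",
        "สมหญิง รักเรียน": "somying@example.com",
    }
    # The first surname contains a simulated OCR error (ิ instead of ี).
    text = "เรียน นายสมชาย ใจดิ และ นางสาวสมหญิง รักเรียน"
    print(route(text, directory))
```

Fuzzy matching rather than exact lookup is what lets a name with a single misrecognized vowel mark still resolve to the intended employee, while the threshold keeps unrelated names from being matched.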
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.