Fusion: Practice and Applications

Journal DOI

https://doi.org/10.54216/FPA

ISSN (Online): 2692-4048 | ISSN (Print): 2770-0070

Volume 4, Issue 2, pp. 56-61, 2021 | Full Length Article

An efficient extraction of information from Indian Government issued documents Aadhar and Pan Card

Rachna Tewani 1*, Arun K. Dubey 2, Achin Jain 3, Eshika Agarwal 4, Disha Mittal 5

  • 1 Data Scientist, Great Learning, India - (rachnatewani09@gmail.com)
  • 2 Bharati Vidyapeeth's College of Engineering, India - (arudubey@gmail.com)
  • 3 Bharati Vidyapeeth's College of Engineering, India - (achin.mails@gmail.com)
  • 4 Bharati Vidyapeeth's College of Engineering, India - (eshika2812@gmail.com)
  • 5 Bharati Vidyapeeth's College of Engineering, India - (dishamittal.it2@bvp.edu.in)
  • DOI: https://doi.org/10.54216/FPA.040201

    Received: April 12, 2021 Accepted: August 01, 2021
    Abstract

    In today's world, everything is being digitized, and data scanning tools and photography are in widespread use. When a large amount of image data is available, it becomes important to gather that data in a form that is useful to the company or organization. Doing this manually is tedious and time-consuming. To simplify the job, we have developed a Flask API that takes a folder of images as input and returns an Excel sheet of the relevant data extracted from them. We use optical character recognition, via tools such as pytesseract, to extract text from the images. The text is then processed with natural language processing, and the relevant fields are located using the glob and regex modules. The model is useful for collecting data from registration certificates, making it easy to store fields such as chassis number, owner name, and car number, and it can also be applied to Aadhaar cards and PAN cards.
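
    The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `extract_ids` and `ocr_folder` are hypothetical, the patterns assume a PAN of the form five letters, four digits, one letter, and an Aadhaar number of twelve digits usually printed in three groups of four, and the NLP step and the Flask endpoint are omitted.

```python
import glob
import os
import re

# Assumed ID formats (not taken from the paper):
#   PAN:     5 uppercase letters, 4 digits, 1 uppercase letter
#   Aadhaar: 12 digits, optionally split into groups of 4 by a space or hyphen
PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
AADHAAR_RE = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b")


def extract_ids(ocr_text):
    """Return the first PAN and Aadhaar number found in OCR output text."""
    pan = PAN_RE.search(ocr_text)
    aadhaar = AADHAAR_RE.search(ocr_text)
    return {
        "pan": pan.group(0) if pan else None,
        # Normalise "1234 5678 9012" to "123456789012".
        "aadhaar": re.sub(r"[ -]", "", aadhaar.group(0)) if aadhaar else None,
    }


def ocr_folder(folder):
    """Run OCR on every PNG/JPG image in a folder and extract IDs from each."""
    # Imported lazily so the regex helper works without the OCR dependencies.
    import pytesseract
    from PIL import Image

    rows = []
    for pattern in ("*.png", "*.jpg", "*.jpeg"):
        for path in sorted(glob.glob(os.path.join(folder, pattern))):
            text = pytesseract.image_to_string(Image.open(path))
            rows.append({"file": os.path.basename(path), **extract_ids(text)})
    return rows
```

    The rows returned by `ocr_folder` could then be written to a spreadsheet, for example with pandas `DataFrame.to_excel`, and the function could be wrapped in a Flask route; both steps are left out here to keep the sketch short.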

    Keywords :

    Optical character recognition, Aadhar, Pan Card, NLP


    Cite This Article As :
    Tewani, R., Dubey, A. K., Jain, A., Agarwal, E., & Mittal, D. (2021). An efficient extraction of information from Indian Government issued documents Aadhar and Pan Card. Fusion: Practice and Applications, 4(2), 56-61. DOI: https://doi.org/10.54216/FPA.040201