Russia Izhevsk
Year 2020
Volume 30
Issue 3
Pages 513-529
Section Computer science
Title Refinement of the results of recognition of mathematical formulas using the Levenshtein distance
Author(-s) Saparov A.Yu. (a), Beltyukov A.P. (a), Maslov S.G. (a)
Affiliations (a) Udmurt State University
Abstract The article addresses the recognition of scanned mathematical texts that contain repeated formulas or formulas sharing common fragments. It describes a method for comparing recognition results that selects similar elements from a set of recognition options. The method computes Levenshtein distances between individual fragments, extended with additional parameters. Unlike the standard approach, when a comparison is uncertain the method uses all possible recognition options, each represented as a symbol-weight pair. For nonlinear formulas, the comparison also uses numerical parameters that specify the position of individual symbols on the plane. The comparison makes it possible to group formulas, and the resulting data can support decisions made by either a user or a program. The method is intended to simplify manual error correction, which relies on dynamic management of intermediate results during close man-machine interaction.
Keywords Levenshtein distance, replacement weight, displacement weight, variety of recognition options, formulas with common fragments
UDC 004.93, 510.5
MSC 68T10, 68W32
DOI 10.35634/vm200311
Received 12 March 2020
Language Russian
Citation Saparov A.Yu., Beltyukov A.P., Maslov S.G. Refinement of the results of recognition of mathematical formulas using the Levenshtein distance, Vestnik Udmurtskogo Universiteta. Matematika. Mekhanika. Komp'yuternye Nauki, 2020, vol. 30, issue 3, pp. 513-529.
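
The comparison outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each recognized position is a set of symbol-weight pairs with weights in [0, 1], and that the replacement weight is one minus the overlap of the two option sets. The names `Position`, `replacement_weight`, `weighted_levenshtein`, and the `indel` parameter are hypothetical. The displacement weight used for nonlinear formulas is omitted.

```python
from typing import Dict, List

# One recognized position: every candidate symbol with its weight.
# Assumption: weights lie in [0, 1] and express recognizer confidence.
Position = Dict[str, float]


def replacement_weight(a: Position, b: Position) -> float:
    """Hypothetical replacement cost: 0 for full agreement, 1 for disjoint options."""
    overlap = sum(min(w, b[s]) for s, w in a.items() if s in b)
    return 1.0 - min(1.0, overlap)


def weighted_levenshtein(x: List[Position], y: List[Position],
                         indel: float = 1.0) -> float:
    """Wagner-Fischer dynamic programming with a soft replacement cost."""
    prev = [j * indel for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * indel]
        for j in range(1, len(y) + 1):
            curr.append(min(
                prev[j] + indel,          # delete x[i-1]
                curr[j - 1] + indel,      # insert y[j-1]
                prev[j - 1] + replacement_weight(x[i - 1], y[j - 1]),
            ))
        prev = curr
    return prev[-1]
```

Under these assumptions, comparing `a+b` with a reading whose last symbol is "6" with weight 0.6 and "b" with weight 0.4 gives a distance of about 0.6 rather than 1, so fragments that differ only in an uncertain symbol stay close enough to be grouped.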