Russia Izhevsk
Section Computer science
Title Refinement of the results of recognition of mathematical formulas using the Levenshtein distance
Author(-s) Saparov A.Yu. (a), Beltyukov A.P. (a), Maslov S.G. (a)
Affiliations (a) Udmurt State University
Abstract The article addresses the recognition of scanned mathematical texts that contain repeated formulas or formulas with common fragments. It describes a method for comparing recognition results that selects similar elements from a set of recognition options. The method is based on computing Levenshtein distances between individual fragments, with additional parameters. Unlike the usual approach, when a comparison is uncertain the proposed method uses all possible recognition options, each represented as a symbol-weight pair. For nonlinear formulas, the comparison also uses numerical parameters that specify the position of individual symbols on the plane. The comparison makes it possible to group formulas, and the resulting data can support decisions made by either a user or a program. Using this method will simplify manual error correction, which will be based on dynamic management of intermediate results during close man-machine interaction.
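The idea of comparing fragments whose symbols are uncertain can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the cost definitions, so the representation of a recognized position as a dictionary of candidate symbols and weights, the function names `replacement_cost` and `weighted_levenshtein`, and the cost rule (1 minus the best shared weight) are all assumptions chosen for illustration. Positional parameters for nonlinear formulas are not modeled here.

```python
from typing import Dict, Sequence

# One recognized position: candidate symbols mapped to weights.
# Hypothetical stand-in for the "symbol-weight pairs" named in the abstract.
Candidates = Dict[str, float]


def replacement_cost(a: Candidates, b: Candidates) -> float:
    """Assumed cost in [0, 1]: 1 minus the best weight two positions share."""
    shared = set(a) & set(b)
    if not shared:
        return 1.0
    return 1.0 - max(min(a[s], b[s]) for s in shared)


def weighted_levenshtein(x: Sequence[Candidates],
                         y: Sequence[Candidates],
                         indel_cost: float = 1.0) -> float:
    """Row-by-row dynamic programming with a graded replacement cost."""
    prev = [j * indel_cost for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * indel_cost]
        for j in range(1, len(y) + 1):
            curr.append(min(
                prev[j] + indel_cost,      # delete x[i-1]
                curr[j - 1] + indel_cost,  # insert y[j-1]
                prev[j - 1] + replacement_cost(x[i - 1], y[j - 1]),
            ))
        prev = curr
    return prev[-1]


# Two recognitions of the same fragment "x+1"; in the second one the
# recognizer could not decide between "l" and "1".
first = [{"x": 1.0}, {"+": 1.0}, {"1": 1.0}]
second = [{"x": 1.0}, {"+": 1.0}, {"l": 0.5, "1": 0.5}]
distance = weighted_levenshtein(first, second)
```

Under these assumed costs the two fragments are at distance 0.5, whereas an ordinary Levenshtein distance between "x+1" and a single top-ranked reading "x+l" would be 1. Keeping every candidate therefore makes fragments that are probably the same formula appear closer, which is the property needed for grouping.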
Keywords Levenshtein distance, replacement weight, displacement weight, variety of recognition options, formulas with common fragments
UDC 004.93, 510.5
MSC 68T10, 68W32
DOI 10.35634/vm200311
Received 12 March 2020
Language Russian
Citation Saparov A.Yu., Beltyukov A.P., Maslov S.G. Refinement of the results of recognition of mathematical formulas using the Levenshtein distance, Vestnik Udmurtskogo Universiteta. Matematika. Mekhanika. Komp'yuternye Nauki, 2020, vol. 30, issue 3, pp. 513-529.