
## Archive of Issues

Russia Izhevsk
Year
2020
Volume
30
Issue
3
Pages
513-529
Section
Computer science
Title
Refinement of the results of recognition of mathematical formulas using the Levenshtein distance
Author(-s)
Saparov A.Yu. (a), Beltyukov A.P. (a), Maslov S.G. (a)
Affiliations
(a) Udmurt State University
Abstract
The article addresses the recognition of scanned mathematical texts that contain repeating formulas or formulas with the same fragments. It describes a method for comparing recognition results that selects similar elements from a variety of recognition options. The method is based on calculating Levenshtein distances between individual fragments with additional parameters. It differs from the usual method in that, when a comparison is uncertain, all possible recognition options are used, each presented as a symbol-weight pair. For nonlinear formulas, the comparison also uses numerical parameters that specify the location of individual symbols on the plane. This comparison makes it possible to group the formulas, and the resulting data are useful for decisions made both by a user and by a program. The method simplifies manual error correction, which is based on dynamic management of intermediate results during close man-machine interaction.
Keywords
Levenshtein distance, replacement weight, displacement weight, variety of recognition options, formulas with common fragments
UDC
004.93, 510.5
MSC
68T10, 68W32
DOI
10.35634/vm200311
Received
12 March 2020
Language
Russian
Citation
Saparov A.Yu., Beltyukov A.P., Maslov S.G. Refinement of the results of recognition of mathematical formulas using the Levenshtein distance, Vestnik Udmurtskogo Universiteta. Matematika. Mekhanika. Komp'yuternye Nauki, 2020, vol. 30, issue 3, pp. 513-529.
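The abstract describes comparing fragments in which each position holds several recognition options as symbol-weight pairs. The following is a minimal sketch of that idea, not the authors' actual formulas: the names `Options`, `substitution_cost`, `weighted_levenshtein` and `certain` are hypothetical, the replacement weight is assumed to be one minus the weight shared by the two option sets, insertions and deletions are assumed to cost 1, and the positional (displacement) parameters used for nonlinear formulas are omitted.

```python
from typing import Dict, List

# Assumption: one recognized position = candidate symbol -> weight, weights sum to 1.
Options = Dict[str, float]


def substitution_cost(a: Options, b: Options) -> float:
    """Illustrative replacement weight: 1 minus the weight the two option sets share."""
    overlap = sum(min(w, b.get(sym, 0.0)) for sym, w in a.items())
    return 1.0 - overlap


def weighted_levenshtein(x: List[Options], y: List[Options], indel: float = 1.0) -> float:
    """Dynamic-programming edit distance (Wagner-Fischer scheme) over option sets."""
    prev = [j * indel for j in range(len(y) + 1)]
    for i, a in enumerate(x, start=1):
        curr = [i * indel]
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + indel,                         # delete a
                curr[j - 1] + indel,                     # insert b
                prev[j - 1] + substitution_cost(a, b),   # replace a with b
            ))
        prev = curr
    return prev[-1]


def certain(text: str) -> List[Options]:
    """Helper: a string recognized without uncertainty (each symbol has weight 1)."""
    return [{ch: 1.0} for ch in text]


if __name__ == "__main__":
    # With certain symbols the sketch reduces to the classic Levenshtein distance.
    print(weighted_levenshtein(certain("kitten"), certain("sitting")))

    # Two readings of the same fragment that disagree on "1" versus "l".
    first = [{"x": 1.0}, {"1": 0.6, "l": 0.4}]
    second = [{"x": 1.0}, {"l": 0.7, "1": 0.3}]
    print(weighted_levenshtein(first, second))
```

Under these assumptions, two fragments whose uncertain positions share most of their weight come out closer than fragments with disjoint options, which is the property that would let repeated formulas be grouped for joint correction.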