VM — pp. 132-148

jlogo

Vestnik Udmurtskogo Universiteta.
Matematika. Mekhanika. Komp'yuternye Nauki
ISSN: 1994-9197 (Print), 2076-5959 (Online)

phone

+7 (3412) 91 60 92

mail

imi@udsu.ru

ru-RU

Russian

Home » Issues » Archive of Issues » 2021 - 1 issue » pp. 132-148

Archive of Issues

Russia Yekaterinburg

Year

2021

Volume

31

Issue

1

Pages

132-148

<<

>>

Section	Computer science
Title	Relevance ranking for proximity full-text search based on additional indexes with multi-component keys
Author(-s)	Veretennikov A.B.^a
Affiliations	Ural Federal University^a
Abstract	The problem of proximity full-text search is considered. If a search query contains high-frequently occurring words, then multi-component key indexes deliver improvement of the search speed in comparison with ordinary inverted indexes. It was shown that we can increase the search speed up to 130 times in cases when queries consist of high-frequently occurring words. In this paper, we are investigating how the multi-component key indexes architecture affects the quality of the search. We consider several well-known methods of relevance ranking; these methods are of different authors. Using these methods we perform the search in the ordinary inverted index and then in the index that is enhanced with multi-component key indexes. The results show that with multi-component key indexes we obtain search results that are very near in terms of relevance ranking to the search results that are obtained by means of ordinary inverted indexes.
Keywords	full-text search, search engines, relevance ranking, inverted indexes, proximity search, three-component key indexes
UDC	519.683.5
MSC	68P20, 68P10
DOI	10.35634/vm210110
Received	11 October 2020
Language	Russian
Citation	Veretennikov A.B. Relevance ranking for proximity full-text search based on additional indexes with multi-component keys, Vestnik Udmurtskogo Universiteta. Matematika. Mekhanika. Komp'yuternye Nauki, 2021, vol. 31, issue 1, pp. 132-148.
References	Sadakane K. Fast algorithms for k-word proximity search, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2001, vol. 84, issue 9, pp. 2311-2318. Gall M., Brost G. K-word proximity search on encrypted data, 30th International Conference on Advanced Information Networking and Applications Workshops (WAINA), 2016, pp. 365-372. https://doi.org/10.1109/WAINA.2016.104 Zipf G.K. Selected studies of the principle of relative frequency in language, Harvard University Press, 1932. Veretennikov A.B. Proximity full-text search by means of additional indexes with multi-component keys: in pursuit of optimal performance, Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2018. Communications in Computer and Information Science, vol. 1003, Cham: Springer, 2019, pp. 111-130. https://doi.org/10.1007/978-3-030-23584-0_7 Miller R.B. Response time in man-computer conversational transactions, Proceedings of the December 9-11, 1968, fall joint computer conference, part I on - AFIPS '68 (Fall, part I), San Francisco, California, 1968, vol. 33, pp. 267-277. https://doi.org/10.1145/1476589.1476628 Veretennikov A.B. Proximity full-text search with response time guarantee by means of three component keys, Bulletin of the South Ural State University. Series “Computational Mathematics and Software Engineering”, 2018, vol. 7, issue 1, pp. 60-77 (in Russian). https://doi.org/10.14529/cmse180105 Veretennikov A.B. Proximity full-text search with a response time guarantee by means of additional indexes, Intelligent Systems and Applications: Proceedings of the 2018 Intelligent Systems Conference, Cham: Springer, 2018, pp. 936-954. https://doi.org/10.1007/978-3-030-01054-6_66 Veretennikov A.B. An efficient algorithm for three-component key index construction, Vestnik Udmurtskogo Universiteta. Matematika. Mekhanika. Komp'yuternye Nauki, 2019, vol. 29, issue 1, pp. 117-132 (in Russian). https://doi.org/10.20537/vm190111 Anh V.N., de Kretser O., Moffat A. Vector-space ranking with effective early termination, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '01, 2001, pp. 35-42. https://doi.org/10.1145/383952.383957 Garcia S., Williams H.E., Cannane A. Access-ordered indexes, ACSC '04 Proceedings of the 27th Australasian Conference on Computer Science, Dunedin, New Zealand, 2004, pp. 7-14. Lin J., Trotman A. Anytime ranking for impact-ordered indexes, Proceedings of the 2015 International Conference on The Theory of Information Retrieval, 2015, pp. 301-304. https://doi.org/10.1145/2808194.2809477 Anh V.N., Moffat A. Pruned query evaluation using pre-computed impacts, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '06, 2006, pp. 372-379. https://doi.org/10.1145/1148170.1148235 Veretennikov A.B. Efficient full-text search by means of additional indexes of frequently used words, Sistemy Upravleniya i Informatsionnye Tekhnologii, 2016, vol. 66, issue 4, pp. 52-60 (in Russian). Zobel J., Moffat A. Inverted files for text search engines, ACM Computing Surveys, 2006, vol. 38, issue 2, article 6. https://doi.org/10.1145/1132956.1132959 Yang Y., Ning H. Block linked list index structure for large data full text retrieval, 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2017, pp. 2123-2128. https://doi.org/10.1109/FSKD.2017.8393099 Williams H.E., Zobel J., Bahle D. Fast phrase querying with combined indexes, ACM Transactions on Information Systems (TOIS), 2004, vol. 22, issue 4, pp. 573-594. https://doi.org/10.1145/1028099.1028102 Crane M., Culpepper J., Lin J., Mackenzie J., Trotman A. A comparison of document-at-a-time and score-at-a-time query evaluation, Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017, pp. 201-210. https://doi.org/10.1145/3018661.3018726 Robertson S.E., Walker S. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval, SIGIR '94, London: Springer, 1994, pp. 232-241. https://doi.org/10.1007/978-1-4471-2099-5_24 Salton G., Yang C.S. On the specification of term values in automatic indexing, Journal of Documentation, 1973, vol. 29, issue 4, pp. 351-372. Yan H., Shi S., Zhang F., Suel T., Wen J.-R. Efficient term proximity search with term-pair indexes, Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10, Toronto, 2010, pp. 1229-1238. https://doi.org/10.1145/1871437.1871593 Brin S., Page L. Reprint of: The anatomy of a large-scale hypertextual web search engine, Computer Networks, 2012, vol. 56, issue 18, pp. 3825-3833. https://doi.org/10.1016/j.comnet.2012.10.007 Lu X., Moffat A., Culpepper J.S. Efficient and effective higher order proximity modeling, Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, 2016, pp. 21-30. https://doi.org/10.1145/2970398.2970404 Clarke C.L.A., Cormack G.V., Tudhope E.A. Relevance ranking for one to three term queries, Information Processing and Management, 2000, vol. 36, issue 2, pp. 291-311. https://doi.org/10.1016/S0306-4573(99)00017-5 Song R., Taylor M.J., Wen J.-R., Hon H.-W., Yu Y. Viewing term proximity from a different perspective, Advances in Information Retrieval: Proceedings of 30th European Conference on IR Research, Berlin: Springer, 2008, pp. 346-357. https://doi.org/10.1007/978-3-540-78646-7_32 Trotman A., Puurula A., Burgess B. Improvements to BM25 and language models examined, Proceedings of the 2014 Australasian Document Computing Symposium on - ADCS '14, 2014, pp. 58-65. https://doi.org/10.1145/2682862.2682863 Dang V., Bendersky M., Croft W.B. Two-stage learning to rank for information retrieval, Advances in Information Retrieval: Proceedings of 35th European Conference on IR Research, Berlin: Springer, 2013, pp. 423-434. https://doi.org/10.1007/978-3-642-36973-5_36 Silva S.D.N., de Moura E.S., Calado P.P., da Silva A.S. Effective lightweight learning-to-rank method using unified term impacts, IEEE Access, 2020, vol. 8, pp. 70420-70437. https://doi.org/10.1109/ACCESS.2020.2986943 Mitra B., Diaz F., Craswell N. Learning to match using local and distributed representations of text for web search, Proceedings of the 26th International Conference on World Wide Web, Republic and Canton of Geneva, Switzerland, 2017, pp. 1291-1299. https://doi.org/10.1145/3038912.3052579 Silva T.P.C., de Moura E.S., Cavalcanti J.M.B., da Silva A.S., de Carvalho M.G., Gonçalves M.A. An evolutionary approach for combining different sources of evidence in search engines, Information Systems, 2009, vol. 34, issue 2, pp. 276-289. https://doi.org/10.1016/j.is.2008.07.003 Levenshtein V.I. Binary codes capable of correcting deletions, insertions, and reversals, Soviet Physics Doklady, 1966, vol. 10, issue 8, pp. 707-710. Makhoul J., Kubala F., Schwartz R., Weischedel R. Performance measures for information extraction, Proceedings of DARPA Broadcast News Workshop, 1999, pp. 249-252. Järvelin K., Kekäläinen J. Cumulated gain-based evaluation of IR techniques, ACM Transactions on Information Systems, 2002, vol. 20, issue 4, pp. 422-446. Liu T.-Y. Learning to rank for information retrieval, Berlin: Springer, 2011. https://doi.org/10.1007/978-3-642-14267-3 Büttcher S., Clarke C., Soboroff I. The TREC 2006 terabyte track, Proceedings of the Fifteenth Text REtrieval Conference, TREC 2006, 2006, pp. 128-141. Fox C. A stop list for general text, ACM SIGIR Forum, 1989, vol. 24, issue 1-2, pp. 19-35. https://doi.org/10.1145/378881.378888 Jansen B.J., Spink A., Saracevic T. Real life, real users, and real needs: a study and analysis of user queries on the web, Information Processing and Management, 2000, vol. 36, issue 2, pp. 207-227. https://doi.org/10.1016/S0306-4573(99)00056-4 Jansen B.J., Spink A. How are we searching the World Wide Web? A comparison of nine search engine transaction logs, Information Processing and Management, 2006, vol. 42, pp. 248-263. https://doi.org/10.1016/j.ipm.2004.10.007
Full text	/doc/issues-2021/21-01-10.pdf

<< Previous article

Next article >>