+7 (3412) 91 60 92

## Archive of Issues

Russia Yekaterinburg
Year
2019
Volume
29
Issue
1
Pages
117-132
 Section Computer science Title An efficient algorithm for three-component key index construction Author(-s) Veretennikov A.B.a Affiliations Ural Federal Universitya Abstract Proximity full-text searches in large text arrays are considered. A search query consists of several words. The search result is a list of documents containing these words. In a modern search system, documents that contain search query words that are near each other are more relevant than other documents. To solve this task, for each word in each indexed document, we need to store a record in the index. In this case, the query search time is proportional to the number of occurrences of the queried words in the indexed documents. Consequently, it is common for search systems to evaluate queries that contain frequently occurring words much more slowly than queries that contain less frequently occurring, ordinary words. For each word in the text, we use additional indexes to store information about nearby words at distances from the given word of less than or equal to $MaxDistance$, which is a parameter. This parameter can take a value of 5, 7, or even more. Three-component key indexes can be created for faster query execution. Previously, we presented the results of experiments showing that, when queries contain very frequently occurring words, the average time of the query execution with three-component key indexes is 94.7 times less than that required when using ordinary inverted indexes. In the current work, we describe a new three-component key index building algorithm. We prove the correctness of the algorithm. We present the results of experiments of the index creation depending on the value of $MaxDistance$. Keywords full-text search, search engines, inverted files, additional indexes, proximity search, three-component key indexes UDC 519.683.5 MSC 68P20, 68P10 DOI 10.20537/vm190111 Received 1 July 2018 Language Russian Citation Veretennikov A.B. An efficient algorithm for three-component key index construction, Vestnik Udmurtskogo Universiteta. Matematika. Mekhanika. Komp'yuternye Nauki, 2019, vol. 29, issue 1, pp. 117-132. References Veretennikov A.B. Proximity full-text search with response time guarantee by means of three component keys, Bulletin of the South Ural State University. Ser. Computational Mathematics and Software Engineering, 2018, vol. 7, no. 1, pp. 60-77 (in Russian). https://doi.org/10.14529/cmse180105 Yan H., Shi S., Zhang F., Suel T., Wen J.-R. Efficient term proximity search with term-pair indexes, Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM'10), Toronto, Canada, 2010, pp. 1229-1238. https://doi.org/10.1145/1871437.1871593 Buttcher S., Clarke C., Lushman B. Term proximity scoring for ad-hoc retrieval on very large text collections, Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'06), Seattle, USA, 2006, pp. 621-622. https://doi.org/10.1145/1148170.1148285 Rasolofo Y., Savoy J. Term proximity scoring for keyword-based retrieval systems, European Conference on Information Retrieval (ECIR'2003): Advances in Information Retrieval, 2003, pp. 207-218. https://doi.org/10.1007/3-540-36618-0_15 Zobel J., Moffat A. Inverted files for text search engines, ACM Computing Surveys, 2006, vol. 38, no. 2, article 6. https://doi.org/10.1145/1132956.1132959 Tomasic A., Garcia-Molina H., Shoens K. Incremental updates of inverted lists for text document retrieval, Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (SIGMOD'94), Minneapolis, Minnesota, USA, 1994, pp. 289-300. https://doi.org/10.1145/191839.191896 Brown E.W., Callan J.P., Croft W.B. Fast incremental indexing for full-text information retrieval, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94), Santiago de Chile, Chile, 1994, pp. 192-202. Luk R.W.P. Scalable, statistical storage allocation for extensible inverted file construction, Journal of Systems and Software, 2011, vol. 84, no. 7, pp. 1082-1088. https://doi.org/10.1016/j.jss.2011.01.049 Zipf G. Relative frequency as a determinant of phonetic change, Harvard Studies in Classical Philology, 1929, vol. 40, pp. 1-95. https://doi.org/10.2307/310585 Miller R.B. Response time in man-computer conversational transactions, Proceedings of the December 9-11, 1968, Fall Joint Computer Conference, part I (AFIPS'68), San Francisco, California, 1968, pp. 267-277. https://doi.org/10.1145/1476589.1476628 Veretennikov A.B. Using additional indexes for fast full-text searching phrases that contains frequently used words, Sistemy Upravleniya i Informatsionnye Tekhnologii, 2013, vol. 52, no. 2, pp. 61-66 (in Russian). Veretennikov A.B. Efficient full-text search by means of additional indexes of frequently used words, Sistemy Upravleniya i Informatsionnye Tekhnologii, 2016, vol. 66, no. 4, pp. 52-60 (in Russian). Anh V.N., de Kretser O., Moffat A. Vector-space ranking with effective early termination, Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'01), New Orleans, Louisiana, USA, 2001, pp. 35-42. https://doi.org/10.1145/383952.383957 Garcia S., Williams H.E., Cannane A. Access-ordered indexes, Proceedings of the 27th Australasian Conference on Computer Science (ACSC'04), Dunedin, New Zealand, 2004, pp. 7-14. Williams H.E., Zobel J., Bahle D. Fast phrase querying with combined indexes, ACM Transactions on Information Systems (TOIS), 2004, vol. 22, no. 4, pp. 573-594. https://doi.org/10.1145/1028099.1028102 Veretennikov A.B. Efficient full-text proximity search by means of three component keys, Sistemy Upravleniya i Informatsionnye Tekhnologii, 2017, vol. 69, no. 3, pp. 25-32 (in Russian). Veretennikov A.B. About a structure of easy updatable full-text indexes, Proceedings of the International Youth School-Conference “SoProMat-2017”, Yekaterinburg, Russia, 2017, pp. 30-41 (in Russian). http://ceur-ws.org/Vol-1894/ Lu X., Moffat A., Culpepper J.S. Efficient and effective higher order proximity modeling, Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval (ICTIR'16), Newark, Delaware, USA, 2016, pp. 21-30. https://doi.org/10.1145/2970398.2970404 Full text