Complete Fix-Free Codes for the Statistical Dictionary-Based String Matching Problem

Meer Suri, Stefano Rini

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

In the dictionary based string matching problem, the positions in which a short sequence of symbols, query, occurs in a longer sequence, source, have to be retrieved. To improve the query search efficiency, the source can be pre-processed to build an inverted index, that is a table storing positions of occurrences of certain substrings in the text. Given an inverted index, various algorithms can be utilized to retrieve the position of the query in the source. In the literature, motivated by text retrieval and web search applications, the small v/s small (SvS) algorithm is often considered for the case in which the substrings stored are k-gram codes, i.e. all possible substrings of length k. In [9] we have proposed a novel approach to the string matching problem by letting the strings generating the inverted index be a variable length code, i.e. posting code, to be optimally design for a given performance metric. Posting codes [9] generalize classic k-gram codes and enable new query retrieval algorithms with potentially lower complexity than the SvS algorithm. In this paper, we propose a new query retrieval algorithm, the forward-backward SvS (FB-SvS) algorithm which can be applied when the posting code is a complete fix-free code. The complexity of the FB-SvS algorithm for various fix-free codes is investigated and numerical experiments using the human DNA sequence as the source are presented.

Original languageEnglish
Title of host publicationConference Record - 53rd Asilomar Conference on Circuits, Systems and Computers, ACSSC 2019
EditorsMichael B. Matthews
PublisherIEEE Computer Society
Pages1389-1393
Number of pages5
ISBN (Electronic)9781728143002
DOIs
StatePublished - Nov 2019
Event53rd Asilomar Conference on Circuits, Systems and Computers, ACSSC 2019 - Pacific Grove, United States
Duration: 3 Nov 20196 Nov 2019

Publication series

NameConference Record - Asilomar Conference on Signals, Systems and Computers
Volume2019-November
ISSN (Print)1058-6393

Conference

Conference53rd Asilomar Conference on Circuits, Systems and Computers, ACSSC 2019
Country/TerritoryUnited States
CityPacific Grove
Period3/11/196/11/19

Keywords

  • Complete fix-free codes
  • Dictionary-based string matching
  • Inverted index
  • Small vs small algorithm

Fingerprint

Dive into the research topics of 'Complete Fix-Free Codes for the Statistical Dictionary-Based String Matching Problem'. Together they form a unique fingerprint.

Cite this