Full-Privacy Secured Search Engine Empowered by Efficient Genome-Mapping Algorithms

Yuan Yu Chang, Sheng Tang Wong, Emmanuel O. Salawu, Ming Hsuan Liao, Jui Hung Hung*, Lee Wei Yang*


Research output: Article, peer-reviewed


Since the 1990s, keyword-based search engines have been the only option for people to locate relevant web content through a simple query comprising one to a few keywords. These engines, whether free or paid, retain users' search queries and preferences, often to deliver targeted ads. Additionally, articles that users upload for plagiarism detection can be stored as part of service providers' expanding databases for profit. Essentially, users cannot search without exposing their queries to these providers. We present a new solution here: a method for searching the internet using a full article as a query without disclosing its content. Our Sapiens Aperio Veritas Engine (S.A.V.E.) uses an encoding scheme and an FM-index search, both borrowed from next-generation human genome sequencing. Each word in a user's query is transformed into one of 12 'amino acids' to create a pseudo-biological sequence (PBS) on the user's device. For plagiarism checks, users submit their locally created PBSs to our cloud service, which detects identical content in our database. The database includes all English and Chinese Wikipedia articles and Open Access journals up to April 2021. PBSs longer than 12 'amino acids' yield accurate results with less than 0.8% false positives. Performance-wise, S.A.V.E. runs at a genome-mapping speed similar to that of Bowtie and is more than 5 orders of magnitude faster than BLAST. With both standard and private modes, S.A.V.E. offers a privacy-first search and plagiarism check system. We believe this sets an exciting precedent for future search engines that prioritize user confidentiality. S.A.V.E. can be accessed at https://dyn.life.nthu.edu.tw/SAVE/.
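
The pipeline described in the abstract (word-to-symbol encoding on the user's device, followed by an FM-index lookup on the server) can be illustrated with a minimal sketch. This is not the S.A.V.E. implementation: the abstract does not specify the encoding scheme, so the hash-based mapping in `encode_word`, the 12-letter `ALPHABET`, and the toy `FMIndex` class are all assumptions chosen for illustration only.

```python
import hashlib
import re

# Hypothetical 12-symbol alphabet; the symbols S.A.V.E. actually uses are not given here.
ALPHABET = "ACDEFGHIKLMN"

def encode_word(word):
    """Map a word to one of 12 symbols (assumed scheme: first byte of SHA-256, mod 12)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return ALPHABET[digest[0] % len(ALPHABET)]

def to_pbs(text):
    """Turn text into a pseudo-biological sequence, one symbol per word."""
    return "".join(encode_word(w) for w in re.findall(r"\w+", text))

class FMIndex:
    """Toy FM-index built from a naively sorted suffix array (fine for short texts only)."""

    def __init__(self, text):
        text += "$"  # sentinel; sorts before every symbol in ALPHABET
        n = len(text)
        self.sa = sorted(range(n), key=lambda i: text[i:])
        bwt = "".join(text[i - 1] for i in self.sa)  # Burrows-Wheeler transform
        symbols = sorted(set(text))
        # C[c]: number of characters in the text that sort strictly before c
        self.C, total = {}, 0
        for c in symbols:
            self.C[c] = total
            total += text.count(c)
        # occ[c][i]: number of occurrences of c in bwt[:i]
        self.occ = {c: [0] * (n + 1) for c in symbols}
        for i, ch in enumerate(bwt):
            for c in symbols:
                self.occ[c][i + 1] = self.occ[c][i] + (1 if c == ch else 0)

    def find(self, pattern):
        """Return sorted start positions of pattern, using backward search."""
        lo, hi = 0, len(self.sa)  # half-open suffix-array interval [lo, hi)
        for c in reversed(pattern):
            if c not in self.C:
                return []
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])

# "Server side": index the PBS of a stored document.
document = "the quick brown fox jumps over the lazy dog near the river bank"
index = FMIndex(to_pbs(document))

# "Client side": only the PBS of the query leaves the device, never the words.
query = "fox jumps over the lazy dog"
hits = index.find(to_pbs(query))
print(hits)  # word offsets in the document where the encoded query matches
```

Because many words share each symbol, a short PBS can match unrelated text by chance, which is consistent with the abstract's observation that accuracy is reported for PBSs longer than 12 symbols.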

Pages (from - to): 5155-5164
Journal: IEEE Journal of Biomedical and Health Informatics
Publication status: Published - 1 Oct 2023

