Full-Privacy Secured Search Engine Empowered by Efficient Genome-Mapping Algorithms

Yuan Yu Chang, Sheng Tang Wong, Emmanuel O. Salawu, Ming Hsuan Liao, Jui Hung Hung*, Lee Wei Yang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Since the 1990s, keyword-based search engines have been the only option for people to locate relevant web content through a simple query comprising one to a few keywords. These engines, whether free or paid, retain users' search queries and preferences, often to deliver targeted ads. Articles that users upload for plagiarism detection may also be stored in service providers' expanding databases for profit. In short, users cannot search without exposing their queries to these providers. We present a new solution: a method for searching the internet using a full article as a query without disclosing its content. Our Sapiens Aperio Veritas Engine (S.A.V.E.) uses an encoding scheme and an FM-index search, both borrowed from next-generation human genome sequencing. Each word in a user's query is transformed, on the user's device, into one of 12 'amino acids' to create a pseudo-biological sequence (PBS). For a plagiarism check, users submit their locally created PBSs to our cloud service, which detects identical content in our database; the database includes all English and Chinese Wikipedia articles and Open Access journals up to April 2021. PBSs longer than 12 'amino acids' give accurate results with less than 0.8% false positives. Performance-wise, S.A.V.E. maps at a speed similar to that of Bowtie and is more than 5 orders of magnitude faster than BLAST. With both standard and private modes, S.A.V.E. offers a privacy-first search and plagiarism check system. We believe this sets an exciting precedent for future search engines prioritizing user confidentiality. S.A.V.E. can be accessed at https://dyn.life.nthu.edu.tw/SAVE/.
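The encode-then-search idea in the abstract can be illustrated with a minimal sketch. It is not the S.A.V.E. implementation: the abstract does not state how words are assigned to the 12 'amino acids', so the hash-based mapping below (SHA-256 modulo 12), the letter set `ALPHABET`, and the names `encode` and `FMIndex` are all assumptions made for illustration. The index is a naive FM-index with backward search, built by sorting suffixes, which is adequate for short texts only.

```python
import hashlib
import re
from typing import List

# Hypothetical 12-letter alphabet; the abstract only says there are 12 'amino acids'.
ALPHABET = "ACDEFGHIKLMN"


def encode(text: str) -> str:
    """Turn text into a pseudo-biological sequence, one letter per word.

    Assumption: each lowercased word is hashed with SHA-256 and reduced
    modulo 12. The real S.A.V.E. encoding scheme may differ.
    """
    words = re.findall(r"\w+", text.lower())
    letters = []
    for word in words:
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        letters.append(ALPHABET[int.from_bytes(digest[:8], "big") % len(ALPHABET)])
    return "".join(letters)


class FMIndex:
    """Naive FM-index: suffix array, BWT, C table, and full occurrence table."""

    def __init__(self, text: str) -> None:
        text += "$"  # sentinel; sorts before every letter in ALPHABET
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])
        bwt = [text[i - 1] for i in self.sa]  # i == 0 wraps to the sentinel
        symbols = sorted(set(text))
        # C[c] = number of characters in text strictly smaller than c
        self.C = {}
        total = 0
        for c in symbols:
            self.C[c] = total
            total += text.count(c)
        # occ[c][i] = number of occurrences of c in bwt[:i]
        self.occ = {c: [0] for c in symbols}
        for ch in bwt:
            for c in symbols:
                self.occ[c].append(self.occ[c][-1] + (ch == c))

    def locate(self, pattern: str) -> List[int]:
        """Return sorted start positions of exact matches of pattern."""
        lo, hi = 0, len(self.sa)  # half-open suffix-array interval
        for c in reversed(pattern):  # backward search, last symbol first
            if c not in self.C:
                return []
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[i] for i in range(lo, hi))


# Server side: index the encoded document. Client side: send only the encoded query.
document = "the quick brown fox jumps over the lazy dog"
index = FMIndex(encode(document))
hits = index.locate(encode("brown fox jumps"))
print(hits)  # positions are word offsets into the document
```

Because the mapping is many-to-one, a short encoded query can match unrelated text by chance, which is consistent with the abstract's point that accuracy is reported for PBSs longer than 12 'amino acids'. In this sketch the server sees only letters from the 12-symbol alphabet, never the words themselves.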

Original language: English
Pages (from-to): 5155-5164
Number of pages: 10
Journal: IEEE Journal of Biomedical and Health Informatics
Volume: 27
Issue number: 10
State: Published - 1 Oct 2023

Keywords

  • Encoding
  • FM-Index
  • biological sequence
  • logistic regression
  • next-generation sequencing
  • plagiarism
  • privacy
