Linear algebra-based techniques have long been used to correlate similar documents. They map the documents to a multidimensional vector space, in which each document is represented by a vector. Searching for related documents then translates into searching for nearest neighbors in the vector space. In this paper, we propose an indexing structure, called the cosine R-tree, which indexes a multidimensional vector space and provides efficient nearest neighbor search. Our preliminary results show that it gives better performance than a brute-force linear scan strategy.
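To make the baseline concrete, the following is a minimal sketch of the brute-force linear scan that the abstract compares against, assuming cosine similarity as the closeness measure (suggested by the name of the index, but not stated in the abstract). It is not the cosine R-tree itself, whose structure is not described here. The function names `cosine_similarity` and `linear_scan_nearest` and the toy document vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector is treated as unrelated to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def linear_scan_nearest(query, documents, k=1):
    """Brute-force search: score every document, return the k most similar.

    `documents` maps a document id to its vector. Every vector is visited,
    so the cost grows linearly with the size of the collection.
    """
    scored = [
        (cosine_similarity(query, vec), doc_id)
        for doc_id, vec in documents.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


if __name__ == "__main__":
    # Hypothetical three-dimensional document vectors (e.g. term weights).
    docs = {
        "a": [1.0, 0.0, 0.0],
        "b": [0.0, 1.0, 0.0],
        "c": [1.0, 1.0, 0.0],
    }
    for doc_id, score in linear_scan_nearest([2.0, 0.1, 0.0], docs, k=2):
        print(doc_id, round(score, 4))
```

Because the scan touches every vector for every query, an index that can prune parts of the vector space is the natural way to improve on it, which is the role the abstract assigns to the proposed structure.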
|Number of pages|2|
|Journal|Proceedings - IEEE Computer Society's International Computer Software and Applications Conference|
|State|Published - 25 Oct 2000|