Conference paper

SSD controller architecture for similarity search in Vector DBs

Abstract

Generative AI applications are transforming industries through their ability to answer questions and generate content. Although large language models (LLMs) are trained on immense amounts of information, their output may be hallucinated or out of date. Semantic search technologies that supply context-relevant input are therefore indispensable for reducing these effects. In Retrieval Augmented Generation (RAG), related facts are retrieved from large data stores such as vector databases (Vector DBs) and passed to the model as context. The number of vectors to be searched is growing to several billion, which is too many to keep in DRAM and motivates offloading them to storage devices. We present computational storage device (CSD) SSD controller architectures that perform similarity searches in storage, and we review data placement strategies for highly parallelized in-storage processing of similarity searches that can scale to multiple billions of vectors within a single device. In particular, we present results from an implementation using inverted-index and graph-based approaches, which provide coarse- and fine-grained search capabilities, and we introduce NVMe CSD interfaces for handling Vector DB information and performing searches efficiently.
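To make the coarse-then-fine idea of an inverted index concrete, the following is a minimal, hypothetical sketch in plain Python. It is not the controller design described in this paper. It assumes IVF-style search: vectors are bucketed by their nearest centroid, a query first probes only the `nprobe` closest buckets (coarse step), and it then computes exact distances inside those buckets (fine step). The round-robin assignment of buckets to flash "channels" is an illustrative assumption that stands in for a data placement strategy. All names (`build_ivf`, `ivf_search`, `num_channels`, `nprobe`) are hypothetical, and the graph-based approach is not modeled.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors: List[Vector], centroids: List[Vector],
              num_channels: int) -> Tuple[Dict[int, List[int]], Dict[int, int]]:
    """Assign each vector to its nearest centroid (one inverted list per
    centroid), then place each list on a channel round-robin.
    The channel placement is a hypothetical illustration only."""
    lists: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(vid)
    placement = {c: c % num_channels for c in lists}
    return lists, placement


def ivf_search(query: Vector, vectors: List[Vector], centroids: List[Vector],
               lists: Dict[int, List[int]], placement: Dict[int, int],
               nprobe: int, k: int) -> List[int]:
    """Return the ids of the k nearest vectors found in the probed lists."""
    # Coarse step: rank centroids by distance and keep the nprobe closest lists.
    probed = sorted(range(len(centroids)),
                    key=lambda c: l2(query, centroids[c]))[:nprobe]
    # Group probed lists by channel. A device could scan channels in
    # parallel; this sketch simply visits them one after another.
    per_channel: Dict[int, List[int]] = {}
    for c in probed:
        per_channel.setdefault(placement[c], []).append(c)
    # Fine step: exact distance to every vector stored in a probed list.
    candidates: List[Tuple[float, int]] = []
    for cids in per_channel.values():
        for c in cids:
            for vid in lists[c]:
                candidates.append((l2(query, vectors[vid]), vid))
    candidates.sort()
    return [vid for _, vid in candidates[:k]]


# Usage on small random data (sizes chosen only for illustration).
random.seed(0)
dim = 8
vectors = [[random.random() for _ in range(dim)] for _ in range(200)]
centroids = random.sample(vectors, 8)
lists, placement = build_ivf(vectors, centroids, num_channels=4)
query = [random.random() for _ in range(dim)]
result = ivf_search(query, vectors, centroids, lists, placement, nprobe=2, k=5)
```

With `nprobe` equal to the number of lists, the search degenerates to an exhaustive scan and returns the exact top-k. Smaller values trade recall for reading fewer lists from storage, which is the main reason a coarse index is attractive when vectors no longer fit in DRAM.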