Marcelo Amaral
OSSEU 2023
Generative AI applications are transforming industries through their ability to answer questions and generate content. Although LLMs are trained on an immense amount of information, generated results may be hallucinated or out of date. Hence, semantic search technologies that provide context-relevant input are indispensable to reduce these effects. This context is supplied through a process called Retrieval Augmented Generation (RAG), which retrieves related facts from large data stores such as Vector DBs. The number of vectors to be searched is growing toward several billion and can no longer be kept in DRAM, motivating offloading to storage devices. We present CSD SSD controller architectures that perform in-storage similarity searches and review data placement strategies for highly parallelized processing of similarity searches in storage, scaling to multiple billions of vectors within a single device. In particular, we present results from an implementation using inverted-index and graph-based approaches that provide coarse- and fine-grained search capabilities, and we introduce NVMe CSD interfaces to handle Vector DB information and perform searches efficiently.
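The coarse-to-fine flow mentioned in the abstract can be sketched in plain NumPy: an inverted-index (IVF) stage narrows the search to a few probed clusters, and a fine stage ranks only the vectors stored in those clusters. This is a minimal illustration of the general technique, not the presented CSD implementation; the fine stage here is brute-force rather than graph-based, and in the actual system it would run inside the SSD controller. All dimensions, cluster counts, and names below are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)
dim, n_vectors, n_clusters = 64, 5000, 50
vectors = rng.standard_normal((n_vectors, dim)).astype(np.float32)

# Coarse stage: pick centroids and assign every vector to its nearest one,
# building the inverted lists of an IVF index.
centroids = vectors[rng.choice(n_vectors, n_clusters, replace=False)]
d2 = ((vectors ** 2).sum(1, keepdims=True)
      - 2.0 * vectors @ centroids.T
      + (centroids ** 2).sum(1))
assign = np.argmin(d2, axis=1)
inverted_lists = {c: np.where(assign == c)[0] for c in range(n_clusters)}

def search(query, n_probe=5, k=10):
    # Probe the n_probe nearest clusters, then rank their vectors exactly
    # (a graph-based index such as HNSW could replace this fine stage).
    probed = np.argsort(((centroids - query) ** 2).sum(axis=1))[:n_probe]
    candidates = np.concatenate([inverted_lists[c] for c in probed])
    dists = ((vectors[candidates] - query) ** 2).sum(axis=1)
    order = np.argsort(dists)[:k]
    return candidates[order], dists[order]

ids, dists = search(rng.standard_normal(dim).astype(np.float32))
print(ids, dists)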
Max Bloomfield, Amogh Wasti, et al.
ITherm 2025
Nikoleta Iliakopoulou, Jovan Stojkovic, et al.
MICRO 2025
Ilias Iliadis
International Journal On Advances In Networks And Services