Efficient Privacy-Preserving Viral Strain Classification via k-mer Signatures and FHE
Abstract
With the development of sequencing technologies, viral strain classification, which is critical for many applications including disease monitoring and control, has become widely deployed. Typically, a lab (client) holds a viral sequence and requests classification services from a centralized repository of labeled viral sequences (server). However, such 'classification as a service' raises privacy concerns. In this paper, we propose a privacy-preserving viral strain classification protocol that allows the client to obtain classification services from the server while maintaining complete privacy of the client's viral strains. The privacy guarantee holds against active servers, and the correctness guarantee holds against passive ones. We implemented our protocol and performed extensive benchmarks, showing that it obtains almost perfect accuracy (99.8%-100%) and microAUC (0.999), and high efficiency (amortized per-sequence client and server runtimes of 4.95 ms and 0.53 ms, respectively, and 0.21 MB of communication). In addition, we present an extension of our protocol that guarantees server privacy against passive clients, and provide an empirical evaluation showing that this extension achieves the same high accuracy and microAUC, with an amortized per-sequence overhead of only a few milliseconds in client and server runtime and 0.3 MB in communication. Along the way, we develop an enhanced packing technique in which two reals are packed into a single complex number, with support for homomorphic inner products of vectors of ciphertexts. We note that while similar packing techniques were used before, they supported only additions and multiplications by constants.
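To give some intuition for why packing two reals into one complex number is compatible with inner products, the following plaintext sketch may help. It is not the protocol itself and performs no encryption: it only checks the arithmetic identity Re(z * conj(w)) = a1*b1 + a2*b2 for z = a1 + i*a2 and w = b1 + i*b2. The helper names `pack_pairs` and `packed_inner_product` are hypothetical, and the use of conjugation is an assumption for illustration (conjugation is available homomorphically in CKKS-style schemes); the construction in the paper may differ.

```python
from typing import List, Sequence


def pack_pairs(values: Sequence[float]) -> List[complex]:
    """Pack consecutive reals (v0, v1), (v2, v3), ... into complex numbers.

    An odd-length input is padded with one zero (illustrative choice).
    """
    padded = list(values) + [0.0] * (len(values) % 2)
    return [complex(padded[i], padded[i + 1]) for i in range(0, len(padded), 2)]


def packed_inner_product(x_packed: Sequence[complex],
                         y_packed: Sequence[complex]) -> float:
    """Inner product of the underlying real vectors, computed on packed values.

    For z = a1 + i*a2 and w = b1 + i*b2:
        z * conj(w) = (a1*b1 + a2*b2) + i*(a2*b1 - a1*b2),
    so the real part of the accumulated sum is the real inner product.
    """
    acc = sum(z * w.conjugate() for z, w in zip(x_packed, y_packed))
    return complex(acc).real


# Plaintext stand-ins for the vectors that would be encrypted in practice.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.5, -1.0, 2.0, 0.0, 1.0]

plain = sum(a * b for a, b in zip(x, y))
packed = packed_inner_product(pack_pairs(x), pack_pairs(y))
print(plain, packed)
```

Packing halves the number of values that must be multiplied and summed, which is the kind of saving the packing technique targets; in an encrypted setting the extra conjugation and the discarded imaginary part would carry their own costs, which this sketch does not model.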