
A Lossless Compression for AI Models

Abstract

With the growth of model sizes and the scale of their deployment, their sheer size burdens the infrastructure, requiring more network bandwidth and more storage to accommodate them. While there is a vast body of model compression literature that removes parts of the model weights for faster inference, we investigate a more traditional type of compression: lossless compression, which represents the model in a compact form coupled with a decompression algorithm that restores it to its original form and size.

We present a lossless compression technique tailored to neural networks. Somewhat surprisingly, we show that specific lossless compression can yield significant network and storage reduction on popular models, often saving 33% and at times reducing over 50% of the model size. We investigate the source of model compressibility and introduce specialized compression variants tailored for models that further increase the effectiveness of compression. On popular models (e.g., Llama 3), our method achieves space savings that are over 17% better than vanilla compression while also improving compression and decompression speeds by 62%. Using multiple workers and threads, our method can reach decompression speeds of up to 80 GB/s and compression speeds of up to 13 GB/s. We estimate that these methods could save over an exabyte per month of network traffic downloaded from a large model hub such as Hugging Face.
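The following is a minimal sketch (not the paper's method) of the vanilla lossless baseline referenced above: compressing raw model weights with a general-purpose compressor and verifying bit-exact recovery. It assumes the `zstandard` and `numpy` packages, and uses a synthetic random tensor as a stand-in for real checkpoint weights.

```python
# Sketch: measure how much a vanilla lossless compressor (zstd) saves on raw
# model weights, and check that decompression is bit-for-bit lossless.
import numpy as np
import zstandard as zstd

# Stand-in for a model weight tensor (a real checkpoint would be loaded from
# disk, e.g. from a safetensors file).
weights = np.random.randn(1024, 1024).astype(np.float32)
raw = weights.tobytes()

compressed = zstd.ZstdCompressor(level=3).compress(raw)
ratio = len(compressed) / len(raw)
print(f"original: {len(raw)} bytes, compressed: {len(compressed)} bytes "
      f"({(1 - ratio) * 100:.1f}% saved)")

# Lossless: decompression must reproduce the original weights exactly.
restored = np.frombuffer(
    zstd.ZstdDecompressor().decompress(compressed), dtype=np.float32
).reshape(weights.shape)
assert np.array_equal(weights, restored)
```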