Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
In this paper we present the results of training large language encoders on the curated Carolina Corpus. Large language models (LLMs) are trained on very large amounts of data, which can be expensive to collect. We show that a curated corpus can be used to train models with performance comparable to that of models trained on datasets almost three times larger.
Pavel Klavík, A. Cristiano I. Malossi, et al.
Philos. Trans. R. Soc. A
Conrad Albrecht, Jannik Schneider, et al.
CVPR 2025
Miao Guo, Yong Tao Pei, et al.
WCITS 2011