When creating Large Language Model (LLM) applications, whether through fine-tuning, pre-training, or instruct-tuning, data preparation is a vital early step. It is widely acknowledged that the quality of a model is heavily influenced by the quality of the data it is trained on, as demonstrated in [4, 6, 8]. This tutorial will focus on data preparation for LLM application development, with emphasis on the latest data preparation techniques. The tutorial will start by covering state-of-the-art methods for preparing data for LLMs. We will then provide a hands-on tutorial on how to use data-prep-kit [7], an open-source toolkit, to implement various data preparation steps. To give LLM app developers a practical understanding, we will create a data processing pipeline for a specific LLM app development use case, offering an end-to-end experience that participants can then apply to their own projects.
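As a rough illustration of the kind of pipeline the hands-on portion builds, the sketch below chains two common preparation steps, exact deduplication and length-based filtering, over a small in-memory corpus. It is plain Python rather than the data-prep-kit API; the names `exact_dedup`, `filter_by_length`, `prepare`, and `min_chars` are illustrative placeholders, not part of the toolkit.

```python
# Minimal, illustrative sketch of a two-step data preparation pipeline.
# Plain Python only; function and parameter names are placeholders,
# not data-prep-kit APIs.
import hashlib


def exact_dedup(docs: list[str]) -> list[str]:
    """Drop byte-identical duplicates by hashing each document."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def filter_by_length(docs: list[str], min_chars: int = 200) -> list[str]:
    """Keep only documents long enough to be useful for training."""
    return [doc for doc in docs if len(doc) >= min_chars]


def prepare(docs: list[str]) -> list[str]:
    """Chain the steps; a real pipeline would add many more transforms
    (language identification, quality scoring, fuzzy dedup, tokenization)."""
    return filter_by_length(exact_dedup(docs))


if __name__ == "__main__":
    raw = ["short", "a" * 300, "a" * 300]  # toy corpus with one duplicate
    print(len(prepare(raw)))               # -> 1
```

In practice each step would run as a scalable transform over files on disk or object storage rather than an in-memory list, which is exactly the gap the toolkit-based pipeline in the hands-on session is meant to close.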