This is a training script I made so that I can fine-tune LLMs using my workstation ... 'first_exhausted': stop when a dataset runs out of examples. 'all_exhausted': stop when all datasets have run out ...
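The two stopping strategies above have the same names as the `stopping_strategy` options of Hugging Face `datasets.interleave_datasets`. Here is a minimal sketch, assuming the script follows the same round-robin semantics and uses no per-dataset sampling probabilities. `interleave` is a hypothetical helper written for illustration. It is not part of the training script.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleave(datasets: Sequence[Sequence[T]], stopping_strategy: str = "first_exhausted") -> List[T]:
    """Round-robin interleave several datasets (hypothetical illustration).

    'first_exhausted': stop once the shortest dataset has no examples left.
    'all_exhausted': keep cycling, restarting shorter datasets from the
    beginning, until the longest dataset has been used up too.
    """
    if not datasets or any(len(d) == 0 for d in datasets):
        raise ValueError("need at least one dataset, and none may be empty")
    lengths = [len(d) for d in datasets]
    if stopping_strategy == "first_exhausted":
        rounds = min(lengths)  # undersampling: the tail of longer datasets is dropped
    elif stopping_strategy == "all_exhausted":
        rounds = max(lengths)  # oversampling: shorter datasets repeat examples
    else:
        raise ValueError(f"unknown stopping_strategy: {stopping_strategy!r}")
    out: List[T] = []
    for r in range(rounds):
        for d in datasets:
            # r % len(d) wraps a shorter dataset back to its first example
            out.append(d[r % len(d)])
    return out


if __name__ == "__main__":
    d1, d2, d3 = [0, 1, 2], [10, 11, 12, 13], [20, 21, 22, 23, 24]
    print(interleave([d1, d2, d3], "first_exhausted"))
    # [0, 10, 20, 1, 11, 21, 2, 12, 22]
    print(interleave([d1, d2, d3], "all_exhausted"))
    # [0, 10, 20, 1, 11, 21, 2, 12, 22, 0, 13, 23, 1, 10, 24]
```

With 'all_exhausted', examples from the shorter datasets get repeated. Those datasets are therefore oversampled relative to their size, while 'first_exhausted' never repeats an example but throws away part of the longer datasets.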
DeepSpeed is a hard requirement because the entire training script is built around DeepSpeed pipeline parallelism ... Start by reading through the config files in the examples directory. Almost ...
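Pipeline parallelism splits a model's layers into contiguous stages, one per GPU, and streams micro-batches through those stages. DeepSpeed's `PipelineModule` does this partitioning itself, and its default `partition_method` balances parameter counts. The sketch below is not DeepSpeed's code. It is a hypothetical `partition_layers` helper that shows the idea, assuming a per-layer cost (for example, parameter count) is already known. It chooses stage boundaries that minimize the largest per-stage cost.

```python
from typing import List, Sequence


def partition_layers(costs: Sequence[float], num_stages: int) -> List[int]:
    """Split layers into `num_stages` contiguous, non-empty stages (hypothetical).

    Returns boundaries b of length num_stages + 1: stage s owns layers
    b[s]:b[s+1]. Dynamic programming minimizes the busiest stage's total cost.
    """
    n = len(costs)
    if not 1 <= num_stages <= n:
        raise ValueError("need 1 <= num_stages <= number of layers")
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)
    inf = float("inf")
    # best[k][i]: smallest possible max stage cost for the first i layers in k stages
    best = [[inf] * (n + 1) for _ in range(num_stages + 1)]
    cut = [[0] * (n + 1) for _ in range(num_stages + 1)]
    best[0][0] = 0.0
    for k in range(1, num_stages + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):  # stage k covers layers j..i-1
                cost = max(best[k - 1][j], prefix[i] - prefix[j])
                if cost < best[k][i]:
                    best[k][i] = cost
                    cut[k][i] = j
    # Walk the recorded cuts backwards to recover the stage boundaries.
    bounds = [n]
    i = n
    for k in range(num_stages, 0, -1):
        i = cut[k][i]
        bounds.append(i)
    bounds.reverse()
    return bounds


if __name__ == "__main__":
    # Four small layers followed by one large layer, split across two GPUs.
    print(partition_layers([1, 1, 1, 1, 8], 2))  # [0, 4, 5]
```

Contiguity is what makes this a pipeline. Each stage sends activations only to the next stage, so a more balanced split leaves less GPU time idle.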