Curated on August 9, 2023
Microsoft has released DeepSpeed-Chat, a new system aimed at making the training of large conversational AI models both economical and accessible. The system combines optimizations from DeepSpeed training and inference in a unified framework called the Hybrid Engine, designed to deliver high efficiency in RLHF training. It makes it possible to train 13B-parameter models on a single GPU, extending access to RLHF beyond the typical realm of major tech companies.
By reducing the time and cost of training, DeepSpeed-Chat significantly democratizes access to large conversational AI models, benefiting researchers and startups without vast resources. Microsoft's open-sourcing of the tool enables a broader range of innovators to contribute to its development. Training 30B-parameter conversational models is now nearly 15 times faster than with existing systems and can be completed on Azure in under 18 hours for less than $600.