Microsoft Research | Direct Nash Optimization: Teaching language models to self-improve with general preferences | Uploaded September 2024; updated October 2024.
Corby Rosset, Senior Researcher at Microsoft Research AI Frontiers, discusses teaching language models to self-improve using a preference oracle such as GPT-4. The approach frames optimization as a two-player game whose solution is an optimal policy at a Nash equilibrium, and it achieves state-of-the-art win rates against GPT-4 Turbo on benchmarks such as AlpacaEval and MT-Bench.
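To make the "two-player game" framing concrete, here is a toy sketch (not the algorithm from the talk) of finding a Nash equilibrium of a preference game by self-play. The preference matrix `P`, the multiplicative-weights update, and all parameter values are illustrative assumptions: `P[i][j]` is the probability a judge prefers response `i` over response `j`, and the time-averaged self-play policy approximates the equilibrium mixture.

```python
import math

def nash_via_self_play(P, iters=5000, lr=0.1):
    """Approximate the Nash equilibrium of a symmetric preference game.

    P[i][j] = probability the judge prefers response i over response j,
    with P[i][j] + P[j][i] == 1. Returns the time-averaged policy, which
    converges toward the equilibrium mixture under self-play.
    """
    n = len(P)
    policy = [0.7, 0.2, 0.1][:n] if n == 3 else [1.0 / n] * n  # non-uniform start
    avg = [0.0] * n  # running time-average of the policy
    for t in range(1, iters + 1):
        # Expected win rate of each response against the current policy.
        payoff = [sum(P[i][j] * policy[j] for j in range(n)) for i in range(n)]
        # Multiplicative-weights update: upweight responses the judge prefers.
        w = [policy[i] * math.exp(lr * payoff[i]) for i in range(n)]
        z = sum(w)
        policy = [x / z for x in w]
        # Incremental update of the running average.
        avg = [a + (p - a) / t for a, p in zip(avg, policy)]
    return avg

# Cyclic (rock-paper-scissors-style) preferences: no single response beats
# all others, so the equilibrium mixes all three responses equally.
P = [[0.5, 0.9, 0.1],
     [0.1, 0.5, 0.9],
     [0.9, 0.1, 0.5]]
nash = nash_via_self_play(P)
```

The cyclic example shows why a general preference oracle needs the game-theoretic view: with non-transitive preferences there is no single "best" response, only an equilibrium mixture.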

Microsoft Research Forum, September 3, 2024

See more at https://aka.ms/ResearchForum-Sep2024

