Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits  @MicrosoftResearch
Microsoft Research | Uploaded December 2022 | Updated October 2024
2022 Data-driven Optimization Workshop: Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

Speaker: Longbo Huang, Tsinghua University

We generalize the concept of heavy-tailed multi-armed bandits (MAB) to adversarial environments and develop robust best-of-both-worlds algorithms for the setting in which losses have α-th moments bounded by σ^α while their variances may not exist. Specifically, we design an algorithm, HTINF. When the heavy-tail parameters α and σ are known to the agent, HTINF simultaneously achieves the optimal regret in both stochastic and adversarial environments, without knowing the actual environment type a priori. When α and σ are unknown, HTINF achieves a log T-style instance-dependent regret in stochastic cases and an o(T) no-regret guarantee in adversarial cases. We further develop an algorithm, AdaTINF, which achieves the O(σK^(1-1/α) T^(1/α)) minimax optimal regret even in adversarial settings, without prior knowledge of α and σ. This result matches the known regret lower bound (Bubeck et al., 2013), which was derived for a stochastic environment with both α and σ known. To our knowledge, the proposed HTINF algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and AdaTINF is the first algorithm that can adapt to both α and σ to achieve the optimal gap-independent regret bound in both the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.
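
To get a feel for how the stated minimax rate σK^(1-1/α) T^(1/α) scales, the short Python sketch below simply evaluates that expression. It is an illustration only, not part of HTINF or AdaTINF: the function name `minimax_rate` is hypothetical, the hidden constant in the O(·) is dropped, and the restriction of α to (1, 2] is an assumption made here to reflect the setting in which variances may not exist.

```python
import math


def minimax_rate(sigma: float, K: int, T: int, alpha: float) -> float:
    """Evaluate sigma * K^(1 - 1/alpha) * T^(1/alpha), constants omitted.

    Hypothetical helper for illustration; alpha in (1, 2] is an assumption.
    """
    if not (1.0 < alpha <= 2.0):
        raise ValueError("this sketch assumes 1 < alpha <= 2")
    if sigma <= 0 or K < 1 or T < 1:
        raise ValueError("sigma must be positive; K and T must be at least 1")
    return sigma * K ** (1.0 - 1.0 / alpha) * T ** (1.0 / alpha)


if __name__ == "__main__":
    K, T = 10, 10_000
    for alpha in (2.0, 1.5, 1.2):
        rate = minimax_rate(1.0, K, T, alpha)
        # rate / T is the per-round average; it shrinks as T grows for alpha > 1
        print(f"alpha={alpha}: rate={rate:.1f}, rate/T={rate / T:.4f}")
```

With α = 2 the expression reduces to σ·sqrt(KT), the familiar bounded-variance rate. As α decreases toward 1 the tails get heavier and the rate moves closer to linear in T.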