Microsoft Research
The Story of the Human Body: Evolution, Health, and Disease
Host: Gabriela de Queiroz
Join the fireside chat with Jaime Teevan and Ming Ye, hosted by the Women in Data Science community, on September 12th, 1-2pm PT.
Join the community at https://aka.ms/wids
Upcoming events and past event recordings can be found on https://aka.ms/wids_info
Host: Jonathan Protzenko
MLS is a new IETF standard for secure, end-to-end encrypted group messaging. In this work, recently awarded the Internet Defense Prize and a Distinguished Paper Award at USENIX, Théophile will describe how the protocol is structured, how it achieves security, and how our formal proof allowed us to find flaws and shortcomings in MLS, findings that eventually made it all the way up into the RFC. Our reference implementation, written in F*, is interoperable and can serve as a blueprint for other implementors; we also demonstrated its applicability by prototyping MLS in Skype.
Host: Hannes Gamper
As generative music models become more powerful and popular, there is a growing need for robust objective metrics of music quality that correlate with human perception. The Frechet Audio Distance (FAD) is a commonly used metric for this purpose. However, its performance may be hampered by issues including sample size bias, limitations of the underlying audio embeddings, and the use of low-quality reference sets. We propose reducing sample size bias by extrapolating unbiased scores as the sample size approaches infinity. A comparison of various audio embeddings reveals that some are better suited for deriving FAD scores that capture aspects of musical or acoustic quality. Finally, our experiments underscore the importance of choosing a diverse and high-quality reference dataset for FAD calculation. Listening test results indicate that unbiased FAD scores calculated using suitable embeddings and reference music improve correlation with human ratings of musical and acoustic quality.
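The sample-size extrapolation mentioned above can be sketched as follows. This is a minimal illustration, assuming the bias of the score shrinks roughly linearly in 1/N, so that a straight-line fit of score against 1/N has the infinite-sample estimate as its intercept. The function name and the synthetic scores are hypothetical, and the FAD computation itself is not shown.

```python
def extrapolate_to_infinite_samples(sample_sizes, scores):
    """Fit score = intercept + slope * (1 / N) by least squares; return the intercept."""
    xs = [1.0 / n for n in sample_sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(scores) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    # The intercept is the fitted score at 1 / N = 0, i.e. N going to infinity.
    return mean_y - slope * mean_x


# Synthetic scores (illustrative only): a true value of 2.0 plus a bias of 50 / N.
sizes = [500, 1000, 2000, 5000]
biased_scores = [2.0 + 50.0 / n for n in sizes]
unbiased_estimate = extrapolate_to_infinite_samples(sizes, biased_scores)
```

In practice the scores at each N would come from repeated evaluations on subsamples of the generated set; how well the linear-in-1/N assumption holds depends on the embedding and the data.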
See more at microsoft.com/en-us/research/lab/microsoft-research-india/.
Learn more about MARI: microsoft.com/en-us/research/group/microsoft-africa-research-institute-mari
Host: Saeed Maleki, Principal Research SDE at Microsoft Research
Billion-parameter artificial intelligence models have shown exceptional performance in a large variety of tasks ranging from natural language processing, computer vision, and image generation to mathematical reasoning and algorithm generation. Those models usually require large parallel computing systems, often called "AI Supercomputers", to be trained initially. We will outline several techniques, ranging from data ingestion and parallelization to accelerator optimization, that improve the efficiency of such training systems. Yet, training large models is only a small fraction of practical artificial intelligence computations. Efficient inference is even more challenging: models with hundreds of billions of parameters are expensive to use. We continue by discussing model compression and optimization techniques such as fine-grained sparsity as well as quantization to reduce model size and significantly improve efficiency during inference. These techniques may eventually enable inference with powerful models on hand-held devices.
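As a rough illustration of how quantization reduces model size, here is a minimal sketch of symmetric 8-bit quantization of a list of weights. The scheme (one shared scale per tensor, round to nearest) is an assumption chosen for simplicity and is not necessarily the technique presented in the talk; the function names and weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.031, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each weight is then stored in one byte instead of four (for 32-bit floats), at the price of a rounding error of at most half the scale per weight.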
See more at microsoft.com/en-us/research/video/ai-for-precision-health and microsoft.com/en-us/research/group/real-world-evidence
See more at microsoft.com/en-us/research/video/multilingual-evaluation-of-generative-ai-mega
In a real-world dialogue system, generated text must be truthful and informative while remaining fluent and adhering to a prescribed style. Satisfying these constraints simultaneously is difficult for the two predominant paradigms in language generation: neural language modeling and rule-based generation. We describe a hybrid architecture for dialogue response generation that combines the strengths of both paradigms. The first component of this architecture is a rule-based content selection model defined using a new formal framework called dataflow transduction, which uses declarative rules to transduce a dialogue agent's actions and their results (represented as dataflow graphs) into context-free grammars representing the space of contextually acceptable responses. The second component is a constrained decoding procedure that uses these grammars to constrain the output of a neural language model, which selects fluent utterances. Our experiments show that this system outperforms both rule-based and learned approaches in human evaluations of fluency, relevance, and truthfulness.
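The constrained decoding step can be illustrated with a toy sketch. It assumes the grammar's language is small enough to enumerate as a finite set of token sequences (a real context-free grammar may describe infinitely many responses) and replaces the neural language model with a hypothetical scoring function; all names and data below are made up for illustration.

```python
def allowed_next_tokens(prefix, allowed_sentences):
    """Tokens that extend `prefix` toward at least one allowed sentence."""
    n = len(prefix)
    options = set()
    for sentence in allowed_sentences:
        if sentence[:n] == prefix:
            options.add(sentence[n] if n < len(sentence) else "<eos>")
    return options


def constrained_greedy_decode(score, allowed_sentences):
    """Greedily pick the best-scoring token among those the constraint permits."""
    prefix = ()
    while True:
        options = allowed_next_tokens(prefix, allowed_sentences)
        best = max(sorted(options), key=lambda tok: score(prefix, tok))
        if best == "<eos>":
            return list(prefix)
        prefix = prefix + (best,)


# Hypothetical set of contextually acceptable responses.
allowed = {
    ("your", "meeting", "is", "at", "noon"),
    ("your", "meeting", "starts", "at", "noon"),
    ("the", "meeting", "is", "at", "noon"),
}
# Hypothetical stand-in for language model scores.
preferences = {"the": 0.9, "your": 0.8, "tomorrow": 1.0, "starts": 0.7, "is": 0.6}


def toy_score(prefix, token):
    return preferences.get(token, 0.1)


response = constrained_greedy_decode(toy_score, allowed)
```

The scorer's favorite token, "tomorrow", is never emitted because no acceptable response contains it: the constraint restricts what may be said, and the scorer only chooses among the permitted continuations.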
See more at microsoft.com/en-us/research/video/the-whole-truth-and-nothing-but-the-truth-faithful-and-controllable-dialogue-response-generation-with-dataflow-transduction-and-constrained-decoding
See more at microsoft.com/en-us/research/video/privacy-preserving-domain-adaptation-of-semantic-parsers
Blog post: microsoft.com/en-us/research/blog/3d-telemedicine-brings-better-care-to-underserved-and-rural-communities-even-across-continents
Project page: microsoft.com/en-us/research/project/3d-telemedicine
Host: Mary Czerwinski
A new synthesis is emerging that integrates AI technologies with Human-Computer Interaction to produce Human-Centered AI (HCAI). Advocates of this new synthesis seek to amplify, augment, and enhance human abilities, so as to empower people, build their self-efficacy, support creativity, recognize responsibility, and promote social connections. Researchers, developers, business leaders, policy makers, and others are expanding the technology-centered scope of Artificial Intelligence (AI) to include Human-Centered AI (HCAI) ways of thinking. This expansion from an algorithm-focused view to embrace a human-centered perspective guides evaluation by usability testing, expert reviews, and user feedback strategies. These include user observation, incident reporting, audit trails, and surveys to assess trustworthiness. The talk will include examples, references to further work, and discussion time for questions. These ideas are drawn from Ben Shneiderman’s award-winning new book Human-Centered AI (Oxford University Press, 2022).
Further information at: https://hcil.umd.edu/human-centered-ai
Escapement is a video prototyping tool that introduces a powerful new concept for prototyping screen-based interfaces by flexibly mapping sensor values to dynamic playback control of videos. This recasts the time dimension of video mock-ups as sensor-mediated interaction.
This abstraction of time as interaction, which we dub video-escapement prototyping, empowers designers to rapidly explore and viscerally experience direct touch or sensor-mediated interactions across one or more device displays. Our system affords cross-device and bidirectional remote (tele-present) experiences via cloud-based state sharing across multiple devices. This makes Escapement especially potent for exploring multi-device, dual-screen, or remote-work interactions for screen-based applications.
We introduce the core concept of sensor-mediated abstraction of time for quickly generating video-based interactive prototypes of screen-based applications, share the results of observations of long-term usage of video-escapement techniques with experienced interaction designers, and articulate design choices for supporting a reflective, iterative, and open-ended creative design process.
See more at microsoft.com/en-us/research/video/escapement-a-tool-for-interactive-prototyping-with-video-via-sensor-mediated-abstraction-of-time
AdHocProx achieves this via sensors including dual ultra-wideband (UWB) radios for sensing distance and angle to other devices in dynamic, ad-hoc arrangements; plus capacitive grip to determine where the user’s hands hold the device, and to partially correct for the resulting UWB signal attenuation. All spatial sensing and communication takes place via the side-channel capability of the UWB radios, suitable for small-group collaboration across up to four devices (eight UWB radios).
Together, these sensors detect proximity and natural, socially meaningful device movements to enable contextual interaction techniques. We find that AdHocProx can obtain 95% accuracy recognizing various ad-hoc device arrangements in an offline evaluation, with participants particularly appreciative of interaction techniques that automatically leverage proximity-awareness and relative orientation amongst multiple devices.
See more at microsoft.com/en-us/research/video/adhocprox-sensing-mobile-ad-hoc-collaborative-device-formations-using-dual-ultra-wideband-radios
The talk was followed by a panel discussion with experts from academia and research, including Dr. Monojit Chowdhury, Dr. Edward Ombui, Dr. Sunayana Sitaram, and Dr. David Adelani, moderated by Maxamed Axmed.
Keynote Abstract:
Predicting, Explaining and Optimizing Performance of LLMs across Languages
Given a massively multilingual language model (MMLM), can we predict the accuracy of cross-lingual zero-shot and few-shot transfer for a task on target languages with little or no test data? This seemingly impossible task, if solved, can have several potential benefits. First, we could estimate the performance of a model even in languages where a test set is not available, and/or building one is difficult. Second, one can predict training data configurations that would give certain desired performance across a set of languages, and accordingly strategize data collection plans; this in turn can lead to linguistically fair MMLM-based models. Third, as a byproduct, we would know which factors influence cross-lingual transfer. In this talk, I will give an overview of Project LITMUS – Linguistically Informed Training and Testing of Multilingual Systems, where we build several ML models for performance prediction; besides their applications, I will discuss what we learn about the factors that influence cross-lingual transfer.
Learn more about MARI: microsoft.com/en-us/research/group/microsoft-africa-research-institute-mari
PROSE group: microsoft.com/en-us/research/group/prose
Learn more about Sumit Gulwani and his work: microsoft.com/en-us/research/people/sumitg
Learn more about Women in Data Science: widsconference.org
[Publication] Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing: microsoft.com/en-us/research/publication/learning-to-exploit-temporal-structure-for-biomedical-vision-language-processing
0:00 Introduction to foundation models
11:04 From GPT-3 to ChatGPT – a jump in generative capabilities
19:11 Everyday impact: Integrating foundation models and products
Check out the video recap, full transcript, and additional resources:
microsoft.com/en-us/research/video/foundation-models-and-the-next-era-of-ai
Drawing on findings from an ethnographic study of data labelling in India, this talk offers insights into the everyday work practices of data labellers, organisational hierarchies, norms, and values that were caught in global flows of resources, rhetoric, and relations of power. We trace these practices, norms, and frictions to better understand their influences on everyday annotation work, and to answer an important question: why should we, AI researchers and practitioners, concern ourselves with these seemingly distant realities?
Learn more about MARI: microsoft.com/en-us/research/group/microsoft-africa-research-institute-mari
AI can enhance programming experiences for a diverse set of programmers: from professional developers and data scientists (proficient programmers) who need help in software engineering and data wrangling, all the way to spreadsheet users (low-code programmers) who need help in authoring formulas, and students (novice programmers) who seek hints when stuck with their programming homework. To communicate their need to AI, users can express their intent explicitly—as input-output examples or natural-language specification—or implicitly—where they encounter a bug (and expect AI to suggest a fix), or simply allow AI to observe their last few lines of code or edits (to have it suggest the next steps). The task of synthesizing an intended program snippet from the user’s intent is both a search and a ranking problem. Search is required to discover candidate programs that correspond to the (often ambiguous) intent, and ranking is required to pick the best program from multiple plausible alternatives. This creates a fertile playground for combining symbolic-reasoning techniques, which model the semantics of programming operators, and machine-learning techniques, which can model human preferences in programming. Recent advances in large language models like Codex offer further promise to advance such neuro-symbolic techniques.
We further argue for the need to analyze entire news outlets, which can be done in advance; then, we can fact-check the news before it is even written, by checking how trustworthy the outlet that publishes it is (which is what journalists actually do). We will show how this can be automated by looking at a variety of information sources.
The infodemic is often described using terms such as "fake news", which mislead people to focus exclusively on factuality and to ignore the other half of the problem: the potential malicious intent. We aim to bridge this gap by focusing on the detection of specific propaganda techniques in text, e.g., appeal to emotions, fear, prejudices, logical fallacies, etc. This is the target of the ongoing SemEval-2023 task 3, which focuses on multilingual aspects of the problem, covering English, French, German, Italian, Polish, and Russian. We further present extensions of this work to the automatic analysis of various types of harmful memes: from propaganda to harmfulness and harm's target identification to role-labeling in terms of who is portrayed as hero/villain/victim, and generating natural text explanations.
Learn more about MARI: microsoft.com/en-us/research/group/microsoft-africa-research-institute-mari
Responsible AI Tracker: github.com/microsoft/responsible-ai-toolbox-tracker
Responsible AI Mitigations: github.com/microsoft/responsible-ai-toolbox-mitigations
Responsible AI Toolbox: github.com/microsoft/responsible-ai-toolbox
Adaptive Systems and Interaction Group: microsoft.com/en-us/research/group/adaptive-systems-and-interaction
Ideating Responsible AI Mitigations Project: microsoft.com/en-us/research/project/tools-for-managing-and-ideating-responsible-ai-mitigations
Speaker: Dr. Gopal Gupta, UT Dallas
0:00 Welcome to RLOSF 2022
1:15 AutoML Extensions in Vowpal Wabbit
Speaker: Shaokun Zhang
13:01 Compiler Optimization with Reinforcement Learning
Speaker: Ivoline Ngong
26:35 Improve Flatbuffer Parser Support in Vowpal Wabbit
Speaker: Sharvani Somayaji
37:56 Native CSV Parsing
Speaker: Songlin Jiang
49:30 Thank you
Learn more and apply for RL Open Source Fest 2023: https://aka.ms/RLOSFest
Adaptive Systems and Interaction Group: microsoft.com/en-us/research/group/adaptive-systems-and-interaction
Eva Esteban, Embedded Software Engineer at OpenBCI
Galea is an award-winning platform that merges next-generation biometrics with mixed reality. It is the first device to integrate a wide range of physiological signals, including EEG, EMG, EDA, PPG, and eye-tracking, into a single headset. In this session, Conor and Eva will provide a live demonstration of the device and its capabilities, showcasing its potential for a variety of applications, from gaming to training and rehabilitation. They will give an overview of the different hardware and software components of the system, highlighting how it can be used to analyze user experiences in real time. Attendees will get an opportunity to ask questions at the end.
Brain-computer interfaces are used for many applications nowadays. New sensor technology and wireless systems allow easier and faster use. Nevertheless, every single step must be done correctly to achieve high accuracy. EEG electrodes, amplifiers, analog-to-digital conversion, wireless transmission, feature extraction, classification, and the calibration of the BCI system are all of high importance. Dr. Christoph will explain how to do this correctly in order to use BCIs for stroke, coma, and epilepsy patients, to play computer games, or to make progress in neuroscience with 1024 EEG channels.
Seeing AI is a free app that narrates the world around you. Designed with and for the blind and low vision community, this ongoing research project harnesses the power of AI to open up the visual world by describing nearby people, text and objects. Seeing AI demonstrates how technology can make the world more inclusive. Available in the iOS App Store. For more information, visit http://SeeingAI.com
The project is open-sourced at github.com/microsoft/SmartKC-A-Smartphone-based-Corneal-Topographer
Listen to the podcast: microsoft.com/en-us/research/lab/microsoft-research-india/articles/podcast-collaborating-to-develop-a-low-cost-keratoconus-diagnostic-solution-with-dr-kaushik-murali-and-dr-mohit-jain/
Find the paper and more on the SmartKC project page: https://aka.ms/smartkc
Video details: microsoft.com/en-us/research/video/on-learning-aware-mechanism-design
MSR-IISc AI Seminar Series: microsoft.com/en-us/research/event/msriisc/talks
Speaker: Tan Gemicioglu, Georgia Tech
Tan Gemicioglu is a summer intern in the MSR Audio & Acoustics Research Group and an undergraduate student at Georgia Tech. At MSR, they investigated multimodal brain-computer interfaces and gesture interaction advised by Mike Winters and Yu-Te Wang. At Georgia Tech, they are advised by Thad Starner and Melody Jackson, studying passive haptic learning and movement-based brain-computer interfaces. Their primary research interests are in wearable interfaces assisting in communication and learning using physiological sensing and haptics.
Learn more:
globalrenewableswatch.org
microsoft.com/en-us/research/video/global-renewables-watch-ai-for-good-lab-geospatial
Host: Eric Horvitz, Chief Scientific Officer, Microsoft
Panelists:
• Ahmed Awadallah, Senior Principal Research Manager, Microsoft Research
• Erwin Gianchandani, Assistant Director of the Directorate for Technology, Innovation & Partnerships, National Science Foundation
• Percy Liang, Associate Professor of Computer Science and Director of the Center for Research on Foundation Models, Stanford University
• Saurabh Tiwary, Corporate Vice President & Technical Fellow, Microsoft Turing
Learn more about the Microsoft Turing Academic Program Workshop: microsoft.com/en-us/research/event/microsoft-turing-academic-program-workshop
This workshop was part of the Microsoft Research Summit 2022: microsoft.com/en-us/research/event/microsoft-research-summit-2022
Speaker: Yuan Zhou, Tsinghua University
In the problem of joint pricing and inventory management, the retailer simultaneously makes a price decision and an inventory order-up-to decision at the beginning of each review period. Demand is modeled as either a parametric or a nonparametric function of the price.
In this talk, I will introduce two of my recent works advancing this problem: the first one deals with fixed ordering costs under the backlogging setting, with a parametric (generalized linear) demand model. The second one studies nonparametric demand models with censored demands and lost sales. The techniques involved include a novel UCB analysis over trajectories of (s,S,p) policies, and a noisy comparison oracle constructed for censored demand models.
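For readers unfamiliar with the notation, an (s, S, p) policy orders up to level S whenever inventory has dropped to s or below, and charges price p. The sketch below applies such a policy for a single review period under backlogging. The linear demand function, the cost values, the "at or below s" convention, and the omission of holding and backlog costs are all simplifying assumptions made here for illustration, not details taken from the papers.

```python
def review_period(inventory, s, S, p, demand_fn, fixed_cost, unit_cost):
    """Apply an (s, S, p) policy for one period with backlogging.

    Order up to S only when inventory is at or below s, charge price p,
    then serve demand; unmet demand is backlogged as negative inventory.
    """
    order_qty = S - inventory if inventory <= s else 0
    ordering_cost = (fixed_cost + unit_cost * order_qty) if order_qty > 0 else 0
    demand = demand_fn(p)
    revenue = p * demand
    next_inventory = inventory + order_qty - demand
    return next_inventory, revenue - ordering_cost


def linear_demand(price):
    """Hypothetical deterministic demand curve."""
    return 20 - 2 * price


# Inventory 3 is below s = 5, so the retailer orders up to S = 15.
inv_after_order, profit_with_order = review_period(3, 5, 15, 4, linear_demand, 10, 1)
# Inventory 8 is above s = 5, so nothing is ordered and demand is partly backlogged.
inv_after_skip, profit_without_order = review_period(8, 5, 15, 4, linear_demand, 10, 1)
```

The fixed ordering cost is what makes the lower threshold s worthwhile: ordering a little in every period would pay the fixed cost each time.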
This talk is based on the following two papers:
papers.ssrn.com/sol3/papers.cfm?abstract_id=3632475
papers.ssrn.com/sol3/papers.cfm?abstract_id=3750413
• Nitin Pai, Takshshila Institution
• Bill Thies, Everwell Health Solutions
• Dipti Kanade, IAS, State Government of Karnataka
• Nataraj Kuntagod, Accenture, Tech for Good
• Mukesh Sharma, Menterra Venture Advisors
• Vandana Vasudevan, Microsoft Research India
See more at microsoft.com/en-us/research/video/siti-2022-panel-discussion-and-moderated-qa-session
Speaker: Junchi Yan, Shanghai Jiao Tong University
In this talk, I will present our lab’s recent progress on machine learning for combinatorial optimization, an emerging topic in both the machine learning and operations research communities, along with empirical results, including some on EDA. I will also discuss potential future directions for this exciting area.
Speaker: Shaofeng Jiang, Peking University
In this talk, we present a nearly optimal algorithm for online facility location (OFL) with (untrusted) predictions. In OFL, n demand points arrive in order and the algorithm must irrevocably assign each demand point to an open facility upon its arrival. The objective is to minimize the total connection costs from demand points to assigned facilities, plus the facility opening cost.
We additionally assume an untrusted predictor can suggest the facility that a demand point should be assigned to. With access to this predictor but without knowing the prediction error, our algorithm achieves an O(1) ratio when the error is small, which bypasses the Ω(log n / log log n) worst-case lower bound. Furthermore, our algorithm still maintains an O(log n) ratio even when the error is unbounded, nearly matching that lower bound.
Based on a joint work with Erzhi Liu, You Lyu, Zhihao Tang and Yubo Zhang.
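To make the OFL setup concrete, the sketch below processes demand points on a line in arrival order and accumulates the objective (connection costs plus opening costs). The threshold rule it uses is a simple heuristic chosen here for illustration; it ignores predictions and is not the algorithm presented in the talk. The function name and input data are hypothetical.

```python
def online_facility_location(demands, opening_cost):
    """Process demand points in arrival order with a simple threshold rule.

    Open a facility at the demand point when that is cheaper than connecting
    to the nearest open facility; otherwise connect. Decisions are irrevocable.
    """
    facilities = []
    total_cost = 0.0
    for x in demands:
        nearest = min((abs(x - f) for f in facilities), default=float("inf"))
        if nearest > opening_cost:
            facilities.append(x)
            total_cost += opening_cost
        else:
            total_cost += nearest
    return facilities, total_cost


# Two clusters of demand points arriving one at a time.
facilities, total_cost = online_facility_location([0.0, 1.0, 10.0, 10.5], 3.0)
```

A predictor, as described above, would supply a suggested facility for each arriving point; how to combine that suggestion with online decisions without trusting it blindly is the subject of the talk.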
Speaker: Longbo Huang, Tsinghua University
We generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have α-th moments bounded by σ^α, while the variances may not exist. Specifically, we design an algorithm, HTINF. When the heavy-tail parameters α and σ are known to the agent, HTINF simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori. When α and σ are unknown, HTINF achieves a log T-style instance-dependent regret in stochastic cases and an o(T) no-regret guarantee in adversarial cases. We further develop an algorithm, AdaTINF, achieving O(σK^(1-1/α) T^(1/α)) minimax optimal regret even in adversarial settings, without prior knowledge of α and σ. This result matches the known regret lower bound (Bubeck et al., 2013), which assumed a stochastic environment in which α and σ are both known. To our knowledge, the proposed HTINF algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and AdaTINF is the first algorithm that can adapt to both α and σ to achieve the optimal gap-independent regret bound in both the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.
Speaker: Furong Huang, The University of Maryland
Since the beginning of the digital age, the size and quantity of data sets have grown exponentially because of the proliferation of data captured by mobile devices, vehicles, cameras, microphones, and other internet of things (IoT) devices. Given this boom in personal data, major advances in areas such as healthcare, natural language processing, computer vision, and more have been made with the use of deep learning. Federated Learning (FL) is an increasingly popular setting to train powerful deep neural networks with data derived from an assortment of devices. It is a framework for use-cases involving training machine learning models on edge devices without transmitting the collected data to a central server.
In this talk, I will address several major challenges of efficient machine learning at the edge, discussing model efficiency, data efficiency, and learning-paradigm efficiency in turn. As highlights, I will introduce our recent progress on model compression via tensor representation, data efficiency through the lens of generalization analysis, and a decentralized federated learning framework via wait-free model communication.
Speaker: Ruibin Bai, The University of Nottingham Ningbo China
In the past decade, considerable advances have been made in the field of computational intelligence and operations research. However, the majority of these optimization approaches have been developed for deterministically formulated problems, the parameters of which are often assumed perfectly predictable prior to problem-solving. In practice, this strong assumption unfortunately contradicts the reality of many real-world problems which are subject to different levels of uncertainties. The solutions derived from these deterministic approaches can rapidly deteriorate during execution due to the over-optimization without explicit consideration of the uncertainties. To address this research gap, two data-driven hyper-heuristic frameworks are investigated. This talk will present the main ideas of the methods and their performance for two combinatorial optimization problems: a real-world container terminal truck routing problem with uncertain service times and the well-known online 2D strip packing problem. The talk shall briefly describe a port digital twin system developed by our team for the purpose of integrated optimization of multiple port operations.
Speaker: Siwei Wang, Microsoft Research Asia
Existing methods of combinatorial multi-armed bandits mainly focus on the UCB approach. To make the algorithm efficient, they usually use the sum of upper confidence bounds of base arms to represent the upper confidence bound of a super arm. However, when the outcomes of different base arms are independent, this upper confidence bound could be much larger than necessary, which leads to a much higher regret upper bound (in regret minimization problems) or complexity upper bound (in pure exploration problems). To deal with this challenge, we explore the idea of Thompson Sampling (TS) that uses independent random samples instead of the upper confidence bounds, and design TS-based algorithms for both the regret minimization problems and the pure exploration problems. In TS-based algorithms, the sum of independent random samples within a super arm will not exceed its tight upper confidence bound with high probability. Hence it solves the above challenge, and achieves lower regret/complexity upper bounds than existing efficient UCB-based algorithms.
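The gap between the summed bound and the tight bound can be made concrete with a small calculation. The sketch below assumes rewards in [0, 1], the same number of pulls for every base arm, and Hoeffding-style confidence radii evaluated at the same confidence level; these are simplifying assumptions for illustration, and the function names are hypothetical.

```python
import math


def hoeffding_radius(num_samples, delta):
    """Confidence radius for the mean of `num_samples` draws bounded in [0, 1]."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * num_samples))


def summed_radius(num_arms, num_samples, delta):
    """Radius obtained by adding the per-arm radii of a super arm."""
    return num_arms * hoeffding_radius(num_samples, delta)


def independent_sum_radius(num_arms, num_samples, delta):
    """Hoeffding radius applied directly to the sum of independent arm means."""
    return math.sqrt(num_arms * math.log(1.0 / delta) / (2.0 * num_samples))


m, n, delta = 25, 100, 0.05
loose = summed_radius(m, n, delta)
tight = independent_sum_radius(m, n, delta)
```

Under these assumptions the summed radius is larger than the direct one by a factor of sqrt(m), which is the slack that independent random samples avoid.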
Speaker: Yuko Kuroki, The University of Tokyo
Although most combinatorial optimization models require exact parameters as inputs, it is often impossible to obtain them due to privacy issues or system constraints, and we need to deal with such uncertainty. In this talk, I will introduce recent work on combinatorial pure exploration with limited feedback for solving such combinatorial optimization under uncertainty. We first study combinatorial pure exploration over graphs with full-bandit feedback, which aims to identify a dense component in a network given only a noisy evaluation of each sampled subgraph (the offline problem is called the densest subgraph problem). We then study general combinatorial pure exploration with full-bandit or partial-linear feedback, which can work for general combinatorial structures including size-k subsets, matchings, and paths. Finally, I will discuss further extensions and several open problems for future work in this line of research.
Speaker: Hu Fu, Shanghai University of Finance and Economics
Contention resolution schemes (CRSs) are powerful tools for obtaining “ex post feasible” solutions from candidates that are drawn from “ex ante feasible” distributions. Online contention resolution schemes (OCRSs), the online version, have found myriad applications in Bayesian and stochastic problems, such as prophet inequalities and stochastic probing. When the ex ante distribution is unknown, it was unknown whether good CRSs/OCRSs exist with no sample (in which case the scheme is oblivious) or few samples from the distribution.
In this work, we give a simple 1/e-selectable oblivious single item OCRS by mixing two simple schemes evenly, and show, via a Ramsey theory argument, that it is optimal. On the negative side, we show that no CRS or OCRS with O(1) samples can be Ω(1)-balanced/selectable (i.e., preserve every active candidate with a constant probability) for graphic or transversal matroids.
Speaker: Zhijie Zhang, Fuzhou University
We revisit the optimization from samples (OPS) model, which studies the problem of optimizing objective functions directly from the sample data. Previous results showed that we cannot obtain a constant approximation ratio for the maximum coverage problem using polynomially many independent samples of the form {S_i, f(S_i)}_{i=1}^t (BRS, STOC17), even if coverage functions are (1-ϵ)-PMAC learnable using these samples (BDF+, SODA12). In this work, to circumvent the impossibility result of OPS, we propose a stronger model called optimization from structured samples (OPSS), where the data samples encode the structural information of the functions. We show that under the OPSS model, the maximum coverage problem enjoys constant approximation under mild assumptions on the sample distribution. We further generalize the result and show that influence maximization also enjoys constant approximation under this model.
Speaker: Yan Jin, Huazhong University of Science and Technology
The Traveling Salesman Problem (TSP) is one of the most studied routing problems arising in practical logistics applications. Traditional approaches not only rely on hand-crafted expert rules but also spend considerable time on iterative search. This limits their use in time-sensitive scenarios, e.g., on-call routing and ride-hailing services. We propose an end-to-end approach based on hierarchical reinforcement learning for addressing the large-scale TSP. Using a divide-and-conquer strategy, the upper-level policy chooses a small subset of cities from all remaining cities that are to be traversed, while the lower-level policy applies a Transformer model to the chosen cities to find a shortest path with prescribed starting and ending cities. These two policies are jointly trained by reinforcement learning algorithms, and the TSP solutions can be directly generated without any search procedure. The proposed approach takes advantage of the inference efficiency of the Transformer model and provides highly competitive results.