Spring is here, and so is the PhD defence of Zhiyuan Yao: May 17 at 10:00 at École Polytechnique

All good things must come to an end … and so, after about three years of scientific collaboration, a global pandemic, and 10 or so scientific papers published or in the pipeline, my PhD student Zhiyuan Yao is preparing to become my former PhD student.

To this end, he will defend his doctoral thesis, entitled “Autonomous Service Management in the Cloud”, on May 17, 2023, at 10:00 in the Becquerel amphitheater at École Polytechnique. The PhD defence is open to all, and we’re looking forward to seeing you there.

The members of the assessment committee will be:

  • Sara Bouchenak, Professor, INSA Lyon, France (Reviewer)
  • Rémi Badonnel, Professor, Telecom Nancy, France (Reviewer)
  • Laércio Lima Pilla, Research Scientist, LaBRI – Université de Bordeaux, France (Examiner)
  • Gaël Thomas, Professor, Telecom SudParis, France (Examiner)
  • Erwan Le Pennec, Professor, École Polytechnique, France (Examiner)
  • Thomas Heide Clausen, Professor, École Polytechnique, France (Advisor)

Applications and services have become more complex, while the Internet has become increasingly difficult to evolve, both in its physical infrastructure and in its protocols and performance. Being responsible for policy configuration as well as network management and performance tuning, network operators are shifting towards more and more automated tools to accomplish these tasks. The concept of “programmable networks” has emerged to alleviate these challenges and to facilitate network evolution. This includes paradigms such as (i) software-defined networking (SDN) and (ii) network function virtualization (NFV), which split forwarding hardware into a control plane and a data plane, and seek to abstract network forwarding and other networking functions from the hardware. In the era of “big data” and cloud computing, these paradigms have enabled rich network traffic processing services, while also reducing the granularity of task allocation in data centers. It has been recognized that shifting controllers from logically centralized to distributed will increase not only scalability but also robustness to inconsistency. Machine-learning (ML)-based approaches have been proposed to deploy more intelligence in networks with decoupled control and data planes.

In this context, the question explored in this thesis is whether, and how, it is possible to offer generic, data-driven networking functions in data center networks as services, in order to construct autonomous networking systems that optimize networking performance with minimal human intervention and operational complexity. This thesis investigates how the increasing scale, complexity, and heterogeneity of networking infrastructure and protocols, as well as the demand for virtualization and cloud support services in terms of efficient resource management, rapid provisioning, and scalability, present a set of new challenges in effective network organization, management, and optimization. This is accomplished by studying how certain network functions and primitives (traffic classification, auto-scaling, load balancing) can be reliably enhanced by various data-driven algorithms, while bearing in mind the in-production requirements of data center networks: high scalability, high throughput, low latency, and low overhead.

The characteristics of networking features in in-production overlay networks are investigated first, which opens the discussion of the challenges of collecting measurements and deploying data-driven networking policies in real-world systems. To tackle these challenges, a generic tool is built to extract networking features from the data plane and to deploy ML algorithms for various networking functions in real-world networking systems. A methodological framework is also designed and showcased, supporting the development of algorithms under different learning paradigms for networking problems. The thesis then focuses on network load balancing problems in data center networks, for which a survey of state-of-the-art load balancers is provided. A hardware-based load balancing mechanism is proposed, achieving line-rate, load-aware workload distribution by exploiting server load information embedded in packet headers as feedback signals. Finally, two learning-based load balancing algorithms are proposed, one open-loop and one closed-loop, and both show better performance than state-of-the-art load balancing methods.