2022
Zhiyuan Yao; Yoann Desmouceaux; Juan Antonio Cordero; Mark Townsley; Thomas Heide Clausen
Aquarius-Enable Fast, Scalable, Data-Driven Service Management in the Cloud Journal Article
In: IEEE Transactions on Network and Service Management, 2022, ISSN: 1932-4537.
Tags: Chaire Cisco, Infrastructure for Big Data, Machine Learning, Network Monitoring
@article{nokeyi,
title = {Aquarius-Enable Fast, Scalable, Data-Driven Service Management in the Cloud},
author = {Zhiyuan Yao and Yoann Desmouceaux and Juan Antonio Cordero and Mark Townsley and Thomas Heide Clausen},
url = {https://ieeexplore.ieee.org/abstract/document/9852806},
doi = {10.1109/TNSM.2022.3197130},
issn = {1932-4537},
year = {2022},
date = {2022-12-01},
urldate = {2022-12-01},
journal = {IEEE Transactions on Network and Service Management},
abstract = {In order to dynamically manage and update networking policies in cloud data centers, Virtual Network Functions (VNFs) use, and therefore actively collect, networking state information, and in the process incur additional control signaling and management overhead, especially in larger data centers. In the meantime, VNFs in production prefer distributed and straightforward heuristics over advanced learning algorithms to avoid intractable additional processing latency under high-performance and low-latency networking constraints. This paper identifies the challenges of deploying learning algorithms in the context of cloud data centers, and proposes Aquarius to bridge the application of machine learning (ML) techniques on distributed systems and service management. Aquarius passively yet efficiently gathers reliable observations, and enables the use of ML techniques to collect, infer, and supply accurate networking state information without incurring additional signaling and management overhead. It offers fine-grained and programmable visibility to distributed VNFs, and enables both open- and closed-loop control over networking systems. This paper illustrates the use of Aquarius with a traffic classifier, an auto-scaling system, and a load balancer, and demonstrates the use of three different ML paradigms (unsupervised, supervised, and reinforcement learning) within Aquarius for network state inference and service management. Testbed evaluations show that Aquarius suitably improves network state visibility and brings notable performance gains for various scenarios with low overhead.},
keywords = {Chaire Cisco, Infrastructure for Big Data, Machine Learning, Network Monitoring},
pubstate = {published},
tppubtype = {article}
}