Scalable AI Infrastructure: Architecting Cloud-Native Systems for Intelligent Workloads
DOI: https://doi.org/10.70179/15n61n17

Keywords: AI Infrastructure, Cloud-Native Architecture, Scalability, Machine Learning Workloads, Distributed Computing, Containerization (e.g., Docker, Kubernetes), MLOps, Elastic Compute, High Availability, Serverless AI, Edge AI Deployment, Infrastructure as Code (IaC), Data Pipeline Automation, GPU Orchestration, Hybrid Cloud AI

Abstract
Modern data centers must become scalable, high-performance, trustworthy, and power-efficient to keep pace with rapid advances in AI algorithms. In light of this, this article outlines what scalable AI infrastructure is and introduces how such infrastructure is architected to cope with cloud-native intelligent workloads. The emergence of AI-based applications has transformed how people access information and connect with one another. Many approaches based on AI techniques have been developed to provide intelligent services, and the explosive growth of data, together with the rapid advancement of these algorithms, is driving further demand for AI workloads. Conventionally, AI algorithms were built on structured mathematical models. Deep Learning (DL), which emerged from the field of Machine Learning (ML), is a key factor behind AI's breakthroughs and the rapid increase in AI workloads. A DL model is generally formulated as a directed acyclic graph of mathematical operations on multi-dimensional tensors. Large volumes of data are given as input to distributively trained Deep Neural Networks (DNNs), and data-driven DNN models take on ever more parameters for inductive tasks, producing ever-larger networks for DNN inference. As the exploration of vast amounts of data increases the significance of big data analytics, data centers with a cloud architecture become essential. Cloud data centers now provide services such as web search, data storage, social networking, and online shopping, and resource allocation and management in a cloud data center are handled by a key component called the system controller. When a cloud data center is powered by a renewable energy source, its power feed fluctuates. Moreover, whereas other workload types are generally static and well scheduled, cloud data centers experience bursty and dynamic message-passing workloads.
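The formulation above, a DL model as a directed acyclic graph of operations on tensors, can be sketched in plain Python. The graph, weights, and operation names below are purely illustrative assumptions (a two-layer perceptron), not taken from any specific framework:

```python
# Minimal sketch: a deep network expressed as a directed acyclic graph (DAG)
# of tensor operations. Node names, weights, and ops are illustrative only.

def matvec(W, x):
    """Matrix-vector product (a 2-D tensor applied to a 1-D tensor)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]

# Hypothetical parameters for a tiny two-layer perceptron.
W1 = [[1.0, -1.0], [0.5, 0.5]]
W2 = [[1.0, 1.0]]

# The DAG: each node names an operation and the node it consumes.
# input -> linear1 -> relu1 -> linear2
graph = {
    "linear1": ("matvec", "input"),
    "relu1":   ("relu",   "linear1"),
    "linear2": ("matvec", "relu1"),
}
params = {"linear1": W1, "linear2": W2}

def forward(x):
    """Evaluate the DAG in topological order (dict insertion order here)."""
    values = {"input": x}
    for node, (op, dep) in graph.items():
        arg = values[dep]
        values[node] = matvec(params[node], arg) if op == "matvec" else relu(arg)
    return values["linear2"]
```

Frameworks scale this same idea by partitioning the graph and its tensors across many accelerators, which is what makes distributed training an infrastructure concern.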
Thus, uncertainty in the input streams makes it difficult to determine how best to allocate workloads to servers. As a consequence, resource allocation in a cloud system is a complex combinatorial optimization problem, which is often NP-hard.
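One standard way to make this concrete is to view workload placement as bin packing, a classic NP-hard combinatorial optimization problem. The sketch below uses the well-known first-fit-decreasing heuristic; the capacities and demands are assumed for illustration, and in practice a system controller would use far richer cost models:

```python
# Minimal sketch: placing workloads on servers as bin packing (NP-hard in
# general). First-fit-decreasing is a standard approximation heuristic,
# not an exact solver. Demand and capacity values are illustrative.

def first_fit_decreasing(demands, capacity):
    """Assign each workload demand to the first server with spare capacity,
    opening a new server when none fits. Returns the per-server load lists."""
    servers = []  # each entry is the list of demands placed on one server
    for d in sorted(demands, reverse=True):  # largest demands first
        for loads in servers:
            if sum(loads) + d <= capacity:
                loads.append(d)
                break
        else:
            servers.append([d])  # no existing server fits: open a new one
    return servers
```

For example, `first_fit_decreasing([4, 8, 1, 4, 2, 1], 10)` packs the six workloads onto two servers of capacity 10; an online controller facing bursty, uncertain arrivals cannot sort demands in advance, which is precisely why the allocation problem is harder in the cloud setting.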