Riya Soni
DevOps & Cloud Engineer
CKA | CKS | Terraform Associate
Project
Concurrent GPU Utilization in Kubernetes Clusters
About Project
This project addresses the challenge of efficient GPU utilization in Kubernetes clusters, where GPUs are normally allocated to pods as whole, exclusive units. It delivers a solution that allows multiple applications to use a single GPU concurrently. By integrating GPU sharing mechanisms into the Kubernetes scheduling and device-plugin layers, the solution improves the scalability and efficiency of GPU-accelerated workloads.
Tech Stack
- GPU Driver: Facilitates communication between the operating system and the GPU hardware, ensuring seamless resource allocation.
- NVIDIA Container Runtime (nvidia-docker2): Enhances containerization by providing compatibility and optimized performance for NVIDIA GPUs.
- GPU Sharing Scheduler Extender: Custom scheduler extension for Kubernetes that intelligently allocates and manages GPU resources among multiple applications.
- GPU Sharing Device Plugin: Enables dynamic device plugin registration for shared GPU resources, ensuring efficient utilization across pods (see the request sketch after this list).
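To illustrate how a workload would request a slice of a shared GPU, here is a minimal sketch using the Kubernetes Python client. It assumes the device plugin advertises GPU memory as an extended resource named `aliyun.com/gpu-mem` (in GiB); the pod name, image, and namespace are placeholders, not part of the original project description.

```python
# Sketch: request a 3 GiB slice of a shared GPU for one pod.
# Assumption: the GPU sharing device plugin exposes GPU memory as the
# extended resource "aliyun.com/gpu-mem"; adjust the name to your setup.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="cuda-worker-1"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda-worker",
                image="nvidia/cuda:12.2.0-base-ubuntu22.04",  # placeholder image
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    # The scheduler extender bin-packs pods onto GPUs by memory,
                    # so several pods with small requests can share one card.
                    limits={"aliyun.com/gpu-mem": "3"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Because the request is expressed as GPU memory rather than a whole device, a second pod with its own small limit can be scheduled onto the same physical GPU.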
Key Features
- Enable multiple applications to run GPU-accelerated tasks simultaneously, improving overall system throughput.
- Efficiently allocate and deallocate GPU resources based on application demand, ensuring optimal resource utilization (see the node-inspection sketch after this list).
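To check how much shareable GPU memory each node advertises and still has free for scheduling, a small sketch like the one below can be used. It relies on the same assumed extended-resource name (`aliyun.com/gpu-mem`) and a working kubeconfig.

```python
# Sketch: report per-node capacity and allocatable amount of the assumed
# shared-GPU extended resource "aliyun.com/gpu-mem" (values in GiB).
from kubernetes import client, config

GPU_MEM_RESOURCE = "aliyun.com/gpu-mem"  # assumed resource name

def report_gpu_sharing() -> None:
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for node in v1.list_node().items:
        capacity = node.status.capacity.get(GPU_MEM_RESOURCE, "0")
        allocatable = node.status.allocatable.get(GPU_MEM_RESOURCE, "0")
        print(f"{node.metadata.name}: capacity={capacity} GiB, "
              f"allocatable={allocatable} GiB")

if __name__ == "__main__":
    report_gpu_sharing()
```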