Multi-Instance GPU (MIG)
Multi-Instance GPU is a way to securely partition and share a GPU among CUDA applications, giving multiple users separate GPU resources and improving GPU utilization. MIG allows us to partition a single GPU card into a maximum of 7 GPU instances. The example below is an A100 40GB, which is divided into 8 memory slices and 7 compute slices. Each memory slice contains 1/8th of the total VRAM, and each compute slice has 1/7th of the total number of streaming multiprocessors i....
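The slice arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the 1/8 and 1/7 fractions for the A100 40GB example; the constant and function names are hypothetical and are not part of any NVIDIA tool or API.

```python
from fractions import Fraction

# Figures taken from the A100 40GB example in the text.
TOTAL_VRAM_GB = 40
MEMORY_SLICES = 8
COMPUTE_SLICES = 7


def memory_per_slice_gb():
    """VRAM held by one memory slice (1/8th of the total)."""
    return TOTAL_VRAM_GB / MEMORY_SLICES


def instance_resources(compute_slices, memory_slices):
    """Return (vram_gb, fraction_of_sms) for a hypothetical GPU instance
    built from the given number of compute and memory slices."""
    if not 1 <= compute_slices <= COMPUTE_SLICES:
        raise ValueError("compute_slices must be between 1 and 7")
    if not 1 <= memory_slices <= MEMORY_SLICES:
        raise ValueError("memory_slices must be between 1 and 8")
    vram_gb = memory_slices * memory_per_slice_gb()
    sm_fraction = Fraction(compute_slices, COMPUTE_SLICES)
    return vram_gb, sm_fraction


if __name__ == "__main__":
    # Smallest instance: one compute slice paired with one memory slice.
    print(instance_resources(1, 1))
    # A larger instance: three compute slices with four memory slices.
    print(instance_resources(3, 4))
```

Under these assumptions, one compute slice with one memory slice gets 5 GB of VRAM and 1/7th of the streaming multiprocessors, while three compute slices with four memory slices get 20 GB and 3/7ths. Which slice combinations a real GPU accepts is fixed by the driver's supported profiles, so this sketch does not check whether a given pairing is valid on hardware.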