Multi-Instance GPU (MIG) is a feature that securely partitions a single GPU so that multiple CUDA applications can share it, giving each user dedicated GPU resources while improving overall GPU utilization.

[Figure: MIG overview]

MIG allows us to partition a single GPU card into a maximum of seven GPU Instances.

Below is an example of an A100 40GB, which is divided into 8 memory slices and 7 compute slices. Each memory slice contains 1/8th of the total VRAM and each compute slice contains 1/7th of the total streaming multiprocessors (SMs), i.e. 8 x 5 GB memory slices and 7 compute slices.
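The slice arithmetic above can be sketched in a few lines. This is a hypothetical illustration using only the numbers stated in the text (40 GB total VRAM, 8 memory slices, 7 compute slices), not an API call:

```python
# Illustrative slice arithmetic for an A100 40GB, per the description above.
TOTAL_VRAM_GB = 40
MEMORY_SLICES = 8   # each memory slice holds 1/8th of the VRAM
COMPUTE_SLICES = 7  # each compute slice holds 1/7th of the SMs

vram_per_memory_slice = TOTAL_VRAM_GB / MEMORY_SLICES
print(f"{vram_per_memory_slice:.0f} GB per memory slice")   # -> 5 GB
print(f"1/{COMPUTE_SLICES} of the SMs per compute slice")
```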

[Figure: MIG slicing on an A100 40GB]

GPU Instance

Creating a GPU partition, known as a GPU Instance (GI), requires combining some number of memory slices with some number of compute slices. In the diagram below, one 5 GB memory slice is combined with one compute slice to create a 1g.5gb GI profile.
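As a minimal sketch, a 1g.5gb GI can be created with the `nvidia-smi mig` subcommands. This assumes a MIG-capable GPU at index 0 and root privileges; enabling MIG mode may additionally require a GPU reset:

```shell
# Enable MIG mode on GPU 0 (assumes a MIG-capable GPU and root access).
sudo nvidia-smi -i 0 -mig 1

# Create a 1g.5gb GPU Instance; -C also creates the matching Compute Instance.
sudo nvidia-smi mig -cgi 1g.5gb -C

# List the GPU Instances that now exist.
nvidia-smi mig -lgi
```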

[Figure: 1g.5gb GPU Instance]

Profile Placement

The number of slices a GI can be created with is not arbitrary. The NVIDIA driver APIs expose a set of "GPU Instance Profiles", and users create GIs by specifying one of these profiles.
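The profiles the driver supports, and where each profile can be placed on the GPU, can be queried directly with `nvidia-smi` (run on a machine with a MIG-capable GPU):

```shell
# List the supported GPU Instance profiles (name, profile ID, free/total count).
nvidia-smi mig -lgip

# List the possible placements of each profile on the GPU.
nvidia-smi mig -lgipp
```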

[Figure: MIG profile placement]

For more details on supported MIG profiles, see the NVIDIA documentation: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-mig-profiles