I've been fascinated by the idea of running AI tooling at home, so I set up a home server to do that. I use Proxmox VE as my virtualization platform. Here are the steps I took to set up my machine.

BIOS Settings

The first thing I had to do was make sure my BIOS was ready to support this configuration. I use an AMD AM4 CPU with a Gigabyte motherboard, so your requirements may differ. Here are the four settings I had to enable:

  1. AMD-V (SVM Mode)
    This enables AMD's hardware virtualization extensions, which are essential for running virtual machines efficiently. It lets the hypervisor (in this case, Proxmox) use the CPU's virtualization features, improving VM performance.
  2. IOMMU (Input/Output Memory Management Unit)
    IOMMU is critical for GPU passthrough. It allows direct mapping of I/O devices (like your Nvidia GPU) to guest VMs, isolating them from the host system. This enables the VM to access the GPU directly, providing near-native performance.
  3. ACS (Access Control Services)
    ACS is part of the PCIe specification that helps with device isolation. It's crucial in multi-GPU setups or complex PCIe configurations to ensure that devices can be isolated appropriately for passthrough.
  4. Above 4G Decoding
    This setting allows the system to map PCIe devices above the 4GB memory address space. High-end GPUs and other PCIe devices often need it to function properly, especially in systems with large amounts of RAM. Enabling it can prevent issues with GPU passthrough on some systems.
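
Once the host is booted, a quick sanity check for the first setting is to look for the svm flag in /proc/cpuinfo. The sketch below is a minimal example, not part of my original setup: has_svm is a hypothetical helper name, and on some kernels the flag only shows that the CPU supports AMD-V, not that the firmware has it switched on, so treat a positive result as necessary rather than sufficient.

```shell
#!/bin/sh
# Report whether a cpuinfo-style file lists the AMD-V ("svm") CPU flag.
# The optional argument exists only so the check can be pointed at a saved
# copy; by default it reads /proc/cpuinfo. (has_svm is a hypothetical name.)
has_svm() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches "svm" as a whole word, so related flags like svm_lock do not count
    if grep -qw 'svm' "$cpuinfo"; then
        echo "AMD-V (svm) flag present"
    else
        echo "AMD-V (svm) flag missing"
        return 1
    fi
}
```

Running has_svm with no arguments on the Proxmox host checks the live CPU flags.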

Proxmox Configurations

Enable IOMMU

  1. Edit the GRUB configuration:
vim /etc/default/grub
  2. Modify the GRUB_CMDLINE_LINUX_DEFAULT line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
  3. Update GRUB and reboot:
update-grub
reboot
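
After the reboot it is worth confirming that the kernel actually created IOMMU groups, since every device in a group has to be passed through together. The following is a small sketch under the assumption that the groups live in the usual sysfs location, /sys/kernel/iommu_groups; list_iommu_groups is a hypothetical helper name.

```shell
#!/bin/sh
# Print one "group <n>: <pci address>" line per device found under an
# iommu_groups directory. The root argument defaults to the kernel's sysfs
# location and is overridable only to make the function easy to try out.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # With IOMMU off the glob matches nothing and stays literal; skip it
        [ -e "$dev" ] || continue
        group="${dev#"$root"/}"   # strip the root prefix -> "<n>/devices/<addr>"
        group="${group%%/*}"      # keep only "<n>"
        echo "group $group: ${dev##*/}"
    done
}
```

If list_iommu_groups prints nothing, IOMMU is most likely not active; the kernel log (dmesg | grep -i -e iommu -e amd-vi) usually says why.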

Identify GPU PCI IDs

  1. Run the following command to identify the PCI IDs for your GPU:
lspci -nn | grep NVIDIA
0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
0b:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
  2. Note the PCI IDs for your GPU (e.g., 10de:2204) and for its audio function (e.g., 10de:1aef)
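
If you would rather not copy the IDs by hand, they can be pulled out of the lspci output with a pattern match. This is only a sketch: extract_pci_ids is a hypothetical helper, and it assumes the bracketed [vendor:device] format shown in the output above.

```shell
#!/bin/sh
# Read `lspci -nn` output on stdin and print each vendor:device ID, one per line.
extract_pci_ids() {
    # Match only bracketed pairs of 4 hex digits separated by a colon, which
    # skips class codes such as [0300] and names such as [GeForce RTX 3090].
    # The sed call then strips the surrounding square brackets.
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | sed 's/[][]//g'
}
```

For example, lspci -nn | grep NVIDIA | extract_pci_ids | paste -sd, - joins the IDs into a single comma-separated list.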

Load Required Modules

  1. Edit the modules file:
vim /etc/modules
  2. Add these lines (on kernel 6.2 or newer, vfio_virqfd has been folded into vfio and can be left out):
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
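
The same edit can be scripted so that re-running it does not add duplicate entries. The sketch below is one way to do that; ensure_line and add_vfio_modules are hypothetical helper names, and the module list is simply the one given above.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present,
# so the step can be re-run without creating duplicates.
ensure_line() {
    file="$1"; line="$2"
    # -x: match the whole line, -F: treat it as a fixed string, -q: quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Add each VFIO module name from the list above to a modules file.
# The file argument defaults to /etc/modules.
add_vfio_modules() {
    target="${1:-/etc/modules}"
    for mod in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do
        ensure_line "$target" "$mod"
    done
}
```

Calling add_vfio_modules as root on the Proxmox host updates /etc/modules in place; after the next reboot, lsmod | grep vfio shows which modules were loaded.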

Blacklist NVIDIA Drivers

  1. Create a new blacklist file:
vim /etc/modprobe.d/blacklist.conf
  2. Add these lines:
blacklist nvidia
blacklist nouveau

Configure VFIO

  1. Create a new VFIO configuration file:
vim /etc/modprobe.d/vfio.conf
  2. Add this line, replacing 10de:2204 with your GPU's vendor and device IDs (further IDs, such as the audio function's 10de:1aef, can be appended separated by commas):
options vfio-pci ids=10de:2204

Update Initramfs

update-initramfs -u
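
After rebooting the host, you can check that the GPU is now claimed by vfio-pci rather than an NVIDIA driver. The following sketch parses the output of lspci -nnk; bound_drivers is a hypothetical helper name, and it assumes the usual lspci layout where an indented "Kernel driver in use:" line follows each device line.

```shell
#!/bin/sh
# Read `lspci -nnk` output on stdin and print the driver bound to each device
# as "<pci address> <driver>". Devices with no bound driver are not listed.
bound_drivers() {
    awk '
        # Device header lines start in column 0 with the PCI address
        /^[0-9a-f]/ { addr = $1 }
        # Indented detail line naming the driver; the driver is the last field
        /Kernel driver in use:/ { print addr, $NF }
    '
}
```

With my card at address 0b:00, lspci -nnk -s 0b:00 | bound_drivers should report vfio-pci for both functions if the configuration took effect.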

Configure the VM in Proxmox

Create a new VM in Proxmox

  1. Configure the VM and install Ubuntu 24.04
  2. Change the machine type to q35 so the VM can use PCI Express devices

Configure the VM's hardware configuration

  1. Add a PCI Device and select your GPU
  2. Enable the ROM-Bar and PCI-Express options
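
The same hardware settings can also be applied from the Proxmox shell with qm set. The sketch below only prints the commands so they can be reviewed first; passthrough_cmds is a hypothetical helper, the VM ID and PCI address are placeholders you supply, and the option names assume the hostpci syntax described in the Proxmox documentation.

```shell
#!/bin/sh
# Print the qm commands for a passthrough VM instead of running them, so they
# can be reviewed first. VM ID and PCI address are placeholders you supply.
passthrough_cmds() {
    vmid="$1"; pci_addr="$2"
    echo "qm set $vmid --machine q35"
    # pcie=1 and rombar=1 correspond to the "PCI Express" and "ROM-Bar" checkboxes
    echo "qm set $vmid --hostpci0 $pci_addr,pcie=1,rombar=1"
}
```

For example, passthrough_cmds 100 0000:0b:00 prints the two commands for a VM with the (made-up) ID 100; leaving the function number off the address passes through all functions of the device, including the audio controller. Piping the output to sh on the Proxmox host runs them.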

Boot the machine up

Configure the Ubuntu Virtual Machine

Install the Nvidia drivers

sudo apt update && sudo apt dist-upgrade -y
sudo apt install nvidia-driver-535 nvidia-utils-535

Reboot the VM

Verify the GPU Passthrough

Check if the GPU is recognized

nvidia-smi

You should see a table that lists your GPU along with the driver and CUDA versions.
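
For a more compact check, nvidia-smi can also print just a few fields per GPU. This is a minimal sketch rather than a required step; gpu_summary is a hypothetical wrapper name, and the queried fields are simply ones I find useful.

```shell
#!/bin/sh
# Print a one-line-per-GPU summary, or a hint if the driver tools are absent.
gpu_summary() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: is the NVIDIA driver installed?"
        return 1
    fi
    # name, driver version and total memory as comma-separated values, no header row
    nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
}
```

Run gpu_summary inside the VM. If nothing is listed, lspci | grep -i nvidia tells you whether the VM can see the card at all.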

Install Nvidia machine learning libraries

sudo apt install nvidia-cuda-toolkit

Setup Proxmox with Nvidia GPU pass-through to VMs