Gpubox

From Sudo Room
Revision as of 20:47, 16 February 2026 by Judytuna (talk | contribs) (init gpubox notes)

gpubox Setup Guide

1. Bare Metal Configuration

   Debian 13 Installation:  
       Use the provided ISO (debian-13.3.0-amd64-netinst.iso).  
       Ensure the correct network interface is configured (check ip a after installation).
   IPMI Setup:  
       Access IPMI via ipmitool (e.g., ipmitool -I lanplus -H 10.0.0.234 -U ADMIN -P ADMIN <command>, where -P supplies the password).
   Proxmox VE Installation:  
       Install Proxmox VE on bare metal using the official installer.  
       Configure storage (e.g., local LVM for VMs).
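
For the IPMI step, a small wrapper can keep the connection flags in one place and keep the password off the command line. This is a sketch rather than the box's actual tooling: the function name ipmi and the IPMI_HOST, IPMI_USER, and IPMITOOL variables are assumptions made for illustration. The -E flag, which tells ipmitool to read the password from the IPMI_PASSWORD environment variable, is standard ipmitool behavior.

```shell
#!/bin/sh
# Hypothetical wrapper around ipmitool; the variable names are illustrative.
IPMI_HOST=${IPMI_HOST:-10.0.0.234}   # BMC address taken from the example above
IPMI_USER=${IPMI_USER:-ADMIN}
IPMITOOL=${IPMITOOL:-ipmitool}       # set to "echo" to print the arguments instead

ipmi() {
    # -E makes ipmitool read the password from $IPMI_PASSWORD rather than from -P
    "$IPMITOOL" -I lanplus -H "$IPMI_HOST" -U "$IPMI_USER" -E "$@"
}
```

With IPMI_PASSWORD exported in the shell, typical calls would be ipmi chassis power status or ipmi lan print 1.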

2. GPU Passthrough Configuration

   BIOS/UEFI: Enable VT-d (Virtualization Technology for Directed I/O).  
   Identify GPUs:
       lspci -nn | grep -i nvidia  # List all NVIDIA GPUs
   Example output:
       08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev ff)
   Use the vendor ID (10de) and device ID (1b06) to identify GPUs.  
   VFIO Modules:
   Create /etc/modules-load.d/vfio.conf with:
       vfio
       vfio_iommu_type1
       vfio_pci
       vfio_virqfd
   On kernel 6.2 and newer, vfio_virqfd is built into the vfio module, so that line can be dropped.
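
The vendor:device pairs from lspci -nn can be pulled out mechanically instead of being copied by hand. The sketch below reads lspci -nn output on stdin and prints a modprobe options line for vfio-pci. Note the assumptions: binding cards by ID through an options vfio-pci ids=... line (usually placed in a file under /etc/modprobe.d/) is a common passthrough setup but is not described in these notes, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the unique vendor:device IDs of NVIDIA devices found in `lspci -nn` output (stdin).
nvidia_pci_ids() {
    grep -i nvidia \
        | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | tr -d '[]' \
        | sort -u
}

# Build an "options vfio-pci ids=..." line from the same input.
# Assumption: the host is meant to hand these devices to vfio-pci.
vfio_ids_option() {
    local ids
    ids=$(nvidia_pci_ids | paste -sd, -)
    if [ -n "$ids" ]; then
        printf 'options vfio-pci ids=%s\n' "$ids"
    fi
}
```

On the host this would be run as lspci -nn | vfio_ids_option, and the printed line reviewed before it is written anywhere.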
    
    

3. NVIDIA Drivers on Host

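   Enable the contrib and non-free components in /etc/apt/sources.list, then install the driver:
       # Add contrib and non-free to each deb line (run once; rerunning appends them again)
       sed -i '/^deb/ s/ main/ main contrib non-free/' /etc/apt/sources.list
       apt update
       apt install -y nvidia-driver nvidia-kernel-dkms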
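
Rerunning a sed substitution like the one above appends the components a second time. A filter that only adds what is missing avoids that. The following is a sketch under two assumptions: the function name add_components is made up for illustration, and the file uses the classic one-line deb format (deb822-style .sources files are not handled).

```shell
#!/bin/sh
# Read sources.list lines on stdin and append "contrib" and "non-free"
# to deb/deb-src lines that do not already list them. Other lines pass through.
add_components() {
    awk '
        /^deb(-src)?[ \t]/ {
            n = split("contrib non-free", want, " ")
            for (i = 1; i <= n; i++) {
                found = 0
                # Compare whole fields so "non-free-firmware" does not count as "non-free"
                for (f = 1; f <= NF; f++) if ($f == want[i]) found = 1
                if (!found) $0 = $0 " " want[i]
            }
        }
        { print }
    '
}
```

One way to use it is add_components < /etc/apt/sources.list > /tmp/sources.list.new, then compare the two files before replacing the original.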
    
    

4. VM Templates & Cloning

   Template VM:  
       Use debian-13.3.0-amd64-netinst.iso to create a minimal Debian 13 template.  
       Convert to template (Proxmox: VM > Convert to Template).
   Clone VMs:  
       Clone the template for ollama-2080 and dockerhost.  
       Pass GPU:  
           In the VM's Hardware tab, choose Add > PCI Device > Raw Device and select the GPU (match it against the lspci output).
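
The clone and passthrough steps can also be done from the Proxmox host shell with qm. The sketch below wraps qm clone and qm set --hostpci0 in a helper with a dry-run switch. The helper names, the DRY_RUN variable, and the VM IDs used in the usage note (9000 for the template, 101 for the clone) are hypothetical; the PCI address is the one from the example lspci output.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# clone_with_gpu TEMPLATE_ID NEW_ID NAME PCI_ADDRESS
# Makes a full clone of the template, then attaches the PCI device to the clone.
clone_with_gpu() {
    run qm clone "$1" "$2" --name "$3" --full &&
    run qm set "$2" --hostpci0 "$4"
}
```

For example, DRY_RUN=1 clone_with_gpu 9000 101 ollama-2080 0000:08:00.0 prints the two qm commands without touching any VM, which makes it easy to review them first.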

5. Specific VM Configurations

   ollama-2080:  
       Install ollama (e.g., curl -fsSL https://ollama.com/install.sh | sh).  
       Verify GPU acceleration: confirm the install with ollama --version and confirm the NVIDIA driver is loaded with nvidia-smi.
   dockerhost:  
       Install Docker:
           apt install -y docker.io
           systemctl enable --now docker
       Add user to docker group (usermod -aG docker $USER).
   ai-conductor:  
       Install required tools (e.g., kubectl for Kubernetes orchestration).
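
After setting up each VM, a quick check that the expected commands are on PATH can catch a failed install early. This is a minimal sketch; the function name check_tools is an assumption, and the tool lists in the usage note simply mirror the packages named above.

```shell
#!/bin/sh
# Print one "missing: NAME" line per command that is not found on PATH.
# Returns 0 if every command exists, 1 otherwise.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}
```

Possible calls would be check_tools ollama nvidia-smi on ollama-2080, check_tools docker on dockerhost, and check_tools kubectl on ai-conductor.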

Key Commands

   # Check GPU visibility in host
   lspci -k | grep -A 2 "VGA"

   # Verify VFIO modules loaded
   lsmod | grep vfio

   # Test NVIDIA driver
   nvidia-smi  # Should show GPU details

   # Clone a template in Proxmox
   qm clone <source_VM_ID> <new_VM_ID> --name "ollama-2080"
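
Reading lsmod | grep vfio by eye makes it easy to miss an absent module. The sketch below reads lsmod output on stdin and prints the expected VFIO modules that are not loaded. The function name is hypothetical, and the expected list is an assumption: it leaves out vfio_virqfd because that module is part of vfio on kernel 6.2 and newer.

```shell
#!/usr/bin/env bash
# Print each expected VFIO module that does not appear in `lsmod` output (stdin).
missing_vfio_modules() {
    local loaded mod
    loaded=$(awk '{ print $1 }')   # first column holds the module names
    for mod in vfio vfio_iommu_type1 vfio_pci; do
        # -x requires a whole-line match, so "vfio" does not match "vfio_pci"
        printf '%s\n' "$loaded" | grep -qx "$mod" || printf '%s\n' "$mod"
    done
}
```

Run as lsmod | missing_vfio_modules; no output means all three modules are loaded.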


Troubleshooting

   GPU Not Visible: Ensure VT-d is enabled in BIOS and the GPU is listed in lspci.  
   Driver Issues: Reinstall the driver (apt install --reinstall nvidia-driver) and reboot.  
   Permission Errors: Add user to docker and kvm groups.
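
For the "GPU Not Visible" case, one way to tell whether the IOMMU is actually active is to count the groups the kernel exposes under /sys/kernel/iommu_groups. This is a sketch with a hypothetical function name; a count of zero usually suggests that VT-d is disabled in firmware or the IOMMU is not enabled in the kernel, though it does not say which.

```shell
#!/bin/sh
# Print the number of IOMMU groups under the given directory
# (defaults to the kernel's sysfs path). Prints 0 if the directory is absent.
iommu_group_count() {
    dir=${1:-/sys/kernel/iommu_groups}
    [ -d "$dir" ] || { echo 0; return; }
    find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}
```

On the Proxmox host, iommu_group_count with no argument prints the live count; passing a directory makes the function easy to try against a test tree.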