Difference between revisions of "Gpubox"

From Sudo Room

Revision as of 20:54, 16 February 2026

gpubox Setup Guide

Bare Metal Configuration

Debian 13 Installation

  • Use the provided ISO (`debian-13.3.0-amd64-netinst.iso`)
  • Ensure correct network interface configuration (check `ip a` after installation)

IPMI Setup

  • Access IPMI via `ipmitool` (e.g., `ipmitool -I lanplus -H 10.0.0.234 -U ADMIN -P ADMIN pwd` for password)

Proxmox VE Installation

  • Install Proxmox VE on bare metal using the official installer
  • Configure storage (e.g., local LVM for VMs)

GPU Passthrough Configuration

BIOS/UEFI

  • Enable VT-d (Virtualization Technology for Directed I/O)
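One way to confirm that the setting took effect after a reboot is to list the IOMMU groups the kernel exposes under `/sys/kernel/iommu_groups`. The following is a minimal sketch, not part of the original notes; the function name `list_iommu_groups` and its optional base-directory argument are illustrative assumptions (the argument exists only so the function can be pointed at a test directory).

```shell
# Print one "group N: PCI-address" line per device found under the
# IOMMU group directory. No output suggests the IOMMU is not active.
# $1 (optional, hypothetical): base directory, defaults to the sysfs path.
list_iommu_groups() {
    base=${1:-/sys/kernel/iommu_groups}
    for dev in "$base"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist.
        [ -e "$dev" ] || continue
        group=${dev#"$base"/}   # strip the base directory prefix
        group=${group%%/*}      # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Usage on the host:
#   list_iommu_groups
```

If the GPU appears in a group of its own (or only alongside its own audio function), it is generally a good candidate for passthrough.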

Identify GPUs

```
lspci -nn | grep -i nvidia  # List all NVIDIA GPUs
```

Example output:

```
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev ff)
```

The bracketed pair `[10de:1b06]` gives the vendor ID (`10de`) and device ID (`1b06`); use them to identify GPUs.
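The vendor:device pair can be pulled out of `lspci -nn` output mechanically. The sketch below is an illustration rather than part of the original notes: the function names are hypothetical, and feeding the IDs to the `ids=` parameter of `vfio-pci` is a common passthrough step that this page does not itself describe.

```shell
# Read `lspci -nn` output on stdin and print the unique vendor:device
# pairs of NVIDIA devices, one per line (e.g. 10de:1b06).
nvidia_ids() {
    grep -i nvidia \
        | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | tr -d '[]' \
        | sort -u
}

# Join those pairs into a modprobe options line for vfio-pci.
# Prints nothing when no NVIDIA device is found.
vfio_options_line() {
    ids=$(nvidia_ids | paste -sd, -)
    if [ -n "$ids" ]; then
        printf 'options vfio-pci ids=%s\n' "$ids"
    fi
}

# Usage on the host (the target file name is an assumption):
#   lspci -nn | vfio_options_line > /etc/modprobe.d/vfio.conf
```

The class code in brackets (`[0300]`) is skipped because the pattern requires a colon between two four-digit hex fields.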

VFIO Modules

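Create `/etc/modules-load.d/vfio.conf` with:

```
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
```

On kernels 6.2 and later, which includes the kernel Debian 13 ships, `vfio_virqfd` has been merged into `vfio`, so the last line can be omitted.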
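After a reboot, the loaded modules can be compared against the list in that file. This is a minimal sketch under the assumption that `lsmod` output is piped in on stdin; the function name `missing_modules` is hypothetical.

```shell
# Print every module named in the arguments that is absent from the
# `lsmod` output read on stdin. No output means all are loaded.
missing_modules() {
    # Column 1 of lsmod is the module name; line 1 is the header.
    loaded=$(awk 'NR > 1 { print $1 }')
    for mod in "$@"; do
        printf '%s\n' "$loaded" | grep -qxF "$mod" || printf '%s\n' "$mod"
    done
}

# Usage on the host (adjust the names to match vfio.conf):
#   lsmod | missing_modules vfio vfio_iommu_type1 vfio_pci
```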

NVIDIA Drivers on Host

Edit /etc/apt/sources.list

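```
# Enable the contrib and non-free components (nvidia-driver is in non-free).
# Run this once only: each run appends the components again.
sed -i 's/main/main non-free contrib/g' /etc/apt/sources.list
apt update
# Install the driver and its DKMS kernel module
apt install -y nvidia-driver nvidia-kernel-dkms
```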
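Because the `sed` command above rewrites every occurrence of `main` on each run, a more cautious variant may be useful when the step is scripted. The sketch below assumes the classic one-line `deb ...` format and GNU `sed`; the function name `enable_nonfree` is hypothetical, and it does not deduplicate a `contrib` entry that is already present.

```shell
# Add "contrib non-free" after "main" on deb/deb-src lines of the file
# given as $1, skipping lines that already list non-free.
# Safe to run more than once.
enable_nonfree() {
    sed -i -E \
        '/^deb(-src)? /{/ non-free( |$)/!s/ main( |$)/ main contrib non-free\1/;}' \
        "$1"
}

# Usage on the host:
#   enable_nonfree /etc/apt/sources.list && apt update
```

The check for ` non-free` followed by a space or end of line keeps an existing `non-free-firmware` entry from being mistaken for `non-free`.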

VM Templates & Cloning

Template VM

  • Use `debian-13.3.0-amd64-netinst.iso` to create a minimal Debian 13 template
  • Convert to template (Proxmox: VM > Convert to Template)

Clone VMs

  • Clone the template for `ollama-2080` and `dockerhost`
  • **Pass GPU**: In the VM settings, go to "Hardware" > "Add" > "PCI Device", choose "Raw Device", and select the GPU (match it against the `lspci` output)
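The same clone-and-attach steps can also be done from the Proxmox host shell with `qm`. The sketch below only prints the commands so they can be reviewed first; the function name `plan_clone`, the VM IDs in the usage line, and the choice of a full clone are assumptions, not details from this page.

```shell
# Print the qm commands that would clone a template and, if a PCI
# address is given, attach that device to the new VM as hostpci0.
# $1: template VM ID, $2: new VM ID, $3: new VM name,
# $4 (optional): PCI address of the GPU as shown by lspci
plan_clone() {
    template_id=$1
    new_id=$2
    name=$3
    gpu_addr=${4:-}
    printf 'qm clone %s %s --name %s --full\n' "$template_id" "$new_id" "$name"
    if [ -n "$gpu_addr" ]; then
        printf 'qm set %s --hostpci0 %s\n' "$new_id" "$gpu_addr"
    fi
}

# Usage (IDs 9000 and 101 are placeholders):
#   plan_clone 9000 101 ollama-2080 08:00.0
#   plan_clone 9000 101 ollama-2080 08:00.0 | sh   # run after review
```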

Specific VM Configurations

ollama-2080

  • Install `ollama` (e.g., `curl -fsSL https://ollama.com/install.sh | sh`)
  • Configure GPU acceleration: confirm the install with `ollama --version`, then run `nvidia-smi` inside the VM to ensure the NVIDIA drivers are loaded

dockerhost

  • Install Docker:

```
apt install -y docker.io
systemctl enable --now docker
```

  • Add user to `docker` group (`usermod -aG docker $USER`)

ai-conductor

  • Install required tools (e.g., `kubectl` for Kubernetes orchestration)

Key Commands

```
# Check GPU visibility in host
lspci -k | grep -A 2 "VGA"

# Verify VFIO modules loaded
lsmod | grep vfio

# Test NVIDIA driver
nvidia-smi  # Should show GPU details

# Clone a template in Proxmox
qm clone <source_VM_ID> <new_VM_ID> --name "ollama-2080"
```

Troubleshooting

  • **GPU Not Visible**: Ensure VT-d is enabled in BIOS and the GPU is listed in `lspci`
  • **Driver Issues**: Reinstall `nvidia-driver` and reboot
  • **Permission Errors**: Add user to `docker` and `kvm` groups
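For the permission case, group membership can be checked before changing anything. This is a minimal sketch that assumes the space-separated group list printed by `id -nG` is piped in on stdin; the function name `missing_groups` is hypothetical.

```shell
# Print every group named in the arguments that is absent from the
# space-separated group list read on stdin. No output means none missing.
missing_groups() {
    have=$(tr ' ' '\n')
    for g in "$@"; do
        printf '%s\n' "$have" | grep -qxF "$g" || printf '%s\n' "$g"
    done
}

# Usage:
#   id -nG "$USER" | missing_groups docker kvm
# Then add any group it prints, e.g.:
#   usermod -aG kvm "$USER"
```

Group changes made with `usermod` take effect at the next login.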