Difference between revisions of "Gpubox"

446 bytes removed, Yesterday at 20:54
mediawiki formatting
(init gpubox notes)
 
(mediawiki formatting)
Line 1: Line 1:
= gpubox Setup Guide =

== Bare Metal Configuration ==

=== Debian 13 Installation ===
* Use the provided ISO (`debian-13.3.0-amd64-netinst.iso`)
* Ensure the correct network interface is configured (check `ip a` after installation)

=== IPMI Setup ===
* Access IPMI via `ipmitool` (e.g., `ipmitool -I lanplus -H 10.0.0.234 -U ADMIN -P ADMIN pwd` for password)

=== Proxmox VE Installation ===
* Install Proxmox VE on bare metal using the official installer
* Configure storage (e.g., local LVM for VMs)


== GPU Passthrough Configuration ==

=== BIOS/UEFI ===
* Enable VT-d (Virtualization Technology for Directed I/O)

=== Identify GPUs ===
```
lspci -nn | grep -i nvidia  # List all NVIDIA GPUs
```
Example output:
```
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev ff)
```
Use the vendor ID (`10de`) and device ID (`1b06`) to identify GPUs.
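The vendor and device IDs can be pulled out of the `lspci -nn` output mechanically instead of being read by eye. The following is a minimal sketch; the function name `extract_pci_ids` is hypothetical and not part of these notes, and it assumes the IDs appear in the `[vvvv:dddd]` form shown in the example output.

```shell
# Hypothetical helper: print the vendor:device IDs found in `lspci -nn`
# lines supplied on stdin, one ID per line.
extract_pci_ids() {
    # Match only bracketed pairs of four hex digits, e.g. [10de:1b06];
    # class codes such as [0300] contain no colon and are skipped.
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]'
}

# Usage on the host:
#   lspci -nn | grep -i nvidia | extract_pci_ids
```

Feeding it the example line above yields `10de:1b06`.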
=== VFIO Modules ===
Create `/etc/modules-load.d/vfio.conf` with:
```
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
```
On kernels 6.2 and newer, `vfio_virqfd` is built into `vfio`, so that line can be dropped.
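After a reboot it can help to confirm that the modules were actually loaded. This is a rough sketch, not part of the original notes: the function name `missing_vfio_modules` is hypothetical, and `vfio_virqfd` is left out of the check on the assumption that the running kernel no longer ships it as a separate module. Modules compiled directly into the kernel do not appear in `/proc/modules`, so a reported "missing" module is a prompt to investigate rather than proof of a problem.

```shell
# Hypothetical helper: read /proc/modules-style text on stdin (module name in
# the first column) and print each expected VFIO module that is not listed.
missing_vfio_modules() {
    loaded=$(awk '{print $1}')
    for mod in vfio vfio_iommu_type1 vfio_pci; do
        printf '%s\n' "$loaded" | grep -qx "$mod" || printf '%s\n' "$mod"
    done
}

# Usage on the host (prints nothing when all three are loaded):
#   missing_vfio_modules < /proc/modules
```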
   
   


== NVIDIA Drivers on Host ==

=== Edit /etc/apt/sources.list ===
```
# Enable the contrib and non-free components (run once; re-running repeats the substitution)
sed -i 's/main/main non-free contrib/g' /etc/apt/sources.list
apt update
# Install the NVIDIA driver and its DKMS kernel module
apt install -y nvidia-driver nvidia-kernel-dkms
```
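The `sed` substitution above rewrites every occurrence of `main`, so running it a second time adds the components again. A more defensive variant might look like the sketch below. The function name `enable_nonfree` is hypothetical, GNU `sed` is assumed, and only the classic one-line `deb ...` format is handled (not deb822 `.sources` files).

```shell
# Hypothetical helper: add "contrib non-free" after "main" on deb/deb-src
# lines of the file given as $1, skipping lines that already list non-free.
enable_nonfree() {
    file=$1
    # "non-free-firmware" does not count as "non-free": the match requires a
    # space or end of line right after the component name.
    sed -i -E '/^deb(-src)? /{/ non-free( |$)/!s/ main( |$)/ main contrib non-free\1/;}' "$file"
}

# Usage on the host:
#   enable_nonfree /etc/apt/sources.list && apt update
```

Because lines that already contain `non-free` are skipped, calling the function repeatedly leaves the file unchanged after the first run.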
   
   


== VM Templates & Cloning ==

=== Template VM ===
* Use `debian-13.3.0-amd64-netinst.iso` to create a minimal Debian 13 template
* Convert to template (Proxmox: VM > Convert to Template)

=== Clone VMs ===
* Clone the template for `ollama-2080` and `dockerhost`
* **Pass GPU**: In VM settings, go to "Hardware" > "PCI" > "Raw" and select the GPU (use `lspci` IDs)
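The GPU can also be attached from the Proxmox shell with `qm set` and a `hostpciN` option instead of the web UI. The sketch below only prints the commands so they can be reviewed first. The function name `passthrough_cmds` and the VM ID `101` are assumptions for illustration, and the PCI address is taken from the first column of `lspci` output.

```shell
# Hypothetical helper: read lspci lines on stdin and print (not run) one
# `qm set` command per device for the VM ID given as $1.
passthrough_cmds() {
    vmid=$1
    # $1 inside awk is the PCI address (e.g. 08:00.0); hostpci slots start at 0.
    awk -v vmid="$vmid" '{ printf "qm set %s --hostpci%d %s\n", vmid, NR - 1, $1 }'
}

# Usage on the Proxmox host (101 is an example VM ID):
#   lspci -nn | grep -i nvidia | passthrough_cmds 101
```

Note that `grep -i nvidia` may also match the GPU's audio function, which would then get its own `hostpci` slot, so check the printed commands before running them.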


== Specific VM Configurations ==

=== ollama-2080 ===
* Install `ollama` (e.g., `curl -fsSL https://ollama.com/install.sh | sh`)
* Configure GPU acceleration (check `ollama --version` and ensure NVIDIA drivers are loaded, e.g., with `nvidia-smi`)

=== dockerhost ===
* Install Docker:
```
apt install -y docker.io
systemctl enable --now docker
```
* Add user to `docker` group (`usermod -aG docker $USER`), then log out and back in so the new group membership takes effect

=== ai-conductor ===
* Install required tools (e.g., `kubectl` for Kubernetes orchestration)


== Key Commands ==
```
# Check GPU visibility in host
lspci -k | grep -A 2 "VGA"
# [...]
# Clone a template in Proxmox
qm clone <source_VM_ID> <new_VM_ID> --name "ollama-2080"
```
== Troubleshooting ==

* **GPU Not Visible**: Ensure VT-d is enabled in BIOS and the GPU is listed in `lspci`
* **Driver Issues**: Reinstall `nvidia-driver` and reboot
* **Permission Errors**: Add user to `docker` and `kvm` groups
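
For the permission case, a quick check of which groups are still missing can save a round of guessing. This is a minimal sketch; the function name `missing_groups` is hypothetical, and it expects the space-separated list that `id -nG` prints.

```shell
# Hypothetical helper: print each of docker and kvm that is absent from the
# space-separated group list passed as $1.
missing_groups() {
    for grp in docker kvm; do
        case " $1 " in
            *" $grp "*) ;;                 # group present, nothing to report
            *) printf '%s\n' "$grp" ;;     # group missing
        esac
    done
}

# Usage:
#   missing_groups "$(id -nG "$USER")"
```

Any group it prints can be added with `usermod -aG <group> <user>`; the change applies to new login sessions only.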
