== Bare Metal Configuration ==
=== IPMI Setup ===
* Access IPMI via <pre>ipmitool</pre> with hostname <pre>ipmi-compute-2-171.local</pre>
Example commands:
<pre>
$ ipmitool -H ipmi-compute-2-171.local -U ADMIN -P pwd power status
Chassis Power is on
$ ipmitool -H ipmi-compute-2-171.local -U ADMIN -P pwd dcmi power reading
[shows electrical power presently being consumed by the system]
</pre>
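A few other ipmitool subcommands that are handy with the same connection flags (not used above, just for reference):
<pre>
$ ipmitool -H ipmi-compute-2-171.local -U ADMIN -P pwd chassis power cycle   # hard power cycle
$ ipmitool -H ipmi-compute-2-171.local -U ADMIN -P pwd chassis bootdev pxe   # PXE boot on next reset
$ ipmitool -H ipmi-compute-2-171.local -U ADMIN -P pwd sol activate          # serial-over-LAN console
</pre>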
=== Debian 13 and Proxmox 8 Installation ===
* Debian 13 install: https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-13.3.0-amd64-netinst.iso
* Proxmox VE on top of that: https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_13_Trixie
* Let it use DHCP to grab an IP address during the install; I changed it to a static address later when I set up vmbr0 (see the sketch after this list)
* Hostname: gpubox
* Proxmox web UI available at: https://gpubox.local:8006
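For reference, a typical static <pre>vmbr0</pre> setup in <pre>/etc/network/interfaces</pre> looks roughly like this; the address, gateway, and NIC name (<pre>eno1</pre>) are placeholders, not the actual values used here:
<pre>
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.50/24
        gateway 192.168.1.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
</pre>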
== GPU Passthrough Configuration ==
=== BIOS/UEFI ===
* Enable VT-d (Virtualization Technology for Directed I/O) in the BIOS on gpubox
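The BIOS switch alone isn't enough; the kernel on gpubox also has to boot with the IOMMU enabled. The usual recipe looks roughly like this:
<pre>
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# regenerate the grub config and reboot
update-grub
reboot

# after the reboot, confirm the IOMMU came up
dmesg | grep -e DMAR -e IOMMU
</pre>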
=== Identify GPUs ===
Use the vendor ID (<pre>10de</pre> for NVIDIA) and device IDs (e.g. <pre>1b06</pre> for the GTX 1080 Ti) to identify the GPUs. Both the video cards and their associated audio devices will show up.
<pre>
$ lspci -nnk | grep -A 3 'VGA'
04:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:12a4]
        Kernel modules: nvidiafb, nouveau
04:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
--
05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
        Subsystem: PNY Device [196e:1213]
        Kernel modules: nvidiafb, nouveau
05:00.1 Audio device [0403]: NVIDIA Corporation GP102 HDMI Audio Controller [10de:10ef] (rev a1)
--
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3609]
        Kernel modules: nvidiafb, nouveau
08:00.1 Audio device [0403]: NVIDIA Corporation GP102 HDMI Audio Controller [10de:10ef] (rev a1)
--
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. Device [19da:1470]
        Kernel modules: nvidiafb, nouveau
09:00.1 Audio device [0403]: NVIDIA Corporation GP102 HDMI Audio Controller [10de:10ef] (rev a1)
--
0c:00.0 VGA compatible controller [0300]: ASPEED Technology, Inc. ASPEED Graphics Family [1a03:2000] (rev 30)
        Subsystem: Super Micro Computer Inc Device [15d9:0892]
        Kernel driver in use: ast
        Kernel modules: ast
--
84:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] [10de:1e04] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device [1458:37c4]
        Kernel modules: nvidiafb, nouveau
84:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
--
85:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
        Subsystem: eVga.com. Corp. Device [3842:6180]
        Kernel modules: nvidiafb, nouveau
85:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
--
88:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. Device [19da:1470]
        Kernel modules: nvidiafb, nouveau
88:00.1 Audio device [0403]: NVIDIA Corporation GP102 HDMI Audio Controller [10de:10ef] (rev a1)
--
89:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3609]
        Kernel modules: nvidiafb, nouveau
89:00.1 Audio device [0403]: NVIDIA Corporation GP102 HDMI Audio Controller [10de:10ef] (rev a1)
</pre>
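With the IOMMU on, it's worth confirming that each GPU (and its audio function) sits in its own IOMMU group before passing it through:
<pre>
# one symlink per PCI device, organized by IOMMU group number
find /sys/kernel/iommu_groups/ -type l
</pre>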
=== VFIO Modules ===
* Create <pre>/etc/modules-load.d/vfio.conf</pre> with the following (binding the cards to vfio-pci is sketched after this block):
<pre>
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
</pre>
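Loading the modules is only half of it; for <pre>vfio-pci</pre> to actually claim the cards at boot, the usual extra step is to pin the vendor:device IDs from the lspci output above in a modprobe file and rebuild the initramfs. A sketch (the ID pair below covers the GTX 1080 Ti and its HDMI audio; adjust to whichever cards you're passing through). On recent kernels <pre>vfio_virqfd</pre> has apparently been merged into the core vfio module, so that entry may just be ignored.
<pre>
# /etc/modprobe.d/vfio.conf
# vendor:device IDs from lspci -nn (example: GTX 1080 Ti + its HDMI audio function)
options vfio-pci ids=10de:1b06,10de:10ef

# rebuild the initramfs so the binding takes effect at boot
update-initramfs -u -k all
</pre>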
== NVIDIA Drivers on Host ==
We did have to install NVIDIA drivers on gpubox itself, even though we installed them again later inside the VMs. I'm still not sure why both are needed.
=== Edit /etc/apt/sources.list ===
<pre>
sed -i 's/main/main non-free contrib/g' /etc/apt/sources.list
apt update
apt install -y nvidia-driver nvidia-kernel-dkms
</pre>
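Quick sanity check after the driver install; any card already bound to vfio-pci won't show up here, only the ones the NVIDIA driver can still see:
<pre>
nvidia-smi -L    # lists one line per GPU visible to the NVIDIA driver
</pre>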
== VM Templates & Cloning ==
=== Template VM ===
* Upload <pre>debian-13.3.0-amd64-netinst.iso</pre> to storage through the Proxmox web UI
* Create a minimal Debian 13 template
** <pre>apt install -y ufw fail2ban curl git zsh sudo</pre>
** <pre>sudo apt update && sudo apt full-upgrade -y</pre>
* Make a user called <pre>deb</pre> with sudo
* Convert to template (Proxmox: VM > Convert to Template) with name: <pre>debian13-template</pre>
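The conversion can also be done from the gpubox shell instead of the web UI; 9000 below is an example VM ID, not necessarily the one used here:
<pre>
qm list            # find the template VM's numeric ID
qm template 9000   # example ID: converts that VM into a template
</pre>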
=== Clone VMs ===
* Clone the template for <pre>ollama-2080</pre> and future VMs that will house AI models
* '''Pass GPU''': In the VM settings, go to "Hardware" > "PCI" > "Raw" and select the GPU (use the <pre>lspci</pre> IDs from above; a CLI equivalent is sketched after this list)
* Don't forget to change the new cloned VM's hostname
** <pre>sudo hostnamectl set-hostname ollama2080 --static</pre>
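For reference, the clone and the GPU passthrough can also be done with <pre>qm</pre> from the gpubox shell; the VM IDs and the PCI address below are examples (the address comes from the lspci output above):
<pre>
# full clone of the template (assuming the template got VM ID 9000) into VM 101
qm clone 9000 101 --name ollama-2080 --full

# attach the whole 04:00 device (GPU + its audio function) to VM 101
# pcie=1 requires the q35 machine type on the VM
qm set 101 -hostpci0 0000:04:00,pcie=1
</pre>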
== Specific VM Configurations ==
=== ollama-2080 ===
* Install <pre>ollama</pre> with <pre>curl -fsSL https://ollama.com/install.sh | sh</pre>
* Use ollama to pull and run DeepSeek R1 8B (commands sketched after this list)
* Verify: http://ollama.local:11434/ should show the message <pre>Ollama is running.</pre>
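Roughly the commands involved; the tag <pre>deepseek-r1:8b</pre> is an assumption, so check the Ollama library for the exact model name. For the http://ollama.local:11434/ check to work from another machine, the ollama service also has to listen on the network (e.g. <pre>OLLAMA_HOST=0.0.0.0</pre>), not just localhost.
<pre>
ollama pull deepseek-r1:8b
ollama run deepseek-r1:8b     # interactive prompt to sanity-check the model

# quick check from another box on the LAN
curl http://ollama.local:11434/
</pre>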
=== dockerhost ===
<pre>
systemctl enable --now docker
</pre>
* Add a user called <pre>docker</pre> to do docker stuff (i.e. put it in the <pre>docker</pre> group). Do NOT give <pre>docker</pre> sudo.
* As the docker user, make the directory <pre>~/git/openwebui</pre>
* Make a docker compose file at <pre>~/git/openwebui/docker-compose.yaml</pre> (a sketch is after this list)
* In <pre>~/git/openwebui</pre>, run <pre>docker compose up</pre>
** Note: newer Docker uses <pre>docker compose</pre>, not <pre>docker-compose</pre>
* Useful commands:
<pre>
sudo ss -plnt   # Shows ports this machine is listening on
ip -4 a         # Get this machine's IP address on the local network
</pre>
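A minimal sketch of the compose file, assuming the stock Open WebUI image and pointing it at the ollama-2080 VM set up above; the host port (3000) and the Ollama URL are assumptions to adjust:
<pre>
# ~/git/openwebui/docker-compose.yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"                    # web UI on host port 3000
    environment:
      - OLLAMA_BASE_URL=http://ollama.local:11434
    volumes:
      - open-webui:/app/backend/data   # persist settings and chats
    restart: unless-stopped

volumes:
  open-webui:
</pre>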
=== ai-conductor ===
* TBD
== Key Commands ==