Difference between revisions of "Gpubox"

 
(7 intermediate revisions by 2 users not shown)
Line 3: Line 3:
|+ On gpubox
|+ On gpubox
|-
|-
! IP:Port !! Description
! Hostname:Port !! Description
|-
|-
| [https://gpubox.local:8006] || Proxmox admin
| [https://gpubox.local:8006/ gpubox.local:8006] || Proxmox admin
|-
|-
| [http://dockerhost.local:3000/] || Open WebUI (to play with LLMs)
| [http://dockerhost.local:3000/ dockerhost.local:3000] || Open WebUI (to play with LLMs)
|-
|-
| [https://ipmi-compute-2-171.local/] || IPMI
| [https://ipmi-compute-2-171.local/ ipmi-compute-2-171.local] || IPMI
|}
|}


Line 25: Line 25:
$ ipmitool -H ipmi-compute-2-171.local  -U ADMIN -P pwd dcmi power reading
$ ipmitool -H ipmi-compute-2-171.local  -U ADMIN -P pwd dcmi power reading
[shows electrical power presently being consumed by system]
[shows electrical power presently being consumed by system]
</pre>
Soft shutdown (all this is necessary or it will just reboot):
<pre>
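#!/bin/bash
# The IPMI password is read from a file named ipmipass in the current directory.
IPMICOMMAND="ipmitool -U ADMIN -P $(cat ipmipass) -H ipmi-compute-2-171"
# Ask the OS to shut down cleanly.
$IPMICOMMAND power soft
# Poll for up to 15 seconds while the chassis still reports power on.
timeout 15 bash -c "while $IPMICOMMAND power status | grep on ; do
        sleep 0.5
      done"
# Then switch chassis power off; per the note above, skipping this lets the box reboot.
$IPMICOMMAND power off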
</pre>
</pre>
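The polling step in the soft shutdown script can also be written as a reusable function. This is only a sketch: <code>wait_until_off</code> is a hypothetical helper name, and it assumes that <code>ipmitool ... power status</code> prints a line such as "Chassis Power is on" while the machine is up.

```shell
#!/bin/bash
# Hypothetical helper: run a status command repeatedly until its output
# no longer contains "is on", or give up after a fixed number of tries.
#   $1 = maximum number of tries
#   $2 = delay between tries, in seconds
#   remaining arguments = the status command to run
# Note: empty output (for example, an unreachable BMC) also counts as "off".
wait_until_off() {
    tries="$1"
    delay="$2"
    shift 2
    while [ "$tries" -gt 0 ]; do
        if ! "$@" | grep -q "is on"; then
            return 0    # power is no longer reported as on
        fi
        sleep "$delay"
        tries=$((tries - 1))
    done
    return 1            # still on after all tries
}
```

For example, <code>wait_until_off 30 0.5 ipmitool -U ADMIN -P "$(cat ipmipass)" -H ipmi-compute-2-171 power status</code> would poll for roughly 15 seconds, and the return code tells the caller whether the soft shutdown finished in time.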


Line 33: Line 44:
* Hostname gpubox
* Hostname gpubox
* Proxmox web UI available at: https://gpubox.local:8006
* Proxmox web UI available at: https://gpubox.local:8006
* Install earlyoom to prevent system crashes when the host runs out of RAM: <pre>apt install earlyoom && sudo systemctl enable --now earlyoom</pre>


== GPU Passthrough Configuration ==
== GPU Passthrough Configuration ==
Line 114: Line 126:
* Upload <pre>debian-13.3.0-amd64-netinst.iso</pre> to storage through the Proxmox web UI
* Upload <pre>debian-13.3.0-amd64-netinst.iso</pre> to storage through the Proxmox web UI
* Create a minimal Debian 13 template
* Create a minimal Debian 13 template
** <pre>apt install -y ufw fail2ban curl git zsh sudo</pre>
** <pre>apt install -y ufw fail2ban curl git zsh sudo net-tools</pre>
** <pre>sudo apt update && sudo apt full-upgrade -y</pre>
** <pre>sudo apt update && sudo apt full-upgrade -y</pre>
* Make a user called <pre>deb</pre> with sudo
* Make a user called <pre>deb</pre> with sudo
Line 130: Line 142:
* Install nvidia drivers https://www.xda-developers.com/nvidia-stopped-supporting-my-gpu-so-i-started-self-hosting-llms-with-it/
* Install nvidia drivers https://www.xda-developers.com/nvidia-stopped-supporting-my-gpu-so-i-started-self-hosting-llms-with-it/
** Pin the driver version so you don't have to re-run the nvidia installer every time the kernel gets updated
** Pin the driver version so you don't have to re-run the nvidia installer every time the kernel gets updated
* Install <pre>ollama</pre> with <pre>curl -fsSL https://ollama.com/install.sh | sh</pre>
* Install ollama with <pre>curl -fsSL https://ollama.com/install.sh | sh</pre>
* Use ollama to pull and run deepseek-r1:8b
* Use ollama to pull and run deepseek-r1:8b
* <pre>sudo ufw allow from 10.0.0.0/24 to any port 11434 proto tcp</pre>
* Verify: http://ollama.local:11434/ should show the message <pre>Ollama is running.</pre>
* Verify: http://ollama.local:11434/ should show the message <pre>Ollama is running.</pre>
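Once port 11434 is reachable, the model can be queried from another machine on the LAN through Ollama's HTTP API. The sketch below is an assumption about how one might do that, not part of the original setup: the function names are hypothetical, and the payload builder does no JSON escaping, so the prompt must not contain double quotes or backslashes.

```shell
#!/bin/bash
# Hypothetical helper: build a minimal JSON body for Ollama's /api/generate.
#   $1 = model tag, $2 = prompt (no double quotes or backslashes)
build_generate_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Hypothetical helper: send one prompt to an Ollama server and print the raw JSON reply.
#   $1 = base URL, $2 = model tag, $3 = prompt
ask_ollama() {
    curl -fsS "$1/api/generate" -d "$(build_generate_payload "$2" "$3")"
}
```

For example, <code>ask_ollama http://ollama.local:11434 deepseek-r1:8b "Say hello"</code> should return a JSON object whose <code>response</code> field holds the generated text.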
=== imgtotext ===
* Install ollama as above
* <pre>ollama run hf.co/noctrex/ZwZ-8B-GGUF:Q8_0</pre> from the page https://huggingface.co/noctrex/ZwZ-8B-GGUF (found by clicking the image-to-text tag and looking at trending models)
* Verify: http://imgtotext.local:11434/ should show the message <pre>Ollama is running.</pre>
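The verification step for the Ollama VMs can be scripted instead of opening the URL in a browser. A minimal sketch, assuming <code>curl</code> is installed on the machine doing the check; <code>ollama_is_up</code> is a hypothetical function name.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given base URL answers with
# Ollama's "Ollama is running" banner within 5 seconds.
#   $1 = base URL of the Ollama server
ollama_is_up() {
    body=$(curl -fsS --max-time 5 "$1") || return 1
    case "$body" in
        *"Ollama is running"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, <code>ollama_is_up http://imgtotext.local:11434/ && echo ok</code> prints "ok" only when the banner is returned. The same check works for http://ollama.local:11434/.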


=== dockerhost ===
=== dockerhost ===
Line 177: Line 195:
=== ai-conductor ===
=== ai-conductor ===
* TBD
* TBD
=== If you're using a 1080 Ti or 1080 ===
<pre>sudo apt purge "*nvidia*"
sudo apt autoremove --purge
</pre>
Then reboot.


== Key Commands ==
== Key Commands ==