Tesla P40 in a Homelab: 24GB of Inference on a Budget

The Tesla P40 is a seductive piece of hardware: 24GB of VRAM for a fraction of the cost of a modern RTX card. But after three weeks of fighting with it, I realized that the "budget" part of the equation doesn't include the cost of my sanity. I spent more time debugging QEMU assertion errors and PCI address shifts than I did actually running models.

If you're looking to put a P40 in a Proxmox node to run LLMs, you're likely trying to fit larger models like Qwen2.5:32B into VRAM without spending four figures on an A100 or a 3090. It's a viable path, but the standard way of doing things (GPU passthrough to a VM) is a recipe for instability with this specific card.
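A quick estimate shows why 24GB is the threshold that matters for a 32B model. This sketch counts only the memory needed for the weights. Several inputs are assumptions, not measurements: the model is treated as a flat 32 billion parameters, about 4.5 bits per weight stands in for a typical 4-bit quantization, the P40's 24GB is treated as 24 GiB, and a hypothetical 2 GiB is reserved for KV cache and runtime overhead.

```python
GIB = 1024 ** 3

def weight_gib(n_params, bits_per_weight):
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB

def fits_in_vram(n_params, bits_per_weight, vram_gib=24.0, reserve_gib=2.0):
    """Rough check: weights plus an assumed reserve for KV cache and overhead.

    reserve_gib is an illustrative guess; real usage depends on context
    length, batch size, and the inference runtime.
    """
    return weight_gib(n_params, bits_per_weight) + reserve_gib <= vram_gib

if __name__ == "__main__":
    # Compare full precision, 8-bit, and an assumed ~4.5 bpw 4-bit quant.
    for bits in (16, 8, 4.5):
        gib = weight_gib(32e9, bits)
        print(f"{bits:>4} bpw: {gib:5.1f} GiB weights, fits={fits_in_vram(32e9, bits)}")
```

Under these assumptions, the weights alone need roughly 59.6 GiB at 16 bits and 29.8 GiB at 8 bits, so neither fits on one card. At about 4.5 bits they drop to roughly 16.8 GiB, which leaves some room for context on a 24GB P40.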

The Passthrough Trap

My first instinct was to follow the standard Proxmox pattern: isolate the GPU with vfio-pci and pass it through to a dedicated Ubuntu VM. I've done this before, and it's usually the right move for isolation. I had my IOMMU groups sorted out and the hostpci entry added to the VM's config.
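Before adding the hostpci entry, it helps to confirm the card actually sits in its own IOMMU group. The following is a minimal Python sketch that assumes the standard Linux sysfs layout under /sys/kernel/iommu_groups. The helper names are hypothetical, and the "every device in the group belongs to the same physical slot" rule is a simplifying heuristic, not Proxmox's own check.

```python
from pathlib import Path

def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled or not exposed by this kernel
    for group_dir in base.iterdir():
        devices = group_dir / "devices"
        if devices.is_dir():
            groups[int(group_dir.name)] = sorted(d.name for d in devices.iterdir())
    return dict(sorted(groups.items()))

def is_isolated(groups, gpu_addr):
    """Heuristic: True if every device sharing gpu_addr's group is on the same
    bus:device slot (i.e. only other functions of the same card)."""
    slot = gpu_addr.rsplit(".", 1)[0]  # "0000:03:00.0" -> "0000:03:00"
    for members in groups.values():
        if gpu_addr in members:
            return all(m.rsplit(".", 1)[0] == slot for m in members)
    return False  # address not found in any group

if __name__ == "__main__":
    for group, members in list_iommu_groups().items():
        print(group, members)
```

Pass the card's address as reported by lspci, in full domain form (for example, a placeholder like 0000:03:00.0), to is_isolated. If it returns False, the card shares a group with unrelated devices, and passing it through will drag those devices along.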

It worked for about four hours. Then the P40 decided it didn't want to exist anymore.

