Deleting the 8.4GB Python Sidecar: Pure Go + CUDA with `CGO_ENABLED=0`

TL;DR: I built gocudrv so Go services can talk directly to NVIDIA GPUs — no cgo, no CUDA toolkit, no bloated Python dependencies. One static binary.

Last month I was reviewing a production AI service. The core business logic was clean, efficient Go (15MB binary), but GPU access was routed through a Python sidecar.

The results were painful:

- 8.4GB Docker images, bloated with unused CUDA toolkits and PyTorch dependencies
- 4-minute cold starts during autoscaling
