My home virtualisation server has been running without complaint for a while now. It hosts a handful of VMs — including the Ubuntu VM that runs my OpenClaw AI assistant — and it does that job well. This isn’t a story about something breaking. It’s a story about a platform quietly reaching the end of its useful life, and what comes next.
What the current server looks like
It’s built around an ASRockRack EPYC3251D4I-2T — a compact Mini-ITX server board with an AMD EPYC 3251 soldered directly to it and passively cooled. Eight cores, sixteen threads, 2.5GHz. At the time it was a tidy piece of kit: proper server-grade reliability in a small form factor, with dual 10GbE onboard and ECC memory support.
It’s running Windows Server 2025 Datacenter with Hyper-V, has 64GB of RAM, a Samsung 970 EVO Plus 500GB NVMe for the OS, and a 4TB WD SSD for VM storage.
The problem isn’t performance — for hosting VMs it’s still capable. The problem is the platform itself. The EPYC 3251 is soldered to the board, so there’s no upgrade path for the CPU. The board is now out of vendor support, meaning no further firmware updates. And crucially: there are no free PCIe slots. The board’s layout leaves no room for a discrete GPU, which rules out anything involving local AI inference.
That last point is the one that’s become increasingly relevant.
Why now
I’ve been running OpenClaw as a self-hosted AI assistant for a while, currently pointed at external APIs for its language models (mostly Claude Sonnet 4.6 and OpenAI GPT 5.4 at time of writing). It works well, but there’s an obvious next step: running a capable LLM locally, on my own hardware, with no costly external API dependency and no data leaving the home.
To do that properly, you need VRAM. A lot of it. And the current hardware simply can’t accommodate a GPU.
So rather than patch around the platform’s limitations, it makes more sense to replace it properly — with something that can handle both the virtualisation work today and local LLM inference as a first-class capability.
The new build
The replacement is going into a Logic Case LC-3390L-BL, a 3U short-depth rack chassis (380mm deep) that I’ve already bought. Everything else is still to be ordered.
CPU: AMD Ryzen 9 9950X
Sixteen cores and thirty-two threads on Zen 5. That’s double the core count of the EPYC 3251, on a modern process node with considerably better single-threaded performance. For a VM host where you’re slicing compute across multiple guests, more cores make a real difference.
Motherboard: ASRock Rack B650D4U-2L2T/BCM
This is the board the build is really centred around. It’s a micro-ATX server board with ECC UDIMM support (important — I want proper ECC memory, not consumer DIMM), dual 2.5GbE plus dual 10GbE onboard, a BMC for out-of-band management, and full ATX power connector support. It’s designed for exactly this kind of workstation-server crossover use case.
RAM: 64GB DDR5-5200 ECC UDIMM (starting), upgradeable to 128GB
Two 32GB Micron ECC UDIMMs to start, with two more slots free for a future upgrade to 128GB. DDR5 ECC on a consumer-adjacent platform is a reasonable sweet spot — proper error correction without the cost of full RDIMM server memory. If memory prices actually ever drop then this might change.
Storage
The Samsung 970 EVO Plus 500GB NVMe transfers across from the old server as the OS drive. A new WD Red SA500 4TB NAS SATA SSD handles VM storage, and the existing 4TB drive joins it for additional capacity.
Cooling
A Noctua NH-U9S handles the CPU. The chassis gets two Noctua NF-A8 80mm fans at the front for intake and two Noctua NF-A4x20 40mm fans at the rear for exhaust, replacing the stock fans. Quiet and reliable matters more than peak airflow in a home environment where the server is in earshot.
PSU: Seasonic Focus GX-750 V4
750W, 80+ Gold, fully modular, 140mm depth to clear the short chassis. Replacing a ten-year-old be quiet! Power Zone 650W that’s been on borrowed time.
GPU: NVIDIA L4 24GB (Phase 2)
This is the part that makes the new build meaningfully different from just a CPU upgrade.
The NVIDIA L4 is a data centre inference card — Ada Lovelace architecture, 24GB of GDDR6, passive cooling, single slot, 72W TDP. It fits in standard PCIe slots without needing auxiliary power. It was designed for exactly this kind of workload: dense, sustained inference at low power draw.
It’s not the newest card on the market, which is actually part of the appeal. The L4 is old enough now that data centres are starting to cycle them out in favour of newer hardware, which means the second-hand and refurbished market is beginning to open up. My hope is to pick one up from that channel rather than paying full price for a new unit — the card itself hasn’t changed, it’s just been displaced by something newer at the top end.
The 24GB of VRAM is the key number: it’s enough to run capable models properly quantised without compromise. For a home lab inference setup, it’s a very good fit.
The plan is to add it in a second phase once the base build is up and running.
Local LLMs: llama.cpp
Once the L4 is in place, the plan is to run a local LLM using llama.cpp, with GPU offloading to take full advantage of the L4’s VRAM. The specific model is genuinely hard to pin down right now — the LLM landscape is moving fast enough that whatever looks like the right choice today will likely have been superseded by something better by the time the build is complete. At the moment Gemma 4 from Google is the front-runner, but I’m holding that lightly.
What’s less likely to change is the use case. OpenClaw, my self-hosted AI assistant, currently calls out to external APIs for its language models. The plan is to point it at a locally-running llama.cpp instance instead — keeping everything on-premises, removing the API dependency, and making the assistant fully self-contained within my own infrastructure. That’s the dream anyway!
Why keep Hyper-V and Windows Server
The short answer is: because it works. Hyper-V on Windows Server has been reliable, the management story is familiar, and there’s no compelling reason to change hypervisors just because the hardware is changing. Proxmox gets plenty of attention in homelab circles and it’s a reasonable choice, but switching platforms introduces migration work and a learning curve for no immediate gain. The new build will run Windows Server with Hyper-V, the same as the current server does today.
Phased approach
| Phase | What’s included | Approximate cost |
|---|---|---|
| Phase 1 | Everything except the GPU | ~£1,970 |
| Phase 2 | NVIDIA L4 24GB | ~£2,220 (new) or less second-hand |
| Future upgrade | 2x additional 32GB ECC UDIMM (→ 128GB) | ~£370 |
Phase 1 delivers a significantly upgraded VM host. Phase 2 adds local LLM inference. The GPU can be deferred without affecting anything — the server is fully functional without it.
Where things stand
The chassis is in hand. Everything else is still to be ordered. This is a plan in progress rather than a completed build — I’ll write it up properly once it’s done and running.
The immediate next step is pulling the trigger on the motherboard and CPU, then building out Phase 1 and migrating the VMs across from the old server. The L4 and the llama.cpp setup will follow once the base is stable.