My first impressions on ROCm and Strix Halo

A detailed look at setting up ROCm on AMD's Strix Halo platform, highlighting the efficiency of 128GB shared memory for running large language models locally.

Here I'll share my first impressions with ROCm and Strix Halo and how I've set up everything.

128GB efficiently shared between the CPU and GPU.

I'm used to working with Ubuntu, so I stuck with it in the supported 24.04 LTS version, and just followed the official installation instructions.

It seems that things wouldn't work without a BIOS update: PyTorch was unable to find the GPU. This was easily done on the BIOS settings: it was able to connect to my Wifi network and download it automatically.

Also on the BIOS settings, you might need to make sure you set the reserved video memory to a low value and let the memory be shared between the CPU and GPU using the GTT. The reserved memory can be as low as 512MB.

Implications:

The CPU is not able to use the GPU reserved memory.
The GPU can use the total of Reserved + GTT, but utilizing both simultaneously can be less efficient than a single large GTT pool due to fragmentation and addressing overhead.
Some legacy games or software sadly might see the GPU memory as 512 MB and refuse to work, this has not happened to me so far though.

Then on /etc/default/grub, I've made this change:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ttm.pages_limit=32768000 amdgpu.gttsize=114688"

and then ran sudo update-grub.

Note that amdgpu.gttsize shouldn't include the whole system memory, you should leave some memory (I read from 4GB to 12GB) reserved to the CPU (Total memory minus reserved GPU minus GTT) for the sake of the stability of the Linux kernel.

This was somewhat tricky because of the weird dependency graph of PyTorch, but eventually I've got it working with:

[project]
name = "myproject"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
"torch==2.11.0+rocm7.2",
"triton-rocm",
]
[tool.uv]
environments = ["sys_platform == 'linux'"]
[[tool.uv.index]]
name = "pytorch-rocm"
url = "https://download.pytorch.org/whl/rocm7.2"
explicit = true
[tool.uv.sources]
torch = { index = "pytorch-rocm" }
torchvision = { index = "pytorch-rocm" }
triton-rocm = { index = "pytorch-rocm" }

and you can even add it this your .bashrc:

alias pytorch='''uvx --extra-index-url https://download.pytorch.org/whl/rocm7.2 \
--index-strategy unsafe-best-match \
--with torch==2.11.0+rocm7.2,triton-rocm \
ipython -c "import torch; print(f\"ROCM: {torch.version.hip}\"); \
print(f\"GPU available: {torch.cuda.is_available()}\"); import torch.nn as nn" -i
'''

podman run --rm -it --name qwen-coder --device /dev/kfd --device /dev/dri \
--security-opt label=disable --group-add keep-groups -e HSA_OVERRIDE_GFX_VERSION=11.5.0 \
-p 8080:8080 -v /some_path/models:/models:z ghcr.io/ggml-org/llama.cpp:server-rocm \
-m /models/qwen3.6/model.gguf -ngl 99 -c 327680 --host 0.0.0.0 --port 8080 \
--flash-attn on --no-mmap

Note that you can easily download the model with:

uvx hf download Qwen/Qwen3.6-35B-A3B --local-dir /some_path/models/qwen3.6

And convert to gguf with the convert_hf_to_gguf.py script from the llama.cpp repo:

git clone https://github.com/ggerganov/llama.cpp.git /some_path/llama.cpp

cd /some_path/models/qwen3.6 &&
uvx --extra-index-url https://download.pytorch.org/whl/rocm7.2 \
--index-strategy unsafe-best-match \
--with torch==2.11.0+rocm7.2,triton-rocm,transformers \
ipython /some_path/llama.cpp/convert_hf_to_gguf.py \
-- . --outfile model.gguf

I'm using a Podman to run Opencode, see my repo on how set it up.

And this is my config to have it work with Llama.cpp:

{
"$schema": "https://opencode.ai/config.json",
"provider": {
"local": {
"options": {
"baseURL": "http://localhost:8080/v1",
"apiKey": "any-string",
"reasoningEffort": "auto",
"textVerbosity": "high",
"supportsToolCalls": true
},
"models": {
"qwen-coder-local": {}
}
}
},
"model": "local/qwen-coder-local",
"permission": {
"*": "ask",
"read": {
"*": "allow",
"*.env": "deny",
"**/secrets/**": "deny"
},
"bash": "allow",
"edit": "allow",
"glob": "allow",
"grep": "allow",
"websearch": "allow",
"codesearch": "allow",
"webfetch": "allow"
},
"disabled_providers": [
"opencode"
]
}

So as I promised, my first impressions are: so far, so good, I was able to play with PyTorch and run Qwen3.6 on llama.cpp with a large context window. There were some rough edges, but I think it was quite worth it.

Source: Hacker News