OpenUMA – bring Apple-style unified memory to x86 AI inference (Rust, Linux)
Repository: https://github.com/hamtun24/openuma
HN discussion: https://news.ycombinator.com/item?id=47624865
OpenUMA (Unified Memory Abstraction) is a Rust middleware for detecting shared memory hardware (AMD APUs, Intel iGPUs), configuring unified memory pools, and generating optimal configs for AI inference engines.
```
┌───────────────────────────────────────────────┐
│                OpenUMA v0.6.2                 │
│  Unified Memory Abstraction for AI Inference  │
└───────────────────────────────────────────────┘
```

Key Features
- Hardware Detection - Automatic detection of AMD APUs and Intel iGPUs
- Memory Partitioning - Intelligent iGPU/CPU memory allocation for LLM inference
- Zero-Copy DMA-BUF - Direct memory transfers between CPU and iGPU
- Multiple Engines - Generate configs for llama.cpp, Ollama, and KTransformers
- Interactive TUI - Full terminal UI for hardware monitoring and configuration
- Benchmarking - Real inference benchmarks with llama.cpp
Supported Hardware

| Vendor | Series | Examples |
|--------|--------|----------|
| AMD | Zen 3 (Cezanne, Renoir) | Ryzen 5 5600G, Ryzen 7 5700G |
| AMD | Zen 4 (Phoenix, Hawk Point) | Ryzen 7 7840HS |
| AMD | Zen 5 (Strix Point) | Ryzen AI 9 HX 370, Ryzen AI 7 350 |
| Intel | Alder Lake, Raptor Lake | Core i5-1240P, Core i7-12700H |
| Intel | Meteor Lake, Lunar Lake | Core Ultra 5 125H, Core Ultra 7 258V |
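Hardware detection on Linux typically boils down to reading PCI vendor IDs out of sysfs. The following is a minimal std-only sketch of that idea, not code from OpenUMA's `hw_probe` crate; it assumes the conventional `/sys/class/drm/card0/device/vendor` layout and only distinguishes AMD (0x1002) from Intel (0x8086).

```rust
use std::fs;

/// Map a PCI vendor ID string (as read from sysfs) to a vendor name.
/// 0x1002 is AMD/ATI and 0x8086 is Intel; anything else is unsupported here.
fn vendor_name(vendor_id: &str) -> Option<&'static str> {
    match vendor_id.trim() {
        "0x1002" => Some("AMD"),
        "0x8086" => Some("Intel"),
        _ => None,
    }
}

/// Probe the first DRM device's vendor, if sysfs is available.
fn probe_gpu_vendor() -> Option<&'static str> {
    let id = fs::read_to_string("/sys/class/drm/card0/device/vendor").ok()?;
    vendor_name(&id)
}

fn main() {
    match probe_gpu_vendor() {
        Some(v) => println!("Detected {} iGPU", v),
        None => println!("No supported iGPU found"),
    }
}
```

A real probe would also walk `/sys/class/drm/card*` for multi-GPU systems and read the device ID to identify the exact iGPU generation.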
Quick Start
```bash
# Build
cargo build --release

# Detect hardware
./target/release/openuma probe

# Launch the interactive TUI
./target/release/openuma tui

# Generate a config for llama.cpp
./target/release/openuma configure --engine llamacpp --model model.gguf
```
Terminal UI
```
[D]ashboard  [M]emory  [B]enchmark  [P]rofiles  [C]onfigure  [S]ettings

Hardware Overview
  CPU   AMD Ryzen 5 5600G (Cezanne)
        6 cores (12 threads), AVX2, 16MB L3
  iGPU  AMD Vega7 (Raven Ridge)
        7 CUs, 512MB / 16384MB shared VRAM
        Vulkan ✓  OpenCL ✓  Zero-copy ✓
  RAM   32GB DDR4-3200 (Dual-channel)
        51.2 GB/s theoretical, 46.8 GB/s measured

✓ Unified Memory Available            Tier: CONSUMER_UMA

Memory Partition:  iGPU: 7168 MB (35.0%)   CPU: 13312 MB (65.0%)
[████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 35%

Strategy: HybridIgpu    Zero-copy: Available

[r] Refresh   [q] Quit
```

Commands
| Command | Description |
|---------|-------------|
| `openuma probe` | Detect hardware profile |
| `openuma tui` | Launch interactive terminal UI |
| `openuma partition --model` | Show memory partition for model |
| `openuma configure --engine --model` | Generate engine config |
| `openuma benchmark --model` | Run inference benchmark |
| `openuma zerocopy --test` | Test DMA-BUF zero-copy |
| `openuma serve` | Start REST API server |
| `openuma profile list` | List known hardware profiles |
Supported Inference Engines
llama.cpp

```bash
openuma configure --engine llamacpp --model llama3-8b-q4_k_m.gguf
```

Ollama

```bash
openuma configure --engine ollama --model llama3-8b-q4_k_m.gguf
```

KTransformers (MoE models)

```bash
openuma configure --engine ktransformers --model deepseek-v3-q4km.gguf
```
How It Works
Memory Model
```
┌──────────────────────────────────────────────────────────┐
│                   Unified Memory Pool                    │
│                                                          │
│  ┌──────────────┐                   ┌──────────────────┐ │
│  │  iGPU VRAM   │ ◄── Zero-Copy ──► │   System RAM     │ │
│  │  (Shared)    │      DMA-BUF      │   (DDR4/DDR5)    │ │
│  └──────────────┘                   └──────────────────┘ │
│                                                          │
│  Attention layers benefit from iGPU                      │
│  MoE experts stay on CPU                                 │
└──────────────────────────────────────────────────────────┘
```

Key Insight
For LLM inference on APUs:
- Attention layers → benefit from iGPU (parallel matrix ops)
- MoE expert layers → should stay on CPU (sparse activation)
- KV cache → benefits from unified memory zero-copy
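The partition shown in the TUI above follows directly from this model: reserve a fraction of the unified pool for the iGPU and leave the rest to the CPU. A minimal sketch, with a hypothetical `partition` helper (not OpenUMA's actual `mem_mgr` API); the 35/65 split over a 20480 MB pool reproduces the 7168 MB / 13312 MB figures from the dashboard:

```rust
/// Split a unified memory pool between iGPU and CPU.
/// `igpu_frac` is the fraction reserved for the iGPU (e.g. 0.35).
fn partition(pool_mb: u64, igpu_frac: f64) -> (u64, u64) {
    let igpu = (pool_mb as f64 * igpu_frac).round() as u64;
    (igpu, pool_mb - igpu)
}

fn main() {
    // The 5600G dashboard shows a 20480 MB pool split 35/65.
    let (igpu, cpu) = partition(20480, 0.35);
    println!("iGPU: {igpu} MB, CPU: {cpu} MB"); // iGPU: 7168 MB, CPU: 13312 MB
}
```

In practice the fraction would be chosen per model: higher when attention layers dominate, lower for MoE models whose expert weights stay CPU-side.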
Benchmarking
```bash
# Quick benchmark
openuma benchmark --model llama3-8b-q4_k_m.gguf

# Full multi-backend comparison
openuma benchmark --model model.gguf --full
```
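The headline number a benchmark like this reports is tokens per second: generated tokens divided by wall-clock time. A trivial sketch of that measurement, assuming the inference call itself is elsewhere:

```rust
use std::time::Instant;

/// Tokens per second: the core metric the benchmark reports.
fn tokens_per_sec(tokens: u64, elapsed_secs: f64) -> f64 {
    tokens as f64 / elapsed_secs
}

fn main() {
    let start = Instant::now();
    // ... run inference here, counting generated tokens ...
    let generated: u64 = 0;
    let elapsed = start.elapsed().as_secs_f64().max(1e-9);
    println!("{:.1} t/s", tokens_per_sec(generated, elapsed));
}
```

Note that prompt processing (prefill) and generation (decode) rates differ greatly; llama.cpp reports them separately, and a fair comparison keeps them apart.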
Architecture
```
openuma/
├── crates/
│   ├── hw_probe/     # Hardware detection
│   ├── mem_mgr/      # Memory partitioning + zero-copy
│   ├── config_gen/   # Model metadata (GGUF)
│   ├── profile_db/   # Hardware profile database
│   ├── benchmark/    # Inference benchmarking
│   ├── api_server/   # REST API
│   ├── cli/          # CLI interface
│   └── tui/          # Terminal UI
└── profiles/         # Hardware profiles
```

Installation
Option A — Download Binary (Linux x86_64)
```bash
# Download the latest release
curl -L https://github.com/hamtun24/openuma/releases/latest/download/openuma-linux-x86_64.tar.gz | tar xz

# Run it
./openuma probe
```
Option B — Build from Source
```bash
# Prerequisites: Rust 1.70+
git clone https://github.com/hamtun24/openuma.git
cd openuma
cargo build --release
./target/release/openuma probe
```

System Requirements
| Requirement | Notes |
|-------------|-------|
| OS | Linux (kernel 5.10+) |
| CPU | Any x86_64 with AMD APU or Intel iGPU |
| RAM | 16GB minimum, 32GB recommended |
| Optional | llama.cpp in PATH for real benchmarks |
| Optional | Vulkan drivers for iGPU acceleration |
Install Vulkan Drivers (if missing)
```bash
# AMD iGPU
sudo apt install mesa-vulkan-drivers

# Intel iGPU
sudo apt install intel-media-va-driver mesa-vulkan-drivers
```
Install llama.cpp (optional)
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_BUILD_EXAMPLES=ON
make -j$(nproc)
export PATH="$PATH:$(pwd)/bin"
```

Real World Results
OpenUMA's value is in the configuration it generates — not just detecting hardware, but knowing the exact flags that extract maximum performance from it.
Example: AMD Ryzen 5 5600G + 32GB DDR4
| Setup | Command | Tokens/sec |
|-------|---------|------------|
| llama.cpp defaults | `llama-cli -m model.gguf` | ~3.1 t/s |
| OpenUMA-configured | `openuma configure --engine llamacpp --model model.gguf` | ~7.2 t/s |
| Improvement | | +132% |
What OpenUMA changed:
- Enabled Vulkan backend (default is CPU)
- Set correct `--n-gpu-layers` for available shared VRAM
- Configured dual-channel memory-aware thread count
- Disabled mmap in favor of zero-copy DMA-BUF path
Note: Numbers above are estimates from the profile database for this hardware. Run openuma benchmark --model your-model.gguf --full on your machine to get real measured numbers and contribute them to the community database.
Community Benchmarks
This section will grow as users submit hardware profiles. Submit your results →
Contributing
Contributions welcome! Open issues and pull requests.
License
MIT License - see LICENSE for details.
OpenUMA - Making every x86 machine a first-class AI citizen.