OpenUMA – bring Apple-style unified memory to x86 AI inference (Rust, Linux)
Repository: https://github.com/hamtun24/openuma
HN discussion: https://news.ycombinator.com/item?id=47624865
OpenUMA (Unified Memory Abstraction) is a Rust middleware for detecting shared memory hardware (AMD APUs, Intel iGPUs), configuring unified memory pools, and generating optimal configs for AI inference engines.
```
┌───────────────────────────────────────────────┐
│                OpenUMA v0.6.2                 │
│  Unified Memory Abstraction for AI Inference  │
└───────────────────────────────────────────────┘
```

Key Features
- Hardware Detection - Automatic detection of AMD APUs and Intel iGPUs
- Memory Partitioning - Intelligent iGPU/CPU memory allocation for LLM inference
- Zero-Copy DMA-BUF - Direct memory transfers between CPU and iGPU
- Multiple Engines - Generate configs for llama.cpp, Ollama, and KTransformers
- Interactive TUI - Full terminal UI for hardware monitoring and configuration
- Benchmarking - Real inference benchmarks with llama.cpp
Supported Hardware

| Vendor | Series | Examples |
|--------|--------|----------|
| AMD | Zen 3 (Cezanne, Renoir) | Ryzen 5 5600G, Ryzen 7 5700G |
| AMD | Zen 4 (Phoenix, Hawk Point) | Ryzen 7 7840HS |
| AMD | Zen 5 (Strix Point) | Ryzen AI 9 HX 370, Ryzen AI 7 350 |
| Intel | Alder Lake, Raptor Lake | Core i5-1240P, Core i7-12700H |
| Intel | Meteor Lake, Lunar Lake | Core Ultra 5 125H, Core Ultra 7 258V |
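Hardware detection on Linux typically boils down to reading PCI vendor IDs out of sysfs. The following is a minimal std-only sketch of that idea, not code from OpenUMA's `hw_probe` crate; it assumes the conventional `/sys/class/drm/card0/device/vendor` layout and only distinguishes AMD (0x1002) from Intel (0x8086).

```rust
use std::fs;

/// Map a PCI vendor ID string (as read from sysfs) to a vendor name.
/// 0x1002 is AMD/ATI and 0x8086 is Intel; anything else is unsupported here.
fn vendor_name(vendor_id: &str) -> Option<&'static str> {
    match vendor_id.trim() {
        "0x1002" => Some("AMD"),
        "0x8086" => Some("Intel"),
        _ => None,
    }
}

/// Probe the first DRM device's vendor, if sysfs is available.
fn probe_gpu_vendor() -> Option<&'static str> {
    let id = fs::read_to_string("/sys/class/drm/card0/device/vendor").ok()?;
    vendor_name(&id)
}

fn main() {
    match probe_gpu_vendor() {
        Some(v) => println!("Detected {} iGPU", v),
        None => println!("No supported iGPU found"),
    }
}
```

A real probe would also walk `/sys/class/drm/card*` for multi-GPU systems and read the device ID to identify the exact iGPU generation.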
Quick Start
```bash
# Build
cargo build --release

# Detect hardware
./target/release/openuma probe

# Launch the interactive TUI
./target/release/openuma tui

# Generate a config for llama.cpp
./target/release/openuma configure --engine llamacpp --model model.gguf
```
Terminal UI
```
[D]ashboard  [M]emory  [B]enchmark  [P]rofiles  [C]onfigure  [S]ettings

Hardware Overview
  CPU   AMD Ryzen 5 5600G (Cezanne)
        6 cores (12 threads), AVX2, 16MB L3
  iGPU  AMD Vega7 (Raven Ridge)
        7 CUs, 512MB / 16384MB shared VRAM
        Vulkan ✓  OpenCL ✓  Zero-copy ✓
  RAM   32GB DDR4-3200 (Dual-channel)
        51.2 GB/s theoretical, 46.8 GB/s measured

✓ Unified Memory Available            Tier: CONSUMER_UMA

Memory Partition:  iGPU: 7168 MB (35.0%)   CPU: 13312 MB (65.0%)
[████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 35%

Strategy: HybridIgpu    Zero-copy: Available

[r] Refresh   [q] Quit
```

Commands
| Command | Description |
|---------|-------------|
| `openuma probe` | Detect hardware profile |
| `openuma tui` | Launch interactive terminal UI |
| `openuma partition --model` | Show memory partition for model |
| `openuma configure --engine --model` | Generate engine config |
| `openuma benchmark --model` | Run inference benchmark |
| `openuma zerocopy --test` | Test DMA-BUF zero-copy |
| `openuma serve` | Start REST API server |
| `openuma profile list` | List known hardware profiles |
Supported Inference Engines
llama.cpp

```bash
openuma configure --engine llamacpp --model llama3-8b-q4_k_m.gguf
```

Ollama

```bash
openuma configure --engine ollama --model llama3-8b-q4_k_m.gguf
```

KTransformers (MoE models)

```bash
openuma configure --engine ktransformers --model deepseek-v3-q4km.gguf
```
How It Works
Memory Model
```
┌──────────────────────────────────────────────────────────┐
│                   Unified Memory Pool                    │
│                                                          │
│  ┌──────────────┐                   ┌──────────────────┐ │
│  │  iGPU VRAM   │ ◄── Zero-Copy ──► │   System RAM     │ │
│  │  (Shared)    │      DMA-BUF      │   (DDR4/DDR5)    │ │
│  └──────────────┘                   └──────────────────┘ │
│                                                          │
│  Attention layers benefit from iGPU                      │
│  MoE experts stay on CPU                                 │
└──────────────────────────────────────────────────────────┘
```

Key Insight
For LLM inference on APUs:
- Attention layers → benefit from iGPU (parallel matrix ops)
- MoE expert layers → should stay on CPU (sparse activation)
- KV cache → benefits from unified memory zero-copy
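The partition shown in the TUI above follows directly from this model: reserve a fraction of the unified pool for the iGPU and leave the rest to the CPU. A minimal sketch, with a hypothetical `partition` helper (not OpenUMA's actual `mem_mgr` API); the 35/65 split over a 20480 MB pool reproduces the 7168 MB / 13312 MB figures from the dashboard:

```rust
/// Split a unified memory pool between iGPU and CPU.
/// `igpu_frac` is the fraction reserved for the iGPU (e.g. 0.35).
fn partition(pool_mb: u64, igpu_frac: f64) -> (u64, u64) {
    let igpu = (pool_mb as f64 * igpu_frac).round() as u64;
    (igpu, pool_mb - igpu)
}

fn main() {
    // The 5600G dashboard shows a 20480 MB pool split 35/65.
    let (igpu, cpu) = partition(20480, 0.35);
    println!("iGPU: {igpu} MB, CPU: {cpu} MB"); // iGPU: 7168 MB, CPU: 13312 MB
}
```

In practice the fraction would be chosen per model: higher when attention layers dominate, lower for MoE models whose expert weights stay CPU-side.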
Benchmarking
```bash
# Quick benchmark
openuma benchmark --model llama3-8b-q4_k_m.gguf

# Full multi-backend comparison
openuma benchmark --model model.gguf --full
```
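The headline number a benchmark like this reports is tokens per second: generated tokens divided by wall-clock time. A trivial sketch of that measurement, assuming the inference call itself is elsewhere:

```rust
use std::time::Instant;

/// Tokens per second: the core metric the benchmark reports.
fn tokens_per_sec(tokens: u64, elapsed_secs: f64) -> f64 {
    tokens as f64 / elapsed_secs
}

fn main() {
    let start = Instant::now();
    // ... run inference here, counting generated tokens ...
    let generated: u64 = 0;
    let elapsed = start.elapsed().as_secs_f64().max(1e-9);
    println!("{:.1} t/s", tokens_per_sec(generated, elapsed));
}
```

Note that prompt processing (prefill) and generation (decode) rates differ greatly; llama.cpp reports them separately, and a fair comparison keeps them apart.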
Architecture
```
openuma/
├── crates/
│   ├── hw_probe/     # Hardware detection
│   ├── mem_mgr/      # Memory partitioning + zero-copy
│   ├── config_gen/   # Model metadata (GGUF)
│   ├── profile_db/   # Hardware profile database
│   ├── benchmark/    # Inference benchmarking
│   ├── api_server/   # REST API
│   ├── cli/          # CLI interface
│   └── tui/          # Terminal UI
└── profiles/         # Hardware profiles
```

Installation
Option A — Download Binary (Linux x86_64)
```bash
# Download the latest release
curl -L https://github.com/hamtun24/openuma/releases/latest/download/openuma-linux-x86_64.tar.gz | tar xz

# Run it
./openuma probe
```
Option B — Build from Source
```bash
# Prerequisites: Rust 1.70+
git clone https://github.com/hamtun24/openuma.git
cd openuma
cargo build --release
./target/release/openuma probe
```

System Requirements
| Requirement | Notes |
|-------------|-------|
| OS | Linux (kernel 5.10+) |
| CPU | Any x86_64 with AMD APU or Intel iGPU |
| RAM | 16GB minimum, 32GB recommended |
| Optional | llama.cpp in PATH for real benchmarks |
| Optional | Vulkan drivers for iGPU acceleration |
Install Vulkan Drivers (if missing)
```bash
# AMD iGPU
sudo apt install mesa-vulkan-drivers

# Intel iGPU
sudo apt install intel-media-va-driver mesa-vulkan-drivers
```
Install llama.cpp (optional)
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_BUILD_EXAMPLES=ON
make -j$(nproc)
export PATH="$PATH:$(pwd)/bin"
```

Real World Results
OpenUMA's value is in the configuration it generates — not just detecting hardware, but knowing the exact flags that extract maximum performance from it.
Example: AMD Ryzen 5 5600G + 32GB DDR4
| Setup | Command | Tokens/sec |
|-------|---------|------------|
| llama.cpp defaults | `llama-cli -m model.gguf` | ~3.1 t/s |
| OpenUMA-configured | `openuma configure --engine llamacpp --model model.gguf` | ~7.2 t/s |
| Improvement | | +132% |
What OpenUMA changed:
- Enabled Vulkan backend (default is CPU)
- Set correct `--n-gpu-layers` for available shared VRAM
- Configured dual-channel memory-aware thread count
- Disabled mmap in favor of zero-copy DMA-BUF path
Note: Numbers above are estimates from the profile database for this hardware. Run openuma benchmark --model your-model.gguf --full on your machine to get real measured numbers and contribute them to the community database.
Community Benchmarks
This section will grow as users submit hardware profiles. Submit your results →
Contributing
Contributions welcome! Open issues and pull requests.
License
MIT License - see LICENSE for details.
OpenUMA - Making every x86 machine a first-class AI citizen.