
OpenUMA – bring Apple-style unified memory to x86 AI inference (Rust, Linux)

Hacker News AI Top · by hamtun24 · April 3, 2026 · 5 min read

Article URL: https://github.com/hamtun24/openuma
Comments URL: https://news.ycombinator.com/item?id=47624865
Points: 1 · Comments: 0

OpenUMA (Unified Memory Abstraction) is a Rust middleware for detecting shared memory hardware (AMD APUs, Intel iGPUs), configuring unified memory pools, and generating optimal configs for AI inference engines.

┌──────────────────────────────────────────────┐
│ OpenUMA v0.6.2                               │
│ Unified Memory Abstraction for AI Inference  │
└──────────────────────────────────────────────┘

Key Features

  • Hardware Detection - Automatic detection of AMD APUs and Intel iGPUs

  • Memory Partitioning - Intelligent iGPU/CPU memory allocation for LLM inference

  • Zero-Copy DMA-BUF - Direct memory transfers between CPU and iGPU

  • Multiple Engines - Generate configs for llama.cpp, Ollama, and KTransformers

  • Interactive TUI - Full terminal UI for hardware monitoring and configuration

  • Benchmarking - Real inference benchmarks with llama.cpp
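The memory-partitioning idea can be sketched in a few lines of Rust. This is illustrative only; `Partition` and `partition_pool` are hypothetical names, not OpenUMA's actual API:

```rust
/// Hypothetical sketch of the partitioning idea: reserve a percentage of
/// the unified pool for the iGPU and leave the remainder to the CPU.
/// (Illustrative names; not OpenUMA's real mem_mgr API.)
struct Partition {
    igpu_mb: u64,
    cpu_mb: u64,
}

fn partition_pool(pool_mb: u64, igpu_pct: u64) -> Partition {
    // Integer math keeps the split exact (no floating-point rounding).
    let igpu_mb = pool_mb * igpu_pct / 100;
    Partition { igpu_mb, cpu_mb: pool_mb - igpu_mb }
}

fn main() {
    // A 20 GiB pool with a 35% iGPU share splits into 7168 MB / 13312 MB.
    let p = partition_pool(20 * 1024, 35);
    println!("iGPU: {} MB, CPU: {} MB", p.igpu_mb, p.cpu_mb);
}
```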

Supported Hardware

| Vendor | Series | Examples |
|--------|--------|----------|
| AMD | Zen 3 (Cezanne, Renoir) | Ryzen 5 5600G, Ryzen 7 5700G |
| AMD | Zen 4 (Phoenix, Hawk Point) | Ryzen 7 7840HS, Ryzen 9 8945HS |
| AMD | Zen 5 (Strix Point) | Ryzen AI 9 HX 370, Ryzen AI 7 350 |
| Intel | Alder Lake, Raptor Lake | Core i5-1240P, Core i7-12700H |
| Intel | Meteor Lake, Lunar Lake | Core Ultra 5 125H, Core Ultra 7 258V |

Quick Start

# Build
cargo build --release

# Detect hardware
./target/release/openuma probe

# Launch interactive TUI
./target/release/openuma tui

# Generate config for llama.cpp
./target/release/openuma configure --engine llamacpp --model model.gguf

Terminal UI

┌─────────────────────────────────────────────────────────────────────────┐
│ [D]ashboard  [M]emory  [B]enchmark  [P]rofiles  [C]onfigure  [S]ettings │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│ ╔═══════════════════════════════════════════════════════════════════╗  │
│ ║ Hardware Overview                                                 ║  │
│ ╠═══════════════════════════════════════════════════════════════════╣  │
│ ║ CPU   AMD Ryzen 5 5600G (Cezanne)                                 ║  │
│ ║       6 cores (12 threads), AVX2, 16MB L3                         ║  │
│ ║ iGPU  AMD Vega7 (Raven Ridge)                                     ║  │
│ ║       7 CUs, 512MB / 16384MB shared VRAM                          ║  │
│ ║       Vulkan ✓  OpenCL ✓  Zero-copy ✓                             ║  │
│ ║ RAM   32GB DDR4-3200 (Dual-channel)                               ║  │
│ ║       51.2 GB/s theoretical, 46.8 GB/s measured                   ║  │
│ ╠═══════════════════════════════════════════════════════════════════╣  │
│ ║ ✓ Unified Memory Available          Tier: CONSUMER_UMA            ║  │
│ ╚═══════════════════════════════════════════════════════════════════╝  │
│                                                                         │
│ Memory Partition:  iGPU: 7168 MB (35.0%)   CPU: 13312 MB (65.0%)       │
│ [████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 35%                 │
│                                                                         │
│ Strategy: HybridIgpu        Zero-copy: Available                        │
│                                                                         │
│ [r] Refresh   [q] Quit                                                  │
└─────────────────────────────────────────────────────────────────────────┘

Commands

| Command | Description |
|---------|-------------|
| `openuma probe` | Detect hardware profile |
| `openuma tui` | Launch interactive terminal UI |
| `openuma partition --model <MODEL>` | Show memory partition for a model |
| `openuma configure --engine <ENGINE> --model <MODEL>` | Generate engine config |
| `openuma benchmark --model <MODEL>` | Run inference benchmark |
| `openuma zerocopy --test` | Test DMA-BUF zero-copy |
| `openuma serve` | Start REST API server |
| `openuma profile list` | List known hardware profiles |

Supported Inference Engines

llama.cpp

openuma configure --engine llamacpp --model llama3-8b-q4_k_m.gguf

Ollama

openuma configure --engine ollama --model llama3-8b-q4_k_m.gguf

KTransformers (MoE models)

openuma configure --engine ktransformers --model deepseek-v3-q4km.gguf

How It Works

Memory Model

┌────────────────────────────────────────────────────────────────┐
│                      Unified Memory Pool                       │
│                                                                │
│  ┌──────────────┐                    ┌──────────────────────┐  │
│  │  iGPU VRAM   │ ◄── Zero-Copy ──►  │      System RAM      │  │
│  │  (Shared)    │      DMA-BUF       │     (DDR4/DDR5)      │  │
│  └──────────────┘                    └──────────────────────┘  │
│                                                                │
│  Attention layers benefit from iGPU                            │
│  MoE experts stay on CPU                                       │
└────────────────────────────────────────────────────────────────┘

Key Insight

For LLM inference on APUs:

  • Attention layers → benefit from iGPU (parallel matrix ops)

  • MoE expert layers → should stay on CPU (sparse activation)

  • KV cache → benefits from unified memory zero-copy
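That placement rule can be written down as a tiny decision table. A minimal sketch, with hypothetical enums that are not part of OpenUMA's real crates:

```rust
/// Illustrative placement rule for LLM layers on an APU.
/// (Hypothetical types; not OpenUMA's actual mem_mgr code.)
#[derive(Debug, PartialEq)]
enum Placement {
    Igpu,   // dense, parallel matrix ops
    Cpu,    // sparse activation patterns
    Shared, // zero-copy via the unified pool
}

enum Layer {
    Attention,
    MoeExperts,
    KvCache,
}

fn place(layer: &Layer) -> Placement {
    match layer {
        Layer::Attention => Placement::Igpu,  // parallel matmuls suit the iGPU
        Layer::MoeExperts => Placement::Cpu,  // only a few experts fire per token
        Layer::KvCache => Placement::Shared,  // both sides read it, keep it zero-copy
    }
}

fn main() {
    assert_eq!(place(&Layer::Attention), Placement::Igpu);
    assert_eq!(place(&Layer::MoeExperts), Placement::Cpu);
    assert_eq!(place(&Layer::KvCache), Placement::Shared);
    println!("placement rules hold");
}
```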

Benchmarking

# Quick benchmark
openuma benchmark --model llama3-8b-q4_k_m.gguf

# Full multi-backend comparison
openuma benchmark --model model.gguf --full

 
╔════════════════════════════════════════════════════════════════════╗
║ OpenUMA Benchmark Report                                           ║
╠════════════════════════════════════════════════════════════════════╣
║ Best Backend: vulkan (12.5 t/s)           Average TPS: 8.2         ║
╠════════════════════════════════════════════════════════════════════╣
║ Test 1: model.gguf [vulkan]                                        ║
║   └── 12.5 tokens/sec | 8000 ms                                    ║
║ Test 2: model.gguf [opencl]                                        ║
║   └── 10.2 tokens/sec | 9800 ms                                    ║
║ Test 3: model.gguf [cpu]                                           ║
║   └── 4.8 tokens/sec | 20800 ms                                    ║
╠════════════════════════════════════════════════════════════════════╣
║ Recommendations:                                                   ║
║   • Best performing backend: vulkan (~12.5 tokens/sec)             ║
║   • GPU acceleration provides 2.6x speedup over CPU                ║
╚════════════════════════════════════════════════════════════════════╝

Architecture

openuma/
├── crates/
│   ├── hw_probe/      # Hardware detection
│   ├── mem_mgr/       # Memory partitioning + zero-copy
│   ├── config_gen/    # Model metadata (GGUF)
│   ├── profile_db/    # Hardware profile database
│   ├── benchmark/     # Inference benchmarking
│   ├── api_server/    # REST API
│   ├── cli/           # CLI interface
│   └── tui/           # Terminal UI
└── profiles/          # Hardware profiles

Installation

Option A — Download Binary (Linux x86_64)

# Download latest release
curl -L https://github.com/hamtun24/openuma/releases/latest/download/openuma-linux-x86_64.tar.gz | tar xz

# Run it
./openuma probe

Option B — Build from Source

# Prerequisites: Rust 1.70+
git clone https://github.com/hamtun24/openuma.git
cd openuma
cargo build --release
./target/release/openuma probe

System Requirements

| Requirement | Notes |
|-------------|-------|
| OS | Linux (kernel 5.10+) |
| CPU | Any x86_64 with AMD APU or Intel iGPU |
| RAM | 16GB minimum, 32GB recommended |
| Optional | llama.cpp in PATH for real benchmarks |
| Optional | Vulkan drivers for iGPU acceleration |

Install Vulkan Drivers (if missing)

# AMD iGPU
sudo apt install mesa-vulkan-drivers

# Intel iGPU
sudo apt install intel-media-va-driver mesa-vulkan-drivers

Install llama.cpp (optional)

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_BUILD_EXAMPLES=ON
make -j$(nproc)
export PATH="$PATH:$(pwd)/bin"

Real World Results

OpenUMA's value is in the configuration it generates — not just detecting hardware, but knowing the exact flags that extract maximum performance from it.

Example: AMD Ryzen 5 5600G + 32GB DDR4

| Setup | Command | Tokens/sec |
|-------|---------|------------|
| llama.cpp defaults | `llama-cli -m model.gguf` | ~3.1 t/s |
| OpenUMA-configured | `openuma configure --engine llamacpp --model model.gguf` | ~7.2 t/s |
| **Improvement** | | **+132%** |

What OpenUMA changed:

  • Enabled Vulkan backend (default is CPU)

  • Set correct --n-gpu-layers for available shared VRAM

  • Configured dual-channel memory-aware thread count

  • Disabled mmap in favor of zero-copy DMA-BUF path

Note: Numbers above are estimates from the profile database for this hardware. Run openuma benchmark --model your-model.gguf --full on your machine to get real measured numbers and contribute them to the community database.
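The `--n-gpu-layers` choice above boils down to a budget calculation: offload as many layers as fit in the iGPU's shared-VRAM slice. A back-of-the-envelope sketch assuming a uniform per-layer size (this is not OpenUMA's actual heuristic, which may also weigh KV-cache headroom):

```rust
/// Rough sketch: how many layers fit in the iGPU's shared-VRAM budget?
/// Assumes every layer weighs the same; illustrative, not OpenUMA's code.
fn n_gpu_layers(vram_budget_mb: u64, model_mb: u64, n_layers: u64) -> u64 {
    let per_layer_mb = (model_mb / n_layers).max(1); // guard against division by zero
    (vram_budget_mb / per_layer_mb).min(n_layers)    // never exceed the layer count
}

fn main() {
    // A ~4.7 GB Q4 8B model (32 transformer blocks) fits entirely within
    // the 7168 MB iGPU budget shown in the dashboard: offload all 32 layers.
    println!("--n-gpu-layers {}", n_gpu_layers(7168, 4700, 32));
}
```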

Community Benchmarks

This section will grow as users submit hardware profiles. Submit your results →

Contributing

Contributions welcome! Open issues and pull requests.

License

MIT License - see LICENSE for details.

OpenUMA - Making every x86 machine a first-class AI citizen.

Original source: Hacker News AI Top (https://github.com/hamtun24/openuma)