NPoco vs UkrGuru.Sql: When Streaming Beats Buffering
When we talk about database performance in .NET, we often compare ORMs as if they were interchangeable. In practice, the API shape matters just as much as the implementation.
In this post, I benchmark NPoco and UkrGuru.Sql using BenchmarkDotNet, focusing on a very common task: reading a large table from SQL Server. The interesting part is not which library wins, but why the numbers differ so much.
TL;DR: Streaming rows with IAsyncEnumerable is faster, allocates less, and scales better than loading everything into a list.
Test Scenario
The setup is intentionally simple and realistic.
- Database: SQL Server
- Table: Customers
- Dataset: SampleStoreLarge (large enough to stress allocations)
- Columns: CustomerId, FullName, Email, CreatedAt
All benchmarks execute the same SQL:
SELECT CustomerId, FullName, Email, CreatedAt FROM Customers
No filters, no projections — just raw read performance.
Benchmark Code
```csharp
using System.Linq;
using BenchmarkDotNet.Attributes;
using Microsoft.Data.SqlClient;
using NPoco;
using UkrGuru.Sql;

// POCO for the Customers table (property types assumed from the column list).
public class Customer
{
    public int CustomerId { get; set; }
    public string FullName { get; set; } = "";
    public string Email { get; set; } = "";
    public DateTime CreatedAt { get; set; }
}

[MemoryDiagnoser] // produces the Gen0/Gen1/Gen2 and Allocated columns below
public class SqlBenchmark
{
    private const string ConnectionString =
        "Server=(local);Database=SampleStoreLarge;Trusted_Connection=True;TrustServerCertificate=True;";

    private const string CommandText =
        "SELECT CustomerId, FullName, Email, CreatedAt FROM Customers";

    [Benchmark]
    public async Task<int> NPoco_LoadList()
    {
        using var connection = new SqlConnection(ConnectionString);
        await connection.OpenAsync();
        using var db = new Database(connection);
        var list = await db.FetchAsync<Customer>(CommandText);
        return list.Count;
    }

    [Benchmark]
    public async Task<int> UkrGuru_LoadList()
    {
        await using var connection = await DbHelper.CreateConnectionAsync(ConnectionString);
        var list = await connection.ReadAsync(CommandText);
        return list.Count();
    }

    [Benchmark]
    public async Task<int> UkrGuru_StreamRows()
    {
        int count = 0;
        await using var command = await DbHelper.CreateCommandAsync(
            CommandText, connectionString: ConnectionString);
        await foreach (var _ in command.ReadAsync())
            count++;
        return count;
    }
}
```
All benchmarks were run in Release mode with BenchmarkDotNet.
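For reference, a run like this is usually launched through the standard BenchmarkDotNet entry point. The runner class below is boilerplate rather than code from the original post; note also that the allocation columns in the second table imply `[MemoryDiagnoser]` was enabled on the benchmark class.

```csharp
using BenchmarkDotNet.Running;

public static class Program
{
    // Launch with `dotnet run -c Release`; BenchmarkDotNet handles
    // warmup, iteration counts, and statistics.
    public static void Main(string[] args)
        => BenchmarkRunner.Run<SqlBenchmark>();
}
```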
Results (Execution Time)
| Method             | Mean    | StdDev  | Median  |
|--------------------|--------:|--------:|--------:|
| NPoco_LoadList     | 8.23 ms | 0.33 ms | 8.22 ms |
| UkrGuru_LoadList   | 5.30 ms | 0.57 ms | 5.34 ms |
| UkrGuru_StreamRows | 3.29 ms | 0.14 ms | 3.22 ms |
At first glance, streaming is already ~2.5× faster than NPoco. But the real story starts when we look at memory.
Results (Memory & GC)
| Method             | Gen0 | Gen1 | Gen2 | Allocated |
|--------------------|-----:|-----:|-----:|----------:|
| NPoco_LoadList     |  367 |  258 |  109 |   4.39 MB |
| UkrGuru_LoadList   |  203 |  188 |   70 |   2.33 MB |
| UkrGuru_StreamRows |  164 |    – |    – |   2.08 MB |
This table explains almost everything.
What’s Actually Being Measured?
NPoco_LoadList
- Uses FetchAsync()
- Fully materializes a List
- Allocates buffers and intermediate objects
✅ Idiomatic NPoco usage
❌ No streaming support
NPoco optimizes for developer productivity, not minimal allocations. That’s a valid trade‑off, but it shows up clearly in GC pressure.
UkrGuru_LoadList
- Also builds a full list
- Uses a leaner mapping pipeline
- Roughly half the allocations of NPoco
✅ Same algorithm as NPoco
✅ Less overhead
This is a fair apples-to-apples comparison with NPoco’s approach.
UkrGuru_StreamRows
- Uses IAsyncEnumerable
- Processes rows one at a time
- No list allocation
- No Gen2 collections
✅ True async streaming
✅ Lowest latency
✅ Most stable GC behavior
This is not a micro‑optimization — it’s a different execution model.
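To make the execution-model difference tangible, here is a minimal sketch with a plain `IAsyncEnumerable<T>`. The `Rows` generator is a stand-in for the data reader behind `command.ReadAsync()`, not UkrGuru.Sql's API: the consumer processes rows one at a time and can stop early, in which case the remaining rows are never even produced.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class StreamingSketch
{
    // Stand-in async row source; in the benchmark this role is played
    // by the data reader behind command.ReadAsync().
    public static async IAsyncEnumerable<int> Rows(int total)
    {
        for (int i = 0; i < total; i++)
        {
            await Task.Yield(); // simulate per-row async I/O
            yield return i;
        }
    }

    public static async Task Main()
    {
        long sum = 0;
        await foreach (var row in Rows(1_000_000))
        {
            sum += row;
            if (row >= 9) break; // early exit: rows 10..999999 are never produced
        }
        Console.WriteLine(sum); // prints 45 (0 + 1 + ... + 9)
    }
}
```

With a buffered API, the same early exit would still have paid for materializing the full million-row list first.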
Why Streaming Wins
The biggest improvement is not raw speed — it’s memory behavior.
- Fewer allocations
- Almost no object promotion
- No Gen2 collections
That matters a lot under real load: ASP.NET requests, background workers, message consumers, etc.
Streaming doesn’t just run faster — it scales better.
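The GC argument can be illustrated with a toy model (no database involved; the names here are hypothetical and `Rows` stands in for mapped objects coming off a reader). Buffering keeps every row reachable at once, so survivors get promoted out of Gen0; streaming keeps at most one row alive at a time, so each payload dies young.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class RetentionSketch
{
    // Fake row payloads; stands in for objects mapped from a data reader.
    private static async IAsyncEnumerable<byte[]> Rows(int count, int rowSize)
    {
        for (int i = 0; i < count; i++)
        {
            await Task.Yield();
            yield return new byte[rowSize];
        }
    }

    // Buffered: every row stays reachable until the list is dropped,
    // so peak memory is count * rowSize and survivors get promoted.
    public static async Task<long> BufferedTotal(int count, int rowSize)
    {
        var all = new List<byte[]>();
        await foreach (var row in Rows(count, rowSize))
            all.Add(row);
        return all.Sum(r => (long)r.Length);
    }

    // Streamed: at most one row is reachable at a time, so each payload
    // is collectible immediately and Gen2 is never touched.
    public static async Task<long> StreamedTotal(int count, int rowSize)
    {
        long total = 0;
        await foreach (var row in Rows(count, rowSize))
            total += row.Length;
        return total;
    }
}
```

Both methods compute the same total; only the retention pattern differs, which is exactly what the Gen1/Gen2 columns in the benchmark table reflect.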
About Fairness
This benchmark is not trying to prove that one ORM is “better” than another.
It compares three distinct patterns:
- Buffered list materialization (NPoco)
- Buffered list materialization with fewer abstractions
- True async streaming
Comparing streaming to buffering is not “ORM vs ORM” — it’s algorithm vs algorithm.
When Should You Use Each?
Use NPoco when:
- You want simple, expressive data access
- Loading lists is acceptable
- Developer time matters more than raw throughput
Use streaming (e.g. UkrGuru.Sql) when:
- Result sets are large
- Latency and GC pressure matter
- You want full control over execution
Final Thoughts
Benchmarks don’t just measure libraries — they measure abstractions and APIs.
If your workload is dominated by large reads, switching from buffered lists to async streaming can cut both execution time and memory pressure dramatically.
Choose the tool that matches your data access pattern, not just the one you’re used to.