Beyond Static RAG: Using 1958 Biochemistry to Beat Multi-Hop Retrieval by 14%
Standard Retrieval-Augmented Generation (RAG) often falls short on complex, multi-hop questions because it relies on static "lock and key" query matching. If the information needed to answer a query is semantically distant from the original text, standard vector search simply won't find it.
We've developed Induced-Fit Retrieval (IFR), a dynamic graph traversal approach that mutates the query vector at every step to discover semantically distant but logically connected information.
The Core Results

We ran our prototype through a rigorous test suite of 30 queries across multiple graph sizes, up to 5.2 million atoms.
14.3% higher nDCG@10 compared to a competitive RAG-rerank baseline.
15% Multi-hop Hit@20 in scenarios where traditional RAG methods scored 0%.
O(1) Latency Scaling: Latency remains near 10ms whether searching 100 atoms or 5.2 million.
Why Biochemistry?

The system is inspired by Daniel Koshland's 1958 "induced fit" model. In biology, enzymes change shape upon encountering a substrate to improve binding.
IFR applies this to Information Retrieval: instead of a static query vector, the vector mutates at each hop based on the visited node's embedding. This allows the query to follow the "curved manifolds" of high-dimensional embedding space that a fixed vector cannot reach.
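The post doesn't spell out the exact update rule, but one plausible reading of "the vector mutates at each hop based on the visited node's embedding" is a normalized convex blend of the current query and the node just visited. A minimal sketch under that assumption (the function name and `mutation_rate` parameter are hypothetical, not from the IFR codebase):

```python
import numpy as np

def induced_fit_step(query: np.ndarray, node_emb: np.ndarray,
                     mutation_rate: float = 0.3) -> np.ndarray:
    """One hop of an induced-fit-style traversal (illustrative sketch).

    The query vector "changes shape" toward the embedding of the node
    just visited, analogous to an enzyme deforming around its substrate.
    `mutation_rate` controls how aggressively the query bends.
    """
    mutated = (1.0 - mutation_rate) * query + mutation_rate * node_emb
    # Renormalize so cosine-similarity search stays well-behaved.
    return mutated / np.linalg.norm(mutated)

# Example: a query orthogonal to a node embedding bends toward it.
q = np.array([1.0, 0.0])
node = np.array([0.0, 1.0])
q_next = induced_fit_step(q, node)
```

Iterating this step over a multi-hop path is what lets the effective query drift toward content the original vector had near-zero similarity with.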
Lessons from the Data

Transparency is key to research, so we are also sharing our failures:
Catastrophic Drift: 67% of our failures occurred because the query mutated too aggressively, losing its original intent.
The Solution: v2 will implement an "Alpha Floor" to preserve at least 50% of the original query signal at all times.
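The post gives only the 50% figure, but an "Alpha Floor" can be read as a clamp on the blend weight assigned to the original query at every hop, so drift can never fully erase the starting intent. A hedged sketch under that assumption (`apply_alpha_floor` and its parameters are hypothetical names):

```python
import numpy as np

def apply_alpha_floor(original_query: np.ndarray,
                      drifted_query: np.ndarray,
                      alpha: float,
                      alpha_floor: float = 0.5) -> np.ndarray:
    """Blend the drifted query back toward the original, clamping the
    original's weight at `alpha_floor` to bound catastrophic drift."""
    alpha = max(alpha, alpha_floor)  # never below 50% original signal
    blended = alpha * original_query + (1.0 - alpha) * drifted_query
    return blended / np.linalg.norm(blended)
```

With the default floor, even a traversal that would otherwise weight the original query at 10% is pulled back to an even split, trading some reach for intent preservation.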
We have open-sourced the prototype, our 18 raw JSON result logs, ablation studies, and full technical reports.
Check out the repo on GitHub: https://github.com/emil-celestix/celestix-ifr