Models claude gemini llama model product legal

MixtureOfAgents: Why One AI Is Worse Than Three

Dev.to AIby КириллApril 3, 20262 min read0 views

The Problem You send a question to GPT-4o. It answers. Sometimes brilliantly, sometimes wrong. You have no way to know which. What if you asked three models the same question and picked the best answer? That is MixtureOfAgents (MoA) — and it works. Real Test I asked 3 models: What is a nominal account (Russian banking)? Groq (Llama 3.3): Wrong. Confused with accounting. DeepSeek: Correct. Civil Code definition. Gemini: Wrong. Mixed with bookkeeping. One model = 33% chance of correct answer. Three models + judge = correct every time . The Code async function consult ( prompt , engines ) { const promises = engines . map ( eng => callEngine ( eng , prompt ) . then ( r => ({ engine : eng , response : r , ok : true })) . catch ( e => ({ engine : eng , error : e . message , ok : false })) ); ret

The Problem

You send a question to GPT-4o. It answers. Sometimes brilliantly, sometimes wrong. You have no way to know which.

What if you asked three models the same question and picked the best answer?

That is MixtureOfAgents (MoA) — and it works.

Real Test

I asked 3 models: What is a nominal account (Russian banking)?

Groq (Llama 3.3): Wrong. Confused with accounting.
DeepSeek: Correct. Civil Code definition.
Gemini: Wrong. Mixed with bookkeeping.

One model = 33% chance of correct answer. Three models + judge = correct every time.

The Code

async function consult(prompt, engines) {  const promises = engines.map(eng =>  callEngine(eng, prompt)  .then(r => ({ engine: eng, response: r, ok: true }))  .catch(e => ({ engine: eng, error: e.message, ok: false }))  );  return Promise.all(promises); }

async function consult(prompt, engines) {  const promises = engines.map(eng =>  callEngine(eng, prompt)  .then(r => ({ engine: eng, response: r, ok: true }))  .catch(e => ({ engine: eng, error: e.message, ok: false }))  );  return Promise.all(promises); }

// Run 3 engines in parallel const results = await consult(question, ["groq", "deepseek", "gemini"]); // All 3 respond in ~4 seconds (parallel, not sequential)`

Enter fullscreen mode

Exit fullscreen mode

Cost

Engine Speed Cost per 1M tokens

Groq 265ms ~$0 (free tier)

DeepSeek 1.4s $0.14

Gemini 1s Free tier

Total 4.3s ~$0.14

For $0.14 per query you get 3x reliability.

Judge Pattern

The cheapest model (Groq) judges which answer is best:

const judge = await groq(

const judge = await groq(

Pick the best answer: 1, 2, or 3. Just the number.\n${candidates}

);

);

Enter fullscreen mode

Exit fullscreen mode

Cost of judging: ~$0. Total pipeline: $0.14 for near-perfect answers.

When to Use

Critical decisions (legal, financial)
Content generation (pick best draft)
Data extraction (consensus = accuracy)
NOT for simple queries (waste of tokens)

Results

After running MoA in production for 45+ agents:

Quality: +40% on complex tasks
Cost: $0.14 vs $3/query with Claude alone
Reliability: 99%+ (if one engine fails, others cover)

Building AI agents? Run multiple models. It is cheaper than you think and better than you expect.

Original source

Dev.to AI

https://dev.to/_1353e04f14b156240b/mixtureofagents-why-one-ai-is-worse-than-three-2fln

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

claudegeminillama

ProductsLive

Top 10 AI Writing Tools for 2026: Complete Guide

Top 10 AI Writing Tools for 2026: Complete Guide Introduction AI writing tools are transforming content creation. But with so many options, which ones are actually worth using? In this guide, I'll share the top 10 AI writing tools for 2026. I've tested dozens of AI writing tools over the past year. These are the 10 that consistently produce the best results. The Top 10 AI Writing Tools for 2026 1. Jasper AI Best for: Overall content creation Why it's #1: Jasper AI consistently produces the highest quality content across all categories. It's the most versatile tool on the market. Key features: 50+ templates Brand voice customization SEO optimization Team collaboration Pricing: Starter: $49/month (35,000 words) Pro: $99/month (unlimited words) Business: Custom pricing Best for: Content marke

Dev.to AI

6m29 minutes ago

ProductsLive

Why I Use Claude Code for Everything

Most Claude users split their work across three products. Chat for conversations and questions. Code for development. Cowork for task management and desktop automation. Three interfaces, three separate memory systems, three places where your context gets fragmented. I stopped doing that. I run everything through Claude Code. One interface, one memory system, full control. Here is how and why. The Problem With Splitting Your Work When you use Chat for a planning conversation, then switch to Code for implementation, then use Cowork to manage tasks, each one starts from zero. Chat does not know what you discussed in Code. Code does not know what you planned in Chat. You re-explain context every time you switch. Even within each product, memory is limited. Chat has built-in memory but it is se

Dev.to AI

8m23 minutes ago

ProductsFresh

There should be $100M grants to automate AI safety

This post reflects my personal opinion and not necessarily that of other members of Apollo Research. TLDR: I think funders should heavily incentivize AI safety work that enables spending $100M+ in compute or API budgets on automated AI labor that directly and differentially translates to safety. Motivation I think we are in a short timeline world (and we should take the possibility seriously even if we don't have full confidence yet). This means that I think funders should aim to allocate large amounts of money (e.g. $1-50B per year across the ecosystem) on AI safety in the next 2-3 years. I think that the AI safety funders have been allocating way too little funding and their spending has been far too conservative in the past 5 years. So, in my opinion, we should definitely continue rampi

lesswrong.com

13mabout 2 hours ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 140 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Models

ModelsLive

EconomyAI: Route to the Cheapest LLM That Works

EconomyAI: Route to the Cheapest LLM That Works Введение в EconomyAI Как разработчик, работающий с большими языковыми моделями (LLM), я часто сталкивался с ��роблемой балансирования производительности и стоимости. Моя система, чат-бот, используемый тысячами пользователей ежедневно, сильно зависит от LLM для понимания и ответа на пользовательский ввод. Однако высокие вычислительные требования этих моделей привели к значительным расходам, и мои ежемесячные счета за облачные услуги превышали 5 000 долларов. Чтобы снизить затраты без ущерба для производительности, я начал работать над EconomyAI, маршрутом к самой дешевой LLM, которая работает. Проблема с традиционными LLM Традиционные LLM, такие как те, которые предоставляются крупными облачными провайдерами, часто являются черными ящиками с о

Dev.to AI

4m17 minutes ago

ModelsLive

Canônico

Aqui pensando em gramática e jeito de escrever o que fazemos em software. Em desenvolvimento de software escrever com padrão sempre foi algo importante e diria que quase obrigatório para uma equipe com pensamento de longo prazo. Em diferentes aspectos, não só na formatação do código fonte, mas também como nomear variáveis e diferentes estruturas nomeadas em um código, como testes automatizados e também de modelo de dados. Em desenvolvimento de software nos dias atuais, pensar em como fazemos descoberta de um determinado problema e como fazemos construção. No caso de descoberta o que existe de entrada? Documentações, reuniões com clientes (transcrições e desenhos), algum tipo de manual de apoio, relatórios e outras regras de negócio que podem ser usadas como base. E no caso de entrega, todo

Dev.to AI

2m15 minutes ago

ModelsLive

Running 1bit Bonsai 8B on 2GB VRAM (MX150 mobile GPU)

I have an older laptop from ~2018, an Asus Zenbook UX430U. It was quite powerful in its time, with an i7-8550U CPU @ 1.80GHz (4 physical cores and an Intel iGPU), 16GB RAM and an additional NVIDIA MX150 GPU with 2GB VRAM. I think the GPU was intended for CAD applications, Photoshop filters or such - it is definitely not a gaming laptop. I'm using Linux Mint with the Cinnamon desktop using the iGPU only, leaving the MX150 free for other uses. I never thought I would run LLMs on this machine, though I've occasionally used the MX150 GPU to train small PyTorch or TensorFlow models; it is maybe 3 times faster than using just the CPU. However, when the 1-bit Bonsai 8B model was released, I couldn't resist trying out if I could run it on this GPU. So I took the llama.cpp fork from PrismML, compil

Reddit r/LocalLLaMA

4mabout 1 hour ago

ModelsFresh

Gemma 4 is a KV_cache Pig

Ignoring the 8 bit size of Nvidia’s marketed 4 bit quantization of the dense model… The dense model KV cache architecture uses 3x or more the memory than what I have seen with other models. It seems like the big choice was 256 head dim instead of 128. I am looking at 490KB per 8 bit token of KV cache versus 128KB on Qwen3. I am running the nvidia weights at 4 bit on an rtx pro 6000 with 96GB of RAM and 8 bit kv cache and still only have room for 115k tokens. I was surprised is all. The model scales well in vllm and seems quite smart. submitted by /u/IngeniousIdiocy [link] [comments]

Reddit r/LocalLLaMA

1mabout 3 hours ago