[P] Gemma 4 running on NVIDIA B200 and AMD MI355X from the same inference stack, 15% throughput gain over vLLM on Blackwell

Reddit r/MachineLearning, by /u/carolinedfrasca (https://www.reddit.com/user/carolinedfrasca), April 2, 2026

Google DeepMind dropped Gemma 4 today:

- Gemma 4 31B: dense, 256K context, redesigned architecture targeting efficiency and long-context quality
- Gemma 4 26B A4B: MoE, 26B total / 4B active per forward pass, 256K context

Both are natively multimodal (text, image, video, dynamic resolution).

We got both running on MAX on launch day across NVIDIA B200 and AMD MI355X from the same stack. On B200 we're seeing 15% higher output throughput vs. vLLM (happy to share more on methodology if useful).

Free playground if you want to test without spinning anything up: https://www.modular.com/#playground
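For readers who want to sanity-check the two numbers quoted above, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and the definition of output throughput as generated tokens per second of wall-clock time is an assumption: the post does not describe its benchmark methodology. The token counts in the second half are placeholders, not measurements.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of an MoE model's parameters used per forward pass."""
    return active_params_b / total_params_b


def output_throughput(output_tokens: int, wall_seconds: float) -> float:
    """Generated tokens per second of wall-clock time (assumed definition)."""
    return output_tokens / wall_seconds


def relative_gain(candidate: float, baseline: float) -> float:
    """Relative improvement of candidate over baseline; 0.15 means 15% higher."""
    return candidate / baseline - 1.0


# From the post: Gemma 4 26B A4B has 4B active out of 26B total parameters.
print(f"active share: {active_fraction(4, 26):.1%}")

# Placeholder token counts only, NOT measured results from either stack.
baseline_tps = output_throughput(100_000, 10.0)
candidate_tps = output_throughput(115_000, 10.0)
print(f"relative gain: {relative_gain(candidate_tps, baseline_tps):.1%}")
```

A fair comparison would also hold the model, hardware, precision, batch size, and request mix constant across both stacks, since any of those can move throughput by more than 15%.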

Original source

Reddit r/MachineLearning

https://www.reddit.com/r/MachineLearning/comments/1saot07/p_gemma_4_running_on_nvidia_b200_and_amd_mi355x/
