🔥 google-deepmind/gemma
Gemma open-weight LLM library from Google DeepMind — trending on GitHub today with 45 new stars.
Gemma is a family of open-weights Large Language Models (LLMs) by Google DeepMind, based on Gemini research and technology.
This repository contains the implementation of the gemma PyPI package, a JAX library for using and fine-tuning Gemma.
For examples and use cases, see our documentation. Please report issues and feedback on our GitHub issue tracker.
Installation
- Install JAX for CPU, GPU, or TPU by following the instructions on the JAX website.
- Run `pip install gemma`.
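After installing, you can check that the package imports and that JAX sees the expected device. A minimal sketch (the printed device list depends on which JAX build you installed):

```python
import jax
from gemma import gm  # confirms the gemma package is importable

# Lists the devices JAX will use, e.g. [CpuDevice(id=0)] or one entry per GPU/TPU.
print(jax.devices())
```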
Examples
Here is a minimal example to have a multi-turn, multi-modal conversation with Gemma:
```python
from gemma import gm

# Model and parameters
model = gm.nn.Gemma3_4B()
params = gm.ckpts.load_params(gm.ckpts.CheckpointPath.GEMMA3_4B_IT)

# Example of multi-turn conversation
sampler = gm.text.ChatSampler(
    model=model,
    params=params,
    multi_turn=True,
)

prompt = """Which of the two images do you prefer?

Image 1: <start_of_image>
Image 2: <start_of_image>

Write your answer as a poem."""
out0 = sampler.chat(prompt, images=[image1, image2])

out1 = sampler.chat('What about the other image?')
```
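The sampler also works for plain text. Here is a minimal text-only sketch that reuses only the calls from the example above (`gm.nn.Gemma3_4B`, `gm.ckpts.load_params`, `gm.text.ChatSampler`) and assumes no additional API beyond what is shown there:

```python
from gemma import gm

# Same model and instruction-tuned checkpoint as in the example above.
model = gm.nn.Gemma3_4B()
params = gm.ckpts.load_params(gm.ckpts.CheckpointPath.GEMMA3_4B_IT)

# Text-only chat: no images are passed to chat().
sampler = gm.text.ChatSampler(model=model, params=params)
reply = sampler.chat('Give me three facts about the JAX ecosystem.')
print(reply)
```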
Our documentation contains various Colabs and tutorials, including:
- Sampling
- Multi-modal
- Fine-tuning
- LoRA (a generic background sketch follows this list)
- ...
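For background on the LoRA tutorial topic: LoRA freezes a pretrained weight matrix W and learns a low-rank update B @ A, so the effective weight becomes W + (alpha / r) * B @ A. The sketch below is a generic JAX illustration of that idea; the function and parameter names (`lora_linear`, `rank`, `alpha`) are placeholders, and it is not the API used by the gemma library or its tutorials:

```python
import jax
import jax.numpy as jnp

def init_lora(key, d_in, d_out, rank=8):
    # A gets small random values and B starts at zero, so the initial
    # update B @ A is zero and the frozen model is unchanged at step 0.
    a = jax.random.normal(key, (rank, d_in)) * 0.01
    b = jnp.zeros((d_out, rank))
    return {'a': a, 'b': b}

def lora_linear(x, w_frozen, lora, alpha=16.0, rank=8):
    # Frozen base projection plus the scaled low-rank update.
    base = x @ w_frozen.T
    update = (x @ lora['a'].T) @ lora['b'].T
    return base + (alpha / rank) * update

# Toy shapes: a batch of 4 vectors with 256 features projected to 512.
key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (512, 256)) * 0.02
lora = init_lora(key, d_in=256, d_out=512)
x = jnp.ones((4, 256))
y = lora_linear(x, w, lora)  # shape (4, 512)
```

During fine-tuning, only `lora['a']` and `lora['b']` would receive gradients while `w_frozen` stays fixed, which is what keeps the method memory-cheap.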
Additionally, the examples/ folder contains scripts to fine-tune and sample with Gemma.
Learn more about Gemma
- To use this library: Gemma documentation
- Technical reports for metrics and model capabilities: Gemma 1, Gemma 2, Gemma 3
- Other Gemma implementations and documentation on the Gemma ecosystem
Downloading the models
To download the model weights, see our documentation.
System Requirements
Gemma can run on CPU, GPU, and TPU. For GPU, we recommend 8GB+ of GPU RAM for the 2B checkpoint and 24GB+ of GPU RAM for the 7B checkpoint.
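Those figures are roughly consistent with a back-of-the-envelope estimate: weight memory is the parameter count times the bytes per parameter (about 2 bytes in bfloat16), before activations and the KV cache are counted. A small sketch of that arithmetic (the byte count and the idea of extra runtime overhead are assumptions, not measured figures):

```python
def rough_param_memory_gb(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

# 2B checkpoint in bfloat16: ~3.7 GiB of weights, leaving headroom within the
# 8GB+ guidance for activations, the KV cache, and framework overhead.
print(rough_param_memory_gb(2))  # ~3.7
print(rough_param_memory_gb(7))  # ~13.0
```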
Contributing
We welcome contributions! Please read our Contributing Guidelines before submitting a pull request.
This is not an official Google product.