Arcee AI Releases Trinity Large Thinking: An Apache 2.0 Open Reasoning Model for Long-Horizon Agents and Tool Use
The landscape of open-source artificial intelligence has shifted from purely generative models toward systems capable of complex, multi-step reasoning. While proprietary ‘reasoning’ models have dominated the conversation, Arcee AI has released Trinity Large Thinking.
This release is an open-weight reasoning model distributed under the Apache 2.0 license, positioning it as a transparent alternative for developers building autonomous agents. Unlike models optimized solely for conversational chat, Trinity Large Thinking is specifically developed for long-horizon agents, multi-turn tool calling, and maintaining context coherence over extended workflows.
Architecture: Sparse MoE at Frontier Scale
Trinity Large Thinking is the reasoning-oriented iteration of Arcee’s Trinity Large series. Technically, it is a sparse Mixture-of-Experts (MoE) model with 400 billion total parameters. However, its architecture is designed for inference efficiency; it activates only 13 billion parameters per token using a 4-of-256 expert routing strategy.
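The 4-of-256 routing described above can be illustrated with a minimal top-k gating sketch. This is a generic MoE router in NumPy, not Arcee's actual implementation: each token's router scores are reduced to its four highest-scoring experts, and gate weights are renormalized over only those experts.

```python
import numpy as np

def top_k_route(router_logits: np.ndarray, k: int = 4):
    """Select the top-k experts per token and renormalize their gate weights.

    router_logits: (num_tokens, num_experts) scores from the router network.
    Returns (indices, weights), each of shape (num_tokens, k).
    """
    # Indices of the k highest-scoring experts for each token.
    idx = np.argpartition(router_logits, -k, axis=-1)[:, -k:]
    top = np.take_along_axis(router_logits, idx, axis=-1)
    # Softmax over only the selected experts, so the gate weights sum to 1.
    exp = np.exp(top - top.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)
    return idx, weights

# 8 tokens routed over 256 experts, 4 active per token (as in Trinity Large).
logits = np.random.randn(8, 256)
idx, w = top_k_route(logits, k=4)
```

Only the parameters owned by the selected experts participate in each token's forward pass, which is why a 400B-parameter model can run with roughly 13B active parameters per token.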
This sparsity provides the world-knowledge density of a massive model without the prohibitive latency typical of dense 400B architectures. Key technical innovations in the Trinity Large family include:
- SMEBU (Soft-clamped Momentum Expert Bias Updates): A new MoE load-balancing strategy that prevents expert collapse and ensures more uniform utilization of the model's specialized pathways.
- Muon Optimizer: Arcee used the Muon optimizer for the 17-trillion-token pre-training phase, yielding higher capital and sample efficiency than standard AdamW implementations.
- Attention Mechanism: The model features interleaved local and global attention alongside gated attention to enhance its ability to comprehend and recall details within large contexts.
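The exact SMEBU formulation is not spelled out here, but the name suggests a load-balancing scheme in the family of bias-based (auxiliary-loss-free) routing adjustments. The following is a purely conceptual sketch under that assumption, with hypothetical names and constants: overloaded experts get their routing bias nudged down, underused ones up, with momentum smoothing the load signal and a soft clamp (tanh) bounding the step.

```python
import numpy as np

def update_expert_bias(bias, momentum_buf, expert_load, target_load,
                       lr=0.01, beta=0.9, clamp=1.0):
    """One hypothetical soft-clamped momentum update of per-expert routing biases.

    This illustrates the general idea behind bias-based MoE load balancing;
    it is NOT Arcee's published SMEBU algorithm.
    """
    # Signed load error: positive when an expert receives more tokens than target.
    error = expert_load - target_load
    # Momentum smooths noisy per-batch load estimates.
    momentum_buf = beta * momentum_buf + (1.0 - beta) * error
    # Soft clamp via tanh bounds the step size instead of hard truncation.
    bias = bias - lr * clamp * np.tanh(momentum_buf / clamp)
    return bias, momentum_buf

num_experts = 256
bias = np.zeros(num_experts)
buf = np.zeros(num_experts)
load = np.random.rand(num_experts)  # observed fraction of tokens per expert
bias, buf = update_expert_bias(bias, buf, load, target_load=load.mean())
```

Adding such a bias to the router logits before top-k selection steers tokens toward underused experts without an auxiliary loss term distorting the training objective.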
Reasoning
A core differentiator of Trinity Large Thinking is its behavior at inference time. According to Arcee's documentation, the model runs an internal 'thinking' process before delivering its final response. This internal reasoning lets the model plan multi-step tasks and verify its logic before generating an answer.
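Applications consuming a reasoning model typically need to separate the trace from the answer. A minimal sketch, assuming the common `<think>...</think>` delimiter convention used by many open reasoning models (Trinity's exact delimiters may differ):

```python
import re

def split_reasoning(text: str):
    """Separate a model's internal reasoning trace from its final answer.

    Assumes reasoning is wrapped in <think>...</think> tags; returns
    (reasoning, answer).
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()          # no visible trace: all answer
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()  # final response follows the trace
    return reasoning, answer

raw = "<think>Plan: call the weather API, then summarize.</think>It is sunny."
trace, answer = split_reasoning(raw)
# trace -> "Plan: call the weather API, then summarize."; answer -> "It is sunny."
```

Keeping the trace out of the user-facing reply (and out of subsequent prompt turns, unless the provider recommends otherwise) is the usual pattern for agentic loops.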
Performance: Agents, Tools, and Context
Trinity Large Thinking is optimized for the ‘Agentic’ era. Rather than competing purely on general-knowledge trivia, its performance is measured by its reliability in complex software environments.
Benchmarks and Rankings
The model has demonstrated strong performance on PinchBench, a benchmark designed to evaluate model capability in environments relevant to autonomous agents. Trinity Large Thinking currently holds the #2 spot on PinchBench, trailing only Claude Opus 4.6.
Technical Specifications
- Context Window: The model supports a 262,144-token context window (as listed on OpenRouter), making it capable of processing massive datasets or long conversational histories for agentic loops.
- Multi-Turn Reliability: Training focused heavily on multi-turn tool use and structured outputs, ensuring the model can call APIs and extract parameters with high precision over many turns.
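A multi-turn tool-calling loop of the kind described above can be sketched as follows. The tool names and the scripted turns are illustrative; in production the turns would come from the model API (e.g. an OpenAI-compatible endpoint on OpenRouter), but here they are pre-scripted so the control flow is runnable offline.

```python
import json

# Hypothetical local tools the agent may call; names are illustrative.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
    "upper": lambda args: args["text"].upper(),
}

def run_agent_loop(model_turns, max_turns=10):
    """Drive a multi-turn tool-calling loop over pre-scripted model turns.

    Each turn is either {"tool": name, "arguments": {...}} or
    {"final": answer}. Tool results are appended to the message history,
    mimicking how they would be fed back to the model on the next turn.
    """
    messages = []
    for turn in model_turns[:max_turns]:
        if "final" in turn:                      # model is done reasoning
            messages.append({"role": "assistant", "content": turn["final"]})
            return turn["final"], messages
        result = TOOLS[turn["tool"]](turn["arguments"])   # execute the call
        messages.append({"role": "tool", "name": turn["tool"],
                         "content": json.dumps(result)})  # feed result back
    raise RuntimeError("agent did not finish within max_turns")

scripted = [
    {"tool": "add", "arguments": {"a": 2, "b": 3}},
    {"tool": "upper", "arguments": {"text": "sum is 5"}},
    {"final": "SUM IS 5"},
]
answer, history = run_agent_loop(scripted)
# answer -> "SUM IS 5"; history holds two tool-result messages plus the final reply
```

The reliability claim is precisely about this loop: the model must emit well-formed tool calls and keep earlier turns' results in scope as the history grows.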
Key Takeaways
- High-Efficiency Sparse MoE Architecture: Trinity Large Thinking is a 400B-parameter sparse Mixture-of-Experts (MoE) model. It utilizes a 4-of-256 routing strategy, activating only 13B parameters per token during inference to provide frontier-scale intelligence with the speed and throughput of a much smaller model.
- Optimized for Agentic Workflows: Unlike standard chat models, this release is specifically tuned for long-horizon tasks, multi-turn tool calling, and high instruction-following accuracy. It currently ranks #2 on PinchBench, a benchmark for autonomous agent capabilities, trailing only Claude Opus 4.6.
- Expanded Context Window: The model supports an extensive context window of 262,144 tokens (on OpenRouter). This allows it to maintain coherence across massive technical documents, complex codebases, and extended multi-step reasoning chains without losing track of early instructions.
- True Open Ownership: Distributed under the Apache 2.0 license, Trinity Large Thinking offers 'True Open' weights available on Hugging Face. This permits enterprises to audit, fine-tune, and self-host the model within their own infrastructure, ensuring data sovereignty and regulatory compliance.
- Advanced Training Stability: To achieve frontier-class performance with high capital efficiency, Arcee employed the Muon optimizer and a proprietary load-balancing technique called SMEBU (Soft-clamped Momentum Expert Bias Updates), which ensures stable expert utilization and prevents performance degradation during complex reasoning tasks.
Check out the technical details and the model weights.