MiniMax Just Dropped a 428B Open-Source Model Built for Agents. The Numbers Are Hard to Ignore.
MiniMax just published M3, a 428-billion-parameter model with open weights, already uploaded to Hugging Face. It's not just big; it's built with agents and coding in mind from the ground up. And the benchmark results put it in serious territory.
Here's what's actually in it.
The Architecture: Big Total, Efficient Active
M3 uses a mixture-of-experts (MoE) design: 428B total parameters, but only 23B are activated for any given token. That's the same trick that makes MoE models like Mixtral cheaper to run than their raw parameter counts suggest: most of the model is dormant at any given step.
The result is a model that punches above its compute weight. You get the representation capacity of a very large model without paying the full cost at inference time.
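To make the "big total, efficient active" idea concrete, here is a minimal sketch of top-k expert routing plus the arithmetic on M3's published parameter counts. The article does not describe M3's actual router, so the 8 toy experts, the gate scores, and k=2 are illustrative assumptions, and the "about 2 FLOPs per parameter per token" figure is a common rule of thumb rather than a measured number.

```python
import math

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their scores."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits, k):
    """Run only the routed experts and mix their outputs by gate weight."""
    routed = top_k_route(gate_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routed)
    return output, [i for i, _ in routed]

# Hypothetical toy setup: 8 "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
gate_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # made-up gate scores
_, used = moe_forward(2.0, experts, gate_logits, k=2)
print("experts used:", used)  # the other six experts never run

# Arithmetic on the figures quoted in the article.
TOTAL_PARAMS = 428e9
ACTIVE_PARAMS = 23e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
# Rule of thumb (assumption): forward pass costs ~2 FLOPs per active parameter.
compute_ratio = (2 * TOTAL_PARAMS) / (2 * ACTIVE_PARAMS)
print(f"active fraction: {active_fraction:.1%}")
print(f"dense vs. MoE compute per token: {compute_ratio:.1f}x")
```

By this rough estimate, about 5.4% of the parameters do work on each token, so per-token compute is roughly 18.6 times lower than for a dense model of the same total size. Memory is a different story: all 428B parameters still have to be stored somewhere.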
The Benchmark Numbers Worth Paying Attention To
MiniMax is leading with the agent and coding results, and for good reason.
On SWE-Bench Pro (the hardest version of the software engineering benchmark, which uses real GitHub issues), M3 scores 59.0%. That's a strong result for an open-weights model, putting it in range of the frontier closed models that have dominated this leaderboard.
Terminal Bench 2.1, which tests how well a model can operate in an actual terminal environment (writing commands, reading output, adjusting), comes in at 66.0%. This is exactly the kind of benchmark that matters for coding agents.
On MCP Atlas (a benchmark specifically designed around the Model Context Protocol, testing tool use and agent coordination), M3 scores 74.2%. MCP is quickly becoming the standard interface for connecting AI models to external tools, and a dedicated benchmark score here is notable.
One Million Token Context
M3 extends to a 1 million token context window, achieved using MiniMax's own sparse attention implementation. A 1M context isn't new as a headline number at this point, but what matters is how it's implemented: sparse attention keeps the compute tractable instead of scaling quadratically with context length.
For agents that need to hold a long project history, read a large codebase, or maintain context across an extended task, a real 1M context is genuinely useful.
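A quick way to see why sparse attention matters at this scale is to count how many query-key pairs each scheme has to score. The sketch below compares full causal attention with a sliding-window pattern. The sliding window and its size of 4,096 are stand-in assumptions chosen for illustration; the article does not say which sparse pattern MiniMax actually uses.

```python
def dense_causal_pairs(n):
    """Query-key pairs scored by full causal attention over n tokens."""
    return n * (n + 1) // 2

def sliding_window_pairs(n, window):
    """Pairs scored when each token sees itself and at most window-1 predecessors."""
    if n <= window:
        return dense_causal_pairs(n)
    # The first `window` tokens see everything before them;
    # every later token sees exactly `window` positions.
    return dense_causal_pairs(window) + (n - window) * window

CONTEXT = 1_000_000   # context length from the article
WINDOW = 4_096        # hypothetical window size, not an M3 parameter

dense = dense_causal_pairs(CONTEXT)
sparse = sliding_window_pairs(CONTEXT, WINDOW)
print(f"dense pairs:  {dense:,}")
print(f"sparse pairs: {sparse:,}")
print(f"reduction:    {dense / sparse:.0f}x")
```

The dense count grows with the square of the context length, while the windowed count grows linearly, which is the general property that makes million-token contexts affordable.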
Open Weights, With a Catch
The weights are live on Hugging Face now, with one timing caveat: the technical report and full weight release are coming in roughly 10 days. What's available today appears to be the model weights for download, with the full documentation to follow.
MiniMax is also launching MiniMax Code, a coding-focused tool built on M3, alongside an API platform, so you can access M3 through an API without running it yourself.
Why Open Matters Here
The pattern in AI over the past year has been: closed models release something impressive, then open-source catches up within months. M3 is another data point in that direction: a model with frontier-class agent benchmarks, available as open weights.
That matters for anyone building on AI. Open weights mean you can run locally, fine-tune for your use case, avoid API rate limits, and not be tied to a single provider's pricing. For agent workloads especially, where you might be making hundreds or thousands of calls per task, cost structure matters a lot.
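The cost point is easy to put into numbers. The sketch below estimates what a single agent task costs when it fans out into many model calls. Every figure in the example call (call count, token counts, and per-million-token prices) is a placeholder invented for illustration, not real pricing for M3 or any other provider.

```python
def cost_per_task(calls, input_tokens_per_call, output_tokens_per_call,
                  usd_per_m_input, usd_per_m_output):
    """Estimated USD cost of one agent task made up of many model calls."""
    input_cost = calls * input_tokens_per_call * usd_per_m_input / 1_000_000
    output_cost = calls * output_tokens_per_call * usd_per_m_output / 1_000_000
    return input_cost + output_cost

# Placeholder numbers only: swap in your own workload and your provider's rates.
example = cost_per_task(
    calls=1_000,                  # hypothetical calls in one agent task
    input_tokens_per_call=8_000,  # hypothetical prompt size
    output_tokens_per_call=500,   # hypothetical response size
    usd_per_m_input=1.0,          # hypothetical price per million input tokens
    usd_per_m_output=4.0,         # hypothetical price per million output tokens
)
print(f"${example:.2f} per task")
```

Because cost scales linearly with both call count and price, halving the per-token rate halves the bill for every task, which is why pricing differences compound quickly in agent workloads.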
What This Means If You Use OpenClaw
OpenClaw is designed to work with the leading AI models, and the expanding set of capable open models is directly relevant to what's possible.
As models like M3 mature and their tooling stabilises, agents running on OpenClaw benefit from a wider pool of capable, cost-effective options. A model that scores 59% on SWE-Bench Pro and 74% on MCP Atlas isn't an experiment; it's a serious option for production agent workloads.
The open-source AI ecosystem getting more capable means more choice, more competition on price, and ultimately better agents for everyone building on this infrastructure.