You Can Now Run Any LLM Inside Claude Code
Anthropic just shipped something significant - and they did it without a keynote, without a press release, and without a blog post. If you blinked, you missed it. Buried in their documentation is a feature that fundamentally changes how developers can use Claude Code, Anthropic's collaborative coding environment: you can now load any model into it.
Not just Anthropic's models. Any model.
What changed?
Claude Code - Anthropic's agentic coding tool - now supports external model loading. That means you can swap in models like:
- Qwen (Alibaba's open-weight series)
- Kimi K2 (Moonshot AI's long-context model)
- Gemma (Google's lightweight open model)
- GPT-5.5 (OpenAI)
- Grok (xAI)
There was no fanfare around any of this - just documentation you'd have to stumble across to find, the kind of release that rewards the curious and punishes those who only follow official channels.
Why this is a big deal
1) Session limits are no longer a hard stop
If you've used Claude Code heavily, you've hit the session limit wall. It's frustrating - you're mid-flow on a complex task and suddenly you're locked out until the next billing cycle or reset window.
With model swapping, that wall disappears. Hit your Claude limit? Load Qwen or Gemma and keep going. Your workflow doesn't have to stop just because one model's quota ran out.
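To make that concrete, here's a minimal sketch of the swap, assuming your provider exposes an Anthropic-compatible endpoint. The environment variable names are the ones Claude Code reads for custom backends; the endpoint URL and model ID below are placeholders, so substitute the values from your provider's docs.

```bash
# A minimal sketch of the swap. ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN,
# and ANTHROPIC_MODEL are the variables Claude Code reads for custom
# endpoints; the URL and model ID below are placeholders - substitute the
# values from your provider's Anthropic-compatible API docs.
export ANTHROPIC_BASE_URL="https://api.example-provider.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-provider-api-key"
export ANTHROPIC_MODEL="qwen3-coder"   # placeholder model ID

claude   # launches Claude Code against the swapped-in model
```

Unset the variables (or open a fresh shell) and `claude` drops back to Anthropic's own models.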
2) Cost optimization is now practical
Not every task requires the most powerful model. According to an analysis LangChain published last month, MiniMax M2.7 delivers roughly 84% of frontier-model performance at just 5% of the cost.
For routine coding tasks - boilerplate generation, documentation, refactoring - that tradeoff is often entirely acceptable. With model flexibility built into Claude Code, you can now make intelligent cost decisions at the task level rather than being locked into a single pricing tier.
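As a sketch of what task-level routing might look like, here's a hypothetical shell helper. The endpoint URL and model ID are illustrative placeholders, and `claude -p` is Claude Code's non-interactive print mode.

```bash
# Hypothetical helper: route a prompt to a cheap model for routine work,
# or fall back to your configured Claude model for harder tasks.
# The endpoint URL, API key, and model ID are placeholders.
run_task() {
  local tier="$1"; shift
  if [ "$tier" = "cheap" ]; then
    ANTHROPIC_BASE_URL="https://api.cheap-provider.example/anthropic" \
    ANTHROPIC_AUTH_TOKEN="your-cheap-provider-key" \
    ANTHROPIC_MODEL="small-open-model" \
    claude -p "$*"                  # one-off run against the cheap backend
  else
    claude -p "$*"                  # default backend for frontier-grade work
  fi
}

run_task cheap "add docstrings to utils.py"   # boilerplate: cheap model is fine
run_task smart "refactor the auth flow"       # complex change: frontier model
```

Because the variable assignments prefix a single command, they apply only to that invocation - your default configuration stays untouched.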
3) It signals something about Anthropic's position
This level of openness is unusual for a company that has historically kept its ecosystem tightly controlled. The most plausible explanation is that Anthropic's GPU capacity is under more pressure than their public communications suggest. Allowing users to offload to third-party models reduces demand on Anthropic's own infrastructure while keeping users inside the Claude Code environment.
Whatever the reason, the result is a more flexible, more resilient tool for developers.
How to get started
The feature is live and documented - though you'll need to find the relevant docs yourself, as Anthropic hasn't made them easy to discover. The setup involves configuring your model provider credentials within Claude Code's settings and selecting your preferred model for a given session.
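To persist the override across sessions rather than exporting variables each time, Claude Code's settings file accepts an env block. Here's one way that might look - the structure follows the documented settings format, but the values are placeholders, and you should merge into (not overwrite) any settings.json you already have.

```bash
# Sketch only: writes a fresh settings file. If ~/.claude/settings.json
# already exists, merge the "env" block into it by hand instead.
mkdir -p ~/.claude
cat > ~/.claude/settings.json <<'EOF'
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.example-provider.com/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your-provider-api-key",
    "ANTHROPIC_MODEL": "kimi-k2"
  }
}
EOF
```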
Credit to Michael Niaki for surfacing this feature and sharing it with the community.
Conclusion
The ability to run any LLM inside Claude Code is a quiet but meaningful shift in how AI-assisted development can work. It removes hard limits, opens up cost optimization strategies, and gives developers genuine flexibility in how they build.
The fact that Anthropic shipped this without announcement is either a sign of humility or a sign of necessity - but either way, the feature is real, it works, and it's worth knowing about.
If you're a heavy Claude Code user, go find those docs. It's worth the hunt.

