Google Antigravity ships with seven models in the free public preview — two Gemini 3.1 Pro variants, Gemini 3 Flash, Claude Sonnet 4.6, Claude Sonnet 4.6 with Thinking, Claude Opus 4.6 with Thinking, and GPT-OSS 120B. That's a genuinely impressive lineup for a free tool, but it raises an immediate question: which one do you actually use?
The answer depends on the task, the rate limits, and how fast you burn through credits. Here is the full breakdown.
## The Model Lineup
| Model | Context | Best for | Credit cost |
|---|---|---|---|
| Gemini 3.1 Pro (High) | 1M tokens | Large codebases, default workhorse | Moderate |
| Gemini 3.1 Pro (Low) | 1M tokens | Faster, lighter tasks | Low |
| Gemini 3 Flash | 1M tokens | High-volume, quick tasks | Very low |
| Claude Sonnet 4.6 | 1M tokens (beta) | Balanced coding and reasoning | Moderate |
| Claude Sonnet 4.6 Thinking | 1M tokens (beta) | Structured reasoning tasks | High |
| Claude Opus 4.6 Thinking | 1M tokens (beta) | Hardest problems | Very high |
| GPT-OSS 120B | 400K tokens | Alternative perspective, open-weight | Moderate |
Note: Claude Opus 4.7 was released April 16, 2026. As of writing, Antigravity has not yet added it — only 4.6 is available.
## Gemini 3.1 Pro: The Default for Good Reason
Gemini 3.1 Pro (High) is the default model and the right choice for most agentic tasks in Antigravity. Its 1M token context window is stable and production-ready — not a beta flag, not a workaround. For large codebase work, this matters. You can load an entire monorepo in a single pass without chunking.
On SWE-Bench Verified — the benchmark that measures real-world software engineering problem-solving — Gemini 3.1 Pro scores 80.6%. Claude Opus 4.6 edges it at 80.8%. In practice, the difference between those numbers is negligible on most tasks.
The Low variant of Gemini 3.1 Pro is worth switching to when you want faster responses on tasks that don't need the full model — code formatting, simple refactors, boilerplate generation.
Use Gemini 3.1 Pro when: working with large codebases, running extended agentic sessions, or just getting things done without thinking about which model to pick.
## Gemini 3 Flash: The Burn-Rate Saver
Flash is the speed-optimised model. It generates faster, costs far fewer credits, and — critically — its rate limit refreshes approximately every 5 hours rather than weekly. On the free tier, this makes Flash the only model you can use heavily without hitting the weekly ceiling.
The tradeoff is reasoning depth. Flash handles well-defined tasks cleanly. Give it an ambiguous requirement or a multi-layer debugging problem and it starts to show its limits.
Use Gemini 3 Flash when: running high-volume tasks, generating boilerplate, summarising documentation, or any task where you'd reach for Haiku in the Anthropic lineup.
## Claude Sonnet 4.6: When You Want Anthropic's Approach
Claude Sonnet 4.6 in Antigravity gives you Anthropic's model at no additional cost — and it shows a genuinely different character to the Gemini models. Sonnet tends to produce cleaner, more idiomatic code in typed languages, and handles instruction-following with a precision that Gemini occasionally fumbles.
The Thinking variant adds extended reasoning before responding. It's the right choice when the task involves multiple competing constraints or when you want the model to work through an architecture decision before committing to code.
Use Claude Sonnet 4.6 when: working in TypeScript, Python, or any codebase where code style and idiom matter, or when Gemini's output doesn't quite match your conventions.
## Claude Opus 4.6 Thinking: The Most Capable, the Most Expensive
Opus 4.6 with Thinking is the ceiling of what Antigravity offers. It is the right model when you genuinely have a hard problem — a debugging session where the root cause isn't obvious, an architecture decision with many competing constraints, or a refactor that touches deep abstractions.
The critical thing to know about Opus in Antigravity: it burns credits approximately 4x faster than Gemini models. The Thinking mode generates thousands of hidden reasoning tokens before producing output, and those tokens count against your quota. A single complex Opus session has been reported to consume 635+ credits. On the free tier's weekly cap, you will hit your limit quickly if you use Opus as a default.
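The arithmetic behind that warning is worth making explicit. A rough back-of-the-envelope sketch follows; the weekly budget figure is a placeholder assumption (Antigravity does not publish exact free-tier credit numbers), while the 635-credit session cost and the ~4x multiplier come from the reports above:

```python
# Back-of-the-envelope: how many complex sessions fit in a weekly budget?
WEEKLY_CREDITS = 2000          # HYPOTHETICAL free-tier weekly budget (not published)
OPUS_SESSION_COST = 635        # reported cost of one complex Opus Thinking session
GEMINI_MULTIPLIER = 4          # Opus reportedly burns credits ~4x faster than Gemini

# A comparable Gemini session would cost roughly a quarter as much.
gemini_session_cost = OPUS_SESSION_COST // GEMINI_MULTIPLIER

opus_sessions = WEEKLY_CREDITS // OPUS_SESSION_COST
gemini_sessions = WEEKLY_CREDITS // gemini_session_cost

print(f"Complex Opus sessions per week:   {opus_sessions}")    # 3
print(f"Comparable Gemini sessions/week:  {gemini_sessions}")  # 12
```

Swap in whatever your actual weekly cap turns out to be; the point is the ratio, not the absolute numbers — a handful of Opus sessions can exhaust a budget that would carry you through a dozen Gemini sessions.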
There is also a reported identity bug: when asked directly, Claude Opus in Antigravity sometimes identifies itself as "Claude Sonnet 4." The underlying model is still Opus — but it's a known quirk worth being aware of.
Use Claude Opus 4.6 when: Sonnet and Gemini Pro have both failed the task, or when you're working on something genuinely complex where the extra reasoning quality justifies the credit cost.
## GPT-OSS 120B: The Wildcard
GPT-OSS 120B is OpenAI's open-weight model, and its presence in Antigravity is more interesting as a philosophical choice than a practical one. By offering a competitor's open-weight model alongside its own, Google is making a statement that model optionality is itself a feature.
In practice, GPT-OSS 120B has a smaller context window (400K vs 1M) and broadly sits below Gemini 3.1 Pro and Claude Opus on coding benchmarks. The main reason to reach for it is familiarity — if your team has built intuitions around GPT-style outputs and you want consistency with outputs you generate elsewhere.
Use GPT-OSS 120B when: you specifically want GPT-style output for consistency with other tooling, or you're running comparisons across model families.
## Rate Limits: The Real Constraint
The free tier's rate limit situation changed in March 2026 and caused significant user frustration. The original 5-hour refresh window for all models was replaced with a weekly refresh for Gemini Pro and Claude models, while Gemini Flash retained the 5-hour cycle.
The practical impact:
- Flash — still relatively liberal, refreshes every ~5 hours
- Gemini Pro / Sonnet / Opus — weekly refresh on the free tier, hard cap of approximately 20 daily requests
- Google AI Pro subscribers — higher limits, weekly window retained but with more headroom
- Google AI Ultra subscribers — most frequent refresh cycle
The advice this produces is clear: treat Flash as your default for anything it can handle, and save the Pro/Claude models for tasks that genuinely need them. Burning Opus credits on boilerplate generation will lock you out of the models when you need them.
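That advice can be sketched as a simple routing function. This is an illustrative sketch only — the task categories, the model name strings, and the 700-credit threshold for Opus are all assumptions of mine, not Antigravity identifiers or documented limits:

```python
from enum import Enum

class Task(Enum):
    BOILERPLATE = "boilerplate"   # repetitive, high-volume work
    REFACTOR = "refactor"         # simple, well-defined changes
    AGENTIC = "agentic"           # default multi-step sessions
    HARD_DEBUG = "hard_debug"     # non-obvious root causes, deep constraints

def pick_model(task: Task, credits_left: int) -> str:
    """Route a task to a model following the guidance above.

    Model names are illustrative labels, not real API identifiers."""
    if task is Task.BOILERPLATE:
        return "gemini-3-flash"            # ~5-hour refresh, cheapest
    if task is Task.HARD_DEBUG and credits_left > 700:
        return "claude-opus-4.6-thinking"  # only when the budget allows
    if task is Task.REFACTOR:
        return "gemini-3.1-pro-low"        # faster, lighter variant
    return "gemini-3.1-pro-high"           # default workhorse

print(pick_model(Task.BOILERPLATE, 1000))  # gemini-3-flash
```

Note the ordering: Flash is checked first so that volume work never touches the weekly-capped models, and Opus is gated behind a remaining-credit check so a hard problem late in the week falls back to the Pro default rather than locking you out.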
## Recommended Defaults
If you are on the free tier and want to stay productive without hitting limits:
- Default: Gemini 3.1 Pro (High) for agentic tasks
- Volume work: Gemini 3 Flash for anything repetitive
- Typed languages / style-sensitive code: Claude Sonnet 4.6
- Hard problems only: Claude Opus 4.6 Thinking — use sparingly
- Avoid: GPT-OSS 120B unless you have a specific reason
The model that best uses Antigravity's agentic architecture is Gemini 3.1 Pro — it's what the platform is built around, it has the most stable 1M context window, and it won't burn through your weekly quota in a single afternoon session.
Sources:
- Google Antigravity Models Docs
- Build with Google Antigravity — Google Developers Blog
- Gemini 3.1 Pro vs Claude Opus 4.6 vs GPT-5.2 — NxCode
- Google Antigravity Review 2026 — AI Tool Analysis
- Claude Opus 4.7 missing from Antigravity — Google AI Developers Forum
- Antigravity quota bug report — Google AI Developers Forum