Magic

Frontier Code Models for Software Engineering

Magic AI | Closed source | Since 2024

Research lab building frontier code models with a custom LTM (Long-Term Memory) architecture purpose-built for 100M+ token context windows. Backed by $515M from Sequoia, Jane Street, and CapitalG.

+ Pros

Ultra-long 100M+ token context window enables whole-repository understanding unmatched in the industry
Proprietary LTM architecture reduces attention compute cost by ~1,000x compared to standard transformer attention
Massive $515M funding and Google Cloud GB200 partnership ensure long-term compute access and infrastructure
Full custom training and inference stack (no PyTorch autograd) built from scratch enables rapid architectural iteration
Published AGI Readiness Policy shows commitment to responsible development rare among coding AI labs

− Cons

No publicly available product, API, CLI, or IDE extension — research preview only with invite access
No public pricing information — enterprise-only access model limits adoption and evaluation
Tiny team (~23 people) creates significant execution risk for an ambitious research and product roadmap
LTM-2-mini code generation quality still trails frontier models — architectural promise not yet matched by output
No community edition, self-hosted option, or open-source components limit transparency and independent evaluation

Pricing

Enterprise Access

Contact for pricing

Custom enterprise deployment with dedicated compute, SLAs, and direct research collaboration

Introduction

Magic is a frontier research lab building one of the most architecturally ambitious code generation systems to date. Founded by Eric Steinberger and headquartered in San Francisco, Magic has raised $515 million from an investor roster that includes Sequoia Capital, Jane Street, CapitalG, Elad Gil, Nat Friedman, Daniel Gross, Eric Schmidt, and Atlassian. The thesis is simple: the path to automated software engineering runs through far longer context windows, and the standard Transformer attention mechanism cannot get us there economically.

Magic’s answer is LTM — Long-Term Memory — a custom sequence-dimension algorithm that replaces standard attention with something roughly 1,000x cheaper at 100-million-token scale. The company has already trained LTM-2-mini, a proof-of-concept model that demonstrates the architecture works, and is now training LTM-2 on a Google Cloud supercomputer built from NVIDIA GB200 NVL72 clusters.

If the roadmap delivers, Magic will offer the first model capable of understanding an entire enterprise codebase — millions of lines — in a single context window.

The LTM Architecture

Standard Transformer attention scales quadratically with sequence length. At 100,000 tokens the cost is high; at 100 million it is computationally prohibitive for all but the wealthiest organizations. Magic’s LTM architecture sidesteps this with a custom sequence-dimension algorithm that achieves linear or near-linear scaling. The company has not published full architectural details — this is a closed research lab, not an open-source project — but the core claim is an attention alternative designed from the ground up for software-length sequences.
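The scaling argument can be made concrete with a little arithmetic. The sketch below is only an illustration: it counts pairwise token interactions for full self-attention and compares that with a hypothetical mechanism doing constant work per token. It ignores constants such as head count and hidden size, and it does not describe how LTM works, since those details are unpublished. The function names are made up for this example.

```python
def quadratic_cost(n_tokens: int) -> int:
    """Pairwise score count for full self-attention: every token attends to every token."""
    return n_tokens * n_tokens


def linear_cost(n_tokens: int, per_token_work: int = 1) -> int:
    """Cost of a hypothetical mechanism that does constant work per token."""
    return n_tokens * per_token_work


short, long = 100_000, 100_000_000  # the two context lengths named in the text

# How much more work the longer context requires under each scaling law.
growth_quadratic = quadratic_cost(long) // quadratic_cost(short)
growth_linear = linear_cost(long) // linear_cost(short)

print(growth_quadratic)  # 1000000
print(growth_linear)     # 1000
```

Going from 100,000 to 100 million tokens multiplies the sequence length by 1,000, but multiplies quadratic attention work by a million. That gap is what any linear or near-linear alternative is trying to close.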

The implications for code understanding are significant. Today’s best coding assistants operate on fragments: a file here, a diff there, a retrieval-augmented chunk when context exceeds the window. Magic’s model ingests everything — every file, every dependency, every historical commit — in a single forward pass. There is no retrieval step, no sliding window, no “forgetting” of code written ten thousand lines away. For enterprise teams maintaining multi-million-line monorepos, this is a fundamentally different capability.
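For a rough sense of whether a given repository would fit in such a window, a back-of-the-envelope estimate is enough. The sketch below assumes about four characters per token, a common rule of thumb rather than the behavior of any real tokenizer, and the helper names and file-extension list are hypothetical.

```python
import os

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_repo_tokens(root: str, extensions=(".py", ".go", ".c", ".h")) -> int:
    """Walk a directory tree and estimate the token count of matching source files."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                # Ignore undecodable bytes so one odd file does not abort the scan.
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    total_chars += len(f.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(root: str, context_tokens: int = 100_000_000) -> bool:
    """Return True if the estimated token count fits within the context window."""
    return estimate_repo_tokens(root) <= context_tokens
```

Calling `fits_in_context("path/to/repo")` gives a quick yes or no. A real evaluation would use the model's own tokenizer and would also count commit history and dependencies if those are meant to be in context.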

The 100-Million-Token Context Window

To put 100 million tokens in perspective: that is approximately 10 million lines of code, or roughly 750 novels’ worth of text. That is enough to hold a multi-million-line enterprise monorepo in one piece, although the very largest open-source codebases, such as the Linux kernel or Chromium at tens of millions of lines each, would exceed the 10-million-line estimate. No other production model (not GPT-4, not Claude 3.5, not Gemini) offers anything close.
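The conversion ratios implied by those figures are easy to check. The short calculation below uses only the numbers quoted above; the resulting tokens-per-line and tokens-per-novel values are implied averages, not measurements.

```python
CONTEXT_TOKENS = 100_000_000  # context window size quoted in the text
LINES_OF_CODE = 10_000_000    # "approximately 10 million lines of code"
NOVELS = 750                  # "roughly 750 novels"

tokens_per_line = CONTEXT_TOKENS / LINES_OF_CODE
tokens_per_novel = CONTEXT_TOKENS / NOVELS

print(tokens_per_line)          # 10.0
print(round(tokens_per_novel))  # 133333
```

So the comparison assumes about 10 tokens per line of code and about 133,000 tokens per novel. Both are plausible averages, but real codebases and books vary widely.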

Magic demonstrated this capability publicly with LTM-2-mini, showing the model generate a fully functional GUI framework from scratch, entirely within a single 100M-token context. The model understood the entire codebase it was generating and maintained coherence across hundreds of thousands of generated tokens. For anyone who has watched a model lose track of a function signature 4,000 tokens in, the demonstration was arresting.

The HashHop Benchmark

Standard long-context benchmarks like Needle-in-a-Haystack (NIAH) have been gamed by many frontier labs: models learn to spot the artificial “needle” pattern. Magic created HashHop, a custom benchmark that replaces semantic hints with hash lookups spread across long contexts. Because the hashes carry no meaning, the model has no semantic shortcuts and must actually store and retrieve each pair, which is how HashHop aims to avoid the benchmark hacking that plagues NIAH. Magic claims LTM-2-mini achieves near-perfect accuracy on HashHop at 100M tokens while standard attention models degrade to random at far shorter lengths. The benchmark is available on GitHub.
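A hash-lookup task of this kind can be sketched in a few lines. The code below is an illustration of the general idea only, not Magic's released benchmark code: the `a = b` prompt format, the function names, and the use of random 64-bit hex strings as stand-ins for hashes are all assumptions made for this example.

```python
import random


def make_hash(rng: random.Random) -> str:
    """Return a random 16-character hex string standing in for a hash."""
    return f"{rng.getrandbits(64):016x}"


def build_task(n_chains: int, hops: int, seed: int = 0):
    """Build a shuffled list of hash pairs plus one multi-hop question."""
    rng = random.Random(seed)
    chains, pairs = [], []
    for _ in range(n_chains):
        # A chain h0 -> h1 -> ... -> h_hops contributes `hops` adjacent pairs.
        chain = [make_hash(rng) for _ in range(hops + 1)]
        chains.append(chain)
        pairs.extend(zip(chain, chain[1:]))
    rng.shuffle(pairs)  # scatter the links so position gives no hint
    prompt = "\n".join(f"{a} = {b}" for a, b in pairs)
    target = rng.choice(chains)
    return prompt, target[0], target[-1]  # context, question, expected answer


def resolve(prompt: str, start: str, hops: int) -> str:
    """Reference solver: follow the chain with exact dictionary lookups."""
    table = dict(line.split(" = ") for line in prompt.splitlines())
    current = start
    for _ in range(hops):
        current = table[current]
    return current


prompt, question, answer = build_task(n_chains=50, hops=3)
print(resolve(prompt, question, hops=3) == answer)  # True
```

A model under test would receive `prompt` and `question` and have to produce `answer` without the lookup table. Since the strings are random, there is nothing to pattern-match on except the pairs themselves.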

The Google Cloud Partnership

In August 2024, Magic announced a partnership with Google Cloud to build what it calls the Magic-G5: a training supercomputer built from NVIDIA GB200 NVL72 racks. This followed the earlier Magic-G4, an H100-based cluster. The GB200 NVL72 is NVIDIA’s most advanced datacenter platform, pairing Grace CPUs with Blackwell GPUs in a 72-GPU liquid-cooled chassis. Google Cloud’s involvement is significant — it signals that Google Cloud sees enough promise in Magic’s architecture to commit production-grade capacity.

Magic has also stated that it writes its entire training and inference stack from scratch — no PyTorch autograd, no off-the-shelf CUDA libraries. Combined with the custom hardware partnership, this gives Magic an unusual degree of control over its cost curve: it can optimize across the full stack from algorithm to silicon.

Funding and Backing

Magic’s $515 million in total funding is extraordinary for a 23-person company. The round structures reflect deep conviction from investors who have backed some of the most consequential technology companies of the last two decades. Nat Friedman (former GitHub CEO) and Daniel Gross (former YC AI partner) led early rounds and remain closely involved. Jane Street, the quantitative trading giant known for rigorous technical diligence, participated — a strong signal that the architecture passes muster with some of the world’s best technical evaluators.

This funding buys Magic time and compute. Training frontier models requires both, and Magic has secured enough runway to iterate through multiple architectural generations without the quarterly-pressure constraints that force less-funded labs into premature productization.

Safety Approach

Rare among code-generation labs, Magic has published an AGI Readiness Policy — a public document outlining its approach to safety as capabilities scale. The policy addresses cybersecurity, model evaluation, responsible release, and governance. For a company building toward automated software engineering, where a sufficiently capable model could write and deploy code autonomously, these are not theoretical concerns. Magic’s willingness to publish safety commitments before it has a product on the market is a differentiating signal in a field where safety discourse often follows deployment rather than preceding it.

Who Is It For?

Today, almost no one — Magic is in closed research preview with no public API, no pricing, and no self-serve access. The company’s focus is enterprise customers with multi-million-line codebases and the budget to commission custom deployments. If LTM-2 delivers on its promise, the addressable market expands dramatically: any organization that struggles with codebase-level reasoning — every organization with a large codebase — becomes a potential customer.

For individual developers and small teams, Magic is not yet a practical tool. There is no CLI to download, no VS Code extension to install, no playground to try. But the architecture and the trajectory are worth watching closely, because Magic is solving a problem — context scale — that every other coding agent vendor is either ignoring or papering over with retrieval hacks.

Conclusion

Magic represents one of the most technically ambitious bets in the AI coding space. The LTM architecture, if it scales as claimed, could unlock a qualitative shift in how models interact with code — from file-level autocomplete to repository-level understanding. The $515 million war chest, Google Cloud partnership, and custom infrastructure give the team the resources to pursue that vision. The small team, closed research preview, and absence of a public product mean the vision remains unproven at scale.

For enterprise teams evaluating long-term coding platform strategy, Magic is a name worth knowing — even if it is not yet a tool worth using. If LTM-2 ships and delivers, the gap between Magic and every other coding agent may be measured not in features but in orders of magnitude of context.
