Pricing — lakecode

The measured cost model

We publish the economics, not just the price

Every number below is measured and dated. Cost model 2026-05-31 on the benchmark corpus (Databricks Python SDK: 6.3 MB of source → 29,931 claims); query costs from Stage 1, May 2026, Claude Sonnet 4.6 on both sides.

$12.3

one-time ingestion of the 6.3 MB benchmark corpus (≈ $1.9 / MB of source)

$0.033

per grounded question — vs $0.113 for the grep-everything baseline

~150

queries until the substrate has paid for itself on that corpus

crossover_queries ≈ ingestion_cost ÷ $0.08

Ingestion scales with codebase size, the per-query saving is forever — a 10× bigger repo ingests for ~$120 and breaks even around 1,500 queries.

Ingestion is per-codebase, one-time

~$1.9 per MB of source, ~$0.0004 per claim. The bill is dominated (79%) by the extraction model writing structured claims — extraction runs on Claude Haiku via Databricks at $1/$5 per 1M tokens; embeddings are a rounding error.

Queries are nearly free

$0.033 mean per grounded question against $0.113 for the agentic baseline — same model both sides, so the gap is the compiled context. Absolute costs, not just ratios: at this scale your spend is dominated by what you don't re-explore.

The substrate is shared — that's the pricing advantage

Ingestion is per-codebase, not per-user. A team of 20 sharing one substrate pays ~$0.60 each, one-time. After that, one engineer's findings ground everyone's sessions — the asset compounds while the marginal cost stays flat.

Ongoing cost is churn, not queries

The substrate refreshes incrementally on commit: only changed entities re-ingest (~$0.0015 per entity). Maintenance is cents per week on a normal commit cadence, not a recurring full re-ingest.

Caveats, attached on purpose: one corpus measured; extraction output sampled from n=6 production calls (±15%); the per-question figure is the Sonnet-answer pipeline. Method docs ship with the beta.

Three ways in — all of them "talk to us" during beta

Final tiers land with general availability, priced on the cost model above. No invented numbers in the meantime.

Individual

Beta

The hosted console + CLI, with a personal substrate that compounds as you work.

✓ lakecode CLI — a full coding agent
✓ Auto-prime + write-back on your repos
✓ Hosted console: claims, provenance, sessions
✓ Bring your own Anthropic key, or capped managed key

Request access

The point of the product

Team

One shared org substrate. The second engineer never rediscovers the first one's fix.

✓ Shared substrate across the whole team
✓ Promotion ladder: local → workspace → org
✓ Review queue — nothing becomes org truth silently
✓ Org keys + member management
✓ Weekly usage & cost report during beta

Request beta access

Enterprise · Databricks

The Databricks edition: grounded in your lakehouse, governed in Unity Catalog.

✓ In-workspace Databricks App
✓ Ingests UC metadata, lineage, notebooks
✓ Agent knowledge projected into UC lineage & audit
✓ White-glove onboarding with your platform team

Talk to us

What design partners actually get

Onboarding done with you

We ingest your repos and docs, and write a golden-question acceptance set about your systems — with a pass bar we agree on before you commit to anything.

The flywheel demo on your code

Cold fail → teach once → fresh-session pass, run live on your own repo during onboarding. The product has to sell itself on your code, not ours.

Honest terms

Free or flat-fee, a one-page agreement, weekly usage reports, a named feedback channel — and your substrate is exportable at any time, deleted on exit.

Frequently asked questions

How does pricing work during the beta?

Free or a flat fee, agreed up front in a one-page beta agreement. Token usage is measured per LLM call and we send you a weekly usage and cost report — so when metered pricing arrives, you'll already know exactly what your workload costs. No credit card, no self-serve billing.

What will pricing look like after the beta?

It will follow the cost structure above: a per-codebase setup (ingestion + incremental maintenance) plus cheaply metered queries. The substrate's economics are per-codebase, not per-seat — pricing will be too. We publish the cost model so you can sanity-check the price against it.

Whose API key pays for the LLM calls?

Either works: bring your own Anthropic key, or use a managed key from us with a spend cap. It's decided up front in the beta agreement because it changes cost exposure on both sides.

What happens to our code and data?

Ingestion extracts structured claims from your code and docs into your org's substrate — in the beta, a dedicated database for your org, physically separate from other tenants. The substrate is your data: exportable at any time, deleted on exit. This is written into the beta agreement, not just the website.

Which models does lakecode use?

The agent runs on claude-sonnet-4-6 by default — the same model every published benchmark is pinned to. Ingestion uses a cheaper extraction model (Claude Haiku via Databricks) because writing claims is the cost driver, not reading code.

Why ~150 queries to break even?

Ingestion of the benchmark corpus cost $12.3 one-time, and a grounded question costs about $0.08 less than re-deriving the answer from scratch ($0.033 vs $0.113). $12.3 ÷ $0.08 ≈ 150 queries. Bigger codebases ingest for more and break even later — but the discount on every query after that is permanent, and it's shared by everyone on the substrate.

Pay for the codebase, not the seat