In Lakecode Enterprise, all LLM requests are routed through Databricks AI Gateway. This gives your platform team centralized control over model access, rate limits, logging, and cost tracking — without requiring changes to lakecode workflows.
Create an AI Gateway route in your Databricks workspace that points to your preferred model provider:
# Create a gateway route for Claude (via Anthropic)
databricks serving-endpoints create --json '{
  "name": "ai-gateway",
  "config": {
    "served_entities": [{
      "name": "claude-sonnet",
      "external_model": {
        "name": "claude-sonnet-4-20250514",
        "provider": "anthropic",
        "anthropic_config": {
          "anthropic_api_key": "{{secrets/lakecode/anthropic-key}}"
        }
      }
    }]
  },
  "ai_gateway": {
    "usage_tracking_config": {
      "enabled": true
    },
    "rate_limits": [{
      "key": "user",
      "renewal_period": "minute",
      "calls": 20
    }]
  }
}'
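Once the route exists, it can be smoke-tested directly before wiring up lakecode. Databricks serving endpoints are invoked by POSTing to the endpoint's invocations URL; the sketch below only builds the URL and a chat-style JSON body (the workspace host is a placeholder, and the exact payload schema depends on the endpoint's configuration):

```python
# Build a chat-style invocation request for the gateway endpoint.
# WORKSPACE_HOST is a placeholder; substitute your workspace URL.
import json

WORKSPACE_HOST = "https://example.cloud.databricks.com"  # placeholder

def build_invocation(endpoint_name: str, prompt: str) -> tuple[str, str]:
    """Return the invocations URL and a JSON body for a chat request."""
    url = f"{WORKSPACE_HOST}/serving-endpoints/{endpoint_name}/invocations"
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    })
    return url, body

url, body = build_invocation("ai-gateway", "SELECT 1 -- smoke test")
```

Sending the request with your HTTP client of choice (authenticated with a workspace token) confirms the gateway route resolves before any lakecode configuration is touched.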
Then point lakecode to the gateway in your AssistantSpec:
models:
  gateway_endpoint: /serving-endpoints/ai-gateway
  planner: claude-sonnet-4-20250514
  sql: claude-sonnet-4-20250514
  summarizer: claude-haiku-4-5-20251001
Databricks AI Gateway supports multiple model providers; lakecode works with any provider that offers tool-use-capable models.
Rate limits can be configured at multiple levels:
# In your AI Gateway configuration
"rate_limits": [
{
"key": "user",
"renewal_period": "minute",
"calls": 20
},
{
"key": "endpoint",
"renewal_period": "minute",
"calls": 200
}
]
Additionally, per-user hourly limits can be set in AssistantSpec:
models:
  rate_limit: 100  # max requests per user per hour
When a user hits a rate limit, their workflow is paused (not terminated) and resumes automatically when the limit window resets.
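The pause-and-resume behavior can be pictured as a fixed-window counter per user: requests inside the current window are counted, and a blocked caller waits until the window resets rather than failing. A hypothetical sketch of that pattern (not lakecode's actual implementation):

```python
class FixedWindowLimiter:
    """Per-user fixed-window limiter: `calls` requests per `period` seconds."""

    def __init__(self, calls: int, period: float):
        self.calls = calls
        self.period = period
        self.windows = {}  # user -> (window_start, count)

    def seconds_until_allowed(self, user: str, now: float) -> float:
        """Return 0 if the request may proceed now, else seconds to wait."""
        start, count = self.windows.get(user, (now, 0))
        if now - start >= self.period:       # window expired: reset it
            start, count = now, 0
        if count < self.calls:
            self.windows[user] = (start, count + 1)
            return 0.0
        return self.period - (now - start)   # pause until the window resets

limiter = FixedWindowLimiter(calls=2, period=60.0)
```

A workflow that receives a nonzero wait time sleeps for that duration and retries, which matches the pause-not-terminate behavior described above.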
With usage tracking enabled, AI Gateway logs every request to Unity Catalog. You can query this data for cost analysis and usage patterns:
-- Total tokens by user this week
SELECT
  request_metadata.user,
  SUM(usage.prompt_tokens) AS prompt_tokens,
  SUM(usage.completion_tokens) AS completion_tokens,
  SUM(usage.total_tokens) AS total_tokens
FROM lakecode_state.gateway_logs
WHERE timestamp > current_date() - INTERVAL 7 DAYS
GROUP BY request_metadata.user
ORDER BY total_tokens DESC
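The same aggregation can be prototyped in plain Python against exported log rows, which is handy for a quick sanity check before the query is scheduled. The row shape below mirrors the columns used in the query; the values are made up for illustration:

```python
from collections import defaultdict

# Each row mirrors the gateway_logs columns used in the query above.
rows = [
    {"user": "ana", "prompt_tokens": 1200, "completion_tokens": 300},
    {"user": "ben", "prompt_tokens": 800,  "completion_tokens": 150},
    {"user": "ana", "prompt_tokens": 500,  "completion_tokens": 100},
]

def tokens_by_user(rows):
    """Sum prompt/completion/total tokens per user, highest total first."""
    totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})
    for r in rows:
        t = totals[r["user"]]
        t["prompt_tokens"] += r["prompt_tokens"]
        t["completion_tokens"] += r["completion_tokens"]
    report = [
        {"user": u, **t,
         "total_tokens": t["prompt_tokens"] + t["completion_tokens"]}
        for u, t in totals.items()
    ]
    return sorted(report, key=lambda r: r["total_tokens"], reverse=True)

report = tokens_by_user(rows)
```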
Configure fallback models in case your primary model endpoint is unavailable:
models:
  gateway_endpoint: /serving-endpoints/ai-gateway
  planner: claude-sonnet-4-20250514
  planner_fallback: gpt-4o
  sql: claude-sonnet-4-20250514
  sql_fallback: gpt-4o
  summarizer: claude-haiku-4-5-20251001
  summarizer_fallback: gpt-4o-mini
Fallback is automatic — if the primary model returns a 5xx or times out after 30 seconds, the request is retried against the fallback model.
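That retry behavior amounts to a simple wrapper: call the primary model, and on a server error or timeout hand the same request to the fallback. A hypothetical illustration of the pattern, not lakecode's internals:

```python
class ServerError(Exception):
    """Stands in for a 5xx response or a request timeout."""

def call_with_fallback(request, primary, fallback):
    """Try the primary model; on ServerError, retry against the fallback."""
    try:
        return primary(request)
    except ServerError:
        return fallback(request)

# Stubs for illustration: the primary is "down", the fallback answers.
def primary(req):
    raise ServerError("503 from primary endpoint")

def fallback(req):
    return f"fallback handled: {req}"

result = call_with_fallback("plan query", primary, fallback)
```

Because the wrapper re-raises nothing on fallback success, callers see a normal response; only a failure of both models surfaces as an error.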