In Lakecode Enterprise, all LLM requests are routed through Databricks AI Gateway. This gives your platform team centralized control over model access, rate limits, logging, and cost tracking — without requiring changes to lakecode workflows.
Create an AI Gateway route in your Databricks workspace that points to your preferred model provider:
# Create a gateway route for Claude (via Anthropic)
databricks serving-endpoints create --json '{
  "name": "ai-gateway",
  "config": {
    "served_entities": [{
      "name": "claude-sonnet",
      "external_model": {
        "name": "claude-sonnet-4-20250514",
        "provider": "anthropic",
        "anthropic_config": {
          "anthropic_api_key": "{{secrets/lakecode/anthropic-key}}"
        }
      }
    }]
  },
  "ai_gateway": {
    "usage_tracking_config": {
      "enabled": true
    },
    "rate_limits": [{
      "key": "user",
      "renewal_period": "minute",
      "calls": 20
    }]
  }
}'
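Once the route exists, it can be smoke-tested directly before wiring up lakecode. Databricks serving endpoints are invoked by POSTing to the endpoint's invocations URL; the sketch below only builds the URL and a chat-style JSON body (the workspace host is a placeholder, and the exact payload schema depends on the endpoint's configuration):

```python
# Build a chat-style invocation request for the gateway endpoint.
# WORKSPACE_HOST is a placeholder; substitute your workspace URL.
import json

WORKSPACE_HOST = "https://example.cloud.databricks.com"  # placeholder

def build_invocation(endpoint_name: str, prompt: str) -> tuple[str, str]:
    """Return the invocations URL and a JSON body for a chat request."""
    url = f"{WORKSPACE_HOST}/serving-endpoints/{endpoint_name}/invocations"
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    })
    return url, body

url, body = build_invocation("ai-gateway", "SELECT 1 -- smoke test")
```

Sending the request with your HTTP client of choice (authenticated with a workspace token) confirms the gateway route resolves before any lakecode configuration is touched.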
Then point lakecode to the gateway in your AssistantSpec:
models:
  gateway_endpoint: /serving-endpoints/ai-gateway
  planner: claude-sonnet-4-20250514
  sql: claude-sonnet-4-20250514
  summarizer: claude-haiku-4-5-20251001
Databricks AI Gateway supports multiple model providers; lakecode works with any provider that offers tool-use-capable models.
Rate limits can be configured at multiple levels:
# In your AI Gateway configuration
"rate_limits": [
{
"key": "user",
"renewal_period": "minute",
"calls": 20
},
{
"key": "endpoint",
"renewal_period": "minute",
"calls": 200
}
]
Additionally, per-user hourly limits can be set in AssistantSpec:
models:
  rate_limit: 100  # max requests per user per hour
When a user hits a rate limit, their workflow is paused (not terminated) and resumes automatically when the limit window resets.
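The pause-and-resume behavior can be pictured as a fixed-window counter per user: requests inside the current window are counted, and a blocked caller waits until the window resets rather than failing. A hypothetical sketch of that pattern (not lakecode's actual implementation):

```python
class FixedWindowLimiter:
    """Per-user fixed-window limiter: `calls` requests per `period` seconds."""

    def __init__(self, calls: int, period: float):
        self.calls = calls
        self.period = period
        self.windows = {}  # user -> (window_start, count)

    def seconds_until_allowed(self, user: str, now: float) -> float:
        """Return 0 if the request may proceed now, else seconds to wait."""
        start, count = self.windows.get(user, (now, 0))
        if now - start >= self.period:       # window expired: reset it
            start, count = now, 0
        if count < self.calls:
            self.windows[user] = (start, count + 1)
            return 0.0
        return self.period - (now - start)   # pause until the window resets

limiter = FixedWindowLimiter(calls=2, period=60.0)
```

A workflow that receives a nonzero wait time sleeps for that duration and retries, which matches the pause-not-terminate behavior described above.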
With usage tracking enabled, AI Gateway logs every request to Unity Catalog. You can query this data for cost analysis and usage patterns:
-- Total tokens by user this week
SELECT
  request_metadata.user,
  SUM(usage.prompt_tokens) AS prompt_tokens,
  SUM(usage.completion_tokens) AS completion_tokens,
  SUM(usage.total_tokens) AS total_tokens
FROM lakecode_state.gateway_logs
WHERE timestamp > current_date() - INTERVAL 7 DAYS
GROUP BY request_metadata.user
ORDER BY total_tokens DESC
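The same aggregation can be prototyped in plain Python against exported log rows, which is handy for a quick sanity check before the query is scheduled. The row shape below mirrors the columns used in the query; the values are made up for illustration:

```python
from collections import defaultdict

# Each row mirrors the gateway_logs columns used in the query above.
rows = [
    {"user": "ana", "prompt_tokens": 1200, "completion_tokens": 300},
    {"user": "ben", "prompt_tokens": 800,  "completion_tokens": 150},
    {"user": "ana", "prompt_tokens": 500,  "completion_tokens": 100},
]

def tokens_by_user(rows):
    """Sum prompt/completion/total tokens per user, highest total first."""
    totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})
    for r in rows:
        t = totals[r["user"]]
        t["prompt_tokens"] += r["prompt_tokens"]
        t["completion_tokens"] += r["completion_tokens"]
    report = [
        {"user": u, **t,
         "total_tokens": t["prompt_tokens"] + t["completion_tokens"]}
        for u, t in totals.items()
    ]
    return sorted(report, key=lambda r: r["total_tokens"], reverse=True)

report = tokens_by_user(rows)
```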
Configure fallback models in case your primary model endpoint is unavailable:
models:
  gateway_endpoint: /serving-endpoints/ai-gateway
  planner: claude-sonnet-4-20250514
  planner_fallback: gpt-4o
  sql: claude-sonnet-4-20250514
  sql_fallback: gpt-4o
  summarizer: claude-haiku-4-5-20251001
  summarizer_fallback: gpt-4o-mini
Fallback is automatic — if the primary model returns a 5xx or times out after 30 seconds, the request is retried against the fallback model.
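That retry behavior amounts to a simple wrapper: call the primary model, and on a server error or timeout hand the same request to the fallback. A hypothetical illustration of the pattern, not lakecode's internals:

```python
class ServerError(Exception):
    """Stands in for a 5xx response or a request timeout."""

def call_with_fallback(request, primary, fallback):
    """Try the primary model; on ServerError, retry against the fallback."""
    try:
        return primary(request)
    except ServerError:
        return fallback(request)

# Stubs for illustration: the primary is "down", the fallback answers.
def primary(req):
    raise ServerError("503 from primary endpoint")

def fallback(req):
    return f"fallback handled: {req}"

result = call_with_fallback("plan query", primary, fallback)
```

Because the wrapper re-raises nothing on fallback success, callers see a normal response; only a failure of both models surfaces as an error.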