The Hidden Costs of AI: Preventing Token Shock in AWS Bedrock
GenAI is cheap on Day 1 and brutal on Day 30. Implement quotas and cost governance using API Gateway throttling, per-tenant budgets, and Bedrock usage logs.
The bill that wakes the CFO up is never the proof-of-concept. It's the Monday after marketing turns the feature on for every customer and one enterprise tenant runs a 2-million-token batch job through your Claude endpoint.
Token Shock is preventable, but only if you treat model invocation like any other paid API: budgeted, throttled, and attributed.
Three layers of control
1. Per-tenant throttling at the edge
API Gateway usage plans give you requestsPerSecond and burstLimit per API key. Issue one key per tenant. Cap large tenants at a higher ceiling, free-tier users at a low one. This bounds requests, not tokens — but it caps the blast radius of a runaway client.
2. Per-call token caps in the Lambda
Every call to InvokeModel must include max_tokens. Don't trust the client to send a sensible one — overwrite it in the Lambda based on the tenant tier:
# Server-side cap per tier; any max_tokens sent by the client is overwritten.
MAX_TOKENS_BY_TIER = {"free": 512, "pro": 4096, "enterprise": 16000}
body["max_tokens"] = MAX_TOKENS_BY_TIER[tenant_tier]
3. Daily spend budget per tenant
Stream Bedrock invocation logs to CloudWatch, parse inputTokenCount and outputTokenCount, multiply each by the model's published price, and write the result to a DynamoDB table keyed by (tenant, date). When a tenant crosses 80% of their daily budget, the Lambda starts returning 429 Quota exceeded before invoking the model.
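The metering and the pre-invocation check described above, as an added example that is not code from the original post. The prices, budgets, tenant names, the "tenant" field on each record, and the 403 for an unknown tenant are assumptions, and a dict stands in for the DynamoDB table so the sketch runs offline:

```python
from collections import defaultdict
from decimal import Decimal

# Placeholder prices in USD per 1,000 tokens (assumption, not a real price list).
PRICE_PER_1K = {"input": Decimal("0.003"), "output": Decimal("0.015")}
# Hypothetical per-tenant daily budgets in USD.
DAILY_BUDGET_USD = {"acme": Decimal("1.00"), "globex": Decimal("50.00")}
CUTOFF = Decimal("0.80")  # refuse new calls at 80% of the daily budget

# In-memory stand-in for the DynamoDB table keyed by (tenant, date).
spend = defaultdict(Decimal)

def record_invocation(rec):
    """Add one log record's cost to its (tenant, date) total; return the total."""
    cost = (rec["input"]["inputTokenCount"] * PRICE_PER_1K["input"]
            + rec["output"]["outputTokenCount"] * PRICE_PER_1K["output"]) / 1000
    key = (rec["tenant"], rec["timestamp"][:10])  # YYYY-MM-DD from ISO 8601
    spend[key] += cost
    return spend[key]

def gate(tenant, date):
    """Decide, before the model is invoked, what the Lambda should return."""
    budget = DAILY_BUDGET_USD.get(tenant)
    if budget is None:
        return 403, "Unknown tenant"  # assumption: fail closed when no budget exists
    if spend.get((tenant, date), Decimal(0)) >= budget * CUTOFF:
        return 429, "Quota exceeded"
    return 200, "OK"

# Constructed sample records; the "tenant" field is an assumption (for example,
# joined from the API key the Lambda saw), not part of Bedrock's own log output.
SAMPLE_LOGS = [
    {"tenant": "acme", "timestamp": "2024-05-06T09:00:00Z",
     "input": {"inputTokenCount": 100000}, "output": {"outputTokenCount": 20000}},
    {"tenant": "acme", "timestamp": "2024-05-06T09:05:00Z",
     "input": {"inputTokenCount": 50000}, "output": {"outputTokenCount": 5000}},
    {"tenant": "globex", "timestamp": "2024-05-06T09:06:00Z",
     "input": {"inputTokenCount": 2000}, "output": {"outputTokenCount": 800}},
]

if __name__ == "__main__":
    for rec in SAMPLE_LOGS:
        total = record_invocation(rec)
        day = rec["timestamp"][:10]
        print(rec["tenant"], day, total, gate(rec["tenant"], day))
```

Decimal keeps the running totals exact, which matters when thousands of small per-call costs are summed against a hard cutoff.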
What "cost governance" actually looks like
- A per-tenant dashboard showing tokens, dollars, and top prompts by cost — visible to account managers, not just engineers.
- An AWS Budget alarm at 50% / 80% / 100% of monthly spend per model, paging the on-call.
- A monthly report of "top 10 most expensive prompts" — these are almost always a bug (someone pasting a 50-page PDF) or an abuse pattern.
Token Shock isn't a model problem. It's a governance problem dressed in model clothing. Fix it the same way you fix every other runaway cost: meter it, attribute it, throttle it, and show the invoice to the team that's generating it.
Further reading: Bedrock pricing.