

Rate Limits by Model

Standard Coding Model

Enterprise Tier:
  • Requests: 5,000 per minute
  • Tokens: 1,000,000 per minute
  • Concurrent requests: 100
Professional Tier:
  • Requests: 500 per minute
  • Tokens: 200,000 per minute
  • Concurrent requests: 20
Typical Usage: 1 request = ~500-1,500 tokens

Advanced Reasoning Model

Enterprise Tier:
  • Requests: 2,000 per minute
  • Tokens: 800,000 per minute
  • Concurrent requests: 50
Professional Tier:
  • Requests: 200 per minute
  • Tokens: 100,000 per minute
  • Concurrent requests: 10
Typical Usage: 1 request = ~1,000-4,000 tokens

Fast General Purpose Model

Enterprise Tier:
  • Requests: 8,000 per minute
  • Tokens: 1,500,000 per minute
  • Concurrent requests: 120
Professional Tier:
  • Requests: 800 per minute
  • Tokens: 300,000 per minute
  • Concurrent requests: 25
Typical Usage: 1 request = ~300-1,000 tokens

Compliance Specialist Model

Enterprise Tier:
  • Requests: 1,000 per minute
  • Tokens: 500,000 per minute
  • Concurrent requests: 30
Professional Tier:
  • Requests: 100 per minute
  • Tokens: 50,000 per minute
  • Concurrent requests: 5
Typical Usage: 1 request = ~2,000-6,000 tokens
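
Each model enforces request, token, and concurrency limits at the same time, so which one binds first depends on request size and latency. The following Python sketch estimates a sustainable request rate as the minimum of the three. The average latency is an assumed input (no latency figures are published here), and the concurrency bound applies Little's law (throughput = concurrency / latency). `TierLimits` and `effective_requests_per_minute` are illustrative names, not part of any Costa SDK; the numbers are the Professional-tier figures listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    requests_per_minute: int
    tokens_per_minute: int
    concurrent_requests: int


def effective_requests_per_minute(limits: TierLimits,
                                  tokens_per_request: int,
                                  avg_latency_s: float) -> int:
    """Estimate sustainable requests per minute as the tightest of the three limits."""
    by_requests = limits.requests_per_minute
    # Whole requests that fit in the per-minute token budget.
    by_tokens = limits.tokens_per_minute // tokens_per_request
    # Little's law: throughput = concurrency / latency, scaled to one minute.
    # avg_latency_s is an assumed value you measure yourself.
    by_concurrency = int(limits.concurrent_requests * 60 / avg_latency_s)
    return min(by_requests, by_tokens, by_concurrency)


# Professional-tier figures from the list above.
STANDARD_CODING_PRO = TierLimits(500, 200_000, 20)
COMPLIANCE_PRO = TierLimits(100, 50_000, 5)

print(effective_requests_per_minute(STANDARD_CODING_PRO, 1_500, 10.0))  # 120 (concurrency binds)
print(effective_requests_per_minute(STANDARD_CODING_PRO, 1_500, 5.0))   # 133 (tokens bind)
print(effective_requests_per_minute(COMPLIANCE_PRO, 6_000, 10.0))       # 8 (tokens bind)
```

At ~1,500 tokens per request, the token limit alone caps the Professional Standard Coding Model at about 133 requests per minute, well below its 500-request limit.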

Rate Limit Headers

Costa includes the following rate limit headers in every API response:
```http
HTTP/1.1 200 OK
X-RateLimit-Limit-Requests: 5000
X-RateLimit-Remaining-Requests: 4999
X-RateLimit-Reset-Requests: 1640995200
X-RateLimit-Limit-Tokens: 1000000
X-RateLimit-Remaining-Tokens: 998500
X-RateLimit-Reset-Tokens: 1640995200
X-RateLimit-Limit-Concurrent: 100
X-RateLimit-Used-Concurrent: 5
```
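
A client can read these headers after every call to track its remaining budget. The following Python sketch parses them into a small record. It assumes the `Reset` values are Unix timestamps in seconds, which is how the example value 1640995200 reads, but this page does not state the unit explicitly. `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical helpers, and any mapping of header names to values (for example, the `headers` attribute of a `requests` response) can be passed in.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitStatus:
    limit_requests: int
    remaining_requests: int
    reset_requests: int   # assumed: Unix timestamp in seconds
    limit_tokens: int
    remaining_tokens: int
    reset_tokens: int     # assumed: Unix timestamp in seconds
    limit_concurrent: int
    used_concurrent: int

    def seconds_until_reset(self, kind: str, now: Optional[float] = None) -> float:
        """Seconds until the "requests" or "tokens" window resets (never negative)."""
        if kind not in ("requests", "tokens"):
            raise ValueError(f"unknown limit kind: {kind!r}")
        reset = self.reset_requests if kind == "requests" else self.reset_tokens
        now = time.time() if now is None else now
        return max(0.0, float(reset - now))


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}

    def num(suffix: str) -> int:
        return int(h["x-ratelimit-" + suffix])

    return RateLimitStatus(
        limit_requests=num("limit-requests"),
        remaining_requests=num("remaining-requests"),
        reset_requests=num("reset-requests"),
        limit_tokens=num("limit-tokens"),
        remaining_tokens=num("remaining-tokens"),
        reset_tokens=num("reset-tokens"),
        limit_concurrent=num("limit-concurrent"),
        used_concurrent=num("used-concurrent"),
    )


# The example response headers shown above.
example = {
    "X-RateLimit-Limit-Requests": "5000",
    "X-RateLimit-Remaining-Requests": "4999",
    "X-RateLimit-Reset-Requests": "1640995200",
    "X-RateLimit-Limit-Tokens": "1000000",
    "X-RateLimit-Remaining-Tokens": "998500",
    "X-RateLimit-Reset-Tokens": "1640995200",
    "X-RateLimit-Limit-Concurrent": "100",
    "X-RateLimit-Used-Concurrent": "5",
}
status = parse_rate_limit_headers(example)
print(status.remaining_tokens)                               # 998500
print(status.seconds_until_reset("tokens", now=1640995170))  # 30.0
```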

Error Responses

When a rate limit is exceeded, Costa responds with HTTP status 429 Too Many Requests and a JSON error body:
```json
{
  "error": {
    "message": "Rate limit exceeded for requests. Try again in 30 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "details": {
      "limit_type": "requests",
      "reset_time": 1640995230,
      "retry_after": 30
    }
  }
}
```
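
The `details.retry_after` field gives the number of seconds to wait before retrying. The following Python sketch retries on 429 and honors that value when present; when it is missing, it falls back to exponential backoff with full jitter, a common general-purpose technique rather than something Costa prescribes. `send` is a hypothetical callable you supply that performs the HTTP call and returns the status code and parsed JSON body; `sleep` is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple

# (HTTP status code, parsed JSON body); the transport behind `send` is up to you.
Response = Tuple[int, Dict[str, Any]]


def call_with_retry(send: Callable[[], Response],
                    max_retries: int = 5,
                    base_delay_s: float = 1.0,
                    max_delay_s: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, retrying on 429 until it succeeds or retries run out."""
    attempt = 0
    while True:
        status, body = send()
        if status != 429 or attempt >= max_retries:
            return status, body
        details = body.get("error", {}).get("details", {})
        retry_after = details.get("retry_after")
        if retry_after is not None:
            # Prefer the server's hint from error.details.retry_after (seconds).
            delay = float(retry_after)
        else:
            # Fallback: exponential backoff with full jitter, capped at max_delay_s.
            delay = random.uniform(0.0, min(max_delay_s, base_delay_s * 2 ** attempt))
        sleep(delay)
        attempt += 1


# Demo with a fake transport: one 429, then success.
responses = iter([
    (429, {"error": {"details": {"retry_after": 30}}}),
    (200, {"ok": True}),
])
waits = []
status, body = call_with_retry(lambda: next(responses), sleep=waits.append)
print(status, waits)  # 200 [30.0]
```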

Rate Limit Optimization

Token Optimization Strategies

Input Optimization

Reduce Input Tokens
  • Remove unnecessary whitespace and comments
  • Use concise, specific prompts
  • Exclude irrelevant code context
  • Summarize large code blocks
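
One way to apply these strategies is to compact each candidate snippet and then keep only the most relevant ones that fit a token budget. The following Python sketch does this greedily. Two assumptions: token counts are estimated at roughly 4 characters per token (a common rule of thumb, not Costa's tokenizer), and the caller has already ranked snippets by relevance through the hypothetical `priority` field. Note that stripping trailing whitespace can change meaning in whitespace-sensitive formats such as Markdown hard line breaks.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: roughly 4 characters per token for English text and code.
# This is a rule of thumb, not Costa's tokenizer; calibrate it against
# the token counts you actually observe.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate via ceiling division of the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def compact(text: str) -> str:
    """Strip trailing whitespace and collapse runs of blank lines into one."""
    out: List[str] = []
    for line in text.splitlines():
        line = line.rstrip()
        if line == "" and out and out[-1] == "":
            continue  # already emitted one blank line here
        out.append(line)
    return "\n".join(out).strip("\n")


@dataclass
class Snippet:
    name: str
    text: str
    priority: int  # lower value = more relevant to the prompt


def select_context(snippets: List[Snippet], budget_tokens: int) -> Tuple[List[Snippet], int]:
    """Greedily keep the most relevant compacted snippets that fit the token budget."""
    chosen: List[Snippet] = []
    used = 0
    for snippet in sorted(snippets, key=lambda s: s.priority):
        text = compact(snippet.text)
        cost = estimate_tokens(text)
        if used + cost <= budget_tokens:
            chosen.append(Snippet(snippet.name, text, snippet.priority))
            used += cost
    return chosen, used


# Placeholder file contents; only their lengths matter for this demo.
snippets = [
    Snippet("failing_test.py", "x" * 400, priority=0),
    Snippet("utils.py", "y" * 2_000, priority=1),
    Snippet("README.md", "z" * 80, priority=2),
]
chosen, used = select_context(snippets, budget_tokens=150)
print([s.name for s in chosen], used)  # ['failing_test.py', 'README.md'] 120
```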

Output Optimization

Control Output Tokens
  • Set appropriate max_tokens limits
  • Use specific instructions for concise responses
  • Request code snippets instead of full files
  • Use streaming for real-time applications
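
Setting max_tokens also bounds how much of the per-minute token budget a single request can consume. Assuming output tokens count toward the tokens-per-minute limit (the "Typical Usage" figures above suggest per-request totals, but this page does not say so explicitly), a client can reserve input + max_tokens before each call and throttle itself instead of running into 429 responses. `TokenBucket` and `request_cost` are hypothetical client-side helpers; the bucket refills continuously at tokens_per_minute / 60 per second, which approximates, but may not exactly match, how Costa accounts for its one-minute window.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side budget for a per-minute token limit (hypothetical helper)."""

    def __init__(self, tokens_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = float(tokens_per_minute)
        self.refill_per_s = tokens_per_minute / 60.0
        self.available = self.capacity  # start with a full minute's budget
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last
        self.available = min(self.capacity, self.available + elapsed * self.refill_per_s)
        self.last = now

    def try_reserve(self, cost: int) -> float:
        """Reserve `cost` tokens; return 0.0 on success, else seconds to wait before retrying."""
        if cost > self.capacity:
            raise ValueError("a single request cannot exceed the per-minute token limit")
        self._refill()
        if self.available >= cost:
            self.available -= cost
            return 0.0
        return (cost - self.available) / self.refill_per_s


def request_cost(input_tokens: int, max_tokens: int) -> int:
    """Worst-case tokens for one call: the full input plus every allowed output token."""
    return input_tokens + max_tokens


# Demo with a fake clock and a 6,000 tokens-per-minute budget (100 tokens/s refill).
now = [0.0]
bucket = TokenBucket(6_000, clock=lambda: now[0])
print(bucket.try_reserve(request_cost(1_000, max_tokens=2_000)))  # 0.0
print(bucket.try_reserve(request_cost(1_000, max_tokens=2_000)))  # 0.0
print(bucket.try_reserve(request_cost(1_000, max_tokens=500)))    # 15.0
```

In a real client, loop on `try_reserve` and sleep for the returned number of seconds until it returns 0.0. Lowering max_tokens shrinks each reservation, so more requests fit in the same minute.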

Enterprise Features

Dedicated Rate Limits

Enterprise customers can request dedicated rate limit pools:

Team Isolation

Separate Limits per Team
  • Independent rate limits for each development team
  • Prevent one team from affecting others
  • Custom limits based on team size and usage

Project Allocation

Project-Specific Limits
  • Allocate rate limits to specific projects
  • Priority queuing for critical applications
  • Burst capacity for deployment periods

Rate Limit Monitoring

Real-time Usage Tracking
  • Live rate limit consumption graphs
  • Historical usage patterns
  • Team and project breakdowns
  • Alert thresholds and notifications

Programmatic Monitoring
  • Rate limit usage API endpoints
  • Webhook notifications when usage approaches a limit
  • Custom alerting integrations
  • Usage forecasting and planning
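
The usage API endpoints and webhook payloads are not documented on this page, so the following Python sketch relies only on the rate limit response headers: it computes the fraction of each limit in use and returns a message for every limit at or above an alert threshold. The helper names and the 80% default threshold are illustrative choices, and the header values in the demo are made up.

```python
from typing import Dict, List, Mapping


def utilization(headers: Mapping[str, str]) -> Dict[str, float]:
    """Fraction of each limit in use, computed from the rate limit headers."""
    h = {name.lower(): value for name, value in headers.items()}
    usage: Dict[str, float] = {}
    for kind in ("requests", "tokens"):
        limit = int(h[f"x-ratelimit-limit-{kind}"])
        remaining = int(h[f"x-ratelimit-remaining-{kind}"])
        usage[kind] = (limit - remaining) / limit
    usage["concurrent"] = (int(h["x-ratelimit-used-concurrent"])
                           / int(h["x-ratelimit-limit-concurrent"]))
    return usage


def threshold_alerts(headers: Mapping[str, str], threshold: float = 0.8) -> List[str]:
    """Messages for every limit whose utilization is at or above `threshold`."""
    return [f"{kind} at {fraction:.0%} of limit"
            for kind, fraction in sorted(utilization(headers).items())
            if fraction >= threshold]


# Illustrative values: the token budget is 90% consumed.
headers = {
    "X-RateLimit-Limit-Requests": "5000",
    "X-RateLimit-Remaining-Requests": "4999",
    "X-RateLimit-Limit-Tokens": "1000000",
    "X-RateLimit-Remaining-Tokens": "100000",
    "X-RateLimit-Limit-Concurrent": "100",
    "X-RateLimit-Used-Concurrent": "5",
}
print(threshold_alerts(headers))  # ['tokens at 90% of limit']
```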

Support

Rate Limit Issues

Contact support for rate limit increases or technical issues.

Enterprise Sales

Discuss custom rate limits and dedicated infrastructure options.

Rate Limit Increases: Enterprise customers can request rate limit increases based on legitimate business needs. Contact our support team with your use case details.