
Usage Tracking

Every API request is tracked with detailed usage metrics including tokens, cost, and latency.

Per-Request Tracking

Every response includes a usage object:

{
  "usage": {
    "tokens_prompt": 25,
    "tokens_completion": 150,
    "tokens_total": 175,
    "cost": 0.0052,
    "latency": 1200.5
  }
}
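The following is a minimal Python sketch of how a client could read this object and sanity-check it. It assumes the response body is JSON with a top-level usage key shaped like the example above. The helper name summarize_usage and the derived fields (latency_s, cost_per_1k_tokens) are hypothetical illustrations, not part of the IndoxHub API.

```python
import json

# Sample body copied from the usage example above; a real response may carry
# more top-level fields, which this sketch ignores.
RESPONSE_BODY = """
{
  "usage": {
    "tokens_prompt": 25,
    "tokens_completion": 150,
    "tokens_total": 175,
    "cost": 0.0052,
    "latency": 1200.5
  }
}
"""


def summarize_usage(body: str) -> dict:
    """Read the usage object from a JSON response body and derive a few figures."""
    usage = json.loads(body)["usage"]

    # tokens_total is documented as prompt + completion tokens; check that it holds.
    expected_total = usage["tokens_prompt"] + usage["tokens_completion"]
    if usage["tokens_total"] != expected_total:
        raise ValueError(
            f"tokens_total is {usage['tokens_total']}, expected {expected_total}"
        )

    total = usage["tokens_total"]
    return {
        "tokens_total": total,
        "cost_usd": usage["cost"],
        # latency is reported in milliseconds; convert to seconds for display.
        "latency_s": usage["latency"] / 1000.0,
        # Effective price per 1,000 tokens for this request (0 if no tokens).
        "cost_per_1k_tokens": usage["cost"] / total * 1000 if total else 0.0,
    }


if __name__ == "__main__":
    print(summarize_usage(RESPONSE_BODY))
```

The total check only relies on the documented rule that tokens_total is the sum of prompt and completion tokens.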

What's Tracked

Metric               Description
tokens_prompt        Input tokens sent to the model
tokens_completion    Output tokens generated by the model
tokens_total         Sum of prompt and completion tokens
cost                 Cost of the request, in USD
latency              Processing time, in milliseconds
cache_read_tokens    Tokens served from the response cache
cache_write_tokens   Tokens written to the response cache
reasoning_tokens     Internal reasoning tokens (only on models that support reasoning)
web_search_count     Number of web searches triggered by the request
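If you want a typed view of these metrics on the client, one option is a Python dataclass whose field names follow the table. This is a sketch under an assumption: the example response above omits cache_read_tokens, cache_write_tokens, reasoning_tokens and web_search_count, so the sketch treats a missing field as zero. UsageRecord and from_dict are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class UsageRecord:
    """Typed view of the usage object; field names follow the metrics table."""
    tokens_prompt: int
    tokens_completion: int
    tokens_total: int
    cost: float       # USD
    latency: float    # milliseconds
    # Assumption: these may be absent from a response (no cache activity,
    # non-reasoning model, no web search), in which case they default to zero.
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0
    reasoning_tokens: int = 0
    web_search_count: int = 0

    @classmethod
    def from_dict(cls, usage: dict) -> "UsageRecord":
        # Keep only keys this class declares, so unknown extra fields are ignored.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in usage.items() if k in known})


if __name__ == "__main__":
    record = UsageRecord.from_dict({
        "tokens_prompt": 25, "tokens_completion": 150, "tokens_total": 175,
        "cost": 0.0052, "latency": 1200.5,
    })
    print(record.cache_read_tokens, record.reasoning_tokens)  # 0 0
```

Filtering unknown keys keeps the parser tolerant if the usage object gains fields later.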

Credit System

  • Each account has a credit balance in USD
  • Credits are deducted based on the cost of each request
  • Minimum credit requirements vary by endpoint type
  • BYOK (bring-your-own-key) requests do not consume credits
  • Check your balance through the IndoxHub dashboard
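The following small Python sketch illustrates the credit rules above. It is not IndoxHub's billing code. The endpoint-type names, the minimum balances in MIN_CREDIT_USD, the Account and charge names, and the choice to skip the minimum check for BYOK requests are all hypothetical. Decimal is used because it represents USD amounts like 0.0052 exactly.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical per-endpoint minimum balances: the docs only say minimums vary by
# endpoint type, so these names and amounts are placeholders.
MIN_CREDIT_USD = {
    "chat": Decimal("0.01"),
    "image": Decimal("0.05"),
}


class InsufficientCredits(Exception):
    """Raised when the balance is below the endpoint's minimum."""


@dataclass
class Account:
    balance_usd: Decimal


def charge(account: Account, endpoint_type: str, cost_usd: Decimal, byok: bool) -> Decimal:
    """Apply the credit rules to one request and return the remaining balance."""
    if byok:
        # BYOK requests do not consume credits (assumed: no minimum check either).
        return account.balance_usd
    minimum = MIN_CREDIT_USD.get(endpoint_type, Decimal("0"))
    if account.balance_usd < minimum:
        raise InsufficientCredits(
            f"{endpoint_type} requests need a balance of at least ${minimum}"
        )
    # Deduct the request's cost; Decimal avoids binary floating-point rounding.
    account.balance_usd -= cost_usd
    return account.balance_usd


if __name__ == "__main__":
    acct = Account(Decimal("1.00"))
    print(charge(acct, "chat", Decimal("0.0052"), byok=False))  # 0.9948
```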

In-Stream Usage Events

Streaming responses emit the same totals inside the stream as a named usage_final SSE event, so clients don't need a second API call to retrieve billing data after a stream ends:

event: usage_final
data: {"type":"usage_final","input_tokens":15,"output_tokens":1,"cost_usd":2.85e-06,"latency_ms":4693}

The fields map 1:1 to the per-request usage object documented above (input_tokens → tokens_prompt, output_tokens → tokens_completion, cost_usd → cost, latency_ms → latency). See SSE Events for the full per-event reference.
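A client that already buffers the stream body can pick out the usage_final event with the standard library alone. This Python sketch parses event: and data: lines and treats a blank line as the end of an event, as the SSE format specifies. It then renames the fields using the mapping above. The function name is hypothetical. A production client would parse events incrementally as chunks arrive, rather than from a complete string.

```python
import json
from typing import Optional

# usage_final field -> per-request usage object field, per the mapping above.
FIELD_MAP = {
    "input_tokens": "tokens_prompt",
    "output_tokens": "tokens_completion",
    "cost_usd": "cost",
    "latency_ms": "latency",
}


def _field_value(line: str, name: str) -> str:
    # SSE strips exactly one optional space after the colon.
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def find_usage_final(sse_text: str) -> Optional[dict]:
    """Return the usage_final payload from buffered SSE text, renamed to usage-object keys."""
    event_name, data_lines = None, []
    # A blank line ends an event; the extra "" flushes a final unterminated event.
    for line in sse_text.splitlines() + [""]:
        if line == "":
            if event_name == "usage_final" and data_lines:
                payload = json.loads("\n".join(data_lines))
                return {FIELD_MAP[k]: v for k, v in payload.items() if k in FIELD_MAP}
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = _field_value(line, "event")
        elif line.startswith("data:"):
            data_lines.append(_field_value(line, "data"))
        # Other lines (comments starting with ":", id:, retry:) are ignored here.
    return None


if __name__ == "__main__":
    raw = (
        "event: usage_final\n"
        'data: {"type":"usage_final","input_tokens":15,"output_tokens":1,'
        '"cost_usd":2.85e-06,"latency_ms":4693}\n\n'
    )
    print(find_usage_final(raw))
```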

Storage

Usage data is stored in two layers:

  • MongoDB: Detailed per-request logs with full request/response data
  • PostgreSQL: Aggregated daily summaries for efficient reporting
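The docs do not describe the schema of the daily summaries. The following Python sketch therefore only shows one plausible roll-up from per-request records to per-day totals, done in memory rather than in MongoDB or PostgreSQL. The log shape (a Unix timestamp plus the usage object), the choice of UTC day boundaries, and the daily_summaries name are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_summaries(logs: list) -> dict:
    """Roll per-request usage logs up into one summary row per UTC calendar day."""
    summary = defaultdict(lambda: {"requests": 0, "tokens_total": 0, "cost": 0.0})
    for log in logs:
        # Hypothetical log shape: {"timestamp": <Unix seconds>, "usage": {...}}.
        day = datetime.fromtimestamp(log["timestamp"], tz=timezone.utc).date().isoformat()
        row = summary[day]
        row["requests"] += 1
        row["tokens_total"] += log["usage"]["tokens_total"]
        row["cost"] += log["usage"]["cost"]
    return dict(summary)


if __name__ == "__main__":
    logs = [
        {"timestamp": 1_700_000_000, "usage": {"tokens_total": 175, "cost": 0.0052}},
        {"timestamp": 1_700_000_600, "usage": {"tokens_total": 16, "cost": 0.00000285}},
    ]
    print(daily_summaries(logs))
```

Storing only these per-day rows keeps reporting queries small, while the full per-request detail stays in the log store.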
Documentation last built on May 23, 2026