Usage Tracking

Every API request is tracked with detailed usage metrics including tokens, cost, and latency.

Per-Request Tracking

Every response includes a usage object:

{
  "usage": {
    "tokens_prompt": 25,
    "tokens_completion": 150,
    "tokens_total": 175,
    "cost": 0.0052,
    "latency": 1200.5
  }
}
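As a minimal sketch of how a client might consume this object, the snippet below parses the sample body and derives two convenience figures from it. The function name `derived_metrics` and the derived keys are hypothetical and not part of the API; only the `usage` field names come from the sample above.

```python
import json

# Sample body copied from the example above; a real response is assumed
# to carry other fields alongside "usage".
RAW = """{
  "usage": {
    "tokens_prompt": 25,
    "tokens_completion": 150,
    "tokens_total": 175,
    "cost": 0.0052,
    "latency": 1200.5
  }
}"""


def derived_metrics(body: str) -> dict:
    """Derive cost per 1,000 tokens and completion throughput from a usage object."""
    usage = json.loads(body)["usage"]
    total = usage["tokens_total"]
    latency_s = usage["latency"] / 1000  # latency is reported in milliseconds
    return {
        # Guard against empty requests or zero latency to avoid dividing by zero.
        "cost_per_1k_tokens": usage["cost"] / total * 1000 if total else 0.0,
        "completion_tokens_per_s": (
            usage["tokens_completion"] / latency_s if latency_s else 0.0
        ),
    }


metrics = derived_metrics(RAW)
print(metrics)
```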

What's Tracked

| Metric | Description |
| --- | --- |
| `tokens_prompt` | Input tokens sent to the model |
| `tokens_completion` | Output tokens generated by the model |
| `tokens_total` | Sum of prompt and completion tokens |
| `cost` | Cost of this request, in USD |
| `latency` | Processing time, in milliseconds |
| `cache_read_tokens` | Tokens served from the response cache |
| `cache_write_tokens` | Tokens written to the response cache |
| `reasoning_tokens` | Internal reasoning tokens (supported models only) |
| `web_search_count` | Number of web searches triggered |
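One way to model these metrics on the client side is a small typed record. The sketch below assumes that the four metrics missing from the sample response (`cache_read_tokens`, `cache_write_tokens`, `reasoning_tokens`, `web_search_count`) may be absent and can default to zero; that default, and the `Usage` class itself, are assumptions rather than documented behavior.

```python
from dataclasses import dataclass, fields


@dataclass
class Usage:
    """Hypothetical client-side model of the tracked metrics."""

    tokens_prompt: int
    tokens_completion: int
    tokens_total: int
    cost: float      # USD
    latency: float   # milliseconds
    # Assumed optional: not present in the sample response.
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0
    reasoning_tokens: int = 0
    web_search_count: int = 0

    @classmethod
    def from_dict(cls, data: dict) -> "Usage":
        # Ignore unknown keys so that new metrics do not break older clients.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


usage = Usage.from_dict({
    "tokens_prompt": 25,
    "tokens_completion": 150,
    "tokens_total": 175,
    "cost": 0.0052,
    "latency": 1200.5,
})
print(usage)
```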

Credit System

Each account has a credit balance in USD
Credits are deducted based on the cost of each request
Minimum credit requirements vary by endpoint type
BYOK (Bring Your Own Key) requests do not consume credits
Check your balance through the IndoxHub dashboard
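The rules above can be illustrated with a short sketch. The server-side billing logic is not documented here, so `charge`, `InsufficientCredits`, and the `min_required` parameter are hypothetical names, and the order of the checks (for example, that BYOK requests skip the minimum-credit check entirely) is an assumption.

```python
from decimal import Decimal


class InsufficientCredits(Exception):
    """Raised when the balance is below the endpoint's minimum (hypothetical)."""


def charge(
    balance: Decimal,
    cost: Decimal,
    *,
    min_required: Decimal = Decimal("0"),
    byok: bool = False,
) -> Decimal:
    """Return the balance after one request, following the rules listed above."""
    if byok:
        # BYOK requests do not consume credits.
        return balance
    if balance < min_required:
        # Minimum credit requirements vary by endpoint type.
        raise InsufficientCredits(
            f"balance {balance} is below the required minimum {min_required}"
        )
    # Credits are deducted based on the cost of the request.
    return balance - cost


print(charge(Decimal("10"), Decimal("0.0052")))
print(charge(Decimal("10"), Decimal("0.0052"), byok=True))
```

`Decimal` is used instead of `float` so that repeated small deductions do not accumulate binary rounding error.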

In-Stream Usage Events

Streaming responses emit the same totals inside the stream as a named usage_final SSE event, so clients don't need a second API call to retrieve billing data after a stream ends:

event: usage_final
data: {"type":"usage_final","input_tokens":15,"output_tokens":1,"cost_usd":2.85e-06,"latency_ms":4693}

The fields map 1:1 to the per-request usage object documented above (input_tokens ≡ tokens_prompt, output_tokens ≡ tokens_completion, cost_usd ≡ cost, latency_ms ≡ latency). See SSE Events for the full per-event reference.
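A client could pick this event out of a stream and translate it back into the per-request field names roughly as follows. This is a sketch under stated assumptions: `extract_usage` and `FIELD_MAP` are hypothetical names, the stream is assumed to arrive as an iterable of text lines, and `tokens_total` is computed as the sum of the two token counts because the sample event does not carry it.

```python
import json

# Mapping from usage_final fields to the per-request usage object fields.
FIELD_MAP = {
    "input_tokens": "tokens_prompt",
    "output_tokens": "tokens_completion",
    "cost_usd": "cost",
    "latency_ms": "latency",
}


def extract_usage(lines):
    """Return the usage dict from the first usage_final event, or None."""
    event, data_parts = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line terminates an SSE event.
            if event == "usage_final" and data_parts:
                payload = json.loads("\n".join(data_parts))
                usage = {FIELD_MAP[k]: v for k, v in payload.items() if k in FIELD_MAP}
                usage["tokens_total"] = (
                    usage.get("tokens_prompt", 0) + usage.get("tokens_completion", 0)
                )
                return usage
            event, data_parts = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Drop the single optional space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
    # An event without a terminating blank line is treated as incomplete.
    return None


stream = [
    "event: usage_final\n",
    'data: {"type":"usage_final","input_tokens":15,"output_tokens":1,'
    '"cost_usd":2.85e-06,"latency_ms":4693}\n',
    "\n",
]
print(extract_usage(stream))
```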

Storage

Usage data is stored in two layers:

MongoDB: Detailed per-request logs with full request/response data
PostgreSQL: Aggregated daily summaries for efficient reporting
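To make the relationship between the two layers concrete, the sketch below rolls per-request records up into daily totals in plain Python. The actual MongoDB and PostgreSQL schemas are not documented here, so the record shape (including the `timestamp` field), the summary keys, and the `daily_summaries` function are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_summaries(records):
    """Aggregate per-request usage records into one summary per calendar day."""
    totals = defaultdict(lambda: {"requests": 0, "tokens_total": 0, "cost": 0.0})
    for record in records:
        # "timestamp" is an assumed ISO 8601 field on each per-request log.
        day = datetime.fromisoformat(record["timestamp"]).date().isoformat()
        summary = totals[day]
        summary["requests"] += 1
        summary["tokens_total"] += record["tokens_total"]
        summary["cost"] += record["cost"]
    return dict(totals)


records = [
    {"timestamp": "2024-05-01T10:00:00+00:00", "tokens_total": 175, "cost": 0.0052},
    {"timestamp": "2024-05-01T18:30:00+00:00", "tokens_total": 16, "cost": 0.001},
    {"timestamp": "2024-05-02T09:15:00+00:00", "tokens_total": 40, "cost": 0.002},
]
print(daily_summaries(records))
```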