Speech-to-Text (STT)

Transcribe audio files to text, or translate audio to English.

Transcription

Endpoint: POST /api/v1/audio/stt/transcriptions
Auth: Required
Content-Type: multipart/form-data

Form Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | Yes | | Audio file to transcribe |
| model | string | No | whisper-1 | STT model ID |
| provider | string | No | openai | Provider name |
| language | string | No | | Language code (e.g. en, es, fr) |
| prompt | string | No | | Guide text for the model |
| response_format | string | No | json | One of json, text, srt, verbose_json, vtt |
| temperature | float | No | 0.0 | Sampling temperature (0.0–1.0) |
| timestamp_granularities | string | No | | JSON-encoded array passed as a string, e.g. ["word", "segment"] |
| byok_api_key | string | No | | Your own provider API key |
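
Because timestamp_granularities is sent as a JSON string rather than as a repeated form field, it is easy to encode it incorrectly. The sketch below shows one way to assemble the non-file form fields from the table above. The helper name build_transcription_fields and its validation rules are hypothetical and not part of the API. Pairing timestamps with verbose_json mirrors how the upstream OpenAI API behaves and is an assumption here.

```python
import json

# Values listed for response_format in the parameter table.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_transcription_fields(
    model="whisper-1",
    provider="openai",
    language=None,
    prompt=None,
    response_format="json",
    temperature=0.0,
    timestamp_granularities=None,
):
    """Return the non-file form fields for a transcription request.

    Hypothetical helper: the field names come from the parameter table,
    the client-side validation is an illustration only.
    """
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    fields = {
        "model": model,
        "provider": provider,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    # Optional fields are only sent when the caller sets them.
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    if timestamp_granularities:
        # The table documents this field as a JSON string, so encode the list.
        fields["timestamp_granularities"] = json.dumps(list(timestamp_granularities))
    return fields


fields = build_transcription_fields(
    language="en",
    response_format="verbose_json",
    timestamp_granularities=["word", "segment"],
)
print(fields)
```

The resulting dictionary can be passed as the data argument of requests.post alongside the files argument that carries the audio.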

Examples

import requests

with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://api.indoxhub.com/api/v1/audio/stt/transcriptions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": ("audio.mp3", f, "audio/mpeg")},
        data={
            "model": "openai/whisper-1",
            "language": "en",
            "response_format": "json"
        }
    )
print(response.json()["data"]["text"])

curl https://api.indoxhub.com/api/v1/audio/stt/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@audio.mp3 \
  -F model=openai/whisper-1 \
  -F language=en \
  -F response_format=json

Transcription Response

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "created_at": "2026-04-07T12:00:00Z",
  "duration_ms": 3200.0,
  "provider": "openai",
  "model": "whisper-1",
  "success": true,
  "data": {
    "text": "Hello, welcome to the IndoxHub platform.",
    "language": "en",
    "duration": 4.5,
    "words": null,
    "segments": null
  },
  "usage": {
    "type": "audio",
    "seconds": 4.5
  }
}
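
The response wraps the transcript in an envelope, and words and segments can be null, so client code benefits from normalizing it before use. The following is a minimal sketch that reads the fields shown in the sample response. The function name extract_transcript is hypothetical, and the shape of a failed response is not documented here, so the error branch is an assumption.

```python
import json


def extract_transcript(payload):
    """Pull the useful fields out of a transcription response envelope.

    Hypothetical helper. Assumes failures are signalled by success=false;
    the actual error body is not documented in this section.
    """
    if not payload.get("success"):
        raise RuntimeError(f"request {payload.get('request_id')} did not succeed")

    data = payload.get("data") or {}
    usage = payload.get("usage") or {}
    return {
        "text": data.get("text", ""),
        "language": data.get("language"),
        # words and segments are null in the sample response; normalize to lists.
        "words": data.get("words") or [],
        "segments": data.get("segments") or [],
        "billed_seconds": usage.get("seconds"),
    }


# The sample response from this section, trimmed to the fields used above.
sample = json.loads("""
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "success": true,
  "data": {
    "text": "Hello, welcome to the IndoxHub platform.",
    "language": "en",
    "duration": 4.5,
    "words": null,
    "segments": null
  },
  "usage": {"type": "audio", "seconds": 4.5}
}
""")

result = extract_transcript(sample)
print(result["text"])
```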

Translation

Translate audio from any language to English.

Endpoint: POST /api/v1/audio/stt/translations
Auth: Required
Content-Type: multipart/form-data

Note

Translation is currently only supported with OpenAI's whisper-1 model.

Form Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | Yes | | Audio file to translate |
| model | string | No | whisper-1 | Model ID |
| provider | string | No | openai | Provider name |
| prompt | string | No | | Style guide for the output |
| response_format | string | No | json | One of json, text, srt, verbose_json, vtt |
| temperature | float | No | 0.0 | Sampling temperature |
| byok_api_key | string | No | | Your own provider API key |

Example

curl https://api.indoxhub.com/api/v1/audio/stt/translations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@spanish_audio.mp3 \
  -F model=openai/whisper-1 \
  -F response_format=json
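
For readers working in Python, the curl call above can be approximated as follows. This is a sketch under stated assumptions: the helper names build_translation_fields and translate_audio are hypothetical, the audio/mpeg content type assumes an MP3 file, and reading data.text assumes the translation endpoint returns the same envelope as the transcription endpoint, which this section does not show.

```python
import os

API_URL = "https://api.indoxhub.com/api/v1/audio/stt/translations"


def build_translation_fields(
    model="openai/whisper-1",
    response_format="json",
    prompt=None,
    temperature=0.0,
):
    """Return the non-file form fields for a translation request (hypothetical helper)."""
    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    if prompt:
        fields["prompt"] = prompt
    return fields


def translate_audio(path, api_key, **options):
    """Upload an audio file and return the English text.

    Assumes the response envelope matches the transcription response.
    """
    import requests  # third-party; imported lazily so the builder has no dependencies

    with open(path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            # Content type assumes an MP3 upload; adjust for other formats.
            files={"file": (os.path.basename(path), audio, "audio/mpeg")},
            data=build_translation_fields(**options),
            timeout=120,
        )
    response.raise_for_status()  # surface HTTP errors before parsing the body
    return response.json()["data"]["text"]


fields = build_translation_fields(prompt="Use formal English.")
print(fields)
```

A call such as translate_audio("spanish_audio.mp3", "YOUR_API_KEY") would then perform the upload. There is no language field because the translation parameter table does not list one.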

Documentation last built on May 23, 2026