Document Processing
Use IndoxHub for document summarization, data extraction, translation, and classification.
Document Summarization
import requests

API_KEY = "YOUR_API_KEY"

def summarize(text, max_length="2 paragraphs"):
    """Summarize `text`; `max_length` is free text inserted into the prompt."""
    response = requests.post(
        "https://api.indoxhub.com/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "anthropic/claude-haiku-4.5",
            "messages": [
                {
                    "role": "system",
                    "content": f"Summarize the text in {max_length}. "
                               "Be concise and preserve key points."
                },
                {"role": "user", "content": text}
            ],
            "temperature": 0.3
        }
    )
    response.raise_for_status()  # fail early on HTTP errors
    return response.json()["data"]
Data Extraction
Extract structured data from unstructured text:
import json

def extract_entities(text):
    """Ask the model for entities and parse its reply as JSON."""
    response = requests.post(
        "https://api.indoxhub.com/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "openai/gpt-4o-mini",
            "messages": [
                {
                    "role": "system",
                    "content": "Extract entities as JSON: {names: [], dates: [], amounts: [], locations: []}"
                },
                {"role": "user", "content": text}
            ],
            "temperature": 0.1
        }
    )
    response.raise_for_status()  # fail early on HTTP errors
    # Raises json.JSONDecodeError if the reply is not valid JSON.
    return json.loads(response.json()["data"])
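Models sometimes wrap a JSON reply in a Markdown code fence or leave out a requested key, in which case a bare json.loads call fails or returns an incomplete result. The sketch below shows one way to make the parsing step more tolerant. The names FENCE, EXPECTED_KEYS, and parse_entities are hypothetical helpers introduced for illustration and are not part of IndoxHub; the key list simply mirrors the prompt above.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker
# Assumption: these are the keys requested in the extraction prompt.
EXPECTED_KEYS = ("names", "dates", "amounts", "locations")

def parse_entities(raw):
    """Parse a model reply into a dict holding a list for every expected key."""
    text = raw.strip()
    # Remove a surrounding Markdown code fence, if the model added one.
    if text.startswith(FENCE) and text.endswith(FENCE):
        inner = text[len(FENCE):-len(FENCE)]
        first, _, rest = inner.partition("\n")
        # Drop an optional language tag such as "json" on the opening line.
        if first.strip() == "" or first.strip().isalpha():
            inner = rest
        text = inner.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    # Keep only list values; missing or malformed keys become empty lists.
    result = {}
    for key in EXPECTED_KEYS:
        value = parsed.get(key)
        result[key] = value if isinstance(value, list) else []
    return result
```

With a helper like this, extract_entities could return parse_entities(response.json()["data"]) instead of calling json.loads directly, so callers always receive the same four keys.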
Document Classification
def classify(text, categories):
    """Return the category name the model picks for `text`."""
    cats = ", ".join(categories)
    response = requests.post(
        "https://api.indoxhub.com/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "openai/gpt-4o-mini",
            "messages": [
                {
                    "role": "system",
                    "content": f"Classify the text into one of: {cats}. "
                               "Respond with only the category name."
                },
                {"role": "user", "content": text}
            ],
            "temperature": 0.0
        }
    )
    response.raise_for_status()  # fail early on HTTP errors
    return response.json()["data"].strip()

result = classify(
    "The Q3 earnings exceeded expectations with 15% YoY growth",
    ["finance", "technology", "healthcare", "sports"]
)
# Expected: "finance"
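Even when told to respond with only the category name, a model may add quotes, punctuation, different casing, or a short prefix. A minimal sketch of a post-processing step follows; match_category is a hypothetical helper introduced here, not an IndoxHub function, and the matching rules are one reasonable choice among several.

```python
def match_category(answer, categories, default=None):
    """Map a free-text model answer onto one of the allowed categories."""
    # Normalize whitespace, surrounding quotes, trailing periods, and case.
    cleaned = answer.strip().strip("\"'.").lower()
    lookup = {c.lower(): c for c in categories}
    if cleaned in lookup:
        return lookup[cleaned]
    # Otherwise accept the answer only if exactly one category appears in it.
    hits = [original for key, original in lookup.items() if key in cleaned]
    return hits[0] if len(hits) == 1 else default
```

Returning a default for ambiguous or unknown answers lets the caller decide whether to retry, log the document, or route it to manual review.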
Audio Transcription Pipeline
Transcribe audio files and then process the text:
def transcribe_and_summarize(audio_path):
    """Transcribe an audio file, then summarize the transcript."""
    # Step 1: Transcribe (the file is uploaded as multipart form data)
    with open(audio_path, "rb") as f:
        resp = requests.post(
            "https://api.indoxhub.com/api/v1/audio/stt/transcriptions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"model": "openai/whisper-1"}
        )
    resp.raise_for_status()  # fail early on HTTP errors
    transcript = resp.json()["data"]["text"]

    # Step 2: Summarize with the summarize() function defined earlier
    summary = summarize(transcript, max_length="3 bullet points")
    return {"transcript": transcript, "summary": summary}
Tips
- Low temperature: use 0.0–0.3 for extraction and classification tasks to keep outputs consistent.
- JSON mode: ask the model to output JSON for structured extraction.
- Batch processing: process multiple documents sequentially with error handling.
- Choose cost-effective models: use openai/gpt-4o-mini or deepseek/deepseek-chat for high-volume processing.
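The batch-processing tip can be sketched as a small loop that retries each document a few times and records failures instead of stopping. process_batch and its max_retries and delay parameters are hypothetical names chosen for this example; the retry count and the linear backoff are assumptions, not documented IndoxHub recommendations.

```python
import time

def process_batch(documents, process, max_retries=2, delay=1.0):
    """Run `process` on each document in order, collecting results and failures.

    Returns (results, errors), both keyed by the document's index.
    """
    results, errors = {}, {}
    for index, doc in enumerate(documents):
        for attempt in range(max_retries + 1):
            try:
                results[index] = process(doc)
                errors.pop(index, None)  # an earlier failed attempt no longer counts
                break
            except Exception as exc:  # broad on purpose: one bad document should not stop the batch
                errors[index] = str(exc)
                if attempt < max_retries:
                    time.sleep(delay * (attempt + 1))  # wait a little longer each time
    return results, errors
```

For example, results, errors = process_batch(texts, summarize) would summarize each entry of a list texts with the summarize function from this page and leave any documents that still fail after the retries in errors.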