Vision & Multimodal

Send images alongside text in chat completions for visual understanding.

Endpoint: POST /api/v1/chat/completions
Auth: Required (API key sent as a Bearer token in the Authorization header)

Multimodal Messages

Pass an image as an image_url part in a message's content array, alongside the text part:

The same request is shown in four forms, in this order: Python, JavaScript, cURL, and the OpenAI SDK.

import requests

response = requests.post(
    "https://api.indoxhub.com/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "openai/gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://example.com/photo.jpg"
                        }
                    }
                ]
            }
        ],
        "max_tokens": 300
    }
)
print(response.json()["data"])

const response = await fetch("https://api.indoxhub.com/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "openai/gpt-4o",
    messages: [{
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } }
      ]
    }],
    max_tokens: 300
  })
});
const data = await response.json();
console.log(data.data);

curl https://api.indoxhub.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }],
    "max_tokens": 300
  }'

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.indoxhub.com/v1"
)
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
    }],
    max_tokens=300
)
print(response.choices[0].message.content)
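All four examples above send the same request body, so it can help to build that body in one place. The following is a minimal sketch, not part of the API: the helper name `build_vision_payload` and its default arguments are illustrative assumptions, and the structure simply mirrors the examples in this section.

```python
from typing import Any


def build_vision_payload(
    prompt: str,
    image_url: str,
    model: str = "openai/gpt-4o",
    max_tokens: int = 300,
) -> dict[str, Any]:
    """Return a request body with one text part and one image_url part.

    Hypothetical helper: the field names are copied from the examples
    in this section, nothing else is assumed about the API.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


payload = build_vision_payload(
    "What's in this image?",
    "https://example.com/photo.jpg",
)
```

The resulting `payload` can be passed as the `json=` argument of `requests.post`, as in the Python example above.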

Base64 Images

To send a local image, base64-encode it and pass it as a data URL in the same image_url field:

import base64
import requests

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://api.indoxhub.com/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{b64}"
                    }
                }
            ]
        }]
    }
)
print(response.json()["data"])
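The example above hardcodes `image/jpeg` in the data URL. A small helper can derive the MIME type from the file extension instead. This is a sketch under assumptions: `image_to_data_url` is a hypothetical name, and this page does not say which image formats the API accepts, so the helper only checks that the guessed type starts with `image/`.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for the image_url field.

    Hypothetical helper: the MIME type is guessed from the file
    extension only; the file contents are not inspected.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string goes in the `url` field, for example `{"type": "image_url", "image_url": {"url": image_to_data_url("photo.jpg")}}`.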

Supported Models

Vision is available on any model whose input_modalities include image. Check the Models endpoint for the current list.

Common vision models:

openai/gpt-4o — Best quality vision
openai/gpt-4o-mini — Fast and affordable vision
anthropic/claude-opus-4-7 — Strong visual reasoning (newest flagship)
anthropic/claude-opus-4-6 — Strong visual reasoning
google/gemini-2.0-flash — Fast multimodal
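The input_modalities check described above can be sketched as a simple filter. The response shape of the Models endpoint is not shown on this page, so the code assumes each model entry is a dict with an `id` string and an `input_modalities` list. The function name and the sample entries are illustrative only.

```python
def vision_model_ids(models: list[dict]) -> list[str]:
    """Return the ids of models that list "image" as an input modality.

    Assumes each entry has an "id" key and, optionally, an
    "input_modalities" list; entries without the list are skipped.
    """
    return [
        model["id"]
        for model in models
        if "image" in model.get("input_modalities", [])
    ]


# Illustrative entries only, not real output from the Models endpoint.
sample_models = [
    {"id": "openai/gpt-4o", "input_modalities": ["text", "image"]},
    {"id": "example/text-only", "input_modalities": ["text"]},
    {"id": "example/no-modalities"},
]

vision_ids = vision_model_ids(sample_models)
```

If the real response nests the model list under another key, extract that list before passing it to the filter.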