Building on Conduix

Embedding Conduix as the LLM backend for your own product — drop-in setup, model discovery, usage & cost, and serving your own customers.

If you're a SaaS or platform putting Conduix behind your own product, this page answers the questions a technical integration review asks first: how compatible is the API, how do I discover models, where do token counts and cost live, and how do I isolate usage across my own customers. Everything here works against the API as it ships today.

It's an OpenAI drop-in

Conduix speaks the OpenAI Chat Completions wire format. Point your existing OpenAI (or OpenAI-compatible) SDK at https://api.conduix.ai/v1, swap the API key, and you're routing across ten providers with no other code changes.

Endpoint     POST /v1/chat/completions
Auth         Authorization: Bearer cx_live_… / cx_test_…
Streaming    Standard SSE; set stream: true
Errors       OpenAI error taxonomy (invalid_request_error, insufficient_quota, …)
Test calls   cx_test_… keys run the full path and are non-billable; use them in CI

Full setup with Python / TypeScript / curl is in the Quickstart.
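For stacks that call the API without an SDK, the settings above map onto a plain HTTP request. The following is a minimal sketch using only the Python standard library; `build_chat_request` is a hypothetical helper name, the key is a placeholder, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.conduix.ai/v1"  # base URL given on this page


def build_chat_request(api_key, model, messages, stream=False):
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": messages}
    if stream:
        body["stream"] = True  # ask for a standard SSE response
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "cx_test_example",  # placeholder, not a real key
    "gpt-4o-mini",
    [{"role": "user", "content": "Hello"}],
)
# Once a real key is in place, send it with urllib.request.urlopen(req).
```

Keeping request construction separate from sending makes the helper easy to unit-test in CI without touching the network.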

Official SDKs & tooling

Two first-party packages, each a thin wrapper over standard tooling:

conduix (PyPI)        Python SDK: an OpenAI-client subclass with the base URL built in
@conduix/mcp-server   MCP server: lets AI agents read their own spend, models, and audit log

bash
pip install conduix

Python SDK
from conduix import Conduix

client = Conduix(api_key="cx_live_…")  # base URL is built in
resp = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}],
)

The MCP server exposes your gateway to AI agents (Claude Desktop, Cursor, Cline) as read-only tools — list models, check spend, scan the audit log. It reads gateway metadata only; it never proxies your completions. Add it to your MCP host config:

claude_desktop_config.json
{
  "mcpServers": {
    "conduix": {
      "command": "npx",
      "args": ["-y", "@conduix/mcp-server"],
      "env": { "CONDUIX_API_KEY": "cx_live_…" }
    }
  }
}

Discover models — don't hardcode a format

Model ids are provider-native — exactly what the upstream provider calls them. There is no provider/model routing prefix: you call gpt-4o-mini or claude-sonnet-4-20250514, not anthropic/claude-sonnet-4. Conduix routes to the right provider from the id alone.

The source of truth is GET /v1/models — fetch it at integration time rather than baking in a list. Each entry carries a conduix extension with the provider, tier, context window, and the per-1M-token price you pay (in credits).

GET /v1/models (excerpt)
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "owned_by": "openai",
      "conduix": {
        "display_name": "GPT-4o mini",
        "provider": "openai",
        "tier": "mid",
        "context_window": 128000,
        "input_price_per_m_credits": 0.165,
        "output_price_per_m_credits": 0.66
      }
    }
  ]
}

Need a model Conduix doesn't curate, or your own self-hosted endpoint? Register it as a BYO endpoint and call model: "byo:<slug>/<your-model>". This is the only Conduix-namespaced id form.
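One way to consume the model list is to index it by id at startup, so later code can look up provider, context window, and price without hardcoding them. This is a minimal sketch assuming the payload shape in the excerpt above; `index_models` and `is_byo` are hypothetical helper names.

```python
import json


def index_models(payload):
    """Map model id -> its `conduix` extension from a GET /v1/models payload."""
    return {m["id"]: m.get("conduix", {}) for m in payload.get("data", [])}


def is_byo(model_id):
    """True for the Conduix-namespaced BYO form, byo:<slug>/<your-model>."""
    return model_id.startswith("byo:")


# The excerpt from this page, trimmed to the fields used here.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "owned_by": "openai",
      "conduix": {
        "provider": "openai",
        "context_window": 128000,
        "input_price_per_m_credits": 0.165,
        "output_price_per_m_credits": 0.66
      }
    }
  ]
}
""")

catalog = index_models(sample)
mini = catalog["gpt-4o-mini"]
```

In a real integration, `sample` would be the parsed body of a live GET /v1/models call, refreshed periodically so that price and catalog changes are picked up.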

Token counts and cost

Token counts come back inline in the standard OpenAI usage object on every response — prompt_tokens, completion_tokens, total_tokens. Populate your own usage records straight from there.

Charged cost is reported out-of-band, not in the completion body. Pair the x-conduix-request-id response header with the account API to read what each call was charged:

Inline (sync)           usage{} on the response → token counts
GET /v1/account/usage   Rolled-up tokens + charged credits for a window
GET /v1/account/audit   Per-request rows, keyed by request_id
Currency                Credits (1 credit ≈ $1), reported in micro-credits (divide by 1,000,000 to get credits)

Usage rows are written asynchronously, so the account API is eventually consistent: a call you just made may take a moment to appear. Don't read it back synchronously. Recommended: compute cost at call time from the /v1/models price × the inline usage, then reconcile against /v1/account/usage on a schedule.
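The call-time estimate described above is simple arithmetic: tokens divided by one million, times the per-1M price, summed over input and output. Here is a minimal sketch using `Decimal` to avoid float drift; the helper names are hypothetical, the token counts are made up for illustration, and the result is an estimate that may differ from the charged amount the account API eventually reports.

```python
from decimal import Decimal, ROUND_HALF_UP

ONE_MILLION = Decimal(1_000_000)


def estimate_cost_credits(usage, pricing):
    """Estimate credits from the inline usage object and /v1/models prices."""
    in_price = Decimal(str(pricing["input_price_per_m_credits"]))
    out_price = Decimal(str(pricing["output_price_per_m_credits"]))
    weighted = (
        Decimal(usage["prompt_tokens"]) * in_price
        + Decimal(usage["completion_tokens"]) * out_price
    )
    return weighted / ONE_MILLION  # prices are per 1M tokens


def to_micro_credits(credits):
    """Convert credits to whole micro-credits (1 credit = 1,000,000)."""
    return int((credits * ONE_MILLION).to_integral_value(rounding=ROUND_HALF_UP))


# Prices from the gpt-4o-mini excerpt on this page; usage is illustrative.
pricing = {"input_price_per_m_credits": 0.165, "output_price_per_m_credits": 0.66}
usage = {"prompt_tokens": 1000, "completion_tokens": 500, "total_tokens": 1500}

cost = estimate_cost_credits(usage, pricing)  # 0.000495 credits
micro = to_micro_credits(cost)                # 495 micro-credits
```

Storing the estimate in micro-credits keeps your own ledger in the same unit the account API reports, which simplifies the scheduled reconciliation.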

The account API

Everything your dashboard shows is also readable programmatically with the same API key — no session login required. Useful for your own internal dashboards, billing reconciliation, or CI cost gates.

GET /v1/account/balance   Credit balance, plan, spend caps, recent ledger
GET /v1/account/usage     Rolled-up usage + charged credits (filter by window / model)
GET /v1/account/audit     Recent calls: model, provider, tokens, latency, status, request_id

Serving your own customers

Reselling Conduix to your own workspaces, teams, or end customers is a first-class pattern. The building block is one API key per customer, minted under your account. Each key carries its own governance:

rate_limit       Requests per minute, per key
allowed_models   Allowlist of models that customer can call
pii_mode         Off / redact / tokenize, per key
pii_on_detect    Allow / warn / require-ack / block, per key

Mint a scoped key per customer
{
  "name": "workspace:acme-corp",
  "allowed_models": ["gpt-4o-mini", "claude-haiku-4-5-20251001"],
  "rate_limit": 120,
  "pii_mode": "redact"
}
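When keys are minted for many customers, it helps to generate this payload from one place so every key gets a consistent name and validated settings. The following is a sketch only: `scoped_key_payload` is a hypothetical helper, the lowercase `pii_mode` spellings are an assumption based on the example above, and the endpoint that accepts the payload is not shown on this page, so submitting it is left out.

```python
PII_MODES = {"off", "redact", "tokenize"}  # assumed lowercase spellings


def scoped_key_payload(customer_slug, allowed_models, rate_limit=120, pii_mode="redact"):
    """Build the key-creation payload for one customer, validating inputs."""
    if pii_mode not in PII_MODES:
        raise ValueError(f"unknown pii_mode: {pii_mode!r}")
    if rate_limit <= 0:
        raise ValueError("rate_limit must be a positive requests-per-minute value")
    if not allowed_models:
        raise ValueError("allowed_models must list at least one model id")
    return {
        "name": f"workspace:{customer_slug}",  # naming follows the example above
        "allowed_models": list(allowed_models),
        "rate_limit": rate_limit,
        "pii_mode": pii_mode,
    }


payload = scoped_key_payload(
    "acme-corp",
    ["gpt-4o-mini", "claude-haiku-4-5-20251001"],
)
```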

Store each customer's key encrypted at rest (a symmetric scheme such as Fernet behind your existing secrets store is a fine fit), and attribute usage per customer from the inline usage on each call, since you already know which key you used.

Credit balance and spend caps are account-level, shared across all your keys — they are not per-key today. If you need a hard spend ceiling per customer, enforce it in your own meter, or run separate Conduix accounts. For programmatic sub-accounts with isolated balances, talk to sales@conduix.ai.
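Enforcing a per-customer ceiling in your own meter can be as small as a running total checked before each call. Below is an in-memory sketch with hypothetical names (`CustomerMeter`, `SpendCapExceeded`) and made-up amounts. A production meter would need durable, concurrency-safe storage, and because spend is recorded after a call completes, a customer can overshoot the cap by at most one call.

```python
from collections import defaultdict
from decimal import Decimal


class SpendCapExceeded(Exception):
    """Raised when a customer has already reached their cap."""


class CustomerMeter:
    def __init__(self, caps):
        # caps: customer id -> cap in credits (str or number)
        self._caps = {cust: Decimal(str(cap)) for cust, cap in caps.items()}
        self._spent = defaultdict(Decimal)  # Decimal() starts at 0

    def check(self, customer):
        """Call before sending a request to Conduix."""
        if self._spent[customer] >= self._caps[customer]:
            raise SpendCapExceeded(customer)

    def record(self, customer, credits):
        """Call after a response, with the estimated cost of that call."""
        self._spent[customer] += Decimal(str(credits))

    def remaining(self, customer):
        return max(self._caps[customer] - self._spent[customer], Decimal(0))


meter = CustomerMeter({"acme-corp": "5.00"})
meter.check("acme-corp")           # under the cap, so no exception
meter.record("acme-corp", "4.25")
meter.record("acme-corp", "0.75")  # the cap is now fully used
```

A further `meter.check("acme-corp")` now raises `SpendCapExceeded`, which your application can translate into its own upgrade or top-up flow.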

Trust & compliance

What a security review will look for, and where it lives:

PII governance    Detect, redact, or tokenize PII before it reaches a provider (see Governance)
Audit trail       Every call logged; readable via /v1/account/audit
Region pinning    Constrain routing to a data-residency region
Data processing   See the DPA
Security review   DPA, BAA, or vendor-review requests → security@conduix.ai

Integration FAQ

Is Conduix a true OpenAI drop-in?

Yes. POST /v1/chat/completions with a Bearer key, the same request and response shapes, SSE streaming, and OpenAI-style error codes. Most integrations change only the base URL and the API key.

Is there an official SDK?

Yes. There are two: the conduix Python package (a thin OpenAI subclass with the base URL built in) and @conduix/mcp-server for AI agents, both covered in the "Official SDKs & tooling" section above. TypeScript users can point the OpenAI npm SDK at Conduix today.

What model name do I send?

The provider-native id (e.g. gpt-4o, claude-sonnet-4-20250514, gemini-2.5-pro). There's no provider/model prefix. Fetch the live list from GET /v1/models.

Does the response tell me what the call cost?

Token counts, yes — inline in usage. The charged credit amount is reported via /v1/account/usage and /v1/account/audit (keyed by the x-conduix-request-id header), which are eventually consistent. Compute cost at call time from the model price for a synchronous number.

Can I set a spend cap for each of my customers?

Spend caps and the credit balance are account-level today, not per-key. Per key you get rate limits, model allowlists, and PII policy. For a hard per-customer ceiling, enforce it in your own meter or use separate accounts; for programmatic sub-accounts, contact sales@conduix.ai.

Can I create a separate sub-account per customer?

Each Conduix account (organization) has its own balance and spend caps. The standard multi-tenant pattern is one scoped API key per customer under a single account. Isolated per-customer balances require separate accounts today; sales@conduix.ai can scope a reseller arrangement.

How do I test without being billed?

Use a cx_test_… key. It exercises the full routing, governance, and response path but never charges credits — ideal for CI.

Does a provider fallback change my integration?

No. The response shape is identical and the model field echoes what you requested. Inspect x-conduix-fallback and x-conduix-model-served if you want to know which upstream actually served the call.
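If you log which upstream served each call, the two headers can be read with a small helper. This is a sketch under assumptions: the header value formats are not documented on this page, so it compares the served model against the requested one and passes the raw `x-conduix-fallback` value through untouched. `served_by` is a hypothetical name and the header values below are illustrative.

```python
def served_by(headers, requested_model):
    """Summarize fallback information from response headers.

    Header names are matched case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    served = lowered.get("x-conduix-model-served", requested_model)
    return {
        "requested": requested_model,
        "served": served,
        "model_changed": served != requested_model,
        "fallback_header": lowered.get("x-conduix-fallback"),  # raw, uninterpreted
    }


info = served_by(
    {
        "X-Conduix-Model-Served": "claude-haiku-4-5-20251001",
        "X-Conduix-Fallback": "true",  # illustrative value only
    },
    requested_model="gpt-4o-mini",
)
```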

What happens when credits run out or a cap is hit?

The call is rejected with a 402 / insufficient_quota error before any provider is billed — distinguishable from a bad request, so you can surface a clean "top up" prompt to your user.
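To route this case to a "top up" prompt, an integration can branch on the status code and the error type. The sketch below assumes the OpenAI-style error envelope (`{"error": {"type": ..., "code": ..., "message": ...}}`); `classify_error` and its return labels are hypothetical.

```python
def classify_error(status, body):
    """Map an error response to a coarse action for the calling application."""
    err = (body or {}).get("error") or {}
    kinds = {err.get("type"), err.get("code")}
    if status == 402 or "insufficient_quota" in kinds:
        return "top_up"       # credits exhausted or a spend cap was hit
    if "invalid_request_error" in kinds:
        return "bad_request"  # fix the request; retrying will not help
    return "other"


quota = classify_error(402, {"error": {"type": "insufficient_quota", "message": "..."}})
bad = classify_error(400, {"error": {"type": "invalid_request_error", "message": "..."}})
```

Checking both `type` and `code` is deliberate, since OpenAI-style errors can carry the same identifier in either field.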