πŸ”§ Integration Guide

How It Works

Get full cost visibility into your AI API usage in just 3 simple steps. Only a one-line configuration change is required. Works with any programming language.

Integration in 3 Simple Steps

From signup to full cost tracking in under 5 minutes

Step 1: Create Your Project & Add API Keys

Sign up for a free account, create a project, and securely store your AI provider API keys. We use AES-256 encryption to protect your credentials.

Supported Providers:
OpenAI, Anthropic, Mistral AI, Replicate, and 10+ more
Project Name: My AI Application
Provider: OpenAI
API Key: sk-proj-********************
Status: βœ… Active

Budget Limit: $500/month
Current Usage: $127.45 (25.5%)
Remaining: $372.55
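The budget figures in the example project follow directly from the limit and current spend. A minimal sketch of the arithmetic (the function name is illustrative, not part of any SDK):

```python
def budget_status(limit: float, spent: float) -> dict:
    """Summarize spend against a monthly budget limit."""
    return {
        "used_pct": round(spent / limit * 100, 1),
        "remaining": round(limit - spent, 2),
    }

# Figures from the example project above
status = budget_status(limit=500.00, spent=127.45)
print(status)  # {'used_pct': 25.5, 'remaining': 372.55}
```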
Step 2: Replace the API Base URL

Simply replace the provider's base URL with our proxy endpoint. Your unique project token routes requests through our cost tracking layer.

What happens:
  • We forward your request to the provider (minimal added latency)
  • Calculate cost based on usage (tokens/images)
  • Log metadata (model, timestamp, latency)
  • Return the original response unchanged
Before β†’ After One Line Change
- base_url = "https://api.openai.com/v1"
+ base_url = "https://proxy.apicostmonitor.com/v1"

headers = {
    "Authorization": "Bearer YOUR_OPENAI_KEY",
    "X-Project-Token": "acm_proj_abc123xyz"
}
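The four steps above can be sketched as a minimal proxy handler. This is illustrative only: `forward_to_provider` is a stand-in for the upstream HTTP call, and the per-token rates are made-up numbers, not real provider pricing.

```python
import time

# Illustrative per-token rates (USD) -- real rates come from a pricing database
PRICE_PER_TOKEN = {"gpt-4o": {"input": 2.5e-06, "output": 1.0e-05}}

def forward_to_provider(request: dict) -> dict:
    """Stand-in for the upstream call to the provider's API."""
    return {"model": request["model"],
            "usage": {"prompt_tokens": 120, "completion_tokens": 350}}

def handle_request(request: dict) -> dict:
    start = time.monotonic()
    response = forward_to_provider(request)          # 1. forward the request
    usage = response["usage"]
    rates = PRICE_PER_TOKEN[response["model"]]
    cost = (usage["prompt_tokens"] * rates["input"]  # 2. cost from token usage
            + usage["completion_tokens"] * rates["output"])
    metadata = {                                     # 3. log metadata only
        "model": response["model"],
        "latency_ms": round((time.monotonic() - start) * 1000),
        "cost_usd": round(cost, 6),
    }
    print("logged:", metadata)
    return response                                  # 4. response unchanged
```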
Step 3: Monitor Costs in Real-Time

Access your personalized dashboard to track costs per project, model, or time period. Set budget alerts and receive notifications before overspending.

Dashboard Features:
πŸ“Š Real-time cost graphs
πŸ“ˆ Usage analytics
πŸ”” Budget alerts
πŸ“₯ Export to CSV/JSON
Today's Activity:
─────────────────────────────────
Requests: 1,247 (+12% vs yesterday)
Total Cost: $12.45
Avg Latency: 342ms

Top Models:
1. gpt-4o         $8.23 (66%)
2. claude-3-sonnet $2.89 (23%)
3. mistral-large  $1.33 (11%)

⚠️ Alert: Project "ChatBot" at 85% of budget
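Exported data can be post-processed locally. A sketch using the "Top Models" breakdown above as sample rows; the real CSV export schema (column names, extra fields) may differ:

```python
import csv
import io

# Sample rows mirroring the "Top Models" breakdown above; the actual
# export schema is an assumption here.
export = """model,cost_usd
gpt-4o,8.23
claude-3-sonnet,2.89
mistral-large,1.33
"""

rows = list(csv.DictReader(io.StringIO(export)))
total = sum(float(r["cost_usd"]) for r in rows)
print(f"Total: ${total:.2f}")  # Total: $12.45
```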

Integration Code Examples

The proxy works with any language. Here are examples in the most popular ones.

Python:

from openai import OpenAI

# Initialize client with API Cost Monitor proxy
client = OpenAI(
    api_key="your-openai-api-key",
    base_url="https://proxy.apicostmonitor.com/v1",
    default_headers={
        "X-Project-Token": "acm_proj_YOUR_TOKEN_HERE"
    }
)

# Use exactly as before - no code changes needed!
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ]
)

print(response.choices[0].message.content)

# Cost tracking happens automatically in background!
# Check your dashboard for real-time cost updates
Node.js:

const OpenAI = require('openai');

// Initialize client with API Cost Monitor proxy
const client = new OpenAI({
  apiKey: 'your-openai-api-key',
  baseURL: 'https://proxy.apicostmonitor.com/v1',
  defaultHeaders: {
    'X-Project-Token': 'acm_proj_YOUR_TOKEN_HERE'
  }
});

// Use exactly as before - no code changes needed!
async function main() {
  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: 'Explain quantum computing' }
    ]
  });

  console.log(response.choices[0].message.content);
}

main();

// Cost tracking happens automatically in background!
// Check your dashboard for real-time cost updates
cURL:

curl https://proxy.apicostmonitor.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_OPENAI_API_KEY" \
  -H "X-Project-Token: acm_proj_YOUR_TOKEN_HERE" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing"
      }
    ]
  }'

# Response includes standard OpenAI format
# Cost tracking happens server-side automatically
PHP:

<?php

// Using Guzzle HTTP client
use GuzzleHttp\Client;

$client = new Client([
    'base_uri' => 'https://proxy.apicostmonitor.com/v1/',
    'headers' => [
        'Authorization' => 'Bearer YOUR_OPENAI_API_KEY',
        'X-Project-Token' => 'acm_proj_YOUR_TOKEN_HERE',
        'Content-Type' => 'application/json'
    ]
]);

$response = $client->post('chat/completions', [
    'json' => [
        'model' => 'gpt-4o',
        'messages' => [
            ['role' => 'user', 'content' => 'Explain quantum computing']
        ]
    ]
]);

$data = json_decode($response->getBody(), true);
echo $data['choices'][0]['message']['content'];

// Cost tracking happens automatically!

Technical FAQ

Common questions about integration and performance

How much latency does the proxy add?

Our proxy adds less than 50ms of overhead on average. This includes:

  • 10-20ms: Request validation and token extraction
  • 5-10ms: Database logging (async, non-blocking)
  • 20-30ms: Network routing to provider

For typical AI API calls that take 500 ms to 5 s, this works out to roughly 1-10% overhead, and under 2% for longer generations. We use edge computing (Cloudflare Workers) to minimize latency globally.
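Plugging the averages above into the arithmetic shows how the overhead share depends on call duration:

```python
def overhead_pct(overhead_ms: float, call_ms: float) -> float:
    """Proxy overhead as a percentage of the provider call time."""
    return overhead_ms / call_ms * 100

# 50 ms of overhead against the 500 ms - 5 s range quoted above
print(overhead_pct(50, 500))   # 10.0 (fastest calls: worst case)
print(overhead_pct(50, 5000))  # 1.0  (slowest calls: best case)
```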

How do you calculate costs?

We use the exact same pricing models as the providers:

  • Text Models: Count input/output tokens using tiktoken (OpenAI) or provider-specific tokenizers
  • Image Models: Parse resolution and steps from request parameters
  • Pricing Database: Updated daily with latest provider pricing (supports GPT-4, Claude, Mistral, Replicate, etc.)

Our cost calculations are typically accurate to Β±$0.0001 compared to provider invoices.
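The token-based calculation reduces to counts times rates. A sketch with made-up per-million-token rates (real rates come from the pricing database, not these numbers):

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost in USD given token counts and per-million-token rates."""
    return (prompt_tokens * price_in_per_m
            + completion_tokens * price_out_per_m) / 1_000_000

# Illustrative rates only, not actual provider pricing
cost = request_cost(1200, 400, price_in_per_m=2.50, price_out_per_m=10.00)
print(f"${cost:.4f}")  # $0.0070
```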

Do you store my prompts or responses?

No, we never store request content or responses. We only log metadata:

  • βœ… Timestamp, model name, token counts, cost
  • βœ… HTTP status code, latency (ms)
  • βœ… Project ID (for grouping)
  • ❌ Prompts, completions, or any user data
  • ❌ IP addresses or identifying information

Your data flows directly from your app β†’ our proxy β†’ provider β†’ back to your app. We're just a transparent middleware layer.

What happens if the proxy goes down?

We guarantee 99.9% uptime with multiple redundancy layers:

  • Multi-region deployment: Edge workers in 200+ locations worldwide
  • Automatic failover: If one region fails, requests route to nearest healthy region
  • Rate limit protection: We cache provider rate limits to prevent cascading failures
  • Graceful degradation: If logging fails, requests still go through (monitoring continues when service recovers)

You can also implement a fallback: if our proxy returns 5xx errors, switch temporarily to direct provider URLs.
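The suggested fallback can be sketched as a small wrapper. Here `send` is any callable you supply that performs the HTTP request against a given base URL and returns `(status_code, body)`; note that requests sent directly to the provider bypass cost tracking:

```python
PROXY = "https://proxy.apicostmonitor.com/v1"
DIRECT = "https://api.openai.com/v1"  # direct provider URL (OpenAI example)

def call_with_fallback(send, path: str):
    """Try the proxy first; on a 5xx error, retry against the provider."""
    status, body = send(PROXY + path)
    if 500 <= status < 600:  # proxy trouble: go direct, lose tracking
        status, body = send(DIRECT + path)
    return status, body
```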

Does the proxy support streaming responses?

Yes, streaming is fully supported! We handle Server-Sent Events (SSE) transparently:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    stream=True  # βœ… Works perfectly!
)

for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Cost calculation happens after the full stream completes. Latency for streaming is identical to non-streaming requests.

Which providers and models are supported?

βœ… Currently Supported:
  • OpenAI: GPT-4o, GPT-4, GPT-3.5, DALL-E 3, Whisper, TTS
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Opus/Haiku
  • Mistral AI: Mistral Large, Medium, Small
  • Replicate: Stable Diffusion, Flux, LLaMA models
🚧 Coming Soon:
  • Google Gemini / Vertex AI
  • Cohere
  • HuggingFace Inference
  • Azure OpenAI

Request a provider: Contact us if you need a specific provider integrated.

Ready to Start Tracking?

Join 500+ developers who trust API Cost Monitor for transparent AI cost tracking.

No credit card required β€’ 14-day free trial β€’ Cancel anytime