How It Works
Get full cost visibility into your AI API usage in just 3 simple steps. No code changes required. Works with any programming language.
Integration in 3 Simple Steps
From signup to full cost tracking in under 5 minutes
Create Your Project & Add API Keys
Sign up for a free account, create a project, and securely store your AI provider API keys. We use AES-256 encryption to protect your credentials.
Project Name: My AI Application
Provider: OpenAI
API Key: sk-proj-********************
Status: ✅ Active
Budget Limit: $500/month
Current Usage: $127.45 (25.5%)
Remaining: $372.55
Replace API Base URL
Simply replace the provider's base URL with our proxy endpoint. Your unique project token routes requests through our cost tracking layer.
- We forward your request to the provider (adding under 50ms on average)
- Calculate cost based on usage (tokens/images)
- Log metadata (model, timestamp, latency)
- Return the original response unchanged
- base_url = "https://api.openai.com/v1"
+ base_url = "https://proxy.apicostmonitor.com/v1"
headers = {
    "Authorization": "Bearer YOUR_OPENAI_KEY",
    "X-Project-Token": "acm_proj_abc123xyz"
}
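The four-step flow above can be sketched as a small pass-through handler. This is an illustrative sketch only: the function names, the per-token price argument, and the log fields are assumptions, not the service's actual internals.

```python
import time

def handle_request(request, forward, price_per_token):
    """Forward a request unchanged, compute cost from usage, log metadata.

    forward(request) -> provider response dict containing a 'usage' field.
    Returns the provider's response untouched, plus the metadata record.
    """
    start = time.monotonic()
    response = forward(request)                      # pass-through to the provider
    latency_ms = (time.monotonic() - start) * 1000
    usage = response.get("usage", {})
    cost = usage.get("total_tokens", 0) * price_per_token
    log = {                                          # metadata only, no content
        "model": request.get("model"),
        "timestamp": time.time(),
        "latency_ms": latency_ms,
        "cost_usd": cost,
    }
    return response, log                             # response is unchanged
```

The key property is the last line: the caller receives exactly the object the provider returned, so client code behaves as if the proxy were not there.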
Monitor Costs in Real-Time
Access your personalized dashboard to track costs per project, model, or time period. Set budget alerts and receive notifications before overspending.
Today's Activity:
─────────────────────────────────
Requests: 1,247 (+12% vs yesterday)
Total Cost: $12.45
Avg Latency: 342ms
Top Models:
1. gpt-4o $8.23 (66%)
2. claude-3-sonnet $2.89 (23%)
3. mistral-large $1.33 (11%)
⚠️ Alert: Project "ChatBot" at 85% of budget
Integration Code Examples
Works with any language. Here are the most popular implementations.
from openai import OpenAI

# Initialize client with API Cost Monitor proxy
client = OpenAI(
    api_key="your-openai-api-key",
    base_url="https://proxy.apicostmonitor.com/v1",
    default_headers={
        "X-Project-Token": "acm_proj_YOUR_TOKEN_HERE"
    }
)

# Use exactly as before - no code changes needed!
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ]
)

print(response.choices[0].message.content)

# Cost tracking happens automatically in the background!
# Check your dashboard for real-time cost updates
const OpenAI = require('openai');

// Initialize client with API Cost Monitor proxy
const client = new OpenAI({
    apiKey: 'your-openai-api-key',
    baseURL: 'https://proxy.apicostmonitor.com/v1',
    defaultHeaders: {
        'X-Project-Token': 'acm_proj_YOUR_TOKEN_HERE'
    }
});

// Use exactly as before - no code changes needed!
async function main() {
    const response = await client.chat.completions.create({
        model: 'gpt-4o',
        messages: [
            { role: 'user', content: 'Explain quantum computing' }
        ]
    });
    console.log(response.choices[0].message.content);
}

main();

// Cost tracking happens automatically in the background!
// Check your dashboard for real-time cost updates
curl https://proxy.apicostmonitor.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_OPENAI_API_KEY" \
  -H "X-Project-Token: acm_proj_YOUR_TOKEN_HERE" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ]
  }'

# The response uses the standard OpenAI format
# Cost tracking happens server-side automatically
<?php
// Using the Guzzle HTTP client
use GuzzleHttp\Client;

$client = new Client([
    'base_uri' => 'https://proxy.apicostmonitor.com/v1/',
    'headers' => [
        'Authorization' => 'Bearer YOUR_OPENAI_API_KEY',
        'X-Project-Token' => 'acm_proj_YOUR_TOKEN_HERE',
        'Content-Type' => 'application/json'
    ]
]);

$response = $client->post('chat/completions', [
    'json' => [
        'model' => 'gpt-4o',
        'messages' => [
            ['role' => 'user', 'content' => 'Explain quantum computing']
        ]
    ]
]);

$data = json_decode($response->getBody(), true);
echo $data['choices'][0]['message']['content'];

// Cost tracking happens automatically!
Technical FAQ
Common questions about integration and performance
How much latency does the proxy add?
Our proxy adds less than 50ms of overhead on average. This includes:
- 10-20ms: Request validation and token extraction
- 5-10ms: Database logging (async, non-blocking)
- 20-30ms: Network routing to provider
For AI API calls that take 500ms-5s, this adds roughly 1-10% overhead, which is usually negligible relative to model latency. We use edge computing (Cloudflare Workers) to minimize latency globally.
How do you calculate costs?
We use the exact same pricing models as the providers:
- Text Models: Count input/output tokens using tiktoken (OpenAI) or provider-specific tokenizers
- Image Models: Parse resolution and steps from request parameters
- Pricing Database: Updated daily with latest provider pricing (supports GPT-4, Claude, Mistral, Replicate, etc.)
Our cost calculations are typically accurate to ±$0.0001 compared to provider invoices.
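As a toy illustration of the token-based pricing described above, the sketch below multiplies token counts by per-model rates. The per-million-token prices in the table are placeholders for illustration, not real provider prices.

```python
# USD per 1M tokens -- illustrative placeholder rates, not real prices
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def request_cost(model, input_tokens, output_tokens):
    """Compute the USD cost of one request from its token counts."""
    price = PRICING[model]
    return (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000

# Example: 1,200 input tokens + 450 output tokens on "gpt-4o"
cost = request_cost("gpt-4o", input_tokens=1_200, output_tokens=450)
```

In practice the token counts come from the provider's `usage` field (or a tokenizer such as tiktoken), and the price table is refreshed as providers change their rates.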
Do you store my prompts or responses?
No, we never store request content or responses. We only log metadata:
- ✅ Timestamp, model name, token counts, cost
- ✅ HTTP status code, latency (ms)
- ✅ Project ID (for grouping)
- ❌ Prompts, completions, or any user data
- ❌ IP addresses or identifying information
Your data flows directly from your app → our proxy → provider → back to your app. We're just a transparent middleware layer.
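The metadata-only policy above amounts to an allow-list: keep the safe fields, drop everything else. The field names below are illustrative, not the service's actual schema.

```python
# Allow-listed metadata fields -- illustrative names
ALLOWED_FIELDS = {
    "timestamp", "model", "input_tokens", "output_tokens",
    "cost_usd", "status_code", "latency_ms", "project_id",
}

def to_metadata(record):
    """Strip a raw record down to allow-listed metadata before logging."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

raw = {
    "model": "gpt-4o",
    "project_id": "acm_proj_abc123xyz",
    "cost_usd": 0.0042,
    "prompt": "Explain quantum computing",  # never logged
    "ip_address": "203.0.113.7",            # never logged
}
safe = to_metadata(raw)
```

An allow-list is the safer default here: any field not explicitly approved is dropped, so new sensitive fields can never leak into logs by omission.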
What happens if your proxy goes down?
We guarantee 99.9% uptime with multiple redundancy layers:
- Multi-region deployment: Edge workers in 200+ locations worldwide
- Automatic failover: If one region fails, requests route to nearest healthy region
- Rate limit protection: We cache provider rate limits to prevent cascading failures
- Graceful degradation: If logging fails, requests still go through (monitoring continues when service recovers)
You can also implement a fallback: if our proxy returns 5xx errors, switch temporarily to direct provider URLs.
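That client-side fallback can be sketched as a thin wrapper: try the proxy first, and on a 5xx retry against the provider's direct URL. The helper names are assumptions for illustration; the URLs match the examples on this page.

```python
PROXY_URL = "https://proxy.apicostmonitor.com/v1"
DIRECT_URL = "https://api.openai.com/v1"

def call_with_fallback(send):
    """send(base_url) -> (status_code, body).

    Tries the proxy first; on a 5xx response, retries the same call
    against the provider's direct URL.
    """
    status, body = send(PROXY_URL)
    if 500 <= status < 600:               # proxy-side failure
        status, body = send(DIRECT_URL)   # note: cost tracking is skipped here
    return status, body
```

Requests that take the direct path are not tracked, so treat this as a temporary escape hatch rather than a steady-state configuration.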
Do you support streaming responses?
Yes, streaming is fully supported! We handle Server-Sent Events (SSE) transparently:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    stream=True  # ✅ Works perfectly!
)

for chunk in response:
    # The final chunk's delta has no content, so guard against None
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Cost calculation happens after the full stream completes. Latency for streaming is identical to non-streaming requests.
Which providers and models do you support?
✅ Currently Supported:
- OpenAI: GPT-4o, GPT-4, GPT-3.5, DALL-E 3, Whisper, TTS
- Anthropic: Claude 3.5 Sonnet, Claude 3 Opus/Haiku
- Mistral AI: Mistral Large, Medium, Small
- Replicate: Stable Diffusion, Flux, LLaMA models
🚧 Coming Soon:
- Google Gemini / Vertex AI
- Cohere
- HuggingFace Inference
- Azure OpenAI
Request a provider: Contact us if you need a specific provider integrated.
Ready to Start Tracking?
Join 500+ developers who trust API Cost Monitor for transparent AI cost tracking.
No credit card required • 14-day free trial • Cancel anytime