Quick Start
Get up and running with Infralo in minutes.
Welcome to Infralo! This guide walks you through the essential steps to register a model, enable it in a workspace, create a virtual API key, and send your first request through the Infralo gateway.
Core Architecture
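Infralo decouples provider credentials from client applications. Provider API keys are stored in Infralo, and client applications authenticate to the gateway with workspace-scoped virtual API keys instead. Access flows through the following layers: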
Provider
↓
Global LLM Collection
↓
Workspace LLM Collection
↓
Direct Model or Deployment
↓
Virtual API Key
↓
Gateway Request

Models are registered globally, authorized at the workspace level, and accessed by client applications using virtual API keys.
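To make the chain above concrete, here is a minimal Python sketch of how a gateway with this layering could resolve an incoming request to provider models. It is a conceptual model only, not Infralo's implementation: the `GlobalModel`, `Workspace`, and `Gateway` classes and the `resolve` method are hypothetical names, and the sketch assumes (without confirmation from this guide) that every member of a Deployment must also be whitelisted in the workspace. Key expiry, key scopes, and load-balancing strategy are left out.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalModel:
    # One entry in the Global LLM Collection (hypothetical shape).
    alias: str
    provider: str
    provider_model: str


@dataclass
class Workspace:
    name: str
    whitelist: set[str] = field(default_factory=set)  # enabled model aliases
    deployments: dict[str, list[str]] = field(default_factory=dict)  # name -> aliases


@dataclass
class Gateway:
    collection: dict[str, GlobalModel]
    workspaces: dict[str, Workspace]
    virtual_keys: dict[str, str]  # virtual key -> workspace name

    def resolve(self, virtual_key: str, model: str) -> list[GlobalModel]:
        """Return the provider models a request may be routed to."""
        ws_name = self.virtual_keys.get(virtual_key)
        if ws_name is None:
            raise PermissionError("unknown virtual key")
        ws = self.workspaces[ws_name]
        if model in ws.deployments:
            # Deployment: a virtual name grouping several models.
            # Assumption: each member must also be whitelisted.
            aliases = [a for a in ws.deployments[model] if a in ws.whitelist]
        elif model in ws.whitelist:
            # Direct model call by Alias Name.
            aliases = [model]
        else:
            aliases = []
        if not aliases:
            raise LookupError(f"{model!r} is not available in workspace {ws.name!r}")
        return [self.collection[a] for a in aliases]


gw = Gateway(
    collection={"gpt-4o-prod": GlobalModel("gpt-4o-prod", "openai", "gpt-4o")},
    workspaces={
        "Development": Workspace(
            "Development",
            whitelist={"gpt-4o-prod"},
            deployments={"production-chat-lb": ["gpt-4o-prod"]},
        )
    },
    virtual_keys={"vk_example": "Development"},  # placeholder key
)
print([m.provider_model for m in gw.resolve("vk_example", "gpt-4o-prod")])
```

The client never sees the provider or its credentials; it only names an alias or deployment and presents a virtual key, which is the decoupling described above.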
Step 1: Add an LLM to the Collection
Before applications can call a model, it must be registered in the Global Collection by an administrator.
- Navigate to LLM Collection (`/llms`) in the sidebar.
- Click Browse Catalog to choose from a list of pre-configured popular models (e.g., GPT-4o, Claude 3.5 Sonnet), or click Add LLM to register a custom model endpoint.
- Enter the required details:
  - Alias Name: The identifier used throughout your workspace (e.g., `gpt-4o-prod`).
  - Provider: Select your vendor (e.g., `openai`, `anthropic`, `google`).
  - Provider Model Name: The exact name recognized by the provider (e.g., `gpt-4o`).
  - API Key: Your provider credentials (securely stored).
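The split between Alias Name and Provider Model Name is what keeps clients independent of the provider: clients send the alias, and the gateway presumably substitutes the provider's own model name before forwarding. The sketch below models one registration entry with the four fields from the form. `LLMRegistration` and `to_provider_request` are hypothetical names, not Infralo's API, and `sk-placeholder` is not a real credential.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LLMRegistration:
    """Hypothetical model of one LLM Collection entry; fields mirror the form."""
    alias_name: str           # what clients send in the `model` field
    provider: str             # e.g. "openai", "anthropic", "google"
    provider_model_name: str  # the name the provider itself recognizes
    api_key: str = field(repr=False)  # provider secret, excluded from repr()

    def __post_init__(self) -> None:
        # All four fields are required by the registration form.
        for name in ("alias_name", "provider", "provider_model_name", "api_key"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required")

    def to_provider_request(self, client_body: dict) -> dict:
        """Return a copy of the client body with the alias swapped for the provider name."""
        if client_body.get("model") != self.alias_name:
            raise ValueError("request does not target this alias")
        return {**client_body, "model": self.provider_model_name}


entry = LLMRegistration("gpt-4o-prod", "openai", "gpt-4o", api_key="sk-placeholder")
print(entry)  # api_key is omitted from the printed representation
print(entry.to_provider_request({"model": "gpt-4o-prod", "messages": []}))
```

Marking `api_key` with `field(repr=False)` keeps the secret out of printed objects and stray log lines, in the same spirit as keeping provider credentials stored on the gateway side only.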
Step 2: Create a Workspace
Workspaces isolate environments (e.g., Staging, Production) and control which members, API keys, and models are active.
- Click the workspace switcher in the sidebar and select Create Workspace (or go to the Workspaces page).
- Enter a name (e.g., `Development`).
- Click Create.
Step 3: Whitelist LLMs to the Workspace
By default, a workspace has access to no models. You must explicitly enable models from the Global Collection.
- In your workspace, go to the Model Whitelist page (`/workspaces/[wsId]/models`).
- Click Manage Whitelist to open the Model Whitelist Manager.
- Under Global Registry (left panel), search for your registered model and click Whitelist.
- Under Workspace Whitelist (right panel), set the model's status toggle to Enabled.
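Whitelisting and enabling are two separate actions in the steps above, so a model can sit on the Workspace Whitelist with its toggle off. The sketch below expresses the resulting check, assuming (this guide does not say so explicitly) that a disabled entry is treated like a missing one when routing. `Status`, `WhitelistEntry`, `is_callable`, and the `claude-staging` alias are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"


@dataclass
class WhitelistEntry:
    alias_name: str
    status: Status


def is_callable(whitelist: dict[str, WhitelistEntry], alias: str) -> bool:
    """A model is routable only if it is whitelisted AND its toggle is Enabled."""
    entry = whitelist.get(alias)
    return entry is not None and entry.status is Status.ENABLED


workspace_whitelist = {
    "gpt-4o-prod": WhitelistEntry("gpt-4o-prod", Status.ENABLED),
    "claude-staging": WhitelistEntry("claude-staging", Status.DISABLED),
}
for alias in ("gpt-4o-prod", "claude-staging", "not-whitelisted"):
    print(alias, is_callable(workspace_whitelist, alias))
```

If a request for a model fails after you whitelisted it, the toggle state is the first thing to check.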
Step 4: Generate a Virtual API Key
Virtual API keys (prefixed with `vk_`) authenticate client applications sending requests to the gateway.
- Navigate to API Keys (`/workspaces/[wsId]/api-keys`).
- Click Generate Key.
- Name your key (e.g., `Backend Application Key`) and select an expiration policy.
- Click Create and copy your virtual key.
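Treat the copied virtual key like any other secret and keep it out of source code. A minimal sketch that reads it from an environment variable follows; the variable name `INFRALO_API_KEY` and the `load_virtual_key` helper are our own hypothetical choices, not something Infralo defines. The `vk_` prefix check only guards against pasting the wrong kind of key, such as a provider key.

```python
import os


def load_virtual_key(var: str = "INFRALO_API_KEY") -> str:
    """Read the workspace virtual key from the environment (hypothetical variable name)."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"set {var} to the virtual key from the API Keys page")
    if not key.startswith("vk_"):
        # Virtual keys are prefixed with vk_; anything else is likely the wrong key.
        raise ValueError(f"{var} does not look like a virtual key (expected 'vk_' prefix)")
    return key


# Usage with the OpenAI SDK client from Step 5:
#   client = OpenAI(api_key=load_virtual_key(), base_url="http://localhost:8000/api/v1")
```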
Granular Controls
Infralo supports restrictive access policies on keys, such as scoping a key to specific models, deployments, or API paths. For this Quick Start, keep the default "Allow All" permissions.
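To illustrate what such scoping means, here is a sketch of a key policy check in which an empty scope stands for "Allow All". The `KeyPolicy` class and its matching rules (exact names for models and deployments, prefix matching for API paths) are assumptions made for illustration and are not taken from Infralo's documentation; `/api/v1/other` is a hypothetical path.

```python
from dataclasses import dataclass, field


@dataclass
class KeyPolicy:
    """Hypothetical key scope; an empty set means 'Allow All' for that dimension."""
    models: set[str] = field(default_factory=set)
    deployments: set[str] = field(default_factory=set)
    path_prefixes: set[str] = field(default_factory=set)

    def permits(self, path: str, model: str, is_deployment: bool) -> bool:
        # Check the model or deployment name against the matching scope.
        targets = self.deployments if is_deployment else self.models
        if targets and model not in targets:
            return False
        # Check the request path against the allowed prefixes.
        if self.path_prefixes and not any(path.startswith(p) for p in self.path_prefixes):
            return False
        return True


allow_all = KeyPolicy()
scoped = KeyPolicy(models={"gpt-4o-prod"}, path_prefixes={"/api/v1/chat/"})
print(allow_all.permits("/api/v1/chat/completions", "gpt-4o-prod", False))
print(scoped.permits("/api/v1/chat/completions", "gpt-4o-prod", False))
print(scoped.permits("/api/v1/other", "gpt-4o-prod", False))  # hypothetical path
```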
Step 5: Send Your First Request
Now you can send requests to Infralo's gateway. The gateway exposes standard OpenAI-compatible endpoints, and the value of the `model` field determines how a request is routed:
- Direct Model Call: Set `model` to the Alias Name of a whitelisted model.
- Load-Balanced Call: Set `model` to a Deployment Name, a virtual endpoint that groups multiple models.
Authentication Headers
Authenticate client applications using either:
- Standard Bearer token header: `Authorization: Bearer vk_...`
- Direct API key header: `x-api-key: vk_...`
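Both headers carry the same virtual key, so use whichever your HTTP client or tooling handles more easily. The sketch below builds (but does not send) a chat completion request with Python's standard-library `urllib.request`, using the local gateway URL `http://localhost:8000/api/v1` from this guide and a placeholder key; `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # local gateway URL used in this guide
VIRTUAL_KEY = "vk_your_workspace_api_key"  # placeholder


def build_chat_request(model: str, content: str, use_bearer: bool = True) -> urllib.request.Request:
    """Build a POST to /chat/completions authenticated with a virtual key."""
    if use_bearer:
        auth = {"Authorization": f"Bearer {VIRTUAL_KEY}"}
    else:
        auth = {"x-api-key": VIRTUAL_KEY}
    body = json.dumps({"model": model, "messages": [{"role": "user", "content": content}]})
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", **auth},
        method="POST",
    )


req = build_chat_request("gpt-4o-prod", "Hello Infralo!", use_bearer=False)
print(req.full_url, req.header_items())
# To send it against a running gateway: urllib.request.urlopen(req)
# (non-2xx responses raise urllib.error.HTTPError).
```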
Code Examples
Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    # Paste your workspace virtual API key here
    api_key="vk_your_workspace_api_key",
    # Set the gateway base URL
    base_url="http://localhost:8000/api/v1",
)

# Example 1: Direct Model Call (routes directly to a specific whitelisted model)
response = client.chat.completions.create(
    model="gpt-4o-prod",
    messages=[
        {"role": "user", "content": "Hello Infralo!"}
    ],
)
print("Direct Model Response:", response.choices[0].message.content)

# Example 2: Deployment Call (routes through a load-balanced deployment pool)
# (Uncomment after setting up a Deployment)
# response = client.chat.completions.create(
#     model="production-chat-lb",
#     messages=[
#         {"role": "user", "content": "Hello Infralo Load Balancer!"}
#     ],
# )
# print("Deployment Response:", response.choices[0].message.content)
```

cURL

```bash
# Direct Model Call
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer vk_your_workspace_api_key" \
  -d '{
    "model": "gpt-4o-prod",
    "messages": [
      {"role": "user", "content": "Hello Infralo!"}
    ]
  }'
```

Step 6: Verify in Observability
Monitor gateway traffic and trace model execution in real time.
- Verify Logs: Go to Logs (`/workspaces/[wsId]/logs`) to watch your request resolve in real time (see Workspace Observability). Click the request to open the Trace View and inspect:
  - Request latency, status, and exact token counts.
  - The underlying provider model that executed the request.
  - Cache status and module execution spans.
- Review Metrics: Go to Metrics (`/workspaces/[wsId]/metrics`) to view dashboards for overall request volume, latency distribution, cache savings, and cost analytics (see Workspace Observability).
Expected Result
If your setup is working correctly, you should observe:
- 200 OK response returned from your SDK or cURL command.
- Request log populated under Logs within seconds.
- Telemetry and token usage recorded in the workspace Metrics dashboard.
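With cURL you get the raw JSON body rather than an SDK object. Assuming the gateway returns the standard OpenAI chat completion shape (its OpenAI-compatible endpoints suggest so, though this guide shows no sample response), a sketch like the following pulls out the reply text and the token usage you can compare against the Trace View. `summarize_completion` is a hypothetical helper, and the `sample` body is illustrative, not captured gateway output.

```python
import json


def summarize_completion(raw: str) -> dict:
    """Extract the reply and token counts from an OpenAI-style chat completion body."""
    data = json.loads(raw)
    usage = data.get("usage") or {}
    return {
        "content": data["choices"][0]["message"]["content"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Illustrative body in the OpenAI chat completion format (not real Infralo output).
sample = json.dumps({
    "choices": [{"index": 0,
                 "message": {"role": "assistant", "content": "Hello!"},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 11, "completion_tokens": 2, "total_tokens": 13},
})
print(summarize_completion(sample))
```

To use it with the cURL command, save the response with `curl ... -o response.json` and pass the file's contents to `summarize_completion`.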
What's Next?
Now that you've successfully sent your first request, explore these additional capabilities:
- Deployments — Create virtual endpoints that provide load balancing, failover, circuit breakers, and response caching.
- Runtime Modules — Customize request and response processing with prompt transformations, PII protection, validation, and other gateway extensions.
- Observability — Monitor request logs, traces, token usage, latency, cache performance, and costs across your gateway traffic.
- Identity & Access Management — Manage users, workspace memberships, roles, permissions, and Single Sign-On (SSO/OIDC) for secure access control.
- Audit Logs — Review administrative activity and configuration changes for compliance and operational auditing.