LLM Collection

Learn how Infralo structures LLM credentials and access controls across your organization.

Infralo organizes model access into a three-tier hierarchy designed for enterprise-grade control, compliance, and developer convenience.

Instead of embedding raw API keys in application code or granting open access to expensive models, Infralo isolates provider credentials from workloads through a structured model management pipeline:

┌────────────────────────────────────────┐
│         Global LLM Collection          │  ← Admin registers API keys & limits
└──────────────────┬─────────────────────┘
                   │
                   ▼
┌────────────────────────────────────────┐
│     Workspace Whitelist (Collection)   │  ← Workspace managers authorize models
└──────────────────┬─────────────────────┘
                   │
                   ▼
┌────────────────────────────────────────┐
│              Deployments               │  ← Virtual endpoints route to whitelisted models
└────────────────────────────────────────┘

The Three Tiers

1. Global LLM Collection

The Global Collection is the master registry for the entire platform. Platform administrators register models from providers (such as OpenAI, Anthropic, Google, Azure, or private custom endpoints) and associate them with API credentials, default token pricing, and platform-wide rate limits.

  • Goal: Secure credential storage and a centralized model registry.
  • Key Concepts: Credential encryption, connection testing, capability mapping, global RPM/TPM limits.
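
As a rough illustration of what a Global Collection entry might hold, the sketch below models a registry in plain Python. It is not Infralo's API: the class names, field names, and the per-1,000-token pricing unit are all assumptions made for the example. The one point it tries to show is that the registry stores a reference to an encrypted credential rather than the raw key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalModel:
    """One entry in a hypothetical Global Collection (all field names are assumptions)."""
    model_id: str
    provider: str
    credential_ref: str        # pointer to an encrypted secret, never the raw API key
    input_cost_per_1k: float   # assumed pricing unit: currency per 1,000 input tokens
    output_cost_per_1k: float  # assumed pricing unit: currency per 1,000 output tokens
    rpm_limit: int             # platform-wide requests per minute
    tpm_limit: int             # platform-wide tokens per minute
    capabilities: frozenset = frozenset()


class GlobalCollection:
    """In-memory stand-in for the master registry."""

    def __init__(self):
        self._models = {}

    def register(self, model: GlobalModel) -> None:
        # Reject duplicates so each model_id maps to exactly one credential and limit set.
        if model.model_id in self._models:
            raise ValueError(f"model already registered: {model.model_id}")
        self._models[model.model_id] = model

    def get(self, model_id: str) -> GlobalModel:
        return self._models[model_id]

    def ids(self) -> set:
        return set(self._models)
```

A real registry would also persist entries and test the provider connection before accepting a registration; both are omitted here.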

2. Workspace Whitelist (Workspace Collection)

A Workspace Whitelist defines which models from the Global Collection are authorized for use in a specific workspace (e.g. Staging, Production, Marketing-Team).

  • Goal: Granular access control and tenant isolation.
  • Key Concepts: Explicit authorization, toggle states (enabled/disabled), workspace-level compliance.
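
The authorization rule described above can be sketched as a small class: a model is usable in a workspace only if it exists in the Global Collection, has been explicitly authorized, and is currently toggled on. The class and method names below are hypothetical and chosen only for illustration.

```python
class WorkspaceWhitelist:
    """Hypothetical per-workspace whitelist over a set of global model IDs."""

    def __init__(self, workspace: str, global_model_ids):
        self.workspace = workspace
        self._global_ids = set(global_model_ids)
        self._enabled = {}  # model_id -> toggle state (True = enabled)

    def authorize(self, model_id: str, enabled: bool = True) -> None:
        # Only models already registered globally can be whitelisted.
        if model_id not in self._global_ids:
            raise KeyError(f"not in Global Collection: {model_id}")
        self._enabled[model_id] = enabled

    def set_enabled(self, model_id: str, enabled: bool) -> None:
        # Toggling requires prior explicit authorization.
        if model_id not in self._enabled:
            raise KeyError(f"not authorized in {self.workspace}: {model_id}")
        self._enabled[model_id] = enabled

    def is_allowed(self, model_id: str) -> bool:
        # Default-deny: anything not explicitly authorized and enabled is rejected.
        return self._enabled.get(model_id, False)
```

The default-deny check in `is_allowed` is the design choice that gives tenant isolation in this sketch: a model registered globally is invisible to a workspace until someone authorizes it there.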

3. Deployments

Deployments consume a workspace's whitelisted models and route application requests to them via virtual endpoints. (Deployments are configured separately and are covered in the Deployments guide.)

  • Goal: Expose stable routing endpoints for client applications.
  • Key Concepts: Virtual routing targets, model weights, fallback pools.
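
The Deployments guide is the authoritative source for routing behavior. Purely to make "model weights" and "fallback pools" concrete, here is one possible interpretation: pick a primary target at random in proportion to its weight, and walk an ordered fallback list when no primary is available. The `Deployment` class, its parameters, and this selection policy are assumptions, not Infralo's documented algorithm.

```python
import random


class Deployment:
    """Hypothetical virtual endpoint with weighted primaries and an ordered fallback pool."""

    def __init__(self, name, weighted_targets, fallbacks, rng=None):
        self.name = name
        self.weighted_targets = dict(weighted_targets)  # model_id -> relative weight
        self.fallbacks = list(fallbacks)                # tried in order
        self.rng = rng or random.Random()

    def pick_target(self, unavailable=frozenset()):
        # Keep only primaries that are available and carry a positive weight.
        candidates = {
            m: w for m, w in self.weighted_targets.items()
            if m not in unavailable and w > 0
        }
        if candidates:
            models = list(candidates)
            weights = [candidates[m] for m in models]
            return self.rng.choices(models, weights=weights, k=1)[0]
        # No primary is usable: fall back to the first available pool member.
        for m in self.fallbacks:
            if m not in unavailable:
                return m
        raise RuntimeError(f"no available target for deployment {self.name}")
```

Client applications would only ever see the deployment name; which model actually serves a request stays an internal routing decision.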

Core Benefits

  • Credential Decoupling: Application developers never touch or see raw OpenAI/Anthropic/Azure API keys. They only interface with virtual gateway keys scoped to deployments.
  • Granular Cost & Limit Management: Set global RPM and TPM caps to prevent runaway costs, and enter token prices to track workspace expenditure in real time.
  • Simplified Operations: Workspaces manage model availability via whitelists. Developers target stable virtual endpoints without worrying about underlying provider keys, endpoint migrations, or key rotation.
  • Consistent Capabilities: Query model configurations to determine whether downstream clients can use tools, system prompts, or structured outputs.
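
Two of the benefits above reduce to simple calculations, sketched here under stated assumptions: token prices are taken to be quoted per 1,000 tokens, and capabilities are taken to be plain string labels. Both function names are hypothetical.

```python
def request_cost(input_tokens, output_tokens, input_cost_per_1k, output_cost_per_1k):
    """Estimate the cost of one request, assuming prices are per 1,000 tokens."""
    return (input_tokens / 1000) * input_cost_per_1k + (output_tokens / 1000) * output_cost_per_1k


def supports(model_capabilities, required):
    """Return True only if the model advertises every required capability label."""
    return set(required) <= set(model_capabilities)


# Example with placeholder prices (not real provider pricing):
cost = request_cost(2000, 500, input_cost_per_1k=0.5, output_cost_per_1k=1.5)
can_use = supports({"tools", "system_prompt"}, ["tools"])
```

Summing `request_cost` per workspace gives a running expenditure figure, and a client can call something like `supports` before sending a request that depends on tools or structured outputs.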
