Global LLM Collection

Learn how to register models from supported providers, configure endpoint parameters, and define pricing globally.

             Provider
                 │
                 ▼
       Global LLM Collection     ← You are here
                 │
                 ▼
      Workspace LLM Collection
         ├──────────────┐
         ▼              ▼
   Direct Model    Deployment
         \              /
          \            /
           ▼          ▼
          Virtual API Key
                 │
                 ▼
          Gateway Request

The Global LLM Collection serves as the single source of truth for all machine learning models integrated into your Infralo environment. Platform administrators use it to manage provider API credentials (see Virtual API Keys), control global rate-limiting policies, and record model pricing so that the observability dashboard can compute cost reports.


Registering a Model

Administrators can register models in two ways:

  1. Browse Catalog: Choose from pre-configured popular models (e.g., GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro). Selecting a catalog entry pre-fills the model's capabilities, context window, and default pricing.
  2. Custom Registration: Manually specify provider endpoints, custom model aliases, and parameters. This is useful for private deployments, self-hosted models (e.g., Llama via vLLM), or custom Azure OpenAI resource groups.

Testing Connections

The registration form includes a Test Connection feature. When you run it before saving, the platform sends a minimal ping request to the provider, using the configured API key and endpoint, to verify connectivity.
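The exact request that Test Connection sends is not documented here. As a rough sketch of the idea, assuming an OpenAI-compatible provider that lists models at GET {base}/models with bearer authentication, and an Anthropic-style provider that authenticates with an x-api-key header, a connectivity check could look like the following. The names build_ping_request and check_connection are hypothetical and are not part of Infralo's API.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Assumed public defaults; a configured API Base URL takes precedence.
DEFAULT_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "anthropic": "https://api.anthropic.com/v1",
}


def build_ping_request(
    provider: str, api_key: str, api_base: Optional[str] = None
) -> urllib.request.Request:
    """Build a cheap, read-only request that exercises the endpoint and the key."""
    if api_base is None:
        if provider not in DEFAULT_BASE_URLS:
            raise ValueError(f"provider {provider!r} needs an explicit API base URL")
        api_base = DEFAULT_BASE_URLS[provider]
    base = api_base.rstrip("/")
    if provider == "anthropic":
        headers = {"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    else:
        # OpenAI and OpenAI-compatible servers (e.g. vLLM) use bearer auth.
        headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(f"{base}/models", headers=headers, method="GET")


def check_connection(
    request: urllib.request.Request, timeout: float = 5.0
) -> Tuple[bool, str]:
    """Send the request and report success or a short failure reason."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The endpoint is reachable but rejected the call (e.g. 401 for a bad key).
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError (DNS, refused connection) and socket timeouts.
        return False, str(exc)
```

Separating request construction from sending keeps the first half testable without network access. An HTTP error response still proves the endpoint is reachable, which is why it is reported separately from a transport failure.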


Configuration Settings

The registration form contains several configuration groups:

1. General Info

  • Alias Name: A unique identifier for the model across the platform (e.g., gpt-4o-production-us). This is the identifier that workspaces and deployments reference.
  • Provider: Select your provider vendor:
    • openai
    • anthropic
    • google (Gemini)
    • azure (Azure OpenAI)
    • custom (compatible with standard OpenAI or Anthropic payload structures)
  • Model Type: Select the modality. Options are chat (chat completions) and embedding (vector generation).
  • Provider Model Name: The exact name recognized by the provider's API (e.g., gpt-4-turbo, claude-3-5-sonnet-20240620).
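To make the relationship between these four fields concrete, here is a minimal sketch of how a registration record could be modeled and validated. It assumes the provider and model-type values listed above. The class, field, and function names are hypothetical, and the in-memory dictionary only stands in for whatever storage Infralo actually uses.

```python
from dataclasses import dataclass
from typing import Dict

PROVIDERS = {"openai", "anthropic", "google", "azure", "custom"}
MODEL_TYPES = {"chat", "embedding"}


@dataclass(frozen=True)
class ModelRegistration:
    alias_name: str           # platform-wide unique identifier
    provider: str             # one of PROVIDERS
    model_type: str           # one of MODEL_TYPES
    provider_model_name: str  # exact name the provider's API expects

    def __post_init__(self) -> None:
        if not self.alias_name.strip():
            raise ValueError("alias_name must not be empty")
        if self.provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {self.provider!r}")
        if self.model_type not in MODEL_TYPES:
            raise ValueError(f"unknown model type: {self.model_type!r}")


def register(collection: Dict[str, ModelRegistration], model: ModelRegistration) -> None:
    """Add a model to the collection, enforcing alias uniqueness."""
    if model.alias_name in collection:
        raise ValueError(f"alias already registered: {model.alias_name!r}")
    collection[model.alias_name] = model


# Usage: the alias is what workspaces reference; the provider model name
# is what gets sent upstream.
collection: Dict[str, ModelRegistration] = {}
register(collection, ModelRegistration("gpt-4o-production-us", "openai", "chat", "gpt-4o"))
```

Keeping the alias separate from the provider model name means the upstream model can be swapped without changing the identifier that workspaces and deployments depend on.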

2. Endpoint Settings (Optional)

  • API Base URL: Overrides the provider's standard endpoint. Although this group is otherwise optional, a base URL is required for custom providers (e.g., http://10.0.0.5:8000/v1) and for Azure resource endpoints.
  • API Version: Specify the API version query parameter, primarily used for Azure OpenAI deployments (e.g., 2024-02-15-preview).
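As an illustration of how these two settings typically combine, the sketch below builds a chat-completions URL in the style of the Azure OpenAI REST API, where the API version travels as the api-version query parameter, and a plain URL for an OpenAI-compatible custom endpoint. How Infralo composes URLs internally is not documented here. The resource name my-resource and both function names are hypothetical.

```python
from urllib.parse import urlencode


def azure_chat_url(api_base: str, deployment: str, api_version: str) -> str:
    """Compose an Azure OpenAI style chat-completions URL."""
    base = api_base.rstrip("/")
    query = urlencode({"api-version": api_version})
    return f"{base}/openai/deployments/{deployment}/chat/completions?{query}"


def custom_chat_url(api_base: str) -> str:
    """Compose the URL for an OpenAI-compatible server such as vLLM."""
    return f"{api_base.rstrip('/')}/chat/completions"


azure_url = azure_chat_url(
    "https://my-resource.openai.azure.com/",  # hypothetical resource endpoint
    "gpt-4o",
    "2024-02-15-preview",
)
custom_url = custom_chat_url("http://10.0.0.5:8000/v1")
```

Note that for Azure the path segment is the deployment name chosen in the Azure resource, which may differ from the underlying model name.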

3. Rate Limits

Set rate limits to prevent runaway costs and to avoid exceeding the provider's own limits:

  • RPM Limit: Requests Per Minute. Set to -1 for unlimited.
  • TPM Limit: Tokens Per Minute. Set to -1 for unlimited.
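The enforcement algorithm is not specified on this page. One simple way to read the two settings is a fixed one-minute window that counts both requests and tokens and treats -1 as "no limit". The sketch below shows that interpretation only. The class name and the injectable clock are assumptions made for illustration, and a production gateway may well use a sliding window or token bucket instead.

```python
import time
from typing import Callable

UNLIMITED = -1


class MinuteWindowLimiter:
    """Fixed one-minute window that counts requests and tokens."""

    def __init__(
        self,
        rpm_limit: int,
        tpm_limit: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self._clock = clock
        self._window_start = clock()
        self._requests = 0
        self._tokens = 0

    def allow(self, tokens: int) -> bool:
        """Return True and record usage if the request fits in the current window."""
        now = self._clock()
        if now - self._window_start >= 60.0:
            # A new minute has started: reset both counters.
            self._window_start = now
            self._requests = 0
            self._tokens = 0
        if self.rpm_limit != UNLIMITED and self._requests + 1 > self.rpm_limit:
            return False
        if self.tpm_limit != UNLIMITED and self._tokens + tokens > self.tpm_limit:
            return False
        self._requests += 1
        self._tokens += tokens
        return True
```

A rejected request is not counted against either limit, so a single oversized request cannot starve smaller ones for the rest of the window.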

4. Pricing (Per 1 Million Tokens)

Enter each rate in USD per 1 million tokens (not per single token, despite the field labels) so that Infralo's observability dashboard can compute cost reports:

  • Cost per Input Token: The price for prompt/context tokens.
  • Cost per Output Token: The price for completion/response tokens.
  • Cost per Cached Input Token: The price for prompt tokens that hit provider-side context caches (e.g., Anthropic Prompt Caching or OpenAI Prompt Caching).
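The arithmetic behind a cost report can be sketched as follows: each token count is multiplied by its per-million rate and the sum is divided by 1,000,000. This is an illustration under two assumptions: cached input tokens are counted separately from uncached input tokens, and the prices used are made-up example values rather than any provider's actual rates. The names Pricing and request_cost_usd are hypothetical.

```python
from dataclasses import dataclass

TOKENS_PER_PRICING_UNIT = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Rates in USD per 1 million tokens, as entered on the form."""
    input_per_million: float
    output_per_million: float
    cached_input_per_million: float


def request_cost_usd(
    pricing: Pricing,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
) -> float:
    """Cost of one request; input_tokens excludes cached tokens (assumption)."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
        + cached_input_tokens * pricing.cached_input_per_million
    )
    return total / TOKENS_PER_PRICING_UNIT


# Illustrative rates only.
example = Pricing(input_per_million=2.50, output_per_million=10.00, cached_input_per_million=1.25)
cost = request_cost_usd(example, input_tokens=12_000, output_tokens=1_500, cached_input_tokens=8_000)
```

With these example numbers the request costs (12,000 × 2.50 + 1,500 × 10.00 + 8,000 × 1.25) / 1,000,000 = 0.055 USD.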

5. Capabilities & Metadata

Define model capabilities so that client SDKs and routing engines understand which features are supported:

  • Max Input Tokens: The maximum prompt context length supported.
  • Max Output Tokens: The maximum response length supported.
  • Embedding Dimension: The length of each output embedding vector (applies only to embedding-type models).
  • Capabilities Checkboxes:
    • Supports System Prompt: Can handle system-level instructions.
    • Supports Tool Calls: Can output structured tool calls (function calling).
    • Supports Image Input: Can process multi-modal image inputs.
    • Supports Structured Output: Can constrain its output to comply with a supplied JSON schema.
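As a sketch of how a client SDK or routing engine might consume this metadata, the example below checks a request's features against a model's declared capabilities and returns the reasons the model cannot serve it. The data shapes and the function name unsupported_reasons are hypothetical, and Infralo's actual routing logic is not described on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Capabilities:
    max_input_tokens: int
    max_output_tokens: int
    supports_system_prompt: bool = True
    supports_tool_calls: bool = False
    supports_image_input: bool = False
    supports_structured_output: bool = False


@dataclass(frozen=True)
class RequestFeatures:
    prompt_tokens: int
    requested_output_tokens: int
    has_system_prompt: bool = False
    uses_tools: bool = False
    has_images: bool = False
    wants_json_schema: bool = False


def unsupported_reasons(caps: Capabilities, req: RequestFeatures) -> List[str]:
    """Return every reason the model cannot serve the request (empty if it can)."""
    reasons: List[str] = []
    if req.prompt_tokens > caps.max_input_tokens:
        reasons.append("prompt exceeds max input tokens")
    if req.requested_output_tokens > caps.max_output_tokens:
        reasons.append("requested output exceeds max output tokens")
    if req.has_system_prompt and not caps.supports_system_prompt:
        reasons.append("system prompt not supported")
    if req.uses_tools and not caps.supports_tool_calls:
        reasons.append("tool calls not supported")
    if req.has_images and not caps.supports_image_input:
        reasons.append("image input not supported")
    if req.wants_json_schema and not caps.supports_structured_output:
        reasons.append("structured output not supported")
    return reasons
```

Returning all reasons rather than failing on the first one lets a router log a complete explanation or pick the candidate with the fewest mismatches.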
