Observability Overview
Understand how Infralo captures, processes, and structures real-time telemetry across your machine learning workflows.
Infralo is designed for real-time visibility into your AI workloads. Every transaction passing through the API gateway, whether a direct model completion or a request routed to a load-balanced deployment, is intercepted to collect operational telemetry.
This data is used to optimize costs, diagnose routing configurations, ensure data compliance, and track provider performance.
ClickHouse Backend
Infralo stores and processes all observability data using ClickHouse, a columnar database built specifically for real-time analytics.
- High-throughput Ingestion: The gateway streams logs asynchronously, so telemetry collection adds no latency to the user request.
- Instant Analytics: Real-time analytical queries calculate latency averages, token shares, caching ratios, and USD expenditures in milliseconds, even under massive transaction volume.
- Lifecycle Management: Telemetry records are automatically partitioned by month and retained for 90 days before deletion.
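As an illustration of the asynchronous ingestion pattern described above, the following minimal sketch shows how a request path can hand events to a background worker without waiting on storage. It is not Infralo's implementation: `TelemetryBuffer`, `handle_requests`, the batch size, and the in-memory `flushed_batches` list (standing in for database inserts) are all hypothetical names chosen for this example.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TelemetryBuffer:
    """Hypothetical buffer that decouples request handling from log writes."""

    batch_size: int = 100
    queue: asyncio.Queue = field(default_factory=asyncio.Queue)
    # Stands in for batched inserts into the analytics database (assumption).
    flushed_batches: list = field(default_factory=list)

    def record(self, event: dict) -> None:
        # Non-blocking enqueue: the request path never waits on storage.
        self.queue.put_nowait(event)

    def close(self) -> None:
        # A None sentinel tells the worker to flush the remainder and stop.
        self.queue.put_nowait(None)

    async def drain(self) -> None:
        # Background worker: group queued events into batches and "write" them.
        batch = []
        while True:
            event = await self.queue.get()
            if event is None:
                break
            batch.append(event)
            if len(batch) >= self.batch_size:
                self.flushed_batches.append(batch)
                batch = []
        if batch:
            self.flushed_batches.append(batch)


async def handle_requests(n: int, batch_size: int) -> list:
    """Simulate n gateway requests that each emit one telemetry event."""
    buffer = TelemetryBuffer(batch_size=batch_size)
    worker = asyncio.create_task(buffer.drain())
    for i in range(n):
        buffer.record({"request_id": i, "status": 200})
    buffer.close()
    await worker
    return buffer.flushed_batches
```

Calling `asyncio.run(handle_requests(5, 2))` yields three batches of sizes 2, 2, and 1. Batching matters in this kind of design because columnar stores generally handle a few large inserts better than many single-row inserts.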
Observability Scopes
Observability in Infralo is separated into two logical scopes to isolate data and match user access permissions:
- Global Observability: For Platform Administrators and DevOps. Focuses on cross-workspace aggregation, platform-wide provider costs, error distribution, and global usage analytics.
- Workspace Observability: For Workspace Owners and AI Engineers. Focuses on day-to-day debugging, inspection of prompt payloads, execution step tracing, and workspace-scoped budget tracking.
Telemetry Types
Infralo collects five dimensions of telemetry for every transaction:
- Logs: Structured event records capturing request parameters, identity context (key hints, end-user IDs), timestamps, and HTTP results.
- Traces: Sequential execution timelines mapping how a single gateway call traverses routing, Runtime Modules, and downstream provider connections.
- Metrics: Aggregated measurements of overall volume trends, error rates, and throughput.
- Cost Tracking: Micro-dollar pricing calculated in real time based on input, output, and cached token prices set in the Global Registry.
- Cache Observability: Measures tokens and dollars saved, as well as latency gains, generated by the Response Cache.
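To make the cost-tracking arithmetic concrete, here is a minimal sketch of a per-request cost calculation in integer micro-dollars. The `ModelPricing` structure, the per-million-token price unit, and the treatment of cached tokens as billed separately from input tokens are assumptions made for this example; the actual Global Registry schema and rounding rules may differ.

```python
from dataclasses import dataclass

MICROS_PER_USD = 1_000_000
TOKENS_PER_MTOK = 1_000_000


@dataclass(frozen=True)
class ModelPricing:
    # Hypothetical prices, in micro-dollars per one million tokens.
    input_per_mtok: int
    output_per_mtok: int
    cached_per_mtok: int


def request_cost_micros(
    pricing: ModelPricing,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int,
) -> int:
    """Return the request cost in whole micro-dollars (rounded down).

    Assumes input_tokens excludes cached_tokens, so each token is
    billed at exactly one of the three rates.
    """
    total = (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
        + cached_tokens * pricing.cached_per_mtok
    )
    # Integer arithmetic avoids floating-point drift when summing many requests.
    return total // TOKENS_PER_MTOK


def micros_to_usd(micros: int) -> float:
    """Convert micro-dollars to USD for display purposes only."""
    return micros / MICROS_PER_USD
```

For example, with assumed prices of $3, $15, and $0.30 per million input, output, and cached tokens, `request_cost_micros(ModelPricing(3_000_000, 15_000_000, 300_000), 1000, 500, 2000)` returns `11100`, or about $0.0111.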