Global Observability
Platform-wide visibility, cross-workspace cost tracking, and provider analytics for Infralo administrators.
Global Observability provides centralized monitoring and system-wide telemetry across the entire Infralo platform. It is designed for platform administrators, DevOps, and operations teams who need to oversee resource utilization, verify provider health, and manage multi-tenant budgets.
Access & Permissions
Global Observability tools require system-wide permissions (see Roles & Permissions) so that workspace members cannot view sensitive cross-workspace data.
- Required Permissions:
  - View Global Logs: Grants access to the global logs console (/observability/logs).
  - View Global Metrics: Grants access to the global metrics dashboard (/observability/metrics).
Global Logs
The Global Logs console aggregates requests from all workspaces and deployments. It provides an operational log stream for system-level monitoring.
- Platform Auditing: Search and filter requests across the entire gateway by timestamp, workspace, status, or model provider.
- Troubleshooting Outages: Track error spikes across multiple workspaces to distinguish between provider outages (e.g., Anthropic returning 503) and workspace-specific configuration issues.
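The outage-versus-configuration distinction above can be approximated with a simple heuristic: errors on one provider that appear in several workspaces point to the provider, while errors confined to a single workspace point to that workspace's setup. The following is a minimal sketch, assuming log entries have been exported as dictionaries. The field names (workspace, provider, status), the function name, and the threshold are hypothetical and are not part of Infralo.

```python
from collections import defaultdict


def classify_error_spikes(records, min_workspaces=2):
    """Label each provider's errors by how many workspaces they affect.

    `records` is an iterable of dicts with hypothetical keys
    "workspace", "provider", and "status" (an HTTP status code).
    """
    affected = defaultdict(set)
    for rec in records:
        if rec["status"] >= 400:  # treat 4xx and 5xx responses as errors
            affected[rec["provider"]].add(rec["workspace"])

    labels = {}
    for provider, workspaces in affected.items():
        if len(workspaces) >= min_workspaces:
            labels[provider] = "likely provider outage"
        else:
            labels[provider] = "likely workspace-specific issue"
    return labels


# Illustrative data only; not real Infralo log output.
sample_logs = [
    {"workspace": "team-a", "provider": "anthropic", "status": 503},
    {"workspace": "team-b", "provider": "anthropic", "status": 503},
    {"workspace": "team-c", "provider": "anthropic", "status": 200},
    {"workspace": "team-a", "provider": "openai", "status": 401},
    {"workspace": "team-a", "provider": "openai", "status": 401},
    {"workspace": "team-b", "provider": "openai", "status": 200},
]

print(classify_error_spikes(sample_logs))
```

This is only a heuristic: a widely shared misconfiguration would also span workspaces, so the labels are a starting point for investigation rather than a diagnosis.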
Global Metrics & Dashboards
The global metrics dashboard compiles platform-wide analytics into four functional tabs:
1. Overview
Aggregates high-level system indicators:
- KPI Cards: Tracks total requests, total costs in USD, average latency, total tokens, and cache hit rate across all workspaces.
- Trend Charts: Visualizes Request Volume, Cost, Latency, and Token Usage patterns over time.
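To make the KPI definitions concrete, the sketch below derives the same five indicators from a list of request records. It assumes a hypothetical record shape (cost_usd, latency_ms, tokens, cache_hit); how Infralo computes these values internally is not described here, so treat this as an illustration of the arithmetic only.

```python
def summarize_kpis(records):
    """Compute overview KPIs from request records.

    Each record is a dict with hypothetical keys "cost_usd",
    "latency_ms", "tokens", and "cache_hit" (a bool).
    """
    total_requests = len(records)
    if total_requests == 0:
        return {
            "total_requests": 0,
            "total_cost_usd": 0.0,
            "avg_latency_ms": 0.0,
            "total_tokens": 0,
            "cache_hit_rate": 0.0,
        }

    cache_hits = sum(1 for r in records if r["cache_hit"])
    return {
        "total_requests": total_requests,
        "total_cost_usd": sum(r["cost_usd"] for r in records),
        "avg_latency_ms": sum(r["latency_ms"] for r in records) / total_requests,
        "total_tokens": sum(r["tokens"] for r in records),
        "cache_hit_rate": cache_hits / total_requests,
    }


# Illustrative data only.
sample_requests = [
    {"cost_usd": 0.5, "latency_ms": 100, "tokens": 1000, "cache_hit": False},
    {"cost_usd": 0.25, "latency_ms": 200, "tokens": 500, "cache_hit": False},
    {"cost_usd": 0.0, "latency_ms": 300, "tokens": 500, "cache_hit": True},
    {"cost_usd": 0.25, "latency_ms": 400, "tokens": 2000, "cache_hit": False},
]

print(summarize_kpis(sample_requests))
```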
2. Workspaces
Provides tenant-specific usage comparisons:
- Compare total request volumes, cumulative costs, and cache hit ratios and savings across all workspaces (e.g., development environments versus production deployments).
- Identify high-usage tenants or sudden budget spikes.
3. Models
Tracks vendor-level load and cost distributions:
- Provider & Model Share: Visualizes request volume and cost distribution across OpenAI, Anthropic, Google, Azure, or Custom endpoints.
- Model Efficiency: Lists request count, token volume, average latency, and costs grouped by model name.
4. Reliability
Monitors the overall health and error rates of the gateway:
- Tracks total error rates and status code distributions (2xx, 4xx, 5xx).
- Monitors error types and platform-wide retry activities to verify how failover rules are performing under load.
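As an illustration of the status code distribution, the sketch below buckets HTTP status codes into classes and derives an overall error rate. It assumes that 4xx and 5xx responses both count as errors, which may differ from how the dashboard defines its error rate. The function name is hypothetical.

```python
from collections import Counter


def status_class_distribution(status_codes):
    """Return (counts per status class, error rate) for a list of HTTP codes.

    Assumption: any 4xx or 5xx response counts as an error.
    """
    counts = Counter(f"{code // 100}xx" for code in status_codes)
    total = len(status_codes)
    errors = counts.get("4xx", 0) + counts.get("5xx", 0)
    error_rate = errors / total if total else 0.0
    return dict(counts), error_rate


# Illustrative data only.
sample_codes = [200, 200, 200, 200, 200, 200, 429, 404, 503, 500]

distribution, error_rate = status_class_distribution(sample_codes)
print(distribution, error_rate)
```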
Typical Use Cases
- Cost Allocation & Showback: Retrieve cost-per-workspace metrics to allocate AI spend back to specific teams or projects.
- Canary & Routing Optimization: Audit provider latency and error trends to refine deployment routing weights or switch default providers.
- SLA Auditing: Verify that Azure OpenAI or private custom endpoints are meeting performance agreements compared to public endpoints.
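The showback use case reduces to grouping request costs by workspace and expressing each total as a share of overall spend. The sketch below shows that calculation on hypothetical exported records. The field names workspace and cost_usd and the function name are assumptions, not a documented Infralo export format.

```python
from collections import defaultdict


def cost_by_workspace(records):
    """Sum cost per workspace and compute each workspace's share of total spend.

    Each record is a dict with hypothetical keys "workspace" and "cost_usd".
    """
    totals = defaultdict(float)
    for rec in records:
        totals[rec["workspace"]] += rec["cost_usd"]

    grand_total = sum(totals.values())
    return {
        workspace: {
            "cost_usd": cost,
            "share": cost / grand_total if grand_total else 0.0,
        }
        for workspace, cost in totals.items()
    }


# Illustrative data only.
sample_costs = [
    {"workspace": "team-a", "cost_usd": 1.5},
    {"workspace": "team-a", "cost_usd": 1.5},
    {"workspace": "team-b", "cost_usd": 1.0},
]

for workspace, row in cost_by_workspace(sample_costs).items():
    print(f"{workspace}: ${row['cost_usd']:.2f} ({row['share']:.0%})")
```

The per-workspace totals can then be mapped to teams or projects using whatever ownership records the organization maintains.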