Workspace Observability
Debug requests, trace execution spans, and analyze workspace costs and caching metrics.
Workspace Observability is designed for developers, AI engineers, and workspace managers. Scoped strictly to a single workspace, it provides the granular diagnostic detail needed to inspect request payloads, verify the behavior of pre- and post-processing runtime modules, and optimize model performance and costs.
Access & Permissions
Observability details inside a workspace are restricted to users with appropriate workspace-level permissions:
- View Logs: Allows members to search logs and inspect request payloads and trace spans.
- View Metrics: Allows members to access the workspace-level metrics dashboards.
Request Logs & Drawer Details
The workspace logs page (/workspaces/[wsId]/logs) is the primary console for debugging requests. Click any request row to open the Log Detail Drawer:
- Metadata Header: Displays the unique Request ID, Trace ID, workspace API key hint used to authenticate the client, and custom end-user identifiers.
- Payloads Tab: Shows, as full JSON, the exact prompt or messages sent to the gateway and the final text or JSON response returned by the LLM.
- Token Usage: Displays the counts of input tokens, output tokens, and tokens saved by context caching.
- Cost Metrics: Breaks the request cost down into uncached input cost, cached input cost, and output cost, in micro-dollars (1 micro-dollar = $0.000001), calculated automatically from your collection pricing.
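The cost split in the drawer can be reproduced by hand. The sketch below is a minimal Python illustration, assuming rates are quoted in USD per 1M tokens and that the cached-token count is a subset of the input-token count; the `Pricing` and `Usage` types, their field names, and the example rates are hypothetical, not Infralo's schema or pricing. It relies on one convenient identity: a rate of $P per 1M tokens is exactly P micro-dollars per token.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-model rates, in USD per 1M tokens."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


@dataclass(frozen=True)
class Usage:
    input_tokens: int         # all prompt tokens, cached ones included (assumption)
    cached_input_tokens: int  # subset of input_tokens served from the context cache
    output_tokens: int


def cost_breakdown_micro_usd(usage: Usage, pricing: Pricing) -> dict[str, float]:
    """Split one request's cost into uncached input, cached input, and output.

    A rate of $P per 1M tokens is exactly P micro-dollars per token,
    so tokens * P is already a micro-dollar amount.
    """
    if not 0 <= usage.cached_input_tokens <= usage.input_tokens:
        raise ValueError("cached_input_tokens must lie within [0, input_tokens]")
    uncached = usage.input_tokens - usage.cached_input_tokens
    breakdown = {
        "uncached_input": uncached * pricing.input_per_m,
        "cached_input": usage.cached_input_tokens * pricing.cached_input_per_m,
        "output": usage.output_tokens * pricing.output_per_m,
    }
    # Sum is computed before the "total" key is added.
    breakdown["total"] = sum(breakdown.values())
    return breakdown


costs = cost_breakdown_micro_usd(
    Usage(input_tokens=1200, cached_input_tokens=1000, output_tokens=300),
    Pricing(input_per_m=3.0, cached_input_per_m=0.3, output_per_m=15.0),
)
print(costs)
```

With these illustrative rates, the request costs 600 + 300 + 4,500 = 5,400 micro-dollars, or $0.0054.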
Distributed Tracing & Spans
When a workspace deployment is configured with multiple models (such as in a Fallback failover configuration) or chains multiple Runtime Modules, Infralo tracks the full lifecycle of the request using Trace Spans.
A visual Trace Canvas maps the request's execution path from start to finish. Each span records its own start time, end time, and latency in milliseconds.
Trace Span Types
- Gateway Spans: Shows the initial request interception, routing selection, and endpoint classification (chat, embeddings, responses, models).
- Module Spans: Logs the latencies, inputs, and outputs of Pre-processing (PRE stage) and Post-processing (POST stage) runtime modules (e.g., how long a PII Tokenization module took and the exact string substitutions it made).
- LLM Spans: Records the connection to the model provider. If a deployment triggers a retry or failover (e.g., in a Fallback configuration), each attempt is logged as an independent child span (e.g., Attempt #1, Attempt #2) showing the exact target model and error messages.
- Cache Spans: Logs response cache lookups and hits.
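To make the parent/child structure concrete, here is a minimal Python sketch of a span tree like the one the Trace Canvas draws, where each span's latency is its end time minus its start time and Fallback attempts sit as children of the LLM span. The `Span` fields, the module names, and the model names `model-a`/`model-b` are hypothetical, and times are assumed to be millisecond offsets from the start of the trace; this illustrates the idea rather than Infralo's actual trace format or span ordering.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical span record; field names are illustrative, not Infralo's schema."""
    name: str
    kind: str                  # "gateway", "module", "llm", or "cache"
    start_ms: float            # offset from the start of the trace, in ms (assumption)
    end_ms: float
    error: str | None = None
    children: list[Span] = field(default_factory=list)

    @property
    def latency_ms(self) -> float:
        return self.end_ms - self.start_ms


def render(span: Span, depth: int = 0) -> list[str]:
    """Flatten a span tree into indented lines, depth-first."""
    line = f"{'  ' * depth}[{span.kind}] {span.name}: {span.latency_ms:.0f} ms"
    if span.error:
        line += f" (error: {span.error})"
    lines = [line]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# A Fallback request: the first attempt times out, the second succeeds.
trace = Span("request (chat)", "gateway", 0, 950, children=[
    Span("PII Tokenization (PRE)", "module", 2, 10),
    Span("model call", "llm", 10, 930, children=[
        Span("Attempt #1 model-a", "llm", 10, 510, error="timeout"),
        Span("Attempt #2 model-b", "llm", 512, 930),
    ]),
    Span("Response Formatter (POST)", "module", 930, 945),
])
print("\n".join(render(trace)))
```

The printed tree nests the 500 ms timed-out attempt and the 418 ms successful retry under the 920 ms model call, which is the shape to look for when diagnosing failover overhead.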
Workspace Analytics Dashboards
The workspace metrics dashboard (/workspaces/[wsId]/metrics) opens with a high-level KPI Card Grid showing total requests, cost, tokens, average latency, and cache hit rate, followed by five analytics tabs.
1. Usage & Cost
Tracks consumption and throughput over time:
- Trends: Charts request volume, token counts, and cost over time to track integration activity.
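Trend charts like these are typically built by bucketing request logs into fixed time windows. The Python sketch below assumes hourly buckets and a hypothetical `LogRecord` row with timezone-aware UTC timestamps; the dashboard's real bucket size and data model are not specified here.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical per-request log row; timestamps should all be UTC-aware."""
    timestamp: datetime
    total_tokens: int
    cost_micro_usd: float


def hourly_trends(records: list[LogRecord]) -> dict[datetime, dict[str, float]]:
    """Aggregate requests, tokens, and cost into hourly buckets, oldest first."""
    buckets: dict[datetime, dict[str, float]] = defaultdict(
        lambda: {"requests": 0, "tokens": 0, "cost_micro_usd": 0.0}
    )
    for record in records:
        # Truncate the timestamp to the start of its hour.
        hour = record.timestamp.replace(minute=0, second=0, microsecond=0)
        bucket = buckets[hour]
        bucket["requests"] += 1
        bucket["tokens"] += record.total_tokens
        bucket["cost_micro_usd"] += record.cost_micro_usd
    return dict(sorted(buckets.items()))


def at(hour: int, minute: int) -> datetime:
    """Helper for the example: a UTC timestamp on an arbitrary date."""
    return datetime(2024, 5, 1, hour, minute, tzinfo=timezone.utc)


trends = hourly_trends([
    LogRecord(at(9, 5), 100, 50.0),
    LogRecord(at(9, 40), 200, 70.0),
    LogRecord(at(10, 1), 50, 10.0),
])
for hour, stats in trends.items():
    print(hour.isoformat(), stats)
```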
2. Performance
Analyzes latency and response distributions:
- Latency Percentiles: Displays P50 (median), P95, and P99 latency markers to help developers track tail-latency degradation.
- Model Latencies: Compares the average response time of different whitelisted models within the workspace.
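For reference, here is one common way to compute P50/P95/P99 from raw latencies: the nearest-rank method, sketched in Python. Which percentile definition Infralo uses (nearest-rank, linear interpolation, or an approximate streaming estimator) is not stated, so results on the dashboard may differ slightly from this sketch; the sample latencies are made up.

```python
from __future__ import annotations

import math


def percentile_nearest_rank(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [120, 95, 110, 400, 105, 98, 130, 2500, 115, 101]
for p in (50, 95, 99):
    print(f"P{p}: {percentile_nearest_rank(latencies_ms, p)} ms")
```

Here the median is 110 ms while P95 and P99 are both 2,500 ms: a single slow request dominates the tail, and with only ten samples both tail percentiles collapse onto the maximum. Tail markers become meaningful only with enough traffic in the selected window.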
3. Models & Deployments
Provides granular breakdown per execution target:
- Usage & Cost Share: Shows which models and deployments consume the largest share of your budget or process the most requests.
- Detailed Table: Lists request count, token volume, average latency, and costs grouped by model alias.
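The per-model table is a group-by over request logs. The Python sketch below groups by model alias, computes request count, token volume, average latency, total cost, and each alias's share of cost, then sorts by cost. The `RequestLog` row and the aliases `chat-default`/`chat-fast` are hypothetical examples, not real Infralo data.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestLog:
    """Hypothetical log row; model_alias values below are made up."""
    model_alias: str
    total_tokens: int
    latency_ms: float
    cost_micro_usd: float


def per_model_table(logs: list[RequestLog]) -> list[dict[str, object]]:
    """Group logs by model alias; rows are sorted by cost, highest first."""
    acc: dict[str, dict[str, float]] = defaultdict(
        lambda: {"requests": 0, "tokens": 0, "latency_sum": 0.0, "cost": 0.0}
    )
    for log in logs:
        a = acc[log.model_alias]
        a["requests"] += 1
        a["tokens"] += log.total_tokens
        a["latency_sum"] += log.latency_ms
        a["cost"] += log.cost_micro_usd

    total_cost = sum(a["cost"] for a in acc.values())
    rows = [
        {
            "model_alias": alias,
            "requests": a["requests"],
            "tokens": a["tokens"],
            "avg_latency_ms": a["latency_sum"] / a["requests"],
            "cost_micro_usd": a["cost"],
            "cost_share": a["cost"] / total_cost if total_cost else 0.0,
        }
        for alias, a in acc.items()
    ]
    rows.sort(key=lambda row: row["cost_micro_usd"], reverse=True)
    return rows


table = per_model_table([
    RequestLog("chat-default", 1000, 800.0, 6000.0),
    RequestLog("chat-default", 500, 600.0, 3000.0),
    RequestLog("chat-fast", 400, 200.0, 1000.0),
])
for row in table:
    print(row)
```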
4. Cache
Tracks the optimization metrics generated by Infralo's Response Cache (configured in Deployments):
- Hit Rate: The percentage of requests served directly from the cache.
- Savings Analysis: Reports the number of input and output tokens saved and the cumulative cost savings.
- Latency Gain: Compares cache hit latency (sub-millisecond) against cache miss latency (a full round trip to the model provider).
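The three cache metrics can be derived from per-request records as sketched below in Python. The `CacheEvent` type is hypothetical, and the sketch assumes each hit carries the token counts and provider cost of the response it replayed, so "savings" means the provider work a hit avoided. Latency gain is taken as mean miss latency minus mean hit latency.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEvent:
    """Hypothetical per-request cache record."""
    cache_hit: bool
    input_tokens: int
    output_tokens: int
    # On a miss: what the provider call cost. On a hit: what it would have
    # cost, e.g. taken from the originally cached response (assumption).
    provider_cost_micro_usd: float
    latency_ms: float


def _mean(xs: list[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def cache_report(events: list[CacheEvent]) -> dict[str, float]:
    hits = [e for e in events if e.cache_hit]
    misses = [e for e in events if not e.cache_hit]
    hit_latency = _mean([e.latency_ms for e in hits])
    miss_latency = _mean([e.latency_ms for e in misses])
    return {
        "hit_rate": len(hits) / len(events) if events else 0.0,
        "input_tokens_saved": sum(e.input_tokens for e in hits),
        "output_tokens_saved": sum(e.output_tokens for e in hits),
        "cost_saved_micro_usd": sum(e.provider_cost_micro_usd for e in hits),
        "avg_hit_latency_ms": hit_latency,
        "avg_miss_latency_ms": miss_latency,
        # Only meaningful when both hits and misses were observed.
        "latency_gain_ms": miss_latency - hit_latency if hits and misses else 0.0,
    }


report = cache_report([
    CacheEvent(True, 500, 200, 4000.0, 0.4),
    CacheEvent(True, 300, 100, 2000.0, 0.6),
    CacheEvent(False, 500, 200, 4000.0, 900.0),
    CacheEvent(False, 800, 300, 6000.0, 1100.0),
])
print(report)
```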
5. Insights
Surfaces usage patterns and optimization audits:
- Distributions: Visualizes request finish reasons (e.g., stop, length, tool_calls, content_filter).
- Optimization: Flags anomalies and highlights opportunities for token/cost optimizations.
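As an illustration of how a finish-reason distribution can feed an optimization hint, the Python sketch below computes each reason's share of requests and applies one hypothetical rule: in OpenAI-style APIs, a "length" finish reason means generation stopped at the token limit, so a high "length" share suggests truncated responses. Infralo's actual anomaly rules are not described here, and the 5% threshold is an arbitrary example.

```python
from __future__ import annotations

from collections import Counter


def finish_reason_distribution(reasons: list[str]) -> dict[str, float]:
    """Share of requests per finish reason, most common first."""
    counts = Counter(reasons)
    total = sum(counts.values())
    return {reason: n / total for reason, n in counts.most_common()}


def truncation_hint(distribution: dict[str, float], threshold: float = 0.05) -> str | None:
    """Hypothetical rule: flag when too many responses stop at the token limit."""
    share = distribution.get("length", 0.0)
    if share > threshold:
        return f"{share:.0%} of responses hit the length limit; consider raising the output token limit."
    return None


dist = finish_reason_distribution(["stop"] * 7 + ["length"] * 2 + ["tool_calls"])
print(dist)
print(truncation_hint(dist))
```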