Workspace Observability

Debug requests, trace execution spans, and analyze workspace costs and caching metrics.

Workspace Observability is designed for developers, AI engineers, and workspace managers. Scoped strictly to a single workspace, it provides the granular diagnostic details needed to inspect request payloads, verify pre- and post-processing runtime modules, and optimize model performance and costs.

Access & Permissions

Observability details inside a workspace are restricted to users with appropriate workspace-level permissions:

View Logs: Allows members to search logs and inspect request payloads and trace spans.
View Metrics: Allows members to access the workspace-level metrics dashboards.

Request Logs & Drawer Details

The workspace logs console (/workspaces/[wsId]/logs) is the primary place for developers to search and inspect individual requests. Click any request row to open the Log Detail Drawer:

Metadata Header: Displays the unique Request ID, Trace ID, workspace API key hint used to authenticate the client, and custom end-user identifiers.
Payloads Tab: Offers full JSON code blocks showing the exact prompt/messages input sent to the gateway and the final text/JSON response returned from the LLM.
Token Usage: Displays the counts for input tokens, output tokens, and tokens saved by context caching.
Cost Metrics: Reports uncached input cost, cached input cost, and output cost in micro-dollars (millionths of a dollar), calculated automatically from your collection pricing.
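
To make the micro-dollar arithmetic concrete, here is a minimal sketch of how a per-request cost breakdown could be derived from token counts. The Pricing class, its field names, the per-million-token price unit, and the example prices are all assumptions for illustration; they are not taken from Infralo's pricing schema.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices, in micro-dollars per one million tokens."""
    uncached_input: int
    cached_input: int
    output: int


def request_cost_micros(pricing, uncached_input_tokens, cached_input_tokens, output_tokens):
    """Return a per-component cost breakdown in whole micro-dollars."""

    def component(tokens, price_per_million):
        # Integer math with round-half-up avoids floating-point drift.
        return (tokens * price_per_million + TOKENS_PER_MILLION // 2) // TOKENS_PER_MILLION

    breakdown = {
        "uncached_input": component(uncached_input_tokens, pricing.uncached_input),
        "cached_input": component(cached_input_tokens, pricing.cached_input),
        "output": component(output_tokens, pricing.output),
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Assumed prices: $3.00, $0.30, and $15.00 per million tokens.
pricing = Pricing(uncached_input=3_000_000, cached_input=300_000, output=15_000_000)
cost = request_cost_micros(pricing, uncached_input_tokens=1_200,
                           cached_input_tokens=800, output_tokens=500)
print(cost)
```

With these assumed prices the example request totals 11,340 micro-dollars, a little over one cent. Keeping amounts as integers in micro-dollars means many small requests can be summed without rounding error.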

Distributed Tracing & Spans

When a workspace deployment is configured with multiple models (such as a Fallback configuration) or chains multiple Runtime Modules, Infralo tracks the lifecycle of the request using Trace Spans.

A visual Trace Canvas maps out the execution path of the request from start to finish. Each step records its own start time, end time, and latency in milliseconds.

Trace Span Types

Gateway Spans: Show the initial request interception, routing selection, and endpoint classification (chat, embeddings, responses, models).
Module Spans: Log the latencies, inputs, and outputs of Pre-processing (PRE stage) and Post-processing (POST stage) runtime modules (e.g., how long a PII Tokenization module took and the exact string substitutions it made).
LLM Spans: Record the connection to the model provider. If a deployment triggers a retry or failover (e.g., in a Fallback configuration), each attempt is logged as an independent child span (e.g., Attempt #1, Attempt #2) showing the exact target model and any error messages.
Cache Spans: Log response cache lookups and hits.
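
The span types above form a tree, with retry or failover attempts nested under the LLM span. The following is a rough sketch of that shape, not Infralo's actual trace schema: the Span class, its field names, the span names, and all timings are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Span:
    name: str
    kind: str  # assumed values: "gateway", "module", "llm", "cache"
    start_ms: int
    end_ms: int
    error: Optional[str] = None
    children: List["Span"] = field(default_factory=list)

    @property
    def latency_ms(self) -> int:
        # Latency is simply end time minus start time.
        return self.end_ms - self.start_ms


def walk(span: Span, depth: int = 0) -> Iterator[Tuple[int, Span]]:
    """Yield (depth, span) pairs in execution order, parents first."""
    yield depth, span
    for child in sorted(span.children, key=lambda s: s.start_ms):
        yield from walk(child, depth + 1)


# Hypothetical trace: a PRE module, a cache miss, one failed attempt, one success.
trace = Span("gateway", "gateway", 0, 940, children=[
    Span("pii-tokenization (PRE)", "module", 2, 14),
    Span("response-cache lookup", "cache", 14, 15),
    Span("llm", "llm", 15, 930, children=[
        Span("Attempt #1", "llm", 15, 320, error="upstream timeout"),
        Span("Attempt #2", "llm", 320, 930),
    ]),
    Span("pii-detokenization (POST)", "module", 930, 939),
])

rows = [(depth, s.name, s.latency_ms) for depth, s in walk(trace)]
failed = [s.name for _, s in walk(trace) if s.error]

for depth, name, latency in rows:
    print(f"{'  ' * depth}{name}: {latency} ms")
```

Reading the tree this way shows where time went: in the made-up trace, the failed first attempt accounts for 305 ms of the 915 ms spent in the LLM span.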

Workspace Analytics Dashboards

The workspace metrics dashboard (/workspaces/[wsId]/metrics) compiles telemetry into five analytics tabs, preceded by a high-level KPI Card Grid showing total requests, costs, tokens, average latency, and cache hit rates.

1. Usage & Cost

Tracks consumption and throughput over time:

Trends: Charts request volume, token counts, and cost over time to track integration activity.

2. Performance

Analyzes latency and response distributions:

Latency Percentiles: Displays P50 (median), P95, and P99 latency markers to help developers track tail-latency degradation.
Model Latencies: Compares the average response time of different whitelisted models within the workspace.
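
To illustrate why percentiles matter more than averages for tail latency, here is a small sketch using the nearest-rank method. The method is an assumption; this page does not state how the dashboard computes its percentiles, and the sample latencies are invented.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not sorted_values:
        raise ValueError("percentile of an empty sample is undefined")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


# Invented request latencies in milliseconds, with two slow outliers.
latencies_ms = sorted([120, 135, 150, 160, 180, 210, 240, 300, 900, 2500])

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"P50={p50} ms  P95={p95} ms  P99={p99} ms  mean={mean} ms")
```

In this sample the median is 180 ms while P95 and P99 both land on the 2500 ms outlier, and the mean of 489.5 ms describes neither the typical request nor the worst one.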

3. Models & Deployments

Provides a granular breakdown per execution target:

Usage & Cost Share: Displays which models and deployments are consuming the majority of your budget or processing the most requests.
Detailed Table: Lists request count, token volume, average latency, and costs grouped by model alias.
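
The grouping behind a per-alias table can be sketched as a simple aggregation. The record fields (model_alias, total_tokens, latency_ms, cost_micros) and the alias names are hypothetical and are not Infralo's log schema.

```python
from collections import defaultdict

# Invented request records; field names are assumptions.
logs = [
    {"model_alias": "fast", "total_tokens": 900, "latency_ms": 200, "cost_micros": 1_800},
    {"model_alias": "fast", "total_tokens": 1_100, "latency_ms": 300, "cost_micros": 2_200},
    {"model_alias": "smart", "total_tokens": 2_000, "latency_ms": 1_500, "cost_micros": 16_000},
]


def summarize_by_alias(records):
    """Group records by model alias and compute counts, averages, and cost share."""
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0, "latency_sum": 0, "cost_micros": 0})
    for r in records:
        row = totals[r["model_alias"]]
        row["requests"] += 1
        row["tokens"] += r["total_tokens"]
        row["latency_sum"] += r["latency_ms"]
        row["cost_micros"] += r["cost_micros"]

    grand_cost = sum(row["cost_micros"] for row in totals.values())
    return {
        alias: {
            "requests": row["requests"],
            "tokens": row["tokens"],
            "avg_latency_ms": row["latency_sum"] / row["requests"],
            "cost_micros": row["cost_micros"],
            # Share of total spend; guarded against an all-zero cost sample.
            "cost_share": row["cost_micros"] / grand_cost if grand_cost else 0.0,
        }
        for alias, row in totals.items()
    }


table = summarize_by_alias(logs)
for alias, row in table.items():
    print(alias, row)
```

Here the "smart" alias handles one request out of three but accounts for 80% of the cost, which is the kind of imbalance the Usage & Cost Share view is meant to surface.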

4. Cache

Tracks the optimization metrics generated by Infralo's Response Cache (configured in Deployments):

Hit Rate: The percentage of requests served directly from the cache.
Savings Analysis: Calculates the count of input/output tokens saved and cumulative financial cost savings.
Latency Gain: Compares cache hit times (sub-millisecond) against cache miss times (provider network latency).
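
As a rough sketch, the three cache metrics above could be derived from per-request records like this. The field names are hypothetical, the sample values are invented, and the example assumes each cache hit records the cost the provider call would otherwise have incurred (avoided_cost_micros), which this page does not confirm.

```python
# Invented records; field names are assumptions, not Infralo's log schema.
records = [
    {"cache_hit": False, "input_tokens": 1000, "output_tokens": 200, "latency_ms": 800.0, "avoided_cost_micros": 0},
    {"cache_hit": True, "input_tokens": 1000, "output_tokens": 200, "latency_ms": 0.25, "avoided_cost_micros": 6000},
    {"cache_hit": False, "input_tokens": 500, "output_tokens": 100, "latency_ms": 600.0, "avoided_cost_micros": 0},
    {"cache_hit": True, "input_tokens": 500, "output_tokens": 100, "latency_ms": 0.75, "avoided_cost_micros": 3000},
    {"cache_hit": False, "input_tokens": 2000, "output_tokens": 400, "latency_ms": 1000.0, "avoided_cost_micros": 0},
]


def average(values):
    """Mean of a list, or None when the list is empty."""
    return sum(values) / len(values) if values else None


def cache_summary(rows):
    hits = [r for r in rows if r["cache_hit"]]
    misses = [r for r in rows if not r["cache_hit"]]
    return {
        # Hit rate: fraction of requests served directly from the cache.
        "hit_rate": len(hits) / len(rows) if rows else 0.0,
        # Savings: tokens and cost that hits did not send to the provider.
        "tokens_saved": sum(r["input_tokens"] + r["output_tokens"] for r in hits),
        "cost_saved_micros": sum(r["avoided_cost_micros"] for r in hits),
        # Latency gain: compare average hit time against average miss time.
        "avg_hit_latency_ms": average([r["latency_ms"] for r in hits]),
        "avg_miss_latency_ms": average([r["latency_ms"] for r in misses]),
    }


summary = cache_summary(records)
print(summary)
```

In this sample, two of five requests hit the cache (a 40% hit rate), saving 1,800 tokens and 9,000 micro-dollars, with hits averaging 0.5 ms against 800 ms for misses.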

5. Insights

Provides advanced patterns and optimization audits:

Distributions: Visualizes request finish reasons (e.g., stop, length, tool_calls, content_filter).
Optimization: Flags anomalies and highlights opportunities for token/cost optimizations.
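
A finish-reason distribution is a frequency count over the reasons listed above. The sketch below uses an invented sample, and the 20% threshold for flagging truncated responses is an arbitrary illustration, not a rule the dashboard is known to apply.

```python
from collections import Counter

# Invented sample of finish reasons from eight requests.
finish_reasons = ["stop", "stop", "length", "tool_calls",
                  "stop", "content_filter", "length", "stop"]

counts = Counter(finish_reasons)
total = sum(counts.values())

# Share of requests per finish reason, most common first.
distribution = {reason: count / total for reason, count in counts.most_common()}

# Hypothetical check: a high share of "length" suggests responses are being
# cut off by the output token limit.
TRUNCATION_THRESHOLD = 0.20  # arbitrary value for illustration
truncation_flag = distribution.get("length", 0.0) > TRUNCATION_THRESHOLD

for reason, share in distribution.items():
    print(f"{reason}: {share:.1%}")
print("possible truncation issue:", truncation_flag)
```

A rising share of length finishes usually means the output limit is too low for the prompts in use, while a rising share of content_filter points at the inputs themselves.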
