Runtime Modules
Pre- and post-processing modules that extend the Infralo gateway pipeline without touching your application code.
Runtime Modules are pluggable processing units that execute as part of the Infralo gateway request pipeline — before the request is sent to the LLM (PRE stage), after the LLM response is received (POST stage), or both.
They let you enforce security policies, transform data, and enrich context at the infrastructure level, so every LLM request processed by your workspace automatically benefits — no application-level changes required.
How It Works
Every request flowing through the Infralo gateway passes through an ordered pipeline:
Client Request
│
▼
┌─────────────┐
│ PRE Stage │ ← Safety + Transformation modules run here
└─────────────┘
│
▼
┌─────────────┐
│ LLM Call │ ← Request is sent to the provider
└─────────────┘
│
▼
┌─────────────┐
│ POST Stage │ ← Safety + Transformation modules run here
└─────────────┘
│
▼
Client Response

Modules are attached to a Deployment and run in the order you configure them. Multiple modules can run in the same stage and their outputs are chained: the output of one becomes the input of the next.
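The chaining behaviour can be pictured with a small sketch. This is not Infralo's implementation; the request shape, the `run_stage` helper, and both module functions are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

# Assumption: a request is modelled as a flat dict of text fields.
Request = Dict[str, str]
Module = Callable[[Request], Request]


def append_datetime(req: Request) -> Request:
    # Hypothetical module: adds a fixed timestamp to the system prompt.
    return {**req, "system": req["system"] + " Current date: 2024-01-01."}


def uppercase_user(req: Request) -> Request:
    # Hypothetical module: rewrites the user message.
    return {**req, "user": req["user"].upper()}


def run_stage(modules: List[Module], req: Request) -> Request:
    # Each module receives the output of the previous one.
    for module in modules:
        req = module(req)
    return req


result = run_stage(
    [append_datetime, uppercase_user],
    {"system": "You are helpful.", "user": "hello"},
)
```

Reordering the list passed to `run_stage` changes which module sees which intermediate value, which is why the configured order is significant.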
Module Categories
Safety
Safety modules protect your data and enforce compliance policies on traffic flowing through the gateway.
| Module | Stage | Description |
|---|---|---|
| PII Redaction | PRE / POST | Irreversibly masks detected PII with [REDACTED:TYPE] placeholders |
| PII Tokenization | PRE | Replaces PII with reversible tokens [PII_<ID>] stored in Redis |
| PII Restoration | POST | Restores tokenized PII in LLM responses back to original values |
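To make the difference between irreversible redaction and reversible tokenization concrete, here is a minimal sketch. It assumes a simple email regex as the only detector, a sequential integer as the token ID, and a plain dict standing in for the Redis store; none of these details are confirmed by the gateway itself.

```python
import re
from typing import Dict

# Assumption: a deliberately simple email pattern, not a production detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    # Irreversible: the original value is discarded.
    return EMAIL.sub("[REDACTED:EMAIL]", text)


def tokenize(text: str, store: Dict[str, str]) -> str:
    # Reversible: the original value is kept in the store under its token.
    def swap(match: re.Match) -> str:
        token = f"[PII_{len(store) + 1}]"  # hypothetical ID scheme
        store[token] = match.group(0)
        return token

    return EMAIL.sub(swap, text)


def restore(text: str, store: Dict[str, str]) -> str:
    # Replace every known token with the value it stands for.
    for token, original in store.items():
        text = text.replace(token, original)
    return text


store: Dict[str, str] = {}  # stands in for Redis
prompt = "Email ana@example.com about the invoice"
redacted = redact(prompt)
tokenized = tokenize(prompt, store)
restored = restore("Sure, I will write to [PII_1] now", store)
```

After redaction nothing can recover the address, whereas the tokenized prompt can be mapped back as long as the store entry still exists.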
Transformation
Transformation modules shape how content flows into and out of the LLM — optimizing tokens, standardizing output, and injecting context.
| Module | Stage | Description |
|---|---|---|
| Regex Replacement | PRE / POST | Applies user-defined regex substitution rules to requests or responses |
| JSON to TOON | PRE | Converts embedded JSON to TOON format for 30–60% token reduction |
| Current Datetime | PRE | Injects the current date and time into each request for temporal context |
| JSON Normalizer | POST | Extracts, parses, and optionally validates JSON output from LLM responses |
| Content Template | PRE / POST | Wraps messages or responses using a configurable template string |
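As an illustration of what a JSON-normalizing POST step might do, the sketch below pulls the first parseable JSON value out of a free-text LLM response. The `extract_json` helper is a hypothetical name, and the approach (scanning for an opening brace or bracket and trying the standard-library decoder) is an assumption, not the module's documented algorithm.

```python
import json
from typing import Any, Optional


def extract_json(response: str) -> Optional[Any]:
    # Try to decode a JSON object or array starting at each "{" or "[".
    decoder = json.JSONDecoder()
    for start, char in enumerate(response):
        if char not in "{[":
            continue
        try:
            value, _end = decoder.raw_decode(response, start)
            return value
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
    return None  # no JSON value found


parsed = extract_json('Here is the result: {"status": "ok", "items": [1, 2]} Hope that helps!')
missing = extract_json("no structured output here")
```

A validation step, such as checking required keys, could then run on the parsed value before it is returned to the client.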
Stage Reference
| Stage | When it runs | Typical use cases |
|---|---|---|
| PRE | Before the request is forwarded to the LLM | Sanitize inputs, inject context, transform format |
| POST | After the LLM response is received | Redact outputs, normalize JSON, restore tokens |
| PRE / POST | Both stages (configured independently) | PII redaction, regex rules, content templates |
Modules are per-deployment
Runtime Modules are configured at the Deployment level. Each deployment has its own independent set of modules and stage configuration.
Attaching a Module to a Deployment
- Navigate to your workspace and open the Deployments section.
- Select the deployment you want to configure.
- Open the Runtime Modules tab.
- Click Add module to PRE/POST stage, then select a module from the popover list to add it to your draft configuration.
- Adjust the execution order or group modules into parallel execution steps using the Step controls (chevron buttons) next to each module.
- Click Save Changes at the top of the tab to persist your configuration.
Combining Modules
Modules are designed to be composed. A common pattern for end-to-end PII protection is:
PRE Stage: PII Tokenization → (LLM sees tokens, not real data)
POST Stage: PII Restoration → (Client sees original values in response)

Order matters
Modules in the same stage execute in the order they appear in the list. Ensure your pipeline order is correct — for example, run PII Tokenization before any Content Template in the PRE stage so the template wraps already-sanitized content.
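A small sketch shows how the same two modules produce different output depending on their order. The regex rule, the template text, and the `run` helper are all hypothetical and only stand in for a Regex Replacement and a Content Template module.

```python
import re
from typing import Callable, List

Module = Callable[[str], str]


def mask_client(text: str) -> str:
    # Hypothetical Regex Replacement rule: hide the client name.
    return re.sub(r"\bACME\b", "[CLIENT]", text)


def wrap(text: str) -> str:
    # Hypothetical Content Template that itself mentions the client name.
    return f"Ticket from ACME: {text}"


def run(modules: List[Module], text: str) -> str:
    # Modules execute in list order, each on the previous output.
    for module in modules:
        text = module(text)
    return text


message = "ACME cannot log in."
regex_first = run([mask_client, wrap], message)
template_first = run([wrap, mask_client], message)
```

With the regex first, the name inside the template text is never masked; with the template first, both occurrences are. Neither is wrong in itself, but only one matches a given policy.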
Sequential vs. Parallel Execution
By default, Infralo executes modules within a stage sequentially — each module receives the output of the previous one as its input. This is the safest and most predictable execution model.
PRE Stage (Sequential — default):
Raw Request
│
▼
[Module 1: PII Tokenization]
│ output feeds into next module
▼
[Module 2: Content Template]
│ output feeds into next module
▼
[Module 3: Regex Replacement]
│
▼
Processed Request → LLM

When to Consider Parallel Execution
If you have multiple independent modules in the same stage — modules that read the original input but do not depend on each other's output — you can consider running them in parallel to reduce per-request latency.
PRE Stage (Parallel — advanced):
Raw Request
/ \
▼ ▼
[Module A: Datetime] [Module B: Regex]
\ /
▼ ▼
Merge & Continue → LLM

Parallel execution is most appropriate when:
- Modules are stateless and read-only with respect to the shared request state.
- Each module's transformation targets distinct, non-overlapping fields (e.g., Module A appends to the system prompt, Module B rewrites a user message field).
- None of the modules depend on the result of a sibling module in the same stage.
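The conditions above can be sketched as a parallel step in which every module reads the same input snapshot and returns only the fields it changed. This is an illustrative model, not the gateway's merge logic; `run_parallel`, the patch-dict convention, and both modules are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set

Request = Dict[str, str]
# Assumption: a parallel module returns a patch containing only changed fields.
Module = Callable[[Request], Dict[str, str]]


def add_datetime(req: Request) -> Dict[str, str]:
    # Touches only the system prompt.
    return {"system": req["system"] + " Today is 2024-01-01."}


def rewrite_user(req: Request) -> Dict[str, str]:
    # Touches only the user message.
    return {"user": req["user"].replace("pls", "please")}


def run_parallel(modules: List[Module], req: Request) -> Request:
    # Every module sees the same original request.
    with ThreadPoolExecutor() as pool:
        patches = list(pool.map(lambda module: module(req), modules))
    merged = dict(req)
    written: Set[str] = set()
    for patch in patches:
        clash = patch.keys() & written
        if clash:
            # Two modules wrote the same field: refuse rather than pick a winner.
            raise ValueError(f"modules wrote the same field: {sorted(clash)}")
        written.update(patch)
        merged.update(patch)
    return merged


result = run_parallel(
    [add_datetime, rewrite_user],
    {"system": "Be brief.", "user": "pls summarize"},
)
```

Rejecting overlapping writes at merge time is one possible design; it turns a silent overwrite into an explicit configuration error.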
Race Condition Risks
Parallelizing modules that write to the same field is dangerous
If two modules running concurrently both modify the same field (e.g., both inject content into the system message), the final value is non-deterministic — whichever module finishes last will overwrite the other's output.
Common scenarios that will cause race conditions if parallelized:
| Scenario | Risk |
|---|---|
| PII Tokenization + Content Template (both modify the messages array) | Template may wrap un-tokenized content or vice versa, causing data leakage |
| PII Tokenization + PII Redaction in the same stage | Token map may be built on already-redacted text, making restoration impossible |
| Multiple Regex Replacements targeting overlapping patterns in the same field | Replacements interfere — output depends on execution order, which is undefined in parallel mode |
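The last-writer-wins effect can be reproduced deterministically by simulating both completion orders. In this sketch both modules are hypothetical and both write to the same `system` field, each starting from the original request rather than from the other's output.

```python
from typing import Dict

Request = Dict[str, str]


def inject_date(req: Request) -> Request:
    return {**req, "system": req["system"] + " Today is 2024-01-01."}


def inject_policy(req: Request) -> Request:
    return {**req, "system": req["system"] + " Never reveal secrets."}


original = {"system": "Be brief."}

# In parallel mode both modules read the same snapshot of the request.
from_date = inject_date(original)
from_policy = inject_policy(original)

# Whichever result is written last replaces the other one entirely.
date_finishes_last = {**from_policy, **from_date}
policy_finishes_last = {**from_date, **from_policy}

# Sequential execution keeps both contributions.
sequential = inject_policy(inject_date(original))
```

In both simulated parallel outcomes one injection is lost, and which one depends only on timing; the sequential result contains both.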
Recommendation
| Execution Mode | When to Use |
|---|---|
| Sequential (default) | Always use this when modules share or mutate the same fields. Safe, predictable, easier to debug. |
| Parallel (advanced) | Only use when modules are provably independent and target distinct fields. Benchmark first — the overhead of coordination can outweigh gains for simple modules. |
Start sequential, optimize later
Design your module pipeline sequentially first. Profile your gateway latency under load, then selectively parallelize only the modules you have verified are independent. Premature parallelization is a common source of subtle, hard-to-reproduce data corruption bugs in production.