Runtime Modules

Pre- and post-processing modules that extend the Infralo gateway pipeline without touching your application code.

Runtime Modules are pluggable processing units that execute as part of the Infralo gateway request pipeline — before the request is sent to the LLM (PRE stage), after the LLM response is received (POST stage), or both.

They let you enforce security policies, transform data, and enrich context at the infrastructure level, so every LLM request processed by your workspace automatically benefits — no application-level changes required.


How It Works

Every request flowing through the Infralo gateway passes through an ordered pipeline:

  Client Request
        │
        ▼
 ┌─────────────┐
 │  PRE Stage  │  ← Safety + Transformation modules run here
 └─────────────┘
        │
        ▼
 ┌─────────────┐
 │  LLM Call   │  ← Request is sent to the provider
 └─────────────┘
        │
        ▼
 ┌─────────────┐
 │  POST Stage │  ← Safety + Transformation modules run here
 └─────────────┘
        │
        ▼
  Client Response

Modules are attached to a Deployment and run in the order you configure them. Multiple modules can run in the same stage and their outputs are chained — the output of one becomes the input of the next.
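
As a rough illustration of chaining, the sketch below models a stage as a list of functions applied in order. The request shape, the two stand-in modules, and `run_stage` are assumptions made for this example, not Infralo's actual interfaces.

```python
import re
from typing import Callable, Dict, List

Request = Dict[str, str]              # hypothetical, simplified request shape
Module = Callable[[Request], Request]


def mask_digits(req: Request) -> Request:
    """Stand-in for a regex rule: replace every run of digits in the user text."""
    return {**req, "user": re.sub(r"\d+", "#", req["user"])}


def wrap_in_template(req: Request) -> Request:
    """Stand-in for a content template: wrap the user text."""
    return {**req, "user": f"Question: {req['user']}"}


def run_stage(modules: List[Module], req: Request) -> Request:
    """Run modules in list order; each one receives the previous output."""
    for module in modules:
        req = module(req)
    return req


result = run_stage([mask_digits, wrap_in_template], {"user": "Order 12345 is late"})
print(result["user"])  # Question: Order # is late
```

The template receives the already-masked text because it runs second.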


Module Categories

Safety

Safety modules protect your data and enforce compliance policies on traffic flowing through the gateway.

| Module | Stage | Description |
| --- | --- | --- |
| PII Redaction | PRE / POST | Irreversibly masks detected PII with [REDACTED:TYPE] placeholders |
| PII Tokenization | PRE | Replaces PII with reversible tokens [PII_<ID>] stored in Redis |
| PII Restoration | POST | Restores tokenized PII in LLM responses back to original values |
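
To make the redaction placeholder format concrete, here is a minimal sketch of irreversible masking. The two regular expressions are illustrative assumptions only; they are not the detectors the PII Redaction module uses, and a real detector would cover many more PII types.

```python
import re

# Hypothetical detectors for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each detected value with a [REDACTED:TYPE] placeholder.

    Nothing is stored, so the original values cannot be recovered.
    """
    for pii_type, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{pii_type}]", text)
    return text


print(redact("Mail jane@example.com or call +1 555-010-9999."))
# Mail [REDACTED:EMAIL] or call [REDACTED:PHONE].
```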

Transformation

Transformation modules shape how content flows into and out of the LLM — optimizing tokens, standardizing output, and injecting context.

| Module | Stage | Description |
| --- | --- | --- |
| Regex Replacement | PRE / POST | Applies user-defined regex substitution rules to requests or responses |
| JSON to TOON | PRE | Converts embedded JSON to TOON format for 30–60% token reduction |
| Current Datetime | PRE | Injects the current date and time into each request for temporal context |
| JSON Normalizer | POST | Extracts, parses, and optionally validates JSON output from LLM responses |
| Content Template | PRE / POST | Wraps messages or responses using a configurable template string |
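
As one example of what a transformation can involve, the sketch below shows a possible way to extract JSON embedded in free-form LLM output, using only the Python standard library. The function name `extract_json` and its scanning strategy are assumptions for illustration; the JSON Normalizer module may work differently.

```python
import json
from typing import Any, Optional


def extract_json(response_text: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in the text, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(response_text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(response_text, start)
            return value
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
    return None


reply = 'Sure! Here is the result: {"status": "ok", "items": [1, 2]} Let me know.'
print(extract_json(reply))  # {'status': 'ok', 'items': [1, 2]}
```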

Stage Reference

| Stage | When it runs | Typical use cases |
| --- | --- | --- |
| PRE | Before the request is forwarded to the LLM | Sanitize inputs, inject context, transform format |
| POST | After the LLM response is received | Redact outputs, normalize JSON, restore tokens |
| PRE / POST | Both stages (configured independently) | PII redaction, regex rules, content templates |

Modules are per-deployment

Runtime Modules are configured at the Deployment level. Each deployment has its own independent set of modules and stage configuration.


Attaching a Module to a Deployment

  1. Navigate to your workspace and open the Deployments section.
  2. Select the deployment you want to configure.
  3. Open the Runtime Modules tab.
  4. Click Add module to PRE/POST stage, then select a module from the popover list to add it to your draft configuration.
  5. Adjust the execution order or group modules into parallel execution steps using the Step controls (chevron buttons) next to each module.
  6. Click Save Changes at the top of the tab to persist your configuration.

Combining Modules

Modules are designed to be composed. A common pattern for end-to-end PII protection is:

PRE Stage:  PII Tokenization  →  (LLM sees tokens, not real data)
POST Stage: PII Restoration   →  (Client sees original values in response)
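
The round trip can be sketched as follows. This is an assumption-laden illustration: a plain dictionary stands in for the Redis token store, a single email regex stands in for PII detection, and the counter-based token IDs are invented for the example.

```python
import re
from typing import Dict

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # hypothetical detector


def tokenize(text: str, vault: Dict[str, str]) -> str:
    """PRE stage: swap each detected value for a token and remember the mapping."""
    def swap(match: re.Match) -> str:
        token = f"[PII_{len(vault) + 1}]"  # hypothetical ID scheme
        vault[token] = match.group(0)
        return token
    return EMAIL.sub(swap, text)


def restore(text: str, vault: Dict[str, str]) -> str:
    """POST stage: put the original values back wherever a token appears."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text


vault: Dict[str, str] = {}  # stands in for the Redis token store
prompt = tokenize("Draft a reply to jane@example.com", vault)
print(prompt)  # Draft a reply to [PII_1]

llm_reply = "Hello [PII_1], thanks for reaching out."  # the LLM only ever saw the token
print(restore(llm_reply, vault))  # Hello jane@example.com, thanks for reaching out.
```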

Order matters

Modules in the same stage execute in the order they appear in the list. Ensure your pipeline order is correct — for example, run PII Tokenization before any Content Template in the PRE stage so the template wraps already-sanitized content.


Sequential vs. Parallel Execution

By default, Infralo executes modules within a stage sequentially — each module receives the output of the previous one as its input. This is the safest and most predictable execution model.

PRE Stage (Sequential — default):

  Raw Request
      │
      ▼
  [Module 1: PII Tokenization]
      │  output feeds into next module
      ▼
  [Module 2: Content Template]
      │  output feeds into next module
      ▼
  [Module 3: Regex Replacement]
      │
      ▼
  Processed Request → LLM

When to Consider Parallel Execution

If you have multiple independent modules in the same stage — modules that read the original input but do not depend on each other's output — you can consider running them in parallel to reduce per-request latency.

PRE Stage (Parallel — advanced):

              Raw Request
             /            \
            ▼              ▼
  [Module A: Datetime]  [Module B: Regex]
            \              /
             ▼            ▼
         Merge & Continue → LLM

Parallel execution is most appropriate when:

  • Modules are stateless and treat the shared request state as read-only, apart from the fields they own.
  • Each module's transformation targets distinct, non-overlapping fields (e.g., Module A appends to the system prompt, Module B rewrites a user message field).
  • None of the modules depend on the result of a sibling module in the same stage.
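
One way to picture a parallel step that satisfies these conditions: every module reads the same original request and returns only the fields it wants to change, and the results are merged afterwards. The patch-based design, the module functions, and the fixed date string are all assumptions for this sketch and do not describe how Infralo merges results internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Request = Dict[str, str]
Patch = Dict[str, str]  # only the fields a module wants to change


def datetime_module(req: Request) -> Patch:
    # Fixed date so the example is reproducible; a real module would use the clock.
    return {"system": req["system"] + " Current date: 2025-01-01."}


def regex_module(req: Request) -> Patch:
    return {"user": req["user"].replace("ASAP", "as soon as possible")}


def run_parallel(modules: List[Callable[[Request], Patch]], req: Request) -> Request:
    with ThreadPoolExecutor() as pool:
        # Every module reads the same original request.
        patches = list(pool.map(lambda module: module(req), modules))
    merged = dict(req)
    touched = set()
    for patch in patches:
        clash = touched.intersection(patch)
        if clash:
            raise ValueError(f"modules wrote the same field(s): {sorted(clash)}")
        touched.update(patch)
        merged.update(patch)
    return merged


result = run_parallel(
    [datetime_module, regex_module],
    {"system": "You are helpful.", "user": "Reply ASAP"},
)
print(result)
```

Rejecting overlapping patches up front turns a silent overwrite into an explicit error.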

Race Condition Risks

Parallelizing modules that write to the same field is dangerous

If two modules running concurrently both modify the same field (e.g., both inject content into the system message), the final value is non-deterministic — whichever module finishes last will overwrite the other's output.

Common scenarios that will cause race conditions if parallelized:

| Scenario | Risk |
| --- | --- |
| PII Tokenization + Content Template (both modify the messages array) | Template may wrap un-tokenized content or vice versa, causing data leakage |
| PII Tokenization + PII Redaction in the same stage | Token map may be built on already-redacted text, making restoration impossible |
| Multiple Regex Replacements targeting overlapping patterns in the same field | Replacements interfere; output depends on execution order, which is undefined in parallel mode |
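
The last-writer-wins effect can be reproduced deterministically by simulating both completion orders. In this sketch, two hypothetical modules each compute a new system message from the same original request; the merge function and field names are assumptions for illustration.

```python
from typing import Dict, List

Request = Dict[str, str]


def merge_in_completion_order(req: Request, patches: List[Request]) -> Request:
    """Apply patches in the order the modules finished; later writes win."""
    merged = dict(req)
    for patch in patches:
        merged.update(patch)
    return merged


req = {"system": "You are helpful."}

# Both modules read the original request and write the same field.
patch_a = {"system": req["system"] + " Today is 2025-01-01."}
patch_b = {"system": req["system"] + " Answer in French."}

a_finishes_last = merge_in_completion_order(req, [patch_b, patch_a])
b_finishes_last = merge_in_completion_order(req, [patch_a, patch_b])

print(a_finishes_last["system"])  # You are helpful. Today is 2025-01-01.
print(b_finishes_last["system"])  # You are helpful. Answer in French.
```

In both cases one module's contribution is lost, and which one depends only on timing.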

Recommendation

| Execution Mode | When to Use |
| --- | --- |
| Sequential (default) | Always use this when modules share or mutate the same fields. Safe, predictable, easier to debug. |
| Parallel (advanced) | Only use when modules are provably independent and target distinct fields. Benchmark first: the overhead of coordination can outweigh gains for simple modules. |

Start sequential, optimize later

Design your module pipeline sequentially first. Profile your gateway latency under load, then selectively parallelize only the modules you have verified are independent. Premature parallelization is a common source of subtle, hard-to-reproduce data corruption bugs in production.
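
One heuristic for checking independence before parallelizing is to run the candidate modules sequentially in every possible order on sample requests and compare the results. The helper below is a sketch under that assumption; the module functions are hypothetical stand-ins, and passing on a handful of samples is evidence of independence, not proof.

```python
from itertools import permutations
from typing import Callable, Dict, List

Request = Dict[str, str]
Module = Callable[[Request], Request]


def add_date(req: Request) -> Request:
    return {**req, "system": req["system"] + " Date: 2025-01-01."}


def expand_asap(req: Request) -> Request:
    return {**req, "user": req["user"].replace("ASAP", "as soon as possible")}


def add_language(req: Request) -> Request:
    return {**req, "system": req["system"] + " Answer in French."}


def is_order_independent(modules: List[Module], samples: List[Request]) -> bool:
    """True if every execution order gives the same output for every sample."""
    for sample in samples:
        outputs = []
        for order in permutations(modules):
            req = dict(sample)
            for module in order:
                req = module(req)
            outputs.append(req)
        if any(out != outputs[0] for out in outputs[1:]):
            return False
    return True


samples = [{"system": "You are helpful.", "user": "Reply ASAP"}]
print(is_order_independent([add_date, expand_asap], samples))   # True
print(is_order_independent([add_date, add_language], samples))  # False
```

The second pair fails because both modules append to the system message, so the result depends on which one runs first.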
