The Universal
Intent Framework

An RL framework and finetuning API that teaches any model what to want.
Multimodal alignment for the next generation of agents.

Headline metrics: reward convergence speedup · fewer misaligned actions · API integration time · modalities supported

The Alignment Gap

Context tells a model what happened. It doesn't teach what to want.

Modern architectures optimize surface-level correlations. Without an explicit representation of intent, models break down the moment the objective shifts.

Current Paradigm

Learns what to do, not why

Standard finetuning embeds task-specific behavior into weights through demonstration. The model mimics trajectories without internalizing the underlying objective—leading to brittle generalization and reward hacking under distribution shift.

π(a|s) ≈ argmax_a P(a | context)   // optimizes surface correlation
Target Paradigm

Grounds action in human objectives

Intent Layer introduces an explicit intent representation between perception and action. The model learns to map observations to a structured objective space first, then derives actions—enabling transfer, composability, and alignment by construction.

π(a|s) = argmax_a R(a, intent(s))   // actions grounded in intent
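The contrast is easy to see in a toy sketch. Nothing below is part of the XEROML SDK; infer_intent and reward are hypothetical stand-ins for an intent encoder and a reward model.

from typing import Callable, Dict, List

def intent_grounded_policy(state: Dict,
                           candidate_actions: List[str],
                           infer_intent: Callable[[Dict], Dict],
                           reward: Callable[[str, Dict], float]) -> str:
    """Map the observation to a structured objective first, then pick the
    action that maximizes R(a, intent(s)) rather than P(a | context)."""
    intent = infer_intent(state)          # perception -> objective space
    return max(candidate_actions, key=lambda a: reward(a, intent))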

Reward hacking at scale

Models exploit reward proxies without intent grounding.

Objective non-stationarity

Context-driven models can adapt to new inputs, but not to shifting goals.

Non-composable behaviors

Multi-step reasoning collapses without decomposed intent.

Modality silos

Vision, language, and action need a unified intent space.

“The gap between a model that can follow instructions and one that understands objectives is the same gap between automation and intelligence.”

System Architecture

A programmable intent layer between your model and the world

XEROML sits between model outputs and environment actions. It uses reinforcement learning to shape, filter, and align model intentions—during finetuning or at inference.

architecture.svg
Forward pass: LLM / VLM policy model → Intent Layer (alignment engine) → action (real-world effect) → environment (real or simulated).
Reward loop: environment signal → reward (feedback signal) → RL update (policy gradient) → policy update applied to the model.

XEROML Framework

Full-stack intent engineering suite
Multimodal intent classification
Reward shaping
Real-time alignment
Policy gradient optimization
01

Forward inference

Every model output is intercepted, intent-classified, and either forwarded or corrected before reaching the environment.
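In pseudocode terms, the interception amounts to the gate below; the helper names and threshold are illustrative, not SDK calls.

def forward_inference(model_output, classify_intent, alignment_score, correct, threshold=0.9):
    """Intercept a model output: classify its intent, forward it if aligned,
    otherwise correct it before it reaches the environment."""
    intent = classify_intent(model_output)
    if alignment_score(model_output, intent) >= threshold:
        return model_output                    # forwarded unchanged
    return correct(model_output, intent)       # corrected before acting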

02

Reward shaping

The RL engine adjusts reward signals in real time based on intent alignment scores—no manual reward engineering.
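One simple way to picture the shaping; the linear blend below is illustrative, not XEROML's actual formula.

def shape_reward(task_reward: float, alignment: float, w: float = 0.5) -> float:
    """Blend the raw task reward with an intent-alignment score in [0, 1],
    so actions that game the proxy but miss the intent are discounted."""
    return (1.0 - w) * task_reward + w * alignment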

03

Policy gradient

PPO and GRPO updates bake alignment directly into model weights during finetuning with an intent-conditioned loss.
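For orientation, here is the standard clipped PPO surrogate with the intent dependence pushed into the advantages; a PyTorch sketch only, not XEROML's training code.

import torch

def ppo_clip_loss(logp_new: torch.Tensor,
                  logp_old: torch.Tensor,
                  advantages: torch.Tensor,   # computed from intent-shaped rewards
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Standard clipped PPO objective; intent alignment enters the gradient
    only through the shaped advantages."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()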

04

Inference filtering

Deploy in passthrough mode to filter and correct outputs at inference time with zero weight updates.
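Conceptually, passthrough mode reduces to a gate around an already-trained model; score_alignment and the threshold below are illustrative placeholders, not SDK parameters.

def passthrough_filter(proposed_actions, score_alignment, threshold=0.8):
    """Inference-time gate: keep actions whose intent-alignment score clears
    the threshold, drop the rest. No weight updates involved."""
    return [a for a in proposed_actions if score_alignment(a) >= threshold]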

How intent drives intelligence

One intent layer.
Every form of intelligence.

From language models to robotic arms, intent is the common substrate that turns perception into purposeful action.

intent-architecture.svg
Input modalities: text / NLP (prompts, instructions), vision (images, video, depth), audio (speech, signals), sensors (IMU, force, lidar), environment state (game, simulation).
All modalities feed the Intent Layer, which drives the applications: agentic AI (tool use, planning), embodied AI (manipulation, navigation), reasoning (CoT, verification), safety (constraint enforcement), generation (creative, code, science).
Verticals & Applications

One framework.
Every domain.

From cloud-native agents to physical robots, Intent Layer adapts to your modality and deployment target.

verticals.config
Root Intent: Complete Q3 financial close on time and error-free
  [active]  Reconcile all accounts receivable (constraint: deadline March 15)
  [pending] Generate consolidated P&L across 3 entities (constraint: GAAP compliance)
  [pending] Prepare variance analysis vs. budget
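As a plain-Python sketch of the structure above; IntentNode is a hypothetical type, not XEROML's actual schema.

from dataclasses import dataclass, field
from typing import List

@dataclass
class IntentNode:
    """A node in a decomposed intent tree: an objective, its status,
    and any constraints attached to it."""
    description: str
    status: str = "pending"                    # e.g. "active" | "pending"
    constraints: List[str] = field(default_factory=list)
    children: List["IntentNode"] = field(default_factory=list)

root = IntentNode(
    description="Complete Q3 financial close on time and error-free",
    status="active",
    children=[
        IntentNode("Reconcile all accounts receivable", "active", ["deadline: March 15"]),
        IntentNode("Generate consolidated P&L across 3 entities", "pending", ["GAAP compliance"]),
        IntentNode("Prepare variance analysis vs. budget", "pending"),
    ],
)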
See it in action

Five lines to
aligned agents

Integrate Intent Layer into any finetuning pipeline with our Python SDK. Define intents, attach rewards, and start training.

from intentlayer import IntentLayer, Reward, Intent

# Initialize the intent layer with your model
il = IntentLayer(model="your-model-id", api_key="il_key_...")

# Define intents with natural language constraints
il.add_intent(
    Intent("task_completion", desc="Complete user task accurately"),
    reward=Reward.from_human_feedback(weight=1.0)
)
il.add_intent(
    Intent("safety", desc="Never produce harmful output"),
    reward=Reward.hard_constraint(penalty=-10.0)
)

# Finetune with intent-aware RL
il.train(dataset="your-dataset", epochs=3, method="ppo")

▸ Output

Intent layer initialized · 2 intents registered
Connected to reward backend · human_feedback + hard_constraint
Training started · PPO · 3 epochs
Epoch 1/3 — reward: 0.72 · intent_alignment: 0.84 · safety_violations: 0
Epoch 2/3 — reward: 0.89 · intent_alignment: 0.93 · safety_violations: 0
Epoch 3/3 — reward: 0.94 · intent_alignment: 0.97 · safety_violations: 0
Training complete · model checkpointed → il_ckpt_003
Developer API

One endpoint. Structured intent output.

Send any model's raw output through our API. Get back structured intent classification, alignment scores, and actionable metrics—in real time.

POST https://api.xeroml.com/v1/evaluate
Request Payload
JSON
{
  "model_id": "your-model-v3",
  "input": {
    "modality": "text",
    "prompt": "Book a flight to SF...",
    "context": "user_calendar, travel_preferences"
  },
  "model_output": {
    "actions": [
      "search_flights(SFO, Mar 15-18)",
      "book_hotel(downtown SF, 3 nights)"
    ]
  },
  "intents": ["task_completion", "cost_optimization"],
  "eval_mode": "full"
}
Response
200 OK
{
  "intent_alignment": {
    "overall_score": 0.94,
    "task_completion": 0.97,
    "cost_optimization": 0.88
  },
  "risk_flags": [],
  "action_quality": {
    "hallucination_prob": 0.02,
    "redundant_actions": 0,
    "missing_steps": ["confirm_dates"]
  },
  "reward_signal": 0.91,
  "latency_ms": 18
}
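Calling the endpoint from Python is a single request; a sketch using requests, where the Bearer authorization scheme is an assumption rather than documented behavior.

import requests

payload = {
    "model_id": "your-model-v3",
    "input": {
        "modality": "text",
        "prompt": "Book a flight to SF...",
        "context": "user_calendar, travel_preferences",
    },
    "model_output": {
        "actions": [
            "search_flights(SFO, Mar 15-18)",
            "book_hotel(downtown SF, 3 nights)",
        ]
    },
    "intents": ["task_completion", "cost_optimization"],
    "eval_mode": "full",
}

resp = requests.post(
    "https://api.xeroml.com/v1/evaluate",
    json=payload,
    headers={"Authorization": "Bearer il_key_..."},   # assumed auth scheme
    timeout=10,
)
result = resp.json()
print(result["intent_alignment"]["overall_score"])    # e.g. 0.94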
73%

Failure rate reduction

vs. no intent layer

2.8×

Task efficiency gain

fewer redundant actions

96%

Hallucinations caught

pre-action filtering

18ms

Median latency

p50 overhead

Benchmarks

Numbers that matter

Evaluated against baseline RLHF and vanilla finetuning on standard alignment and capability benchmarks.

97.3%

Intent Alignment Score

AgentBench v2

3.2×

Faster Convergence

vs. vanilla PPO

-47%

Misaligned Actions

vs. RLHF baseline

0.02s

Inference Overhead

p99 latency

Performance comparison

XEROML vs. RLHF Baseline

Why XEROML

Built for production teams

Drop-in integration, real-time observability, and first-class support for every major framework.

Framework Agnostic

Works with PyTorch, JAX, HuggingFace, vLLM, and any custom training loop. Three lines of code to integrate.
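Concretely, the three calls mirror the SDK example above; the model id, key, and dataset names are placeholders.

from intentlayer import IntentLayer, Intent, Reward

# 1. Wrap your model
il = IntentLayer(model="your-model-id", api_key="il_key_...")
# 2. Attach an intent and its reward
il.add_intent(Intent("task_completion", desc="Complete user task accurately"),
              reward=Reward.from_human_feedback(weight=1.0))
# 3. Finetune with intent-aware RL
il.train(dataset="your-dataset", epochs=3, method="ppo")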

Real-Time Intent Dashboard

Monitor intent alignment, reward curves, and policy drift in real time. Set alerts for safety constraint violations.

Train & Inference Modes

Use during finetuning for RL-based alignment, or at inference for real-time intent filtering without retraining.

Multimodal Native

Text, vision, audio, sensor, and action spaces treated as first-class citizens. Cross-modal intent coherence out of the box.

Hard Safety Constraints

Define non-negotiable boundaries as hard constraints, not soft rewards. Formal guarantees for safety-critical deployments.

Self-Hosted or Cloud API

Run entirely on-prem for sensitive workloads, or use our managed API. Same SDK, same interface, your choice of deployment.

Align your models with intent,
not just data

Start building with Intent Layer today. Free tier for research. Enterprise plans for production.