The Universal
Intent Framework

An RL framework and finetuning API that teaches any model what to want.
Multimodal alignment for the next generation of agents.

Headline metrics: reward convergence speedup · fewer misaligned actions · API integration time · modalities supported

The Alignment Gap

Context tells a model what happened. It doesn't teach what to want.

Modern architectures optimize surface-level correlations. Without an explicit representation of intent, models break down the moment the objective shifts.

Current Paradigm

Learns what to do, not why

Standard finetuning embeds task-specific behavior into weights through demonstration. The model mimics trajectories without internalizing the underlying objective—leading to brittle generalization and reward hacking under distribution shift.

π(a|s) ≈ argmax_a P(a | context)   // optimizes surface correlation
Target Paradigm

Grounds action in human objectives

Intent Layer introduces an explicit intent representation between perception and action. The model learns to map observations to a structured objective space first, then derives actions—enabling transfer, composability, and alignment by construction.

π(a|s) = argmax_a R(a, intent(s))   // actions grounded in intent
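The contrast is easy to see in a toy sketch. Nothing below is part of the XEROML SDK; infer_intent and reward are hypothetical stand-ins for an intent encoder and a reward model.

from typing import Callable, Dict, List

def intent_grounded_policy(state: Dict,
                           candidate_actions: List[str],
                           infer_intent: Callable[[Dict], Dict],
                           reward: Callable[[str, Dict], float]) -> str:
    """Map the observation to a structured objective first, then pick the
    action that maximizes R(a, intent(s)) rather than P(a | context)."""
    intent = infer_intent(state)          # perception -> objective space
    return max(candidate_actions, key=lambda a: reward(a, intent))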

Reward hacking at scale

Models exploit reward proxies without intent grounding.

Objective non-stationarity

Context-driven models can adapt to new inputs, but not to shifting goals.

Non-composable behaviors

Multi-step reasoning collapses without decomposed intent.

Modality silos

Vision, language, and action need a unified intent space.

“The gap between a model that can follow instructions and one that understands objectives is the same gap between automation and intelligence.”

System Architecture

A programmable intent layer between your model and the world

XEROML sits between model outputs and environment actions. It uses reinforcement learning to shape, filter, and align model intentions—during finetuning or at inference.

architecture.svg
Forward pass: LLM / VLM policy model → Intent Layer (alignment engine) → action (real-world effect) → environment (real or simulated).
Reward loop: environment signal → reward (feedback signal) → RL update (policy gradient) → policy update applied to the model.

XEROML Framework

Full-stack intent engineering suite
Multimodal intent classification
Reward shaping
Real-time alignment
Policy gradient optimization
01

Forward inference

Every model output is intercepted, intent-classified, and either forwarded or corrected before reaching the environment.
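In pseudocode terms, the interception amounts to the gate below; the helper names and threshold are illustrative, not SDK calls.

def forward_inference(model_output, classify_intent, alignment_score, correct, threshold=0.9):
    """Intercept a model output: classify its intent, forward it if aligned,
    otherwise correct it before it reaches the environment."""
    intent = classify_intent(model_output)
    if alignment_score(model_output, intent) >= threshold:
        return model_output                    # forwarded unchanged
    return correct(model_output, intent)       # corrected before acting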

02

Reward shaping

The RL engine adjusts reward signals in real time based on intent alignment scores—no manual reward engineering.
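One simple way to picture the shaping; the linear blend below is illustrative, not XEROML's actual formula.

def shape_reward(task_reward: float, alignment: float, w: float = 0.5) -> float:
    """Blend the raw task reward with an intent-alignment score in [0, 1],
    so actions that game the proxy but miss the intent are discounted."""
    return (1.0 - w) * task_reward + w * alignment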

03

Policy gradient

PPO and GRPO updates bake alignment directly into model weights during finetuning with an intent-conditioned loss.
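For orientation, here is the standard clipped PPO surrogate with the intent dependence pushed into the advantages; a PyTorch sketch only, not XEROML's training code.

import torch

def ppo_clip_loss(logp_new: torch.Tensor,
                  logp_old: torch.Tensor,
                  advantages: torch.Tensor,   # computed from intent-shaped rewards
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Standard clipped PPO objective; intent alignment enters the gradient
    only through the shaped advantages."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()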

04

Inference filtering

Deploy in passthrough mode to filter and correct outputs at inference time with zero weight updates.
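Conceptually, passthrough mode reduces to a gate around an already-trained model; score_alignment and the threshold below are illustrative placeholders, not SDK parameters.

def passthrough_filter(proposed_actions, score_alignment, threshold=0.8):
    """Inference-time gate: keep actions whose intent-alignment score clears
    the threshold, drop the rest. No weight updates involved."""
    return [a for a in proposed_actions if score_alignment(a) >= threshold]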

How intent drives intelligence

One intent layer.
Every form of intelligence.

From language models to robotic arms, intent is the common substrate that turns perception into purposeful action.

intent-architecture.svg
Input modalities: text / NLP (prompts, instructions), vision (images, video, depth), audio (speech, signals), sensors (IMU, force, lidar), environment state (game, simulation).
All modalities feed the Intent Layer, which drives the applications: agentic AI (tool use, planning), embodied AI (manipulation, navigation), reasoning (CoT, verification), safety (constraint enforcement), generation (creative, code, science).
Verticals & Applications

One framework.
Every domain.

From cloud-native agents to physical robots, Intent Layer adapts to your modality and deployment target.

verticals.config
Root Intent: Complete Q3 financial close on time and error-free
  [active]  Reconcile all accounts receivable (constraint: deadline March 15)
  [pending] Generate consolidated P&L across 3 entities (constraint: GAAP compliance)
  [pending] Prepare variance analysis vs. budget
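As a plain-Python sketch of the structure above; IntentNode is a hypothetical type, not XEROML's actual schema.

from dataclasses import dataclass, field
from typing import List

@dataclass
class IntentNode:
    """A node in a decomposed intent tree: an objective, its status,
    and any constraints attached to it."""
    description: str
    status: str = "pending"                    # e.g. "active" | "pending"
    constraints: List[str] = field(default_factory=list)
    children: List["IntentNode"] = field(default_factory=list)

root = IntentNode(
    description="Complete Q3 financial close on time and error-free",
    status="active",
    children=[
        IntentNode("Reconcile all accounts receivable", "active", ["deadline: March 15"]),
        IntentNode("Generate consolidated P&L across 3 entities", "pending", ["GAAP compliance"]),
        IntentNode("Prepare variance analysis vs. budget", "pending"),
    ],
)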
See it in action

Five lines to
aligned agents

Integrate Intent Layer into any finetuning pipeline with our Python SDK. Define intents, attach rewards, and start training.

from intentlayer import IntentLayer, Reward, Intent

# Initialize the intent layer with your model
il = IntentLayer(model="your-model-id", api_key="il_key_...")

# Define intents with natural language constraints
il.add_intent(
    Intent("task_completion", desc="Complete user task accurately"),
    reward=Reward.from_human_feedback(weight=1.0)
)
il.add_intent(
    Intent("safety", desc="Never produce harmful output"),
    reward=Reward.hard_constraint(penalty=-10.0)
)

# Finetune with intent-aware RL
il.train(dataset="your-dataset", epochs=3, method="ppo")

▸ Output

Intent layer initialized · 2 intents registered
Connected to reward backend · human_feedback + hard_constraint
Training started · PPO · 3 epochs
Epoch 1/3 — reward: 0.72 · intent_alignment: 0.84 · safety_violations: 0
Epoch 2/3 — reward: 0.89 · intent_alignment: 0.93 · safety_violations: 0
Epoch 3/3 — reward: 0.94 · intent_alignment: 0.97 · safety_violations: 0
Training complete · model checkpointed → il_ckpt_003
Developer API

One endpoint. Structured intent output.

Send any model's raw output through our API. Get back structured intent classification, alignment scores, and actionable metrics—in real time.

POST https://api.xeroml.com/v1/evaluate
Request Payload
JSON
{
  "model_id": "your-model-v3",
  "input": {
    "modality": "text",
    "prompt": "Book a flight to SF...",
    "context": "user_calendar, travel_preferences"
  },
  "model_output": {
    "actions": [
      "search_flights(SFO, Mar 15-18)",
      "book_hotel(downtown SF, 3 nights)"
    ]
  },
  "intents": ["task_completion", "cost_optimization"],
  "eval_mode": "full"
}
Response
200 OK
{
  "intent_alignment": {
    "overall_score": 0.94,
    "task_completion": 0.97,
    "cost_optimization": 0.88
  },
  "risk_flags": [],
  "action_quality": {
    "hallucination_prob": 0.02,
    "redundant_actions": 0,
    "missing_steps": ["confirm_dates"]
  },
  "reward_signal": 0.91,
  "latency_ms": 18
}
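Calling the endpoint from Python is a single request; a sketch using requests, where the Bearer authorization scheme is an assumption rather than documented behavior.

import requests

payload = {
    "model_id": "your-model-v3",
    "input": {
        "modality": "text",
        "prompt": "Book a flight to SF...",
        "context": "user_calendar, travel_preferences",
    },
    "model_output": {
        "actions": [
            "search_flights(SFO, Mar 15-18)",
            "book_hotel(downtown SF, 3 nights)",
        ]
    },
    "intents": ["task_completion", "cost_optimization"],
    "eval_mode": "full",
}

resp = requests.post(
    "https://api.xeroml.com/v1/evaluate",
    json=payload,
    headers={"Authorization": "Bearer il_key_..."},   # assumed auth scheme
    timeout=10,
)
result = resp.json()
print(result["intent_alignment"]["overall_score"])    # e.g. 0.94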
73%

Failure rate reduction

vs. no intent layer

2.8×

Task efficiency gain

fewer redundant actions

96%

Hallucinations caught

pre-action filtering

18ms

Median latency

p50 overhead

Benchmarks

Numbers that matter

Evaluated against baseline RLHF and vanilla finetuning on standard alignment and capability benchmarks.

97.3%

Intent Alignment Score

AgentBench v2

3.2×

Faster Convergence

vs. vanilla PPO

-47%

Misaligned Actions

vs. RLHF baseline

0.02s

Inference Overhead

p99 latency

Performance comparison

XEROML vs. RLHF Baseline

Why XEROML

Built for production teams

Drop-in integration, real-time observability, and first-class support for every major framework.

Framework Agnostic

Works with PyTorch, JAX, HuggingFace, vLLM, and any custom training loop. Three lines of code to integrate.
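Concretely, the three calls mirror the SDK example above; the model id, key, and dataset names are placeholders.

from intentlayer import IntentLayer, Intent, Reward

# 1. Wrap your model
il = IntentLayer(model="your-model-id", api_key="il_key_...")
# 2. Attach an intent and its reward
il.add_intent(Intent("task_completion", desc="Complete user task accurately"),
              reward=Reward.from_human_feedback(weight=1.0))
# 3. Finetune with intent-aware RL
il.train(dataset="your-dataset", epochs=3, method="ppo")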

Real-Time Intent Dashboard

Monitor intent alignment, reward curves, and policy drift in real time. Set alerts for safety constraint violations.

Train & Inference Modes

Use during finetuning for RL-based alignment, or at inference for real-time intent filtering without retraining.

Multimodal Native

Text, vision, audio, sensor, and action spaces treated as first-class citizens. Cross-modal intent coherence out of the box.

Hard Safety Constraints

Define non-negotiable boundaries as hard constraints, not soft rewards. Formal guarantees for safety-critical deployments.

Self-Hosted or Cloud API

Run entirely on-prem for sensitive workloads, or use our managed API. Same SDK, same interface, your choice of deployment.

Align your models with intent,
not just data

Start building with Intent Layer today. Free tier for research. Enterprise plans for production.