Decision Pipeline
How rules, ML, and LLM escalation work together to make fast decisions.
The decision pipeline is the core of Sparkient. Every call to /decide passes through up to three stages. Each stage is more powerful (and slower) than the one before, and a request only goes as far as it needs to.
The Three Stages
Input → [1. Rules] → [2. Classifier] → [3. Escalation] → Response
Typical latency per stage: Rules < 1ms, Classifier < 100ms, Escalation 150ms+
Stage 1: Hard Rules (CEL)
Latency: < 1ms
CEL (Common Expression Language) rules are evaluated first. If any rule matches, its decision is returned immediately. This is the fastest path: pure logic, with no ML involved.
Use this for:
- Compliance requirements ("always block amounts over $50,000")
- Known patterns ("reject if the user is banned")
- Rate limiting ("escalate if more than 10 requests in 1 minute")
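The short-circuit behaviour of Stage 1 can be sketched as below. This is a minimal illustration, not Sparkient's rule engine. Plain Python predicates stand in for compiled CEL expressions. The field names (`amount`, `user_banned`, `requests_last_minute`), the rule names, the decision labels, and the first-match ordering are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Sequence

# Hypothetical stand-in for a compiled CEL expression: a predicate over the request input.
Predicate = Callable[[Mapping[str, Any]], bool]


@dataclass(frozen=True)
class Rule:
    name: str                # reported back in rules_triggered
    when: Predicate          # plays the role of the CEL expression
    decision: str            # decision label returned on a match (assumed labels)
    escalate: bool = False   # whether a match asks for escalation


# Illustrative rules; field names, rule names and decision labels are assumptions.
RULES = [
    # Roughly the CEL expression: input.amount > 50000
    Rule("block_large_amount", lambda x: x.get("amount", 0) > 50_000, "reject"),
    # Roughly the CEL expression: input.user_banned == true
    Rule("banned_user", lambda x: bool(x.get("user_banned", False)), "reject"),
    # Roughly the CEL expression: input.requests_last_minute > 10
    Rule("rate_limit", lambda x: x.get("requests_last_minute", 0) > 10, "flag", escalate=True),
]


def evaluate_rules(inp: Mapping[str, Any], rules: Sequence[Rule] = RULES) -> Optional[dict]:
    """Return a rules-stage result for the first matching rule, or None if none match."""
    for rule in rules:
        if rule.when(inp):
            # Short-circuit: later stages never run once a rule matches.
            return {
                "decision": rule.decision,
                "stage": "rules",
                "escalate": rule.escalate,
                "rules_triggered": [rule.name],
            }
    return None  # no match: the request falls through to Stage 2
```

Returning `None` on no match lets the caller move on to the classifier. Stopping at the first match is one reading of "returned immediately". An engine that evaluates every rule and reports all matches in `rules_triggered` would be an equally plausible design.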
Stage 2: ML Classifier (ONNX)
Latency: < 100ms (typically 5–30ms)
If no rules match and a trained model is deployed, the classifier runs inference. It uses the ONNX-exported LightGBM model with pre-computed features and optional text embeddings.
The classifier returns a decision along with:
- Confidence score (0.0 to 1.0)
- Class probabilities for all options
- Reason codes from the training data
If the confidence is above the auto_decide threshold, the decision is returned. If it is below the escalation threshold, the request moves on to Stage 3.
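The two-threshold check can be sketched as a small routing function. It assumes that confidence is the top class probability, which is consistent with the example response (decision "approve", confidence 0.94, `approve` probability 0.94). The function name and the threshold values in the usage are hypothetical. The page does not say what happens when confidence falls between the two thresholds, so the sketch labels that band rather than guessing.

```python
from typing import Mapping


def route_classifier_result(
    class_probabilities: Mapping[str, float],
    auto_decide: float,
    escalation: float,
) -> dict:
    """Sketch of the Stage 2 threshold check; names and values are assumptions."""
    if escalation > auto_decide:
        raise ValueError("escalation threshold should not exceed auto_decide")
    # Assumption: confidence is the highest class probability.
    decision, confidence = max(class_probabilities.items(), key=lambda kv: kv[1])
    if confidence > auto_decide:
        action = "return"              # classifier answers directly
    elif confidence < escalation:
        action = "escalate"            # hand the request to Stage 3
    else:
        # Behaviour between the thresholds is not documented; just label it.
        action = "between_thresholds"
    return {"decision": decision, "confidence": confidence, "action": action}


# Illustrative thresholds only; real values are configured per decision type.
print(route_classifier_result({"approve": 0.94, "flag": 0.04, "reject": 0.02}, 0.9, 0.6))
```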
Stage 3: LLM Escalation (Gemini)
Latency: 150ms–3s
Stage 3 is the fallback for low-confidence decisions, and it also handles requests when no model is deployed. Gemini receives the input along with the decision type's context and returns a structured decision with an explanation.
This stage includes:
- Automatic retry with exponential backoff
- Structured output parsing
- Timeout protection
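Here is one way the three safeguards listed above might fit together. This is a sketch under stated assumptions, not the Gemini client's API. `call_llm` is a hypothetical callable that returns the model's raw text. The JSON shape with a `decision` key, the retry count, the base delay, and the 3 s budget (chosen to echo the stage's upper latency) are all assumptions.

```python
import json
import time
from typing import Callable, Optional


class EscalationError(Exception):
    """Raised when no valid structured decision is obtained within the budget."""


def escalate_with_retry(
    call_llm: Callable[[str], str],       # hypothetical client: prompt -> raw model text
    prompt: str,
    max_attempts: int = 3,
    base_delay_s: float = 0.1,
    deadline_s: float = 3.0,              # assumed overall budget
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    start = time.monotonic()
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            # Structured output parsing: the reply must be a JSON object with a decision.
            parsed = json.loads(call_llm(prompt))
            if not isinstance(parsed, dict) or "decision" not in parsed:
                raise ValueError("reply has no 'decision' field")
            return parsed
        except (ValueError, TimeoutError, ConnectionError) as exc:
            last_error = exc  # json.JSONDecodeError is a ValueError subclass
        if attempt == max_attempts - 1:
            break
        delay = base_delay_s * (2 ** attempt)  # exponential backoff: 0.1s, 0.2s, 0.4s, ...
        # Timeout protection: give up rather than sleep past the overall deadline.
        if time.monotonic() - start + delay > deadline_s:
            break
        sleep(delay)
    raise EscalationError(f"escalation failed after retries: {last_error!r}")
```

The deadline here is only checked between attempts, so it cannot interrupt a single call that hangs. A production client would also need a per-request timeout. The `sleep` parameter is injectable so the backoff schedule can be tested without actually waiting.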
In practice, a well-trained model escalates fewer than 5% of decisions. The LLM is a safety net, not the primary path.
Response Format
Every decision — regardless of which stage produced it — returns the same structured format:
```json
{
  "decision": "approve",
  "confidence": 0.94,
  "reason_codes": ["safe_content"],
  "latency_ms": 8.3,
  "stage": "classifier",
  "escalate": false,
  "fallback_used": false,
  "rules_triggered": [],
  "class_probabilities": {
    "approve": 0.94,
    "flag": 0.04,
    "reject": 0.02
  },
  "request_id": "req_abc123"
}
```

The stage field tells you which stage produced the decision:
- "rules": a hard rule matched
- "classifier": the ML model decided
- "escalation": the LLM fallback was used
- "fallback": the LLM escalation was triggered due to an error
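On the client side, the shared response format can be mirrored by a typed object. The sketch below is a hypothetical helper, not part of an official SDK. It follows the field names of the example response and rejects `stage` values other than the four listed above. It assumes every field is always present, as the page says every stage returns the same format.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping

# The four documented values of the stage field.
VALID_STAGES = frozenset({"rules", "classifier", "escalation", "fallback"})


@dataclass(frozen=True)
class DecisionResponse:
    decision: str
    confidence: float
    reason_codes: List[str]
    latency_ms: float
    stage: str
    escalate: bool
    fallback_used: bool
    rules_triggered: List[str]
    class_probabilities: Dict[str, float]
    request_id: str

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "DecisionResponse":
        """Build a response object, rejecting stage values the docs do not list."""
        if data["stage"] not in VALID_STAGES:
            raise ValueError(f"unknown stage: {data['stage']!r}")
        return cls(
            decision=str(data["decision"]),
            confidence=float(data["confidence"]),
            reason_codes=list(data["reason_codes"]),
            latency_ms=float(data["latency_ms"]),
            stage=str(data["stage"]),
            escalate=bool(data["escalate"]),
            fallback_used=bool(data["fallback_used"]),
            rules_triggered=list(data["rules_triggered"]),
            class_probabilities={k: float(v) for k, v in data["class_probabilities"].items()},
            request_id=str(data["request_id"]),
        )
```

Failing loudly on an unknown stage surfaces API changes early. A more lenient client might log the unknown value and keep it instead.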
Latency Breakdown
| Stage | Typical Latency | When It Runs |
|---|---|---|
| Rules | 0.1–0.5ms | Always (first check) |
| Feature extraction | 1–5ms | If no rule matched |
| Text embedding | 2–8ms | If input has text fields |
| ONNX inference | 0.5–2ms | If model is deployed |
| Total (no escalation) | 5–30ms | 95%+ of requests |
| LLM escalation | 150–3000ms | Low-confidence or no model |
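The table's figures can be combined into a rough expected mean latency. This is illustrative arithmetic, not a measurement. It assumes that an escalated request pays for the fast path and then the LLM call, giving mean = (1 − p)·fast + p·(fast + llm) = fast + p·llm, where p is the escalation rate. The inputs below are the table's own ranges, with p = 5% taken from the "95%+ of requests" row.

```python
def blended_mean_latency_ms(fast_path_ms: float, llm_ms: float, escalation_rate: float) -> float:
    """Expected mean latency when a fraction of requests escalates to the LLM.

    Assumes escalated requests pay for the fast path and the LLM call, so
    mean = (1 - p) * fast + p * (fast + llm) = fast + p * llm.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return fast_path_ms + escalation_rate * llm_ms


# Upper-end figures from the table: 30ms fast path, 3000ms LLM call, 5% escalation.
print(round(blended_mean_latency_ms(30, 3000, 0.05), 1))  # 180.0
# Lower-end figures: 5ms fast path, 150ms LLM call, 5% escalation.
print(round(blended_mean_latency_ms(5, 150, 0.05), 1))    # 12.5
```

At the upper end, the escalated 5% contributes 150ms of the roughly 180ms mean. This is why the escalation rate, rather than the classifier itself, tends to dominate average latency.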