Training
How Sparkient compiles LLM intelligence into fast ML models.
Sparkient's training pipeline uses a teacher-student architecture: a large language model (the teacher) generates training data, and a small, fast model (the student) learns to replicate its decisions.
The Training Pipeline
```
Define Decision Type
↓
Generate Examples (Gemini teacher)
↓
Label Examples (Gemini teacher)
↓
Augment Rare Classes (Gemini teacher)
↓
Feature Engineering (auto-detected)
↓
Text Encoding (DeBERTa-v3)
↓
Model Training (LightGBM + Optuna)
↓
ONNX Export
↓
Deploy to Production
```
Synthetic Data Generation
You don't need to bring your own training data. Sparkient's teacher LLM generates realistic, diverse examples based solely on your decision type definition.
- Generation — Gemini creates input examples that cover the full space of possible decisions
- Labelling — Gemini assigns decisions and reason codes to each example, using the same reasoning a human expert would apply
- Augmentation — Gap analysis identifies underrepresented classes, and Gemini generates targeted examples to balance the dataset
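The gap-analysis step can be pictured as counting how many examples each decision class received and measuring each class's shortfall against a target. The Python sketch below assumes a hypothetical rule that this page does not document: a class is underrepresented when it has fewer than `min_ratio` times as many examples as the largest class, and its shortfall is the number of targeted examples to request from the teacher. Passing every class from the decision type definition as `all_classes` also catches classes that received no examples at all.

```python
from collections import Counter
from collections.abc import Iterable


def find_gaps(
    labels: Iterable[str],
    all_classes: Iterable[str] = (),
    min_ratio: float = 0.5,
) -> dict[str, int]:
    """Return {class: extra examples needed} for each underrepresented class.

    Hypothetical rule: the target count per class is min_ratio times the size
    of the largest class; classes below the target are reported with their shortfall.
    """
    counts = Counter(labels)
    for label in all_classes:
        counts.setdefault(label, 0)  # classes the teacher never produced start at zero
    if not counts:
        return {}
    target = int(max(counts.values()) * min_ratio)
    return {label: target - n for label, n in counts.items() if n < target}


labels = ["approve"] * 80 + ["reject"] * 30 + ["escalate"] * 5
print(find_gaps(labels, all_classes=["approve", "reject", "escalate", "defer"]))
# {'reject': 10, 'escalate': 35, 'defer': 40}
```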
Feature Engineering
Features are auto-detected from your input schema:
| Input Type | Feature Strategy |
|---|---|
| Numbers | Z-score normalization |
| Booleans | Binary encoding |
| Strings (short) | Categorical encoding |
| Strings (long) | Text embedding (Model2Vec, 256 dimensions, sub-millisecond encoding) |
| Arrays | Length + aggregation features |
| Nested objects | Flattened with dot notation |
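The strategies in the table can be sketched in plain Python to make them concrete. This is an illustrative sketch, not Sparkient's implementation. It assumes every record follows the same schema with no missing values, uses a hypothetical 32-character cutoff between short and long strings, and encodes short strings as ordinal category codes (one-hot encoding is an equally common choice). Long strings are skipped because they belong to the text-embedding step, and arrays get a length plus sum and max of their numeric items as example aggregations. Z-scores here use the population standard deviation.

```python
import statistics
from typing import Any

SHORT_TEXT_MAX = 32  # hypothetical cutoff between "short" and "long" strings


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested objects into dot-notation keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def featurize(records: list[dict[str, Any]]) -> list[dict[str, float]]:
    """Convert JSON-like records into one numeric feature dict per record."""
    rows = [flatten(record) for record in records]
    if not rows:
        return []
    out: list[dict[str, float]] = [{} for _ in rows]
    for key in sorted(rows[0]):  # assumes every record has the same flattened keys
        values = [row[key] for row in rows]
        sample = values[0]
        if isinstance(sample, bool):  # check bool before int: bool is a subclass of int
            for feats, v in zip(out, values):
                feats[key] = 1.0 if v else 0.0
        elif isinstance(sample, (int, float)):
            mean = statistics.fmean(values)
            std = statistics.pstdev(values) or 1.0  # constant column: avoid dividing by zero
            for feats, v in zip(out, values):
                feats[key] = (v - mean) / std
        elif isinstance(sample, str):
            if max(len(v) for v in values) > SHORT_TEXT_MAX:
                continue  # long text goes to the text-embedding step instead
            codes = {cat: float(i) for i, cat in enumerate(sorted(set(values)))}
            for feats, v in zip(out, values):
                feats[key] = codes[v]
        elif isinstance(sample, list):
            for feats, v in zip(out, values):
                nums = [x for x in v if isinstance(x, (int, float)) and not isinstance(x, bool)]
                feats[f"{key}.len"] = float(len(v))
                feats[f"{key}.sum"] = float(sum(nums))
                feats[f"{key}.max"] = float(max(nums)) if nums else 0.0
    return out


records = [
    {"amount": 100, "vip": True, "country": "US", "items": [1, 2], "customer": {"age": 30}},
    {"amount": 300, "vip": False, "country": "DE", "items": [5], "customer": {"age": 50}},
]
print(featurize(records))
```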
For text-heavy decisions, a DeBERTa-v3-small encoder is additionally fine-tuned on your data, and its embeddings are stacked alongside the features above as input to the final classifier.
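Stacking embeddings as features amounts to appending each example's embedding vector to its structured feature row, so the classifier sees one fixed-width numeric row per example. The sketch below assumes the embeddings were already computed by the encoder (nothing here loads or fine-tunes DeBERTa), and the `text_emb_{i}` column names are hypothetical.

```python
from collections.abc import Sequence


def stack_features(
    rows: Sequence[dict[str, float]],
    feature_order: Sequence[str],
    embeddings: Sequence[Sequence[float]],
) -> tuple[list[str], list[list[float]]]:
    """Concatenate structured features (in a fixed column order) with text embeddings."""
    if len(rows) != len(embeddings):
        raise ValueError("need exactly one embedding per row")
    dim = len(embeddings[0]) if embeddings else 0
    matrix: list[list[float]] = []
    for row, emb in zip(rows, embeddings):
        if len(emb) != dim:
            raise ValueError("all embeddings must have the same dimension")
        # Structured columns first, then one column per embedding dimension.
        matrix.append([float(row[name]) for name in feature_order] + [float(x) for x in emb])
    columns = list(feature_order) + [f"text_emb_{i}" for i in range(dim)]
    return columns, matrix


columns, matrix = stack_features(
    rows=[{"amount": -1.0, "vip": 1.0}, {"amount": 1.0, "vip": 0.0}],
    feature_order=["amount", "vip"],
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
)
print(columns)  # ['amount', 'vip', 'text_emb_0', 'text_emb_1', 'text_emb_2']
```

Keeping the column order fixed matters: a trained model expects each feature in the same position at inference time as during training.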
Model Training
The final classifier is LightGBM with Optuna hyperparameter tuning:
- Automatic cross-validation
- Bayesian hyperparameter optimization
- Multi-class classification with probability calibration
- Export to ONNX format for portable, fast inference
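This page doesn't say which cross-validation scheme is used. As one illustration, stratified k-fold splitting is a common choice for multi-class data with imbalanced classes, because every fold keeps roughly the same class proportions. The sketch below implements it with the standard library; the function name and `seed` parameter are hypothetical. A common pattern is for each Optuna trial to train on every fold and report the average validation score as the trial's objective.

```python
import random
from collections import defaultdict


def stratified_kfold(labels: list[str], k: int = 5, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Return k (train_indices, test_indices) pairs with per-class balance across folds."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class: dict[str, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    counter = 0
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class so folds aren't ordered by input position
        for idx in indices:
            # Dealing round-robin spreads each class evenly: per-fold counts differ by at most one.
            folds[counter % k].append(idx)
            counter += 1

    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


labels = ["approve"] * 10 + ["reject"] * 5
for train, test in stratified_kfold(labels, k=5):
    print(len(train), len(test), [labels[i] for i in test].count("reject"))  # prints "12 3 1" for every fold
```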
Triggering Training
```shell
curl -X POST https://api.sparkient.ai/api/v1/decision-types/{id}/train \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Training runs asynchronously. You can check its status in the dashboard or via the policies endpoint.
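From Python, the same request can be built with the standard library, and because training is asynchronous a client typically polls until it finishes. Everything beyond the documented train URL is an assumption in this sketch: the policies endpoint's response format isn't specified on this page, so the "is training done?" check is a callable you supply, and the interval, backoff, and timeout values are illustrative defaults.

```python
import time
import urllib.request
from typing import Callable
from urllib.parse import quote

API_BASE = "https://api.sparkient.ai/api/v1"


def train_request(decision_type_id: str, api_key: str) -> urllib.request.Request:
    """Build the documented POST .../decision-types/{id}/train request (not sent here)."""
    return urllib.request.Request(
        f"{API_BASE}/decision-types/{quote(decision_type_id, safe='')}/train",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def wait_until(
    is_done: Callable[[], bool],
    timeout_s: float = 1800.0,
    interval_s: float = 10.0,
    max_interval_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_done() with exponential backoff; return False once timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while not is_done():
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
        interval_s = min(interval_s * 2, max_interval_s)  # back off to avoid hammering the API
    return True


req = train_request("YOUR_DECISION_TYPE_ID", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

In practice, `is_done` would query the policies endpoint (hypothetically, a GET on `/decision-types/{id}/policies`) and inspect whatever status it reports.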
Deploying a Model
After training completes, a policy is created containing the trained model. To activate it:
```shell
curl -X POST https://api.sparkient.ai/api/v1/decision-types/{id}/policies/{policy_id}/deploy \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Once deployed, subsequent /decide calls use the trained model instead of falling through to the LLM escalation path.
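The deploy call translates to Python the same way. This sketch only builds the documented request with placeholder IDs; it deliberately stops short of a /decide call, because that endpoint's full URL and payload aren't given on this page. IDs are percent-encoded so that an unexpected character cannot change the URL path.

```python
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.sparkient.ai/api/v1"


def deploy_request(decision_type_id: str, policy_id: str, api_key: str) -> urllib.request.Request:
    """Build POST .../decision-types/{id}/policies/{policy_id}/deploy (not sent here)."""
    url = (
        f"{API_BASE}/decision-types/{quote(decision_type_id, safe='')}"
        f"/policies/{quote(policy_id, safe='')}/deploy"
    )
    return urllib.request.Request(url, method="POST", headers={"Authorization": f"Bearer {api_key}"})


req = deploy_request("YOUR_DECISION_TYPE_ID", "YOUR_POLICY_ID", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```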