Sparkient Docs

Sparkient's training pipeline uses a teacher-student architecture: a large language model (the teacher) generates training data, and a small, fast model (the student) learns to replicate its decisions.

Before you can call /decide for a decision type, you must train a model and deploy it. Deploying the trained model is what unlocks the decision endpoint for that decision type.

The Training Pipeline

Define Decision Type
        ↓
  Generate Examples (LLM teacher)
        ↓
  Label Examples (LLM teacher)
        ↓
  Augment Rare Classes (LLM teacher)
        ↓
  Feature Engineering (auto-detected)
        ↓
  Text Encoding
        ↓
  Model Training
        ↓
  Model Export
        ↓
  Deploy to Production

Synthetic Data Generation

You don't need to bring your own training data. Sparkient's teacher LLM generates realistic, diverse examples based solely on your decision type definition.

Generation — The teacher LLM creates input examples that cover the full space of possible decisions
Labelling — The teacher LLM assigns decisions and reason codes to each example, using the same reasoning a human expert would apply
Augmentation — Gap analysis identifies underrepresented classes, and the teacher LLM generates targeted examples to balance the dataset
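
To make the augmentation step concrete, here is a minimal sketch of a gap analysis. It assumes the goal is an even share of the target size per option. That balancing rule, the function name, and the sample labels are illustrative assumptions, not Sparkient's documented behaviour.

```python
from collections import Counter


def augmentation_gaps(labels, options, target_size):
    """Return how many extra examples each option needs to reach
    an even share of target_size (assumed balancing rule)."""
    counts = Counter(labels)
    per_class = -(-target_size // len(options))  # ceiling division
    return {opt: max(0, per_class - counts.get(opt, 0)) for opt in options}


# Hypothetical labelled dataset with two rare classes.
labels = ["approve"] * 500 + ["deny"] * 80 + ["review"] * 20
gaps = augmentation_gaps(labels, ["approve", "deny", "review"], 1200)
print(gaps)  # {'approve': 0, 'deny': 320, 'review': 380}
```

In this sketch, classes that already exceed their share get no new examples, so the teacher's effort goes only to the underrepresented options.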

Feature Engineering

Features are auto-detected from your input schema:

Input Type	Feature Strategy
Numbers	Z-score normalization
Booleans	Binary encoding
Strings (short)	Categorical encoding
Strings (long)	Lightweight text embedding (256-dim, sub-ms)
Arrays	Length + aggregation features
Nested objects	Flattened with dot notation

For text-heavy decisions, a fine-tuned text encoder is trained on your data and its embeddings are stacked as features for the final classifier.
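
As a rough illustration of the non-text rows in the table above, the sketch below flattens a nested input with dot notation, encodes booleans as 0/1, derives length and mean features from arrays, and z-scores a numeric column. The function names, the choice of mean as the aggregation, and the sample input are assumptions made for the example. The actual feature pipeline is not exposed.

```python
from statistics import mean, pstdev


def flatten(obj, prefix=""):
    """Flatten nested dicts into dot-notation keys.
    Booleans become 0/1; arrays become length + mean features."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, bool):  # check bool before int: bool is an int subclass
            out[name] = 1.0 if value else 0.0
        elif isinstance(value, (int, float)):
            out[name] = float(value)
        elif isinstance(value, list):
            nums = [float(v) for v in value
                    if isinstance(v, (int, float)) and not isinstance(v, bool)]
            out[f"{name}.length"] = float(len(value))
            out[f"{name}.mean"] = mean(nums) if nums else 0.0
        else:
            out[name] = value  # strings are left for categorical or text encoding
    return out


def zscore(values):
    """Z-score normalize a numeric column (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


row = flatten({"amount": 120, "customer": {"vip": True, "orders": [10, 30]}})
print(row)
```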

Model Training

The final classifier is a gradient-boosted model with automated hyperparameter tuning:

Automatic cross-validation
Bayesian hyperparameter optimization
Multi-class classification with probability calibration
Export to a compiled model format for portable, fast inference

Triggering Training

curl -X POST https://api.sparkient.ai/api/v1/decision-types/{id}/train \
  -H "Authorization: Bearer YOUR_API_KEY"

Training runs asynchronously. You can check the status via the dashboard or the policies endpoint.
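
Because training is asynchronous, a client typically polls until the job reaches a terminal state. The sketch below shows one way to do that. The status endpoint's response shape is not documented here, so the status lookup is passed in as a caller-supplied function, and the status names "completed" and "failed" are hypothetical.

```python
import time


def wait_for_training(fetch_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal state.

    fetch_status is a caller-supplied callable returning a status string;
    how it queries the API is up to the caller.
    """
    terminal = {"completed", "failed"}  # hypothetical status names
    waited = 0.0
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"training still '{status}' after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

Passing sleep as a parameter keeps the helper easy to test without real delays.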

Tuning Augmentation Size

Data augmentation is the single biggest lever for model quality. The augment_target_size parameter controls how many total training examples the pipeline targets after augmentation.

Default: number_of_options × 300 (e.g. 4 options → 1,200 examples)

For most decision types, the default produces a good model. Increase it when:

Your decision involves free-text input (descriptions, messages, reviews) — text classifiers benefit from more diverse examples
You have many options (6+) — each class needs enough examples for the model to learn the boundaries
Your model's F1 score is below your target — more data often helps

Increase augmentation target

{
  "preset": "balanced",
  "augment_target_size": 2000,
  "target_f1": 0.85
}

Start with the default. Check your model's F1 score after training, then increase augment_target_size and retrain if quality is below your threshold. Maximum: 5,000.
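
The arithmetic behind this workflow can be sketched as follows. The 300-per-option default and the 5,000 maximum come from the text above. The helper names and the step size of 500 are assumptions chosen for illustration. Whether the default itself is capped when you have many options is not stated, so the sketch leaves it uncapped.

```python
MAX_AUGMENT_TARGET = 5000   # documented maximum
EXAMPLES_PER_OPTION = 300   # documented default multiplier


def default_augment_target(number_of_options):
    """Default target: number_of_options x 300."""
    return number_of_options * EXAMPLES_PER_OPTION


def next_augment_target(current, f1, target_f1, step=500):
    """Suggest the next augment_target_size to retrain with.

    Keeps the current size if F1 already meets the target; otherwise
    raises it by `step` (a hypothetical increment), capped at the maximum.
    """
    if f1 >= target_f1:
        return current
    return min(current + step, MAX_AUGMENT_TARGET)


print(default_augment_target(4))              # 1200
print(next_augment_target(1200, 0.80, 0.85))  # 1700
```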

Deploying a Model

After training completes, a policy is created containing the trained model. To activate it:

curl -X POST https://api.sparkient.ai/api/v1/decision-types/{id}/policies/{policy_id}/deploy \
  -H "Authorization: Bearer YOUR_API_KEY"

Once deployed, subsequent /decide calls use the trained model instead of falling through to the LLM escalation path.
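
For clients that prefer Python over curl, the same deploy call could be built with the standard library as sketched below. The URL and header mirror the curl example. The function name and the IDs dt_123 and pol_456 are placeholders, and the response format is not covered here.

```python
import urllib.request

BASE_URL = "https://api.sparkient.ai/api/v1"


def build_deploy_request(decision_type_id, policy_id, api_key):
    """Build (but do not send) the POST request that deploys a policy."""
    url = f"{BASE_URL}/decision-types/{decision_type_id}/policies/{policy_id}/deploy"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Placeholder IDs; substitute your own decision type and policy.
req = build_deploy_request("dt_123", "pol_456", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```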
