Training

How Sparkient compiles LLM intelligence into fast ML models.

Sparkient's training pipeline uses a teacher-student architecture: a large language model (the teacher) generates training data, and a small, fast model (the student) learns to replicate its decisions.

The Training Pipeline

  1. Define Decision Type
  2. Generate Examples (Gemini teacher)
  3. Label Examples (Gemini teacher)
  4. Augment Rare Classes (Gemini teacher)
  5. Feature Engineering (auto-detected)
  6. Text Encoding (DeBERTa-v3)
  7. Model Training (LightGBM + Optuna)
  8. ONNX Export
  9. Deploy to Production

Synthetic Data Generation

You don't need to bring your own training data. Sparkient's teacher LLM generates realistic, diverse examples based solely on your decision type definition.

  1. Generation — Gemini creates input examples that cover the full space of possible decisions
  2. Labelling — Gemini assigns decisions and reason codes to each example, using the same reasoning a human expert would apply
  3. Augmentation — Gap analysis identifies underrepresented classes, and Gemini generates targeted examples to balance the dataset
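
The gap analysis in step 3 can be pictured as a simple count over the labelled dataset. The sketch below is illustrative only: the function name `augmentation_plan`, the `target_ratio` parameter, and the class names are hypothetical, and Sparkient's actual balancing rule is not documented here.

```python
import math
from collections import Counter


def augmentation_plan(labels, target_ratio=0.5):
    """Return how many extra examples each rare class would need.

    A class counts as underrepresented when it has fewer examples than
    target_ratio times the size of the largest class (a hypothetical rule).
    """
    counts = Counter(labels)
    if not counts:
        return {}
    target = math.ceil(max(counts.values()) * target_ratio)
    return {cls: target - n for cls, n in counts.items() if n < target}


# Hypothetical labelled dataset: 80 approve, 15 review, 5 deny.
labels = ["approve"] * 80 + ["review"] * 15 + ["deny"] * 5
print(augmentation_plan(labels))  # {'review': 25, 'deny': 35}
```

The resulting per-class counts are what a teacher model would be asked to generate in a targeted augmentation pass.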

Feature Engineering

Features are auto-detected from your input schema:

Input Type        Feature Strategy
Numbers           Z-score normalization
Booleans          Binary encoding
Strings (short)   Categorical encoding
Strings (long)    Text embedding (Model2Vec, 256-dim, sub-ms)
Arrays            Length + aggregation features
Nested objects    Flattened with dot notation
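
To make the mapping above concrete, here is a minimal sketch of type-based strategy detection, dot-notation flattening, and z-score normalization. All function names are hypothetical, and the 32-character cutoff between short and long strings is an assumption; the real threshold is not documented.

```python
from statistics import mean, pstdev

SHORT_STRING_MAX = 32  # hypothetical cutoff between "short" and "long" strings


def strategy_for(value):
    """Pick a feature strategy from the Python type of one input value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "binary"
    if isinstance(value, (int, float)):
        return "zscore"
    if isinstance(value, str):
        return "categorical" if len(value) <= SHORT_STRING_MAX else "text_embedding"
    if isinstance(value, list):
        return "length_and_aggregates"
    if isinstance(value, dict):
        return "flatten"
    raise TypeError(f"unsupported input type: {type(value).__name__}")


def flatten(obj, prefix=""):
    """Flatten nested dicts into a single level keyed by dot-separated paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def zscore(values):
    """Normalize numbers to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


# Hypothetical input record for a decision type.
record = {"amount": 120.5, "is_new_customer": True,
          "user": {"country": "DE", "tags": ["vip", "eu"]}}
for path, value in flatten(record).items():
    print(path, "->", strategy_for(value))
```

Checking `bool` before the numeric types matters in Python, because `True` and `False` would otherwise be treated as numbers and z-scored.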

For text-heavy decisions, a DeBERTa-v3-small encoder is fine-tuned on your data and its embeddings are stacked as features for the final classifier.

Model Training

The final classifier is LightGBM with Optuna hyperparameter tuning:

  • Automatic cross-validation
  • Bayesian hyperparameter optimization
  • Multi-class classification with probability calibration
  • Export to ONNX format for portable, fast inference
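
For readers unfamiliar with this combination, the following is a rough sketch of how LightGBM and Optuna are commonly wired together with cross-validation. It is not Sparkient's training code: the search space, the trial count, and the function names `suggest_params` and `tune` are all assumptions, and it requires the `lightgbm`, `optuna`, and `scikit-learn` packages. Probability calibration and ONNX export are not shown.

```python
def suggest_params(trial):
    """Hypothetical search space; the ranges Sparkient uses are not documented."""
    return {
        "num_leaves": trial.suggest_int("num_leaves", 15, 127),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 50),
    }


def tune(X, y, n_trials=20, n_folds=5):
    """Search hyperparameters with Optuna, scoring each trial by cross-validation."""
    # Third-party imports are local so suggest_params can be used without them.
    import lightgbm as lgb
    import optuna
    from sklearn.model_selection import cross_val_score

    def objective(trial):
        model = lgb.LGBMClassifier(**suggest_params(trial))
        # Mean negative log loss across folds; higher is better.
        scores = cross_val_score(model, X, y, cv=n_folds, scoring="neg_log_loss")
        return scores.mean()

    # Optuna's default sampler is TPE, a Bayesian-style optimizer.
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params
```

Calling `tune(X, y)` with a feature matrix and class labels returns the best parameter set found, which would then be used to fit the final classifier.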

Triggering Training

curl -X POST https://api.sparkient.ai/api/v1/decision-types/{id}/train \
  -H "Authorization: Bearer YOUR_API_KEY"

Training runs asynchronously. You can check the status via the dashboard or the policies endpoint.
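
Because training is asynchronous, a client typically polls until a policy is ready. The sketch below rests on several assumptions that are not confirmed by this page: that a GET on `/api/v1/decision-types/{id}/policies` returns a JSON list, and that each policy has a `status` field that becomes `"ready"`. Check the API reference for the actual path and response shape.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.sparkient.ai/api/v1"


def fetch_policies(decision_type_id, api_key):
    """Fetch policies for a decision type (path and response shape are assumed)."""
    req = urllib.request.Request(
        f"{BASE_URL}/decision-types/{decision_type_id}/policies",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_policy(fetch, interval_s=10.0, max_polls=60, sleep=time.sleep):
    """Call fetch() repeatedly until some policy reports the assumed 'ready' status."""
    for _ in range(max_polls):
        for policy in fetch():
            if policy.get("status") == "ready":  # field name and value are assumptions
                return policy
        sleep(interval_s)
    raise TimeoutError("training did not finish within the polling budget")
```

A caller would pass something like `lambda: fetch_policies("YOUR_DECISION_TYPE_ID", "YOUR_API_KEY")` as `fetch`. Keeping the fetch function injectable makes the polling loop easy to test without network access.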

Deploying a Model

After training completes, a policy is created containing the trained model. To activate it:

curl -X POST https://api.sparkient.ai/api/v1/decision-types/{id}/policies/{policy_id}/deploy \
  -H "Authorization: Bearer YOUR_API_KEY"

Once deployed, subsequent /decide calls use the trained model instead of falling through to the LLM escalation path.
