Ilura
GUIDE · FORGE-TRAIN-PUBLISH

How does the forge-train-publish flow work?

Forge: you define an agent in natural language — purpose, tools, boundaries. Train: every risky action surfaces as an approval prompt; your decisions are written to a Bayesian decision profile and into a LoRA adapter via DPO. Publish: when confidence and eval scores cross a threshold, the matured agent ships to a cloud runtime as an API. Use is training; no separate dataset preparation.

01 Who is it for?

This flow speaks to:

  • Professionals who want an agent that decides like they do. Email triage, document review, calendar prep — your taste, encoded.
  • Small teams shipping customer-facing agents. The voice of the brand becomes the trained behavior.
  • Compliance-conscious users. Every decision the agent takes lands in the audit trail with a reason.

02 How does it work?

Modern agent training distills to four steps:

  1. Teacher-student. A frontier LLM (Claude, GPT, Gemini, Mistral) plays teacher — solves hard cases and emits "I reasoned this way" explanations. A local open-weight model (Llama, Mistral) on Ollama plays student.
  2. Human-in-the-loop trajectory. Every tool call comes to you first. Approvals are positive samples; denials are negative samples.
  3. Bayesian decision profile. Approvals/denials are summarized as Beta distributions and EMA (exponential moving average). The agent statistically tracks "in this situation, the user leans X."
  4. LoRA + DPO. LoRA (Low-Rank Adaptation) provides parameter-efficient fine-tuning; DPO (Direct Preference Optimization) shifts the model toward your preferences via preference pairs. No full retrain — small adapter files (~10-100 MB).
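Step 3 above can be made concrete with a small sketch. This is a hypothetical illustration of a per-decision-class profile — the class name `DecisionProfile`, the prior, and the maturity test are assumptions, not Ilura's actual implementation:

```python
import math

class DecisionProfile:
    """Approval tracker for one decision class: Beta posterior + EMA."""

    def __init__(self, ema_decay: float = 0.9):
        self.alpha = 1.0  # Beta prior: one pseudo-approval
        self.beta = 1.0   # Beta prior: one pseudo-denial
        self.ema = 0.5    # recency-weighted approval rate
        self.decay = ema_decay

    def observe(self, approved: bool) -> None:
        # Each approval/denial updates the Beta posterior...
        if approved:
            self.alpha += 1.0
        else:
            self.beta += 1.0
        # ...while the EMA tracks recent drift in the user's leaning.
        self.ema = self.decay * self.ema + (1 - self.decay) * (1.0 if approved else 0.0)

    def mean(self) -> float:
        # Posterior mean: "in this situation, the user leans X."
        return self.alpha / (self.alpha + self.beta)

    def confident(self) -> bool:
        # One possible maturity test: the posterior mean is far from
        # 50/50 and its standard deviation has shrunk with evidence.
        n = self.alpha + self.beta
        var = (self.alpha * self.beta) / (n * n * (n + 1))
        return abs(self.mean() - 0.5) > 0.25 and math.sqrt(var) < 0.1

profile = DecisionProfile()
for approved in [True, True, False, True, True]:
    profile.observe(approved)
print(round(profile.mean(), 2))  # -> 0.71 after 4 approvals, 1 denial
```

Because the Beta posterior carries a count of evidence, confidence grows with every decision — which is what lets a maturity threshold gate when LoRA training fires.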

03 How is it done with Ilura?

In Ilura, training is part of using:

  • No separate "dataset prep" screen. The agent runs in a sandbox while you work; risky decisions surface in the approval flow.
  • Bayesian profile is visible. The Settings scene shows where the agent leans on each decision class — statistical, not anecdotal.
  • LoRA adapters are local. Training data, the adapter file, the Bayesian snapshot — all in local SQLite, hash-chained.
  • Learning continues after publish. Through the living tether, every production decision can become a new training signal once you review it.
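The hash-chaining mentioned above can be sketched in a few lines. This is a minimal illustration assuming a toy one-table schema — Ilura's real tables and payload format will differ — but it shows the property that matters: each entry's hash covers the previous one, so editing any past row breaks every hash after it (tamper-evidence, not secrecy):

```python
import hashlib
import sqlite3

def append_decision(db: sqlite3.Connection, payload: str) -> str:
    """Append one audit entry whose hash chains to the previous entry."""
    row = db.execute("SELECT hash FROM audit ORDER BY id DESC LIMIT 1").fetchone()
    prev_hash = row[0] if row else "genesis"
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    db.execute(
        "INSERT INTO audit (payload, prev_hash, hash) VALUES (?, ?, ?)",
        (payload, prev_hash, entry_hash),
    )
    return entry_hash

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE audit (id INTEGER PRIMARY KEY, payload TEXT, prev_hash TEXT, hash TEXT)"
)
h1 = append_decision(db, "approve:send_email")
h2 = append_decision(db, "deny:delete_file")
```

Verifying the trail is a single pass: recompute each hash from its predecessor and compare.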

04 Frequently asked questions

How many examples do I need to train an agent?

Classical fine-tuning needs thousands of examples. LoRA + DPO drops this to 50-200. With Ilura's teacher-student approach, the first 5-10 approvals/denials already shape early behavior — LoRA fires once Bayesian confidence crosses a maturity threshold.

Does training data leave my machine?

No. The desktop app stores training examples, the Bayesian profile, and LoRA adapters in local SQLite. Only license verification and (optional) published-agent traffic talk to the cloud.

How does LoRA differ from full fine-tuning?

Full fine-tuning rewrites all model weights, produces gigabyte files, and consumes GPU hours. LoRA attaches small (10-100 MB) adapters next to the base model, leaves the base intact, and trains in minutes to hours. Ilura uses LoRA + DPO.
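The size gap follows from simple arithmetic. A sketch for a single weight matrix — the hidden size and rank below are illustrative choices, not Ilura's configuration:

```python
# LoRA freezes the base weight W (d x d) and trains only a low-rank
# update: W' = W + B @ A, with B (d x r) and A (r x d), r << d.
d = 4096  # hidden size of one attention projection (illustrative)
r = 16    # LoRA rank (illustrative)

full_params = d * d      # full fine-tuning updates every weight
lora_params = 2 * d * r  # LoRA trains only B and A

print(full_params)                 # -> 16777216
print(lora_params)                 # -> 131072
print(full_params // lora_params)  # -> 128x fewer trainable parameters
```

Multiply that ratio across every adapted matrix in the model and gigabytes of deltas collapse into the ~10-100 MB adapter files mentioned above.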

What is Direct Preference Optimization (DPO)?

DPO is a preference-learning method: two model responses are compared, the one the user prefers is rewarded, the other penalized. Simpler than RLHF (no reward model, no PPO loop) — you operate directly on preference pairs.

Which student models can I use?

Any open-weight model running on Ollama — Llama 3.1/3.2/3.3, Mistral, Qwen, Phi, Gemma. Ilura recommends a starting model based on your machine's memory and the agent's Bayesian maturity.

When does training fire?

Two triggers: (1) maturity threshold — once Bayesian confidence is high enough, LoRA runs automatically; (2) manual — a "train now" button in the Training scene. A 5-minute tick loop checks both paths.
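The two triggers reduce to one predicate checked on a timer. A hypothetical sketch — the function names and the 0.9 threshold are illustrative, not Ilura's actual API:

```python
import time

TICK_SECONDS = 300  # the 5-minute tick described above

def should_train(confidence: float, manual_requested: bool,
                 maturity_threshold: float = 0.9) -> bool:
    # Trigger 1: Bayesian confidence crossed the maturity threshold.
    # Trigger 2: the user pressed "train now" in the Training scene.
    return confidence >= maturity_threshold or manual_requested

def tick_loop(get_confidence, pop_manual_request, run_lora):
    # Both paths are re-checked on every tick.
    while True:
        if should_train(get_confidence(), pop_manual_request()):
            run_lora()
        time.sleep(TICK_SECONDS)
```

Folding both triggers into one periodic check keeps the scheduler trivial: there is no event plumbing, just a predicate evaluated every five minutes.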

05 Related pages
