How does the forge-train-publish flow work?
Forge: you define an agent in natural language (purpose, tools, boundaries). Train: every risky action surfaces as an approval prompt; your decisions are written into a Bayesian decision profile and into a LoRA adapter via DPO. Publish: once confidence and eval scores cross a threshold, the matured agent ships to a cloud runtime as an API. Using the agent is the training; there is no separate dataset-preparation step.
01 Who is it for?
This flow speaks to:
- Professionals who want an agent that decides like they do. Email triage, document review, calendar prep — your taste, encoded.
- Small teams shipping customer-facing agents. The voice of the brand becomes the trained behavior.
- Compliance-conscious users. Every decision the agent took is in the audit trail with a reason.
02 How does it work?
Modern agent training distills to four steps:
- Teacher-student. A frontier LLM (Claude, GPT, Gemini, Mistral) plays teacher — solves hard cases and emits "I reasoned this way" explanations. A local open-weight model (Llama, Mistral) on Ollama plays student.
- Human-in-the-loop trajectory. Every tool call comes to you first. Approvals are positive samples; denials are negative samples.
- Bayesian decision profile. Approvals and denials are summarized as Beta distributions plus an EMA (exponential moving average), so the agent statistically tracks "in this situation, the user leans X" (see the sketch after this list).
- LoRA + DPO. LoRA (Low-Rank Adaptation) provides parameter-efficient fine-tuning; DPO (Direct Preference Optimization) shifts the model toward your preferences via preference pairs. No full retrain — small adapter files (~10-100 MB).
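Ilura's internal schema is not spelled out here, but the Beta-plus-EMA bookkeeping behind the decision profile fits in a short Python sketch. The class name, field names, and the confidence proxy below are illustrative assumptions, not Ilura's actual code:

```python
from dataclasses import dataclass

@dataclass
class DecisionProfile:
    """One decision class: a Beta posterior over approve/deny plus an EMA of recent decisions.
    Hypothetical names; not Ilura's actual schema."""
    alpha: float = 1.0       # prior pseudo-count of approvals
    beta: float = 1.0        # prior pseudo-count of denials
    ema: float = 0.5         # exponential moving average of recent outcomes
    ema_weight: float = 0.1  # how fast the EMA drifts toward new behavior

    def update(self, approved: bool) -> None:
        # Beta-Bernoulli conjugate update: each decision adds one pseudo-count.
        if approved:
            self.alpha += 1.0
        else:
            self.beta += 1.0
        # The EMA weights recent decisions more, so the profile can follow a changing user.
        self.ema = (1 - self.ema_weight) * self.ema + self.ema_weight * (1.0 if approved else 0.0)

    @property
    def approval_rate(self) -> float:
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)

    @property
    def confidence(self) -> float:
        # Crude proxy: more observations shrink the posterior variance toward zero.
        n = self.alpha + self.beta
        variance = (self.alpha * self.beta) / (n * n * (n + 1))
        return 1.0 - variance


profile = DecisionProfile()
for approved in (True, True, False, True):
    profile.update(approved)
print(f"approval rate {profile.approval_rate:.2f}, ema {profile.ema:.2f}")
```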
03 How is it done with Ilura?
In Ilura, training is part of using:
- No separate "dataset prep" screen. The agent runs in a sandbox while you work; risky decisions surface in the approval flow.
- Bayesian profile is visible. The Settings scene shows where the agent leans on each decision class — statistical, not anecdotal.
- LoRA adapters are local. Training data, the adapter file, and the Bayesian snapshot all live in local SQLite, hash-chained (see the sketch after this list).
- Learning continues after publish. Through the living tether, every production decision can become a new training signal once you review it.
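The hash-chaining mentioned above can be illustrated with a minimal append-only pattern: each new row's hash covers the previous row's hash, so editing any earlier decision breaks every hash that follows. Table and column names here are hypothetical, not Ilura's schema:

```python
import hashlib
import json
import sqlite3

def append_decision(conn: sqlite3.Connection, record: dict) -> str:
    """Append a decision record whose hash also covers the previous row's hash."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_log "
        "(id INTEGER PRIMARY KEY, payload TEXT, prev_hash TEXT, hash TEXT)"
    )
    row = conn.execute("SELECT hash FROM audit_log ORDER BY id DESC LIMIT 1").fetchone()
    prev_hash = row[0] if row else "genesis"
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    conn.execute(
        "INSERT INTO audit_log (payload, prev_hash, hash) VALUES (?, ?, ?)",
        (payload, prev_hash, digest),
    )
    conn.commit()
    return digest
```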
04 Frequently asked questions
How many examples do I need to train an agent?
Classical fine-tuning needs thousands of examples. LoRA + DPO drops this to 50-200. With Ilura's teacher-student approach, the first 5-10 approvals/denials already shape early behavior — LoRA fires once Bayesian confidence crosses a maturity threshold.
Does training data leave my machine?
No. The desktop app stores training examples, the Bayesian profile, and LoRA adapters in local SQLite. Only license verification and (optional) published-agent traffic talk to the cloud.
How does LoRA differ from full fine-tuning?
Full fine-tuning rewrites all model weights, produces gigabyte files, and consumes GPU hours. LoRA attaches small (10-100 MB) adapters next to the base model, leaves the base intact, and trains in minutes to hours. Ilura uses LoRA + DPO.
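To make the size difference concrete, here is what attaching a LoRA adapter looks like with the widely used Hugging Face peft library. The base model, rank, and target modules are placeholder choices; this is a generic sketch, not Ilura's training code:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # placeholder base; stays frozen
config = LoraConfig(
    r=16,                                  # low-rank dimension: the source of the small adapter size
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # only the attention projections get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # typically well under 1% of the base model's weights
```

Only the adapter weights are saved after training, which is why the artifact stays in the tens of megabytes instead of gigabytes.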
What is Direct Preference Optimization (DPO)?
DPO is a preference-learning method: two model responses are compared, the one the user prefers is rewarded, the other penalized. Simpler than RLHF (no reward model, no PPO loop) — you operate directly on preference pairs.
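The DPO objective itself is compact: for each preference pair, push the policy's log-probability ratio on the chosen response above its ratio on the rejected one. A minimal PyTorch sketch, assuming per-response token log-probabilities have already been summed:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss for a batch of preference pairs; beta controls how far the policy may drift from the reference."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp        # implicit reward for the preferred answer
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # implicit reward for the rejected answer
    # Maximize the margin between the two implicit rewards.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```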
Which student models can I use?
Any open-weight model running on Ollama — Llama 3.1/3.2/3.3, Mistral, Qwen, Phi, Gemma. Ilura recommends a starting model based on your machine's memory and the agent's Bayesian maturity.
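All of these run behind Ollama's local HTTP API (default port 11434). The snippet below is a generic, non-streaming call and assumes the model has already been pulled with `ollama pull llama3.1`; it is not Ilura-specific code:

```python
import json
import urllib.request

def ask_student(prompt: str, model: str = "llama3.1") -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```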
When does training fire?
Two triggers: (1) maturity threshold — once Bayesian confidence is high enough, LoRA runs automatically; (2) manual — a "train now" button in the Training scene. A 5-minute tick loop checks both paths.
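A plausible shape for that loop, with entirely hypothetical names and an assumed threshold value since Ilura's scheduler is not documented here:

```python
import time

MATURITY_THRESHOLD = 0.9   # assumed value for illustration
TICK_SECONDS = 5 * 60      # the 5-minute tick

def tick_loop(get_confidence, manual_requested, run_lora_training):
    """Poll both triggers on a fixed interval and fire LoRA training when either is met."""
    while True:
        if get_confidence() >= MATURITY_THRESHOLD or manual_requested():
            run_lora_training()
        time.sleep(TICK_SECONDS)
```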
05 Related pages
yanındayım — Ilura