How does the forge-train-publish flow work?
Forge: you define an agent in natural language (purpose, tools, boundaries). Train: every risky action surfaces as an approval prompt; your decisions are written into a Bayesian decision profile and into a LoRA adapter via DPO. Publish: once confidence and eval scores cross a threshold, the matured agent ships to a cloud runtime as an API. Using the agent is the training; there is no separate dataset-preparation step.
01 Who is it for?
This flow speaks to:
- Professionals who want an agent that decides like they do. Email triage, document review, calendar prep — your taste, encoded.
- Small teams shipping customer-facing agents. The voice of the brand becomes the trained behavior.
- Compliance-conscious users. Every decision the agent took is in the audit trail with a reason.
02 How does it work?
Modern agent training distills to four steps:
- Teacher-student. A frontier LLM (Claude, GPT, Gemini, Mistral) plays teacher — solves hard cases and emits "I reasoned this way" explanations. A local open-weight model (Llama, Mistral) on Ollama plays student.
- Human-in-the-loop trajectory. Every tool call comes to you first. Approvals are positive samples; denials are negative samples.
- Bayesian decision profile. Approvals and denials are summarized as Beta distributions plus an EMA (exponential moving average), so the agent statistically tracks "in this situation, the user leans X" (see the sketch after this list).
- LoRA + DPO. LoRA (Low-Rank Adaptation) provides parameter-efficient fine-tuning; DPO (Direct Preference Optimization) shifts the model toward your preferences via preference pairs. No full retrain — small adapter files (~10-100 MB).
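Ilura's internal schema is not spelled out here, but the Beta-plus-EMA bookkeeping behind the decision profile fits in a short Python sketch. The class name, field names, and the confidence proxy below are illustrative assumptions, not Ilura's actual code:

```python
from dataclasses import dataclass

@dataclass
class DecisionProfile:
    """One decision class: a Beta posterior over approve/deny plus an EMA of recent decisions.
    Hypothetical names; not Ilura's actual schema."""
    alpha: float = 1.0       # prior pseudo-count of approvals
    beta: float = 1.0        # prior pseudo-count of denials
    ema: float = 0.5         # exponential moving average of recent outcomes
    ema_weight: float = 0.1  # how fast the EMA drifts toward new behavior

    def update(self, approved: bool) -> None:
        # Beta-Bernoulli conjugate update: each decision adds one pseudo-count.
        if approved:
            self.alpha += 1.0
        else:
            self.beta += 1.0
        # The EMA weights recent decisions more, so the profile can follow a changing user.
        self.ema = (1 - self.ema_weight) * self.ema + self.ema_weight * (1.0 if approved else 0.0)

    @property
    def approval_rate(self) -> float:
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)

    @property
    def confidence(self) -> float:
        # Crude proxy: more observations shrink the posterior variance toward zero.
        n = self.alpha + self.beta
        variance = (self.alpha * self.beta) / (n * n * (n + 1))
        return 1.0 - variance


profile = DecisionProfile()
for approved in (True, True, False, True):
    profile.update(approved)
print(f"approval rate {profile.approval_rate:.2f}, ema {profile.ema:.2f}")
```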
03 How is it done with Ilura?
In Ilura, training is part of using:
- No separate "dataset prep" screen. The agent runs in a sandbox while you work; risky decisions surface in the approval flow.
- Bayesian profile is visible. The Settings scene shows where the agent leans on each decision class — statistical, not anecdotal.
- LoRA adapters are local. Training data, the adapter file, and the Bayesian snapshot all live in local SQLite, hash-chained (see the sketch after this list).
- Learning continues after publish. Through the living tether, every production decision can become a new training signal once you review it.
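The hash-chaining mentioned above can be illustrated with a minimal append-only pattern: each new row's hash covers the previous row's hash, so editing any earlier decision breaks every hash that follows. Table and column names here are hypothetical, not Ilura's schema:

```python
import hashlib
import json
import sqlite3

def append_decision(conn: sqlite3.Connection, record: dict) -> str:
    """Append a decision record whose hash also covers the previous row's hash."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_log "
        "(id INTEGER PRIMARY KEY, payload TEXT, prev_hash TEXT, hash TEXT)"
    )
    row = conn.execute("SELECT hash FROM audit_log ORDER BY id DESC LIMIT 1").fetchone()
    prev_hash = row[0] if row else "genesis"
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    conn.execute(
        "INSERT INTO audit_log (payload, prev_hash, hash) VALUES (?, ?, ?)",
        (payload, prev_hash, digest),
    )
    conn.commit()
    return digest
```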
04 Frequently asked questions
How many examples do I need to train an agent?
Classical fine-tuning needs thousands of examples. LoRA + DPO drops this to 50-200. With Ilura's teacher-student approach, the first 5-10 approvals/denials already shape early behavior — LoRA fires once Bayesian confidence crosses a maturity threshold.
Does training data leave my machine?
No. The desktop app stores training examples, the Bayesian profile, and LoRA adapters in local SQLite. Only license verification and (optional) published-agent traffic talk to the cloud.
How does LoRA differ from full fine-tuning?
Full fine-tuning rewrites all model weights, produces gigabyte files, and consumes GPU hours. LoRA attaches small (10-100 MB) adapters next to the base model, leaves the base intact, and trains in minutes to hours. Ilura uses LoRA + DPO.
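To make the size difference concrete, here is what attaching a LoRA adapter looks like with the widely used Hugging Face peft library. The base model, rank, and target modules are placeholder choices; this is a generic sketch, not Ilura's training code:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # placeholder base; stays frozen
config = LoraConfig(
    r=16,                                  # low-rank dimension: the source of the small adapter size
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # only the attention projections get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # typically well under 1% of the base model's weights
```

Only the adapter weights are saved after training, which is why the artifact stays in the tens of megabytes instead of gigabytes.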
What is Direct Preference Optimization (DPO)?
DPO is a preference-learning method: two model responses are compared, the one the user prefers is rewarded, the other penalized. Simpler than RLHF (no reward model, no PPO loop) — you operate directly on preference pairs.
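The DPO objective itself is compact: for each preference pair, push the policy's log-probability ratio on the chosen response above its ratio on the rejected one. A minimal PyTorch sketch, assuming per-response token log-probabilities have already been summed:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss for a batch of preference pairs; beta controls how far the policy may drift from the reference."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp        # implicit reward for the preferred answer
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # implicit reward for the rejected answer
    # Maximize the margin between the two implicit rewards.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```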
Which student models can I use?
Any open-weight model running on Ollama — Llama 3.1/3.2/3.3, Mistral, Qwen, Phi, Gemma. Ilura recommends a starting model based on your machine's memory and the agent's Bayesian maturity.
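All of these run behind Ollama's local HTTP API (default port 11434). The snippet below is a generic, non-streaming call and assumes the model has already been pulled with `ollama pull llama3.1`; it is not Ilura-specific code:

```python
import json
import urllib.request

def ask_student(prompt: str, model: str = "llama3.1") -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```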
When does training fire?
Two triggers: (1) maturity threshold — once Bayesian confidence is high enough, LoRA runs automatically; (2) manual — a "train now" button in the Training scene. A 5-minute tick loop checks both paths.
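A plausible shape for that loop, with entirely hypothetical names and an assumed threshold value since Ilura's scheduler is not documented here:

```python
import time

MATURITY_THRESHOLD = 0.9   # assumed value for illustration
TICK_SECONDS = 5 * 60      # the 5-minute tick

def tick_loop(get_confidence, manual_requested, run_lora_training):
    """Poll both triggers on a fixed interval and fire LoRA training when either is met."""
    while True:
        if get_confidence() >= MATURITY_THRESHOLD or manual_requested():
            run_lora_training()
        time.sleep(TICK_SECONDS)
```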
05 Related pages
yanındayım — Ilura