Improve is Adaline’s reviewed prompt-improvement workflow. It turns production evidence into a proposed prompt version: Behaviors identify repeated patterns, Evaluators and Datasets score candidates, Auto Prompt Optimization explores prompt changes, and Review decides what ships.

Use Improve when the issue is prompt-addressable: instructions, examples, variables, model settings, response schemas, or tool-use guidance. If the root cause is stale retrieval, a broken tool, missing metadata, or a backend bug, fix that layer first.

To run an Improve cycle, Adaline primarily needs production logs, useful Behaviors, and a prompt stored in Adaline. Evaluators and datasets strengthen the cycle when you already have them, and Adaline can also generate draft evaluators and synthetic cases during the cycle.

Improve does not currently deploy anything silently or automatically. Adaline generates the evidence packet and candidates; a human or an external AI agent can review the diagnosis, diff, regressions, and release impact, and can then choose to deploy via Adaline or externally.

[Figure: Improve page showing pending review, in progress, and history cycles]
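The prompt-addressable check described above can be sketched as a small triage step run before starting a cycle. This is an illustrative sketch only: the category labels and the `triage` function are hypothetical names, not Adaline identifiers, and the sets simply mirror the examples listed in the text.

```python
# Hypothetical triage helper. The labels below are illustrative strings
# taken from the examples in the text, not Adaline API values.
PROMPT_ADDRESSABLE = {
    "instructions",
    "examples",
    "variables",
    "model_settings",
    "response_schema",
    "tool_use_guidance",
}

FIX_UPSTREAM_FIRST = {
    "stale_retrieval",
    "broken_tool",
    "missing_metadata",
    "backend_bug",
}


def triage(root_cause: str) -> str:
    """Suggest a next step for a diagnosed root cause."""
    if root_cause in PROMPT_ADDRESSABLE:
        return "improve"        # a prompt change can plausibly fix this
    if root_cause in FIX_UPSTREAM_FIRST:
        return "fix_upstream"   # repair the other layer before touching the prompt
    return "investigate"        # unknown cause: diagnose further first
```

For example, `triage("examples")` suggests running Improve, while `triage("broken_tool")` suggests fixing the tool before changing the prompt.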

What a cycle does

An Improve cycle is attached to one prompt in one project.

[Figure: Improve stage provenance showing Behavior, Evaluator, Dataset, Prompt, and Review evidence that contributed to a cycle]
| Stage | What happens |
| --- | --- |
| Behaviors | Selects the repeated pattern or issue the cycle should improve. |
| Evals | Uses authored and auto generated evaluators to score the baseline and candidates. |
| Datasets | Builds validation coverage from linked datasets, production cases, and generated edge cases. |
| Prompts | Explores candidate prompt snapshots and blocks unsafe or regressing options. |
| Review | Packages the selected candidate with diff, scores, examples, cost, tokens, latency, and final actions. |
The quality of a cycle depends on the quality of its input evidence. Specific Behaviors, representative logs with readable spans, a clear focus, and relevant evaluator or dataset coverage give Adaline better material to diagnose the issue and compare candidates. Weak or noisy evidence can still produce a candidate, but the review decision will be less confident. The cycle should make the release decision easier: what changed, why it changed, what improved, what regressed, and where it will deploy.
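To make "what improved, what regressed" concrete, here is a minimal sketch of comparing candidates against a baseline and blocking any candidate that regresses. It is not Adaline's actual selection logic, which this page does not specify. The function names, the `tolerance` parameter, and the assumption that each evaluator yields one mean score where higher is better are all hypothetical.

```python
from typing import Dict, List, Optional

Scores = Dict[str, float]  # evaluator name -> mean score (assumed: higher is better)


def find_regressions(baseline: Scores, candidate: Scores,
                     tolerance: float = 0.0) -> List[str]:
    """Return evaluators where the candidate scores below the baseline.

    A missing candidate score is treated as a regression, since there is
    no evidence the candidate holds up on that evaluator.
    """
    regressed = []
    for evaluator, base_score in baseline.items():
        cand_score = candidate.get(evaluator)
        if cand_score is None or cand_score < base_score - tolerance:
            regressed.append(evaluator)
    return regressed


def pick_candidate(baseline: Scores, candidates: Dict[str, Scores],
                   tolerance: float = 0.0) -> Optional[str]:
    """Pick the non-regressing candidate with the best mean score, if any."""
    def mean(scores: Scores) -> float:
        return sum(scores.values()) / len(scores) if scores else 0.0

    safe = {
        name: scores
        for name, scores in candidates.items()
        if not find_regressions(baseline, scores, tolerance)
    }
    if not safe:
        return None  # every candidate regressed; nothing goes forward to review
    return max(safe, key=lambda name: mean(safe[name]))
```

With a baseline of `{"tone": 0.8, "accuracy": 0.7}`, a candidate scoring `{"tone": 0.9, "accuracy": 0.6}` is blocked for regressing on accuracy even though its tone improved. A reviewer would still look at the diff, examples, cost, and latency for whichever candidate survives.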

Trigger a Cycle

Choose the prompt, focus, behaviors, thoroughness, and reviewers.

Review a Cycle

Inspect diagnosis, diffs, scores, traffic examples, and final actions.

Auto Generated Evaluators

Understand generated checks created from production evidence.

Synthetic Datasets

Use generated cases and production traces as validation coverage.

Auto Prompt Optimization

Understand candidate exploration, safety gates, and prompt diffs.

Behaviors

Understand the behavior evidence Improve can target.