Fine-Tuning,
Now Fully Reproducible.

The specialized fine-tuning system for Gaslamp. Seamlessly train Reasoning (GRPO), Vision, and Text models across Nvidia GPUs and Apple Silicon with zero friction and a 100% auditable gaslamp.md roadbook.

gaslamp/unsloth-buddy [ACTIVE]
claude ~/project $ /unsloth-buddy I need a model to parse medical records. I have a MacBook Air.

// Phase 1: Interview
[unsloth-buddy] Locking scope: SFT method, Qwen2.5-7B, Apple Silicon target.
[unsloth-buddy] Wrote decisions to gaslamp.md

// Phase 3: Hardware constraints
[unsloth-buddy] Environment: Darwin arm64. Unsloth unavailable.
[unsloth-buddy] Pivoting to mlx-tune backend.
[unsloth-buddy] VRAM check: Peak overhead ~4.2GB. Fits in 16GB.

// Phase 4: Training execution
[unsloth-buddy] Streaming loss...
Loss: 0.812 | LR: 2e-5 | Epoch: 0.1

The Reproducibility Contract.

Models without audit trails are just prototypes. Unsloth-Buddy documents every decision—from exact quantization settings to data parsing logic—in a structured, 11-section gaslamp.md roadbook.

Hand this file to any MLE (or a fresh agent session months later) to identically reproduce the project end-to-end.

# gaslamp.md
## 3. Model & LoRA
Base: mlx-community/Qwen2.5-7B
Rank: 16 | Alpha: 32
📖 Learn: Rank 16 provides enough capacity for specialized domain terminology without excessive VRAM overhead.
## 4. Data Strategy
Format: ChatML
Source: generated by src/prepare.py
📖 Learn: The JSONL is mapped to ChatML dynamically to prevent padding token bleed during GRPO reward generation.
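The dynamic JSONL-to-ChatML mapping described above can be sketched roughly as follows. The field names (`instruction`, `response`) and the system prompt are illustrative assumptions, not the actual schema emitted by src/prepare.py:

```python
import json

# Hypothetical sketch: map one JSONL record to a ChatML-formatted string.
# The record fields and system prompt are assumptions for illustration.
def to_chatml(record: dict) -> str:
    return (
        "<|im_start|>system\nYou extract structured fields from medical records.<|im_end|>\n"
        f"<|im_start|>user\n{record['instruction']}<|im_end|>\n"
        f"<|im_start|>assistant\n{record['response']}<|im_end|>"
    )

line = '{"instruction": "List the diagnoses.", "response": "Hypertension."}'
print(to_chatml(json.loads(line)))
```

Mapping at load time, rather than baking ChatML strings into the dataset, keeps the raw JSONL reusable across trainers.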

Task-Aware Dashboards.

An SSE-streaming terminal UI. Whether you're tracking SFT loss curves or DPO chosen/rejected reward deltas, the dashboard automatically adapts to your method.

[Chart] Memory Breakdown: Total Peak VRAM 10.2 GB (■ Base Model ■ LoRA Overhead)
[Chart] GRPO Dashboard (Terminal): Reward ± StdDev Confidence Band
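The streaming side of the dashboard can be sketched as plain Server-Sent Events. The event name and JSON fields below are assumptions for illustration, not Unsloth-Buddy's actual wire format:

```python
import json

# Hypothetical sketch: serialize one training step's metrics as an
# SSE frame ("event:" line, "data:" line, blank-line terminator).
def sse_frame(event: str, data: dict) -> str:
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

frame = sse_frame("loss", {"loss": 0.812, "lr": 2e-5, "epoch": 0.1})
print(frame, end="")
```

A terminal client just reads frames off the stream and redraws the panel matching the event type, which is what lets the same UI adapt to SFT loss curves or GRPO reward bands.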

Built for Empowered Teams.

We handle the infra and the math, so you can focus on the product value.

🚧 The 5-Point Interview

Generative AI is optimistic; it happily writes broken code. Unsloth-Buddy forces a requirements interview (Method, Model, Data, Hardware, Deploy) to lock scope before writing a single line of PyTorch.

🔍 Apple vs Nvidia Routing

Hardware routing happens at the skill level. It detects your silicon and generates either native Unsloth scripts or MLX-Tune scripts. No more "CUDA out of memory" on a MacBook.
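The routing decision can be sketched in a few lines. The backend names mirror the text above; the real skill's detection is more involved than this assumption-laden stub:

```python
import platform

# Hypothetical sketch: pick a training backend from the detected
# OS and architecture. Anything that isn't Apple Silicon is assumed
# here to be an Nvidia/CUDA box for simplicity.
def pick_backend() -> str:
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx-tune"   # Apple Silicon: train via MLX on Metal
    return "unsloth"        # otherwise: native Unsloth on CUDA

print(pick_backend())
```

Because the check runs before any script is generated, the MacBook path never imports CUDA-only code in the first place.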

🛡️ 2-Stage Env Checking

We probe the system Python installation, then verify the specific virtual environment. If the installed Unsloth wheel mismatches your system, we block execution before wasting compute credits.
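A minimal sketch of the two stages, under the assumption that stage one checks for a working interpreter and stage two checks that the active environment can import the chosen backend:

```python
import importlib.util
import shutil
import subprocess
import sys

# Hypothetical sketch of a two-stage environment check.
def stage_one() -> bool:
    # Stage 1: is a python3 on PATH, and does it actually run?
    exe = shutil.which("python3") or sys.executable
    return subprocess.run([exe, "--version"], capture_output=True).returncode == 0

def stage_two(package: str = "unsloth") -> bool:
    # Stage 2: can the *current* interpreter (the venv) resolve the backend?
    return importlib.util.find_spec(package) is not None

if not (stage_one() and stage_two()):
    print("Environment check failed; blocking training run.")
```

Failing fast at stage two is what prevents a mismatched wheel from surfacing only after the model weights have already been downloaded.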

One Conversation. One Reproducible Model.

A fine-tuning system you talk to like a colleague. Describe what you want. It locks the scope, formats the data, checks your hardware, trains the model, and hands you an audit trail.

01

Interview

Locks in the method, model, data, hardware, and deploy target before writing a single line of code.

02

Data Strategy

Acquires and reformats your data into the exact schema the specific trainer (SFT, DPO, GRPO) requires.

03

Env & Math

A hardware scan blocks on misconfiguration, then calculates exact baseline vs. LoRA-overhead VRAM requirements.
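The baseline-vs-overhead split can be sketched as a back-of-envelope estimate. The numbers (7B parameters, 4-bit base, rank 16, Qwen2.5-7B's 28 layers and 3584 hidden size) match the examples on this page, but the formula is a rough sketch, not the skill's actual math, and it ignores activations and KV cache:

```python
# Hypothetical VRAM estimate: quantized base weights plus trainable
# LoRA adapter state (fp16 weight + fp16 grad + fp32 Adam moments
# = ~12 bytes per trainable parameter).
def lora_vram_gb(params_b: float, bits: int, rank: int,
                 layers: int = 28, hidden: int = 3584) -> tuple:
    base = params_b * 1e9 * bits / 8 / 2**30
    # Two adapted projections per layer, each with A (hidden x r) and B (r x hidden).
    trainable = layers * 2 * (2 * hidden * rank)
    lora = trainable * 12 / 2**30
    return round(base, 1), round(lora, 2)

base_gb, lora_gb = lora_vram_gb(params_b=7, bits=4, rank=16)
print(f"base ~{base_gb} GB, LoRA state ~{lora_gb} GB")
```

The point the estimate makes: adapter state is small, so the peak is dominated by the base weights plus runtime buffers.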

04

Train

Generates the optimized Unsloth or MLX training script and streams loss metrics to the terminal UI.

05

Evaluate

Runs the fine-tuned adapter against the base model side-by-side so you can see the actual delta.
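The comparison step can be sketched backend-agnostically: run the same prompts through two generate callables and pair the outputs. The callables stand in for the base model and the adapter-loaded model; wiring them to transformers or MLX is backend-specific, and the toy stand-ins below exist only so the sketch runs without a GPU:

```python
# Hypothetical sketch of a side-by-side evaluation harness.
def side_by_side(prompts, generate_base, generate_tuned):
    rows = []
    for p in prompts:
        rows.append({"prompt": p,
                     "base": generate_base(p),
                     "fine_tuned": generate_tuned(p)})
    return rows

# Toy stand-ins for the two models:
rows = side_by_side(["List the diagnoses."],
                    lambda p: "(base) free-text answer",
                    lambda p: "(tuned) {'diagnoses': ['hypertension']}")
for r in rows:
    print(r["prompt"], "->", r["fine_tuned"])
```

Seeing both answers on one row is what makes the delta obvious, rather than relying on a single aggregate metric.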

06

Export

Automatically merges adapters. Generates a reproducible load+generate script for deployment.
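Numerically, "merging adapters" means folding the low-rank update into the base weight: W' = W + (alpha / r) · B · A. A toy-sized sketch of that merge (shapes are illustrative; real merges apply this per target projection):

```python
import numpy as np

# Hypothetical sketch of the LoRA merge math.
def merge_lora(W, A, B, alpha, rank):
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
A = rng.standard_normal((2, 8))   # rank-2 down-projection
B = rng.standard_normal((8, 2))   # rank-2 up-projection
W_merged = merge_lora(W, A, B, alpha=32, rank=2)
print(W_merged.shape)
```

After the merge, the adapter files are no longer needed at inference time, which is why the export step can emit a single self-contained load+generate script.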