Fine-Tuning,
Now Fully Reproducible.
The specialized fine-tuning system for Gaslamp. Seamlessly train Reasoning (GRPO), Vision, and Text models across Nvidia GPUs and Apple Silicon with zero friction and a 100% auditable gaslamp.md roadbook.
// Phase 1: Interview
[unsloth-buddy] Locking scope: SFT method, Qwen2.5-7B, Apple Silicon target.
[unsloth-buddy] Wrote decisions to gaslamp.md
// Phase 3: Hardware constraints
[unsloth-buddy] Environment: Darwin arm64. Unsloth unavailable.
[unsloth-buddy] Pivoting to mlx-tune backend.
[unsloth-buddy] VRAM check: Peak overhead ~4.2GB. Fits in 16GB.
// Phase 4: Training execution
[unsloth-buddy] Streaming loss...
Loss: 0.812 | LR: 2e-5 | Epoch: 0.1
█
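The "VRAM check" line above can be reproduced with back-of-envelope math: frozen 4-bit base weights plus 16-bit LoRA adapter weights and their fp32 Adam state. A minimal sketch, assuming hypothetical numbers and function names (activation memory is deliberately omitted):

```python
def estimate_vram_gb(params_b: float, weight_bits: int = 4,
                     lora_trainable_frac: float = 0.01) -> dict:
    """Rough VRAM estimate for a quantized LoRA fine-tune.

    params_b: model size in billions of parameters (e.g. 7.0 for Qwen2.5-7B).
    weight_bits: quantized precision of the frozen base weights.
    lora_trainable_frac: assumed fraction of params covered by adapters.
    Note: excludes activations, so real peak usage is higher.
    """
    weights = params_b * 1e9 * weight_bits / 8 / 1e9      # frozen base weights
    trainable = params_b * 1e9 * lora_trainable_frac       # adapter params
    adapters = trainable * 2 / 1e9                         # bf16 adapter weights
    optimizer = trainable * 8 / 1e9                        # two fp32 Adam moments
    return {"weights_gb": round(weights, 2),
            "overhead_gb": round(adapters + optimizer, 2)}

print(estimate_vram_gb(7.0))
```

The point of the check is the go/no-go decision: if weights plus overhead (plus activation headroom) exceed unified memory, training is blocked before it starts.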
The Reproducibility Contract.
Models without audit trails are just prototypes. Unsloth-Buddy documents every decision—from exact quantization settings to data parsing logic—in a structured, 11-section gaslamp.md roadbook.
Hand this file to any MLE (or a fresh agent session months later) to identically reproduce the project end-to-end.
Rank: 16 | Alpha: 32
Source: generated by src/prepare.py
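The `Rank: 16 | Alpha: 32` pair above determines the LoRA scaling factor (alpha / rank), which is exactly the kind of detail the roadbook pins down. A minimal sketch of such a record, with hypothetical field names:

```python
from dataclasses import dataclass, asdict

@dataclass
class LoraSettings:
    """Hypothetical record mirroring what gaslamp.md pins down."""
    rank: int = 16
    alpha: int = 32

    @property
    def scaling(self) -> float:
        # The LoRA update BA is multiplied by alpha / rank before
        # being added to the frozen weight matrix W.
        return self.alpha / self.rank

cfg = LoraSettings()
print(asdict(cfg), "scaling:", cfg.scaling)  # scaling = 32 / 16 = 2.0
```

Recording rank and alpha (rather than only the derived scaling) is what makes the run reproducible: a fresh session can rebuild the identical adapter config.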
Task-Aware Dashboards.
An SSE-streamed terminal UI. Whether you're tracking SFT loss curves or DPO chosen/rejected reward deltas, the dashboard adapts automatically to your method.
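Server-Sent Events are just `data:` lines over a text stream, so the dashboard's consumer side can be very small. A minimal sketch, assuming a hypothetical JSON payload schema for the metric events:

```python
import json

def parse_sse_events(stream_lines):
    """Yield parsed metric dicts from SSE 'data: {...}' lines.

    Payload fields (loss, lr, epoch) are an assumed schema,
    not the product's documented wire format.
    """
    for line in stream_lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):])

sample = [
    'data: {"loss": 0.812, "lr": 2e-5, "epoch": 0.1}',
    "",  # SSE events are separated by blank lines
]
for event in parse_sse_events(sample):
    print(f"Loss: {event['loss']} | LR: {event['lr']} | Epoch: {event['epoch']}")
```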
Built for Empowered Teams.
We handle the infra and the math, so you can focus on the product value.
🚧 The 5-Point Interview
Generative AI is optimistic; it happily writes broken code. Unsloth-Buddy forces a requirements interview (Method, Model, Data, Hardware, Deploy) to lock scope before writing a single line of PyTorch.
🔍 Apple vs Nvidia Routing
Hardware routing happens at the skill level. It detects your silicon and generates either native Unsloth scripts or MLX-Tune scripts. No more "CUDA out of memory" on a MacBook.
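The routing decision described above can be sketched with standard platform probes. A minimal sketch (the real skill's detection logic is not documented here; the fallback label is hypothetical):

```python
import platform

def pick_backend() -> str:
    """Route to a training backend based on the host silicon."""
    # Apple Silicon Macs report Darwin/arm64: generate MLX-Tune scripts.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx"
    try:
        import torch  # only consulted off-macOS
        if torch.cuda.is_available():
            return "unsloth"  # NVIDIA GPU present: native Unsloth scripts
    except ImportError:
        pass
    return "cpu-unsupported"  # hypothetical label for the no-GPU case

print(pick_backend())
```

Doing this check before script generation is what prevents the "CUDA out of memory" (or missing-CUDA) failure mode on a MacBook: the CUDA path is never even emitted.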
🛡️ 2-Stage Env Checking
We probe the system Python first, then verify the active virtual environment. If the installed Unsloth wheel doesn't match your system, execution is blocked before any compute credits are wasted.
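The two stages can be sketched with the standard library alone. A minimal sketch, assuming the package to verify is passed by name (the real checker's probes and error handling are not documented here):

```python
import shutil
import sys
from importlib import metadata

def check_env(required: str) -> bool:
    """Two-stage environment probe (sketch).

    Stage 1: is there a working python3 on PATH at the system level?
    Stage 2: is the required wheel installed in the *active* environment?
    """
    if shutil.which("python3") is None:            # stage 1: system probe
        print("no python3 on PATH")
        return False
    try:                                           # stage 2: venv-specific probe
        version = metadata.version(required)
    except metadata.PackageNotFoundError:
        print(f"{required} not installed in {sys.prefix}")
        return False
    print(f"{required} {version} found in {sys.prefix}")
    return True
```

Gating training on `check_env("unsloth")` (or the MLX equivalent) is the cheap insurance: a missing or mismatched wheel fails in milliseconds instead of mid-run.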
One Conversation. One Reproducible Model.
A fine-tuning system you talk to like a colleague. Describe what you want. It locks the scope, formats the data, checks your hardware, trains the model, and hands you an audit trail.
Interview
Locks in the method, model, data, hardware, and deploy target before writing a single line of code.
Data Strategy
Acquires and reformats your data into the exact schema the specific trainer (SFT, DPO, GRPO) requires.
Env & Math
Hardware scan blocks on misconfiguration. Calculates exact baseline vs LoRA overhead VRAM requirements.
Train
Generates the optimized Unsloth or MLX training script and streams loss metrics to the terminal UI.
Evaluate
Runs the fine-tuned adapter against the base model side-by-side so you can see the actual delta.
Export
Automatically merges adapters. Generates a reproducible load+generate script for deployment.
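The "reproducible load+generate script" from the Export step is essentially a templated artifact written next to the merged weights. A minimal sketch for the Apple Silicon path, assuming the `mlx_lm` backend and a hypothetical output path; the template contents are illustrative, not the product's actual output:

```python
LOAD_TEMPLATE = '''\
# Auto-generated inference script (sketch)
from mlx_lm import load, generate  # backend recorded in gaslamp.md
model, tokenizer = load("{merged_path}")
print(generate(model, tokenizer, prompt="Hello"))
'''

def emit_load_script(merged_path: str) -> str:
    """Render the deployment script stored alongside the merged adapter."""
    return LOAD_TEMPLATE.format(merged_path=merged_path)

print(emit_load_script("out/qwen2.5-7b-merged"))  # path is a placeholder
```

Because the script embeds the exact merged-model path and backend, handing it off with the weights is enough to reproduce inference without re-reading the whole roadbook.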