Fine-Tuning,
Now Fully Reproducible.

The specialized fine-tuning system for Gaslamp. Seamlessly train Reasoning (GRPO), Vision, and Text models across Nvidia GPUs and Apple Silicon with zero friction and a 100% auditable gaslamp.md roadbook.

gaslamp/unsloth-buddy [ACTIVE]
claude ~/project $ /unsloth-buddy I need a model to parse medical records. I have a MacBook Air.

// Phase 1: Interview
[unsloth-buddy] Locking scope: SFT method, Qwen2.5-7B, Apple Silicon target.
[unsloth-buddy] Wrote decisions to gaslamp.md

// Phase 3: Hardware constraints
[unsloth-buddy] Environment: Darwin arm64. Unsloth unavailable.
[unsloth-buddy] Pivoting to mlx-tune backend.
[unsloth-buddy] VRAM check: Peak overhead ~4.2GB. Fits in 16GB.

// Phase 4: Training execution
[unsloth-buddy] Streaming loss...
Loss: 0.812 | LR: 2e-5 | Epoch: 0.1

The Reproducibility Contract.

Models without audit trails are just prototypes. Unsloth-Buddy documents every decision—from exact quantization settings to data parsing logic—in a structured, 11-section gaslamp.md roadbook.

Hand this file to any MLE (or a fresh agent session months later) to identically reproduce the project end-to-end.

# gaslamp.md
## 3. Model & LoRA
Base: mlx-community/Qwen2.5-7B
Rank: 16 | Alpha: 32
📖 Learn: Rank 16 provides enough capacity for specialized domain terminology without excessive VRAM overhead.
## 4. Data Strategy
Format: ChatML
Source: generated by src/prepare.py
📖 Learn: The JSONL is mapped to ChatML dynamically to prevent padding token bleed during GRPO reward generation.
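The dynamic JSONL-to-ChatML mapping described above can be sketched roughly as follows. The field names (`instruction`, `response`) and the system prompt are illustrative assumptions, not the actual schema emitted by src/prepare.py:

```python
import json

# Hypothetical sketch: map one JSONL record to a ChatML-formatted string.
# The record fields and system prompt are assumptions for illustration.
def to_chatml(record: dict) -> str:
    return (
        "<|im_start|>system\nYou extract structured fields from medical records.<|im_end|>\n"
        f"<|im_start|>user\n{record['instruction']}<|im_end|>\n"
        f"<|im_start|>assistant\n{record['response']}<|im_end|>"
    )

line = '{"instruction": "List the diagnoses.", "response": "Hypertension."}'
print(to_chatml(json.loads(line)))
```

Mapping at load time, rather than baking ChatML strings into the dataset, keeps the raw JSONL reusable across trainers.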

Task-Aware Dashboards.

An SSE-streaming terminal UI. Whether you're tracking SFT loss curves or DPO chosen/rejected reward deltas, the dashboard automatically adapts to your method.

[Chart] Memory Breakdown: Total Peak VRAM 10.2 GB (■ Base Model ■ LoRA Overhead)
[Chart] GRPO Dashboard (Terminal): Reward ± StdDev Confidence Band
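The streaming side of the dashboard can be sketched as plain Server-Sent Events. The event name and JSON fields below are assumptions for illustration, not Unsloth-Buddy's actual wire format:

```python
import json

# Hypothetical sketch: serialize one training step's metrics as an
# SSE frame ("event:" line, "data:" line, blank-line terminator).
def sse_frame(event: str, data: dict) -> str:
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

frame = sse_frame("loss", {"loss": 0.812, "lr": 2e-5, "epoch": 0.1})
print(frame, end="")
```

A terminal client just reads frames off the stream and redraws the panel matching the event type, which is what lets the same UI adapt to SFT loss curves or GRPO reward bands.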

Built for Empowered Teams.

We handle the infra and the math, so you can focus on the product value.

🚧 The 5-Point Interview

Generative AI is optimistic; it happily writes broken code. Unsloth-Buddy forces a requirements interview (Method, Model, Data, Hardware, Deploy) to lock scope before writing a single line of PyTorch.

🔍 Apple vs Nvidia Routing

Hardware routing happens at the skill level. It detects your silicon and generates either native Unsloth scripts or MLX-Tune scripts. No more "CUDA out of memory" on a MacBook.
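The routing decision can be sketched in a few lines. The backend names mirror the text above; the real skill's detection is more involved than this assumption-laden stub:

```python
import platform

# Hypothetical sketch: pick a training backend from the detected
# OS and architecture. Anything that isn't Apple Silicon is assumed
# here to be an Nvidia/CUDA box for simplicity.
def pick_backend() -> str:
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx-tune"   # Apple Silicon: train via MLX on Metal
    return "unsloth"        # otherwise: native Unsloth on CUDA

print(pick_backend())
```

Because the check runs before any script is generated, the MacBook path never imports CUDA-only code in the first place.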

🛡️ 2-Stage Env Checking

We probe the system Python installation, then verify the specific virtual environment. If the installed Unsloth wheel mismatches your system, we block execution before wasting compute credits.
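A minimal sketch of the two stages, under the assumption that stage one checks for a working interpreter and stage two checks that the active environment can import the chosen backend:

```python
import importlib.util
import shutil
import subprocess
import sys

# Hypothetical sketch of a two-stage environment check.
def stage_one() -> bool:
    # Stage 1: is a python3 on PATH, and does it actually run?
    exe = shutil.which("python3") or sys.executable
    return subprocess.run([exe, "--version"], capture_output=True).returncode == 0

def stage_two(package: str = "unsloth") -> bool:
    # Stage 2: can the *current* interpreter (the venv) resolve the backend?
    return importlib.util.find_spec(package) is not None

if not (stage_one() and stage_two()):
    print("Environment check failed; blocking training run.")
```

Failing fast at stage two is what prevents a mismatched wheel from surfacing only after the model weights have already been downloaded.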

One Conversation. One Reproducible Model.

A fine-tuning system you talk to like a colleague. Describe what you want. It locks the scope, formats the data, checks your hardware, trains the model, and hands you an audit trail.

01

Interview

Locks in the method, model, data, hardware, and deploy target before writing a single line of code.

02

Data Strategy

Acquires and reformats your data into the exact schema the specific trainer (SFT, DPO, GRPO) requires.

03

Env & Math

A hardware scan blocks on misconfiguration, then calculates exact baseline vs. LoRA-overhead VRAM requirements.
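The baseline-vs-overhead split can be sketched as a back-of-envelope estimate. The numbers (7B parameters, 4-bit base, rank 16, Qwen2.5-7B's 28 layers and 3584 hidden size) match the examples on this page, but the formula is a rough sketch, not the skill's actual math, and it ignores activations and KV cache:

```python
# Hypothetical VRAM estimate: quantized base weights plus trainable
# LoRA adapter state (fp16 weight + fp16 grad + fp32 Adam moments
# = ~12 bytes per trainable parameter).
def lora_vram_gb(params_b: float, bits: int, rank: int,
                 layers: int = 28, hidden: int = 3584) -> tuple:
    base = params_b * 1e9 * bits / 8 / 2**30
    # Two adapted projections per layer, each with A (hidden x r) and B (r x hidden).
    trainable = layers * 2 * (2 * hidden * rank)
    lora = trainable * 12 / 2**30
    return round(base, 1), round(lora, 2)

base_gb, lora_gb = lora_vram_gb(params_b=7, bits=4, rank=16)
print(f"base ~{base_gb} GB, LoRA state ~{lora_gb} GB")
```

The point the estimate makes: adapter state is small, so the peak is dominated by the base weights plus runtime buffers.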

04

Train

Generates the optimized Unsloth or MLX training script and streams loss metrics to the terminal UI.

05

Evaluate

Runs the fine-tuned adapter against the base model side-by-side so you can see the actual delta.
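The comparison step can be sketched backend-agnostically: run the same prompts through two generate callables and pair the outputs. The callables stand in for the base model and the adapter-loaded model; wiring them to transformers or MLX is backend-specific, and the toy stand-ins below exist only so the sketch runs without a GPU:

```python
# Hypothetical sketch of a side-by-side evaluation harness.
def side_by_side(prompts, generate_base, generate_tuned):
    rows = []
    for p in prompts:
        rows.append({"prompt": p,
                     "base": generate_base(p),
                     "fine_tuned": generate_tuned(p)})
    return rows

# Toy stand-ins for the two models:
rows = side_by_side(["List the diagnoses."],
                    lambda p: "(base) free-text answer",
                    lambda p: "(tuned) {'diagnoses': ['hypertension']}")
for r in rows:
    print(r["prompt"], "->", r["fine_tuned"])
```

Seeing both answers on one row is what makes the delta obvious, rather than relying on a single aggregate metric.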

06

Export

Automatically merges adapters. Generates a reproducible load+generate script for deployment.
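Numerically, "merging adapters" means folding the low-rank update into the base weight: W' = W + (alpha / r) · B · A. A toy-sized sketch of that merge (shapes are illustrative; real merges apply this per target projection):

```python
import numpy as np

# Hypothetical sketch of the LoRA merge math.
def merge_lora(W, A, B, alpha, rank):
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
A = rng.standard_normal((2, 8))   # rank-2 down-projection
B = rng.standard_normal((8, 2))   # rank-2 up-projection
W_merged = merge_lora(W, A, B, alpha=32, rank=2)
print(W_merged.shape)
```

After the merge, the adapter files are no longer needed at inference time, which is why the export step can emit a single self-contained load+generate script.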