Tinker RL Training
Docker-only execution
All commands run inside Docker containers. Use the provided scripts.
Tinker training performs real gradient updates using GRPO-style group rollouts.
All training runs inside the ray-worker container and uses EnvActor with emulator-based rewards.
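The group-rollout idea behind GRPO-style training can be shown with a toy computation: within a group of rollouts for the same task, each reward is baselined against the group mean. This is a conceptual sketch with made-up reward values; the real trainer also normalizes advantages and applies the policy-gradient update, which is omitted here.

```shell
# Toy GRPO-style group baseline: advantage = reward - group mean.
# The four rewards are hypothetical emulator scores for one rollout group.
rewards="0.2 0.8 0.5 0.9"
echo "$rewards" | awk '{
  s = 0
  for (i = 1; i <= NF; i++) s += $i
  m = s / NF                               # group mean reward
  for (i = 1; i <= NF; i++)
    printf "rollout %d: advantage %.3f\n", i, $i - m
}'
```

Rollouts above the group mean get positive advantages and are reinforced; those below get negative advantages.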
1) Quick sanity run
Run the command below from the repo root in Docker:
./scripts/train_design2code_tinker.sh \
--task-dir tasks \
--limit 2 \
--group-size 2 \
--groups-per-batch 1 \
--num-steps 1 \
--max-tokens 1024 \
--max-total-tokens 3000000 \
--model Qwen/Qwen3-4B-Instruct-2507
2) Budgeted training template
Run the command below from the repo root in Docker:
./scripts/train_design2code_tinker.sh \
--task-dir tasks \
--limit 12 \
--model Qwen/Qwen3-4B-Instruct-2507 \
--learning-rate 4e-5 \
--lora-rank 32 \
--group-size 4 \
--groups-per-batch 2 \
--max-tokens 4096 \
--num-steps 10 \
--max-total-tokens 3000000
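As a quick back-of-envelope check that this template fits its token budget, the sampled-token cost can be bounded from the flags above. This assumes --max-total-tokens caps cumulative sampled completion tokens (an assumption, not a documented guarantee) and ignores prompt tokens.

```shell
# Upper bound on sampled tokens for the budgeted template:
# groups-per-batch * group-size rollouts per step, each capped at max-tokens.
PER_STEP=$((2 * 4 * 4096))   # 8 rollouts per step, 4096 tokens each
TOTAL=$((PER_STEP * 10))     # 10 training steps
echo "per-step: $PER_STEP, total: $TOTAL, budget: 3000000"
```

Even at this bound the run stays well under the 3,000,000-token budget.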
Outputs
- /home/ray/app/output/tinker_train/ (metrics, checkpoints)
- /home/ray/app/output/tinker_train/trained_sampler_path.txt (sampler path for evaluation)
Evaluate the trained model
Tinker evaluation takes a sampler path, not the raw model ID. Use the sampler path written by training:
./scripts/run_design2code_tinker.sh \
--task-dir tasks \
--limit 12 \
--policy tinker \
--model $(cat /home/ray/app/output/tinker_train/trained_sampler_path.txt) \
--tokenizer-model Qwen/Qwen3-4B-Instruct-2507 \
--episodes 3 \
--max-tokens 4096 \
--max-total-tokens 3000000
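If training has not finished, the inline `$(cat ...)` above expands to an empty string and an empty --model value is passed silently. A small guard helper avoids that; this is a sketch, and `sampler_or_die` is a hypothetical name, not part of the provided scripts.

```shell
# Hypothetical guard: print the trained sampler path, or fail if the file
# is missing or empty (path taken from the Outputs section).
sampler_or_die() {
  local f=${1:-/home/ray/app/output/tinker_train/trained_sampler_path.txt}
  local s
  s=$(cat "$f" 2>/dev/null)
  if [ -z "$s" ]; then
    echo "missing or empty $f; run training first" >&2
    return 1
  fi
  printf '%s\n' "$s"
}
```

Usage: `MODEL=$(sampler_or_die) || exit 1`, then pass `--model "$MODEL"` to the run script.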
Tokenizer cache
Use a persistent cache to avoid repeated downloads:
export HF_HOME=/home/ray/.cache/huggingface
If HF_HUB_ENABLE_HF_TRANSFER=1 is set, hf_transfer must be installed. Otherwise unset the flag.
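The two settings above can be combined into one defensive snippet: the import probe simply tests whether the hf_transfer package is importable in the current Python environment, and the flag is only set when it is. A sketch, assuming `python` is on PATH inside the container.

```shell
# Persistent tokenizer cache; enable hf_transfer only if it is installed.
export HF_HOME=/home/ray/.cache/huggingface
if python -c "import hf_transfer" 2>/dev/null; then
  export HF_HUB_ENABLE_HF_TRANSFER=1
else
  unset HF_HUB_ENABLE_HF_TRANSFER
fi
echo "HF_HUB_ENABLE_HF_TRANSFER=${HF_HUB_ENABLE_HF_TRANSFER:-unset}"
```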
Next Steps
- Read the Architecture overview: Architecture Overview
- Run a Design2Code task: Design2Code Runs
- Review troubleshooting: Troubleshooting