Rollout Algorithms

Docker-only execution

All commands run inside Docker containers. Use the provided scripts.

W8-RL provides multiple rollout schedulers to maximize throughput and task diversity. The scheduler and the horizon policy are selected at runtime via --scheduler-type and --horizon-policy.

Schedulers

FIFO: baseline queue
SHDS: short-horizon diversified scheduler
GRPO: group-based sampling
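
To make the difference between the three schedulers concrete, here is a minimal sketch of how each might order the same task list. It is not the code in w8_rl/rollout/schedulers/: the `Task` shape, the function names, and the round-robin rule used for diversification are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

# Hypothetical task shape: (task_id, category). The real task type may differ.
Task = Tuple[str, str]


def fifo_order(tasks: Iterable[Task]) -> List[Task]:
    """Baseline queue: serve tasks in arrival order."""
    return list(tasks)


def diversified_order(tasks: Iterable[Task]) -> List[Task]:
    """Assumed diversification rule: round-robin across categories."""
    queues: Dict[str, Deque[Task]] = defaultdict(deque)
    for task in tasks:
        queues[task[1]].append(task)

    ordered: List[Task] = []
    while queues:
        # Take one task from each category that still has work queued.
        for category in list(queues):
            ordered.append(queues[category].popleft())
            if not queues[category]:
                del queues[category]
    return ordered


def grouped_order(tasks: Iterable[Task], group_size: int) -> List[Task]:
    """Group-based sampling: repeat each task so its rollouts form one group."""
    return [task for task in tasks for _ in range(group_size)]
```

With tasks `[("a1", "shop"), ("a2", "shop"), ("b1", "maps")]`, the FIFO order keeps both "shop" tasks first, while the diversified sketch interleaves "maps" between them.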

Horizon policies

Fixed: constant max_steps
Bucketed: different max_steps per difficulty bucket
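
The two horizon policies can be pictured as small lookup objects. The sketch below is illustrative only: the class names, the `max_steps_for` method, the bucket labels, and the step counts are assumptions, not the contents of w8_rl/rollout/horizon.py.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FixedHorizon:
    """Every episode gets the same step budget."""
    max_steps: int = 30  # assumed default, for illustration only

    def max_steps_for(self, difficulty: str) -> int:
        return self.max_steps


@dataclass
class BucketedHorizon:
    """Step budget is looked up by difficulty bucket, with a fallback."""
    # Hypothetical bucket labels and budgets.
    buckets: Dict[str, int] = field(
        default_factory=lambda: {"easy": 10, "medium": 20, "hard": 40}
    )
    default: int = 20

    def max_steps_for(self, difficulty: str) -> int:
        return self.buckets.get(difficulty, self.default)
```

Under these assumed numbers, a bucketed policy gives easy tasks a short budget and reserves long horizons for hard ones, whereas a fixed policy spends the same budget on both.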

Early stop

Early-stop policies use browser signals to terminate episodes that are unlikely to succeed. Cutting these episodes short raises throughput without reducing signal quality.
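
As a rough illustration of the idea, an early-stop policy could count stalled steps and navigation errors and end the episode once either passes a threshold. The signal fields, class name, and thresholds below are hypothetical; the actual signals used in w8_rl/rollout/early_stop.py are not specified here.

```python
from dataclasses import dataclass


@dataclass
class BrowserSignals:
    # Hypothetical per-step signals; the real ones may differ.
    page_changed: bool
    navigation_error: bool


class StallEarlyStop:
    """Stop when the page stops changing or errors pile up (assumed rule)."""

    def __init__(self, max_stalled_steps: int = 3, max_errors: int = 2) -> None:
        self.max_stalled_steps = max_stalled_steps
        self.max_errors = max_errors
        self.stalled = 0
        self.errors = 0

    def should_stop(self, signals: BrowserSignals) -> bool:
        # Errors accumulate over the whole episode.
        if signals.navigation_error:
            self.errors += 1
        # The stall counter resets whenever the page changes.
        self.stalled = 0 if signals.page_changed else self.stalled + 1
        return (
            self.stalled >= self.max_stalled_steps
            or self.errors >= self.max_errors
        )
```

A rollout loop would call `should_stop` once per step and end the episode on the first `True`, which returns the browser slot to the scheduler sooner.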

Where it lives

w8_rl/rollout/schedulers/
w8_rl/rollout/horizon.py
w8_rl/rollout/early_stop.py

Example

Run the command below from the repo root in Docker:

python -m w8_rl.rollout.coordinator_main \
  --scheduler-type shds \
  --horizon-policy bucketed \
  --max-episodes 10

All rollout execution still runs inside Docker.

Next Steps

Read the Architecture overview: Architecture Overview
Run a Design2Code task: Design2Code Runs
Review troubleshooting: Troubleshooting
