SWE-bench-verified Runs
Docker-only execution
All commands run inside Docker containers. Use the provided scripts.
SWE-bench-verified tasks are run via the Ray rollout path.
Each task runs in its own Docker container, defined in tasks/<instance_id>/.
Run a single task
Run the command below from the repo root in Docker:
./scripts/run_all_tasks.sh --task django__django-10914
Run N tasks
Pass --limit N to run N tasks. For example, with N = 5, run the command below from the repo root in Docker:
./scripts/run_all_tasks.sh --limit 5
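To run a hand-picked set of tasks rather than one task or a count, the --task form can be looped. This is a minimal sketch, not part of the provided scripts: the function name run_selected_tasks and the RUNNER override are hypothetical, and it assumes run_all_tasks.sh exits non-zero when a task fails and can safely be invoked once per task.

```shell
# Run hand-picked instance IDs one at a time via the --task flag.
# RUNNER is a hypothetical override for this sketch; by default it
# points at the script named in this document.
run_selected_tasks() {
  local runner="${RUNNER:-./scripts/run_all_tasks.sh}"
  local failed=0
  local id
  for id in "$@"; do
    echo "running task: $id"
    # Assumption: the runner exits non-zero when the task fails.
    if ! "$runner" --task "$id"; then
      echo "task failed: $id" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every task succeeded.
  [ "$failed" -eq 0 ]
}

# Usage (from the repo root in Docker):
#   run_selected_tasks django__django-10914
```

The loop keeps going after a failure so that one broken task does not hide the results of the rest; the return status still reports whether anything failed.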
Two-phase task build
The tasks/ directory is populated by a build step:
python -m w8_rl.cli build --limit 10
Only tasks with a Dockerfile are runnable. See tasks/README.md for details.
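After a build, it can help to check which tasks are actually runnable. The sketch below lists every directory under tasks/ that contains a Dockerfile. It assumes the tasks/<instance_id>/Dockerfile layout described above; the function name list_runnable_tasks is hypothetical and not part of the repo.

```shell
# Print the instance ID of every task directory that contains a Dockerfile.
# Assumes the layout tasks/<instance_id>/Dockerfile.
list_runnable_tasks() {
  local tasks_dir="${1:-tasks}"
  local dir
  for dir in "$tasks_dir"/*/; do
    # When nothing matches, the pattern stays literal; skip it.
    [ -d "$dir" ] || continue
    if [ -f "${dir}Dockerfile" ]; then
      basename "$dir"
    fi
  done
}

# Usage (from the repo root):
#   list_runnable_tasks tasks
```

Directories without a Dockerfile are skipped silently, so an empty result means the build step has not yet produced any runnable task.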
Next Steps
- Read the architecture overview: Architecture Overview
- Run a Design2Code task: Design2Code Runs
- Review troubleshooting: Troubleshooting