subjectId
inScope
Data-science programming benchmark with 1,000 problems across NumPy, Pandas, SciPy, scikit-learn, PyTorch, TensorFlow, and Matplotlib. Scored by per-problem pass rate.
outOfScope
General code generation, agentic tool use, repository-scale tasks, and non-Python languages.
outOfScopeReasonIds