Dataset
ScienceQA-32M
32M science Q&A pairs across physics, chemistry, biology, math, and earth science, verified for LLM training.
32,000,000
Pairs
Physics, Chemistry, Biology, Math, Earth Science
Subjects
60%+ of solutions
Human-authored
EN, ZH (partial)
Languages
Thirty-two million curated science question-answer pairs sourced from K-12 and undergraduate study platforms. Each pair includes the original question, a step-by-step worked solution, the final answer, and subject/topic metadata. The dataset is used to fine-tune LLMs for STEM tutoring, reasoning, and answer-grounded generation.
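As a rough illustration of how one pair might feed a fine-tuning pipeline, the sketch below maps a record onto a prompt/completion pair. The field names follow the sample record on this page; the `ScienceQARecord` type, the `to_training_example` helper, and the prompt layout are assumptions made for illustration, not a format the dataset prescribes.

```python
from typing import TypedDict


class ScienceQARecord(TypedDict):
    # Field names come from the sample record; the string types are assumed.
    id: str
    subject: str
    topic: str
    question: str
    solution: str
    answer: str


def to_training_example(record: ScienceQARecord) -> dict[str, str]:
    """Build a prompt/completion pair (hypothetical layout, not an official format)."""
    prompt = (
        f"Subject: {record['subject']} / {record['topic']}\n"
        f"Question: {record['question']}\n"
        "Solve step by step."
    )
    # Keep the worked solution and the final answer together so the model
    # learns to show its reasoning before stating the result.
    completion = f"{record['solution']}\nFinal answer: {record['answer']}"
    return {"prompt": prompt, "completion": completion}


record: ScienceQARecord = {
    "id": "sci_8240412",
    "subject": "Physics",
    "topic": "Kinematics",
    "question": "A ball is dropped from 45 m. How long until it hits the ground?",
    "solution": "Use h = 1/2 g t^2. Solve for t: t = sqrt(2h/g) = sqrt(90/9.8) ≈ 3.03 s.",
    "answer": "≈ 3.03 seconds",
}

example = to_training_example(record)
print(example["prompt"])
print(example["completion"])
```

Keeping the solution and the final answer as separate fields means the same record could also serve answer-only evaluation by dropping the solution from the completion.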
Data sample
What a record looks like
Sample Q&A pair
JSON
Illustrative (full sample available under NDA)
{
  "id": "sci_8240412",
  "subject": "Physics",
  "topic": "Kinematics",
  "question": "A ball is dropped from 45 m. How long until it hits the ground?",
  "solution": "Use h = 1/2 g t^2. Solve for t: t = sqrt(2h/g) = sqrt(90/9.8) ≈ 3.03 s.",
  "answer": "≈ 3.03 seconds"
}
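A record like this can be spot-checked before use. The sketch below parses the sample, confirms the six fields shown above are present, and recomputes the free-fall time from h = 1/2 g t^2. The `REQUIRED_FIELDS` set is inferred from this one illustrative record and is an assumption, not a published schema; g = 9.8 m/s^2 matches the value used in the sample solution.

```python
import json
import math

# The sample record from this page, as a JSON string.
raw = """
{
  "id": "sci_8240412",
  "subject": "Physics",
  "topic": "Kinematics",
  "question": "A ball is dropped from 45 m. How long until it hits the ground?",
  "solution": "Use h = 1/2 g t^2. Solve for t: t = sqrt(2h/g) = sqrt(90/9.8) ≈ 3.03 s.",
  "answer": "≈ 3.03 seconds"
}
"""

record = json.loads(raw)

# Assumed field set, inferred from the sample record only.
REQUIRED_FIELDS = {"id", "subject", "topic", "question", "solution", "answer"}
missing = REQUIRED_FIELDS - record.keys()
print("missing fields:", sorted(missing))

# Recompute the worked solution: t = sqrt(2h / g).
h = 45.0  # drop height in metres, from the question
g = 9.8   # gravitational acceleration in m/s^2, as used in the solution
t = math.sqrt(2 * h / g)
print(f"t = {t:.2f} s")  # prints "t = 3.03 s"
```

The recomputed value agrees with the stated answer to two decimal places.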