Data Provenance and Processing Details
Technical note for readers who want a transparent account of data origins, linkage methods, and known limitations.
Updated March 21, 2026.
Data sources
- ODE assessment group reports for 2024-25 (ELA, Math, Science).
  - Assessment Group Reports landing page
  - ELA section base workbook: data/raw/ORSchoolTestData.xlsx (archived locally Jan. 12, 2026)
  - Math section base workbook: data/raw/ORSchoolTestDataMath.xlsx (archived locally Jan. 14, 2026)
  - Science section base workbook: data/raw/ORSchoolTestDataScience.xlsx (archived locally Jan. 14, 2026)
  - pagr_schools_ela_tot_raceethnicity_2425.xlsx (archived locally Feb. 1, 2026)
  - pagr_schools_ela_tot_gnd_2425.xlsx (archived locally Mar. 15, 2026)
  - pagr_schools_ela_tot_ext_swa_swd_2425.xlsx (archived locally Mar. 15, 2026)
  - pagr_schools_ela_tot_elp_ine_mig_tag_2425.xlsx (archived locally Mar. 15, 2026)
  - pagr_schools_ela_tot_sep_hom_mlc_sfc_2425.xlsx (archived locally Mar. 15, 2026)
  - pagr_schools_math_tot_raceethnicity_2425.xlsx (archived locally Feb. 1, 2026)
  - pagr_schools_math_tot_gnd_2425.xlsx (archived locally Mar. 15, 2026)
  - pagr_schools_math_tot_ext_swa_swd_2425.xlsx (archived locally Mar. 15, 2026)
  - pagr_schools_math_tot_elp_ine_mig_tag_2425.xlsx (archived locally Mar. 15, 2026)
  - pagr_schools_math_tot_sep_hom_mlc_sfc_2425.xlsx (archived locally Mar. 15, 2026)
  - pagr_schools_science_tot_studentgroups_2425.xlsx (archived locally Jan. 14, 2026)
  - pagr_schools_science_raceethnicity_2425.xlsx (archived locally Feb. 1, 2026)
- ODE Regular Attenders file for attendance/absence context (2024-25).
- ODE class-size workbook (2024-25) and ODE school-level spending metadata (2023-24).
- ODE report card media aggregate (school-level context percentages).
- U.S. Census geographies and ACS 5-year neighborhood context metrics (income, education, poverty, labor).
- NCES EDGE locale classifications (city/suburb/town/rural).
How the data is processed
- Assessment workbook families are parsed into CSV files, and only rows with numeric participant counts are retained for analysis.
- For ELA, Math, and Science, each base subject workbook is combined with the supplemental 2024-25 assessment-group workbooks. Rows whose keys exactly match a row already in the base workbook are treated as validation checks, only genuinely new subgroup rows are appended, and total-population rows present in both sources are left unchanged.
- School addresses are added, then geocoded to Census tract geography using Census batch geocoding with fallback passes.
- Tract-level ACS context variables are appended and then propagated across subject datasets using school-level keys.
- Attendance, class size, spending, school type, virtual status, and locale labels are merged from separate source files.
- ODE media aggregate school percentages are merged by (District ID, School ID), with unique-school-ID fallback checks and sample name validation.
- Quality checks and merge logs are maintained to support reproducibility and external review.
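The base-plus-supplemental merge rule described above (overlapping keys only validated, new subgroup rows appended, base total-population rows never overwritten) can be sketched as follows. This is a minimal illustration, not the project's actual script: the row key `(district_id, school_id, student_group, grade_level)`, the function name `merge_supplemental`, and the IDs `D1`/`S1` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical row key: (district_id, school_id, student_group, grade_level).
# The real key columns used to compare ODE workbooks are an assumption here.
RowKey = Tuple[str, str, str, str]
Row = Dict[str, str]


def merge_supplemental(
    base: Dict[RowKey, Row], supplemental: Dict[RowKey, Row]
) -> Tuple[Dict[RowKey, Row], List[RowKey], int]:
    """Merge supplemental assessment-group rows into a base subject workbook.

    Overlapping keys are only compared (validation); base rows, including
    total-population rows, are never overwritten. Keys absent from the base
    are appended as new subgroup rows.
    """
    merged: Dict[RowKey, Row] = dict(base)
    conflicts: List[RowKey] = []  # overlapping keys whose values disagree
    appended = 0
    for key, row in supplemental.items():
        if key in base:
            if base[key] != row:
                conflicts.append(key)  # record for the merge log; keep the base row
        else:
            merged[key] = row  # genuinely new subgroup row
            appended += 1
    return merged, conflicts, appended


# Made-up example: one shared total-population row, one new subgroup row.
base = {("D1", "S1", "Total Population", "All Grades"): {"participants": "412"}}
supplemental = {
    ("D1", "S1", "Total Population", "All Grades"): {"participants": "412"},
    ("D1", "S1", "Female", "All Grades"): {"participants": "205"},
}
merged, conflicts, appended = merge_supplemental(base, supplemental)
print(len(merged), conflicts, appended)  # 2 [] 1
```

Returning the conflicting keys instead of raising keeps the merge non-destructive while still giving the merge log something concrete to report.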
ODE media aggregate enrichment (2024-25 datasets)
The following school-level percentage fields were added to the 2024-25 processed ELA, Math, and Science datasets:
Students with Disabilities, Students Experiencing Poverty, Required Childhood Vaccinations, Average Teacher Retention Rate, and Experienced Teachers.
College Going 2022-23 was also merged but has narrower grade-span relevance.
- Primary merge key: (District ID, School ID).
- Fallback key: unique School ID when district pairing is unavailable.
- Verification: duplicate/conflict checks, parse-anomaly checks, and random sample name checks.
- Scope: applied to 2024-25 datasets only (not the 2018-19 math comparison file).
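A minimal sketch of the primary-key/fallback-key lookup listed above, assuming media-aggregate rows are dictionaries with hypothetical `district_id`, `school_id`, and `school_name` fields. A School ID is used as a fallback only if it occurs exactly once in the media file, and repeated (District ID, School ID) pairs are collected for the duplicate/conflict check. The function names and example IDs are illustrative, not taken from the project code.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def build_lookups(media_rows: List[Row]):
    """Index rows by (district_id, school_id) and by school_id when it is unique."""
    by_pair: Dict[Tuple[str, str], Row] = {}
    duplicate_pairs: List[Tuple[str, str]] = []
    school_counts = Counter(r["school_id"] for r in media_rows)
    by_unique_school: Dict[str, Row] = {}
    for r in media_rows:
        pair = (r["district_id"], r["school_id"])
        if pair in by_pair:
            duplicate_pairs.append(pair)  # flag for the duplicate/conflict check
        else:
            by_pair[pair] = r
        if school_counts[r["school_id"]] == 1:
            # Only school IDs that occur exactly once are safe fallback keys.
            by_unique_school[r["school_id"]] = r
    return by_pair, by_unique_school, duplicate_pairs


def match_school(
    district_id: Optional[str],
    school_id: str,
    by_pair: Dict[Tuple[str, str], Row],
    by_unique_school: Dict[str, Row],
) -> Tuple[Optional[Row], str]:
    """Try the primary (district, school) key first, then the unique-school fallback."""
    if district_id is not None and (district_id, school_id) in by_pair:
        return by_pair[(district_id, school_id)], "primary"
    if school_id in by_unique_school:
        return by_unique_school[school_id], "fallback"
    return None, "unmatched"


media = [
    {"district_id": "D1", "school_id": "S1", "school_name": "Example Elementary"},
    {"district_id": "D2", "school_id": "S2", "school_name": "Sample Middle"},
    {"district_id": "D3", "school_id": "S2", "school_name": "Sample Middle North"},
]
by_pair, by_unique_school, dupes = build_lookups(media)
```

Here `match_school(None, "S1", ...)` resolves through the fallback, while `match_school(None, "S2", ...)` stays unmatched because "S2" appears under two districts. A sample name check would then compare `school_name` on a random subset of matched rows.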
Coverage results from source-vs-processed audit
A direct comparison of ODE source rows against the processed files found that no rows with numeric participant counts were lost.
The coverage gaps below come from ODE suppression or no-test markers in the source files.
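The retention rule (keep rows with numeric participant counts, drop rows carrying suppression or no-test markers) might look like the sketch below. The marker strings used in the example (`*`, `--`, blank) and the `participants` column name are assumptions for illustration; the actual ODE markers are not listed in this note.

```python
from typing import Dict, List, Optional


def parse_participants(value: object) -> Optional[int]:
    """Return a whole-number participant count, or None for non-numeric markers."""
    text = str(value).strip().replace(",", "")
    if text.endswith(".0"):
        text = text[:-2]  # spreadsheet readers may return whole numbers as floats
    return int(text) if text.isdecimal() else None


def keep_numeric_rows(rows: List[Dict[str, object]], column: str = "participants"):
    """Keep only rows whose participant count parses as a number."""
    return [r for r in rows if parse_participants(r.get(column, "")) is not None]


rows = [{"participants": "35"}, {"participants": "*"}, {"participants": ""}]
print(len(keep_numeric_rows(rows)))  # 1
```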
| Subject | Total-population school coverage | Total-population row coverage | Known represented students |
|---|---|---|---|
| ELA | 1,164 / 1,250 (93.1%) | 3,504 / 4,067 (86.2%) | 261,061 |
| Math | 1,130 / 1,250 (90.4%) | 3,413 / 4,067 (83.9%) | 254,385 |
| Science | 1,129 / 1,226 (92.1%) | 2,442 / 2,794 (87.4%) | 219,872 |
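One way the three coverage columns could be computed from total-population rows is sketched below. The definitions are assumptions rather than the audit's documented rules: a school counts as covered if at least one of its total-population rows is numeric, row coverage is numeric rows over all rows, and "known represented students" is taken as the sum of numeric participant counts. The `fmt` helper reproduces the table's `covered / total (pct%)` layout.

```python
from typing import List, Optional, Tuple


def coverage_stats(rows: List[Tuple[str, Optional[int]]]):
    """Summarize total-population rows given as (school_id, participants).

    participants is None when the source row carries a suppression or
    no-test marker. A school counts as covered if at least one of its rows
    is numeric (an assumed definition).
    """
    all_schools = {school for school, _ in rows}
    covered_schools = {school for school, n in rows if n is not None}
    numeric = [n for _, n in rows if n is not None]
    return {
        "schools": (len(covered_schools), len(all_schools)),
        "rows": (len(numeric), len(rows)),
        "students": sum(numeric),  # assumed meaning of "known represented students"
    }


def fmt(covered: int, total: int) -> str:
    """Format a coverage ratio like the table, e.g. '1,164 / 1,250 (93.1%)'."""
    return f"{covered:,} / {total:,} ({100 * covered / total:.1f}%)"


stats = coverage_stats([("S1", 40), ("S1", None), ("S2", None), ("S3", 25)])
print(fmt(*stats["schools"]), fmt(*stats["rows"]), stats["students"])
# 2 / 3 (66.7%) 2 / 4 (50.0%) 65
```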
Known deficiencies (current datasets)
Small but non-random gaps remain in contextual fields. The largest sources are suppressed assessment rows and unmatched attendance rows.
| Dataset | Schools missing address | Schools missing tract/ACS | Schools missing attendance (>=1 unmatched row) | Schools missing locale |
|---|---|---|---|---|
| ELA | 6 | 6 | 105 | 1 |
| Math | 6 | 6 | 96 | 1 |
| Science | 8 | 8 | 132 | 14 |
The attendance counts above are schools with at least one unmatched attendance row, not schools with no attendance data at all.
Far fewer schools have fully missing attendance (ELA: 4, Math: 3, Science: 5), and virtual/online programs account for a sizable share of the schools with attendance gaps (ELA: 32/105, Math: 27/96, Science: 47/132). Statewide patterns are therefore generally less sensitive to these gaps than the raw counts suggest.
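The distinction between "at least one unmatched row" and "fully missing" can be made concrete with a small sketch. It assumes each assessment row carries a hypothetical `(school_id, is_virtual, attendance_matched)` triple; the field layout and function name are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def attendance_gap_summary(rows: Iterable[Tuple[str, bool, bool]]) -> Dict[str, int]:
    """rows: (school_id, is_virtual, attendance_matched), one per assessment row."""
    matched: Dict[str, List[bool]] = defaultdict(list)
    virtual: Dict[str, bool] = {}
    for school_id, is_virtual, was_matched in rows:
        matched[school_id].append(was_matched)
        virtual[school_id] = is_virtual
    # ">=1 unmatched row": at least one row for the school lacks attendance data.
    any_gap = {s for s, flags in matched.items() if not all(flags)}
    # "Fully missing": no row for the school has attendance data.
    fully_missing = {s for s, flags in matched.items() if not any(flags)}
    return {
        "schools_with_any_gap": len(any_gap),
        "schools_fully_missing": len(fully_missing),
        "virtual_among_gap": sum(1 for s in any_gap if virtual[s]),
    }


summary = attendance_gap_summary([
    ("S1", False, True), ("S1", False, False),  # partial gap
    ("S2", True, False), ("S2", True, False),   # virtual, fully missing
    ("S3", False, True),                        # no gap
])
print(summary)
```

Because every fully missing school also has at least one unmatched row, the second count is always a subset of the first, which is why the table's attendance column overstates how many schools lack attendance context entirely.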
Separately, 17 of the 20 schools with a missing address and/or locale are virtual/online programs. The 9 schools missing addresses cannot be linked to tract-level SES measures and are excluded from SES-association analyses.
For ODE media aggregate fields, missingness is low for most columns, but College Going 2022-23 is intentionally sparse and high-school-oriented.
It should be interpreted as a specialized indicator, not a universal school-level context field.
Important disclosure on timing alignment
School type, virtual status, and spending metadata currently come from ODE's 2023-24 school-level spending file, while assessment outcomes are from 2024-25. This is a known temporal mismatch; these fields will be replaced once a synchronized 2024-25 spending metadata file is available.
For audit purposes, detailed script and merge-log references are available on request.