Oregon School Assessment

Data Provenance and Processing Details

Technical note for readers who want a transparent account of data origins, linkage methods, and known limitations. Updated March 21, 2026.

Data sources

How the data is processed

  • Assessment workbook families are parsed into CSV, and only rows with numeric participant counts are retained for analysis.
  • For ELA, Math, and Science, the base subject workbooks are combined with supplemental 2024-25 assessment-group workbooks. Exact overlapping row keys are treated as validation checks, only genuinely new subgroup rows are appended, and shared total-population rows remain unchanged wherever they are present in the merge.
  • School addresses are added, then geocoded to Census tract geography using Census batch geocoding with fallback passes.
  • Tract-level ACS context variables are appended and then propagated across subject datasets using school-level keys.
  • Attendance, class size, spending, school type, virtual status, and locale labels are merged from separate source files.
  • ODE media aggregate school percentages are merged by (District ID, School ID), with unique-school-ID fallback checks and sample name validation.
  • Quality checks and merge logs are maintained to support reproducibility and external review.
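
The append rule for supplemental workbooks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual script: the function name merge_supplemental, the field names (school_id, subgroup, participants), and the sample rows are all hypothetical.

```python
def merge_supplemental(base_rows, supplemental_rows, key_fields):
    """Append only supplemental rows whose key is absent from the base.

    Rows whose key already exists in the base are compared rather than
    appended; any disagreement is reported as a conflict for review.
    """
    def key_of(row):
        return tuple(row[field] for field in key_fields)

    base_by_key = {key_of(row): row for row in base_rows}
    merged = list(base_rows)  # base rows are carried over unchanged
    conflicts = []
    for row in supplemental_rows:
        key = key_of(row)
        if key in base_by_key:
            # Overlapping key: validation check only, nothing is overwritten.
            if base_by_key[key] != row:
                conflicts.append(key)
        else:
            merged.append(row)
            base_by_key[key] = row
    return merged, conflicts


# Illustrative rows only; these are not real ODE records.
base = [
    {"school_id": "S1", "subgroup": "Total Population", "participants": 120},
]
supplemental = [
    {"school_id": "S1", "subgroup": "Total Population", "participants": 120},
    {"school_id": "S1", "subgroup": "Hypothetical Group A", "participants": 14},
]
merged, conflicts = merge_supplemental(
    base, supplemental, key_fields=("school_id", "subgroup")
)
```

In this sketch the overlapping total-population row is checked but never replaced, and only the new subgroup row is appended. A non-empty conflicts list would indicate that the two workbook families disagree on a shared key.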

ODE media aggregate enrichment (2024-25 datasets)

The following school-level percentage fields were added to the 2024-25 processed ELA, Math, and Science datasets: Students with Disabilities, Students Experiencing Poverty, Required Childhood Vaccinations, Average Teacher Retention Rate, and Experienced Teachers. College Going 2022-23 was also merged but has narrower grade-span relevance.

  • Primary merge key: (District ID, School ID).
  • Fallback key: unique School ID when district pairing is unavailable.
  • Verification: duplicate/conflict checks, parse-anomaly checks, and random sample name checks.
  • Scope: applied to 2024-25 datasets only (not the 2018-19 math comparison file).
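
One way to picture the primary and fallback keys is the sketch below. It assumes that the fallback is attempted only when a school row has no district ID, and that a School ID counts as unique when it appears exactly once in the aggregate file; both are assumptions for illustration, as are the function name attach_aggregates and all field names.

```python
from collections import Counter


def attach_aggregates(school_rows, aggregate_rows):
    """Match aggregates by (district_id, school_id), else by unique school_id."""
    by_pair = {}
    duplicate_pairs = []
    for agg in aggregate_rows:
        pair = (agg["district_id"], agg["school_id"])
        if pair in by_pair:
            duplicate_pairs.append(pair)  # flag for the duplicate/conflict check
        else:
            by_pair[pair] = agg

    # A school ID is usable as a fallback key only if it is unambiguous.
    id_counts = Counter(agg["school_id"] for agg in aggregate_rows)
    by_unique_id = {
        agg["school_id"]: agg
        for agg in aggregate_rows
        if id_counts[agg["school_id"]] == 1
    }

    results = []
    for school in school_rows:
        district_id = school.get("district_id")
        if district_id is not None:
            match = by_pair.get((district_id, school["school_id"]))
            method = "district_school_pair"
        else:
            match = by_unique_id.get(school["school_id"])
            method = "unique_school_id"
        if match is None:
            method = "unmatched"
        results.append({**school, "aggregates": match, "match_method": method})
    return results, duplicate_pairs


# Illustrative records only; IDs and values are made up.
aggregates = [
    {"district_id": "D1", "school_id": "S1", "pct_poverty": 41.0},
    {"district_id": "D2", "school_id": "S2", "pct_poverty": 18.5},
    {"district_id": "D3", "school_id": "S3", "pct_poverty": 30.0},
    {"district_id": "D4", "school_id": "S3", "pct_poverty": 55.0},
]
schools = [
    {"district_id": "D1", "school_id": "S1"},
    {"district_id": None, "school_id": "S2"},
    {"district_id": None, "school_id": "S3"},  # ambiguous ID, stays unmatched
]
matched, duplicate_pairs = attach_aggregates(schools, aggregates)
```

Recording the match method on each row keeps the fallback auditable: rows matched by school ID alone can be pulled out later for the sample name checks.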

Coverage results from source-vs-processed audit

A direct comparison of ODE source rows against processed files found no loss of rows where participant counts are numeric. Coverage gaps come from ODE suppression/no-test markers in source files.

Subject   Total-pop school coverage   Total-pop row coverage   Known represented students
ELA       1,164 / 1,250 (93.1%)       3,504 / 4,067 (86.2%)    261,061
Math      1,130 / 1,250 (90.4%)       3,413 / 4,067 (83.9%)    254,385
Science   1,129 / 1,226 (92.1%)       2,442 / 2,794 (87.4%)    219,872
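
The coverage percentages are simple ratios of covered to total counts, rounded to one decimal place. The short sketch below recomputes them from the counts reported in this section; the variable and function names are illustrative.

```python
# (covered, total) counts as reported in the coverage table.
coverage_counts = {
    "ELA": {"schools": (1164, 1250), "rows": (3504, 4067)},
    "Math": {"schools": (1130, 1250), "rows": (3413, 4067)},
    "Science": {"schools": (1129, 1226), "rows": (2442, 2794)},
}


def coverage_pct(covered, total):
    """Return coverage as a percentage rounded to one decimal place."""
    return round(100 * covered / total, 1)


coverage_pcts = {
    subject: {level: coverage_pct(*pair) for level, pair in levels.items()}
    for subject, levels in coverage_counts.items()
}

for subject, levels in coverage_pcts.items():
    print(f"{subject}: schools {levels['schools']}%, rows {levels['rows']}%")
```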

Known deficiencies (current datasets)

Small but non-random gaps remain in contextual fields. The largest sources are suppressed assessment rows and unmatched attendance rows.

Dataset   Missing address schools   Missing tract/ACS schools   Missing attendance schools (>=1 unmatched row)   Missing locale schools
ELA       6                         6                           105                                              1
Math      6                         6                           96                                               1
Science   8                         8                           132                                              14

The attendance counts above are schools with at least one unmatched attendance row, not schools with no attendance data. Fully missing attendance is much rarer (ELA: 4 schools, Math: 3, Science: 5). Virtual/online programs account for a sizable share of attendance-gap schools (ELA: 32/105, Math: 27/96, Science: 47/132), so statewide patterns are generally less sensitive to these gaps than the raw counts suggest. Separately, 17 of 20 schools with missing address and/or locale are virtual/online; the 9 schools missing addresses cannot be linked to tract SES and are excluded from SES-association analyses.
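
The distinction between "at least one unmatched attendance row" and "no attendance data at all" can be made concrete with a small sketch. It assumes each processed row carries a school ID and an attendance value that is empty (None) when the attendance merge found no match; the function name and sample rows are hypothetical.

```python
def classify_attendance_gaps(rows):
    """Split schools into those with any unmatched row and those fully missing.

    rows: iterable of (school_id, attendance_value_or_None) pairs.
    """
    schools_with_match = set()
    schools_with_gap = set()
    for school_id, attendance in rows:
        if attendance is None:
            schools_with_gap.add(school_id)
        else:
            schools_with_match.add(school_id)
    return {
        # Counted in the deficiency table: one or more unmatched rows.
        "any_unmatched": schools_with_gap,
        # Stricter condition: not a single row has attendance data.
        "fully_missing": schools_with_gap - schools_with_match,
    }


# Illustrative rows only; values are made up.
sample_rows = [
    ("S1", 91.2), ("S1", None),   # partial gap
    ("S2", None), ("S2", None),   # fully missing
    ("S3", 88.0),                 # no gap
]
gaps = classify_attendance_gaps(sample_rows)
```

Here S1 and S2 both count toward the "at least one unmatched row" total, but only S2 is fully missing, which is why the second figure is much smaller than the first.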

For ODE media aggregate fields, missingness is low for most columns, but College Going 2022-23 is intentionally sparse and high-school-oriented. It should be interpreted as a specialized indicator, not a universal school-level context field.

Important disclosure on timing alignment

School type, virtual status, and spending metadata currently come from ODE's 2023-24 school-level spending file, while assessment outcomes are from 2024-25. This is a known temporal mismatch and will be replaced when a synchronized 2024-25 spending metadata file is available.

For audit purposes, detailed script and merge-log references are available on request.