Spending and Class Size vs Performance: Findings Memo

Updated: 2026-07-16

Scope
- Data: 2024-2025 processed English, Math, Science school rows.
- Outcome: Percent Proficient.
- Predictors of interest: Overall spending per student, Classroom spending per student, Median class size.
- Context controls (baseline continuity spec): Per-capita income, Adult BA+ rate, Regular attendance.
- Companion interpretation note: the poverty-aware Evidence Lab checks (Students Experiencing Poverty) confirm that hardship context matters, but they do not move spending or class size above BA+ rate or attendance in the project-level ordering of predictors.
- Weighting: each school row is weighted by its count of students with reported Level 1-4 results.

Methods used
1) Weighted bivariate checks (correlation and simple slope).
2) Weighted multivariable OLS-style checks with controls.
3) Weighted ridge regression with 5-fold CV (to stabilize collinear spending variables).
4) District fixed-effects ridge (within-district demeaning) with CV and permutation importance.
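
A minimal sketch of methods 3 and 4, assuming a closed-form weighted ridge with an unpenalized intercept and a weighted within-district demeaning step. The function names, the synthetic data, and alpha=10.0 are illustrative assumptions, not the contents of the project scripts. For brevity the demeaning is applied before the folds are split, which a stricter setup would avoid.

```python
import numpy as np

def demean_within_groups(A, w, groups):
    """Subtract each group's weighted mean (the fixed-effects 'within' transform)."""
    A = np.array(A, dtype=float)  # work on a copy
    for g in np.unique(groups):
        m = groups == g
        A[m] -= np.average(A[m], axis=0, weights=w[m])
    return A

def fit_weighted_ridge(X, y, w, alpha):
    """Minimize sum_i w_i (y_i - b0 - x_i.b)^2 + alpha * ||b||^2, intercept unpenalized."""
    w = w / w.mean()  # rescale weights to mean 1 so alpha has a stable meaning
    x_bar = np.average(X, axis=0, weights=w)
    y_bar = np.average(y, weights=w)
    Xc, yc = X - x_bar, y - y_bar
    gram = Xc.T @ (w[:, None] * Xc) + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (w * yc))
    return coef, y_bar - x_bar @ coef

def weighted_r2(y, pred, w):
    ss_res = np.sum(w * (y - pred) ** 2)
    ss_tot = np.sum(w * (y - np.average(y, weights=w)) ** 2)
    return 1.0 - ss_res / ss_tot

def cv_r2(X, y, w, alpha, n_folds=5, seed=0):
    """Mean out-of-fold weighted R2 over shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        coef, b0 = fit_weighted_ridge(X[train], y[train], w[train], alpha)
        scores.append(weighted_r2(y[fold], X[fold] @ coef + b0, w[fold]))
    return float(np.mean(scores))

# Synthetic stand-in data (NOT project data): two collinear spending columns plus attendance.
rng = np.random.default_rng(42)
n = 400
district = rng.integers(0, 20, size=n)
w = rng.integers(50, 500, size=n).astype(float)  # tested-student counts
overall = rng.normal(size=n)
classroom = 0.87 * overall + np.sqrt(1 - 0.87 ** 2) * rng.normal(size=n)
attendance = rng.normal(size=n)
X = np.column_stack([overall, classroom, attendance])
y = 50 + 2.0 * overall + 6.0 * attendance + rng.normal(scale=3.0, size=n)

pooled_r2 = cv_r2(X, y, w, alpha=10.0)
X_fe = demean_within_groups(X, w, district)
y_fe = demean_within_groups(y, w, district)
within_r2 = cv_r2(X_fe, y_fe, w, alpha=10.0)
print(f"pooled CV R2={pooled_r2:.3f}, within-district CV R2={within_r2:.3f}")
```

In practice alpha would be chosen by comparing CV R2 across a grid rather than fixed in advance, and predictors would be standardized first so the penalty treats them evenly.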

Executive summary
- Your chart-level impression is largely confirmed:
  class size shows little robust predictive contribution once the SES controls (income, adult BA+ rate) and attendance are included.
- Spending shows a detectable but inconsistent contribution:
  weak in Math, modest in English and Science in some model views, and sensitive to model form.
- Adult education (BA+ rate) and attendance remain the dominant signals in nearly all models.
- Income is generally intermediate: weaker than education and attendance, and often similar to or above spending.
- Poverty-aware companion analyses suggest this ordering remains broadly intact: hardship context matters, but spending/class-size still read as secondary statewide signals in these cross-sectional models.

Key findings by factor
1) Median class size
- Bivariate association can appear positive.
- After controls and regularization, standardized effects are near zero in all subjects.
- Permutation importance is near zero in most settings.
- Interpretation: class size is not a strong standalone predictor in these school-level cross-sections.
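
The bivariate check behind the first bullet can be sketched as a weighted correlation and a weighted simple slope. The function name and the toy rows below are hypothetical and only illustrate the calculation. They are not project data.

```python
import numpy as np

def weighted_corr_and_slope(x, y, w):
    """Weighted Pearson correlation and simple-regression slope of y on x."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()                          # normalize weights to sum to 1
    mx, my = np.sum(w * x), np.sum(w * y)    # weighted means
    cov_xy = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov_xy / np.sqrt(var_x * var_y), cov_xy / var_x

# Hypothetical toy rows: median class size, percent proficient, tested students.
class_size = [18, 22, 25, 27, 30]
pct_prof = [40, 44, 50, 52, 58]
n_tested = [120, 300, 450, 500, 800]

corr, slope = weighted_corr_and_slope(class_size, pct_prof, n_tested)
print(f"weighted corr={corr:.3f}, slope={slope:.3f}")
```

A positive value here says nothing about the controlled effect. In the toy rows the larger schools have both larger classes and higher proficiency, which is the kind of confounding the multivariable models are meant to absorb.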

2) Spending (overall/classroom)
- The two spending variables are highly collinear (weighted corr ~0.85 to 0.88).
- Ridge regularization was necessary to reduce coefficient instability.
- Combined spending signal (permutation drop in CV R2):
  - Non-FE ridge: English ~0.068, Math ~0.012, Science ~0.099.
  - FE ridge: English ~0.298, Math ~0.019, Science ~0.065.
- Interpretation: spending can carry a secondary signal, but strength is uneven across subjects and model frames.
  The English FE result (~0.298) is more than four times the non-FE English estimate (~0.068). Treat it as suggestive, not definitive, until it is replicated with lagged spending and alternative FE specifications.
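
One way to compute a combined spending signal is to shuffle both spending columns with the same row permutation and record the drop in out-of-fold R2. The sketch below assumes that approach. The helper names, alpha, repeat count, and synthetic data are illustrative, and the project scripts may differ in detail.

```python
import numpy as np

def ridge_fit(X, y, w, alpha):
    """Weighted ridge with an unpenalized intercept (closed form)."""
    w = w / w.mean()
    xm, ym = np.average(X, axis=0, weights=w), np.average(y, weights=w)
    Xc = X - xm
    gram = Xc.T @ (w[:, None] * Xc) + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (w * (y - ym)))
    return coef, ym - xm @ coef

def r2(y, pred, w):
    ss_tot = np.sum(w * (y - np.average(y, weights=w)) ** 2)
    return 1.0 - np.sum(w * (y - pred) ** 2) / ss_tot

def grouped_permutation_drop(X, y, w, cols, alpha=1.0, n_folds=5, n_repeats=20, seed=0):
    """Mean drop in out-of-fold R2 when the columns in `cols` are shuffled together."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    drops = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        coef, b0 = ridge_fit(X[train], y[train], w[train], alpha)
        base = r2(y[fold], X[fold] @ coef + b0, w[fold])
        for _ in range(n_repeats):
            Xp = X[fold].copy()
            # One shared row permutation keeps the grouped columns aligned with each other.
            Xp[:, cols] = Xp[rng.permutation(len(fold))][:, cols]
            drops.append(base - r2(y[fold], Xp @ coef + b0, w[fold]))
    return float(np.mean(drops))

# Synthetic stand-in data (NOT project data); class size has no true effect here.
rng = np.random.default_rng(7)
n = 500
overall = rng.normal(size=n)
classroom = 0.87 * overall + np.sqrt(1 - 0.87 ** 2) * rng.normal(size=n)
class_size = rng.normal(size=n)
attendance = rng.normal(size=n)
X = np.column_stack([overall, classroom, class_size, attendance])
y = 3.0 * overall + 5.0 * attendance + rng.normal(scale=3.0, size=n)
w = rng.integers(50, 500, size=n).astype(float)

spending_drop = grouped_permutation_drop(X, y, w, cols=[0, 1])
class_size_drop = grouped_permutation_drop(X, y, w, cols=[2])
print(f"spending drop={spending_drop:.3f}, class-size drop={class_size_drop:.3f}")
```

Shuffling the pair jointly avoids the understatement that occurs when one collinear column is permuted while its near-duplicate still carries the signal.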

3) Education, attendance, income
- Education and attendance consistently produce the largest predictive contribution.
- Math FE ridge particularly emphasizes attendance.
- Income remains meaningful but below education/attendance in most configurations.

How to interpret the apparent contradictions
- Sign flips and coefficient changes across models are expected under strong collinearity
  (especially between overall and classroom spending).
- Permutation importance is a more stable guide to relative contribution than raw coefficient signs or magnitudes.
- FE models answer a different question (within-district differences) than pooled models
  (between + within combined), so effect sizes are not directly interchangeable.
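
As a rough gauge of why coefficients move so much, the variance inflation factor for two predictors whose only collinearity is with each other is 1 / (1 - r^2). The sketch below applies that textbook unweighted OLS formula to the reported spending correlations. It ignores the other controls, which can only raise the factor, and it does not describe the ridge estimates themselves.

```python
def two_predictor_vif(r):
    """Variance inflation factor when a predictor's only collinear partner has correlation r."""
    return 1.0 / (1.0 - r ** 2)

# Reported weighted correlations between overall and classroom spending.
for r in (0.85, 0.88):
    vif = two_predictor_vif(r)
    print(f"r={r:.2f}  VIF={vif:.2f}  std-error inflation={vif ** 0.5:.2f}")
```

At these correlations the factor is roughly 3.6 to 4.4, so unregularized standard errors on each spending coefficient are about twice what they would be without the overlap.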

What this means for the project
- It is reasonable to say:
  "At this stage, spending and class size do not show a clean, dominant relationship with proficiency comparable to adult education and attendance."
- It is not yet reasonable to say:
  "Spending has no effect."
  This data structure (single-year spending, high collinearity, cross-sectional school aggregates) limits causal interpretation.

Recommended next analyses (highest value first)
1) Lag-aligned resource models
- Match outcomes to prior-year or multi-year averaged spending/class-size where possible.
- Rationale: test outcomes may respond to resource conditions with delay.
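
A minimal sketch of the lag alignment, assuming long-format tables keyed by school and year. All table and column names here (school_id, year, spend_per_student, pct_proficient) are hypothetical placeholders, and the values are toy numbers.

```python
import pandas as pd

# Hypothetical long-format tables; names and values are illustrative only.
spending = pd.DataFrame({
    "school_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "year": [2021, 2022, 2023, 2024] * 2,
    "spend_per_student": [10.0, 11.0, 12.0, 13.0, 8.0, 8.5, 9.0, 9.5],
})
outcomes = pd.DataFrame({
    "school_id": [1, 2],
    "year": [2024, 2024],
    "pct_proficient": [55.0, 42.0],
})

def add_lagged_spending(outcomes, spending, n_years=3):
    """Attach prior-year spending and the mean over the n_years before each outcome year."""
    # Prior year: move each spending year forward by one so it joins to the next outcome year.
    lag1 = spending.assign(year=spending["year"] + 1).rename(
        columns={"spend_per_student": "spend_lag1"})
    out = outcomes.merge(lag1, on=["school_id", "year"], how="left")

    # Trailing mean: pair each outcome row with that school's earlier years inside the window.
    pairs = outcomes[["school_id", "year"]].merge(
        spending.rename(columns={"year": "spend_year"}), on="school_id", how="left")
    in_window = (pairs["spend_year"] < pairs["year"]) & (
        pairs["spend_year"] >= pairs["year"] - n_years)
    trailing = (pairs[in_window]
                .groupby(["school_id", "year"], as_index=False)["spend_per_student"]
                .mean()
                .rename(columns={"spend_per_student": f"spend_mean_prev{n_years}"}))
    return out.merge(trailing, on=["school_id", "year"], how="left")

aligned = add_lagged_spending(outcomes, spending)
print(aligned)
```

Joining on an explicit year offset, instead of shifting rows within each school, keeps the lag correct when a school has missing years. Those rows simply receive a missing lag value.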

2) FE + regularized nonlinear checks
- Add spline terms for spending and attendance under ridge/elastic-net.
- Rationale: linear effects may understate threshold or diminishing-return structure.
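
To illustrate the nonlinear check, the sketch below uses a linear-spline (hinge) basis as a simple stand-in for a full spline expansion and compares in-sample ridge fit against a purely linear term on a made-up diminishing-returns curve. The knot placement, alpha, and data are assumptions. A real check would use the weighted, cross-validated, FE-demeaned setup instead of in-sample R2.

```python
import numpy as np

def hinge_basis(x, knots):
    """Linear-spline basis: x itself plus max(0, x - k) for each knot k."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x] + [np.maximum(0.0, x - k) for k in knots])

def ridge_r2(X, y, alpha):
    """In-sample R2 of ridge with an unpenalized intercept."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    resid = yc - Xc @ coef
    return 1.0 - (resid @ resid) / (yc @ yc)

# Hypothetical diminishing-returns relationship (NOT project data).
spend = np.linspace(0.0, 4.0, 81)
outcome = np.sqrt(spend)
knots = np.quantile(spend, [0.25, 0.5, 0.75])  # knots at the quartiles

r2_linear = ridge_r2(spend[:, None], outcome, alpha=1e-3)
r2_spline = ridge_r2(hinge_basis(spend, knots), outcome, alpha=1e-3)
print(f"linear R2={r2_linear:.4f}, hinge-spline R2={r2_spline:.4f}")
```

If the spline terms add little out-of-fold R2 on the real data, that would support keeping the linear specification.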

3) Spending decomposition
- Replace or augment the available spending measures with more policy-proximal components
  (instructional share, staffing mix, support services, etc., if available).
- Rationale: total spending may mask the relevant channel.

4) Robustness by school level (elementary/middle/high)
- Run the same ridge/FE stack by school level.
- Rationale: resource-performance coupling may differ by grade span.

Files produced
- Technical model report:
  docs/spending_class_size_ridge_report.txt
- Companion OLS-style report:
  docs/spending_class_size_effects_report.txt
- Scripts:
  scripts/report_spending_classsize_ridge.py
  scripts/report_spending_classsize_effects.py