๐Ÿช EDM-ARS  ยท  v1.2 ยท Open Source

Educational Data Mining
Automated Research System

A multi-agent LLM pipeline that turns a dataset and a research prompt into a complete, reviewer-ready academic paper — with automated quality review via LSAR.

6
Specialized Agents
7/10
Critic Quality Score
10-15 min
Demo Runtime
$2–$5
Typical API Cost Per Paper

Overview

What It Does

Inspired by FARS, EDM-ARS is an open-source, domain-specific multi-agent LLM pipeline that automates the complete workflow of educational data mining research, starting with prediction tasks as its first supported paradigm. Given the HSLS:09 dataset and a research prompt, it formulates a research question, engineers features, trains and compares multiple ML models, and runs SHAP explainability and subgroup fairness analysis. It retrieves real citations via the Semantic Scholar API with three-layer verification (exact title match, Jaccard similarity, CrossRef cross-validation) and produces a complete ACM sigconf-formatted LaTeX paper — with a built-in Critic agent that enforces methodological rigor through automated peer review and targeted revision loops.
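The first two layers of the citation-verification cascade can be sketched as below; the function names and the 0.8 similarity threshold are illustrative assumptions, not the pipeline's actual implementation, and the CrossRef layer is only indicated by a fall-through.

```python
def jaccard_title_similarity(title_a: str, title_b: str) -> float:
    """Token-level Jaccard similarity between two paper titles."""
    tokens_a = set(title_a.lower().split())
    tokens_b = set(title_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def verify_citation(claimed_title: str, retrieved_title: str,
                    threshold: float = 0.8) -> str:
    """Layers 1-2 of the cascade: exact title match, then fuzzy Jaccard match."""
    if claimed_title.strip().lower() == retrieved_title.strip().lower():
        return "exact"
    if jaccard_title_similarity(claimed_title, retrieved_title) >= threshold:
        return "fuzzy"
    return "unverified"  # would fall through to CrossRef cross-validation
```

Titles that fail both cheap checks escalate to the more expensive cross-validation layer, so most genuine matches never hit a second API.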

🎓

Domain-Specific Design

Built around the HSLS:09 longitudinal dataset with a three-tier variable registry: ~95 hand-curated Tier 1 variables with educational annotations, auto-generated Tier 2 variables, and Tier 3 exclusions (weights, IDs) enforced programmatically. v1.2 adds multilevel analysis with automatic school-clustering detection and intraclass correlation for HSLS:09's nested structure.
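Programmatic Tier 3 enforcement might look like the following sketch; the variable names, annotations, and prefix patterns here are illustrative stand-ins, not the registry's real contents.

```python
# Hypothetical slice of the three-tier registry (names are illustrative).
TIER1 = {
    "X1MTHID": "9th-grade math identity scale",
    "X1SES": "Socioeconomic status composite",
}
TIER3_PATTERNS = ("W1", "W2", "STU_ID", "SCH_ID")  # survey weights and identifiers

def select_variables(columns):
    """Keep curated and auto-generated variables; drop Tier 3 programmatically."""
    allowed = []
    for col in columns:
        if any(col.startswith(p) for p in TIER3_PATTERNS):
            continue  # Tier 3: weights and IDs are never exposed to the agents
        allowed.append(col)
    return allowed
```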

📄

Real Academic Output

A new OutlineAgent creates adaptive paper outlines before writing, then the Writer fills prose into an ACM sigconf LaTeX skeleton with preamble protection — preventing LLM outputs from corrupting formatting and ensuring structurally correct output every time.
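Marker-only substitution is one way to get this preamble protection: the Writer may touch only %%PLACEHOLDER%% slots, never the surrounding LaTeX. A minimal sketch, with a hypothetical stub template:

```python
import re

# Hypothetical minimal template; the real ACM sigconf skeleton is much larger.
TEMPLATE = r"""\documentclass[sigconf]{acmart}
\begin{document}
\title{%%TITLE%%}
\section{Introduction}
%%INTRODUCTION%%
\end{document}"""

def fill_template(template: str, sections: dict) -> str:
    """Replace %%PLACEHOLDER%% markers only; the preamble is never touched."""
    def sub(match):
        key = match.group(1)
        # Unknown markers stay in place rather than being silently dropped.
        return sections.get(key, match.group(0))
    return re.sub(r"%%([A-Z_]+)%%", sub, template)
```

Because the LLM's prose only ever lands inside marker positions, a malformed generation can never break `\documentclass` or the package setup.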

🔍

Critic-Gated Revision Loop

After analysis, the Critic reviews all prior agents' outputs and can route targeted revisions back to any stage — up to 2 cycles — before writing begins. v1.2 strengthens gap-driven research questions with novelty requirements and theoretical motivation.

💾

Checkpoint & Resume

Pipeline state is serialized to checkpoint.json after every stage. Interrupted runs resume from the last completed stage — no work is lost.
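A sketch of what stage-level checkpointing can look like; the stage names and the checkpoint.json schema here are assumptions for illustration.

```python
import json
import os

STAGES = ["formulate", "engineer", "analyze", "critique", "outline", "write"]

def save_checkpoint(path: str, stage: str, state: dict) -> None:
    """Persist pipeline state after a stage completes."""
    state = dict(state, last_completed=stage)
    with open(path, "w") as f:
        json.dump(state, f)

def next_stage(path: str):
    """Resume after the last completed stage, or start from the beginning."""
    if not os.path.exists(path):
        return STAGES[0]
    with open(path) as f:
        state = json.load(f)
    done = state.get("last_completed")
    if done in STAGES:
        idx = STAGES.index(done) + 1
        return STAGES[idx] if idx < len(STAGES) else None  # None = finished
    return STAGES[0]
```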


Six-Agent Pipeline

๐Ÿ” 1
ProblemFormulator
Searches Semantic Scholar, scopes the research question & hypothesis
๐Ÿ›  2
DataEngineer
Cleans features, outputs test_protected.csv with pre-encoding subgroup labels for fairness analysis
๐Ÿ“Š 3
Analyst
Trains model battery (LR, RF, XGBoost, ElasticNet, MLP, Stacking), runs SHAP & subgroup analysis in phased execution
๐Ÿ“ 5
OutlineAgent
Creates adaptive paper outlines with section-level planning, adjusting structure based on actual results
โœ๏ธ 6
Writer
Fills structured results into ACM sigconf LaTeX template with preamble protection โ€” template-based, never free-form
โš–๏ธ  Agent 4 ยท Gatekeeper
Critic
Reviews all prior agents' outputs for methodological soundness.
Issues PASS / REVISE / ABORT verdicts.
claude-opus โ€” highest-tier model
๐Ÿ”„

Revision loop โ€” on REVISE, targeted instructions are routed back to ProblemFormulator, DataEngineer, Analyst, or OutlineAgent selectively. Up to 2 cycles before the Writer is unblocked regardless.
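The gatekeeping logic above can be sketched as a small control loop; the callable signatures are illustrative assumptions, not the orchestrator's real interface.

```python
MAX_CYCLES = 2  # hard cap before the Writer is unblocked regardless

def critic_gate(run_stage, critic_review, spec):
    """Critic-gated revision loop over the pre-writing stages."""
    outputs = {s: run_stage(s, spec)
               for s in ("formulate", "engineer", "analyze", "outline")}
    for _cycle in range(MAX_CYCLES):
        verdict, targets = critic_review(outputs)
        if verdict == "ABORT":
            raise RuntimeError("Critic aborted the run")
        if verdict == "PASS":
            break
        for stage in targets:  # REVISE: rerun only the flagged stages
            outputs[stage] = run_stage(stage, spec)
    return outputs  # handed to the Writer
```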


Features

Key Capabilities

🤖

6 Specialized Agents

Coordinated by a state-machine orchestrator. Each agent has its own system prompt, temperature, and model tier (Opus for Critic, Sonnet for all others). v1.2 adds the OutlineAgent for adaptive paper planning.

📜

End-to-End Automation

From a raw CSV and a research prompt to a compiled ACM LaTeX paper — with real citations, methodology validation, and SHAP explainability figures.

🛡

Self-Healing Pipeline

Contract validation at every stage boundary. Auto-patching for classifiable errors (SHAP failure, dtype mismatch, missing column) before falling back to LLM repair.

📚

Live Academic Citations

The ProblemFormulator queries the Semantic Scholar API with exponential-backoff retry logic to retrieve and validate real, current citations.
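Exponential backoff with jitter is a standard way to survive rate limits on a public API; a generic sketch, with retry counts and delays chosen for illustration:

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a flaky API call with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # Delay doubles each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Wrapping each Semantic Scholar request in `with_backoff` keeps transient 429/5xx responses from killing a multi-hour pipeline run.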

⚗️

6-Model Battery

Logistic Regression, Random Forest, XGBoost, ElasticNet, MLP, and a Stacking Ensemble are trained, compared, and reported with SHAP explainability. v1.2 adds a Model Quality Gate (AUC ≥ 0.60 for classification, R² ≥ 0.05 for regression) before computing SHAP.
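The Model Quality Gate reduces to a simple threshold check on the stated v1.2 cutoffs; the function name and metrics-dict shape are assumptions:

```python
def passes_quality_gate(task: str, metrics: dict) -> bool:
    """Gate SHAP computation on minimum predictive signal (v1.2 thresholds)."""
    if task == "classification":
        return metrics.get("auc", 0.0) >= 0.60
    if task == "regression":
        return metrics.get("r2", -1.0) >= 0.05
    raise ValueError(f"unknown task type: {task}")
```

Models that fail the gate skip SHAP entirely, so explainability figures are never generated for near-chance predictors.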

๐Ÿณ

Docker Sandboxing

LLM-generated analysis code executes inside a Docker sandbox (network-disabled). Gracefully falls back to subprocess when Docker is unavailable.
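The sandbox-with-fallback pattern can be sketched as follows; the image name, mount layout, and timeout are illustrative choices, not the pipeline's actual configuration.

```python
import shutil
import subprocess
import sys

def run_sandboxed(script_path: str, timeout: int = 600):
    """Run analysis code in a network-disabled Docker container,
    falling back to a plain subprocess when Docker is unavailable."""
    if shutil.which("docker"):
        cmd = ["docker", "run", "--rm", "--network", "none",
               "-v", f"{script_path}:/work/script.py:ro",
               "python:3.11-slim", "python", "/work/script.py"]
    else:
        cmd = [sys.executable, script_path]  # graceful subprocess fallback
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
```

`--network none` means even a prompt-injected script cannot exfiltrate data, while the fallback keeps local development possible without Docker installed.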

📋

Outline-First Writing New

A dedicated OutlineAgent creates adaptive section-level outlines before prose generation, adjusting structure based on model convergence, surprising predictors, and subgroup disparities.

๐Ÿซ

Multilevel Analysis New

Automatic school-clustering detection and intraclass correlation computation for HSLS:09's nested data structure, supporting hierarchical modeling insights.
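The intraclass correlation for students nested in schools can be computed from a one-way ANOVA decomposition; this ICC(1) sketch is a textbook formulation and may differ from the pipeline's actual estimator.

```python
from collections import defaultdict

def icc1(values, groups):
    """One-way ANOVA intraclass correlation ICC(1): students nested in schools."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    n, m = len(values), len(by_group)
    k = n / m                                   # average cluster size
    grand = sum(values) / n
    # Between-group and within-group mean squares.
    msb = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
              for vs in by_group.values()) / (m - 1)
    msw = sum(sum((v - sum(vs) / len(vs)) ** 2 for v in vs)
              for vs in by_group.values()) / (n - m)
    return (msb - msw) / (msb + (k - 1) * msw)
```

A high ICC signals that outcomes vary strongly between schools, which is exactly when hierarchical modeling matters for HSLS:09.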

🔬

Sensitivity Analysis New

Drop-and-retrain protocols for high-missingness variables, testing robustness of findings against missing data patterns and flagging potential concerns.
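A drop-and-retrain protocol is essentially a loop over suspect variables; in this sketch the `fit_and_score` callable and the 0.05 fragility tolerance are illustrative assumptions.

```python
def sensitivity_check(fit_and_score, features, high_missing, tolerance=0.05):
    """Drop-and-retrain: flag findings that hinge on high-missingness variables."""
    baseline = fit_and_score(features)
    flags = {}
    for var in high_missing:
        reduced = [f for f in features if f != var]
        score = fit_and_score(reduced)            # retrain without the variable
        flags[var] = (baseline - score) > tolerance  # True = finding is fragile
    return baseline, flags
```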

Demo Run Stats

Numbers from the first end-to-end pipeline run, producing a complete ACM sigconf paper on HSLS:09 college-enrollment prediction.

7/10
Critic Quality Score
2
Revision Cycles
67 min
Total Pipeline Runtime
$7.57
API Cost (vs $5 target)

LSAR — Learning Science Auto-Reviewer

Inspired by PaperReview.ai, LSAR is an automated, agentic paper reviewer designed for learning science conferences. It evaluates manuscripts across 8 quality dimensions and supports 4 major venues — creating a generate-then-review feedback loop with EDM-ARS.

🎯
Relevance

Alignment with venue scope and topic fit

💡
Novelty

Originality and contribution beyond prior work

📖
Theoretical Grounding

Strength of conceptual framework and motivation

🔧
Methodological Rigor

Soundness of research design and analysis

📊
Empirical Support

Quality and interpretation of evidence

🌟
Significance

Impact on the field and practical implications

⚖️
Ethics & Fairness

Ethical considerations and bias awareness

✏️
Communication

Clarity, structure, and readability of writing
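The description above doesn't specify how the 8 dimension scores combine into an overall rating; a minimal sketch assuming an unweighted mean over 1-10 scores:

```python
DIMENSIONS = ["relevance", "novelty", "theoretical_grounding",
              "methodological_rigor", "empirical_support", "significance",
              "ethics_fairness", "communication"]

def overall_score(dimension_scores: dict) -> float:
    """Aggregate the 8 LSAR dimension scores (1-10 each) into an overall rating."""
    missing = [d for d in DIMENSIONS if d not in dimension_scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return round(sum(dimension_scores[d] for d in DIMENSIONS) / len(DIMENSIONS), 1)
```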

AIED EDM L@S LAK
🔎

Venue Detection — automatically classifies papers for their intended conference or allows manual specification

📦

Batch Processing — review multiple papers in a single run from directories or CSV manifests

📄

Multiple Formats — generates reviews in Markdown, JSON, and PDF reports with modular pipeline stages

🔄

EDM-ARS generates a paper → LSAR reviews it across 8 dimensions → feedback drives targeted revisions — a closed-loop generate-then-review cycle for automated research refinement.


Publications

Papers & Reports

Technical reports and demo papers generated by EDM-ARS. Demo papers are full ACM-formatted manuscripts produced by the pipeline on HSLS:09 prediction tasks.

Technical Report March 2026

EDM-ARS: A Domain-Specific Multi-Agent System for Automated Educational Data Mining Research

Chenguang Pan, Zhou Zhang, Weixuan Xiao, Chengyuan Yao

PDF →
Demo Paper LSAR Reviewed March 2026

Fairness-Through-Unawareness Does Not Produce Equitable Predictions: Evidence from HSLS:09 on Subgroup Disparities in College Attendance Prediction

EDM-ARS (Automated) · Target Venue: EDM

Demo Paper LSAR Reviewed March 2026

Do Ninth-Grade Math and Science Identity Predict STEM Degree Non-Completion? Evidence from HSLS:09 and Machine Learning

EDM-ARS (Automated) · Target Venue: EDM

Built With

Tech Stack

Core Pipeline
Python 3.11 Anthropic API Claude Sonnet Claude Opus (Critic) Docker PyYAML
ML & Analysis
scikit-learn XGBoost SHAP pandas matplotlib seaborn
Data & Literature
HSLS:09 Dataset Semantic Scholar API 95-var Tier 1 Registry
Output
ACM sigconf LaTeX Template-Based Generation SHAP Figures

v1.2 & Beyond

The current release targets prediction tasks on HSLS:09 with six major quality improvements in v1.2. A multi-phase roadmap will expand EDM-ARS into a full research automation platform for educational data science.

✅ v1.2 — Completed
Prediction Pipeline — end-to-end ML prediction workflow on HSLS:09
Self-Healing Architecture — contract validation, phased Analyst execution, error taxonomy & auto-patching
Docker Sandbox — isolated code execution with subprocess fallback
LaTeX Template System — fixed ACM skeleton with %%PLACEHOLDER%% markers
Checkpoint & Resume — pipeline state persisted after every stage
Outline-First Paper Generation — OutlineAgent creates adaptive outlines before writing
Template Preamble Protection — prevents LLM outputs from corrupting ACM LaTeX formatting
Model Quality Gate — AUC ≥ 0.60 / R² ≥ 0.05 thresholds before SHAP computation
Multilevel Analysis — school clustering detection and intraclass correlation for nested data
Gap-Driven Research Questions — strengthened novelty requirements and theoretical motivation
Sensitivity Analysis — drop-and-retrain protocols for high-missingness variables
Roadmap
Phase 1 — Polymorphism Refactor — extract TaskTemplate & DatasetAdapter abstract base classes; decouple pipeline from HSLS:09 specifics
Phase 2 — Findings Memory & Multi-Branch Ideation — persistent FindingsMemory store across runs; ProblemFormulator generates N diverse candidate specs with novelty scoring
Phase 3 — Causal Inference — propensity score matching, IPW, TMLE, heterogeneous treatment effects, and optimal treatment regimes via econml / dowhy
Phase 4 — Multi-Dataset & Transfer Learning — dataset adapters for ELS:2002, PISA 2022, ASSISTments; cross-population and cross-wave transfer experiments
Phase 5 — Controlled Human Evaluation — blinded comparison of EDM-ARS papers vs. matched human-authored papers on quality, correctness, and efficiency metrics