๐Ÿช EDM-ARS  ยท  v1.2 ยท Open Source

Educational Data Mining
Automated Research System

A multi-agent LLM pipeline that turns a dataset and a research prompt into a complete, reviewer-ready academic paper — with automated quality review via LSAR.

6
Specialized Agents
7/10
Critic Quality Score
10-15 min
Demo Runtime
$2–$5
Typical API Cost Per Paper

Overview

What It Does

Inspired by FARS, EDM-ARS is an open-source, domain-specific multi-agent LLM pipeline that automates the complete workflow of educational data mining research, starting with prediction tasks as its first supported paradigm. Given the HSLS:09 dataset and a research prompt, it formulates a research question, engineers features, trains and compares multiple ML models, and runs SHAP explainability and subgroup fairness analysis. It retrieves real citations via the Semantic Scholar API with three-layer verification (exact title match, Jaccard similarity, CrossRef cross-validation) and produces a complete ACM sigconf-formatted LaTeX paper — with a built-in Critic agent that enforces methodological rigor through automated peer review and targeted revision loops.
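The first two layers of the citation-verification cascade can be sketched as below; the function names and the 0.8 similarity threshold are illustrative assumptions, not the pipeline's actual implementation, and the CrossRef layer is only indicated by a fall-through.

```python
def jaccard_title_similarity(title_a: str, title_b: str) -> float:
    """Token-level Jaccard similarity between two paper titles."""
    tokens_a = set(title_a.lower().split())
    tokens_b = set(title_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def verify_citation(claimed_title: str, retrieved_title: str,
                    threshold: float = 0.8) -> str:
    """Layers 1-2 of the cascade: exact title match, then fuzzy Jaccard match."""
    if claimed_title.strip().lower() == retrieved_title.strip().lower():
        return "exact"
    if jaccard_title_similarity(claimed_title, retrieved_title) >= threshold:
        return "fuzzy"
    return "unverified"  # would fall through to CrossRef cross-validation
```

Titles that fail both cheap checks escalate to the more expensive cross-validation layer, so most genuine matches never hit a second API.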

🎓

Domain-Specific Design

Built around the HSLS:09 longitudinal dataset with a three-tier variable registry: ~95 hand-curated Tier 1 variables with educational annotations, auto-generated Tier 2 variables, and Tier 3 exclusions (weights, IDs) enforced programmatically. v1.2 adds multilevel analysis with automatic school-clustering detection and intraclass correlation for HSLS:09's nested structure.
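Programmatic Tier 3 enforcement might look like the following sketch; the variable names, annotations, and prefix patterns here are illustrative stand-ins, not the registry's real contents.

```python
# Hypothetical slice of the three-tier registry (names are illustrative).
TIER1 = {
    "X1MTHID": "9th-grade math identity scale",
    "X1SES": "Socioeconomic status composite",
}
TIER3_PATTERNS = ("W1", "W2", "STU_ID", "SCH_ID")  # survey weights and identifiers

def select_variables(columns):
    """Keep curated and auto-generated variables; drop Tier 3 programmatically."""
    allowed = []
    for col in columns:
        if any(col.startswith(p) for p in TIER3_PATTERNS):
            continue  # Tier 3: weights and IDs are never exposed to the agents
        allowed.append(col)
    return allowed
```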

📄

Real Academic Output

A new OutlineAgent creates adaptive paper outlines before writing, then the Writer fills prose into an ACM sigconf LaTeX skeleton with preamble protection — preventing LLM outputs from corrupting formatting and ensuring structurally correct output every time.
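Marker-only substitution is one way to get this preamble protection: the Writer may touch only %%PLACEHOLDER%% slots, never the surrounding LaTeX. A minimal sketch, with a hypothetical stub template:

```python
import re

# Hypothetical minimal template; the real ACM sigconf skeleton is much larger.
TEMPLATE = r"""\documentclass[sigconf]{acmart}
\begin{document}
\title{%%TITLE%%}
\section{Introduction}
%%INTRODUCTION%%
\end{document}"""

def fill_template(template: str, sections: dict) -> str:
    """Replace %%PLACEHOLDER%% markers only; the preamble is never touched."""
    def sub(match):
        key = match.group(1)
        # Unknown markers stay in place rather than being silently dropped.
        return sections.get(key, match.group(0))
    return re.sub(r"%%([A-Z_]+)%%", sub, template)
```

Because the LLM's prose only ever lands inside marker positions, a malformed generation can never break `\documentclass` or the package setup.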

🔍

Critic-Gated Revision Loop

After analysis, the Critic reviews all prior agents' outputs and can route targeted revisions back to any stage — up to 2 cycles — before writing begins. v1.2 strengthens gap-driven research questions with novelty requirements and theoretical motivation.

💾

Checkpoint & Resume

Pipeline state is serialized to checkpoint.json after every stage. Interrupted runs resume from the last completed stage — no work is lost.
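A sketch of what stage-level checkpointing can look like; the stage names and the checkpoint.json schema here are assumptions for illustration.

```python
import json
import os

STAGES = ["formulate", "engineer", "analyze", "critique", "outline", "write"]

def save_checkpoint(path: str, stage: str, state: dict) -> None:
    """Persist pipeline state after a stage completes."""
    state = dict(state, last_completed=stage)
    with open(path, "w") as f:
        json.dump(state, f)

def next_stage(path: str):
    """Resume after the last completed stage, or start from the beginning."""
    if not os.path.exists(path):
        return STAGES[0]
    with open(path) as f:
        state = json.load(f)
    done = state.get("last_completed")
    if done in STAGES:
        idx = STAGES.index(done) + 1
        return STAGES[idx] if idx < len(STAGES) else None  # None = finished
    return STAGES[0]
```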


Six-Agent Pipeline

๐Ÿ” 1
ProblemFormulator
Searches Semantic Scholar, scopes the research question & hypothesis
๐Ÿ›  2
DataEngineer
Cleans features, outputs test_protected.csv with pre-encoding subgroup labels for fairness analysis
๐Ÿ“Š 3
Analyst
Trains model battery (LR, RF, XGBoost, ElasticNet, MLP, Stacking), runs SHAP & subgroup analysis in phased execution
๐Ÿ“ 5
OutlineAgent
Creates adaptive paper outlines with section-level planning, adjusting structure based on actual results
โœ๏ธ 6
Writer
Fills structured results into ACM sigconf LaTeX template with preamble protection โ€” template-based, never free-form
โš–๏ธ  Agent 4 ยท Gatekeeper
Critic
Reviews all prior agents' outputs for methodological soundness.
Issues PASS / REVISE / ABORT verdicts.
claude-opus โ€” highest-tier model
๐Ÿ”„

Revision loop โ€” on REVISE, targeted instructions are routed back to ProblemFormulator, DataEngineer, Analyst, or OutlineAgent selectively. Up to 2 cycles before the Writer is unblocked regardless.
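The gatekeeping logic above can be sketched as a small control loop; the callable signatures are illustrative assumptions, not the orchestrator's real interface.

```python
MAX_CYCLES = 2  # hard cap before the Writer is unblocked regardless

def critic_gate(run_stage, critic_review, spec):
    """Critic-gated revision loop over the pre-writing stages."""
    outputs = {s: run_stage(s, spec)
               for s in ("formulate", "engineer", "analyze", "outline")}
    for _cycle in range(MAX_CYCLES):
        verdict, targets = critic_review(outputs)
        if verdict == "ABORT":
            raise RuntimeError("Critic aborted the run")
        if verdict == "PASS":
            break
        for stage in targets:  # REVISE: rerun only the flagged stages
            outputs[stage] = run_stage(stage, spec)
    return outputs  # handed to the Writer
```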


Features

Key Capabilities

🤖

6 Specialized Agents

Coordinated by a state-machine orchestrator. Each agent has its own system prompt, temperature, and model tier (Opus for Critic, Sonnet for all others). v1.2 adds the OutlineAgent for adaptive paper planning.

📜

End-to-End Automation

From a raw CSV and a research prompt to a compiled ACM LaTeX paper — with real citations, methodology validation, and SHAP explainability figures.

🛡

Self-Healing Pipeline

Contract validation at every stage boundary. Auto-patching for classifiable errors (SHAP failure, dtype mismatch, missing column) before falling back to LLM repair.

📚

Live Academic Citations

The ProblemFormulator queries the Semantic Scholar API with exponential-backoff retry logic to retrieve and validate real, current citations.
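Exponential backoff with jitter is a standard way to survive rate limits on a public API; a generic sketch, with retry counts and delays chosen for illustration:

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a flaky API call with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # Delay doubles each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Wrapping each Semantic Scholar request in `with_backoff` keeps transient 429/5xx responses from killing a multi-hour pipeline run.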

⚗️

6-Model Battery

Logistic Regression, Random Forest, XGBoost, ElasticNet, MLP, and a Stacking Ensemble are trained, compared, and reported with SHAP explainability. v1.2 adds a Model Quality Gate (AUC ≥ 0.60 for classification, R² ≥ 0.05 for regression) before computing SHAP.
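The Model Quality Gate reduces to a simple threshold check on the stated v1.2 cutoffs; the function name and metrics-dict shape are assumptions:

```python
def passes_quality_gate(task: str, metrics: dict) -> bool:
    """Gate SHAP computation on minimum predictive signal (v1.2 thresholds)."""
    if task == "classification":
        return metrics.get("auc", 0.0) >= 0.60
    if task == "regression":
        return metrics.get("r2", -1.0) >= 0.05
    raise ValueError(f"unknown task type: {task}")
```

Models that fail the gate skip SHAP entirely, so explainability figures are never generated for near-chance predictors.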

๐Ÿณ

Docker Sandboxing

LLM-generated analysis code executes inside a Docker sandbox (network-disabled). Gracefully falls back to subprocess when Docker is unavailable.
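The sandbox-with-fallback pattern can be sketched as follows; the image name, mount layout, and timeout are illustrative choices, not the pipeline's actual configuration.

```python
import shutil
import subprocess
import sys

def run_sandboxed(script_path: str, timeout: int = 600):
    """Run analysis code in a network-disabled Docker container,
    falling back to a plain subprocess when Docker is unavailable."""
    if shutil.which("docker"):
        cmd = ["docker", "run", "--rm", "--network", "none",
               "-v", f"{script_path}:/work/script.py:ro",
               "python:3.11-slim", "python", "/work/script.py"]
    else:
        cmd = [sys.executable, script_path]  # graceful subprocess fallback
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
```

`--network none` means even a prompt-injected script cannot exfiltrate data, while the fallback keeps local development possible without Docker installed.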

📋

Outline-First Writing New

A dedicated OutlineAgent creates adaptive section-level outlines before prose generation, adjusting structure based on model convergence, surprising predictors, and subgroup disparities.

๐Ÿซ

Multilevel Analysis New

Automatic school-clustering detection and intraclass correlation computation for HSLS:09's nested data structure, supporting hierarchical modeling insights.
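The intraclass correlation for students nested in schools can be computed from a one-way ANOVA decomposition; this ICC(1) sketch is a textbook formulation and may differ from the pipeline's actual estimator.

```python
from collections import defaultdict

def icc1(values, groups):
    """One-way ANOVA intraclass correlation ICC(1): students nested in schools."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    n, m = len(values), len(by_group)
    k = n / m                                   # average cluster size
    grand = sum(values) / n
    # Between-group and within-group mean squares.
    msb = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
              for vs in by_group.values()) / (m - 1)
    msw = sum(sum((v - sum(vs) / len(vs)) ** 2 for v in vs)
              for vs in by_group.values()) / (n - m)
    return (msb - msw) / (msb + (k - 1) * msw)
```

A high ICC signals that outcomes vary strongly between schools, which is exactly when hierarchical modeling matters for HSLS:09.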

🔬

Sensitivity Analysis New

Drop-and-retrain protocols for high-missingness variables, testing robustness of findings against missing data patterns and flagging potential concerns.
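A drop-and-retrain protocol is essentially a loop over suspect variables; in this sketch the `fit_and_score` callable and the 0.05 fragility tolerance are illustrative assumptions.

```python
def sensitivity_check(fit_and_score, features, high_missing, tolerance=0.05):
    """Drop-and-retrain: flag findings that hinge on high-missingness variables."""
    baseline = fit_and_score(features)
    flags = {}
    for var in high_missing:
        reduced = [f for f in features if f != var]
        score = fit_and_score(reduced)            # retrain without the variable
        flags[var] = (baseline - score) > tolerance  # True = finding is fragile
    return baseline, flags
```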

Demo Run Stats

Numbers from the first end-to-end pipeline run, producing a complete ACM sigconf paper on HSLS:09 college-enrollment prediction.

7/10
Critic Quality Score
2
Revision Cycles
67 min
Total Pipeline Runtime
$7.57
API Cost (vs $5 target)

LSAR — Learning Science Auto-Reviewer

Inspired by PaperReview.ai, LSAR is an automated, agentic paper reviewer designed for learning science conferences. It evaluates manuscripts across 8 quality dimensions and supports 4 major venues — creating a generate-then-review feedback loop with EDM-ARS.

🎯
Relevance

Alignment with venue scope and topic fit

💡
Novelty

Originality and contribution beyond prior work

📖
Theoretical Grounding

Strength of conceptual framework and motivation

🔧
Methodological Rigor

Soundness of research design and analysis

📊
Empirical Support

Quality and interpretation of evidence

🌟
Significance

Impact on the field and practical implications

⚖️
Ethics & Fairness

Ethical considerations and bias awareness

✏️
Communication

Clarity, structure, and readability of writing
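The description above doesn't specify how the 8 dimension scores combine into an overall rating; a minimal sketch assuming an unweighted mean over 1-10 scores:

```python
DIMENSIONS = ["relevance", "novelty", "theoretical_grounding",
              "methodological_rigor", "empirical_support", "significance",
              "ethics_fairness", "communication"]

def overall_score(dimension_scores: dict) -> float:
    """Aggregate the 8 LSAR dimension scores (1-10 each) into an overall rating."""
    missing = [d for d in DIMENSIONS if d not in dimension_scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return round(sum(dimension_scores[d] for d in DIMENSIONS) / len(DIMENSIONS), 1)
```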

AIED EDM L@S LAK
🔎

Venue Detection — automatically classifies papers for their intended conference or allows manual specification

📦

Batch Processing — review multiple papers in a single run from directories or CSV manifests

📄

Multiple Formats — generates reviews in Markdown, JSON, and PDF reports with modular pipeline stages

🔄

EDM-ARS generates a paper → LSAR reviews it across 8 dimensions → feedback drives targeted revisions — a closed-loop generate-then-review cycle for automated research refinement.


Publications

Papers & Reports

Technical reports and demo papers generated by EDM-ARS. Demo papers are full ACM-formatted manuscripts produced by the pipeline on HSLS:09 prediction tasks.

Technical Report March 2026

EDM-ARS: A Domain-Specific Multi-Agent System for Automated Educational Data Mining Research

Chenguang Pan, Zhou Zhang, Weixuan Xiao, Chengyuan Yao

PDF →
Demo Paper LSAR Reviewed March 2026

Fairness-Through-Unawareness Does Not Produce Equitable Predictions: Evidence from HSLS:09 on Subgroup Disparities in College Attendance Prediction

EDM-ARS (Automated) · Target Venue: EDM

Demo Paper LSAR Reviewed March 2026

Do Ninth-Grade Math and Science Identity Predict STEM Degree Non-Completion? Evidence from HSLS:09 and Machine Learning

EDM-ARS (Automated) · Target Venue: EDM

Built With

Tech Stack

Core Pipeline
Python 3.11 Anthropic API Claude Sonnet Claude Opus (Critic) Docker PyYAML
ML & Analysis
scikit-learn XGBoost SHAP pandas matplotlib seaborn
Data & Literature
HSLS:09 Dataset Semantic Scholar API 95-var Tier 1 Registry
Output
ACM sigconf LaTeX Template-Based Generation SHAP Figures

v1.2 & Beyond

The current release targets prediction tasks on HSLS:09 with six major quality improvements in v1.2. A multi-phase roadmap will expand EDM-ARS into a full research automation platform for educational data science.

✅ v1.2 — Completed
Prediction Pipeline — end-to-end ML prediction workflow on HSLS:09
Self-Healing Architecture — contract validation, phased Analyst execution, error taxonomy & auto-patching
Docker Sandbox — isolated code execution with subprocess fallback
LaTeX Template System — fixed ACM skeleton with %%PLACEHOLDER%% markers
Checkpoint & Resume — pipeline state persisted after every stage
Outline-First Paper Generation — OutlineAgent creates adaptive outlines before writing
Template Preamble Protection — prevents LLM outputs from corrupting ACM LaTeX formatting
Model Quality Gate — AUC ≥ 0.60 / R² ≥ 0.05 thresholds before SHAP computation
Multilevel Analysis — school clustering detection and intraclass correlation for nested data
Gap-Driven Research Questions — strengthened novelty requirements and theoretical motivation
Sensitivity Analysis — drop-and-retrain protocols for high-missingness variables
Roadmap
Phase 1 — Polymorphism Refactor — extract TaskTemplate & DatasetAdapter abstract base classes; decouple pipeline from HSLS:09 specifics
Phase 2 — Findings Memory & Multi-Branch Ideation — persistent FindingsMemory store across runs; ProblemFormulator generates N diverse candidate specs with novelty scoring
Phase 3 — Causal Inference — propensity score matching, IPW, TMLE, heterogeneous treatment effects, and optimal treatment regimes via econml / dowhy
Phase 4 — Multi-Dataset & Transfer Learning — dataset adapters for ELS:2002, PISA 2022, ASSISTments; cross-population and cross-wave transfer experiments
Phase 5 — Controlled Human Evaluation — blinded comparison of EDM-ARS papers vs. matched human-authored papers on quality, correctness, and efficiency metrics