EDM-ARS · v1.1 · Open Source

Educational Data Mining
Automated Research System

A multi-agent LLM pipeline that turns a dataset and a research prompt into a complete, reviewer-ready academic paper.

5
Specialized Agents
7/10
Critic Quality Score
10-15 min
Demo Runtime
$2–$5
Typical API Cost Per Paper

Overview

What It Does

EDM-ARS is an open-source, domain-specific multi-agent LLM pipeline that automates the complete workflow of prediction-focused educational data mining research. Given the HSLS:09 dataset and a research prompt, it formulates a research question, engineers features, trains and compares multiple ML models, and runs SHAP explainability and subgroup fairness analysis. It retrieves real citations via the Semantic Scholar API with three-layer verification (exact title match, Jaccard similarity, CrossRef cross-validation) and produces a complete ACM sigconf-formatted LaTeX paper. A built-in Critic agent enforces methodological rigor through automated peer review and targeted revision loops.
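The first two verification layers (exact title match, then Jaccard similarity) could look roughly like the sketch below. The function names, the token-based normalization, and the 0.8 threshold are assumptions made for illustration, not the project's actual code; the CrossRef layer is omitted because it needs a network call.

```python
import re
from typing import List


def _tokens(title: str) -> List[str]:
    """Lowercase a title and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def exact_title_match(requested: str, candidate: str) -> bool:
    """Layer 1: titles are equal after case and punctuation are ignored."""
    return _tokens(requested) == _tokens(candidate)


def jaccard_similarity(requested: str, candidate: str) -> float:
    """Layer 2: |A & B| / |A | B| over the sets of title tokens."""
    a, b = set(_tokens(requested)), set(_tokens(candidate))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def verify_title(requested: str, candidate: str, threshold: float = 0.8) -> str:
    """Return 'exact', 'fuzzy', or 'rejected' (threshold is an assumed value)."""
    if exact_title_match(requested, candidate):
        return "exact"
    if jaccard_similarity(requested, candidate) >= threshold:
        return "fuzzy"
    return "rejected"
```

A citation that only reaches the "fuzzy" level would presumably be the kind of case that the third layer, CrossRef cross-validation, is meant to settle.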

Domain-Specific Design

Built around the HSLS:09 longitudinal dataset with a three-tier variable registry: ~95 hand-curated Tier 1 variables with educational annotations, auto-generated Tier 2 variables, and Tier 3 exclusions (weights, IDs) enforced programmatically.
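One way to enforce the Tier 3 exclusions programmatically is a registry lookup that rejects a predictor list before any model sees it. This is a minimal sketch: the variable names, the `Tier` enum, and `validate_predictors` are hypothetical placeholders, not entries from the real registry.

```python
from enum import IntEnum
from typing import Dict, Iterable, List


class Tier(IntEnum):
    CURATED = 1   # hand-curated, annotated variables
    AUTO = 2      # auto-generated entries
    EXCLUDED = 3  # weights, IDs: never allowed as predictors


# Hypothetical entries for illustration only.
REGISTRY: Dict[str, Tier] = {
    "ses_composite": Tier.CURATED,
    "math_score": Tier.CURATED,
    "derived_flag": Tier.AUTO,
    "student_id": Tier.EXCLUDED,
    "sampling_weight": Tier.EXCLUDED,
}


def validate_predictors(requested: Iterable[str], registry: Dict[str, Tier]) -> List[str]:
    """Return the requested predictors, or raise if any is unknown or Tier 3."""
    requested = list(requested)
    unknown = [name for name in requested if name not in registry]
    if unknown:
        raise KeyError(f"Variables not in registry: {unknown}")
    excluded = [name for name in requested if registry[name] is Tier.EXCLUDED]
    if excluded:
        raise ValueError(f"Tier 3 variables cannot be predictors: {excluded}")
    return requested
```

Raising an error, rather than silently dropping the variable, makes the violation visible to whichever agent proposed the feature set.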

Real Academic Output

The Writer fills prose into a fixed ACM sigconf LaTeX skeleton with %%PLACEHOLDER%% markers and never generates boilerplate from scratch, so the document structure stays correct on every run.
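Placeholder filling of this kind can be sketched in a few lines. The placeholder names (`TITLE`, `ABSTRACT`), the tiny skeleton, and `fill_template` are assumptions for illustration; the real template and its markers are not shown in the source.

```python
import re
from typing import Dict

# Matches markers such as %%TITLE%% or %%RESULTS_TABLE%%.
PLACEHOLDER = re.compile(r"%%([A-Z_]+)%%")

# A tiny illustrative skeleton; the real ACM template is much longer.
SKELETON = r"""\documentclass[sigconf]{acmart}
\begin{document}
\title{%%TITLE%%}
\begin{abstract}
%%ABSTRACT%%
\end{abstract}
\maketitle
\end{document}
"""


def fill_template(skeleton: str, sections: Dict[str, str]) -> str:
    """Replace every %%NAME%% marker; fail loudly if a section is missing."""
    missing = sorted(set(PLACEHOLDER.findall(skeleton)) - set(sections))
    if missing:
        raise KeyError(f"No content for placeholders: {missing}")
    # A function replacement avoids backslash-escape surprises with LaTeX text.
    return PLACEHOLDER.sub(lambda match: sections[match.group(1)], skeleton)
```

Because only the marked slots change, the preamble and environment structure are the same for every generated paper.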

Critic-Gated Revision Loop

After analysis, the Critic reviews all prior agents' outputs and can route targeted revisions back to any stage, for up to 2 cycles, before writing begins.

Checkpoint & Resume

Pipeline state is serialized to checkpoint.json after every stage. Interrupted runs resume from the last completed stage, so no completed work is lost.
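A checkpoint-and-resume mechanism along these lines could be sketched as follows. The stage identifiers and the JSON layout (`completed`, `state`) are assumptions; only the file name checkpoint.json comes from the description above.

```python
import json
import os
from pathlib import Path
from typing import List, Optional

# Hypothetical stage identifiers, in pipeline order.
STAGES = ["problem_formulator", "data_engineer", "analyst", "critic", "writer"]


def save_checkpoint(path: Path, completed: List[str], state: dict) -> None:
    """Write the checkpoint to a temp file, then swap it into place."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"completed": completed, "state": state}))
    os.replace(tmp, path)  # avoids leaving a half-written checkpoint.json


def load_state(path: Path) -> dict:
    """Return the saved pipeline state, or an empty dict for a fresh run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())["state"]


def next_stage(path: Path) -> Optional[str]:
    """Return the first stage not yet completed, or None if all are done."""
    completed: List[str] = []
    if path.exists():
        completed = json.loads(path.read_text())["completed"]
    remaining = [stage for stage in STAGES if stage not in completed]
    return remaining[0] if remaining else None
```

Writing to a temporary file first means an interruption during the save itself still leaves the previous checkpoint readable.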


Five-Agent Pipeline

1
ProblemFormulator
Searches Semantic Scholar, scopes the research question & hypothesis
2
DataEngineer
Cleans features, outputs test_protected.csv with pre-encoding subgroup labels for fairness analysis
3
Analyst
Trains model battery (LR, RF, XGBoost, ElasticNet, MLP, Stacking), runs SHAP & subgroup analysis in phased execution
5
Writer
Fills structured results into the ACM sigconf LaTeX template; generation is template-based, never free-form
Agent 4 · Gatekeeper
Critic
Reviews all prior agents' outputs for methodological soundness.
Issues PASS / REVISE / ABORT verdicts.
claude-opus (highest-tier model)
Revision loop: on REVISE, targeted instructions are routed back to ProblemFormulator, DataEngineer, or Analyst selectively. Up to 2 cycles before the Writer is unblocked regardless.
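The gating logic described here can be sketched as a small control loop. This is one possible reading, not the project's orchestrator: the callable signatures are hypothetical, and the sketch assumes that only the stages named in the Critic's instructions are rerun and that an ABORT verdict stops the run before the Writer.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Verdict(Enum):
    PASS = "PASS"
    REVISE = "REVISE"
    ABORT = "ABORT"


MAX_REVISION_CYCLES = 2
UPSTREAM = ("ProblemFormulator", "DataEngineer", "Analyst")

# A stage takes the shared state and an optional revision instruction.
Stage = Callable[[dict, Optional[str]], None]
# The critic returns a verdict plus per-stage revision instructions.
Critic = Callable[[dict], Tuple[Verdict, Dict[str, str]]]


def run_until_writer(stages: Dict[str, Stage], critic: Critic, state: dict) -> bool:
    """Run upstream stages and the review loop; True means the Writer may run."""
    for name in UPSTREAM:
        stages[name](state, None)
    cycles = 0
    while True:
        verdict, instructions = critic(state)
        if verdict is Verdict.ABORT:
            return False
        if verdict is Verdict.PASS or cycles >= MAX_REVISION_CYCLES:
            return True  # the Writer is unblocked once the cycle budget is spent
        for name in UPSTREAM:
            if name in instructions:  # rerun only the targeted stages
                stages[name](state, instructions[name])
        cycles += 1
```

With a Critic that always answers REVISE for the Analyst, the Analyst runs three times in this sketch (once initially, twice under revision) before the Writer is released.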


Features

Key Capabilities

5 Specialized Agents

Coordinated by a state-machine orchestrator. Each agent has its own system prompt, temperature, and model tier (Opus for Critic, Sonnet for all others).

End-to-End Automation

From a raw CSV and a research prompt to a compiled ACM LaTeX paper, with real citations, methodology validation, and SHAP explainability figures.

Self-Healing Pipeline

Contract validation at every stage boundary. Auto-patching for classifiable errors (SHAP failure, dtype mismatch, missing column) before falling back to LLM repair.
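The "classify first, fall back to LLM repair" idea can be illustrated with a small dispatcher. The regular expressions, labels, and handler signatures below are assumptions chosen for the sketch; the actual error taxonomy is not given in the source.

```python
import re
from typing import Callable, Dict, List, Pattern, Tuple

# Hypothetical taxonomy: (label, pattern matched against the traceback text).
ERROR_PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("missing_column", re.compile(r"KeyError|not in index")),
    ("dtype_mismatch", re.compile(r"could not convert string to float|dtype", re.I)),
    ("shap_failure", re.compile(r"shap", re.I)),
]


def classify_error(traceback_text: str) -> str:
    """Return the first matching error label, or 'unknown'."""
    for label, pattern in ERROR_PATTERNS:
        if pattern.search(traceback_text):
            return label
    return "unknown"


def repair(
    traceback_text: str,
    patches: Dict[str, Callable[[str], str]],
    llm_repair: Callable[[str], str],
) -> str:
    """Apply a deterministic patch when one exists, else use the LLM fallback."""
    label = classify_error(traceback_text)
    handler = patches.get(label, llm_repair)
    return handler(traceback_text)
```

Trying deterministic patches first keeps the common failures cheap and reproducible, and reserves model calls for errors the taxonomy does not cover.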

Live Academic Citations

The ProblemFormulator queries the Semantic Scholar API with exponential-backoff retry logic to retrieve and validate real, current citations.
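Exponential-backoff retry is commonly written as a small wrapper around the request. The sketch below is generic and makes no network call; `retry_with_backoff`, its defaults, and the injectable `sleep` parameter are assumptions, and jitter is left out to keep the delays easy to follow.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Waits base_delay, 2x, 4x, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

In practice `fn` would wrap the HTTP request to the citation API, and `retry_on` would be narrowed to the rate-limit and transient network errors of whichever HTTP client is used.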

6-Model Battery

Logistic Regression, Random Forest, XGBoost, ElasticNet, MLP, and a Stacking Ensemble are trained, compared, and reported with SHAP explainability where applicable.

Docker Sandboxing

LLM-generated analysis code executes inside a Docker sandbox (network-disabled). Gracefully falls back to subprocess when Docker is unavailable.
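A network-disabled Docker run with a subprocess fallback might be assembled as below. The image name, mount point, timeout, and function names are assumptions for illustration; `--rm`, `--network none`, `-v`, and `-w` are standard `docker run` options.

```python
import shutil
import subprocess
import sys
from typing import List


def docker_available() -> bool:
    """True if a docker CLI is on PATH (does not check that the daemon runs)."""
    return shutil.which("docker") is not None


def build_command(
    script_name: str,
    workdir: str,
    use_docker: bool,
    image: str = "python:3.11-slim",  # assumed image, not the project's
) -> List[str]:
    """Build the argv for running a script in workdir, sandboxed if possible."""
    if use_docker:
        return [
            "docker", "run", "--rm",
            "--network", "none",        # no network access inside the sandbox
            "-v", f"{workdir}:/work",   # workdir should be an absolute path
            "-w", "/work",
            image, "python", script_name,
        ]
    # Fallback: plain subprocess with the current interpreter (no isolation).
    return [sys.executable, script_name]


def run_sandboxed(script_name: str, workdir: str, timeout: int = 600) -> subprocess.CompletedProcess:
    """Run the script and capture its output, preferring Docker when present."""
    cmd = build_command(script_name, workdir, docker_available())
    return subprocess.run(cmd, cwd=workdir, capture_output=True, text=True, timeout=timeout)
```

The fallback path gives up isolation entirely, so a pipeline using this pattern would probably want to log which mode was used for each run.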

Demo Run Stats

Numbers from the first end-to-end pipeline run, producing a complete ACM sigconf paper on HSLS:09 college-enrollment prediction.

7/10
Critic Quality Score
2
Revision Cycles
67 min
Total Pipeline Runtime
$7.57
API Cost (vs $5 target)

Publications

Papers & Reports

Technical reports and demo papers generated by EDM-ARS. Demo papers are full ACM-formatted manuscripts produced by the pipeline on HSLS:09 prediction tasks.

Technical Report · March 2026

EDM-ARS: A Domain-Specific Multi-Agent System for Automated Educational Data Mining Research

Chenguang Pan, Zhou Zhang, Weixuan Xiao, Chengyuan Yao

PDF →

Built With

Tech Stack

Core Pipeline
Python 3.11, Anthropic API, Claude Sonnet, Claude Opus (Critic), Docker, PyYAML
ML & Analysis
scikit-learn, XGBoost, SHAP, pandas, matplotlib, seaborn
Data & Literature
HSLS:09 Dataset, Semantic Scholar API, 95-var Tier 1 Registry
Output
ACM sigconf LaTeX, Template-Based Generation, SHAP Figures

Pilot v1.1 & Beyond

The current release targets prediction tasks on HSLS:09 only. A six-phase roadmap will expand EDM-ARS into a full research automation platform for educational data science.

Pilot v1.1: Completed
Prediction Pipeline: end-to-end ML prediction workflow on HSLS:09
Self-Healing Architecture: contract validation, phased Analyst execution, error taxonomy & auto-patching
Docker Sandbox: isolated code execution with subprocess fallback
LaTeX Template System: fixed ACM skeleton with %%PLACEHOLDER%% markers
Checkpoint & Resume: pipeline state persisted after every stage
Phase 1 (Polymorphism Refactor): extract TaskTemplate & DatasetAdapter abstract base classes; decouple pipeline from HSLS:09 specifics
Phase 2 (Findings Memory & Multi-Branch Ideation): persistent FindingsMemory store across runs; ProblemFormulator generates N diverse candidate specs with novelty scoring
Phase 3 (Causal Inference): propensity score matching, IPW, TMLE, heterogeneous treatment effects, and optimal treatment regimes via econml / dowhy
Phase 4 (Outline-First Writing & Narrative Archetypes): split Writer into OutlineAgent + ProseAgent; archetypes include "The Surprising Predictor," "The Fairness Audit," "The Policy Brief"
Phase 5 (Multi-Dataset & Transfer Learning): dataset adapters for ELS:2002, PISA 2022, ASSISTments; cross-population and cross-wave transfer experiments
Phase 6 (Controlled Human Evaluation): blinded comparison of EDM-ARS papers vs. matched human-authored papers on quality, correctness, and efficiency metrics