SciDER: Scientific Data-centric End-to-end Researcher

An end-to-end agentic system that parses raw experimental data, designs hypotheses, and drives self-evolving scientific workflows across domains.

William & Mary · University of Minnesota · University of North Carolina at Chapel Hill

Ke Lin · Yilin Lu · Shreyas Bhat · Xuehang Guo · Junier Oliva · Qingyun Wang


Authors

Ke Lin · William & Mary

Yilin Lu · University of Minnesota

Shreyas Bhat · University of North Carolina at Chapel Hill

Xuehang Guo · William & Mary

Junier Oliva · University of North Carolina at Chapel Hill

Qingyun Wang · William & Mary

Contact

{klin07, xguo15, qwang16}@wm.edu

lu000661@umn.edu

{shbhat, joliva}@cs.unc.edu

Abstract

Automated scientific discovery with large language models is transforming the research lifecycle from ideation to experimentation, yet existing agents struggle to autonomously process raw data collected from scientific experiments. We introduce SciDER, a data-centric end-to-end system that automates the research lifecycle. Unlike traditional frameworks, SciDER's specialized agents collaboratively parse and analyze raw scientific data, generate hypotheses and experimental designs grounded in specific data characteristics, and write and execute the corresponding code. Evaluation on three benchmarks shows that SciDER excels at specialized data-driven scientific discovery and, through its self-evolving memory and critic-led feedback loop, outperforms general-purpose agents and state-of-the-art models. SciDER is distributed as a modular Python package; we also provide easy-to-use PyPI packages and a lightweight web interface to accelerate autonomous, data-driven research and to make it accessible to all researchers and developers.

System Highlights

Data-first pipeline. Parses raw experimental artifacts and metadata directly.

Self-evolving memory. Curates discoveries into reusable scientific context.

Critic-led feedback. Iteratively validates hypotheses before execution.

Domain versatility. Built to generalize across scientific disciplines.
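As a rough illustration of what "curating discoveries into reusable scientific context" can mean, here is a minimal sketch assuming a keyword-tagged store. `MemoryStore`, `distill`, `retrieve`, and the tag-overlap ranking are hypothetical names and choices made for this example; they are not taken from the SciDER package.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    """One distilled finding from a completed research cycle (hypothetical schema)."""
    summary: str
    tags: set[str]


@dataclass
class MemoryStore:
    """Toy stand-in for a self-evolving memory: it stores distilled findings
    and returns the ones most relevant to a new query."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def distill(self, outcome: str, tags: list[str]) -> None:
        # Keep only the first sentence as a compact, reusable summary.
        summary = outcome.split(".")[0].strip()
        self.entries.append(MemoryEntry(summary, {t.lower() for t in tags}))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Rank entries by how many query words match their tags; drop non-matches.
        words = {w.lower() for w in query.split()}
        scored = [(len(words & e.tags), e.summary) for e in self.entries]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [summary for _, summary in scored[:k]]


memory = MemoryStore()
memory.distill("Baseline drift dominates sensor noise. Details follow.", ["sensor", "drift"])
memory.distill("Learning rate 1e-3 converged fastest.", ["training", "lr"])
print(memory.retrieve("sensor drift correction"))
# ['Baseline drift dominates sensor noise']
```

Tag overlap keeps the sketch dependency-free; a real system would more plausibly use embedding similarity for retrieval.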

Release Snapshot

Open-source codebase on GitHub with modular components.

PyPI packages for fast installation and reproducible research.

Web interface for interactive experimentation and demos.

Deployment-ready architecture for labs and applied teams.

Agent Comparison

Figure: comparison between general-purpose agents and SciDER.

Demo Walkthrough

1. Upload raw experimental data or connect lab storage.
2. SciDER parses observations, protocols, and metadata.
3. The hypothesis generator proposes next-step experiments.
4. The critic validates feasibility and novelty.
5. The memory engine distills outcomes for future cycles.
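The walkthrough steps can be read as one pass of a loop. The sketch below is a minimal, hypothetical illustration of that control flow, not SciDER's actual API: `research_cycle`, the `toy_*` agent functions, and the 0.5 acceptance threshold are assumptions made for this example.

```python
from typing import Callable


def research_cycle(
    raw_data: str,
    parse: Callable[[str], dict],
    propose: Callable[[dict, list[str]], list[str]],
    critique: Callable[[str], float],
    distill: Callable[[str], str],
    memory: list[str],
    threshold: float = 0.5,  # hypothetical acceptance cutoff for the critic
) -> list[str]:
    """One hypothetical pass: parse -> propose -> critic gate -> memory."""
    observations = parse(raw_data)                                  # structure raw data
    candidates = propose(observations, memory)                      # data-grounded proposals
    accepted = [h for h in candidates if critique(h) >= threshold]  # critic gate
    memory.extend(distill(h) for h in accepted)                     # distill for future cycles
    return accepted


# Toy stand-ins for the agents; every function below is hypothetical.
def toy_parse(text: str) -> dict:
    # Turn "name=value" lines into a dict of observations.
    return dict(line.split("=") for line in text.strip().splitlines())


def toy_propose(obs: dict, memory: list[str]) -> list[str]:
    # Skip variables that earlier cycles already explored.
    return [f"vary {k}" for k in obs if f"explored {k}" not in memory]


def toy_critique(hypothesis: str) -> float:
    # Pretend only temperature experiments are feasible and novel.
    return 0.9 if "temperature" in hypothesis else 0.2


def toy_distill(hypothesis: str) -> str:
    return "explored " + hypothesis.split()[-1]


memory: list[str] = []
raw = "temperature=310\npressure=1.2"
print(research_cycle(raw, toy_parse, toy_propose, toy_critique, toy_distill, memory))
# ['vary temperature']
print(research_cycle(raw, toy_parse, toy_propose, toy_critique, toy_distill, memory))
# []
```

Passing the agents in as callables keeps the loop independent of any particular model or prompt; the second call returns nothing because memory already records the accepted experiment and the critic still rejects the other proposal.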

Demo Links

Paper: arxiv.org/abs/2603.01421

Live demo: huggingface.co/spaces/AI4Research/scider

Video: youtu.be/2SQluhKP6RM

Project page: harryluumn.github.io/scider-proj-page

Resources

Paper

Our preprint highlights the data-centric workflow, evaluation across three benchmarks, and a lightweight tooling stack for research labs.

📄 arXiv:2603.01421

Open Questions

We are actively exploring broader domain coverage, interactive experiment planning, and long-horizon memory evaluation. Collaborators are welcome.