LEARN: A Story-Driven Layout-to-Image Generation Framework for STEM Instruction

Authors

Maoquan Zhang,

Bisser Raytchev,

Xiujuan Sun

Date

08/2025

Publisher

arXiv

Link

http://arxiv.org/pdf/2508.11153v1

LEARN is a layout-aware diffusion framework designed to generate pedagogically aligned illustrations for STEM education. It builds on a curated BookCover dataset that supplies narrative layouts and structured visual cues, enabling the model to depict abstract and sequential scientific concepts with strong semantic alignment. Through layout-conditioned generation, contrastive visual-semantic training, and prompt modulation, LEARN produces coherent visual sequences that support mid-to-high-level reasoning in line with Bloom's taxonomy while reducing extraneous cognitive load, a central concern of Cognitive Load Theory. By fostering spatially organized, story-driven narratives, the framework counters the fragmented attention often induced by short-form media and promotes sustained conceptual focus. Beyond static diagrams, LEARN shows potential for integration with multimodal systems and curriculum-linked knowledge graphs to create adaptive, exploratory educational content. As the first generative approach to unify layout-based storytelling, semantic structure learning, and cognitive scaffolding, LEARN represents a novel direction for generative AI in education. The code and dataset will be released to facilitate future research and practical deployment.

What is the application?

Teaching – Instructional Materials

Who is the user?

Student

What age?

Why use AI?

Outcomes – Other Academic,

Outcomes – Differentiation