SAPGraph: Structure-aware Scientific Document Summarization

Published in Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2022

Project Overview

SAPGraph is a novel approach for structure-aware extractive summarization of scientific papers using heterogeneous graph neural networks. The project addresses the challenge of capturing complex structural relationships in scientific documents to improve summarization quality.

Key Contributions

Heterogeneous Graph Construction: Developed methods to represent scientific papers as heterogeneous graphs
Structure-aware Extraction: Created algorithms that leverage document structure to select better summary sentences
Graph Neural Network Architecture: Designed specialized GNN models for scientific document processing
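
To make the graph construction idea concrete, here is a minimal sketch in plain Python. It is not the published SAPGraph code. The node types (section, sentence, word), the relation names, and the helper build_hetero_graph are assumptions chosen for illustration only.

```python
import re
from collections import defaultdict


def build_hetero_graph(paper):
    """Build a toy heterogeneous graph from a paper.

    paper: list of (section_title, [sentence, ...]) pairs.
    Node types and relation names are illustrative assumptions,
    not the node/edge schema used in the SAPGraph paper.
    """
    nodes = {"section": [], "sentence": [], "word": []}
    # Edge lists keyed by (source type, relation, target type).
    edges = defaultdict(list)
    word_ids = {}

    for title, sentences in paper:
        sec_id = len(nodes["section"])
        nodes["section"].append(title)
        for text in sentences:
            sent_id = len(nodes["sentence"])
            nodes["sentence"].append(text)
            edges[("section", "contains", "sentence")].append((sec_id, sent_id))
            # One word node per distinct lowercase token; sorted for determinism.
            for word in sorted(set(re.findall(r"[a-z]+", text.lower()))):
                if word not in word_ids:
                    word_ids[word] = len(nodes["word"])
                    nodes["word"].append(word)
                edges[("sentence", "mentions", "word")].append(
                    (sent_id, word_ids[word])
                )
    return nodes, dict(edges)


toy_paper = [
    ("Introduction", ["Graphs model documents.", "Summaries select sentences."]),
    ("Method", ["Graphs connect sentences."]),
]
nodes, edges = build_hetero_graph(toy_paper)
```

Keying each edge list by a (source type, relation, target type) triple follows the convention PyTorch Geometric uses for heterogeneous graphs, so a structure like this could be converted into library tensors in a later step.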

Technical Approach

The project uses:

Graph Representation: Converting scientific papers into heterogeneous graphs with different node types
Neural Processing: Applying graph neural networks to capture structural relationships
Extractive Summarization: Selecting key sentences based on graph structure and content
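
The three steps above can be sketched end to end in a few lines. This is a deliberately simplified stand-in, assuming fixed (unlearned) mean aggregation in place of a trained graph neural network and a centroid-similarity score in place of a learned classifier. The function names propagate and select_top_k and the mixing weight alpha are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def propagate(sent_feats, sent_to_section, alpha=0.5):
    """One round of sentence -> section -> sentence mean aggregation.

    Each section node takes the mean of its sentences' features, and each
    sentence is then mixed with its section's feature. A trained GNN would
    use learned weights here; alpha is an assumed fixed mixing weight.
    """
    groups = {}
    for sent_id, sec_id in enumerate(sent_to_section):
        groups.setdefault(sec_id, []).append(sent_feats[sent_id])
    section_feats = {sec: mean(vs) for sec, vs in groups.items()}
    return [
        [alpha * x + (1 - alpha) * s for x, s in zip(feat, section_feats[sec])]
        for feat, sec in zip(sent_feats, sent_to_section)
    ]


def select_top_k(sent_feats, k):
    """Score sentences by dot product with the document centroid, keep top k."""
    centroid = mean(sent_feats)
    scores = [sum(a * b for a, b in zip(f, centroid)) for f in sent_feats]
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Return the chosen indices in document order, as an extractive summary would.
    return sorted(ranked[:k]), scores


# Toy input: four sentences with 2-d content features, two per section.
sent_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
sent_to_section = [0, 0, 1, 1]

updated = propagate(sent_feats, sent_to_section)
selected, scores = select_top_k(updated, k=2)
```

The point of the sketch is the flow of information: sentence features are first smoothed through their section node, so the final selection depends on both a sentence's own content and the structural context it sits in.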

Results

Published at the AACL 2022 conference
Demonstrated improved performance over baseline summarization methods
Successfully captured structural information from scientific documents

Technologies Used

PyTorch, PyTorch Geometric
Graph Neural Networks
Natural Language Processing
Scientific document processing tools

Project Status

Completed - Successfully published and contributed to the field of scientific document summarization.

Recommended citation: S Qi, L Li, Y Li, J Jiang, D Hu, Y Li, Y Zhu, Y Zhou, M Litvak, N Vanetik. (2022). "SAPGraph: Structure-aware extractive summarization for scientific papers with heterogeneous graph." Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics.


Siya Qi