SAPGraph: Structure-aware Scientific Document Summarization
Published in Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2022
Project Overview
SAPGraph is a structure-aware approach to extractive summarization of scientific papers based on heterogeneous graph neural networks. The project addresses the challenge of capturing the structural relationships within scientific documents, such as how individual sentences relate to the sections that contain them, in order to improve summary quality.
Key Contributions
- Heterogeneous Graph Construction: Developed methods to represent scientific papers as heterogeneous graphs
- Structure-aware Extraction: Created algorithms that use document structure to select summary sentences more effectively
- Graph Neural Network Architecture: Designed specialized GNN models for scientific document processing
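To make the idea of a heterogeneous document graph concrete, here is a minimal, non-authoritative sketch in plain Python. It assumes a toy paper with two hypothetical node types ("section" and "sentence") and two hypothetical edge types (sentence-in-section and sentence-next-sentence). These are illustrative choices and are not necessarily the node and edge types SAPGraph actually uses. Edge types are keyed as (source type, relation, destination type) triples, the same convention PyTorch Geometric uses for heterogeneous graphs.

```python
from collections import defaultdict

# Hypothetical toy paper: section title -> list of sentences.
PAPER = {
    "Introduction": ["Summarizing papers is hard.", "Papers are long."],
    "Method": ["We build a graph.", "A GNN scores sentences."],
}


def build_hetero_graph(paper):
    """Return typed nodes and typed edge lists for a toy heterogeneous graph.

    The node types ("section", "sentence") and edge types used here are
    illustrative assumptions, not the exact schema used by SAPGraph.
    """
    nodes = {"section": [], "sentence": []}
    # (src_type, relation, dst_type) -> list of (src_index, dst_index)
    edges = defaultdict(list)
    for title, sentences in paper.items():
        sec_idx = len(nodes["section"])
        nodes["section"].append(title)
        prev = None
        for text in sentences:
            sent_idx = len(nodes["sentence"])
            nodes["sentence"].append(text)
            # Structural edge: the sentence belongs to its section.
            edges[("sentence", "in", "section")].append((sent_idx, sec_idx))
            # Sequential edge between adjacent sentences in the same section.
            if prev is not None:
                edges[("sentence", "next", "sentence")].append((prev, sent_idx))
            prev = sent_idx
    return nodes, dict(edges)


nodes, edges = build_hetero_graph(PAPER)
for edge_type, pairs in edges.items():
    print(edge_type, pairs)
```

Keeping a separate edge list per typed relation is what makes the graph heterogeneous: a GNN can then learn different transformations for "sentence in section" and "sentence next to sentence" messages.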
Technical Approach
The project uses:
- Graph Representation: Converting scientific papers into heterogeneous graphs with different node types
- Neural Processing: Applying graph neural networks to capture structural relationships
- Extractive Summarization: Selecting key sentences based on graph structure and content
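The three steps above (graph representation, neural processing, extractive selection) can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not SAPGraph's architecture: the 2-dimensional sentence features, the single round of sentence-to-section-to-sentence mean aggregation, the mixing weight `alpha`, and the linear scoring weights are all hypothetical stand-ins for learned encoders and GNN layers.

```python
# Hypothetical sentence features (stand-ins for sentence-encoder outputs).
SENT_FEATS = [
    [1.0, 0.0],  # sentence 0, section 0
    [0.0, 1.0],  # sentence 1, section 0
    [1.0, 1.0],  # sentence 2, section 1
    [0.0, 0.0],  # sentence 3, section 1
]
SENT_TO_SECTION = [0, 0, 1, 1]
NUM_SECTIONS = 2


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def message_pass(sent_feats, sent_to_section, num_sections, alpha=0.5):
    """One round of sentence -> section -> sentence mean aggregation.

    A hand-written stand-in for a learned heterogeneous GNN layer.
    """
    # Step 1: each section node aggregates the features of its sentences.
    members = [[] for _ in range(num_sections)]
    for s, sec in enumerate(sent_to_section):
        members[sec].append(sent_feats[s])
    sec_feats = [mean(m) for m in members]
    # Step 2: each sentence mixes its own feature with its section's feature.
    return [
        [alpha * x + (1 - alpha) * y for x, y in zip(feat, sec_feats[sec])]
        for feat, sec in zip(sent_feats, sent_to_section)
    ]


def extract(sent_feats, weights, k):
    """Score sentences linearly and return the top-k indices in document order."""
    scores = [sum(w * x for w, x in zip(weights, f)) for f in sent_feats]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top), scores


updated = message_pass(SENT_FEATS, SENT_TO_SECTION, NUM_SECTIONS)
selected, scores = extract(updated, weights=[1.0, 0.5], k=2)
print(selected)  # [0, 2]
```

Returning the selected indices in document order, rather than score order, keeps the extracted summary readable as a sequence of sentences from the original paper.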
Results
- Published at AACL 2022
- Demonstrated improved performance over baseline summarization methods
- Successfully captured structural information from scientific documents
Technologies Used
- PyTorch, PyTorch Geometric
- Graph Neural Networks
- Natural Language Processing
- Scientific document processing tools
Project Status
Completed. The work was published at AACL 2022 as a contribution to scientific document summarization.
Recommended citation: S Qi, L Li, Y Li, J Jiang, D Hu, Y Li, Y Zhu, Y Zhou, M Litvak, N Vanetik. (2022). "SAPGraph: Structure-aware extractive summarization for scientific papers with heterogeneous graph." Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics.