SAPGraph: Structure-aware Scientific Document Summarization

Published in Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2022

Project Overview

SAPGraph is a structure-aware approach to extractive summarization of scientific papers, built on heterogeneous graph neural networks. It models the structural relationships within a scientific document so that sentence selection can draw on the paper's organization as well as its content.

Key Contributions

  • Heterogeneous Graph Construction: Developed methods to represent scientific papers as heterogeneous graphs
  • Structure-aware Extraction: Created algorithms that use document structure to select better summary sentences
  • Graph Neural Network Architecture: Designed specialized GNN models for scientific document processing
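
To make the idea of a heterogeneous graph concrete, here is a minimal sketch of one way a paper could be turned into typed nodes and typed edges. The node types ("section", "sentence"), the edge types, and the names HeteroGraph and build_graph are assumptions chosen for illustration; this page only states that the graph has different node types, so this is not the construction used in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Typed nodes plus typed edges (hypothetical container, illustration only)."""
    node_type: dict = field(default_factory=dict)  # node id -> node type
    node_text: dict = field(default_factory=dict)  # node id -> raw text
    edges: dict = field(default_factory=lambda: defaultdict(set))  # edge type -> {(u, v)}

    def add_node(self, node_id, ntype, text):
        self.node_type[node_id] = ntype
        self.node_text[node_id] = text

    def add_edge(self, etype, u, v):
        self.edges[etype].add((u, v))

    def nodes_of(self, ntype):
        return [n for n, t in self.node_type.items() if t == ntype]


def build_graph(paper):
    """Build a graph from a list of (section_title, [sentence, ...]) pairs."""
    g = HeteroGraph()
    for s_idx, (title, sentences) in enumerate(paper):
        sec_id = f"sec{s_idx}"
        g.add_node(sec_id, "section", title)
        prev = None
        for t_idx, sent in enumerate(sentences):
            sent_id = f"sec{s_idx}_sent{t_idx}"
            g.add_node(sent_id, "sentence", sent)
            # Assumed edge type: a section is linked to each sentence it contains.
            g.add_edge("section-sentence", sec_id, sent_id)
            # Assumed edge type: consecutive sentences in a section are linked.
            if prev is not None:
                g.add_edge("sentence-sentence", prev, sent_id)
            prev = sent_id
    return g


# Toy input, invented for the example.
paper = [
    ("Introduction", ["Summarizing papers is hard.", "Structure helps."]),
    ("Method", ["We build a graph.", "We score sentences.", "We pick the best."]),
]
graph = build_graph(paper)
print(len(graph.nodes_of("section")), len(graph.nodes_of("sentence")))
print({etype: len(pairs) for etype, pairs in graph.edges.items()})
```

Keeping edge sets separate per edge type is what makes the graph heterogeneous in practice: a model can then apply different parameters to each relation.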

Technical Approach

The project uses:

  • Graph Representation: Converting scientific papers into heterogeneous graphs with different node types
  • Neural Processing: Applying graph neural networks to capture structural relationships
  • Extractive Summarization: Selecting key sentences based on graph structure and content
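
The three steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the SAPGraph architecture: the feature vectors and weights are hand-set rather than learned, the single round of mean aggregation stands in for a trained graph neural network, and the names propagate and select_summary are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def propagate(features, neighbors):
    """One round of mean-aggregation message passing.

    Each node's new vector is the average of its own vector and the
    mean of its neighbors' vectors. Isolated nodes are left unchanged.
    """
    updated = {}
    for node, vec in features.items():
        nbrs = [features[n] for n in neighbors.get(node, [])]
        if not nbrs:
            updated[node] = list(vec)
            continue
        agg = mean(nbrs)
        updated[node] = [(a + b) / 2 for a, b in zip(vec, agg)]
    return updated


def select_summary(features, sentence_ids, weights, k):
    """Score sentences with a linear layer, keep the top k, return them in document order."""
    scores = {
        s: sum(w * x for w, x in zip(weights, features[s])) for s in sentence_ids
    }
    top = sorted(sentence_ids, key=lambda s: scores[s], reverse=True)[:k]
    return sorted(top, key=sentence_ids.index)


# Toy graph: one section node connected to three sentence nodes,
# with consecutive sentences also connected to each other.
features = {
    "sec0": [1.0, 0.0],
    "s0": [0.0, 1.0],
    "s1": [0.0, 0.0],
    "s2": [1.0, 1.0],
}
neighbors = {
    "sec0": ["s0", "s1", "s2"],
    "s0": ["sec0", "s1"],
    "s1": ["sec0", "s0", "s2"],
    "s2": ["sec0", "s1"],
}

hidden = propagate(features, neighbors)
summary = select_summary(hidden, ["s0", "s1", "s2"], weights=[1.0, 1.0], k=2)
print(summary)
```

Returning the selected sentences in document order, rather than score order, keeps the extracted summary readable. In a real system the propagation step would be a trained model, for example one built with PyTorch Geometric as listed under Technologies Used.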

Results

  • Published at AACL 2022
  • Demonstrated improved performance over baseline summarization methods
  • Successfully captured structural information from scientific documents

Technologies Used

  • PyTorch, PyTorch Geometric
  • Graph Neural Networks
  • Natural Language Processing
  • Scientific document processing tools

Project Status

Completed - Successfully published and contributed to the field of scientific document summarization.

Recommended citation: S Qi, L Li, Y Li, J Jiang, D Hu, Y Li, Y Zhu, Y Zhou, M Litvak, N Vanetik. (2022). "SAPGraph: Structure-aware extractive summarization for scientific papers with heterogeneous graph." Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics.