PaperInsight
PaperInsight is an AI-powered research paper analysis platform that converts uploaded PDF papers into structured knowledge. The system extracts text, chunks the content, generates summaries and metadata, and stores the structured results. The longer-term goal is a retrieval-augmented research assistant.
Problem
Research papers are often long, dense, and difficult to navigate. Readers spend significant time identifying the problem, methodology, results, limitations, and reproduction steps.
Existing PDF readers typically provide only raw text or generic summaries without creating reusable structured knowledge.
Solution
PaperInsight ingests research papers through a REST API and processes them asynchronously.
The platform:
- Extracts text from PDFs
- Segments content into chunks
- Generates structured paper summaries
- Stores reports and chunks in PostgreSQL
- Uses LLM-based extraction for higher-quality summaries
- Validates extracted outputs using JSON Schema
The architecture separates API responsibilities from AI processing through a queue-based workflow.
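As an illustration of the segmentation step listed above, here is a minimal sketch that splits extracted text into overlapping, word-based chunks. The chunking strategy is not specified in this document, so the function name `chunk_text`, the word-based splitting, and the `chunk_size` and `overlap` defaults are all assumptions made for the example.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears with some context in the next chunk.
    Hypothetical helper: the real chunking rules may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # otherwise the loop would emit a trailing chunk of pure overlap.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Word-based windows keep the sketch dependency-free. A token-based or section-aware splitter would be a reasonable alternative if chunks need to respect model context limits or paper structure.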
User Experience
Current workflow:
- User uploads a PDF
- API returns a job ID
- Background worker processes the paper
- User checks the job status
- Structured report becomes available once processing completes
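The workflow above can be sketched in a few lines. This is not the project's code: an in-process `queue.Queue` and a thread stand in for the Redis queue and Celery worker, a dictionary stands in for PostgreSQL, and the function names and status values (`QUEUED`, `PROCESSING`, `COMPLETED`) are hypothetical.

```python
import queue
import threading
import uuid

# In-memory stand-ins (assumptions for this sketch only):
# `jobs` plays the role of the database, `work_queue` the role of the broker.
jobs: dict[str, dict] = {}
work_queue: queue.Queue = queue.Queue()


def submit_paper(pdf_bytes: bytes) -> str:
    """Accept an upload, record a job, enqueue it, and return the job ID."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "QUEUED", "pdf": pdf_bytes, "report": None}
    work_queue.put(job_id)
    return job_id


def get_status(job_id: str) -> str:
    """Return the current status of a job, as a status endpoint might."""
    return jobs[job_id]["status"]


def worker() -> None:
    """Process queued jobs until a None sentinel is received."""
    while True:
        job_id = work_queue.get()
        if job_id is None:
            work_queue.task_done()
            break
        job = jobs[job_id]
        job["status"] = "PROCESSING"
        # Placeholder for text extraction, chunking, and LLM summarisation.
        job["report"] = {"title": "(placeholder)", "size_bytes": len(job["pdf"])}
        job["status"] = "COMPLETED"
        work_queue.task_done()
```

The point of the sketch is the separation of concerns: `submit_paper` returns immediately with an ID, while all slow work happens in `worker`, so the upload request never waits on extraction or model calls.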
Generated reports include:
- Title
- Problem Statement
- Motivation
- Methodology
- Results
- Limitations
- Reproduction Plan
- Section Metadata
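To make the report shape concrete, the sketch below writes the fields above as a schema in JSON Schema form and checks a report against it. The snake_case key names and the field types are assumptions, and the checker is a hand-rolled stand-in that only understands `required` and a few `type` values. It is not a full JSON Schema validator; a real pipeline would more likely rely on a dedicated library such as `jsonschema`.

```python
# Hypothetical schema: key names and types are assumed for illustration.
REPORT_SCHEMA = {
    "type": "object",
    "required": [
        "title", "problem_statement", "motivation", "methodology",
        "results", "limitations", "reproduction_plan", "section_metadata",
    ],
    "properties": {
        "title": {"type": "string"},
        "problem_statement": {"type": "string"},
        "motivation": {"type": "string"},
        "methodology": {"type": "string"},
        "results": {"type": "string"},
        "limitations": {"type": "string"},
        "reproduction_plan": {"type": "string"},
        "section_metadata": {"type": "array"},
    },
}

# Mapping from the JSON Schema type names used above to Python types.
_JSON_TYPES = {"string": str, "array": list, "object": dict}


def find_report_errors(report: dict) -> list[str]:
    """Return a list of problems; an empty list means the report passed."""
    errors: list[str] = []
    for key in REPORT_SCHEMA["required"]:
        if key not in report:
            errors.append(f"missing required field: {key}")
    for key, rule in REPORT_SCHEMA["properties"].items():
        if key in report and not isinstance(report[key], _JSON_TYPES[rule["type"]]):
            errors.append(f"{key}: expected {rule['type']}")
    return errors
```

Returning a list of errors rather than raising on the first one makes it easier to log everything that is wrong with a single LLM response before deciding whether to retry.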
Current Status
PaperInsight is currently in active MVP development.
Implemented:
- PDF upload
- PostgreSQL persistence
- Flyway migrations
- Redis queue
- Celery worker
- Chunk storage
- Report storage
- Ollama integration
- JSON Schema validation
In progress:
- Robust LLM extraction
- Frontend UI
- Figure and table extraction
- Retrieval-augmented generation (RAG) capabilities
Tech Stack
Backend
- Java 21
- Spring Boot 4
- REST APIs
Processing
- Python 3.11
- Celery
- Redis
Database
- PostgreSQL 16
- Flyway
AI
- Ollama
- Llama 3.1 8B
Infrastructure
- Docker Compose
Links
GitHub: TBD
Demo: TBD
Documentation: TBD

