PaperInsight

PaperInsight is an AI-powered research paper analysis platform that converts uploaded PDF papers into structured knowledge. The system extracts text, chunks content, generates summaries and metadata, stores structured results, and is designed to evolve toward a retrieval-augmented research assistant.

Problem

Research papers are often long, dense, and difficult to navigate. Readers spend significant time identifying the problem, methodology, results, limitations, and reproduction steps.

Existing PDF readers typically provide only raw text or generic summaries without creating reusable structured knowledge.

Solution

PaperInsight ingests research papers through a REST API and processes them asynchronously.

The platform:

Extracts text from PDFs
Segments content into chunks
Generates structured paper summaries
Stores reports and chunks in PostgreSQL
Uses LLM-based extraction for higher-quality summaries
Validates extracted outputs using JSON Schema

The architecture separates API responsibilities from AI processing through a queue-based workflow.
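
As an illustration of the chunking step listed above, here is a minimal sketch of splitting extracted text into overlapping windows. The source does not describe the actual chunking strategy, so the function name chunk_text, the character-based window, and the max_chars and overlap parameters are all assumptions made for this example.

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: the window size and overlap are illustrative
    defaults, not values taken from PaperInsight.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in the range [0, max_chars)")

    step = max_chars - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once a window has reached the end of the text, so no
        # trailing chunk is emitted that only repeats the overlap.
        if start + max_chars >= len(text):
            break
    return chunks
```

The overlap keeps sentences that straddle a boundary visible in both neighbouring chunks, which tends to matter later for retrieval. A production splitter would more likely cut on paragraph or section boundaries than on raw character counts.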

User Experience

Current workflow:

User uploads a PDF
API returns a job ID
Background worker processes the paper
User checks the job status
Structured report becomes available
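
The workflow above can be sketched as a small job lifecycle. This is an in-memory stand-in, not the project's implementation: queue.Queue takes the place of the Redis queue, run_worker_once takes the place of a Celery worker, and the names JobStatus, JobStore, and the four status values are assumptions introduced for illustration.

```python
import queue
import uuid
from enum import Enum


class JobStatus(str, Enum):
    # Hypothetical status names; the source does not list the real ones.
    QUEUED = "QUEUED"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class JobStore:
    """In-memory stand-in for the job table and the message queue."""

    def __init__(self) -> None:
        self.jobs: dict[str, dict] = {}
        self.pending: queue.Queue = queue.Queue()

    def submit(self, pdf_bytes: bytes) -> str:
        """Register an upload, enqueue it, and return the job ID."""
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {"status": JobStatus.QUEUED, "report": None, "error": None}
        self.pending.put((job_id, pdf_bytes))
        return job_id

    def status(self, job_id: str) -> JobStatus:
        return self.jobs[job_id]["status"]

    def report(self, job_id: str):
        return self.jobs[job_id]["report"]


def run_worker_once(store: JobStore, process) -> str:
    """Take one job off the queue and process it, as a background worker would."""
    job_id, pdf_bytes = store.pending.get_nowait()
    job = store.jobs[job_id]
    job["status"] = JobStatus.PROCESSING
    try:
        job["report"] = process(pdf_bytes)
        job["status"] = JobStatus.COMPLETED
    except Exception as exc:
        # Record the failure so the client sees it when polling.
        job["error"] = str(exc)
        job["status"] = JobStatus.FAILED
    return job_id
```

In this sketch, submit plays the role of the upload endpoint and status plays the role of the polling endpoint. Because the caller only ever holds a job ID, slow LLM processing never blocks the request that accepted the upload.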

Generated reports include:

Title
Problem Statement
Motivation
Methodology
Results
Limitations
Reproduction Plan
Section Metadata
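
To make the report shape concrete, here is a minimal sketch of checking that a generated report carries every field listed above. The snake_case field names, the expected types, and the validate_report helper are assumptions for this example. The check is a standard-library stand-in for the JSON Schema validation the platform performs, not the project's actual schema.

```python
# Hypothetical field names and types derived from the report sections above.
REQUIRED_FIELDS = {
    "title": str,
    "problem_statement": str,
    "motivation": str,
    "methodology": str,
    "results": str,
    "limitations": str,
    "reproduction_plan": str,
    "section_metadata": list,
}


def validate_report(report: dict) -> list[str]:
    """Return a list of problems; an empty list means the report passed."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in report:
            errors.append(f"missing field: {name}")
        elif not isinstance(report[name], expected):
            actual = type(report[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors
```

Returning every problem at once, rather than raising on the first, makes it easier to feed the full error list back into a retry prompt when the LLM output is malformed.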

Current Status

PaperInsight is currently in active MVP development.

Implemented:

PDF upload
PostgreSQL persistence
Flyway migrations
Redis queue
Celery worker
Chunk storage
Report storage
Ollama integration
JSON Schema validation

In progress:

Robust LLM extraction
Frontend UI
Figure and table extraction
Retrieval-augmented generation (RAG) capabilities

Tech Stack

Backend

Java 21
Spring Boot 4
REST APIs

Processing

Python 3.11
Celery
Redis

Database

PostgreSQL 16
Flyway

AI

Ollama
Llama 3.1 8B

Infrastructure

Docker Compose

Links

GitHub: TBD

Demo: TBD

Documentation: TBD