Sina Bolouri

PaperInsight

AI-powered research paper analysis platform that extracts, structures, summarizes, and stores knowledge from uploaded PDFs.

Status: in progress
Technologies: Java 21, Spring Boot 4, Python 3.11, Celery, Redis, PostgreSQL 16, Flyway, Docker, Ollama, Llama 3.1 8B, JSON Schema, REST API

PaperInsight is an AI-powered research paper analysis platform that converts uploaded PDF papers into structured knowledge. The system extracts text, chunks content, generates summaries and metadata, stores structured results, and is designed to evolve toward a retrieval-augmented research assistant.

Problem

Research papers are often long, dense, and difficult to navigate. Readers spend significant time identifying the problem, methodology, results, limitations, and reproduction steps.

Existing PDF readers typically provide only raw text or generic summaries without creating reusable structured knowledge.

Solution

PaperInsight ingests research papers through a REST API and processes them asynchronously.

The platform:

  • Extracts text from PDFs
  • Segments content into chunks
  • Generates structured paper summaries
  • Stores reports and chunks in PostgreSQL
  • Uses LLM-based extraction for higher-quality summaries
  • Validates extracted outputs using JSON Schema
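
The chunking step listed above could look roughly like the following. This is a minimal sketch, not PaperInsight's actual implementation: the function name `chunk_text`, the word-based windowing, and the `max_words` and `overlap` defaults are all assumptions made for illustration.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word-based chunks that overlap slightly with their neighbours.

    Hypothetical helper: the real chunking strategy and sizes are not specified.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

A small overlap keeps sentences that straddle a chunk boundary visible in both neighbouring chunks, which tends to help later summarization or retrieval.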

The architecture separates API responsibilities from AI processing through a queue-based workflow.
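
To illustrate the separation, here is a small sketch that uses Python's standard-library `queue` and `threading` modules as stand-ins for Redis and the Celery worker. It only shows the shape of the idea: the API side records a job and returns at once, while a worker picks the job up later. The names `submit_paper`, `worker`, and `jobs`, and the status strings, are hypothetical.

```python
import queue
import threading
import uuid

jobs: dict[str, dict] = {}                # in-memory stand-in for a jobs table
work_queue: queue.Queue = queue.Queue()   # in-process stand-in for the Redis-backed queue


def submit_paper(pdf_text: str) -> str:
    """API side: record the job, enqueue its ID, and return immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "input": pdf_text, "report": None}
    work_queue.put(job_id)
    return job_id


def worker() -> None:
    """Worker side: process queued job IDs until a None sentinel arrives."""
    while True:
        job_id = work_queue.get()
        if job_id is None:
            break
        job = jobs[job_id]
        job["status"] = "processing"
        # Placeholder for text extraction, chunking, and LLM summarization.
        job["report"] = {"word_count": len(job["input"].split())}
        job["status"] = "completed"


if __name__ == "__main__":
    thread = threading.Thread(target=worker)
    thread.start()
    jid = submit_paper("a short placeholder paper body")
    work_queue.put(None)  # ask the worker to stop once earlier jobs are done
    thread.join()
    print(jobs[jid]["status"], jobs[jid]["report"])
```

Because the submitting side never waits on the processing step, slow LLM calls do not block uploads, and the worker can be scaled or restarted independently.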

User Experience

Current workflow:

  1. User uploads a PDF
  2. API returns a job ID
  3. Background worker processes the paper
  4. User checks the job status
  5. Structured report becomes available
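
From the client's point of view, steps 4 and 5 amount to polling until the job reaches a final state. The sketch below assumes a caller-supplied `get_status` function (for example, a thin wrapper around whatever status endpoint the API exposes) and hypothetical status strings `"completed"` and `"failed"`; none of these names come from the project itself.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}  # assumed final states


def wait_for_report(
    get_status: Callable[[str], str],
    job_id: str,
    interval: float = 1.0,
    max_polls: int = 30,
) -> str:
    """Poll a job's status until it is final, or give up after max_polls checks."""
    for _ in range(max_polls):
        status = get_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} checks")
```

Capping the number of polls keeps a stuck job from hanging the client indefinitely.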

Generated reports include:

  • Title
  • Problem Statement
  • Motivation
  • Methodology
  • Results
  • Limitations
  • Reproduction Plan
  • Section Metadata
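
As a rough picture of how such a report might be checked before storage, the sketch below describes the fields with a JSON Schema style dictionary and applies a hand-rolled check. The snake_case field names and their types are assumptions, and the checker only understands the `required` and `type` keywords; it is a simplified stand-in for a full JSON Schema validator, not the project's validation code.

```python
# Assumed field names and types, mirroring the report sections listed above.
REPORT_SCHEMA = {
    "type": "object",
    "required": [
        "title", "problem_statement", "motivation", "methodology",
        "results", "limitations", "reproduction_plan", "section_metadata",
    ],
    "properties": {
        "title": {"type": "string"},
        "problem_statement": {"type": "string"},
        "motivation": {"type": "string"},
        "methodology": {"type": "string"},
        "results": {"type": "string"},
        "limitations": {"type": "string"},
        "reproduction_plan": {"type": "string"},
        "section_metadata": {"type": "array"},
    },
}

_PY_TYPES = {"string": str, "array": list, "object": dict}


def validation_errors(report: object, schema: dict = REPORT_SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the report passed."""
    if not isinstance(report, dict):
        return ["report must be a JSON object"]
    errors = [f"missing field: {key}" for key in schema["required"] if key not in report]
    for key, rule in schema["properties"].items():
        if key in report and not isinstance(report[key], _PY_TYPES[rule["type"]]):
            errors.append(f"wrong type for {key}: expected {rule['type']}")
    return errors
```

Checking LLM output against a fixed shape like this lets malformed responses be rejected or retried instead of being stored.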

Current Status

PaperInsight is currently in active MVP development.

Implemented:

  • PDF upload
  • PostgreSQL persistence
  • Flyway migrations
  • Redis queue
  • Celery worker
  • Chunk storage
  • Report storage
  • Ollama integration
  • JSON Schema validation

In progress:

  • Robust LLM extraction
  • Frontend UI
  • Figure and table extraction
  • RAG capabilities

Tech Stack

Backend

  • Java 21
  • Spring Boot 4
  • REST APIs

Processing

  • Python 3.11
  • Celery
  • Redis

Database

  • PostgreSQL 16
  • Flyway

AI

  • Ollama
  • Llama 3.1 8B

Infrastructure

  • Docker Compose

Links

GitHub: TBD

Demo: TBD

Documentation: TBD