What is Papermerge-Core?
Repository: https://github.com/papermerge/papermerge-core
Papermerge-Core is an open-source Document Management System (DMS) designed for storing, OCR processing, and searching scanned documents.
In simple terms:
Think of it as Google Drive + powerful search + searchable scanned PDFs.
🚨 What Problem Does It Solve?
Imagine a real-life scenario:
- You have 500 receipts
- All scanned and saved as PDFs
- You want to search for words like "VAT" or "Company A"
The Problem
Scanned PDFs are just images of pages, with no text layer
→ You cannot search for text inside them
Papermerge’s Solution
- OCR → converts images into text
- Indexing → enables full-text search
- Folders + metadata + tags → organize everything
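To make the OCR → indexing → search chain concrete, here is a minimal sketch of an inverted index in plain Python. It is not Papermerge's code: the OCR step is assumed to have already run, and the file names, sample texts, and the helpers `tokenize`, `build_index`, and `search` are all made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(ocr_texts: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in ocr_texts.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR output for three scanned receipts (assumed, not real data).
ocr_texts = {
    "receipt-001.pdf": "Company A  Invoice total 120.00 EUR incl. VAT 19%",
    "receipt-002.pdf": "Company B  Parking ticket 4.50 EUR",
    "receipt-003.pdf": "Company A  Office supplies, VAT exempt",
}
index = build_index(ocr_texts)

print(sorted(search(index, "VAT")))        # ['receipt-001.pdf', 'receipt-003.pdf']
print(sorted(search(index, "Company A")))  # ['receipt-001.pdf', 'receipt-003.pdf']
```

The sketch matches documents that contain all query words, in any order. A real search engine adds ranking, stemming, and phrase matching on top of the same basic idea.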
In short
It turns a messy pile of files into a clean, searchable document system like Google Drive.
🧱 Tech Stack
Backend
- Python
- Django
- REST API (OpenAPI)
- Async workers for OCR and indexing
Frontend
- React
- Single Page Application (SPA)
- Communicates with backend entirely via API
Infrastructure
- Docker-friendly
- Redis (task queue)
- Search engines:
  - Elasticsearch
  - Xapian
  - Whoosh
  - Solr
- Tesseract OCR
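The reason for a task queue plus workers is that OCR is slow, so an upload should return right away and leave the heavy work to a background process. Below is a rough sketch of that pattern using only the standard library. It makes several assumptions: `queue.Queue` stands in for the Redis-backed queue, a dict stands in for the search engine, and `fake_ocr`, `upload`, and `worker` are hypothetical names, not Papermerge functions.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()   # stand-in for a Redis-backed task queue
search_index: dict[str, str] = {}   # stand-in for the search engine
_STOP = object()                    # sentinel that tells the worker to exit


def fake_ocr(image_bytes: bytes) -> str:
    """Placeholder for a real OCR call (e.g. Tesseract); just decodes bytes."""
    return image_bytes.decode("utf-8")


def worker() -> None:
    """Pull jobs off the queue, run OCR, and store the text for searching."""
    while True:
        job = jobs.get()
        if job is _STOP:
            jobs.task_done()
            break
        name, data = job
        search_index[name] = fake_ocr(data)
        jobs.task_done()


def upload(name: str, data: bytes) -> None:
    """Enqueue the document and return immediately; OCR happens later."""
    jobs.put((name, data))


thread = threading.Thread(target=worker, daemon=True)
thread.start()

upload("receipt-001.pdf", b"Company A VAT 19%")
upload("receipt-002.pdf", b"Parking ticket")

jobs.join()      # wait until both documents have been processed
jobs.put(_STOP)  # ask the worker to shut down
thread.join()

print(search_index)
```

In a real deployment the queue lives outside the web process, so workers can run in separate containers and be scaled independently of the API.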
✨ Features Developers Will Like
📁 Document System
- Folder tree structure
- Drag & drop uploads
- Versioning
- Page reorder / delete / extract
🔍 OCR + Full Text Search
- Scanned files become searchable
- Instant word search inside documents
🏷️ Metadata / Tags
- Custom fields
- Document types
- Great for invoices, contracts, receipts
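A small data-model sketch shows why document types, tags, and custom fields are handy for invoices and receipts: they let you filter on structured values instead of relying on text search alone. The `Document` class, its field names, and the `find` helper are assumptions made for this example and do not mirror Papermerge's actual models.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    doc_type: str                                   # e.g. "invoice", "receipt"
    tags: set[str] = field(default_factory=set)
    custom_fields: dict[str, object] = field(default_factory=dict)


def find(documents, doc_type=None, tag=None, **fields):
    """Return documents matching the type, the tag, and all custom fields."""
    matches = []
    for doc in documents:
        if doc_type is not None and doc.doc_type != doc_type:
            continue
        if tag is not None and tag not in doc.tags:
            continue
        if any(doc.custom_fields.get(k) != v for k, v in fields.items()):
            continue
        matches.append(doc)
    return matches


# Hypothetical sample data.
docs = [
    Document("Invoice March", "invoice", {"2024", "paid"},
             {"vendor": "Company A", "total": 120.0}),
    Document("Parking receipt", "receipt", {"2024"}, {"total": 4.5}),
    Document("Invoice April", "invoice", {"2024"},
             {"vendor": "Company A", "total": 80.0}),
]

print([d.title for d in find(docs, doc_type="invoice", tag="paid")])
print([d.title for d in find(docs, vendor="Company A")])
```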
👥 Multi-User
- Users / groups / permissions
- Document sharing
🔌 API-First Design
- Everything exposed via REST API
- Easy integration with other systems
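Because everything goes through the REST API, another system can integrate with plain HTTP calls. The sketch below only builds a request object and never sends it. The host, the `/api/search/` path, the `q` parameter, and the bearer-token header are all placeholders chosen for illustration; check the project's OpenAPI schema for the real endpoints and authentication scheme.

```python
import urllib.parse
import urllib.request


def build_search_request(base_url: str, token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a GET request against a hypothetical search endpoint."""
    params = urllib.parse.urlencode({"q": query})
    url = f"{base_url.rstrip('/')}/api/search/?{params}"  # path is an assumption
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
            "Accept": "application/json",
        },
        method="GET",
    )


req = build_search_request("https://dms.example.invalid", "secret-token", "VAT")
print(req.get_method(), req.full_url)
```

Sending it would be a call to `urllib.request.urlopen(req)` followed by decoding the JSON body, once the URL and token point at a real instance.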
⚡ Quick Summary
Papermerge = Google Drive + OCR + Search + Self-hosted
Stack: Django + React + Workers + Search Engine
Great for learning:
- Real-world backend architecture
- Async workers
- Search systems
- Document processing pipelines
- API-first design
This is a solid, production-grade project, not just a demo app.
Perfect for architecture study or internal knowledge sharing.









