Adding a Brain to My Filing Cabinet: Paperless-ngx with Local AI

TL;DR: I gave my private document archive (Paperless-ngx) an AI “brain” using a Mac Mini and some clever open-source software (Paperless-ai + Ollama). Now I can ask my digital filing cabinet questions, and it actually answers back, all without my data ever leaving my house.

Paperless-ngx helps me declutter and organize digitally

Last week, I wrote about how I started using an app called Paperless-ngx to manage the piles of paper on my desk. It’s a self-hosted application with a web interface (private to me) where I can upload scans of my bills, receipts, tax docs, etc. It makes everything searchable using Optical Character Recognition (OCR). When I need to find a specific document, I can just type in a few keywords and get back the documents that match.

Paperless-ngx is easily one of my favorite applications, second only to Immich for managing my photos. I like it so much that this week I bought a document scanner to speed up the process of digitizing my files. Soon my gigantic multicolored metal filing cabinet in the garage will be obsolete for holding paper documents. Who knows? Maybe I will end up getting rid of it and making more room for my garage gym!

Paperless-ai adds a brain to my digital filing cabinet

As I was researching ways to make the most out of Paperless-ngx, I came across a YouTube video about adding AI functionality to Paperless-ngx with another self-hosted app called Paperless-ai.

“Searchable” works well when you have 50 documents. But what about when your library grows to hundreds (or thousands) of documents? That could become more difficult to manage. You have to remember the exact keyword, the right date range, or manually tag every single incoming bill, receipt, or medical form. 

What if my library could do more than just store the documents? What if it could understand them?

Paperless-ai is exactly what it sounds like: a bridge that connects my document library to a Large Language Model (LLM), the same kind of technology that powers ChatGPT.

Figure: Paperless-ai dashboard overview

Adding this “brain” to my digital filing cabinet changes the workflow from passive storage to active intelligence. Here is what this integration enables me to do:

1. Set-and-Forget Automated Tagging

Before adding Paperless-ai, when a new PDF arrived, I had to manually tag it “Bill,” “Tax,” “Medical,” or “Receipt.” Now, Paperless-ai reads the entire document upon arrival. It understands the context and applies the correct tags automatically. If the document looks like an invoice from my ISP, it gets tagged “Utility” and “Internet” without me lifting a finger.
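For the curious, here is a minimal sketch of how LLM-based tagging can work in general. It is not Paperless-ai's actual code: the tag list, the prompt wording, and the function names build_tagging_prompt and parse_tags are all made up for illustration. The idea is to ask the model for JSON and then keep only tags that already exist in the library, so a creative model can't invent new ones.

```python
import json

# Hypothetical tag vocabulary; a real one would come from the Paperless-ngx library.
ALLOWED_TAGS = {"Bill", "Tax", "Medical", "Receipt", "Utility", "Internet"}


def build_tagging_prompt(ocr_text: str) -> str:
    """Ask the model to pick tags from a fixed list and answer in JSON."""
    tag_list = ", ".join(sorted(ALLOWED_TAGS))
    return (
        "You are a document classifier. Choose the tags that fit the document.\n"
        f"Allowed tags: {tag_list}\n"
        'Reply with JSON only, for example {"tags": ["Bill"]}.\n\n'
        f"Document text:\n{ocr_text}"
    )


def parse_tags(model_reply: str) -> list[str]:
    """Keep only allowed tags; return an empty list if the reply is unusable."""
    try:
        data = json.loads(model_reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    tags = data.get("tags", [])
    if not isinstance(tags, list):
        return []
    # Drop anything that is not a known tag, e.g. a tag the model invented.
    return [t for t in tags if isinstance(t, str) and t in ALLOWED_TAGS]
```

In this sketch, a reply such as {"tags": ["Utility", "Internet", "Spam"]} comes back as just “Utility” and “Internet,” because “Spam” is not in the allowed list.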

2. Smart, Consistent Document Titles

We all know the pain of files named SCAN001.pdf. Even when OCRed, those names are useless. Paperless-ai analyzes the content and renames the file following my preferred format. It extracts the correspondent (who sent it) and a brief summary to create a perfect title, like 2024-03-01 - Acme Corp - February Invoice.
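The title format above is simple enough to sketch in a few lines. This is only an illustration of the “date - correspondent - summary” pattern; the function name build_title is hypothetical, and in practice the three fields would come from the AI’s reading of the document.

```python
from datetime import date


def build_title(doc_date: date, correspondent: str, summary: str) -> str:
    """Join the three fields as 'YYYY-MM-DD - Correspondent - Summary'."""
    parts = [doc_date.isoformat(), correspondent.strip(), summary.strip()]
    return " - ".join(parts)


print(build_title(date(2024, 3, 1), "Acme Corp", "February Invoice"))
# prints: 2024-03-01 - Acme Corp - February Invoice
```

Putting the ISO date first means titles sort chronologically when listed alphabetically.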

3. Chat with Your Data

This is the feature that feels like magic. Because the AI has “read” and indexed my entire library, I can now open a chat interface and ask questions about my own personal data.

Instead of searching for “highest utility bill” and manually comparing PDFs, I can simply ask:

  • “Which utility bill was the highest in the last three months?”
  • “When does the warranty on the dishwasher expire?”
  • “Does this recent medical bill match the quote I was sent in January?”

The AI searches the index, finds the relevant documents, synthesizes the answer, and presents it to me in plain English, with citations linking directly to the source PDFs.
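The search-then-answer flow described above is commonly built on embeddings: each document is turned into a list of numbers, and the question is matched to the closest documents before the LLM is asked to answer. Below is a toy sketch of that idea. The three-number vectors and document titles are invented for illustration (real embedding models produce hundreds of dimensions), and this is a general pattern, not Paperless-ai's internal code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return how closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the titles of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in index.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]


def build_chat_prompt(question: str, titles: list[str]) -> str:
    """Number the retrieved documents so the model can cite them."""
    sources = "\n".join(f"[{i}] {t}" for i, t in enumerate(titles, start=1))
    return (
        "Answer using only these documents and cite them by number.\n"
        f"{sources}\n\nQuestion: {question}"
    )


# Made-up vectors standing in for real embeddings.
toy_index = {
    "Electric bill, February": [0.9, 0.1, 0.0],
    "Dishwasher warranty": [0.0, 0.2, 0.9],
    "Water bill, January": [0.8, 0.3, 0.1],
}
question_vec = [1.0, 0.2, 0.0]  # pretend embedding of a utility-bill question

matches = top_matches(question_vec, toy_index)
print(matches)
# prints: ['Electric bill, February', 'Water bill, January']
```

Only the matched documents are handed to the LLM, which is why the chat can stay responsive even as the library grows.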

Why This Matters (and Why You Should Care About Privacy)

The true usefulness of this setup isn’t just about automated tagging or smart titles; it’s about unlocking the value of your own information. We all generate so much data, but we usually lose the thread of it. This system makes your information work for you.

And here is the most critical point: The entire pipeline is 100% local.

When you upload your tax returns, bank statements, and medical history to an “AI in the Cloud” (like OpenAI, Google, or Microsoft), you are handing your most sensitive data to a third-party corporation. Their business model is data. You do not own that relationship, and you do not control how your data is used, trained on, or secured.

In my homelab, my data never leaves my house. The AI that reads my bills, knows my health history, and answers my questions is running privately in my own living room.

You should not trust your sensitive life data to a cloud provider with a conflicting profit motive. My private library with a private brain proves that you don’t have to sacrifice modern intelligence to maintain absolute data sovereignty.


Technical Bits for Homelabbers

For those of you who want to know what’s under the hood, my setup is a bit of a “hybrid” distributed system.

The Storage Hub

The main application and my 3TB ZFS storage pool live on a Lenovo ThinkStation P310 running Fedora Server. This is my “Vault” for storing data, but its older Intel i7 processor isn’t exactly built for the heavy mathematical lifting required by modern AI.

The Standalone “Conductor”

Instead of tucking Paperless-ai inside the main “pod” (group) of Paperless-ngx containers, I chose to run it as its own standalone container. This keeps the two services decoupled. If I want to update the AI logic or change a configuration, I don’t have to restart my entire document archive. It makes the whole system much easier to maintain and troubleshoot.

Offloading the Brainpower

To make the AI fast and responsive, I’ve offloaded the “thinking” to a 2020 M1 Mac Mini. While the Mac sits on my desk for casual use, it also runs Ollama, a service that hosts the actual AI models (Llama 3.2 for chatting and Nomic Embed for indexing).

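Apple’s M1 chip pairs a capable GPU with fast unified memory, which suits these types of tasks well. (The M1 also has a “Neural Engine,” but as far as I can tell, Ollama runs models on the GPU through Apple’s Metal framework rather than on the Neural Engine.) By pointing my Fedora server to the Mac Mini over my local network, I get faster document processing and chat responses without bogging down my main server or hearing its fans spin up every time I scan a receipt.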
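To give a feel for what “pointing the server at the Mac Mini” means, here is a small sketch of calling Ollama's HTTP API from another machine. The hostname mac-mini.local is a placeholder I made up, and the model tag llama3.2 is my assumption for how Llama 3.2 is named in Ollama; 11434 is Ollama's default port. One thing to keep in mind: Ollama listens only on localhost by default, so the Mac has to be configured (via the OLLAMA_HOST environment variable) to accept connections from the local network.

```python
import json
import urllib.request

# Placeholder address for the Mac Mini; replace with its real hostname or IP.
OLLAMA_URL = "http://mac-mini.local:11434"


def build_generate_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt over the network and return the model's text reply."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling ask("Summarize this invoice in one sentence: ...") from the Fedora box would send the text to the Mac, wait for the model, and return its reply. Nothing in that round trip leaves the local network.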


Next up! Several hours of document scanning! Who wants to join me?!
