Walkthrough complete — Paperless-ngx with AI
You’ve completed the Paperless-ngx with AI walkthrough
Choose your next step
Generate Evidence Pack
Create your business case documentation with what you've learned.
Walkthrough complete
You've just put 36 parish documents through an OCR and Bedrock classification pipeline, browsed the archive Bedrock built, and asked the documents plain-English questions. In 15 minutes you've seen capabilities that would otherwise require a multi-vendor procurement covering an EDMS, an OCR engine and an AI provider.
What you've learned
📄 Open-source, env-var configured
Upstream Paperless-ngx is configured entirely through environment variables: no fork, no patch, no licence fee. The same image you'd download for your home server runs unchanged on Fargate.
Value: zero licence cost, no vendor lock-in, an active community of tens of thousands.
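As a rough illustration of what "configured by environment variables" looks like, the sketch below builds the name/value list that an ECS (Fargate) container definition expects. The `PAPERLESS_*` names are settings documented upstream; the helper `paperless_environment`, the URL, hostnames and default values are hypothetical placeholders, not the demo's actual configuration.

```python
def paperless_environment(
    url: str, db_host: str, redis_url: str, secret_key: str
) -> list[dict[str, str]]:
    """Build the `environment` list for an ECS container definition.

    All values here are illustrative assumptions. In a real deployment the
    secret key would normally come from a secrets store rather than a plain
    environment entry.
    """
    settings = {
        "PAPERLESS_URL": url,                   # public URL of the instance
        "PAPERLESS_DBHOST": db_host,            # PostgreSQL host
        "PAPERLESS_REDIS": redis_url,           # Redis broker URL
        "PAPERLESS_SECRET_KEY": secret_key,     # session signing key
        "PAPERLESS_OCR_LANGUAGE": "eng",        # OCR language (assumed)
        "PAPERLESS_TIME_ZONE": "Europe/London", # time zone (assumed)
    }
    # ECS expects a list of {"name": ..., "value": ...} objects.
    return [{"name": k, "value": v} for k, v in sorted(settings.items())]


env = paperless_environment(
    url="https://archive.example.org",
    db_host="db.example.internal",
    redis_url="redis://cache.example.internal:6379",
    secret_key="change-me",
)
```

The unmodified upstream image reads these variables at start-up, which is why no fork or patch is needed.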
🧠 AI applied at the right layer
Bedrock isn't replacing OCR or replacing search — it's enriching the metadata Paperless already has. Title, tags, document type, correspondent, summary. Everything else (the search box, the inbox tag, the document detail view) is the upstream UI that Paperless users already know.
Value: AI where it adds value, conventional UI everywhere else.
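A minimal sketch of the enrichment step, assuming the model is asked to return its classification as a JSON object with the five fields named above. The JSON shape, the `DocumentMetadata` class and `parse_classification` are hypothetical; the demo's actual prompt and parsing code are not shown here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DocumentMetadata:
    title: str
    document_type: str
    correspondent: str
    summary: str
    tags: list[str] = field(default_factory=list)


REQUIRED = ("title", "document_type", "correspondent", "summary")


def parse_classification(model_output: str) -> DocumentMetadata:
    """Validate the model's JSON reply before it is written to the archive."""
    data = json.loads(model_output)
    missing = [k for k in REQUIRED if not str(data.get(k, "")).strip()]
    if missing:
        raise ValueError(f"model output missing fields: {missing}")
    # Normalise tags: trim, lower-case, de-duplicate, sort for stable output.
    tags = sorted(
        {str(t).strip().lower() for t in data.get("tags", []) if str(t).strip()}
    )
    return DocumentMetadata(
        title=str(data["title"]).strip(),
        document_type=str(data["document_type"]).strip(),
        correspondent=str(data["correspondent"]).strip(),
        summary=str(data["summary"]).strip(),
        tags=tags,
    )


# Made-up model reply, for illustration only.
sample = json.dumps({
    "title": "Parish Council Minutes, March",
    "document_type": "Minutes",
    "correspondent": "Parish Clerk",
    "summary": "Monthly meeting minutes.",
    "tags": ["Minutes", "minutes ", "Planning"],
})
meta = parse_classification(sample)
```

Validating the reply before storing it keeps a malformed model response from overwriting metadata that Paperless already holds.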
🔍 Retrieval-augmented chat with citations
Every chat answer is grounded in the documents you've actually consumed and cites them by source. The Knowledge Base + S3 Vectors + Guardrails stack is fully managed — there's no vector DB cluster to size, no embedding pipeline to maintain, no separate moderation system.
Value: trustworthy, FOI-defensible answers that are traceable to cited source documents, which reduces the risk of hallucinated content.
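To show how citations can be surfaced, here is a sketch built on the Bedrock Knowledge Bases `retrieve_and_generate` call in boto3. The region default, the knowledge base ID, the model ARN and the sample response are placeholders; `ask` and `cited_sources` are hypothetical helpers, and this is not necessarily how the demo's chat is implemented.

```python
def ask(question: str, kb_id: str, model_arn: str, region: str = "eu-west-2") -> dict:
    """Send a question to a Bedrock Knowledge Base (needs AWS credentials)."""
    import boto3  # imported lazily so cited_sources() runs without AWS access

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    return client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    )


def cited_sources(response: dict) -> list[str]:
    """Collect the distinct S3 URIs cited in a response, in first-seen order."""
    uris: list[str] = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return uris


# Made-up response in the documented shape, for illustration only.
sample_response = {
    "output": {"text": "The budget was approved in March."},
    "citations": [
        {"retrievedReferences": [
            {"location": {"type": "S3",
                          "s3Location": {"uri": "s3://example-archive/minutes-2024-03.pdf"}}},
            {"location": {"type": "S3",
                          "s3Location": {"uri": "s3://example-archive/minutes-2024-03.pdf"}}},
        ]},
        {"retrievedReferences": [
            {"location": {"type": "S3",
                          "s3Location": {"uri": "s3://example-archive/budget-2024.pdf"}}},
        ]},
    ],
}
sources = cited_sources(sample_response)
```

An answer whose `sources` list is empty can be flagged or withheld, which is one way to keep every reply tied to a document in the archive.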
🛡️ Guardrails as a first-class control
Bedrock Guardrails block political opinion, medical and legal advice, and anonymise UK PII (names, addresses, phone, email, NI numbers, NHS numbers, payment cards) at the answer layer — before the response leaves AWS, regardless of what the model would otherwise say.
Value: defensible PII handling, consistent content safety across every chat.
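The controls described above could be expressed roughly as follows: a sketch of a request body for the Bedrock `create_guardrail` API. The topic definitions, the blocked-response messages and the helper `guardrail_request` are illustrative assumptions, not the demo's actual policy text.

```python
# PII entity types matching the list above (names per the Guardrails API).
UK_PII_TYPES = [
    "NAME",
    "ADDRESS",
    "PHONE",
    "EMAIL",
    "UK_NATIONAL_INSURANCE_NUMBER",
    "UK_NATIONAL_HEALTH_SERVICE_NUMBER",
    "CREDIT_DEBIT_CARD_NUMBER",
]

# Denied topics; the definitions are placeholder wording.
DENIED_TOPICS = {
    "Political opinion": "Expressing or endorsing views on parties, candidates or policies.",
    "Medical advice": "Diagnosing conditions or recommending treatment or medication.",
    "Legal advice": "Advising on legal rights, liabilities or courses of legal action.",
}


def guardrail_request(name: str) -> dict:
    """Build keyword arguments for bedrock.create_guardrail (sketch only)."""
    return {
        "name": name,
        "topicPolicyConfig": {
            "topicsConfig": [
                {"name": topic, "definition": definition, "type": "DENY"}
                for topic, definition in DENIED_TOPICS.items()
            ]
        },
        "sensitiveInformationPolicyConfig": {
            # ANONYMIZE masks the value instead of blocking the whole answer.
            "piiEntitiesConfig": [
                {"type": pii, "action": "ANONYMIZE"} for pii in UK_PII_TYPES
            ]
        },
        "blockedInputMessaging": "Sorry, I can't help with that request.",
        "blockedOutputsMessaging": "Sorry, I can't provide that information.",
    }


request = guardrail_request("parish-archive-guardrail")
```

With AWS credentials in place, the dictionary could be passed as `boto3.client("bedrock").create_guardrail(**request)`; a production policy would tune the topic wording to the council's own rules.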
Production readiness
This demo runs the same upstream image used in production by tens of thousands of households and small organisations. For a council deployment you'd additionally want:
| Feature | Demo status | Production requirement |
|---|---|---|
| Authentication | Built-in admin user | SSO via SAML or OIDC, MFA |
| Custom domain | CloudFront URL | archive.yourcouncil.gov.uk + ACM certificate |
| Document ingestion | 36 sample parish documents | IMAP scrape, scanner-direct upload, batch import |
| Backup & retention | Aurora automated backups, 1-day retention | Point-in-time recovery, retention policy aligned with statutory requirements |
| Audit log | CloudWatch Logs | Centralised SIEM, immutable logging, FOI-defensible record |
| Bedrock cost controls | Per-document classification | Budget alerts, model usage caps, on-prem fallback for sensitive content |
| Guardrails | Default content / topic / PII set | Topic and word lists tuned to your council's policies |
The point isn't that this scenario is production-ready out of the box. The point is that the heavy lifting — OCR, classification, RAG, Guardrails, multi-format conversion — is already done. The bits left are integration with your council's identity, network, retention rules and procurement framework.
Learn more
- Paperless-ngx documentation: full upstream docs
- Paperless-ngx on GitHub: 40,000+ stars, GPL-3.0
- Amazon Bedrock Knowledge Bases
- Amazon Bedrock Guardrails