Step 3: Open a document and search — Paperless-ngx Walkthrough

Drop into a single document, see its OCR’d content, then run a full-text search

Step 3 of 4 • 3 minutes

Open a document and search

See how OCR, Tika and Gotenberg made every document searchable, then try a keyword search.

Screenshot: Document detail view showing OCR'd email content from Highways East about a footpath query. Every document is OCR'd by Tesseract; .docx and .odt go via Gotenberg and .eml via Apache Tika, so the full text becomes searchable.

Screenshot: Search results filtered to the term 'planning' showing several planning notices. Full-text search combines Paperless's text index over OCR'd content with the AI-generated metadata.

Expected outcome

Single document detail shows the OCR'd text alongside the original
Document notes panel shows the AI-generated summary
Full-text search returns matches across the AI metadata and OCR'd content

What to try

Open a document
From the Documents view, click any card — try one of the .eml emails (Highways East or the WREN grant decision are good) or one of the planning notice PDFs.
Look at the Content tab
The right-hand pane shows the document's OCR'd content. For PDFs this is Tesseract output. For .docx and .odt it's been converted via Gotenberg first. For .eml files it's the body extracted by Apache Tika. All three pipelines feed the same searchable text store.
Look at the Notes tab
The notes panel shows the AI-generated 1-2 sentence summary that the post-consume hook stored against the document. Useful when a clerk wants to scan an inbox at a glance.
Look at the Details tab
You'll see the AI-rewritten title, the AI-extracted correspondent, the AI-classified document type, and the AI tags — all editable if you want to override Bedrock's choices.
Search the archive
Use the search bar at the top. Try 'planning' to find every document mentioning planning. Try 'Mill Lane': Paperless searches the full text of the OCR'd content, so this hits the body of the planning notice for The Old Forge. Try 'WREN grant', which picks up the email confirming the grant award.
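
The same searches can also be run without the web UI. Paperless-ngx exposes a REST API whose documents endpoint accepts a full-text `query` parameter and token authentication. The following is a minimal sketch rather than part of the walkthrough: `BASE_URL` and `API_TOKEN` are hypothetical placeholders for your own deployment, and the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholders: substitute your own deployment URL and API token.
BASE_URL = "https://paperless.example.com"
API_TOKEN = "replace-me"


def build_search_url(base_url: str, term: str) -> str:
    """Return the documents endpoint URL with a full-text `query` parameter."""
    query_string = urllib.parse.urlencode({"query": term})
    return f"{base_url.rstrip('/')}/api/documents/?{query_string}"


def search(term: str) -> list[dict]:
    """Run a full-text search and return the first page of matching documents."""
    request = urllib.request.Request(
        build_search_url(BASE_URL, term),
        headers={
            "Authorization": f"Token {API_TOKEN}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["results"]
```

Calling `search("Mill Lane")` against a running instance should return the same documents as typing the phrase into the search bar; each result is a dictionary with fields such as `id` and `title`.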

The OCR pipeline:

Tesseract — OCRs scanned PDFs and image-only PDFs in the main Paperless container.
Apache Tika — runs as a sidecar on localhost:9998; extracts content from .eml, .odt, .rtf and dozens of other formats.
Gotenberg — runs as a sidecar on localhost:3000; converts .docx and other Office formats to PDF before they reach Tesseract.

Tika and Gotenberg run as sidecars in the same ECS task as the main Paperless container and talk to it via localhost, so there's no service-discovery DNS, no extra network surface, and all three containers scale together.
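
To make the localhost wiring concrete, here is a small sketch of the settings that would point Paperless-ngx at the Tika and Gotenberg sidecars, plus a TCP check you could run from inside the task. The environment variable names are standard Paperless-ngx settings, but the demo's actual task definition is not shown on this page, so treat the values and the helper functions as assumptions.

```python
import socket

# Sidecar ports as listed above; both are assumed to listen on localhost
# inside the same ECS task as the main Paperless container.
SIDECARS = {"tika": ("localhost", 9998), "gotenberg": ("localhost", 3000)}


def paperless_env(sidecars: dict[str, tuple[str, int]]) -> dict[str, str]:
    """Build the Paperless-ngx settings that point at the two sidecars."""
    tika_host, tika_port = sidecars["tika"]
    gotenberg_host, gotenberg_port = sidecars["gotenberg"]
    return {
        "PAPERLESS_TIKA_ENABLED": "1",
        "PAPERLESS_TIKA_ENDPOINT": f"http://{tika_host}:{tika_port}",
        "PAPERLESS_TIKA_GOTENBERG_ENDPOINT": f"http://{gotenberg_host}:{gotenberg_port}",
    }


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Because every endpoint is a plain `http://localhost:<port>` URL, nothing in this configuration depends on DNS or on anything outside the task.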
