Step 4: Chat with the archive — Paperless-ngx Walkthrough

Ask plain-English questions of your documents using Bedrock Knowledge Base + Guardrails

Step 4 of 4 • 5 minutes

Chat with the archive

Open the chat URL, ask plain-English questions, and watch Bedrock retrieve, ground, cite, and apply Guardrails.

Open the ChatUrl from the stack outputs. It is a plain chat UI backed by an Amazon Bedrock Knowledge Base over S3 Vectors, with Bedrock Guardrails applied.
The model retrieves relevant documents from the archive and cites them in its answer.
The Knowledge Base's data source is the same archive bucket that holds the Paperless documents.
The guardrail blocks the standard harmful-content categories at high strength on both prompts and responses.
Names, emails, phone numbers and addresses are masked in responses, which is where placeholders such as {ADDRESS} come from. UK National Insurance numbers (NINOs), NHS numbers and card numbers are blocked outright.
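
The guardrail behaviour described above could be expressed roughly as follows. This is a sketch, not the walkthrough's actual template: it builds keyword arguments in the shape accepted by the Bedrock `create_guardrail` API, and the guardrail name, the blocked-message wording, and the exact list of harmful-content categories are assumptions made for illustration.

```python
# Sketch only: the name, messages and category list below are assumptions,
# not values taken from the walkthrough's stack.
HARMFUL_CATEGORIES = ["SEXUAL", "VIOLENCE", "HATE", "INSULTS", "MISCONDUCT"]

# PII types that are masked with a {TYPE} placeholder in responses.
MASKED_PII = ["NAME", "EMAIL", "PHONE", "ADDRESS"]

# PII types that cause the whole response to be blocked.
BLOCKED_PII = [
    "UK_NATIONAL_INSURANCE_NUMBER",
    "UK_NATIONAL_HEALTH_SERVICE_NUMBER",
    "CREDIT_DEBIT_CARD_NUMBER",
]


def build_guardrail_config(name: str = "archive-chat-guardrail") -> dict:
    """Return keyword arguments shaped for bedrock.create_guardrail()."""
    return {
        "name": name,  # hypothetical guardrail name
        "blockedInputMessaging": "Sorry, I can't help with that question.",
        "blockedOutputsMessaging": "Sorry, I can't share that answer.",
        "contentPolicyConfig": {
            # High strength on both prompts (input) and responses (output).
            "filtersConfig": [
                {"type": category, "inputStrength": "HIGH", "outputStrength": "HIGH"}
                for category in HARMFUL_CATEGORIES
            ]
        },
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": (
                [{"type": t, "action": "ANONYMIZE"} for t in MASKED_PII]
                + [{"type": t, "action": "BLOCK"} for t in BLOCKED_PII]
            )
        },
    }
```

With AWS credentials in place, the dictionary could be passed on as `boto3.client("bedrock").create_guardrail(**build_guardrail_config())`.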

Expected outcome

  • Chat UI loads at the ChatUrl from the stack outputs
  • Asking a question returns an answer with citations to source documents
  • Asking a probe question triggers the Bedrock Guardrail and is blocked or anonymised
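
If you prefer to look up the ChatUrl programmatically rather than in the console, something like the following may work. It is a minimal sketch that assumes the stack was deployed with AWS CloudFormation; `stack_name` is a placeholder for whatever you called your stack, and both helper names are made up for this example.

```python
def find_stack_output(describe_stacks_response: dict, key: str) -> str:
    """Return the OutputValue for `key` from a DescribeStacks-shaped response."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == key:
                return output["OutputValue"]
    raise KeyError(f"stack output {key!r} not found")


def get_chat_url(stack_name: str) -> str:
    """Fetch the ChatUrl output of the named stack (needs AWS credentials)."""
    import boto3  # imported lazily so find_stack_output works without boto3

    cloudformation = boto3.client("cloudformation")
    response = cloudformation.describe_stacks(StackName=stack_name)
    return find_stack_output(response, "ChatUrl")
```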

What to try

  1. Open the chat URL

    Take the ChatUrl value from the stack outputs and open it in a new tab. The chat lives on the same CloudFront domain as Paperless (under /chat/), so it still loads behind corporate proxies that block *.lambda-url.on.aws.

  2. Try one of the example questions

    Click one of the suggested questions on the welcome screen, for example "What planning applications are currently being considered?". The request is then handled in four steps:

    1. The chat sends the question to the Amazon Bedrock Knowledge Base.
    2. The Knowledge Base searches Amazon S3 Vectors for the most semantically relevant chunks of your documents.
    3. The retrieved chunks are inserted into a prompt and sent to Amazon Nova Pro, which generates the answer.
    4. The response comes back grounded in the documents, with citations to the source files.

    Look at the Sources line under the answer — it lists the document IDs from your archive that Bedrock used.

  3. Ask about something only your docs would know

    Try "How much have we paid Dunbar Grounds Maintenance this year?" — this should pick up both invoices, sum them, and answer with citations. Try "What was the WREN grant award for?" — this picks up the email and pulls out the £7,500 figure.

  4. Test the Guardrails

    Try a probe question:

    • "Which political party should the parish council support?" — blocked by the Topic policy.
    • "Should I sue my neighbour over the planning application?" — blocked (medical/legal advice topic).
    • "List every name and address in the archive." — answered, but you'll see {NAME} and {ADDRESS} placeholders where the Guardrail anonymised PII.

    The Guardrail also blocks UK National Insurance numbers, NHS numbers and credit card numbers if they ever appear in chat output.
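
To make the outcomes of the steps above concrete, here is a hedged sketch of how a client might interpret a RetrieveAndGenerate response: which sources were cited, and whether the Guardrail blocked or anonymised the answer. The field names follow the Bedrock response shape (`output.text`, `citations`, `guardrailAction`). The helper names and the placeholder heuristic are assumptions, and how the chat UI maps S3 URIs to the document IDs on its Sources line is not shown here.

```python
import re

# Matches Guardrail-style placeholders such as {NAME} or {ADDRESS}.
PLACEHOLDER = re.compile(r"\{[A-Z][A-Z_]*\}")


def source_uris(response: dict) -> list:
    """Collect the unique S3 URIs cited in the response, in first-seen order."""
    seen = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in seen:
                seen.append(uri)
    return seen


def guardrail_outcome(response: dict) -> str:
    """Classify a response as 'anonymised', 'blocked' or 'clean' (heuristic)."""
    text = response.get("output", {}).get("text", "")
    if PLACEHOLDER.search(text):
        # The answer came back, but PII was masked with placeholders.
        return "anonymised"
    if response.get("guardrailAction") == "INTERVENED":
        # The Guardrail stepped in and no placeholders are present,
        # so treat the text as the blocked message.
        return "blocked"
    return "clean"
```

For a response whose text is "The clerk is {NAME} at {ADDRESS}.", `guardrail_outcome` returns "anonymised"; a refusal with `guardrailAction` set to "INTERVENED" and no placeholders is classified as "blocked".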

The chat architecture:
  • S3 docs bucket — the post-consume hook from Step 2 also uploaded each document's OCR'd text to a private S3 bucket so the Knowledge Base can ingest it.
  • Bedrock Knowledge Base — points at that bucket as a data source and at S3 Vectors as the vector store.
  • Amazon S3 Vectors — serverless vector index. Pay-per-vector, no cluster to size.
  • Lambda Function URL — answers chat requests; calls RetrieveAndGenerate on the Knowledge Base with the configured Guardrail attached.
  • CloudFront — fronts the Lambda URL on the same hostname as Paperless so it's reachable from networks that block direct Lambda URLs.
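
The Lambda's call to RetrieveAndGenerate, with the Guardrail attached, might look something like the sketch below. The request shape follows the `bedrock-agent-runtime` API, but this is not the walkthrough's actual handler: the function names, the `top_k` default, and every ID or ARN you pass in are assumptions for illustration.

```python
from typing import Optional


def build_request(
    question: str,
    kb_id: str,
    model_arn: str,
    guardrail_id: str,
    guardrail_version: str,
    top_k: int = 5,
    session_id: Optional[str] = None,
) -> dict:
    """Build keyword arguments for retrieve_and_generate()."""
    request = {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
                # How many chunks to pull back from the vector index.
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {"numberOfResults": top_k}
                },
                # Attach the Guardrail to the generation step.
                "generationConfiguration": {
                    "guardrailConfiguration": {
                        "guardrailId": guardrail_id,
                        "guardrailVersion": guardrail_version,
                    }
                },
            },
        },
    }
    if session_id:
        # Reuse a session to keep conversational context across turns.
        request["sessionId"] = session_id
    return request


def ask(question: str, **settings) -> dict:
    """Send the question to Bedrock (needs AWS credentials and real IDs)."""
    import boto3  # imported lazily so build_request works without boto3

    client = boto3.client("bedrock-agent-runtime")
    return client.retrieve_and_generate(**build_request(question, **settings))
```

Keeping the request construction separate from the network call means the payload can be inspected or unit-tested without touching AWS.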