This is a prototype vision of how a future government service could work. It's not a real service yet, but we're exploring what it could look like. Your feedback will help shape the real service.

Step 3: Upload & Process Document - FOI Redaction Walkthrough

Upload your sample document and watch AI detect PII in real-time

Walkthrough progress

Step 3 of 5 • 2 minutes

Upload the sample FOI document you downloaded and watch the AI automatically detect personal information in real time.

Screenshot: The FOI Redaction tool, ready to process sensitive documents. (Find your FOI Redaction tool URL in CloudFormation Outputs.)

Expected outcome

  • Document uploads successfully to the interface
  • Processing completes within 15-30 seconds
  • PII detection results appear automatically
  • You see the "wow moment": PII identified across the whole document in seconds

Upload your document

  1. Locate the upload area in the redaction interface

    You should see a drag-and-drop zone or a "Choose file" button

  2. Upload the sample document you downloaded

    Either drag the PDF file onto the drop zone, or click "Choose file" and select it from your downloads folder

  3. Wait for upload confirmation

    You'll see a progress indicator showing the file is uploading. For our sample documents (under 2MB), this takes 1-3 seconds.

  4. Watch the processing indicator

    After upload, the AI processing begins automatically. You'll see a status message like "Analyzing document for PII..."

Processing time: Sample documents process in 15-30 seconds. The AI is reading the entire document, extracting text, identifying PII entities, calculating confidence scores, and preparing the redacted output.

What's happening during processing

Behind the scenes, the AI performs these steps:

  1. Document upload to S3

    Your sample document is securely uploaded to an encrypted S3 bucket for processing. (1-3 seconds)

  2. Text extraction (if needed)

    If your document is a scanned PDF, Amazon Textract extracts text using OCR. Typed PDFs skip this step. (5-10 seconds for scanned documents)

  3. PII entity detection

    Amazon Comprehend analyzes the text and identifies PII entities (names, addresses, phone numbers, email addresses, etc.). Each detection gets a confidence score. (8-15 seconds)

  4. Results preparation

    The system prepares both the detection results (with locations and confidence scores) and the redacted output document. (2-5 seconds)
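
The detection and results-preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the prototype's actual Lambda code. It assumes the text has already been extracted, and that `comprehend` is a boto3 Comprehend client (or any object exposing the same `detect_pii_entities` method). The function names, the `min_score` threshold, and the `[TYPE]` replacement format are hypothetical choices made for this example.

```python
from typing import Any, Dict, List


def detect_pii(comprehend: Any, text: str, language: str = "en") -> List[Dict[str, Any]]:
    """Call Comprehend's DetectPiiEntities and return the list of entities.

    Each entity is a dict with Type, Score, BeginOffset and EndOffset.
    """
    response = comprehend.detect_pii_entities(Text=text, LanguageCode=language)
    return response["Entities"]


def redact(text: str, entities: List[Dict[str, Any]], min_score: float = 0.5) -> str:
    """Replace each detected span with a [TYPE] marker.

    min_score is a hypothetical threshold: detections scoring below it are
    left in place so a human reviewer can decide. Assumes spans do not overlap.
    """
    kept = [e for e in entities if e["Score"] >= min_score]
    # Work from the end of the text so earlier offsets stay valid.
    for entity in sorted(kept, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = entity["BeginOffset"], entity["EndOffset"]
        text = text[:start] + "[" + entity["Type"] + "]" + text[end:]
    return text
```

With real AWS credentials, the client would typically be created with `boto3.client("comprehend")` and passed in as `comprehend`. Passing the client as a parameter keeps the redaction logic testable without calling AWS.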

The wow moment: AI reads and understands your FOI document

In about 30 seconds or less, the AI has read your entire FOI response, flagged the personal information it detected across all pages, calculated a confidence score for each detection, and prepared a redacted version. The same work would take an FOI officer 20-30 minutes of careful line-by-line review.

Technical detail
Amazon Comprehend uses machine learning models trained on millions of documents to recognize PII patterns. It doesn't just search for keywords - it understands context. For example, it knows 'Manchester' in '15 Elm Street, Manchester' is part of an address, but 'Manchester' in 'Manchester City Council' is not PII.

Processing status messages

During processing, you'll see status updates:

  • Uploading... - Document is being uploaded to secure storage (1-3 seconds)
  • Analyzing document... - AI is reading the document and detecting PII entities (15-25 seconds)
  • Processing complete - PII detection finished successfully. Results are ready to review.

While you wait...

Processing takes 15-30 seconds. Consider:

  • In production: This would run automatically when FOI responses are prepared
  • Volume processing: The system can handle multiple documents simultaneously
  • Time savings: 30 seconds vs 20-30 minutes manual review = 40-60× faster
  • Consistency: AI doesn't get tired, so it won't miss PII because of reviewer fatigue on long documents
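
The 40-60× figure above follows from simple arithmetic, sketched here. The function name `speedup` is illustrative, and the 30-second and 20-30 minute inputs are the estimates quoted in this walkthrough rather than measurements.

```python
def speedup(manual_minutes: float, ai_seconds: float = 30.0) -> float:
    """Return how many times faster AI processing is than manual review."""
    return manual_minutes * 60 / ai_seconds


for minutes in (20, 30):
    print(f"{minutes} minutes manual vs 30 seconds AI: {speedup(minutes):.0f}x faster")
```

At the faster 15-second end of the processing range, the same manual estimates work out to 80× and 120×.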

Troubleshooting

Upload fails immediately

If upload fails before processing starts:

  • Check file size is under 10MB (our samples are all under 2MB)
  • Verify file format is PDF, DOCX, or TXT
  • Check file isn't corrupted (try opening it locally first)
  • Check browser console for CORS or network errors
  • Try refreshing the redaction interface and uploading again
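
The size and format checks above can also be run locally before uploading. The sketch below is an assumption about how such a check might look, not the tool's own validation. The helper name `upload_problems` is hypothetical, and it treats 10MB as 10 × 1024 × 1024 bytes.

```python
from pathlib import Path
from typing import List

MAX_BYTES = 10 * 1024 * 1024                   # assumed meaning of the 10MB limit
ALLOWED_SUFFIXES = {".pdf", ".docx", ".txt"}   # formats listed in the checklist


def upload_problems(filename: str, size_bytes: int) -> List[str]:
    """Return a list of reasons the upload is likely to be rejected."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append("unsupported format (expected PDF, DOCX, or TXT)")
    if size_bytes == 0:
        problems.append("file is empty")
    elif size_bytes >= MAX_BYTES:
        problems.append("file is 10MB or larger")
    return problems
```

For a file on disk, `upload_problems(path.name, path.stat().st_size)` returns an empty list when the file passes both checks.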

Processing never completes (timeout)

If the processing spinner continues beyond 60 seconds:

  • Refresh the page - processing may have completed but the UI didn't update
  • Check the S3 bucket for uploaded files (confirms upload succeeded)
  • Check CloudWatch logs for Lambda function errors
  • Verify Comprehend API is responding (check AWS service health dashboard)
  • Try with a different sample document to isolate the issue
  • Check Lambda function timeout is set to 60 seconds minimum
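
The Lambda timeout check in the last item can be scripted. This sketch assumes `lambda_client` is a boto3 Lambda client (or an object with the same `get_function_configuration` method). The helper name is hypothetical, and the function name you pass in depends on your own deployment.

```python
from typing import Any

MIN_TIMEOUT_SECONDS = 60  # minimum recommended in the checklist above


def lambda_timeout_ok(lambda_client: Any, function_name: str,
                      minimum: int = MIN_TIMEOUT_SECONDS) -> bool:
    """Return True if the function's configured timeout meets the minimum."""
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    # "Timeout" is the configured limit in seconds.
    return config["Timeout"] >= minimum
```

With real credentials, the client would typically come from `boto3.client("lambda")`.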

Processing fails with error message

If you see an error message after processing starts:

  • Check error message for specifics (API quota, permission denied, etc.)
  • Verify Lambda has IAM permissions for Comprehend DetectPiiEntities API
  • Check that Comprehend API quotas have not been exceeded (stated default: 1,000 requests per second; confirm the current value for your account in the Service Quotas console)
  • Review CloudWatch logs for detailed error stack traces
  • Wait 1-2 minutes and retry (temporary AWS service issues)
  • Check document content isn't triggering Comprehend content filters
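
For the IAM permission check above, a minimal policy statement granting the Comprehend call might look like the sketch below. It is an illustration only: the prototype's real Lambda role may be structured differently and will need further permissions (for S3 and Textract, for example) that are not shown. The helper name is hypothetical.

```python
import json
from typing import Any, Dict


def comprehend_pii_policy() -> Dict[str, Any]:
    """Build a minimal IAM policy document allowing DetectPiiEntities."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["comprehend:DetectPiiEntities"],
                # The call does not target a specific Comprehend resource.
                "Resource": "*",
            }
        ],
    }


print(json.dumps(comprehend_pii_policy(), indent=2))
```

Comparing this against the policy attached to the Lambda execution role is a quick way to confirm whether a "permission denied" error comes from a missing action.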

Something went wrong? Get help

If you're stuck or encounter unexpected behavior:

Note: Screenshot placeholder. In production, this page would include animated GIFs or screenshots showing the upload process, processing status messages, and the completion notification.