Step 8: PDF to Web Converter - LocalGov Drupal Walkthrough

Convert legacy PDF documents to accessible web content using AI

Walkthrough progress

Step 8 of 12 • 4 minutes

PDF to Web Converter

Convert legacy PDF documents into accessible web pages using AI-powered document analysis.

Expected outcome

  • You understand what PDF to Web Converter does
  • You know how to access it from the admin menu
  • You've seen how PDFs become accessible web content

What is PDF to Web Converter?

LocalGov Drupal includes a PDF to Web conversion tool that transforms legacy PDF documents into accessible, standards-compliant web pages. This feature uses AI to extract text, tables, and structure from PDFs and convert them to semantic HTML.

Why convert PDFs? Many councils have important information locked in PDF documents that are often inaccessible to screen readers, awkward to read on mobile, and harder to find through search engines. Converting them to web content addresses these problems.

Where to find it

The PDF to Web Converter is available in the Drupal admin menu:

  1. Click Content in the admin toolbar
  2. Select PDF to Web from the dropdown

Or navigate directly to: /admin/content/pdf-to-web
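
If you need to link colleagues straight to the converter, the full address is your site's base URL followed by the path above. A minimal Python sketch, assuming a hypothetical site at https://council.example (the function name is illustrative, not part of LocalGov Drupal):

```python
# Admin path as given in this walkthrough.
ADMIN_PATH = "/admin/content/pdf-to-web"


def converter_url(site_base: str) -> str:
    """Join a site base URL and the converter's admin path.

    Plain concatenation is used so that a site installed in a
    subdirectory (for example https://council.example/cms/) keeps
    its prefix.
    """
    return site_base.rstrip("/") + ADMIN_PATH


if __name__ == "__main__":
    # "council.example" is a placeholder domain, not a real site.
    print(converter_url("https://council.example/"))
```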

The PDF conversion form, with a drag-and-drop upload area, page title field, and Convert PDF button

How to convert a PDF

  1. Upload your PDF

    Drag and drop a PDF file onto the upload area, or click to browse and select a file. The converter works best with text-based PDFs (not scanned images).

  2. Enter a page title

    Give your new web page a descriptive title. This will become the page heading and appear in search results.

  3. Click Convert PDF

    The conversion process begins. You'll see real-time progress as the AI analyses your document.

  4. Review the preview

    Once complete, you'll see a preview of the converted HTML along with statistics about what was extracted.

  5. Create a draft page

    Click "Create Draft Page" to save the content as an unpublished Drupal page. You can then edit and publish it when ready.
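
Before uploading a batch of files, it can help to check the two inputs the form asks for. The sketch below is an illustration only and does not show how the converter itself validates uploads. The function name and messages are hypothetical. It relies on the fact that PDF files normally begin with the bytes `%PDF-`.

```python
def check_upload(data: bytes, title: str) -> list[str]:
    """Return a list of problems; an empty list means the inputs look usable.

    This is a hypothetical pre-flight check, not the converter's own logic.
    """
    problems = []
    # PDF files normally start with a "%PDF-" header followed by a version.
    if not data.startswith(b"%PDF-"):
        problems.append("File does not look like a PDF")
    # The page title becomes the page heading, so it should not be blank.
    if not title.strip():
        problems.append("Page title is empty")
    return problems


if __name__ == "__main__":
    print(check_upload(b"%PDF-1.7\n", "Bin collection dates"))
    print(check_upload(b"GIF89a", "   "))
```

Note that this check cannot tell a text-based PDF from a scanned one. It only catches files that are not PDFs at all.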

What happens during conversion

The conversion process has four stages:

  1. Extracting text

    Amazon Textract reads all text and tables from your PDF.

  2. Analysing structure

    AI identifies headings, paragraphs, lists, and tables.

  3. Structuring content

    Amazon Bedrock organises content into semantic HTML.

  4. Complete

    The preview is ready, with conversion statistics.
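
To make "semantic HTML" concrete, the sketch below renders a list of already-identified elements (headings, paragraphs, lists, tables) into matching HTML tags. It is an assumption-laden illustration: the element dictionaries and the `render` function are hypothetical, and the real converter relies on Amazon Bedrock for this stage rather than fixed rules.

```python
from html import escape


def render(elements: list[dict]) -> str:
    """Render hypothetical structured elements as semantic HTML."""
    parts = []
    for el in elements:
        kind = el["type"]
        if kind == "heading":
            # Clamp to the valid h1-h6 range.
            level = min(max(el.get("level", 2), 1), 6)
            parts.append(f"<h{level}>{escape(el['text'])}</h{level}>")
        elif kind == "paragraph":
            parts.append(f"<p>{escape(el['text'])}</p>")
        elif kind == "list":
            items = "".join(f"<li>{escape(i)}</li>" for i in el["items"])
            parts.append(f"<ul>{items}</ul>")
        elif kind == "table":
            # The first row is treated as the header row.
            head, *rows = el["rows"]
            ths = "".join(f'<th scope="col">{escape(c)}</th>' for c in head)
            trs = "".join(
                "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>"
                for row in rows
            )
            parts.append(
                f"<table><thead><tr>{ths}</tr></thead><tbody>{trs}</tbody></table>"
            )
        else:
            raise ValueError(f"Unknown element type: {kind}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render([
        {"type": "heading", "level": 2, "text": "Fees & charges"},
        {"type": "paragraph", "text": "Charges apply from April."},
        {"type": "table", "rows": [["Service", "Fee"], ["Bulky waste", "£30"]]},
    ]))
```

Real headings, list markup, and header cells with `scope` are what allow screen readers to navigate the converted page, which a PDF often does not offer.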

Conversion complete, with an HTML preview and statistics showing pages extracted, tables found, word count, and confidence score

Conversion statistics

After conversion, you'll see helpful statistics:

Statistic          What it means
Pages extracted    Number of PDF pages processed
Tables found       Tables detected and converted to HTML
Word count         Total words in the extracted content
Confidence score   AI's confidence in extraction accuracy (higher is better)
Processing time    How long the conversion took
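
As an illustration of where such figures could come from, the sketch below derives similar statistics from a Textract-style list of blocks. Amazon Textract responses contain `Blocks` with a `BlockType` (such as `PAGE`, `WORD`, or `TABLE`) and a `Confidence` value from 0 to 100. How the converter actually calculates its statistics is not documented here, so the averaging rule and the function name are assumptions.

```python
def conversion_stats(blocks: list[dict]) -> dict:
    """Summarise a Textract-style block list (hypothetical calculation)."""
    pages = sum(1 for b in blocks if b["BlockType"] == "PAGE")
    tables = sum(1 for b in blocks if b["BlockType"] == "TABLE")
    words = [b for b in blocks if b["BlockType"] == "WORD"]
    # Assumption: the confidence score is the mean confidence of all words.
    confidence = (
        round(sum(b["Confidence"] for b in words) / len(words), 1)
        if words
        else None
    )
    return {
        "pages": pages,
        "tables": tables,
        "words": len(words),
        "confidence": confidence,
    }


if __name__ == "__main__":
    # Hand-written sample data, not output from a real document.
    sample = [
        {"BlockType": "PAGE"},
        {"BlockType": "WORD", "Text": "Council", "Confidence": 99.0},
        {"BlockType": "WORD", "Text": "tax", "Confidence": 97.0},
        {"BlockType": "TABLE", "Confidence": 90.0},
    ]
    print(conversion_stats(sample))
```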

AWS services used

Amazon Textract

Extracts text, tables, and layout information from documents using machine learning.

Amazon Bedrock

Uses foundation models to structure content into semantic HTML with proper headings.

Common use cases

Document type         Why convert
Council reports       Make committee reports searchable and accessible
Policy documents      Improve discoverability of policies and procedures
Guidance leaflets     Replace outdated PDFs with modern web content
Forms information     Convert form instructions to accessible pages
Historical archives   Digitise legacy documents for preservation

Tip: Always review converted content before publishing. The AI usually produces good results, but the output may need minor adjustments, especially for complex tables or unusual formatting.