
This is a prototype vision of how a future government service could work. It's not a real service yet, but we're exploring what it could look like. Your feedback will help shape the real service.

Step 3: Generate Audio - Text-to-Speech Walkthrough

Convert announcement text to speech with different neural voices

Step 3 of 4 • 3 minutes

Generate Audio with Different Voices

Convert your chosen announcement to audio using three UK English neural voices to compare their styles.

Find your Text to Speech application URL in the Outputs tab of your CloudFormation stack.
The application is ready to convert text into natural speech.
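You can also read the URL from the stack with a script. This is a minimal sketch that assumes boto3 is available. The stack name is yours to supply. The key fragment "Url" is an assumption, because this page does not name the output key. `find_output_value` is a hypothetical helper that searches the response of the real `describe_stacks` call:

```python
from typing import Optional


def find_output_value(describe_stacks_response: dict, key_fragment: str) -> Optional[str]:
    """Return the first stack output whose OutputKey contains key_fragment (case-insensitive)."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if key_fragment.lower() in output["OutputKey"].lower():
                return output["OutputValue"]
    # No matching output: the key name may differ from the assumed fragment.
    return None
```

For example, pass it the result of `boto3.client("cloudformation").describe_stacks(StackName="your-stack-name")`, together with a fragment of the output key shown in the console.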

Expected outcome

  • Generated audio with Amy (conversational voice)
  • Generated audio with Brian (authoritative voice)
  • Generated audio with Emma (formal voice)
  • Heard the difference in voice styles
  • Understood which voice suits which announcement type

Generate with three voices

Amazon Polly offers several British English (en-GB) neural voices. This step uses three of them, each with a distinct style. Generate audio with all three to compare:

1 Amy - Conversational & Friendly

  1. In the Polly console, verify your announcement text is pasted in the input area
  2. Set "Engine" to Neural
  3. Set "Language" to British English (en-GB)
  4. Select voice Amy from the dropdown
  5. Click the orange "Listen" button
  6. Audio generates in 5-10 seconds - listen to the full announcement

What to notice: Amy's tone is warm and approachable. She sounds like a friendly council officer explaining information in person. Best for routine service updates and general community announcements.

2 Brian - Authoritative & Clear

  1. Keep the same announcement text in the input area
  2. Change voice to Brian from the dropdown
  3. Click "Listen" again
  4. Listen to the full announcement with Brian's voice

What to notice: Brian's tone is more formal and commanding. He conveys authority and importance. Best for emergency alerts, weather warnings, and urgent safety announcements where residents need to take immediate action.

3 Emma - Formal & Professional

  1. Keep the same announcement text in the input area
  2. Change voice to Emma from the dropdown
  3. Click "Listen" once more
  4. Listen to the full announcement with Emma's voice

What to notice: Emma's tone is precise and professional. She sounds official and measured. Best for committee announcements, planning notices, formal consultations, and legal statements where gravitas and accuracy matter.
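The three console runs above can also be scripted, which helps once you produce audio for every announcement. This is a minimal sketch that assumes the AWS SDK for Python (boto3) is installed and credentials are configured. `synthesize_all` and the output file names are hypothetical. `synthesize_speech` and its `Engine`, `LanguageCode`, `VoiceId`, `OutputFormat` and `Text` parameters belong to the real Polly API:

```python
from __future__ import annotations

from pathlib import Path

# British English neural voices compared in this step.
VOICES = ("Amy", "Brian", "Emma")


def synthesize_all(client, text: str, out_dir: Path) -> list[Path]:
    """Synthesize `text` once per voice and save each result as an MP3 file.

    `client` is a Polly client, for example boto3.client("polly").
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for voice in VOICES:
        response = client.synthesize_speech(
            Engine="neural",       # matches "Engine: Neural" in the console
            LanguageCode="en-GB",  # British English
            VoiceId=voice,
            OutputFormat="mp3",
            Text=text,
        )
        path = out_dir / f"announcement-{voice.lower()}.mp3"
        # AudioStream is a streaming body; read() returns the audio bytes.
        path.write_bytes(response["AudioStream"].read())
        written.append(path)
    return written
```

Call it with a real client, for example `synthesize_all(boto3.client("polly"), announcement_text, Path("audio"))`, then play the three files back to back. The client is passed in rather than created inside the function, so the helper can be tested without calling AWS.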

Hear the Neural Voice Quality

These are not the robotic computer voices of the past. Amazon Polly's neural engine uses deep learning to generate speech that sounds remarkably human. Listen for the natural inflection, pacing and tone of each voice: Amy sounds warm for service updates, Brian sounds urgent for emergency alerts, and Emma sounds professional for formal notices. Many residents will find the audio hard to distinguish from a studio recording.

Technical detail
Polly's neural text-to-speech (NTTS) system has two stages. First, a sequence-to-sequence model converts the input text, as phonemes, into a series of spectrograms. Second, a neural vocoder turns those spectrograms into an audio waveform. Because the models learn prosody (rhythm, stress and intonation) from recorded speech, the British English voices generally handle British place names, postcodes and council terminology well, which matters for local government communications. Some unusual local names may still need help; see Troubleshooting below.

Voice comparison guide

Understanding which voice to use for different announcements:

Voice selection guide by announcement type

| Announcement type | Recommended voice | Why this voice |
| --- | --- | --- |
| Bin collection changes | Amy | Friendly, conversational: puts residents at ease |
| Library hours | Amy | Welcoming tone encourages community engagement |
| Park events | Amy | Warm delivery makes events sound appealing |
| Weather warnings | Brian | Authoritative tone conveys urgency and seriousness |
| Road closures (urgent) | Brian | Clear, commanding: ensures the message is heard |
| Emergency alerts | Brian | Inspires confidence and prompts immediate action |
| Planning committee notices | Emma | Formal tone suits legal and statutory notices |
| Council meeting agendas | Emma | Professional delivery conveys official business |
| Legal consultations | Emma | Precise, measured delivery suits formal contexts |
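If a script generates your announcements rather than the console, the guide above can be stored as a lookup table. This is a minimal sketch. The helper name is hypothetical, and so is the fallback to Amy for announcement types the guide does not list; neither comes from the guide itself:

```python
# The voice selection guide as data: announcement type -> recommended voice.
VOICE_FOR_TYPE = {
    "bin collection changes": "Amy",
    "library hours": "Amy",
    "park events": "Amy",
    "weather warnings": "Brian",
    "road closures (urgent)": "Brian",
    "emergency alerts": "Brian",
    "planning committee notices": "Emma",
    "council meeting agendas": "Emma",
    "legal consultations": "Emma",
}


def recommended_voice(announcement_type: str, default: str = "Amy") -> str:
    """Look up the guide's voice; unlisted types fall back to `default` (an assumption)."""
    return VOICE_FOR_TYPE.get(announcement_type.strip().lower(), default)
```

Keeping the mapping as data means a communications team can review or change it without touching the synthesis code.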

Before and after comparison

Before: Studio Recording

  • Process: Book voice talent, schedule recording session, brief talent, record multiple takes, edit audio, approve final version
  • Time: 1-2 weeks from brief to final audio
  • Cost: £150-£300 per announcement (talent fee, studio time, editing)
  • Revisions: £100 per re-recording if text changes
  • Consistency: Variable - depends on talent availability and performance
  • Result: Because of the cost and time, councils rarely produce audio versions, so residents who rely on listening miss out

After: Polly Text-to-Speech

  • Process: Paste text, select voice (Amy/Brian/Emma), click "Listen" to preview, click "Synthesize to S3" to save
  • Time: 30 seconds from text to audio file
  • Cost: about US$0.008 (under 1p) per typical announcement (500 characters)
  • Revisions: about US$0.008 to regenerate with text changes (instant)
  • Consistency: Perfect - same neural voice every time
  • Result: Every announcement can have an audio version, going beyond the WCAG 2.1 AA baseline that PSBAR 2018 requires

Time savings: 1-2 weeks reduced to 30 seconds, more than 99.99% faster.
Cost savings: about £200 reduced to under 1p, more than 99.99% cheaper.
Coverage improvement: from near 0% of announcements with audio to 100%.
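The savings percentages can be checked with a few lines of arithmetic. This sketch uses this page's estimates: 1-2 weeks and about £200 for a studio recording, and 30 seconds for Polly. It takes 1p (£0.01) as a conservative upper bound on the Polly cost per announcement, so the result does not depend on an exchange rate:

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (1 - after / before) * 100


# Time: 1-2 weeks of studio turnaround versus about 30 seconds with Polly.
time_saving_1_week = percent_reduction(1 * SECONDS_PER_WEEK, 30)
time_saving_2_weeks = percent_reduction(2 * SECONDS_PER_WEEK, 30)

# Cost: about £200 per studio recording versus at most £0.01 with Polly.
cost_saving = percent_reduction(200.0, 0.01)

print(f"Time: {time_saving_1_week:.3f}% to {time_saving_2_weeks:.3f}% faster")
print(f"Cost: at least {cost_saving:.3f}% cheaper")
```

Both reductions come out above 99.99% anywhere in the 1-2 week range. The printed values are rounded to three decimal places.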

Understanding audio quality

Neural vs Standard engine

You may have noticed the "Engine" dropdown offers "Standard" as well as "Neural". Here's the difference:

Neural engine

Deep learning generates natural-sounding speech with human-like intonation, rhythm and stress. Sounds close to a real person. US$0.016 per 1,000 characters.

Recommendation: Always use for public-facing content.

Standard engine

Older concatenative synthesis, which stitches together short units of recorded speech. Sounds noticeably more synthetic. US$0.004 per 1,000 characters (4× cheaper).

Use case: Only for internal testing or non-public content where cost matters more than quality.
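You can estimate engine costs from the per-character rates above. AWS lists Polly prices in US dollars. This minimal sketch uses the rates quoted in this section, and `synthesis_cost_usd` is a hypothetical helper. `Decimal` avoids floating-point rounding in money calculations:

```python
from decimal import Decimal

# Per-1,000-character rates quoted in this section, in US dollars.
RATE_PER_1000_CHARS = {
    "neural": Decimal("0.016"),
    "standard": Decimal("0.004"),
}


def synthesis_cost_usd(characters: int, engine: str = "neural") -> Decimal:
    """Estimated cost of synthesizing `characters` billed characters with `engine`."""
    return RATE_PER_1000_CHARS[engine] * characters / 1000
```

For a typical 500-character announcement, `synthesis_cost_usd(500)` gives 0.008 for the neural engine and `synthesis_cost_usd(500, "standard")` gives 0.002, which is the 4× difference described above.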

Troubleshooting

Audio sounds robotic, not natural

If voice quality is poor:

  • Check "Engine" is set to Neural not Standard
  • Verify you selected Amy, Brian, or Emma (UK neural voices)
  • If you selected Joanna, Kendra, Matthew, etc., note that these are US English voices, not British English
  • Ensure "Language" is set to "British English (en-GB)" not "English (US)"

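To confirm which British English voices support the neural engine in your Region, you can ask Polly directly. This sketch filters the `Voices` list returned by the real `DescribeVoices` API (`describe_voices` in boto3). The helper name is hypothetical, and voice availability can differ between Regions:

```python
from __future__ import annotations


def british_neural_voices(voices: list[dict]) -> list[str]:
    """Return sorted IDs of en-GB voices whose SupportedEngines include "neural".

    `voices` is the "Voices" list from a Polly DescribeVoices response.
    """
    return sorted(
        voice["Id"]
        for voice in voices
        if voice.get("LanguageCode") == "en-GB"
        and "neural" in voice.get("SupportedEngines", [])
    )
```

For example: `british_neural_voices(boto3.client("polly").describe_voices(LanguageCode="en-GB")["Voices"])`. If Amy, Brian or Emma is missing from the result, that voice is not available with the neural engine in the Region you are using.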
Polly mispronounces place names or council terms

If specific words are mispronounced:

  • Use SSML phoneme tags to specify pronunciation (advanced feature)
  • Example: <phoneme alphabet='ipa' ph='rɛdɪŋ'>Reading</phoneme> for the town name
  • Create pronunciation lexicon for common council-specific terms (upload to Polly)
  • Spell out abbreviations if needed (NHS as "N H S" instead of "nuss")
  • Most UK place names and postcodes are pronounced correctly by default
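
The tips above can be combined into one SSML document. This minimal sketch builds the `<phoneme>` example above inside the `<speak>` root element that SSML input requires. It uses `<say-as interpret-as="characters">` as an alternative to typing "N H S" by hand. The helper names are hypothetical. Check Polly's SSML reference for which tags your voice and engine support:

```python
from xml.sax.saxutils import escape, quoteattr


def text(fragment: str) -> str:
    """Escape plain text (&, <, >) so it is safe inside SSML."""
    return escape(fragment)


def phoneme(word: str, ipa: str) -> str:
    """Pronounce `word` using the given IPA transcription."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def spell_out(abbreviation: str) -> str:
    """Read an abbreviation letter by letter, e.g. NHS as N-H-S."""
    return f'<say-as interpret-as="characters">{escape(abbreviation)}</say-as>'


def to_ssml(*parts: str) -> str:
    """Wrap already-built fragments in the <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = to_ssml(
    text("Road works in "),
    phoneme("Reading", "rɛdɪŋ"),
    text(" start on Monday. For health advice, contact the "),
    spell_out("NHS"),
    text("."),
)
```

Send the result with `TextType="ssml"`, for example `client.synthesize_speech(Engine="neural", VoiceId="Amy", OutputFormat="mp3", Text=ssml, TextType="ssml")`. Escaping matters because a bare `&` in council text, as in "Parks & Recreation", would make the SSML invalid.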
"Listen" button takes a long time (>30 seconds)

If audio generation is slow:

  • Check announcement length - very long text (>2000 chars) takes longer
  • Network latency may delay audio streaming - check internet connection
  • AWS region may be experiencing high load - try again in a few minutes
  • Typical generation time: 5-15 seconds for 500-character announcement
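
Very long text can also hit a hard limit. A single `SynthesizeSpeech` request, the API a script would use for the equivalent of "Listen", accepts at most 3,000 billed characters. The asynchronous `StartSpeechSynthesisTask` API, which writes the audio to S3, accepts much longer input. In a script, one option is to split long announcements at sentence boundaries and synthesize each chunk. The splitter below is a hypothetical helper, written as a sketch:

```python
from __future__ import annotations

import re

# SynthesizeSpeech accepts at most 3,000 billed characters per request.
MAX_CHARS = 3000


def _hard_split(sentence: str, max_chars: int) -> list[str]:
    """Break one over-long sentence at spaces, or mid-word as a last resort."""
    parts = []
    while len(sentence) > max_chars:
        cut = sentence.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars
        parts.append(sentence[:cut].rstrip())
        sentence = sentence[cut:].lstrip()
    if sentence:
        parts.append(sentence)
    return parts


def split_for_polly(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split plain text into chunks of at most max_chars, preferring sentence ends."""
    chunks: list[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for piece in _hard_split(sentence, max_chars):
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence ends keeps each audio chunk sounding complete when the chunks are played in sequence.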

Once you've heard all three voices, you're ready to compare them side-by-side and calculate ROI.