Step 3: Generate Audio - Text-to-Speech Walkthrough

Convert announcement text to speech with different neural voices

Step 3 of 4 • 3 minutes

Generate Audio with Different Voices

Convert your chosen announcement to audio using three UK English neural voices to compare their styles.

CloudFormation stack outputs showing the Text to Speech application URL — Find your Text to Speech application URL in CloudFormation Outputs

Text to Speech web interface with text input area and voice selection — The application is ready to convert text into natural speech

Expected outcome

Generated audio with Amy (conversational voice)
Generated audio with Brian (authoritative voice)
Generated audio with Emma (formal voice)
Heard the difference in voice styles
Understood which voice suits which announcement type

Generate with three voices

This walkthrough uses three of Amazon Polly's UK English neural voices, each with a distinct style. Let's generate audio with all three to compare:

1 Amy - Conversational & Friendly

In the Polly console, verify your announcement text is pasted in the input area
Set "Engine" to Neural
Set "Language" to British English (en-GB)
Select voice Amy from the dropdown
Click the orange "Listen" button
Audio is typically ready in 5-15 seconds - listen to the full announcement

What to notice: Amy's tone is warm and approachable. She sounds like a friendly council officer explaining information in person. Best for routine service updates and general community announcements.

2 Brian - Authoritative & Clear

Keep the same announcement text in the input area
Change voice to Brian from the dropdown
Click "Listen" again
Listen to the full announcement with Brian's voice

What to notice: Brian's tone is more formal and commanding. He conveys authority and importance. Best for emergency alerts, weather warnings, and urgent safety announcements where residents need to take immediate action.

3 Emma - Formal & Professional

Keep the same announcement text in the input area
Change voice to Emma from the dropdown
Click "Listen" once more
Listen to the full announcement with Emma's voice

What to notice: Emma's tone is precise and professional. She sounds official and measured. Best for committee announcements, planning notices, formal consultations, and legal statements where gravitas and accuracy matter.
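
The same three-voice comparison can also be scripted instead of clicked through. The following is a minimal sketch, assuming Python with the boto3 SDK installed and AWS credentials already configured. The helper names `build_request` and `synthesize_all` are illustrative and not part of the walkthrough.

```python
"""Sketch: generate one announcement with each of the three walkthrough voices."""

VOICES = ("Amy", "Brian", "Emma")  # the three en-GB voices used in this step


def build_request(text, voice):
    """Return keyword arguments for Polly's SynthesizeSpeech operation."""
    if voice not in VOICES:
        raise ValueError(f"unexpected voice: {voice!r}")
    return {
        "Engine": "neural",        # same as setting "Engine" to Neural
        "LanguageCode": "en-GB",   # British English
        "OutputFormat": "mp3",
        "Text": text,
        "TextType": "text",
        "VoiceId": voice,
    }


def synthesize_all(client, text):
    """Call Polly once per voice and return {voice name: MP3 bytes}.

    `client` is expected to be a boto3 Polly client,
    for example boto3.client("polly").
    """
    audio = {}
    for voice in VOICES:
        response = client.synthesize_speech(**build_request(text, voice))
        audio[voice] = response["AudioStream"].read()
    return audio
```

To try it, create the client with `boto3.client("polly")` in a region where the neural engine is available, call `synthesize_all(client, announcement_text)`, and write each value to a file such as `amy.mp3` (the file names are up to you).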

Hear the Neural Voice Quality

These are not the robotic computer voices of the past. Amazon Polly's neural engine uses deep learning to generate speech with natural-sounding inflection and pacing, and each voice has its own character: Amy sounds warm for service updates, Brian sounds urgent for emergency alerts, and Emma sounds professional for formal notices. Many listeners will find the result hard to distinguish from a recorded human voice.

Technical detail

Polly neural text-to-speech (NTTS) uses a sequence-to-sequence model trained on recordings of native British English speakers. The model converts the input text into a spectrogram-like acoustic representation, and a neural vocoder then turns that representation into the audio waveform. This approach captures prosody (rhythm, stress, intonation), which is what makes the speech sound natural. Because the voices are trained on UK speech, Polly pronounces most British place names, postcodes, and council terminology correctly, which matters for local government communications.

Voice comparison guide

Understanding which voice to use for different announcements:

Voice selection guide by announcement type
Announcement Type	Recommended Voice	Why This Voice?
Bin collection changes	Amy	Friendly, conversational - puts residents at ease
Library hours	Amy	Welcoming tone encourages community engagement
Park events	Amy	Warm delivery makes events sound appealing
Weather warnings	Brian	Authoritative tone conveys urgency and seriousness
Road closures (urgent)	Brian	Clear, commanding - ensures message is heard
Emergency alerts	Brian	Inspires confidence and prompts immediate action
Planning committee notices	Emma	Formal tone matches legal/statutory requirements
Council meeting agendas	Emma	Professional delivery conveys official business
Legal consultations	Emma	Precise pronunciation ensures accuracy in formal contexts
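
If announcements are generated by a script, the table above can be encoded as a simple lookup. This is a sketch only: the function name `recommend_voice` and the choice of Amy as the fallback for unlisted types are assumptions, not recommendations from the walkthrough.

```python
# Voice recommendations copied from the table above, keyed by announcement type.
RECOMMENDED_VOICE = {
    "bin collection changes": "Amy",
    "library hours": "Amy",
    "park events": "Amy",
    "weather warnings": "Brian",
    "road closures (urgent)": "Brian",
    "emergency alerts": "Brian",
    "planning committee notices": "Emma",
    "council meeting agendas": "Emma",
    "legal consultations": "Emma",
}


def recommend_voice(announcement_type, default="Amy"):
    """Return the recommended voice; `default` is an assumed fallback."""
    key = announcement_type.strip().lower()
    return RECOMMENDED_VOICE.get(key, default)
```

For example, `recommend_voice("Weather warnings")` returns `"Brian"`, while a type that is not in the table falls back to the assumed default.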

Before and after comparison

Before: Studio Recording

Process: Book voice talent, schedule recording session, brief talent, record multiple takes, edit audio, approve final version
Time: 1-2 weeks from brief to final audio
Cost: £150-£300 per announcement (talent fee, studio time, editing)
Revisions: £100 per re-recording if text changes
Consistency: Variable - depends on talent availability and performance
Result: Because of the cost and time involved, councils rarely produce audio versions, which leaves a gap against accessibility obligations such as PSBAR 2018

After: Polly Text-to-Speech

Process: Paste text, select voice (Amy/Brian/Emma), click "Listen" to preview, click "Synthesize to S3" to save
Time: 30 seconds from text to audio file
Cost: £0.008 per typical announcement (500 characters)
Revisions: £0.008 to regenerate with text changes (instant)
Consistency: Perfect - same neural voice every time
Result: Every announcement can have an audio version, which supports WCAG 2.1 AA accessibility goals

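Time savings: 1-2 weeks reduced to 30 seconds, a reduction of more than 99.99%.
Cost savings: about £200 (a representative fee from the studio range above) reduced to £0.008, which is 99.996% cheaper.
Compliance improvement: audio coverage can rise from close to 0% to 100% of announcements, removing a major accessibility gap.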
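
The percentages can be checked from the before and after figures quoted in this comparison. The sketch below assumes a "week" means seven full calendar days and uses £200 as a representative studio cost; both are assumptions made for the arithmetic.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (1 - after / before) * 100


SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # assumes calendar weeks, not working days

# Time: 1-2 weeks of studio turnaround versus 30 seconds with Polly.
time_saving_1_week = percent_reduction(1 * SECONDS_PER_WEEK, 30)
time_saving_2_weeks = percent_reduction(2 * SECONDS_PER_WEEK, 30)

# Cost: roughly £200 per studio recording versus £0.008 per announcement.
cost_saving = percent_reduction(200, 0.008)

print(f"Time saving (1 week):  {time_saving_1_week:.4f}%")
print(f"Time saving (2 weeks): {time_saving_2_weeks:.4f}%")
print(f"Cost saving:           {cost_saving:.4f}%")
```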

Understanding audio quality

Neural vs Standard engine

You may have noticed the "Engine" dropdown offers "Standard" as well as "Neural". Here's the difference:

Neural engine

Deep learning generates natural speech with emotion, breathing patterns, and inflection. Sounds like a real person. £0.016 per 1000 characters.

Recommendation: Always use for public-facing content.

Standard engine

Older concatenative synthesis. Sounds robotic and unnatural. £0.004 per 1000 characters (4× cheaper).

Use case: Only for internal testing or non-public content where cost matters more than quality.
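
To see how the two rates compare for your own announcements, the per-1000-character prices quoted above can be turned into a small estimator. This is a sketch using only the figures in this section; the function name `estimate_cost` is illustrative, and actual AWS billing may differ.

```python
# Prices per 1000 characters, as quoted in this section.
PRICE_PER_1000_CHARS = {"neural": 0.016, "standard": 0.004}


def estimate_cost(num_chars, engine="neural"):
    """Estimate the cost of synthesizing `num_chars` characters."""
    if engine not in PRICE_PER_1000_CHARS:
        raise ValueError(f"unknown engine: {engine!r}")
    return num_chars / 1000 * PRICE_PER_1000_CHARS[engine]


# A typical 500-character announcement on each engine.
neural_cost = estimate_cost(500, "neural")
standard_cost = estimate_cost(500, "standard")
```

Even at the neural rate, a 500-character announcement costs well under a penny, so the price difference rarely justifies using the standard engine for public content.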

Troubleshooting

Audio sounds robotic, not natural

If voice quality is poor:

Check "Engine" is set to Neural not Standard
Verify you selected Amy, Brian, or Emma (UK neural voices)
If using Joanna, Kendra, Matthew, etc. - these are US English voices, so switch to one of the UK voices above
Ensure "Language" is set to "British English (en-GB)" not "English (US)"

Polly mispronounces place names or council terms

If specific words are mispronounced:

Use SSML phoneme tags to specify pronunciation (advanced feature; the input must be submitted as SSML, wrapped in <speak> tags)
Example: <phoneme alphabet='ipa' ph='rɛdɪŋ'>Reading</phoneme> for the town name
Create a pronunciation lexicon for common council-specific terms (upload to Polly)
Spell out abbreviations if needed (NHS as "N H S" so it is not read as a word such as "nuss")
Most UK place names and postcodes are pronounced correctly by default
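
The SSML fixes above can be assembled in code so that special characters are escaped consistently. The following is a minimal sketch using only the Python standard library; the helper names `phoneme`, `spell_out`, and `speak` are illustrative. It uses the `say-as` tag with `interpret-as="characters"` as an alternative to typing "N H S" by hand.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(word, ipa):
    """Wrap `word` in a phoneme tag with an IPA pronunciation."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def spell_out(abbreviation):
    """Ask the voice to read an abbreviation letter by letter."""
    return f'<say-as interpret-as="characters">{escape(abbreviation)}</say-as>'


def speak(*parts):
    """Join already-escaped text and tags into one SSML document."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("The "),
    spell_out("NHS"),
    escape(" clinic in "),
    phoneme("Reading", "rɛdɪŋ"),
    escape(" opens at 9am."),
)
print(ssml)
```

When calling Polly through an SDK, the resulting string is sent with the text type set to SSML (for boto3, `TextType="ssml"`) rather than plain text.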

"Listen" button takes a long time (>30 seconds)

If audio generation is slow:

Check announcement length - very long text (>2000 chars) takes longer
Network latency may delay audio streaming - check internet connection
AWS region may be experiencing high load - try again in a few minutes
Typical generation time: 5-15 seconds for 500-character announcement
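
If long announcements are the cause, one option is to split the text at sentence boundaries and generate each part separately. This is a sketch under stated assumptions: the 2000-character default simply mirrors the threshold mentioned above, the function name `split_announcement` is illustrative, and the sentence splitting is deliberately naive (it breaks after ".", "!" or "?" followed by whitespace).

```python
import re


def split_announcement(text, max_chars=2000):
    """Split `text` into chunks of at most `max_chars`, breaking between sentences.

    A single sentence longer than `max_chars` is kept whole rather than cut mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate       # sentence still fits in this chunk
        else:
            chunks.append(current)    # close the full chunk
            current = sentence        # start a new one
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be pasted or submitted on its own, and the resulting audio files played in order.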

Once you've heard all three voices, you're ready to compare them side-by-side and calculate ROI.
