Step 3: Generate Audio - Text-to-Speech Walkthrough
Convert announcement text to speech with different neural voices
Step 3 of 4 • 3 minutes
Generate Audio with Different Voices
Convert your chosen announcement to audio using three UK English neural voices to compare their styles.
Expected outcome
- Generated audio with Amy (conversational voice)
- Generated audio with Brian (authoritative voice)
- Generated audio with Emma (formal voice)
- Heard the difference in voice styles
- Understand which voice suits which announcement type
Generate with three voices
This walkthrough uses three of Amazon Polly's UK English neural voices, each with a distinct style. Generate audio with all three to compare:
1. Amy - Conversational & Friendly
- In the Polly console, verify your announcement text is pasted in the input area
- Set "Engine" to Neural
- Set "Language" to British English (en-GB)
- Select voice Amy from the dropdown
- Click the orange "Listen" button
- Audio usually generates within 5-15 seconds - listen to the full announcement
What to notice: Amy's tone is warm and approachable. She sounds like a friendly council officer explaining information in person. Best for routine service updates and general community announcements.
2. Brian - Authoritative & Clear
- Keep the same announcement text in the input area
- Change voice to Brian from the dropdown
- Click "Listen" again
- Listen to the full announcement with Brian's voice
What to notice: Brian's tone is more formal and commanding. He conveys authority and importance. Best for emergency alerts, weather warnings, and urgent safety announcements where residents need to take immediate action.
3. Emma - Formal & Professional
- Keep the same announcement text in the input area
- Change voice to Emma from the dropdown
- Click "Listen" once more
- Listen to the full announcement with Emma's voice
What to notice: Emma's tone is precise and professional. She sounds official and measured. Best for committee announcements, planning notices, formal consultations, and legal statements where gravitas and accuracy matter.
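The three console steps above can also be scripted. The following is a minimal sketch, not part of the deployed demo: it assumes the AWS SDK for Python (boto3) is installed and that credentials and a region are configured. The function names and output file names are hypothetical. It builds one `SynthesizeSpeech` request per voice, mirroring the console settings (Neural engine, British English).

```python
from pathlib import Path

# Voices compared in this step, with the styles described above.
VOICES = {
    "Amy": "conversational",
    "Brian": "authoritative",
    "Emma": "formal",
}


def build_requests(text):
    """Return one SynthesizeSpeech parameter set per voice."""
    return [
        {
            "Text": text,
            "VoiceId": voice,
            "Engine": "neural",
            "LanguageCode": "en-GB",
            "OutputFormat": "mp3",
        }
        for voice in VOICES
    ]


def synthesize_all(text, out_dir="."):
    """Call Amazon Polly once per voice and save each result as an MP3.

    Assumes boto3 is installed and AWS credentials and a region are configured.
    The file naming scheme is illustrative only.
    """
    import boto3  # imported here so build_requests works without the SDK

    polly = boto3.client("polly")
    saved = []
    for params in build_requests(text):
        response = polly.synthesize_speech(**params)
        path = Path(out_dir) / f"announcement-{params['VoiceId'].lower()}.mp3"
        path.write_bytes(response["AudioStream"].read())
        saved.append(path)
    return saved
```

Calling `synthesize_all("Bin collections move to Thursday next week.")` would write three MP3 files, one per voice, so they can be played back to back.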
Hear the Neural Voice Quality
These are not the robotic computer voices of the past. The Amazon Polly neural engine uses deep learning to generate speech that sounds remarkably human. Notice the natural inflection, pacing, and tone of each voice: Amy sounds warm for service updates, Brian sounds urgent for emergency alerts, and Emma sounds professional for formal notices. Many residents listening to these announcements may not realize the audio is AI-generated; the quality approaches that of a professional studio recording.
Technical detail
Voice comparison guide
Understanding which voice to use for different announcements:
| Announcement Type | Recommended Voice | Why This Voice? |
|---|---|---|
| Bin collection changes | Amy | Friendly, conversational - puts residents at ease |
| Library hours | Amy | Welcoming tone encourages community engagement |
| Park events | Amy | Warm delivery makes events sound appealing |
| Weather warnings | Brian | Authoritative tone conveys urgency and seriousness |
| Road closures (urgent) | Brian | Clear, commanding - ensures message is heard |
| Emergency alerts | Brian | Inspires confidence and prompts immediate action |
| Planning committee notices | Emma | Formal tone matches legal/statutory requirements |
| Council meeting agendas | Emma | Professional delivery conveys official business |
| Legal consultations | Emma | Precise pronunciation ensures accuracy in formal contexts |
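If announcements are generated in bulk, the table above can be expressed as a simple lookup. This is an illustrative sketch only: the dictionary copies the recommendations from the table, while the function name and the fallback to Amy for unlisted types are assumptions.

```python
# Recommendations copied from the voice comparison table.
RECOMMENDED_VOICE = {
    "Bin collection changes": "Amy",
    "Library hours": "Amy",
    "Park events": "Amy",
    "Weather warnings": "Brian",
    "Road closures (urgent)": "Brian",
    "Emergency alerts": "Brian",
    "Planning committee notices": "Emma",
    "Council meeting agendas": "Emma",
    "Legal consultations": "Emma",
}


def recommend_voice(announcement_type, default="Amy"):
    """Return the suggested voice for an announcement type.

    Falls back to `default` for types not in the table (an assumption,
    not a recommendation from the walkthrough).
    """
    return RECOMMENDED_VOICE.get(announcement_type, default)


print(recommend_voice("Weather warnings"))
print(recommend_voice("Council meeting agendas"))
```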
Before and after comparison
Before: Studio Recording
- Process: Book voice talent, schedule recording session, brief talent, record multiple takes, edit audio, approve final version
- Time: 1-2 weeks from brief to final audio
- Cost: £150-£300 per announcement (talent fee, studio time, editing)
- Revisions: £100 per re-recording if text changes
- Consistency: Variable - depends on talent availability and performance
- Result: Due to cost and time, councils rarely produce audio versions = non-compliance with PSBAR 2018
After: Polly Text-to-Speech
- Process: Paste text, select voice (Amy/Brian/Emma), click "Listen" to preview, click "Synthesize to S3" to save
- Time: 30 seconds from text to audio file
- Cost: £0.008 per typical announcement (500 characters)
- Revisions: £0.008 to regenerate with text changes (instant)
- Consistency: Perfect - same neural voice every time
- Result: 100% of announcements have audio versions = full WCAG 2.1 AA compliance
Time savings: 1-2 weeks reduced to 30 seconds = 99.98% faster.
Cost savings: £200 reduced to £0.008 = 99.996% cheaper.
Compliance improvement: 0% audio coverage to 100% = full accessibility.
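The percentages above can be checked with a few lines of arithmetic. The sketch below uses the figures from the comparison (£200 versus £0.008, and 30 seconds versus one week). The 40-hour working week is an assumption introduced here to show how the time figure depends on how a "week" is counted.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Cost: the summary uses 200 pounds as a representative studio fee.
cost_saving = percent_reduction(200.0, 0.008)

# Time: 30 seconds against one week, counted two ways.
CALENDAR_WEEK_S = 7 * 24 * 3600  # 604,800 seconds
WORKING_WEEK_S = 40 * 3600       # assumption: 40-hour working week

time_saving_calendar = percent_reduction(CALENDAR_WEEK_S, 30)
time_saving_working = percent_reduction(WORKING_WEEK_S, 30)

print(f"Cost saving: {cost_saving:.3f}%")
print(f"Time saving (calendar week): {time_saving_calendar:.3f}%")
print(f"Time saving (working week): {time_saving_working:.2f}%")
```

Measured against a 40-hour working week the time saving rounds to 99.98%; measured against calendar time it is slightly higher.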
Understanding audio quality
Neural vs Standard engine
You may have noticed the "Engine" dropdown offers "Standard" as well as "Neural". Here's the difference:
- Neural engine: Deep learning generates natural speech with emotion, breathing patterns, and inflection. Sounds like a real person. £0.016 per 1000 characters. Recommendation: Always use for public-facing content.
- Standard engine: Older concatenative synthesis. Sounds robotic and unnatural. £0.004 per 1000 characters (4× cheaper). Use case: Only for internal testing or non-public content where cost matters more than quality.
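To see what the per-character prices mean for a real announcement, the estimate can be computed directly. This is a rough sketch using the prices quoted above; the function name is hypothetical, and actual billing may differ (check the current Amazon Polly pricing page).

```python
# Prices per 1000 characters, as quoted in this walkthrough.
PRICE_PER_1000_CHARS = {"neural": 0.016, "standard": 0.004}


def estimate_cost(char_count, engine="neural"):
    """Estimate the synthesis cost for `char_count` characters."""
    if engine not in PRICE_PER_1000_CHARS:
        raise ValueError(f"unknown engine: {engine!r}")
    return char_count / 1000 * PRICE_PER_1000_CHARS[engine]


for engine in PRICE_PER_1000_CHARS:
    cost = estimate_cost(500, engine)
    print(f"{engine}: GBP {cost:.3f} per 500-character announcement")
```

For a typical 500-character announcement the difference between the engines is a fraction of a penny, which is why the neural engine is the sensible default for public-facing audio.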
Troubleshooting
Audio sounds robotic, not natural
If voice quality is poor:
- Check "Engine" is set to Neural not Standard
- Verify you selected Amy, Brian, or Emma (UK neural voices)
- If you are using Joanna, Kendra, Matthew, or similar, switch to a UK neural voice - these are US voices or older standard voices
- Ensure "Language" is set to "British English (en-GB)" not "English (US)"
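The checklist above can be captured as a small settings check, which may be useful if the same settings are later stored in a script or config file. This is a hypothetical helper, not an Amazon Polly feature; the voice set lists only the three voices used in this walkthrough.

```python
# The three UK English voices used in this walkthrough.
WALKTHROUGH_VOICES = {"Amy", "Brian", "Emma"}


def check_settings(engine, language, voice):
    """Return a list of problems that could explain robotic-sounding audio."""
    problems = []
    if engine.lower() != "neural":
        problems.append('Set "Engine" to Neural, not Standard')
    if language != "en-GB":
        problems.append('Set "Language" to British English (en-GB)')
    if voice not in WALKTHROUGH_VOICES:
        problems.append("Select Amy, Brian, or Emma")
    return problems


print(check_settings("Standard", "en-US", "Joanna"))
print(check_settings("Neural", "en-GB", "Amy"))
```

An empty list means the settings match the ones this walkthrough expects.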
Polly mispronounces place names or council terms
If specific words are mispronounced:
- Use SSML phoneme tags to specify pronunciation (advanced feature; SSML input must be enabled and the text wrapped in <speak> tags)
- Example: <phoneme alphabet='ipa' ph='rɛdɪŋ'>Reading</phoneme> for the town name
- Create a pronunciation lexicon for common council-specific terms (upload to Polly)
- Spell out abbreviations if needed (NHS as "N H S" instead of "nuss")
- Most UK place names and postcodes are pronounced correctly by default
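The SSML fixes above can be assembled programmatically so that the markup stays well-formed. The sketch below is a minimal illustration with hypothetical helper names. It builds a `<speak>` document that spells out an abbreviation with `<say-as interpret-as="characters">` and applies the phoneme example for Reading, then parses the result to catch malformed markup before it is sent to Polly.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def plain(text):
    """Escape ordinary announcement text for use inside SSML."""
    return escape(text)


def phoneme(word, ipa):
    """Wrap a word in a phoneme tag with an IPA pronunciation."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def spell_out(abbreviation):
    """Ask the synthesizer to read an abbreviation letter by letter."""
    return f'<say-as interpret-as="characters">{escape(abbreviation)}</say-as>'


def build_ssml(*parts):
    """Join already-escaped parts into a <speak> document and validate it."""
    ssml = "<speak>" + "".join(parts) + "</speak>"
    ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
    return ssml


ssml = build_ssml(
    plain("The "),
    spell_out("NHS"),
    plain(" walk-in centre in "),
    phoneme("Reading", "rɛdɪŋ"),
    plain(" opens at 9am."),
)
```

When calling the API rather than the console, SSML input like this is passed with `TextType="ssml"` in the `SynthesizeSpeech` request.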
"Listen" button takes a long time (>30 seconds)
If audio generation is slow:
- Check announcement length - very long text (>2000 chars) takes longer
- Network latency may delay audio streaming - check internet connection
- AWS region may be experiencing high load - try again in a few minutes
- Typical generation time: 5-15 seconds for 500-character announcement
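One way to work around slow generation for very long text is to split the announcement into shorter pieces and generate each separately. The sketch below is an assumption-laden illustration: the function name is hypothetical, the 2000-character default comes from the threshold mentioned above, and splitting at sentence boundaries is a design choice made here so that no sentence is cut mid-way.

```python
import re


def split_announcement(text, max_chars=2000):
    """Split text into chunks of at most `max_chars`, breaking between sentences.

    A single sentence longer than `max_chars` is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_text = " ".join(["This is a sentence."] * 10)
for chunk in split_announcement(long_text, max_chars=50):
    print(len(chunk), chunk)
```

Each chunk can then be pasted into the console, or synthesized, on its own.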
Once you've heard all three voices, you're ready to compare them side-by-side and calculate ROI.