How Text-to-Speech Works

Choose your path: a visual tour with diagrams, or a hands-on console tour exploring the AWS infrastructure.

Architecture Tours

Visual Architecture Tour

Understand the text-to-speech synthesis pipeline

5 minutes

  1. Text Input

    User provides text content, optionally with SSML markup. Voice and engine selection determine synthesis parameters.

  2. Amazon Polly Processing

    Polly receives the request, selects the appropriate neural or standard engine, and synthesizes audio using deep learning models.

  3. Audio Generation

    Speech is generated as an audio stream in MP3 format. Neural voices use more compute but produce more natural-sounding speech.

  4. S3 Storage

    Generated audio is stored in S3 with appropriate cache headers. Repeated requests for the same content serve from cache.

  5. Presigned URL

    A time-limited URL is generated for secure audio playback without exposing the S3 bucket directly.

  6. Audio Playback

    The HTML5 audio player streams the content to the user, with a download option for offline use.
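The six steps above can be sketched end to end with boto3. This is a minimal illustration, not the deployed Lambda's actual code: the bucket name, key scheme, cache lifetime, and default voice are all assumptions, and only the `synthesize_speech`, `head_object`, `put_object`, and `generate_presigned_url` calls are real AWS SDK APIs.

```python
"""Sketch: text -> Polly -> S3 cache -> presigned playback URL."""
import hashlib


def cache_key(text: str, voice_id: str, engine: str) -> str:
    # Deterministic key so repeated requests for the same content
    # hit the cached S3 object instead of re-synthesizing (step 4).
    digest = hashlib.sha256(f"{engine}:{voice_id}:{text}".encode("utf-8")).hexdigest()
    return f"audio/{digest}.mp3"


def synthesis_params(text: str, voice_id: str = "Joanna", engine: str = "neural") -> dict:
    # Steps 1-2: plain text or SSML input, plus voice/engine selection.
    is_ssml = text.lstrip().startswith("<speak>")
    return {
        "Text": text,
        "TextType": "ssml" if is_ssml else "text",
        "VoiceId": voice_id,
        "Engine": engine,          # "neural" or "standard"
        "OutputFormat": "mp3",     # step 3: MP3 audio stream
    }


def synthesize_to_s3(text: str, bucket: str, *, voice_id: str = "Joanna",
                     engine: str = "neural", url_ttl: int = 3600) -> str:
    """Synthesize once, cache in S3, return a time-limited playback URL."""
    import boto3  # deferred so the pure helpers above need no AWS SDK

    polly = boto3.client("polly")
    s3 = boto3.client("s3")
    key = cache_key(text, voice_id, engine)

    # Step 4: serve from cache when the object already exists.
    try:
        s3.head_object(Bucket=bucket, Key=key)
    except s3.exceptions.ClientError:
        resp = polly.synthesize_speech(**synthesis_params(text, voice_id, engine))
        s3.put_object(
            Bucket=bucket, Key=key, Body=resp["AudioStream"].read(),
            ContentType="audio/mpeg",
            CacheControl="public, max-age=86400",  # assumed cache policy
        )

    # Step 5: presigned URL for playback without exposing the bucket.
    return s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=url_ttl
    )
```

For step 6, the returned URL can be dropped straight into an HTML5 `<audio src="...">` element. SSML input is detected automatically, e.g. `synthesize_to_s3('<speak>Hello <break time="500ms"/> world</speak>', "my-audio-bucket")` (bucket name hypothetical).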

Console Architecture Tour

Explore the actual AWS resources powering text-to-speech

8 minutes · Requires deployed stack

  1. Amazon Polly Console

    What to look for: Available voices, neural engine options, SSML examples, usage metrics

  2. S3 Audio Bucket

    What to look for: Audio file storage, cache headers, file naming patterns, storage costs

  3. Lambda Function

    What to look for: Synthesis function code, environment variables for voice selection, CloudWatch logs

  4. CloudWatch Metrics

    What to look for: Polly character count, synthesis latency, error rates
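The CloudWatch metrics in step 4 can also be pulled programmatically. A small sketch, assuming the standard `AWS/Polly` namespace metric names (`RequestCharacters`, `ResponseLatency`) — worth cross-checking against what the CloudWatch console actually lists for your stack:

```python
"""Sketch: query recent Polly usage from CloudWatch."""
import datetime as dt


def metric_request(metric: str, stat: str, hours: int = 24) -> dict:
    # Build the get_metric_statistics arguments for a trailing window.
    end = dt.datetime.now(dt.timezone.utc)
    start = end - dt.timedelta(hours=hours)
    return {
        "Namespace": "AWS/Polly",
        "MetricName": metric,
        "StartTime": start,
        "EndTime": end,
        "Period": 3600,        # one datapoint per hour
        "Statistics": [stat],
    }


def recent_usage() -> dict:
    import boto3  # deferred so metric_request stays AWS-free

    cw = boto3.client("cloudwatch")
    chars = cw.get_metric_statistics(**metric_request("RequestCharacters", "Sum"))
    latency = cw.get_metric_statistics(**metric_request("ResponseLatency", "Average"))
    points = latency["Datapoints"]
    return {
        "characters_24h": sum(p["Sum"] for p in chars["Datapoints"]),
        "avg_latency_ms": sum(p["Average"] for p in points) / len(points) if points else None,
    }
```

Character counts matter here because Polly bills per character synthesized, so `characters_24h` is a rough cost proxy.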

What's Next?

Test the Limits

Try long passages, non-English text, and special characters.

Try challenges

Production Guidance

Learn what changes when integrating text-to-speech for accessibility in production.

View guidance