YouTube video transcript summarizer

Clinton Billedeaux 8978a1fb59 chore: add local agent directory to .gitignore		2026-05-18 11:03:07 -05:00
.gitignore	chore: add local agent directory to .gitignore	2026-05-18 11:03:07 -05:00
AppSettings.cs	feat: initialize YouTube summarizer project with OpenAI integration and map-reduce processing strategy	2026-05-18 11:00:15 -05:00
appsettings.json	feat: initialize YouTube summarizer project with OpenAI integration and map-reduce processing strategy	2026-05-18 11:00:15 -05:00
ConsoleRenderer.cs	feat: initialize YouTube summarizer project with OpenAI integration and map-reduce processing strategy	2026-05-18 11:00:15 -05:00
Program.cs	feat: initialize YouTube summarizer project with OpenAI integration and map-reduce processing strategy	2026-05-18 11:00:15 -05:00
README.md	feat: initialize YouTube summarizer project with OpenAI integration and map-reduce processing strategy	2026-05-18 11:00:15 -05:00
summarize.sln	feat: initialize YouTube summarizer project with OpenAI integration and map-reduce processing strategy	2026-05-18 11:00:15 -05:00
SummarizerService.cs	feat: initialize YouTube summarizer project with OpenAI integration and map-reduce processing strategy	2026-05-18 11:00:15 -05:00
TranscriptFileService.cs	feat: initialize YouTube summarizer project with OpenAI integration and map-reduce processing strategy	2026-05-18 11:00:15 -05:00
VideoModels.cs	feat: initialize YouTube summarizer project with OpenAI integration and map-reduce processing strategy	2026-05-18 11:00:15 -05:00
YouTubeService.cs	feat: initialize YouTube summarizer project with OpenAI integration and map-reduce processing strategy	2026-05-18 11:00:15 -05:00
YoutubeSummarizer.csproj	feat: initialize YouTube summarizer project with OpenAI integration and map-reduce processing strategy	2026-05-18 11:00:15 -05:00

README.md

YouTube Video Summarizer

A .NET 8 console application that fetches YouTube video transcripts and produces structured summaries using an LLM (Ollama or OpenAI).

Prerequisites

.NET 8 SDK
A YouTube Data API v3 key (created in the Google Cloud Console)
A local Ollama installation (recommended) or an OpenAI API key

Setup

# 1. Clone / copy the project
cd YoutubeSummarizer

# 2. Copy the example config and fill in your keys
cp appsettings.example.json appsettings.json
nano appsettings.json   # or your editor of choice

# 3. Restore packages
dotnet restore

# 4. Run
dotnet run

Google Cloud Setup (YouTube API Key)

Go to console.cloud.google.com
Create or select a project
APIs & Services → Library → search "YouTube Data API v3" → Enable
APIs & Services → Credentials → Create Credentials → API key
(Optional but recommended) Restrict the key to only the YouTube Data API v3

Free quota: 10,000 units/day. Each video lookup costs ~3 units, so the daily quota covers roughly 3,300 videos.

Configuration Reference

Key	Description	Default
`YouTube:ApiKey`	Your YouTube Data API v3 key	(required)
`LLM:BaseUrl`	Base URL of the OpenAI-compatible API endpoint	`http://localhost:11434/v1`
`LLM:ApiKey`	API key (any non-empty value works for Ollama)	`ollama`
`LLM:Model`	Chat model to use	`qwen3:14b`
`LLM:MaxTokens`	Max tokens in summary response	`1500`
`LLM:TimeoutSeconds`	Timeout for LLM generation, in seconds	`300`
`Summarizer:ChunkWordLimit`	Words per chunk for long videos	`3000`
`Summarizer:ShowTranscript`	Print raw transcript before summary	`false`
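
For orientation, the colon-separated keys in the table correspond to nested sections in appsettings.json, since .NET configuration uses the colon as its section separator. The fragment below is a sketch assembled from the defaults listed above. The exact layout of the project's own file is an assumption, and the YouTube key is a placeholder.

```json
{
  "YouTube": {
    "ApiKey": "YOUR_YOUTUBE_DATA_API_KEY"
  },
  "LLM": {
    "BaseUrl": "http://localhost:11434/v1",
    "ApiKey": "ollama",
    "Model": "qwen3:14b",
    "MaxTokens": 1500,
    "TimeoutSeconds": 300
  },
  "Summarizer": {
    "ChunkWordLimit": 3000,
    "ShowTranscript": false
  }
}
```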

Architecture

Program.cs
│  Main loop → parses URL → calls pipeline
│
├── YouTubeService
│     ├── ExtractVideoId()      — URL parsing
│     ├── GetVideoMetadataAsync() — YouTube Data API v3 (Videos.list)
│     └── GetTranscriptAsync()   — Caption list + timedtext download
│
├── SummarizerService
│     ├── SummarizeAsync()      — Routes to single-pass or chunked
│     ├── SinglePassSummarize() — One OpenAI call for short videos
│     └── ChunkedSummarize()    — Map-reduce for long videos
│
└── ConsoleRenderer             — All terminal output / formatting
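
The URL-parsing step handled by ExtractVideoId() could look roughly like the sketch below. It is not the project's implementation: the VideoIdParser class name, the set of accepted URL shapes (watch, youtu.be, shorts, embed, live), and the rule that an ID is 11 characters from A-Z, a-z, 0-9, "-" and "_" are all assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper class; the name is illustrative, not taken from the project.
public static class VideoIdParser
{
    // Path prefixes that carry the ID as the next segment (assumed list).
    private static readonly string[] PathPrefixes = { "shorts", "embed", "live" };

    // Returns the video ID, or null when the input is not a recognized YouTube URL or ID.
    public static string? ExtractVideoId(string? input)
    {
        if (string.IsNullOrWhiteSpace(input)) return null;
        input = input.Trim();

        // Accept a bare ID pasted without a URL.
        if (LooksLikeId(input)) return input;

        if (!Uri.TryCreate(input, UriKind.Absolute, out var uri)) return null;
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) return null;

        string host = uri.Host.ToLowerInvariant();
        string[] segments = uri.AbsolutePath.Split('/', StringSplitOptions.RemoveEmptyEntries);
        string? candidate = null;

        if (host == "youtu.be")
        {
            // Short links: https://youtu.be/<id>
            candidate = segments.FirstOrDefault();
        }
        else if (host == "youtube.com" || host.EndsWith(".youtube.com"))
        {
            if (segments.Length == 1 && segments[0] == "watch")
                candidate = GetQueryValue(uri.Query, "v");   // /watch?v=<id>
            else if (segments.Length >= 2 && PathPrefixes.Contains(segments[0]))
                candidate = segments[1];                     // /shorts/<id>, /embed/<id>, /live/<id>
        }

        return candidate != null && LooksLikeId(candidate) ? candidate : null;
    }

    // Minimal query-string lookup so the sketch needs no extra packages.
    private static string? GetQueryValue(string query, string key)
    {
        foreach (string pair in query.TrimStart('?').Split('&', StringSplitOptions.RemoveEmptyEntries))
        {
            string[] parts = pair.Split('=', 2);
            if (parts.Length == 2 && parts[0] == key)
                return Uri.UnescapeDataString(parts[1]);
        }
        return null;
    }

    // Assumption: IDs are exactly 11 characters from [A-Za-z0-9_-].
    private static bool LooksLikeId(string s) =>
        s.Length == 11 && s.All(c =>
            (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '_');
}
```

Returning null instead of throwing lets the main loop print a friendly message and prompt again when the user pastes something that is not a video link.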

Caption Quality Transparency

The app tracks how the transcript was obtained and flags it accordingly:

Source	Label	Warning shown?
Owner-published captions	`✓ Owner-published`	No
Community-contributed	`✓ Community captions`	Minor note
Auto-generated (ASR)	`~ Auto-generated`	Yes — accuracy caveat
No captions (metadata only)	`✗ Metadata only`	Yes — limited accuracy
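
One way to model the table above in code is an enum for the transcript source plus a lookup that returns the label and a warning level. This is only a sketch: the CaptionSource, WarningLevel, and CaptionLabels names are hypothetical, and only the label strings and the "warning shown?" column come from the table.

```csharp
using System;

// Hypothetical types; names are illustrative, not taken from the project.
public enum CaptionSource { OwnerPublished, Community, AutoGenerated, MetadataOnly }

public enum WarningLevel { None, Note, Warning }

public static class CaptionLabels
{
    // Maps each transcript source to its console label and how loudly to flag it.
    public static (string Label, WarningLevel Level) Describe(CaptionSource source) => source switch
    {
        CaptionSource.OwnerPublished => ("✓ Owner-published", WarningLevel.None),
        CaptionSource.Community      => ("✓ Community captions", WarningLevel.Note),
        CaptionSource.AutoGenerated  => ("~ Auto-generated", WarningLevel.Warning),
        CaptionSource.MetadataOnly   => ("✗ Metadata only", WarningLevel.Warning),
        _ => throw new ArgumentOutOfRangeException(nameof(source))
    };
}
```

Keeping the mapping in one place means the renderer and the summarizer prompt can both consult the same source of truth about transcript quality.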

Long Video Strategy

Videos whose transcripts exceed Summarizer:ChunkWordLimit words (3000 by default) are summarized with a map-reduce approach:

Split — transcript divided into overlapping chunks (200-word overlap preserves context at boundaries)
Map — each chunk summarized independently
Reduce — chunk summaries combined into a final coherent summary

This handles hour-long lectures, conference talks, and podcasts without hitting model context limits.
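
The split, map, and reduce steps above can be sketched as follows. This is a minimal illustration rather than the project's code: the TranscriptChunker class, its method names, and the summarize delegate (standing in for whatever LLM call the app makes) are hypothetical. The 3000-word limit and 200-word overlap are the values mentioned in this README.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper; names are illustrative, not taken from the project.
public static class TranscriptChunker
{
    // Split: break the transcript into chunks of at most chunkWordLimit words,
    // where each chunk repeats the last overlapWords words of the previous one.
    public static List<string> SplitIntoChunks(string transcript, int chunkWordLimit = 3000, int overlapWords = 200)
    {
        if (chunkWordLimit <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkWordLimit));
        if (overlapWords < 0 || overlapWords >= chunkWordLimit)
            throw new ArgumentOutOfRangeException(nameof(overlapWords));

        string[] words = transcript.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        var chunks = new List<string>();
        int step = chunkWordLimit - overlapWords; // how far each chunk advances

        for (int start = 0; start < words.Length; start += step)
        {
            int length = Math.Min(chunkWordLimit, words.Length - start);
            chunks.Add(string.Join(" ", words, start, length));
            if (start + length >= words.Length) break; // last chunk reached the end
        }
        return chunks;
    }

    // Map-reduce: summarize each chunk, then summarize the joined partial summaries.
    // The summarize delegate is an assumption standing in for the real LLM call.
    public static async Task<string> MapReduceAsync(
        string transcript,
        Func<string, Task<string>> summarize,
        int chunkWordLimit = 3000,
        int overlapWords = 200)
    {
        List<string> chunks = SplitIntoChunks(transcript, chunkWordLimit, overlapWords);
        if (chunks.Count <= 1)
            return await summarize(transcript); // short video: single pass

        var partials = new List<string>();
        foreach (string chunk in chunks)
            partials.Add(await summarize(chunk));               // Map

        return await summarize(string.Join("\n\n", partials));  // Reduce
    }
}
```

The map step here runs sequentially, which is the gentler choice for a single local Ollama instance. Running the chunk calls concurrently is a possible variation when the backend can handle parallel requests.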

Environment Variable Overrides

You can override appsettings.json values with environment variables, which is useful for CI or Docker:

export YouTube__ApiKey="your-key"
export LLM__ApiKey="ollama"
dotnet run

The double underscore __ stands in for the : section separator, so YouTube__ApiKey maps to YouTube:ApiKey (the standard .NET configuration convention).