YouTube Video Summarizer

A .NET 8 console application that fetches YouTube video transcripts and produces structured summaries using an LLM (Ollama or OpenAI).


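Prerequisites

  - .NET 8 SDK
  - A YouTube Data API v3 key (see Google Cloud Setup below)
  - An LLM: a local Ollama server (the default configuration) or an OpenAI API key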

Setup

# 1. Clone / copy the project
cd YoutubeSummarizer

# 2. Copy the example config and fill in your keys
cp appsettings.example.json appsettings.json
nano appsettings.json   # or your editor of choice

# 3. Restore packages
dotnet restore

# 4. Run
dotnet run

Google Cloud Setup (YouTube API Key)

  1. Go to console.cloud.google.com
  2. Create or select a project
  3. APIs & Services → Library → search "YouTube Data API v3" → Enable
  4. APIs & Services → Credentials → Create Credentials → API key
  5. (Optional but recommended) Restrict the key to only the YouTube Data API v3

Free quota: 10,000 units/day. At roughly 3 units per video lookup, that is enough for about 3,300 videos per day.
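To confirm a new key works before running the app, you can send one videos.list request yourself. The C# sketch below is not part of the project: `BuildVideosListUri` is a hypothetical helper, the video ID is just a sample public video, and it only calls the API when the `YouTube__ApiKey` environment variable is set.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: builds a YouTube Data API v3 videos.list request URL.
// part=snippet returns the title, channel name and description.
Uri BuildVideosListUri(string videoId, string apiKey) =>
    new Uri("https://www.googleapis.com/youtube/v3/videos"
        + $"?part=snippet&id={Uri.EscapeDataString(videoId)}"
        + $"&key={Uri.EscapeDataString(apiKey)}");

var apiKey = Environment.GetEnvironmentVariable("YouTube__ApiKey");
var requestUri = BuildVideosListUri("dQw4w9WgXcQ", apiKey ?? "YOUR_KEY");

if (string.IsNullOrEmpty(apiKey))
{
    // No key configured: show the request instead of calling the API.
    Console.WriteLine($"Set YouTube__ApiKey to test. Request: {requestUri}");
}
else
{
    using var http = new HttpClient();
    using var response = await http.GetAsync(requestUri);
    // 200 means the key was accepted; a 400 or 403 typically points to an
    // invalid or restricted key, or the API not being enabled for the project.
    Console.WriteLine($"{(int)response.StatusCode} {response.ReasonPhrase}");
}
```

The key is never printed when it is set, so the snippet is safe to run in a shared terminal.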


Configuration Reference

Key                         Description                            Default
YouTube:ApiKey              Your YouTube Data API v3 key           (required)
LLM:BaseUrl                 API endpoint                           http://localhost:11434/v1
LLM:ApiKey                  API key (any value for Ollama)         ollama
LLM:Model                   Chat model to use                      qwen3:14b
LLM:MaxTokens               Max tokens in summary response         1500
LLM:TimeoutSeconds          Max time for LLM generation (seconds)  300
Summarizer:ChunkWordLimit   Words per chunk for long videos        3000
Summarizer:ShowTranscript   Print raw transcript before summary    false
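The colon in each key marks a JSON section, so LLM:Model lives under an "LLM" object in appsettings.json. As an illustration only (not code from the project), this C# sketch expands the table's flat keys into that nested layout and prints the resulting JSON. The `YOUR_YOUTUBE_API_KEY` placeholder is an assumption, since that key has no default.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Defaults from the Configuration Reference table, keyed by their flat names.
var defaults = new Dictionary<string, object>
{
    ["YouTube:ApiKey"] = "YOUR_YOUTUBE_API_KEY", // required; placeholder only
    ["LLM:BaseUrl"] = "http://localhost:11434/v1",
    ["LLM:ApiKey"] = "ollama",
    ["LLM:Model"] = "qwen3:14b",
    ["LLM:MaxTokens"] = 1500,
    ["LLM:TimeoutSeconds"] = 300,
    ["Summarizer:ChunkWordLimit"] = 3000,
    ["Summarizer:ShowTranscript"] = false,
};

// Expand "Section:Key" names into nested objects, one level per colon.
Dictionary<string, object> Nest(IEnumerable<KeyValuePair<string, object>> flat)
{
    var root = new Dictionary<string, object>();
    foreach (var (key, value) in flat)
    {
        var parts = key.Split(':');
        var node = root;
        foreach (var section in parts[..^1])
        {
            if (!node.TryGetValue(section, out var child))
            {
                child = new Dictionary<string, object>();
                node[section] = child;
            }
            node = (Dictionary<string, object>)child;
        }
        node[parts[^1]] = value;
    }
    return root;
}

var nested = Nest(defaults);
Console.WriteLine(JsonSerializer.Serialize(nested, new JsonSerializerOptions { WriteIndented = true }));
```

This is the inverse of what the .NET JSON configuration provider does when it loads the file: it flattens nested objects back into colon-separated keys such as LLM:Model.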

Architecture

Program.cs
│  Main loop → parses URL → calls pipeline
│
├── YouTubeService
│     ├── ExtractVideoId()        — URL parsing
│     ├── GetVideoMetadataAsync() — YouTube Data API v3 (Videos.list)
│     └── GetTranscriptAsync()    — Caption list + timedtext download
│
├── SummarizerService
│     ├── SummarizeAsync()      — Routes to single-pass or chunked
│     ├── SinglePassSummarize() — One OpenAI call for short videos
│     └── ChunkedSummarize()    — Map-reduce for long videos
│
└── ConsoleRenderer             — All terminal output / formatting
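ExtractVideoId() has to cope with the several URL shapes YouTube uses. The project's implementation is not reproduced here; the C# sketch below is a hypothetical stand-in that accepts a bare ID, youtu.be/<id>, youtube.com/watch?v=<id>, and /shorts/, /embed/ and /live/ paths, returning null otherwise. The accepted URL forms and the 11-character ID format are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for YouTubeService.ExtractVideoId(). Assumes IDs are
// 11 characters drawn from letters, digits, '-' and '_'.
string? ExtractVideoId(string input)
{
    input = input.Trim();
    if (IsVideoId(input))
        return input; // already a bare ID

    if (!Uri.TryCreate(input, UriKind.Absolute, out var uri))
        return null;

    var host = uri.Host.ToLowerInvariant();
    var segments = uri.AbsolutePath.Trim('/').Split('/');

    if (host == "youtu.be")
        return Checked(segments[0]); // youtu.be/<id>

    if (host == "youtube.com" || host.EndsWith(".youtube.com"))
    {
        if (segments[0] == "watch")
            return Checked(QueryValue(uri.Query, "v")); // /watch?v=<id>

        if (segments.Length >= 2 && segments[0] is ("shorts" or "embed" or "live"))
            return Checked(segments[1]); // /shorts/<id>, /embed/<id>, /live/<id>
    }
    return null;
}

bool IsVideoId(string? s) => s is not null && Regex.IsMatch(s, "^[A-Za-z0-9_-]{11}$");

string? Checked(string? candidate) => IsVideoId(candidate) ? candidate : null;

// Minimal query-string lookup, to avoid extra dependencies.
string? QueryValue(string query, string name)
{
    foreach (var pair in query.TrimStart('?').Split('&', StringSplitOptions.RemoveEmptyEntries))
    {
        var kv = pair.Split('=', 2);
        if (kv.Length == 2 && kv[0] == name)
            return Uri.UnescapeDataString(kv[1]);
    }
    return null;
}

Console.WriteLine(ExtractVideoId("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42s")); // dQw4w9WgXcQ
Console.WriteLine(ExtractVideoId("https://example.com/watch?v=dQw4w9WgXcQ") ?? "(no ID)"); // (no ID)
```

Checking the host before trusting the v parameter keeps non-YouTube links from being accepted just because their query string happens to look right.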

Caption Quality Transparency

The app tracks how the transcript was obtained and flags it accordingly:

Source                        Label                 Warning shown?
Owner-published captions      ✓ Owner-published     No
Community-contributed         ✓ Community captions  Minor note
Auto-generated (ASR)          ~ Auto-generated      Yes: accuracy caveat
No captions (metadata only)   ✗ Metadata only       Yes: limited accuracy
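One way to express the table in code is a lookup from caption source to label and warning, plus a rule for choosing among the tracks a video offers. In this C# sketch the source names (owner, community, asr), the preference order and the warning wording are all illustrative assumptions, not taken from the project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

Console.OutputEncoding = Encoding.UTF8; // so ✓ and ✗ render on Windows consoles

// Assumed source kinds, best first. No matching track means "metadata only".
string[] preference = { "owner", "community", "asr" };

// Choose the most trustworthy caption track the video offers, or null if none.
string? PickBest(IEnumerable<string> available) =>
    preference.FirstOrDefault(kind => available.Contains(kind));

// Map the chosen source to the label and (optional) warning from the table.
(string Label, string? Warning) Describe(string? source) => source switch
{
    "owner"     => ("✓ Owner-published", null),
    "community" => ("✓ Community captions", "Note: captions were contributed by viewers."),
    "asr"       => ("~ Auto-generated", "Auto-generated captions may contain recognition errors."),
    _           => ("✗ Metadata only", "No transcript found; the summary is based on metadata only."),
};

var (label, warning) = Describe(PickBest(new[] { "asr", "community" }));
Console.WriteLine(label); // ✓ Community captions
if (warning is not null)
    Console.WriteLine(warning);
```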

Long Video Strategy

Videos with transcripts exceeding ChunkWordLimit words use a map-reduce approach:

  1. Split — transcript divided into overlapping chunks (200-word overlap preserves context at boundaries)
  2. Map — each chunk summarized independently
  3. Reduce — chunk summaries combined into a final coherent summary

This handles hour-long lectures, conference talks, and podcasts without hitting model context limits.
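A minimal sketch of the three steps, assuming chunks are counted in whitespace-separated words, ChunkWordLimit doubles as the chunk size, and the 200-word overlap from step 1. SplitIntoChunks, MapReduceSummarizeAsync and the fake summarizer are hypothetical; in the app the delegate would be an LLM call inside SummarizerService.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Split: windows of `chunkWords` words, each starting `chunkWords - overlap`
// words after the previous one, so neighbouring chunks share `overlap` words.
List<string> SplitIntoChunks(string transcript, int chunkWords, int overlap)
{
    if (overlap < 0 || overlap >= chunkWords)
        throw new ArgumentOutOfRangeException(nameof(overlap));

    var words = transcript.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries);
    var chunks = new List<string>();
    for (int start = 0; start < words.Length; start += chunkWords - overlap)
    {
        int count = Math.Min(chunkWords, words.Length - start);
        chunks.Add(string.Join(" ", words, start, count));
        if (start + count == words.Length)
            break; // this window already reaches the end of the transcript
    }
    return chunks;
}

// Map each chunk independently, then reduce the partial summaries in one call.
// Transcripts that fit in a single chunk take the single-pass route.
async Task<string> MapReduceSummarizeAsync(
    string transcript, int chunkWords, int overlap, Func<string, Task<string>> summarize)
{
    var chunks = SplitIntoChunks(transcript, chunkWords, overlap);
    if (chunks.Count <= 1)
        return await summarize(transcript);

    var partials = new List<string>();
    foreach (var chunk in chunks)
        partials.Add(await summarize(chunk));               // map (sequential)

    return await summarize(string.Join("\n\n", partials)); // reduce
}

// Fake summarizer standing in for the LLM: reports how many words it received.
Task<string> FakeSummarize(string text) =>
    Task.FromResult($"<summary of {text.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries).Length} words>");

var sample = string.Join(" ", Enumerable.Range(0, 7000).Select(i => $"w{i}"));
var demoChunks = SplitIntoChunks(sample, chunkWords: 3000, overlap: 200);
Console.WriteLine(string.Join(", ", demoChunks.Select(c => c.Split(' ').Length))); // 3000, 3000, 1400
Console.WriteLine(await MapReduceSummarizeAsync(sample, 3000, 200, FakeSummarize)); // <summary of 12 words>
```

The map calls run one after another here, which is gentle on a single local Ollama instance; they could be issued concurrently with Task.WhenAll if the backend allows it.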


Environment Variable Overrides

Any appsettings.json value can be overridden with an environment variable, which is useful in CI or Docker:

export YouTube__ApiKey="your-key"
export LLM__ApiKey="ollama"
dotnet run

Note the double-underscore __ as the section separator (standard .NET configuration convention).
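The mapping is mechanical: each __ becomes :, and the environment value takes precedence over the file. The simplified C# sketch below imitates that layering with plain dictionaries; ToConfigKey and ApplyOverrides are hypothetical names, and the real Microsoft.Extensions.Configuration provider also handles prefixes and other details not shown here.

```csharp
using System;
using System.Collections.Generic;

// "__" in an environment variable name stands for the ":" section separator.
string ToConfigKey(string envName) => envName.Replace("__", ":");

// Layer environment values over file values. Keys compare case-insensitively,
// as they do in .NET configuration, and the environment value wins.
Dictionary<string, string> ApplyOverrides(
    IDictionary<string, string> fileValues, IDictionary<string, string> envValues)
{
    var merged = new Dictionary<string, string>(fileValues, StringComparer.OrdinalIgnoreCase);
    foreach (var (name, value) in envValues)
        merged[ToConfigKey(name)] = value;
    return merged;
}

// Values as they might appear in appsettings.json (illustrative).
var fromFile = new Dictionary<string, string>
{
    ["YouTube:ApiKey"] = "",
    ["LLM:ApiKey"] = "ollama",
    ["LLM:Model"] = "qwen3:14b",
};

// The same overrides as the export lines above, supplied directly here.
var fromEnv = new Dictionary<string, string>
{
    ["YouTube__ApiKey"] = "your-key",
    ["LLM__ApiKey"] = "ollama",
};

var effective = ApplyOverrides(fromFile, fromEnv);
Console.WriteLine(effective["YouTube:ApiKey"]); // your-key
Console.WriteLine(effective["llm:model"]);      // qwen3:14b (lookup ignores case)
```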