# YouTube Video Summarizer

A .NET 8 console application that fetches YouTube video transcripts and produces structured summaries using an LLM (Ollama or OpenAI).

---

## Prerequisites

- [.NET 8 SDK](https://dotnet.microsoft.com/download)
- A **YouTube Data API v3** key → [Google Cloud Console](https://console.cloud.google.com)
- A local **Ollama** install (recommended) or an **OpenAI API key**

---

## Setup

```bash
# 1. Clone / copy the project
cd YoutubeSummarizer

# 2. Copy the example config and fill in your keys
cp appsettings.example.json appsettings.json
nano appsettings.json   # or your editor of choice

# 3. Restore packages
dotnet restore

# 4. Run
dotnet run
```

---

## Google Cloud Setup (YouTube API Key)

1. Go to [console.cloud.google.com](https://console.cloud.google.com)
2. Create or select a project
3. **APIs & Services → Library** → search "YouTube Data API v3" → Enable
4. **APIs & Services → Credentials → Create Credentials → API key**
5. (Optional but recommended) Restrict the key to only the YouTube Data API v3

> Free quota: **10,000 units/day**. Each video lookup costs ~3 units, so the daily quota covers roughly 3,300 videos.

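
The key created above goes into `appsettings.json`. As a rough sketch, assuming the colon-separated keys in the Configuration Reference map to nested JSON sections in the usual .NET way, the file might look like the following. The values are the documented defaults, `YOUR_YOUTUBE_API_KEY` is a placeholder, and the shipped `appsettings.example.json` remains the authoritative template if it differs.

```json
{
  "YouTube": {
    "ApiKey": "YOUR_YOUTUBE_API_KEY"
  },
  "LLM": {
    "BaseUrl": "http://localhost:11434/v1",
    "ApiKey": "ollama",
    "Model": "qwen3:14b",
    "MaxTokens": 1500,
    "TimeoutSeconds": 300
  },
  "Summarizer": {
    "ChunkWordLimit": 3000,
    "ShowTranscript": false
  }
}
```
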

---

## Configuration Reference

| Key | Description | Default |
|---|---|---|
| `YouTube:ApiKey` | Your YouTube Data API v3 key | *(required)* |
| `LLM:BaseUrl` | LLM API endpoint | `http://localhost:11434/v1` |
| `LLM:ApiKey` | API key (any value works for Ollama) | `ollama` |
| `LLM:Model` | Chat model to use | `qwen3:14b` |
| `LLM:MaxTokens` | Max tokens in summary response | `1500` |
| `LLM:TimeoutSeconds` | Max time for LLM generation, in seconds | `300` |
| `Summarizer:ChunkWordLimit` | Words per chunk for long videos | `3000` |
| `Summarizer:ShowTranscript` | Print raw transcript before summary | `false` |

---

## Architecture

```
Program.cs
│   Main loop → parses URL → calls pipeline
│
├── YouTubeService
│   ├── ExtractVideoId()         - URL parsing
│   ├── GetVideoMetadataAsync()  - YouTube Data API v3 (Videos.list)
│   └── GetTranscriptAsync()     - Caption list + timedtext download
│
├── SummarizerService
│   ├── SummarizeAsync()         - Routes to single-pass or chunked
│   ├── SinglePassSummarize()    - One LLM call for short videos
│   └── ChunkedSummarize()       - Map-reduce for long videos
│
└── ConsoleRenderer              - All terminal output / formatting
```

### Caption Quality Transparency

The app tracks how the transcript was obtained and flags it accordingly:

| Source | Label | Warning shown? |
|---|---|---|
| Owner-published captions | `✓ Owner-published` | No |
| Community-contributed | `✓ Community captions` | Minor note |
| Auto-generated (ASR) | `~ Auto-generated` | Yes (accuracy caveat) |
| No captions (metadata only) | `✗ Metadata only` | Yes (limited accuracy) |

### Long Video Strategy

Videos whose transcripts exceed `Summarizer:ChunkWordLimit` words use a **map-reduce** approach:

1. **Split**: the transcript is divided into overlapping chunks (a 200-word overlap preserves context at boundaries)
2. **Map**: each chunk is summarized independently
3. **Reduce**: the chunk summaries are combined into a final coherent summary

This handles hour-long lectures, conference talks, and podcasts without hitting model context limits.

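
The Split step can be sketched roughly as below. This is an illustration rather than the project's actual `ChunkedSummarize()` code: the `TranscriptChunker` class and its `Split` method are hypothetical names, and the defaults mirror the documented 3000-word limit and 200-word overlap. Each chunk starts `chunkWordLimit - overlap` words after the previous one, so neighbouring chunks share `overlap` words. The Map and Reduce steps are LLM calls and are not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the project's documented API.
public static class TranscriptChunker
{
    // Splits a transcript into word-based chunks. Every chunk after the
    // first begins with the last `overlap` words of the previous chunk.
    public static List<string> Split(string transcript, int chunkWordLimit = 3000, int overlap = 200)
    {
        if (chunkWordLimit <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkWordLimit));
        if (overlap < 0 || overlap >= chunkWordLimit)
            throw new ArgumentOutOfRangeException(nameof(overlap), "Overlap must be smaller than the chunk size.");

        // Tokenize on common whitespace characters, dropping empty entries.
        string[] words = transcript.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        var chunks = new List<string>();
        int step = chunkWordLimit - overlap; // distance between chunk starts

        for (int start = 0; start < words.Length; start += step)
        {
            int length = Math.Min(chunkWordLimit, words.Length - start);
            chunks.Add(string.Join(' ', words, start, length));

            // Stop once a chunk reaches the end of the transcript, so the
            // tail is not emitted again as a chunk made only of overlap.
            if (start + length >= words.Length)
                break;
        }

        return chunks;
    }
}
```
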

---

## Environment Variable Overrides

You can override `appsettings.json` values with environment variables, useful for CI or Docker:

```bash
export YouTube__ApiKey="your-key"
export LLM__ApiKey="ollama"
dotnet run
```

Note the double-underscore `__` as the section separator (standard .NET configuration convention).
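
The precedence rule can be seen in a small standalone sketch, assuming the app builds its configuration with the standard `Microsoft.Extensions.Configuration` providers (the project's own startup code may differ). An in-memory collection stands in for `appsettings.json` so the snippet does not need a file on disk, and `example-model` is a placeholder value. It needs the `Microsoft.Extensions.Configuration` and `Microsoft.Extensions.Configuration.EnvironmentVariables` NuGet packages.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

// Stand-in for `export LLM__Model="example-model"` in the shell.
Environment.SetEnvironmentVariable("LLM__Model", "example-model");

IConfiguration config = new ConfigurationBuilder()
    // Stand-in for the values that would come from appsettings.json.
    .AddInMemoryCollection(new Dictionary<string, string?>
    {
        ["LLM:Model"] = "qwen3:14b",
        ["LLM:MaxTokens"] = "1500",
    })
    // Added last, so environment variables win over earlier providers.
    // The provider maps "__" in variable names to the ":" separator.
    .AddEnvironmentVariables()
    .Build();

Console.WriteLine(config["LLM:Model"]);     // overridden by the environment variable
Console.WriteLine(config["LLM:MaxTokens"]); // no override set, so the earlier value is kept
```

Providers registered later take precedence, which is why an environment variable replaces the file value only for the keys it sets.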