May 25, 2026

How to Build a Custom YouTube Transcript Fetcher and Video Downloader in Python

As developers, we often need to build automation tools that extract content from YouTube, whether for AI summarization engines, video analysis, or automated archive pipelines. While the official YouTube Data API v3 is powerful, it enforces strict quota limits and requires an API key, which makes it a poor fit for lightweight scripts and developer tooling.

Fortunately, we can build a robust, production-grade media downloader and subtitle extractor using open-source packages like youtube-transcript-api and yt-dlp.

In this guide, we will design and implement two production-ready automation scripts:

transcript_fetcher.py: Extracts video transcripts, accepts a list of preferred language codes, and falls back to auto-generated captions when no manually created transcript exists.
video_downloader.py: Downloads video or high-quality audio files with custom format selection.

🏗️ High-Level Architecture

Our automation pipeline is modular: each script can be executed independently, or its output can be piped into downstream systems (such as LLM summarizers):

+------------------+       +----------------------------+       +------------------------+
|   YouTube URL/ID |  ==>  |  transcript_fetcher.py     |  ==>  | Output: JSON/TXT       |
|                  |       |  (youtube-transcript-api)  |       | Transcript Data        |
+------------------+       +----------------------------+       +------------------------+
         ||
         \/
+----------------------------+       +------------------------+
|    video_downloader.py     |  ==>  | Output: .mp4 / .mp3    |
|         (yt-dlp)           |       | Media Files            |
+----------------------------+       +------------------------+
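As an illustration of the downstream end of this flow, here is a minimal sketch of how transcript text might be split into bounded chunks before being handed to a summarizer. The function name chunk_text and the max_chars parameter are hypothetical and not part of either script; a real pipeline would probably size chunks by tokens rather than characters.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list:
    """Split text into chunks of at most max_chars characters, breaking on whitespace."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split into pieces.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]

        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars:
            # Adding this word would overflow the chunk, so start a new one.
            chunks.append(current)
            current = word
        else:
            current = candidate

    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to a downstream system on its own, which keeps every request under whatever size limit that system imposes.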

🛠️ Step-by-Step Implementation

Step 1: Installing Dependencies

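Create a requirements.txt file specifying the supported version ranges of the required libraries:

# requirements.txt
youtube-transcript-api>=0.6.2,<1.0
yt-dlp>=2025.01.15

The upper bound on youtube-transcript-api is deliberate: the 1.x releases changed the interface that transcript_fetcher.py relies on (the list_transcripts call and dictionary-style transcript entries), so an unpinned install may break the script.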

Install them using pip:

pip install -r requirements.txt
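Before moving on, it can help to confirm that the environment is complete. The sketch below checks that both packages are importable and that FFmpeg, which yt-dlp needs for merging streams and converting audio, is on the PATH. The helper name check_environment is hypothetical; it uses only the standard library.

```python
import importlib.util
import shutil


def check_environment(modules=("youtube_transcript_api", "yt_dlp"), tools=("ffmpeg",)) -> dict:
    """Return a mapping of dependency name -> True if it is available, else False."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        status[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH.
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, available in check_environment().items():
        print(f"{'OK' if available else 'MISSING':<8}{name}")
```

If FFmpeg is reported as missing, install it with your system's package manager before running the downloader.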

Step 2: Coding the Transcript Fetcher

Our fetcher needs to extract the unique video ID from standard watch URLs, short links, or share parameters, and query the transcripts API.

Create transcript_fetcher.py:

# transcript_fetcher.py
import sys
import re
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled, NoTranscriptFound

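def extract_video_id(url_or_id: str) -> str:
    """
    Extracts the 11-character YouTube video ID from a URL or raw ID string.
    """
    candidate = url_or_id.strip()

    # A bare 11-character ID is accepted as-is.
    if re.fullmatch(r"[a-zA-Z0-9_-]{11}", candidate):
        return candidate

    # Otherwise the ID must directly follow a known URL marker. Making the
    # marker mandatory stops unrelated 11-character runs (a long hostname,
    # a playlist ID) from being mistaken for the video ID.
    pattern = r"(?:[?&]v=|/shorts/|/embed/|/v/|youtu\.be/)([a-zA-Z0-9_-]{11})(?![a-zA-Z0-9_-])"
    match = re.search(pattern, candidate)
    if match:
        return match.group(1)
    raise ValueError(f"Could not extract YouTube video ID from input: {url_or_id}")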

def fetch_transcript(video_id: str, languages=("en",)) -> list:
    """
    Retrieves the transcript entries for a given YouTube video ID.
    Attempts manual transcripts first, then falls back to auto-generated.
    Returns an empty list if no transcript is available.
    """
    # Note: the default is a tuple, not a list, to avoid a shared mutable default.
    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)

        # Try fetching a manually created transcript first
        try:
            return transcript_list.find_manually_created_transcript(languages).fetch()
        except NoTranscriptFound:
            # Fall back to auto-generated captions
            return transcript_list.find_generated_transcript(languages).fetch()

    except TranscriptsDisabled:
        print(f"[!] Error: Transcripts are disabled for video {video_id}")
        return []
    except NoTranscriptFound:
        # Neither a manual nor an auto-generated transcript matched the languages
        print(f"[!] Error: No transcript in {list(languages)} for video {video_id}")
        return []
    except Exception as e:
        print(f"[!] Unexpected error fetching transcript: {e}")
        return []

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python transcript_fetcher.py <youtube_url_or_id>")
        sys.exit(1)
        
    input_str = sys.argv[1]
    try:
        vid_id = extract_video_id(input_str)
        print(f"[*] Fetching transcript for Video ID: {vid_id}")
        
        transcript_data = fetch_transcript(vid_id)
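        if transcript_data:
            # Join the text snippets into a single string
            full_text = " ".join(entry['text'] for entry in transcript_data)
            print("\n--- Transcript ---")
            # Only mark the output as truncated when it actually is
            if len(full_text) > 500:
                print(full_text[:500] + "...\n[Truncated]")
            else:
                print(full_text)
        else:
            print("[!] No transcript found.")
            sys.exit(1)
    except Exception as e:
        print(f"[!] Execution failed: {e}")
        sys.exit(1)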
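The script above only prints a preview, while the architecture diagram lists JSON/TXT as the output. One possible way to produce those formats is sketched below. It assumes each transcript entry is a dictionary with 'text', 'start', and 'duration' keys, which is the shape returned by the 0.6.x releases of youtube-transcript-api; the helper names are hypothetical.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Convert a start offset in seconds to HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def transcript_to_text(entries: list) -> str:
    """Render one timestamped line per transcript entry."""
    return "\n".join(
        f"[{format_timestamp(entry['start'])}] {entry['text']}" for entry in entries
    )


def transcript_to_json(entries: list) -> str:
    """Serialize the entries as pretty-printed JSON, keeping non-ASCII text readable."""
    return json.dumps(entries, ensure_ascii=False, indent=2)
```

Either string can then be written to disk with a normal open(path, "w", encoding="utf-8") call, or printed to stdout so that another process can consume it.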

Step 3: Coding the Media Downloader

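Now we build the media downloader with yt-dlp. Its built-in format selectors let us download either the best available video or an audio-only stream that is converted to MP3. Both the MP3 conversion and the merging of separate video and audio streams require FFmpeg to be installed and available on the PATH.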

Create video_downloader.py:

# video_downloader.py
import sys
import argparse
import yt_dlp

def download_video(url: str, audio_only: bool = False, output_dir: str = "downloads"):
    """
    Downloads a YouTube video using yt-dlp.
    Supports video download or audio-only extraction.
    """
    ydl_opts = {
        'outtmpl': f'{output_dir}/%(title)s.%(ext)s',
        'quiet': False,
        'no_warnings': True,
    }
    
    if audio_only:
        ydl_opts.update({
            'format': 'bestaudio/best',
            'postprocessors': [{
                'key': 'FFmpegExtractAudio',
                'preferredcodec': 'mp3',
                'preferredquality': '192',
            }],
        })
    else:
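        # Downloads best video + best audio and merges them (requires FFmpeg).
        # merge_output_format asks yt-dlp to produce an .mp4 container.
        ydl_opts.update({
            'format': 'bestvideo+bestaudio/best',
            'merge_output_format': 'mp4',
        })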
        
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        try:
            print(f"[*] Starting download for: {url}")
            ydl.download([url])
            print("[*] Download completed successfully!")
            return True
        except Exception as e:
            print(f"[!] Download failed: {e}")
            return False

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Download YouTube videos or audio tracks.")
    parser.add_argument("url", help="YouTube video or shorts URL.")
    parser.add_argument("--audio-only", action="store_true", help="Extract audio to MP3.")
    args = parser.parse_args()

    # Exit with a non-zero status on failure so callers and pipelines can detect it
    success = download_video(args.url, audio_only=args.audio_only)
    sys.exit(0 if success else 1)
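The format strings in the downloader always request the best available quality. yt-dlp's selector syntax also supports filters such as [height<=720], so a resolution cap is one natural extension. The sketch below only builds the options dictionary and does not call yt-dlp; build_format_options and max_height are hypothetical names, not part of the script above.

```python
from typing import Optional


def build_format_options(max_height: Optional[int] = None, audio_only: bool = False) -> dict:
    """Build the 'format' entry of a yt-dlp options dict, optionally capping video height."""
    if audio_only:
        return {"format": "bestaudio/best"}
    if max_height is None:
        return {"format": "bestvideo+bestaudio/best"}
    if max_height <= 0:
        raise ValueError("max_height must be a positive number of pixels")

    # Prefer separate video and audio streams within the cap, and fall back
    # to the best single combined file within the same cap.
    selector = f"bestvideo[height<={max_height}]+bestaudio/best[height<={max_height}]"
    return {"format": selector}
```

The result could be merged into the downloader's options with ydl_opts.update(build_format_options(max_height=720)) in place of the hard-coded format strings.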

🧪 Verification & Testing

Let’s verify both scripts on a popular video (e.g., Rick Astley’s classic):

# 1. Test the Transcript Fetcher
python transcript_fetcher.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# 2. Test the Audio Downloader
python video_downloader.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --audio-only

You should see:

The first 500 characters of the transcript printed in the terminal, fetched live from YouTube.
A newly created downloads/ directory containing the audio track converted to a 192 kbps .mp3 file.
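The second check can also be automated. Here is a small sketch, using only the standard library, that lists the non-empty .mp3 files in the output directory. The function name list_downloaded_audio is hypothetical, and the default directory matches the downloads/ folder used by the downloader.

```python
from pathlib import Path


def list_downloaded_audio(output_dir: str = "downloads", extension: str = ".mp3") -> list:
    """Return the sorted paths of non-empty files with the given extension in output_dir."""
    directory = Path(output_dir)
    if not directory.is_dir():
        # The downloader has not created the folder yet.
        return []
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file()
        and path.suffix.lower() == extension
        and path.stat().st_size > 0
    )


if __name__ == "__main__":
    files = list_downloaded_audio()
    if files:
        for path in files:
            print(f"[*] Found {path.name} ({path.stat().st_size} bytes)")
    else:
        print("[!] No audio files found in downloads/")
```

An empty result after a run that reported success usually points to a problem in the FFmpeg conversion step.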

🚀 Conclusion

By building on two open-source packages and handling the common failure cases (disabled transcripts, missing languages, failed downloads), we have avoided the need for API keys and quota management. These scripts can serve as the foundational backend for custom AI agents, automated clip generators, and content backup routines.

What automation features will you build next? Let me know in the comments below!