# Nihongo Con Teppei Downloader

A Python script to download audio files and subtitles from the Nihongo Con Teppei podcast.

## Features

- Download individual episodes or ranges of episodes
- Skips files that already exist in the output directory (override with `--force`)
- Configurable delay between requests to reduce the risk of being rate-limited or banned
- Failed downloads are retried automatically, up to 3 times
- Progress tracking for batch downloads
- Clean, modular code structure

## Installation

1. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```

2. Make sure Chrome or Chromium is installed. The script drives it through Selenium to find the audio URLs.

## Usage

### Download a single episode
```bash
python teppei.py 11 --download
```

### Download a range of episodes
```bash
python teppei.py --start 1 --end 20 --download
```

### Download to a specific directory
```bash
python teppei.py --start 11 --end 15 --download --output ./teppei_episodes
```

### Force re-download existing files
```bash
python teppei.py 11 --download --force
```

### Show URLs without downloading
```bash
python teppei.py 11
```

### Customize request delays and timeouts
```bash
python teppei.py --start 1 --end 10 --download --delay 3 --timeout 60
```

## Command Line Options

- `episode_num`: Episode number to download (for single episode mode)
- `--start`: Starting episode number for range download
- `--end`: Ending episode number for range download
- `--download`, `-d`: Download the files (if not specified, only show URLs)
- `--output`, `-o`: Output directory (default: current directory)
- `--force`: Force re-download even if files already exist
- `--delay`: Delay between requests in seconds (default: 2)
- `--timeout`: HTTP request timeout in seconds (default: 30)
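
The options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the actual contents of `teppei.py`: the helper names `build_parser` and `episodes_from_args` are hypothetical, and treating `--end` as inclusive is an assumption based on the "episodes 11-15" example.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(prog="teppei.py")
    # Optional positional argument: used for single-episode mode.
    parser.add_argument("episode_num", nargs="?", type=int)
    parser.add_argument("--start", type=int)
    parser.add_argument("--end", type=int)
    parser.add_argument("--download", "-d", action="store_true")
    parser.add_argument("--output", "-o", default=".")
    parser.add_argument("--force", action="store_true")
    parser.add_argument("--delay", type=float, default=2)
    parser.add_argument("--timeout", type=float, default=30)
    return parser


def episodes_from_args(args):
    """Return the list of episode numbers selected on the command line."""
    if args.episode_num is not None:
        return [args.episode_num]
    if args.start is not None and args.end is not None:
        # Assumption: the range includes both --start and --end.
        return list(range(args.start, args.end + 1))
    raise SystemExit("Give an episode number, or both --start and --end.")


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(episodes_from_args(args))
```

Making the positional argument optional (`nargs="?"`) is what lets single-episode mode and range mode share one parser.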

## File Structure

Downloaded files are named as follows, where `{episode:02d}` is the episode number zero-padded to at least two digits:
- Audio: `Nihongo-Con-Teppei-E{episode:02d}.mp3`
- Subtitles: `Nihongo-Con-Teppei-E{episode:02d}.vtt`
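
As an illustration of the naming convention and the skip-existing check, the sketch below builds both filenames for an episode and lists the ones that still need downloading. The function names `episode_filenames` and `files_to_fetch` are hypothetical and may not match the script's internals.

```python
from pathlib import Path


def episode_filenames(episode):
    """Return (audio_name, subtitle_name) for an episode number."""
    stem = f"Nihongo-Con-Teppei-E{episode:02d}"
    return f"{stem}.mp3", f"{stem}.vtt"


def files_to_fetch(episode, output_dir=".", force=False):
    """List the filenames that are missing from output_dir (all of them if force)."""
    out = Path(output_dir)
    return [
        name
        for name in episode_filenames(episode)
        if force or not (out / name).exists()
    ]


if __name__ == "__main__":
    print(episode_filenames(7))
    print(files_to_fetch(7))
```

Note that `02d` only sets a minimum width, so episode 7 becomes `E07` while episode 123 stays `E123`.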

## Examples

```bash
# Download episodes 11-15 to a specific folder
python teppei.py --start 11 --end 15 --download --output ./japanese_lessons

# Download episode 20 with custom delay
python teppei.py 20 --download --delay 5

# Check what URLs would be downloaded without actually downloading
python teppei.py --start 1 --end 3
```

## Notes

- The script uses Selenium to scrape audio URLs from the website
- Subtitle URLs are constructed directly (no scraping needed)
- Built-in delays help prevent being rate-limited or banned
- Files are checked for existence before downloading to avoid duplicates
- Failed downloads are automatically retried up to 3 times
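
The retry behaviour described in the notes can be sketched as below. This is an illustration under stated assumptions rather than the script's own code: `download_with_retry` and its `fetch` callable are hypothetical, "up to 3 times" is read as three attempts in total, and the pause between attempts reuses the `--delay` default of 2 seconds.

```python
import time


def download_with_retry(fetch, url, retries=3, delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it succeeds or `retries` attempts have failed.

    `fetch` is any callable that returns the downloaded data or raises.
    `sleep` is injectable so the pause can be skipped in tests.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            # Pause before the next attempt, but not after the final one.
            if attempt < retries:
                sleep(delay)
    raise RuntimeError(
        f"giving up on {url} after {retries} attempts"
    ) from last_error
```

Passing the download function in as `fetch` keeps the retry loop independent of the HTTP library used for the actual request.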