rsyncd.conf
Path: nginx/rsyncd.conf | Language: Config
Rsync daemon configuration providing read-only public access to large dataset mirrors hosted by gwern.net.
Overview
This configuration file defines four rsync modules that mirror large public datasets for research and preservation purposes. The rsync daemon provides an efficient, bandwidth-friendly way to distribute multi-gigabyte archives to researchers and mirrors worldwide.
All modules are configured as read-only with generous 6000-second (100-minute) timeouts to accommodate slow connections and large file transfers. The configuration supports large-scale data distribution including the Danbooru2021 anime image dataset (5.9TB), deprecated neural network resources, and the ThisWaifuDoesNotExist.net archive.
Two of the four modules (dnmarchives and biggan) are marked as deprecated with scheduled removal dates, reflecting evolving storage priorities and the migration of content to web-based access.
Key Directives/Settings
Global Configuration
- log file = /var/log/rsyncd.log - Centralized logging for all rsync daemon activity
Modules
danbooru2021
Large-scale anime image dataset mirror
- path = /home/danbooru/torrent/danbooru2021/
- comment = Danbooru2021 mirror (see <https://gwern.net/danbooru2021>)
- list = yes - Module appears in module listings
- read only = yes - No uploads permitted
- timeout = 6000 - 100-minute timeout for large transfers
The Danbooru2021 dataset is a comprehensive archive of ~5.9 million tagged anime/manga images from the Danbooru imageboard, totaling approximately 5.9TB. It's used extensively in machine learning research, particularly for training generative models like GANs and diffusion models. This rsync mirror provides an alternative to BitTorrent for researchers who need reliable, resumable access.
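The directives above can be pictured as a single module stanza. The following is a minimal sketch that writes a reconstructed stanza to a scratch file rather than to a live rsyncd.conf. The indentation, the blank line, and the ordering are assumptions; only the directive names and values come from the listing above.

```shell
#!/bin/sh
# Sketch: write a reconstructed danbooru2021 stanza to a scratch file.
# The layout is an assumption; the directive values are quoted from the text above.
conf_file="$(mktemp)"

cat > "$conf_file" <<'EOF'
log file = /var/log/rsyncd.log

[danbooru2021]
    path = /home/danbooru/torrent/danbooru2021/
    comment = Danbooru2021 mirror (see <https://gwern.net/danbooru2021>)
    list = yes
    read only = yes
    timeout = 6000
EOF

# Count the directive lines from the module header to the end of the file.
module_directives="$(sed -n '/^\[danbooru2021]/,$p' "$conf_file" | grep -c '=')"
echo "danbooru2021 directives: $module_directives"
```

Global parameters such as log file sit above the first bracketed module header; everything after a header applies to that module only.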
dnmarchives
Darknet market research archive (DEPRECATED)
- path = /home/danbooru/torrent/dnmarchives/
- comment = DNM Archives (2011–2015) mirror ... !WARNING!: This rsync mirror is deprecated; use <https://gwern.net/doc/darknet-market/dnm-archive/file/index> instead. Do not use it; mirror any files you need. Files will be removed on 2026-01-01.
- read only = yes
- timeout = 6000
Archive of darknet marketplace data (forum posts, vendor listings, etc.) from 2011–2015, used for research into cryptomarkets and online drug trafficking. The deprecation warning points users to the web-based file index instead, and the files are scheduled for removal on January 1, 2026.
biggan
Neural network model checkpoints and datasets (DEPRECATED)
- path = /home/danbooru/biggan/
- comment = Neural Net resources: model checkpoints & datasets (BigGAN, GPT-2, StyleGAN, etc). !WARNING!: this rsync mirror is deprecated! Do not use it; mirror any files you need. Files will start being removed on 2026-01-01.
- read only = yes
- timeout = 6000
Collection of pre-trained model weights and datasets for various neural networks (BigGAN, GPT-2, StyleGAN, etc.). Also marked deprecated, with removal starting January 1, 2026. The configuration gives no reason for the deprecation; plausible factors include:
- Storage costs
- Availability of models elsewhere (Hugging Face, official repositories)
- Shift toward newer model architectures
twdne
ThisWaifuDoesNotExist.net full site mirror
- path = /home/gwern/thiswaifudoesnotexist.net/
- comment = ThisWaifuDoesNotExist.net full site mirror (all JPGs and text files). See <https://gwern.net/twdne> for background.
- read only = yes
- timeout = 6000
Complete archive of the ThisWaifuDoesNotExist.net generated anime portraits. Unlike the deprecated modules, this one has no removal warning, suggesting it's considered a permanent archive. Provides an alternative to HTTP access (see twdne.conf) for bulk downloading.
Special Features
Read-Only Public Access
All modules are configured as public, read-only mirrors with no authentication required. This supports the open science and digital preservation missions of gwern.net.
Extended Timeouts
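The 6000-second (100-minute) I/O timeout is deliberately generous. The rsync daemon's default is 0 (no timeout), and the rsyncd.conf manual suggests 600 seconds as a reasonable value for anonymous daemons. A value of 6000 still lets the daemon drop dead connections while accommodating: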
- Large file transfers (some files are multi-GB)
- Slow or unstable international connections
- Batch downloads of entire datasets
- Mirror operations that sync thousands of files
Deprecation Notices in Comments
The configuration embeds user-facing warnings directly in the comment field, which rsync prints next to the module name when a client requests the module listing. The deprecation notice therefore reaches users without separate documentation:
$ rsync gwern.net::
dnmarchives DNM Archives (2011–2015) mirror ... !WARNING!: This rsync mirror is deprecated...
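Because the warning travels in the listing itself, a mirror operator could scan the listing for it. The sketch below is illustrative only: flag_deprecated is a hypothetical helper, the sample listing is assembled from the comment strings quoted in this document, and the whitespace-separated "name comment" layout is an assumption about the listing output.

```shell
#!/bin/sh
# Sketch: flag modules whose listing comment mentions deprecation.
# flag_deprecated is a hypothetical helper; it reads listing text on stdin.
flag_deprecated() {
    # Print the first field (the module name) of each line mentioning "deprecated".
    awk 'tolower($0) ~ /deprecated/ { print $1 }'
}

# Sample listing built from comments quoted in this document (layout assumed).
sample_listing='danbooru2021    Danbooru2021 mirror (see <https://gwern.net/danbooru2021>)
dnmarchives     DNM Archives (2011–2015) mirror ... !WARNING!: This rsync mirror is deprecated...
biggan          Neural Net resources: model checkpoints & datasets (BigGAN, GPT-2, StyleGAN, etc). !WARNING!: this rsync mirror is deprecated!
twdne           ThisWaifuDoesNotExist.net full site mirror (all JPGs and text files).'

deprecated_modules="$(printf '%s\n' "$sample_listing" | flag_deprecated)"
printf '%s\n' "$deprecated_modules"
```

Against the live daemon, the same helper would be fed from the listing command, for example rsync gwern.net:: | flag_deprecated.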
Cross-Service Integration
The modules complement HTTP access:
- twdne module pairs with twdne.conf nginx configuration
- danbooru2021 supplements BitTorrent distribution
- Rsync provides resumable transfers and efficient differential updates
Use Cases
Research Dataset Distribution
The primary use case is providing large machine learning datasets to researchers:
- Danbooru2021 for training generative anime models
- BigGAN/GPT-2/StyleGAN weights for fine-tuning or analysis
- DNM archives for sociological/criminological research
Mirroring and Preservation
Rsync's efficient delta-transfer algorithm makes it ideal for:
- Creating complete local mirrors of the datasets
- Periodically syncing updates without re-downloading unchanged files
- Distributed preservation (multiple institutions can easily maintain copies)
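A periodic mirror job along these lines might look like the following sketch. mirror_cmd and MIRROR_ROOT are hypothetical names, and /srv/mirrors is an arbitrary example destination. The script prints the commands instead of running them, so it has no side effects.

```shell
#!/bin/sh
# Sketch: build the rsync command for a local mirror of one module.
# mirror_cmd and MIRROR_ROOT are hypothetical, not part of the configuration.
MIRROR_ROOT="${MIRROR_ROOT:-/srv/mirrors}"

mirror_cmd() {
    module="$1"
    # -a (archive) recurses and preserves modification times, which is what
    # lets later runs skip files that have not changed.
    printf 'rsync -a gwern.net::%s/ %s/%s/\n' "$module" "$MIRROR_ROOT" "$module"
}

# Print one command per non-deprecated module.
for module in danbooru2021 twdne; do
    mirror_cmd "$module"
done
```

Running the printed commands on a schedule keeps a local copy current; each run transfers only files that are new or changed since the previous run.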
Alternative to Web Downloads
Rsync offers advantages over HTTP:
- Built-in resume capability for interrupted transfers
- Efficient recursive directory syncing
- Bandwidth throttling via --bwlimit, and easy scheduling through cron or similar tools
- Better handling of thousands of small files
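The resume and throttling points can be combined in a small retry wrapper. This is a sketch under stated assumptions: fetch_with_retry, its argument order, and the RETRY_DELAY variable are hypothetical, and the bandwidth cap of 5000 is an arbitrary example value.

```shell
#!/bin/sh
# Sketch: retry an interrupted transfer, resuming partial files.
# fetch_with_retry, its arguments, and RETRY_DELAY are hypothetical.
fetch_with_retry() {
    src="$1"
    dest="$2"
    max_tries="${3:-5}"
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # --partial keeps partially transferred files so the next attempt can reuse them.
        # --bwlimit=5000 caps the transfer rate (units of 1024 bytes per second by default).
        if rsync -a --partial --bwlimit=5000 "$src" "$dest"; then
            return 0
        fi
        echo "attempt $try of $max_tries failed" >&2
        try=$((try + 1))
        # Pause before the next attempt, but not after the final failure.
        if [ "$try" -le "$max_tries" ]; then
            sleep "${RETRY_DELAY:-30}"
        fi
    done
    return 1
}
```

A call such as fetch_with_retry gwern.net::twdne/ ./thiswaifudoesnotexist/ would make up to five attempts and return non-zero if all of them fail.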
Example Usage
# List available modules
rsync gwern.net::
# Download Danbooru2021 dataset (5.9TB)
rsync -avz --progress gwern.net::danbooru2021/ /local/path/
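# Refresh an existing copy; unchanged files are skipped by default, and --update also skips files that are newer on the destination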
rsync -avz --progress --update gwern.net::danbooru2021/ /local/path/
# Download TWDNE archive
rsync -avz gwern.net::twdne/ ./thiswaifudoesnotexist/
See Also
- gwern.net.conf - Main nginx configuration for gwern.net
- twdne.conf - HTTP access to TWDNE archive (complements rsync)
- memoriam.sh - Memorial header script (different server component)
- redirect-nginx - URL redirect rules
- sync.sh - Build/deploy script that may update rsync modules