Text.Regex
Path: build/Text/Regex.hs | Language: Haskell | Lines: ~203
Vendored regex compatibility module providing POSIX-style regular expressions via TDFA backend.
Overview
Text.Regex is a vendored copy of the regex-compat-tdfa package, included directly in the gwern.net codebase to avoid Cabal dependency versioning conflicts. It provides a simple, high-level interface for regular expression matching and substitution using the TDFA (Tagged Deterministic Finite Automaton) regex engine.
The module offers POSIX "extended" regular expression syntax (similar to egrep) with configurable options for case sensitivity and multiline matching. It wraps the lower-level Text.Regex.TDFA and Text.Regex.Base modules into a convenient API used throughout the build system.
This vendored approach ensures stable regex behavior regardless of external package updates or version conflicts in the Haskell ecosystem.
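The wrapping described above can be illustrated roughly by skipping Text.Regex and calling regex-tdfa's `=~` operator directly. This sketch assumes only that the regex-tdfa package is installed. The helper names `citationParts` and `hasYear` are hypothetical. Asking for a 4-tuple result gives the same (before, match, after, subgroups) shape that matchRegexAll wraps in a Maybe.

```haskell
import Text.Regex.TDFA ((=~))

-- Hypothetical helper: the annotated result type selects what =~ returns.
-- A 4-tuple yields (text before, whole match, text after, subgroups).
citationParts :: String -> (String, String, String, [String])
citationParts s = s =~ "([A-Z][a-z]+) ([0-9]{4})"

-- Hypothetical helper: a Bool result only reports whether the pattern
-- matched anywhere in the input.
hasYear :: String -> Bool
hasYear s = s =~ "[0-9]{4}"
```

Text.Regex hides this result-type overloading behind fixed, String-only, Maybe-returning signatures.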
Public API
mkRegex :: String -> Regex
Creates a compiled regular expression with default options (multiline mode enabled, case-sensitive).
let pattern = mkRegex "^[A-Z][a-z]+ [0-9]{4}$"
-- Matches lines like "Smith 2020"
Default options:
- newSyntax = True (extended regex syntax)
- multiline = True (^ and $ match at line boundaries)
- Case-sensitive matching
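Here is a sketch of what those defaults amount to, written directly against regex-tdfa (assumed to be installed). `mkRegexSketch` and `mkRegexNoMultiline` are hypothetical names and are not part of Text.Regex. The second function flips only the multiline flag, to show what that flag changes.

```haskell
import Text.Regex.TDFA

-- Hypothetical re-creation of mkRegex's documented defaults, spelled out:
-- extended syntax on, multiline on, case-sensitive.
mkRegexSketch :: String -> Regex
mkRegexSketch = makeRegexOpts opts defaultExecOpt
  where
    opts = defaultCompOpt { newSyntax = True, multiline = True, caseSensitive = True }

-- Same options with multiline switched off, for contrast: ^ and $ then
-- anchor only at the start and end of the whole input.
mkRegexNoMultiline :: String -> Regex
mkRegexNoMultiline = makeRegexOpts opts defaultExecOpt
  where
    opts = defaultCompOpt { newSyntax = True, multiline = False, caseSensitive = True }
```

With multiline off, the citation pattern "^[A-Z][a-z]+ [0-9]{4}$" still matches "Smith 2020" on its own. It no longer finds that line inside "notes\nSmith 2020\nmore".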
mkRegexWithOpts :: String -> Bool -> Bool -> Regex
Creates a compiled regex with explicit multiline and case-sensitivity options.
mkRegexWithOpts pattern singleLine caseSensitive
Parameters:
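- singleLine: When True, ^ and $ match at the start and end of individual lines, and . does not match newlines. Despite its name, this corresponds to TDFA's multiline option, which mkRegex enables by default.
- caseSensitive: When True, matching is case-sensitive.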
-- Case-insensitive; ^ and $ match at line boundaries
let pattern = mkRegexWithOpts "[a-z]+" True False
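To show both flags at work, here is a small sketch using the module's own `mkRegexWithOpts` and `matchRegex`. It assumes Text.Regex is importable, either as the vendored copy or from the regex-compat-tdfa package it was taken from. The binding names are illustrative only.

```haskell
import Text.Regex (Regex, matchRegex, mkRegexWithOpts)

-- Case-insensitive and line-anchored: ^ and $ match at every line boundary.
helloAnyCase :: Regex
helloAnyCase = mkRegexWithOpts "^hello$" True False

-- Case-sensitive, with the first flag False: ^ and $ anchor only at the
-- start and end of the whole input string.
helloWholeInput :: Regex
helloWholeInput = mkRegexWithOpts "^hello$" False True
```

`helloAnyCase` finds the middle line of "intro\nHELLO\noutro". `helloWholeInput` rejects "intro\nhello" because its anchors apply only to the whole string.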
matchRegex :: Regex -> String -> Maybe [String]
Matches a regex against a string and returns the captured subgroups: Just [] if the pattern matches but has no groups, Nothing if it does not match.
matchRegex (mkRegex "([A-Z]+) ([0-9]+)") "ABC 123"
-- → Just ["ABC", "123"]
matchRegex (mkRegex "foo") "bar"
-- → Nothing
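A typical use is destructuring the subgroup list. The sketch below assumes Text.Regex is importable. `parseCitation` and `citationRegex` are hypothetical helpers, not code from the gwern.net source.

```haskell
import Text.Regex (Regex, matchRegex, mkRegex)

-- Hypothetical pattern: a capitalised surname, a space, a four-digit year.
-- Bound at top level so the regex is compiled once and shared.
citationRegex :: Regex
citationRegex = mkRegex "^([A-Z][a-z]+) ([0-9]{4})$"

-- Hypothetical helper: split a citation such as "Smith 2020" into its
-- author and year using the two captured subgroups.
parseCitation :: String -> Maybe (String, Int)
parseCitation s =
  case matchRegex citationRegex s of
    Just [author, year] -> Just (author, read year)
    _ -> Nothing
```

Because mkRegex enables multiline mode, ^ and $ anchor to any line. A citation that sits alone on one line of a longer string is therefore also accepted.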
matchRegexAll :: Regex -> String -> Maybe (String, String, String, [String])
Full match information including context around the match.
matchRegexAll (mkRegex "[0-9]+") "foo 123 bar"
-- → Just ("foo ", "123", " bar", [])
-- ^before ^match ^after ^subgroups
Returns: Just (beforeMatch, matched, afterMatch, subgroupMatches) or Nothing
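Because the result includes the text after the match, calls can be chained to walk through a string. This is a sketch assuming Text.Regex is importable. `allMatches` is a hypothetical helper written only to show how the four fields compose.

```haskell
import Text.Regex (Regex, matchRegexAll, mkRegex)

-- Hypothetical helper: collect successive non-overlapping matches by
-- re-running matchRegexAll on the text that follows each match.
allMatches :: Regex -> String -> [String]
allMatches re s =
  case matchRegexAll re s of
    Nothing -> []
    Just (_before, matched, after, _groups)
      -- Stop on an empty match: recursing on an unchanged input would loop.
      | null matched -> []
      | otherwise -> matched : allMatches re after
```

Each recursive call treats the remaining suffix as a fresh input. Anchored patterns (^, $) can therefore behave differently than in a true global search. When depending on regex-base directly is acceptable, `getAllTextMatches` is the more direct route.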
subRegex :: Regex -> String -> String -> String
Replaces all occurrences of a pattern with a replacement string.
subRegex (mkRegex "[0-9]+") "a1b2c3" "X"
-- → "aXbXcX"
Replacement syntax:
- \0 - entire match
- \1, \2, ... - captured subgroups
- \\ - literal backslash

Inside a Haskell string literal every backslash is doubled, so \1 is written "\\1".
subRegex (mkRegex "([a-z]+)([0-9]+)") "foo123" "\\2-\\1"
-- → "123-foo"
Note: Does not advance on empty matches (matches original Text.Regex behavior).
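A few more replacement patterns, sketched under the assumption that Text.Regex is importable. The function names are hypothetical. The first function uses the whole-match reference \0, which is written "\\0" inside a Haskell literal.

```haskell
import Text.Regex (mkRegex, subRegex)

-- Hypothetical: wrap every run of digits in brackets via \0 (whole match).
bracketNumbers :: String -> String
bracketNumbers s = subRegex (mkRegex "[0-9]+") s "[\\0]"

-- Hypothetical: collapse each run of whitespace to one space, using the
-- POSIX bracket class [[:space:]].
squeezeSpaces :: String -> String
squeezeSpaces s = subRegex (mkRegex "[[:space:]]+") s " "

-- Hypothetical: turn "Surname, Given" into "Given Surname" by swapping
-- the two captured subgroups.
swapName :: String -> String
swapName s = subRegex (mkRegex "([A-Za-z]+), ([A-Za-z]+)") s "\\2 \\1"
```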
splitRegex :: Regex -> String -> [String]
Splits a string on a delimiter pattern.
splitRegex (mkRegex ",[[:space:]]*") "a, b, c"
-- → ["a", "b", "c"]   (TDFA is POSIX: use [[:space:]], not Perl's \s)
splitRegex (mkRegex "::") "a::b::c"
-- → ["a", "b", "c"]
Warning: According to the upstream documentation, this does not advance and produces an infinite list if the regex matches the empty string.
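One defensive option is to reject delimiters that match the empty string before splitting, which is the case the warning above describes. This is a sketch assuming Text.Regex is importable. `safeSplitRegex` is a hypothetical wrapper, and its check is only a heuristic.

```haskell
import Text.Regex (Regex, matchRegex, mkRegex, splitRegex)

-- Hypothetical guard around splitRegex: refuse any delimiter that matches "".
-- Heuristic only: a pattern whose empty match depends on surrounding text
-- may pass this test and still match empty inside the real input.
safeSplitRegex :: Regex -> String -> Maybe [String]
safeSplitRegex delim s
  | Just _ <- matchRegex delim "" = Nothing
  | otherwise = Just (splitRegex delim s)
```

The wrapper returns Nothing rather than calling error. This leaves the choice of fallback to the caller.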
Why Vendored
The comment at the top of the file explains:
GWERN.NET: VENDORED FROM regex-compat-tdfa-0.95.1.4 DUE TO CABAL DEPENDENCY VERSIONING PROBLEMS
Haskell's package ecosystem can have version conflicts when multiple packages require different versions of the same dependency. By vendoring this module, gwern.net:
- Avoids dependency resolution failures during builds
- Ensures consistent regex behavior across all build environments
- Eliminates external package update surprises
Usage in Codebase
This module is used throughout the Haskell build tools for:
- Typography.hs: Citation pattern matching (Smith 2020, Smith & Jones 2020)
- LinkMetadata.hs: URL and metadata parsing
- Annotation modules: Extracting structured data from scraped content
- Various utilities: String manipulation and validation
Configuration
Listed in gwernnet.cabal as an exposed library module:
exposed-modules:
...
Text.Regex
...
Depends on:
- regex-base - Abstract regex interface
- regex-tdfa - TDFA implementation
See Also
- Typography.hs - Uses regex for citation formatting
- LinkMetadata.hs - URL pattern matching
- gwernnet.cabal - Build configuration listing this module
- Utils.hs - Other utility functions