
changeTag.hs

Path: build/changeTag.hs | Language: Haskell (runghc script) | Lines: ~148

CLI tool for batch add/remove operations on link tags across metadata databases


Overview

changeTag.hs is a command-line utility for modifying the tag associations of annotated links in gwern.net's metadata system. Tags are hierarchical categories (like economics/experience-curve or psychology/inner-monologue) that organize the site's extensive link annotations.

The tool operates on the GTX metadata files (me.gtx, full.gtx, half.gtx, auto.gtx), which store link annotations. It determines which file holds a given link's annotation and writes changes back to that file. A key behavior is "promotion": when tagging a link that exists only in auto.gtx (auto-generated annotations), the tool moves it to half.gtx to mark it as human-curated.

The tool supports flexible argument parsing: links (identified by a / or http prefix) and tags can be intermixed in any order. It also provides a shortcut for bidirectional tag relationships: specifying two tags and no links cross-links them via their index pages.
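A minimal sketch of the link/tag split, assuming classification is purely the prefix test described above. isLink and partitionArgs are hypothetical names, and the real main may apply further checks.

```haskell
import Data.List (isPrefixOf, partition)

-- Hypothetical predicate: an argument is a link if it starts with "/" (a local
-- path such as /doc/paper.pdf) or "http" (an external URL); otherwise a tag.
isLink :: String -> Bool
isLink s = "/" `isPrefixOf` s || "http" `isPrefixOf` s

-- Split intermixed arguments into (links, tags); `partition` keeps the
-- original relative order inside each group.
partitionArgs :: [String] -> ([String], [String])
partitionArgs = partition isLink

main :: IO ()
main = print (partitionArgs
  [ "economics/experience-curve", "/doc/paper.pdf"
  , "https://example.com", "-psychology/deprecated-tag" ])
```

This prints `(["/doc/paper.pdf","https://example.com"],["economics/experience-curve","-psychology/deprecated-tag"])`. A removal argument such as -psychology/deprecated-tag starts with -, not / or http, so it correctly lands with the tags.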


Public API

This is a standalone CLI tool, not a library module.

Command Line Interface

```shell
# Add a tag to a link
changeTag.hs "https://example.com/article" "economics/experience-curve"

# Remove a tag (hyphen prefix)
changeTag.hs "/doc/paper.pdf" "-psychology/deprecated-tag"

# Multiple links × multiple tags (cartesian product)
changeTag.hs "https://link1.com" "https://link2.com" "tag1" "tag2"

# Bidirectional tag relationship (two tags, no links)
changeTag.hs "ai/machine-learning" "cs/algorithms"

# Pipe URLs via stdin
echo "https://example.com" | changeTag.hs "psychology/cognition"
```
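This page does not spell out how piped input is combined with the arguments. The following is a minimal sketch that assumes stdin items are simply appended to the argument list before the link/tag split. mergeInputs is a hypothetical name.

```haskell
import System.Environment (getArgs)
import System.IO (hIsTerminalDevice, stdin)

-- Hypothetical helper: append whitespace-separated items read from stdin to
-- the command-line arguments, so piped URLs behave like extra link arguments.
-- Assumes piped URLs contain no spaces.
mergeInputs :: [String] -> String -> [String]
mergeInputs args piped = args ++ words piped

main :: IO ()
main = do
  args <- getArgs
  interactive <- hIsTerminalDevice stdin
  -- Only read stdin when something is piped in, so a bare invocation doesn't block.
  piped <- if interactive then return "" else getContents
  print (mergeInputs args piped)
```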

Called by: manual invocation; potentially build scripts.

Calls: GTX.readGTXFast, GTX.writeGTX, LinkMetadata.annotateLink, Tags.guessTagFromShort, Tags.listTagsAll


Internal Architecture

Data Flow

CLI args → parse into (links, tags) → for each (link, tag) pair:

  • Load all 4 GTX files
  • Find which file contains the link
  • Modify the tag list in that file (or promote from auto→half)
  • Write the changed file(s)
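The per-pair loop can be sketched as a cartesian product followed by one call per pair. changeOneTagStub is a hypothetical stand-in that only prints; the link-major ordering is an assumption of this sketch.

```haskell
-- Every link is paired with every tag (a cartesian product), link-major.
pairs :: [String] -> [String] -> [(String, String)]
pairs links tags = [ (l, t) | l <- links, t <- tags ]

-- Hypothetical stand-in for changeOneTag: the real function reloads the four
-- GTX files, finds the link, edits its tags, and writes the result back.
changeOneTagStub :: String -> String -> IO ()
changeOneTagStub link tag = putStrLn ("changeOneTag " ++ show link ++ " " ++ show tag)

main :: IO ()
main = mapM_ (uncurry changeOneTagStub)
             (pairs ["https://link1.com", "https://link2.com"] ["tag1", "tag2"])
```

Reloading all four files for every pair costs extra I/O. However, it means each edit sees the files as the previous edit left them.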

Key Data Structures

MetadataList (from LinkMetadataTypes):

```haskell
type MetadataList = [(String, MetadataItem)]
-- where MetadataItem is a 7-tuple:
-- (title, author, date, dateCreated, keyValues, tags, abstract)
```
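Because tags live in a positional tuple, reading them requires pattern-matching the sixth field. The following is a small sketch that re-declares the types locally. The keyValues type (a list of string pairs), tagsOf, and the sample entry are assumptions for illustration.

```haskell
-- Local copies of the types for illustration; the keyValues field is assumed
-- to be a list of (key, value) string pairs.
type MetadataItem = (String, String, String, String, [(String, String)], [String], String)
type MetadataList = [(String, MetadataItem)]

-- Look up a link and return its tag list (the 6th tuple field), if present.
tagsOf :: MetadataList -> String -> Maybe [String]
tagsOf ml link = fmap (\(_, _, _, _, _, tags, _) -> tags) (lookup link ml)

example :: MetadataList
example =
  [ ("/doc/paper.pdf",
     ("A Paper", "A. Author", "2020-01-01", "", [], ["economics/experience-curve"], "<p>Abstract.</p>")) ]

main :: IO ()
main = do
  print (tagsOf example "/doc/paper.pdf")    -- Just ["economics/experience-curve"]
  print (tagsOf example "/doc/missing.pdf")  -- Nothing
```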

Core Functions

| Function | Purpose |
|---|---|
| main | Parse args, identify links vs tags, dispatch |
| changeOneTag | Process a single link+tag pair |
| changeAndWriteTags | Determine the correct GTX file, write the update |
| addNewLink | Create an annotation for an unknown link, then retry |
| changeTag / addTag / removeTag | Pure list transformations |
| mvItem | Move an entry between metadata lists (auto→half promotion) |
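The pure tag transformations can be sketched as a map over the metadata list that edits only the matching entry. addTagSketch and removeTagSketch are hypothetical and take arguments in the same order as addTag i ml tag. Whether the real functions deduplicate or sort tags is not covered on this page. This sketch deduplicates with nub.

```haskell
import Data.List (nub)

-- Local copies of the metadata types (keyValues assumed to be string pairs).
type MetadataItem = (String, String, String, String, [(String, String)], [String], String)
type MetadataList = [(String, MetadataItem)]

-- Apply f to the tag list of the entry keyed by link i; leave others alone.
modifyTags :: ([String] -> [String]) -> String -> MetadataList -> MetadataList
modifyTags f i = map step
  where
    step entry@(k, (t, a, d, dc, kv, tags, ab))
      | k == i    = (k, (t, a, d, dc, kv, f tags, ab))
      | otherwise = entry

-- Hypothetical pure versions of addTag/removeTag; adding twice is a no-op.
addTagSketch :: String -> MetadataList -> String -> MetadataList
addTagSketch i ml tag = modifyTags (\ts -> nub (ts ++ [tag])) i ml

removeTagSketch :: String -> MetadataList -> String -> MetadataList
removeTagSketch i ml tag = modifyTags (filter (/= tag)) i ml

-- Collect the tags of link i (empty if absent), for inspection.
tagsFor :: String -> MetadataList -> [String]
tagsFor i ml = concat [ tags | (k, (_, _, _, _, _, tags, _)) <- ml, k == i ]

sample :: MetadataList
sample = [("/doc/paper.pdf", ("Title", "Author", "2020", "", [], ["ai"], ""))]

main :: IO ()
main = do
  let added = addTagSketch "/doc/paper.pdf" sample "economics"
  print (tagsFor "/doc/paper.pdf" added)
  print (tagsFor "/doc/paper.pdf" (removeTagSketch "/doc/paper.pdf" added "ai"))
```

Running it prints `["ai","economics"]` and then `["economics"]`. Keeping these functions pure leaves all file I/O to changeAndWriteTags.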

GTX File Priority

The tool checks files in order: me.gtx → full.gtx → half.gtx → auto.gtx

  • me.gtx: Personal/first-party annotations (highest priority)
  • full.gtx: Complete hand-written annotations
  • half.gtx: Partially curated annotations
  • auto.gtx: Auto-generated annotations (lowest priority, disposable)

When modifying, only one file is written (the one containing the link), except for auto→half promotion which writes both.
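The priority lookup and the write rule can be sketched as follows, with entries simplified to (link, tags) pairs. locate, filesToWrite, and the sample data are hypothetical. A Nothing result corresponds to the unknown-link case that addNewLink handles.

```haskell
import Data.List (find)

-- Simplified entries: (link, tags). The real files hold full MetadataItems.
type Entries = [(String, [String])]

-- Hypothetical lookup: files are given in priority order, so the first one
-- containing the link wins. Nothing means the link is unannotated.
locate :: [(FilePath, Entries)] -> String -> Maybe FilePath
locate files link = fst <$> find (\(_, es) -> any ((== link) . fst) es) files

-- Which files a tag change rewrites: only the owning file, except that an
-- auto.gtx entry is promoted, so both auto.gtx and half.gtx are written.
filesToWrite :: FilePath -> [FilePath]
filesToWrite "metadata/auto.gtx" = ["metadata/auto.gtx", "metadata/half.gtx"]
filesToWrite f                   = [f]

gtxFiles :: [(FilePath, Entries)]
gtxFiles =
  [ ("metadata/me.gtx",   [])
  , ("metadata/full.gtx", [("/doc/paper.pdf", ["ai"])])
  , ("metadata/half.gtx", [])
  , ("metadata/auto.gtx", [("https://example.com", [])])
  ]

main :: IO ()
main = do
  print (locate gtxFiles "/doc/paper.pdf")                           -- Just "metadata/full.gtx"
  print (fmap filesToWrite (locate gtxFiles "https://example.com"))
  print (locate gtxFiles "/doc/unknown.pdf")                         -- Nothing
```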


Key Patterns

Tag Negation

Prefixing a tag with - removes it instead of adding it:

```haskell
changeTag i ml tag = if head tag /= '-'
                       then addTag i ml tag
                       else removeTag i ml (tail tag)
```
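The snippet above uses head, which throws an exception on an empty string. The following is a total variant of the same dispatch that uses Data.List.stripPrefix instead. TagOp and parseTagOp are hypothetical names and are not part of changeTag.hs.

```haskell
import Data.List (stripPrefix)

-- Hypothetical representation of what a tag argument asks for.
data TagOp = AddTag String | RemoveTag String
  deriving (Show, Eq)

-- Total alternative to `head tag /= '-'`: stripPrefix returns Nothing instead
-- of crashing when the argument is empty or lacks the "-" prefix.
parseTagOp :: String -> TagOp
parseTagOp arg = case stripPrefix "-" arg of
  Just rest -> RemoveTag rest
  Nothing   -> AddTag arg

main :: IO ()
main = mapM_ (print . parseTagOp) ["economics/experience-curve", "-psychology/deprecated-tag"]
```

An empty argument now becomes AddTag "" instead of crashing. Tag validation would then be responsible for rejecting it.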

Auto-to-Half Promotion

Links in auto.gtx are considered ephemeral. Adding a tag implies human curation, so the entry is moved to half.gtx:

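```haskell
if aP then
  let (autoNew, halfNew) = mvItem a p i
  in writeUpdatedGTX a "metadata/auto.gtx" autoNew >>
     writeUpdatedGTX a "metadata/half.gtx" (changeTag i halfNew t)
-- else branch omitted in this excerpt: only the file holding the link is rewritten
```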
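The promotion step can be sketched on simplified (link, tags) lists. The entry is moved out of the auto list, appended to the half list, and then tagged there. mvItemSketch and promote are hypothetical, and their argument order is not meant to match the real mvItem a p i.

```haskell
-- Simplified entries: (link, tags); the real lists hold full MetadataItems.
type Entries = [(String, [String])]

-- Hypothetical move: take link i out of src and append it to dst.
-- If i is not in src, both lists are returned unchanged.
mvItemSketch :: Entries -> Entries -> String -> (Entries, Entries)
mvItemSketch src dst i = case lookup i src of
  Nothing   -> (src, dst)
  Just tags -> (filter ((/= i) . fst) src, dst ++ [(i, tags)])

-- Promotion: move the entry, then add the tag in its new home.
promote :: Entries -> Entries -> String -> String -> (Entries, Entries)
promote auto half i tag =
  let (auto', half') = mvItemSketch auto half i
  in (auto', [ if k == i then (k, ts ++ [tag]) else (k, ts) | (k, ts) <- half' ])

main :: IO ()
main = do
  let auto = [("https://example.com", [])] :: Entries
      half = [("/doc/other.pdf", ["ai"])] :: Entries
  print (promote auto half "https://example.com" "economics")
```

This prints `([],[("/doc/other.pdf",["ai"]),("https://example.com",["economics"])])`. Both lists change, which is why promotion writes two files.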

Bidirectional Tag Links

When exactly two tags and zero links are provided, the tool creates mutual "see also" relationships by tagging each tag's index page with the other tag:

```haskell
if length tags == 2 && null links then
  changeOneTag ("/doc/" ++ head tags ++ "/index") (tags !! 1) >>
  changeOneTag ("/doc/" ++ (tags !! 1) ++ "/index") (head tags)
-- else branch omitted in this excerpt
```
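The same decision can be expressed as a pure function that returns the (link, tag) operations to perform. In the following sketch, pattern matching on ([], [t1, t2]) replaces the length, head, and (!!) checks. crossLink and dispatch are hypothetical names, and the fallback to the cartesian product is an assumption based on the CLI examples.

```haskell
-- Mirror of the snippet above: each tag's index page gets tagged with the other.
crossLink :: String -> String -> [(String, String)]
crossLink a b =
  [ ("/doc/" ++ a ++ "/index", b)
  , ("/doc/" ++ b ++ "/index", a)
  ]

-- Hypothetical total dispatch: two tags and no links cross-link; any other
-- input falls back to the cartesian product of links and tags.
dispatch :: [String] -> [String] -> [(String, String)]
dispatch [] [t1, t2] = crossLink t1 t2
dispatch links tags  = [ (l, t) | l <- links, t <- tags ]

main :: IO ()
main = mapM_ print (dispatch [] ["ai/machine-learning", "cs/algorithms"])
```

This prints `("/doc/ai/machine-learning/index","cs/algorithms")` followed by `("/doc/cs/algorithms/index","ai/machine-learning")`.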

Tag Fuzzy Matching

Tags are resolved through guessTagFromShort, allowing abbreviated tag names:

```haskell
guessTagFromShort [] allTags $ filter (/=',') t
```
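This page does not describe guessTagFromShort's matching rules, so the following is only a toy illustration of abbreviated-tag resolution and not its algorithm. The sketch strips commas (as the call above does) and accepts an exact match. Otherwise, it accepts a short name only when that name is the final path segment of exactly one known tag. resolveTag and knownTags are hypothetical.

```haskell
import Data.List (isSuffixOf)

-- Toy resolver, NOT guessTagFromShort: exact match first; otherwise the short
-- name must equal the last path segment of exactly one known tag.
resolveTag :: [String] -> String -> Maybe String
resolveTag allTags raw
  | short `elem` allTags = Just short
  | otherwise = case filter (("/" ++ short) `isSuffixOf`) allTags of
      [unique] -> Just unique
      _        -> Nothing  -- no match, or ambiguous
  where
    short = filter (/= ',') raw  -- drop stray commas, as in the call above

knownTags :: [String]
knownTags = ["economics/experience-curve", "psychology/inner-monologue", "ai/machine-learning", "cs/algorithms"]

main :: IO ()
main = mapM_ (print . resolveTag knownTags) ["inner-monologue,", "cs/algorithms", "curve"]
```

Refusing ambiguous matches, rather than picking one arbitrarily, avoids silently applying the wrong tag.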

New Link Annotation

If a link has no existing annotation, the tool calls annotateLink to create one (e.g., by fetching metadata from arXiv), then retries:

```haskell
addNewLink tag p = do
  md <- readLinkMetadata
  returnValue <- annotateLink md (Link nullAttr [] (T.pack p, T.pack ""))
  case returnValue of
    Right _ -> changeOneTag p tag -- retry with new annotation
    -- Left (failure) branches omitted in this excerpt
```
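The create-then-retry flow can be modeled purely so that it can be tested without network access. In the following sketch, Failure echoes the Temporary/Permanent distinction listed under Error Handling, and fakeAnnotate stands in for annotateLink. Both are hypothetical, as are ensureThenEdit and the simplified database. The real addNewLink retries by calling changeOneTag again, whereas this model collapses create-then-retry into a single step.

```haskell
-- Hypothetical failure type echoing the Temporary/Permanent distinction;
-- the real type and its handling live in LinkMetadata.
data Failure = Temporary | Permanent deriving (Show, Eq)

-- Simplified database: link -> tags.
type DB = [(String, [String])]

-- Add a tag to a link, first creating an entry via the annotator if the link
-- is unknown; an annotation failure is returned unchanged to the caller.
ensureThenEdit :: (String -> Either Failure [String]) -> String -> String -> DB -> Either Failure DB
ensureThenEdit annotate link tag db =
  case lookup link db of
    Just _  -> Right (addTo db)
    Nothing -> fmap (\tags -> addTo ((link, tags) : db)) (annotate link)
  where
    addTo = map (\(l, ts) -> if l == link && tag `notElem` ts then (l, ts ++ [tag]) else (l, ts))

-- Stub annotator standing in for annotateLink (no network access).
fakeAnnotate :: String -> Either Failure [String]
fakeAnnotate l
  | l == "https://example.com/new-article" = Right []
  | l == "https://flaky.example.com"       = Left Temporary
  | otherwise                              = Left Permanent

main :: IO ()
main = do
  print (ensureThenEdit fakeAnnotate "https://example.com/new-article" "ai" [])
  print (ensureThenEdit fakeAnnotate "https://flaky.example.com" "ai" [])
```

Keeping Temporary separate from Permanent lets a caller decide to try a flaky link again later while giving up on links that can never be annotated.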

Configuration

No external configuration. Paths are hardcoded:

| Path | Purpose |
|---|---|
| metadata/me.gtx | Personal annotations |
| metadata/full.gtx | Complete annotations |
| metadata/half.gtx | Partial annotations |
| metadata/auto.gtx | Auto-generated annotations |
| doc/ | Tag directory structure |

Uses Config.Misc.root for path normalization.


Integration Points

Input Sources

  • Command line arguments
  • stdin (piped URLs)

Dependencies

  • GTX module: Read/write metadata files
  • LinkMetadata module: Annotation creation pipeline
  • Tags module: Tag validation and fuzzy matching
  • Metadata.Format: URL canonicalization

Side Effects

  • Modifies GTX files in metadata/
  • May trigger network requests via annotateLink for new links

Error Handling

  • Validates tag existence (must have directory in doc/)
  • Rejects malformed arguments (catches common typos like bare gwt or t)
  • Distinguishes Temporary vs Permanent annotation failures
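The first two checks can be sketched as a pure function over tag arguments, assuming the typos gwt and t appear as standalone arguments. The real tool checks for a doc/&lt;tag&gt;/ directory. In this sketch, the list of valid tags is instead passed in so that the check stays pure. An IO caller could build that list with System.Directory.doesDirectoryExist. validateArg and its messages are hypothetical.

```haskell
-- Hypothetical validation of a tag argument. The list of valid tags stands in
-- for the real check that a doc/<tag>/ directory exists.
validateArg :: [String] -> String -> Either String String
validateArg knownTags arg
  | arg `elem` typos     = Left ("suspicious argument, probably a typo: " ++ arg)
  | tag `elem` knownTags = Right arg
  | otherwise            = Left ("unknown tag (no doc/" ++ tag ++ "/ directory): " ++ tag)
  where
    typos = ["gwt", "t"]   -- the typos named above
    tag = case arg of      -- a leading "-" marks removal
      ('-' : rest) -> rest
      _            -> arg

main :: IO ()
main = mapM_ (print . validateArg ["ai/machine-learning"])
  ["ai/machine-learning", "-ai/machine-learning", "gwt", "ai/nonexistent"]
```

Running all arguments through such a check before any file is touched means a typo aborts the run instead of leaving some pairs applied and others not.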

See Also

  • Tags.hs - Tag utilities and fuzzy matching via guessTagFromShort
  • Config.Tags - Tag configuration with aliases and display names
  • guessTag.hs - CLI tool for testing tag expansion
  • GTX.hs - GTX file format and I/O operations
  • LinkMetadata.hs - Annotation database manager
  • Annotation.hs - URL-to-scraper dispatcher for new annotations
  • Hakyll.hs - Site generator consuming tag data