link-titler.hs
Path: build/link-titler.hs | Language: Haskell | Lines: ~128
Automatically add tooltip titles to bare Markdown links using the metadata database
Overview
link-titler.hs is a utility script that enriches Markdown files by adding tooltip titles to links that lack them. When writing Markdown, authors often write links like [text](url) without specifying a title attribute. This script looks up each URL in the link metadata database and generates a standardized citation tooltip in the form "'Title', Author et al 2024".
The script serves two purposes. First, it makes Markdown source files more readable during authoring: the live site adds these tooltips at compile time anyway, but they are not visible in the source while writing. Second, it improves full-text search by embedding author, title, and year information directly in the source. It operates on both standalone Markdown files and the HTML abstracts stored in annotation metadata.
A key design decision is that the script edits raw text rather than round-tripping through Pandoc. This preserves the original formatting and line breaks, and avoids the VCS diff noise that Pandoc's normalized output style would introduce.
Public API
main :: IO ()
Entry point. Reads the metadata database, processes all Markdown files passed as command-line arguments in parallel, then walks all HTML annotations to add titles there as well.
Called by: Command line, cron jobs
Calls: readLinkMetadataSlow, addTitlesToFile, walkAndUpdateLinkMetadata
addTitlesToFile :: Metadata -> String -> IO ()
Process a single Markdown file. Parses the file, extracts links, identifies those without titles, looks up metadata, generates citation strings, and writes the updated file.
Called by: main
Calls: parseMarkdownOrHTML, extractURLsAndAnchorTooltips, authorsToCite, writeUpdatedFile
addTitlesToHTML :: Metadata -> (String, MetadataItem) -> IO (String, MetadataItem)
Process an annotation's HTML abstract field. Similar logic to addTitlesToFile but operates on HTML content within annotation records and uses HTML attribute syntax (title="...") instead of Markdown syntax.
Called by: walkAndUpdateLinkMetadata (as callback)
Calls: parseMarkdownOrHTML, extractURLsAndAnchorTooltips, authorsToCite
textSimplifier :: T.Text -> T.Text
Normalize text for comparison by lowercasing, removing punctuation/whitespace, and stripping HTML tags. Used to detect when anchor text already matches the title (avoiding redundant tooltips).
Called by: addTitlesToFile, addTitlesToHTML
Calls: none (pure function)
Internal Architecture
Data Flow
Markdown File
↓
parseMarkdownOrHTML (via Query module)
↓
extractURLsAndAnchorTooltips → [(URL, [AnchorText])]
↓
Filter to links with exactly 1 anchor (no existing title)
↓
Look up each URL in Metadata database
↓
Generate citation: "'Title', AuthorsToCite Year"
↓
Text replacement: "url)" → "url \"title\")"
↓
Write updated file
Link Detection Logic
The script identifies "untitled" links by looking for URLs that appear with exactly one anchor text. In Pandoc's representation, a link with a title would have the title as additional text, so single-anchor links are bare. Links where the anchor text already matches the title (after simplification) are skipped to avoid redundancy.
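The selection step can be sketched as follows. This is a minimal illustration, assuming the extracted links arrive as (URL, [text]) pairs as in the data-flow diagram. `untitledLinks` is a hypothetical name, not a function from the script, and plain `String` stands in for whatever text types the script actually uses.

```haskell
-- Hypothetical helper, not taken from link-titler.hs.
-- Keep only URLs that occur with exactly one associated text (the anchor),
-- i.e. links that carry no title yet. Pairs whose list does not have
-- exactly one element fail the pattern match and are silently dropped.
untitledLinks :: [(String, [String])] -> [(String, String)]
untitledLinks pairs = [ (url, anchor) | (url, [anchor]) <- pairs ]
```

The similarity check against the title would then be applied to each surviving (URL, anchor) pair.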
Citation Format
Generated tooltips follow the pattern:
'Title Here', Author1 et al 2024
The title is single-quoted (with internal quotes escaped), followed by a comma and the author citation generated by authorsToCite.
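As a rough sketch of this format, the function below assembles a tooltip from a title and an already formatted author citation. `formatTooltip` is a hypothetical name, and the escaping rule is an assumption: double quotes are backslash-escaped here so the tooltip cannot terminate a surrounding Markdown "..." title. The script's actual escaping may differ.

```haskell
-- Hypothetical helper, not taken from link-titler.hs.
-- `cite` is the string that authorsToCite would supply, e.g. "Author1 et al 2024".
-- Assumption: internal double quotes are escaped with a backslash.
formatTooltip :: String -> String -> String
formatTooltip title cite = "'" ++ concatMap escape title ++ "', " ++ cite
  where
    escape '"' = "\\\""   -- a backslash followed by a double quote
    escape c   = [c]
```

For example, `formatTooltip "Title Here" "Author1 et al 2024"` evaluates to `'Title Here', Author1 et al 2024`.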
Key Patterns
Raw Text Editing
Rather than parse → transform → serialize (which would reformat the entire file), the script does targeted string replacement:
T.replace (url `T.append` ")")
(url `T.append` " \"" `T.append` titleNew `T.append` "\")")
This finds every occurrence of url) and replaces it with url "title"), preserving all other formatting.
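For experimentation, the fragment above can be wrapped in a small self-contained function. `insertTitle` is a hypothetical name chosen for this sketch; only `T.replace` and `T.append` from `Data.Text` are used.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Hypothetical wrapper around the replacement shown above.
-- Rewrites every literal occurrence of  url)  in the body to  url "title") .
insertTitle :: T.Text -> T.Text -> T.Text -> T.Text
insertTitle url titleNew =
  T.replace (url `T.append` ")")
            (url `T.append` " \"" `T.append` titleNew `T.append` "\")")
```

Because the needle includes the closing parenthesis, a link that already has a title (`url "old")`) does not match and is left alone. The match is purely textual, though, so any other place in the file where the URL is directly followed by `)` would be rewritten as well.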
Similarity Detection
The textSimplifier function aggressively normalizes text to catch near-matches:
- Lowercase everything
- Remove all punctuation and whitespace
- Strip HTML formatting tags (<em>, <strong>, etc.)
This prevents adding tooltips when the anchor text is essentially the same as the title but with minor formatting differences.
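The normalization can be approximated with a few lines of `Data.Text`. This is an illustrative re-implementation under assumptions, not the script's code: `simplify` and `redundant` are hypothetical names, and the tag list is limited to the two tags named above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isPunctuation, isSpace)
import qualified Data.Text as T

-- Hypothetical stand-in for textSimplifier; the real function may handle
-- more tags and characters.
simplify :: T.Text -> T.Text
simplify = T.filter keep . stripTags . T.toLower
  where
    keep c = not (isPunctuation c || isSpace c)
    -- Tags are removed before punctuation: dropping punctuation first would
    -- delete the '/' in closing tags, so they would no longer match.
    stripTags t = foldr (\tag acc -> T.replace tag "" acc) t tags
    tags :: [T.Text]
    tags = ["<em>", "</em>", "<strong>", "</strong>"]

-- True when the anchor text adds nothing beyond the title.
redundant :: T.Text -> T.Text -> Bool
redundant anchor title = simplify anchor == simplify title
```

With this sketch, `redundant "<em>GPT-3</em>: Language Models" "gpt3 language models"` is `True`, so no tooltip would be added for such a link.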
Parallel Processing
Files are processed in parallel using Control.Monad.Parallel.mapM_, making it efficient for batch updates across many Markdown files.
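To illustrate what a parallel `mapM_` does, here is a minimal stand-in built only on `forkIO` and `MVar` from base. It is not the `Control.Monad.Parallel` implementation the script uses, and `forkMapM_` is a hypothetical name. The sketch starts one thread per item and waits for all of them to finish.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)

-- Hypothetical base-only stand-in for a parallel mapM_.
-- Each action runs in its own thread; the caller blocks until all are done.
forkMapM_ :: (a -> IO ()) -> [a] -> IO ()
forkMapM_ f xs = do
  dones <- mapM fork xs
  mapM_ takeMVar dones
  where
    fork x = do
      done <- newEmptyMVar
      -- `finally` signals completion even if the action throws,
      -- so the waiting thread cannot block forever.
      _ <- forkIO (f x `finally` putMVar done ())
      return done
```

In the script's setting, the per-item action would be the per-file processing function applied to the loaded metadata.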
Configuration
No explicit configuration. Behavior is controlled by:
- Metadata database: Links only get titles if they exist in the database with non-empty title, author, and date fields
- Command-line arguments: Which Markdown files to process
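The database condition can be sketched as a guarded lookup. The types below are simplified, hypothetical stand-ins: the sketch assumes the metadata database behaves like a map from URL to a record, and `Item`, `Db`, and `lookupCitable` are names invented for this example rather than the script's own `Metadata` and `MetadataItem` definitions.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical, simplified stand-ins for the script's metadata types;
-- the real types carry more fields.
data Item = Item
  { itemTitle  :: String
  , itemAuthor :: String
  , itemDate   :: String
  }

type Db = M.Map String Item

-- Return the item only if the URL is known and title, author, and date
-- are all non-empty; otherwise the link gets no tooltip.
lookupCitable :: Db -> String -> Maybe Item
lookupCitable db url = case M.lookup url db of
  Just item
    | all (not . null) [itemTitle item, itemAuthor item, itemDate item] -> Just item
  _ -> Nothing
```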
Integration Points
Imports From
| Module | Functions Used |
|---|---|
| LinkID | authorsToCite - generate "Author et al Year" strings |
| LinkMetadata | walkAndUpdateLinkMetadata, readLinkMetadataSlow |
| LinkMetadataTypes | Metadata, MetadataItem type definitions |
| Query | extractURLsAndAnchorTooltips, parseMarkdownOrHTML |
| Utils | printGreen, writeUpdatedFile, replace, deleteMany |
Shared State
- Metadata database: Read-only access to look up URL metadata
- File system: Reads and writes Markdown files, updates annotation database entries
Invocation
Typically run from cron or manually during authoring:
runghc build/link-titler.hs path/to/file.md [more files...]
Limitations
Documented warnings in the source:
- Over-insertion: Deciding title/anchor similarity is imprecise; minor differences can trigger redundant tooltips
- Markdown subset: Does not handle raw HTML <a> tags, automatic links, reference links, or shortcut reference links
- Table breakage: Can break space-separated "simple" tables (rare edge case)
See Also
- LinkMetadata.hs - The metadata database this script queries
- link-tooltip.hs - Related tool that extracts metadata from tooltips (reverse direction)
- LinkID.hs - Provides authorsToCite for citation formatting
- Annotation.hs - URL scraping that populates the metadata
- link-prioritize.hs - Identifies links needing titles
- Typography.hs - Other Pandoc AST transformations