link-titler.hs

Path: build/link-titler.hs | Language: Haskell | Lines: ~128

Automatically add tooltip titles to bare Markdown links using the metadata database


Overview

link-titler.hs is a utility script that enriches Markdown files by adding tooltip titles to links that lack them. When writing Markdown, authors often write bare links such as [text](url) without a title attribute. The script looks up each URL in the link metadata database and generates a standardized citation tooltip of the form "'Title', Author et al 2024".

The script serves two purposes. First, it makes Markdown source files more readable during authoring: the live site adds these tooltips at compile time anyway, but they are not visible in the source while writing. Second, it improves full-text search by embedding author/title/year information directly in the source. It operates on both standalone Markdown files and the HTML abstracts stored in annotation metadata.

A key design decision is that the script edits raw text rather than round-tripping through Pandoc. This preserves the original formatting and line breaks, and it avoids the VCS diff noise that Pandoc's normalized output style would cause.


Public API

main :: IO ()

Entry point. Reads the metadata database, processes all Markdown files passed as command-line arguments in parallel, then walks all HTML annotations to add titles there as well.

Called by: Command line, cron jobs | Calls: readLinkMetadataSlow, addTitlesToFile, walkAndUpdateLinkMetadata


addTitlesToFile :: Metadata -> String -> IO ()

Process a single Markdown file. Parses the file, extracts links, identifies those without titles, looks up metadata, generates citation strings, and writes the updated file.

Called by: main | Calls: parseMarkdownOrHTML, extractURLsAndAnchorTooltips, authorsToCite, writeUpdatedFile


addTitlesToHTML :: Metadata -> (String, MetadataItem) -> IO (String, MetadataItem)

Process an annotation's HTML abstract field. Similar logic to addTitlesToFile but operates on HTML content within annotation records and uses HTML attribute syntax (title="...") instead of Markdown syntax.

Called by: walkAndUpdateLinkMetadata (as callback) | Calls: parseMarkdownOrHTML, extractURLsAndAnchorTooltips, authorsToCite


textSimplifier :: T.Text -> T.Text

Normalize text for comparison by lowercasing, removing punctuation/whitespace, and stripping HTML tags. Used to detect when anchor text already matches the title (avoiding redundant tooltips).

Called by: addTitlesToFile, addTitlesToHTML | Calls: none (pure function)


Internal Architecture

Data Flow

  1. Markdown file (input)
  2. parseMarkdownOrHTML (via Query module)
  3. extractURLsAndAnchorTooltips → [(URL, [AnchorText])]
  4. Filter to links with exactly 1 anchor (no existing title)
  5. Look up each URL in Metadata database
  6. Generate citation: "'Title', AuthorsToCite Year"
  7. Text replacement: "url)" → "url \"title\")"
  8. Write updated file

The script identifies "untitled" links by looking for URLs that appear with exactly one anchor text. In Pandoc's representation, a link with a title would have the title as additional text, so single-anchor links are bare. Links where the anchor text already matches the title (after simplification) are skipped to avoid redundancy.
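
The single-anchor filter described above can be sketched as follows. This is a minimal illustration, assuming the extractor's result has the shape [(URL, [AnchorText])] shown in the data flow; the names LinkTexts, untitledLinks, and example are hypothetical and do not come from the script.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Hypothetical stand-in for the output of extractURLsAndAnchorTooltips:
-- each URL paired with the texts associated with it.
type LinkTexts = (T.Text, [T.Text])

-- Keep only links with exactly one associated text, i.e. an anchor and no title.
-- Entries whose list does not match the pattern [anchor] are silently skipped.
untitledLinks :: [LinkTexts] -> [(T.Text, T.Text)]
untitledLinks links = [ (url, anchor) | (url, [anchor]) <- links ]

-- Illustrative data only: the second link already carries a title.
example :: [LinkTexts]
example =
  [ ("https://example.com/a", ["some anchor"])
  , ("https://example.com/b", ["other anchor", "'Existing Title', Doe 2020"])
  ]
```

Applied to example, untitledLinks keeps only the first link, since the second has two associated texts.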

Citation Format

Generated tooltips follow the pattern:

'Title Here', Author1 et al 2024

The title is single-quoted (with internal quotes escaped), followed by a comma and the author citation generated by authorsToCite.
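
As a rough sketch of this format: the function below builds the tooltip from a title and an already-computed author citation. The escaping rule (swapping straight quotes for typographic ones) is an assumption made for illustration, since this page only says internal quotes are escaped. The names escapeQuotes and citationTooltip are hypothetical, and the author citation is passed in as plain text rather than produced by authorsToCite.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Assumed escaping rule: straight quotes become typographic ones so they
-- cannot terminate the surrounding '...' or a Markdown "..." title.
escapeQuotes :: T.Text -> T.Text
escapeQuotes = T.replace "\"" "\8221" . T.replace "'" "\8217"

-- Build "'Title', Author et al Year" from a title and an author citation.
-- The citation argument stands in for the output of authorsToCite.
citationTooltip :: T.Text -> T.Text -> T.Text
citationTooltip title authorCite =
  T.concat ["'", escapeQuotes title, "', ", authorCite]
```

For example, citationTooltip "Title Here" "Author1 et al 2024" yields 'Title Here', Author1 et al 2024.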


Key Patterns

Raw Text Editing

Rather than parse → transform → serialize (which would reformat the entire file), the script does targeted string replacement:

T.replace (url `T.append` ")")
          (url `T.append` " \"" `T.append` titleNew `T.append` "\")")

This finds each occurrence of url) and replaces it with url "title"), preserving all other formatting.
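
To make the effect concrete, here is a small self-contained sketch that wraps the replacement above in a function. The wrapper name addTitle and its argument order are hypothetical; only the T.replace expression comes from this page.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Insert a quoted title after every occurrence of "url)" in the document text.
-- A link that already has a title is left alone, because there the URL is
-- followed by a space rather than by the closing parenthesis.
addTitle :: T.Text -> T.Text -> T.Text -> T.Text
addTitle url titleNew =
  T.replace (url `T.append` ")")
            (url `T.append` " \"" `T.append` titleNew `T.append` "\")")
```

Because the edit is a plain substring replacement, text outside the matched url) span, including line breaks and indentation, passes through unchanged.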

Similarity Detection

The textSimplifier function aggressively normalizes text to catch near-matches:

  • Lowercase everything
  • Remove all punctuation and whitespace
  • Strip HTML formatting tags (<em>, <strong>, etc.)

This prevents adding tooltips when the anchor text is essentially the same as the title but with minor formatting differences.
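
The three normalization steps can be sketched as below. This is an approximation under stated assumptions, not the script's actual textSimplifier: the tag list is a hypothetical short sample, and "punctuation and whitespace" is approximated by dropping every non-alphanumeric character. The names formattingTags, simplify, and redundant are invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isAlphaNum)
import qualified Data.Text as T

-- Hypothetical sample of formatting tags; the script's own list is not shown here.
formattingTags :: [T.Text]
formattingTags = ["<em>", "</em>", "<strong>", "</strong>"]

-- Strip the tags first (otherwise "<em>" would survive as "em"),
-- then lowercase, then drop everything that is not a letter or digit.
simplify :: T.Text -> T.Text
simplify = T.filter isAlphaNum . T.toLower . stripTags
  where
    stripTags t = foldr (\tag acc -> T.replace tag "" acc) t formattingTags

-- A tooltip is considered redundant when anchor and title simplify to the same text.
redundant :: T.Text -> T.Text -> Bool
redundant anchor title = simplify anchor == simplify title
```

With this sketch, redundant "The <em>Great</em> Title!" "the great title" is True, so no tooltip would be added for that link.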

Parallel Processing

Files are processed in parallel using Control.Monad.Parallel.mapM_, making it efficient for batch updates across many Markdown files.
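
For readers unfamiliar with the pattern, the sketch below shows what a parallel mapM_ does, using only base (forkIO and MVar). It is a stand-in written for illustration, not the implementation in Control.Monad.Parallel, and the name parallelMapM_ is hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (finally)

-- Run the action on every element in its own thread, then wait for all of them.
-- Each thread signals completion through its own MVar, even if the action throws.
parallelMapM_ :: (a -> IO ()) -> [a] -> IO ()
parallelMapM_ action xs = do
  dones <- mapM spawn xs
  mapM_ takeMVar dones
  where
    spawn x = do
      done <- newEmptyMVar
      _ <- forkIO (action x `finally` putMVar done ())
      return done
```

Since each Markdown file is read and rewritten independently, per-file work of this kind needs no coordination beyond waiting for every thread to finish.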


Configuration

No explicit configuration. Behavior is controlled by:

  • Metadata database: Links only get titles if they exist in the database with non-empty title, author, and date fields
  • Command-line arguments: Which Markdown files to process
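
The database condition in the first bullet can be sketched as a lookup that succeeds only when all three fields are present. The record below is a hypothetical, simplified stand-in; the real Metadata and MetadataItem types live in LinkMetadataTypes and are not reproduced on this page.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical simplified record; not the script's MetadataItem.
data Item = Item
  { itemTitle  :: String
  , itemAuthor :: String
  , itemDate   :: String
  } deriving (Eq, Show)

-- Hypothetical database shape: URL mapped to its metadata.
type Db = M.Map String Item

-- Return the item only if the URL is known and title, author, and date are all non-empty.
eligible :: Db -> String -> Maybe Item
eligible db url = case M.lookup url db of
  Just item
    | all (not . null) [itemTitle item, itemAuthor item, itemDate item] -> Just item
  _ -> Nothing
```

A URL that is missing from the database, or present with any of the three fields empty, yields Nothing and its link is left untouched.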

Integration Points

Imports From

Module | Functions Used
LinkID | authorsToCite (generates "Author et al Year" strings)
LinkMetadata | walkAndUpdateLinkMetadata, readLinkMetadataSlow
LinkMetadataTypes | Metadata, MetadataItem type definitions
Query | extractURLsAndAnchorTooltips, parseMarkdownOrHTML
Utils | printGreen, writeUpdatedFile, replace, deleteMany

Shared State

  • Metadata database: Read-only access to look up URL metadata
  • File system: Reads and writes Markdown files, updates annotation database entries

Invocation

Typically run from cron or manually during authoring:

runghc build/link-titler.hs path/to/file.md [more files...]

Limitations

Documented warnings in the source:

  1. Over-insertion: Deciding title/anchor similarity is imprecise; minor differences can trigger redundant tooltips
  2. Markdown subset: Does not handle raw HTML <a> tags, automatic links, reference links, or shortcut reference links
  3. Table breakage: Can break space-separated "simple" tables (rare edge case)

See Also