LinkBacklink.hs

Path: build/LinkBacklink.hs | Language: Haskell | Lines: ~130

Utility functions for reading and writing the backlinks database


Overview

LinkBacklink.hs is the inverse of Query.hs: while Query extracts forward links from documents (links pointing out), this module handles backlinks (links pointing in to a page). Backlinks enable reverse-citation navigation—showing all pages that reference the current page.

Because backlinks are a global property across all documents, they cannot be computed locally. The module provides IO functions for reading and writing metadata/backlinks.hs, a Haskell-readable database generated by generateBacklinks.hs. The database is also used as a convenient registry of all URLs in the site.

The module also provides path-generation utilities for the four types of "x-link" metadata snippets: annotations, backlinks, link-bibliographies, and similar-links. Annotations live directly in metadata/annotation/; the other three types each have a subdirectory under it. All snippet filenames are URL-encoded.


Public API

Reads and parses metadata/backlinks.hs into an in-memory map. Returns M.empty if the file doesn't exist or is empty.

Called by: generateBacklinks.hs:main', any module needing backlink data
Calls: Config.Misc.cd, doesFileExist, TIO.readFile
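
The read behaviour described above can be sketched as follows. This is a minimal illustration and not the module's actual code: the name readBacklinksFrom and its path parameter are hypothetical, and it assumes the file holds a Haskell list literal of key/value pairs that the derived Read instances can parse.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.Directory (doesFileExist)

type Backlinks = M.Map T.Text [(T.Text, [T.Text])]

-- Hypothetical reader: takes the database path as an argument
-- instead of hard-coding metadata/backlinks.hs.
readBacklinksFrom :: FilePath -> IO Backlinks
readBacklinksFrom path = do
  exists <- doesFileExist path
  if not exists
    then pure M.empty                      -- missing file: empty database
    else do
      contents <- TIO.readFile path
      if T.null (T.strip contents)
        then pure M.empty                  -- empty file: empty database
        else pure (M.fromList (read (T.unpack contents)))
```

Note that read throws on malformed input; a production reader would probably want readMaybe and an explicit error message.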

Serializes the backlinks map and writes to metadata/backlinks.hs using ppShow for human-readable formatting.

Called by: generateBacklinks.hs:main'
Calls: Utils.writeUpdatedFile

getBackLink converts a URL to its backlink snippet paths. Returns (rawPath, urlEncodedPath).

Example: /improvement → ("metadata/annotation/backlink/%2Fimprovement.html", "/metadata/annotation/backlink/%252Fimprovement.html")

Called by: Templates, hakyll.hs
Calls: getXLink

Similar to getBackLink but for annotation, link-bibliography, and similar-link snippets respectively.

getBackLinkCheck, getAnnotationLinkCheck, getLinkBibLinkCheck, getSimilarLinkCheck :: FilePath -> IO (FilePath, FilePath)

IO versions that check for file existence on disk. Return ("", "") if the snippet doesn't exist.
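
A minimal sketch of the existence check, assuming it wraps an already-computed (on-disk path, href) pair. The helper name checkPaths is hypothetical and does not appear in the module.

```haskell
import System.Directory (doesFileExist)

-- Hypothetical helper: keep the path pair only if the on-disk
-- snippet exists, otherwise return a pair of empty strings.
checkPaths :: (FilePath, FilePath) -> IO (FilePath, FilePath)
checkPaths (onDisk, href) = do
  exists <- doesFileExist onDisk
  pure (if exists then (onDisk, href) else ("", ""))
```

Returning ("", "") rather than a Maybe lets callers test for an empty string, which is convenient in template code.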

getBackLinkCount :: FilePath -> IO Int

Counts backlinks by reading the HTML snippet and counting occurrences of "[backlink context]".

Called by: Template rendering for backlink counts

getSimilarLinkCount :: FilePath -> IO Int

Counts similar links by counting occurrences of class="link-annotated id-not backlink-not" in the similar-links snippet.
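
Both counters reduce to counting non-overlapping occurrences of a fixed marker string. A pure sketch using Data.Text's count is shown here; the names countOccurrences, backlinkMarker, and similarMarker are illustrative, and the real functions additionally read the snippet from disk.

```haskell
import qualified Data.Text as T

-- Marker strings taken from the descriptions above.
backlinkMarker, similarMarker :: T.Text
backlinkMarker = T.pack "[backlink context]"
similarMarker  = T.pack "class=\"link-annotated id-not backlink-not\""

-- Count non-overlapping occurrences of a (non-empty) marker in snippet text.
countOccurrences :: T.Text -> T.Text -> Int
countOccurrences marker snippet = T.count marker snippet
```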

Derives forward links from the backlinks database by inverting the map. Returns all URLs that the given page links to.
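
One way to invert the map is to scan every entry and keep the targets whose caller list contains the page. This is a sketch under that assumption; the name forwardLinks is illustrative, and the module's function may normalize or deduplicate differently.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Text as T

type Backlinks = M.Map T.Text [(T.Text, [T.Text])]

-- Illustrative inversion: every target (with anchor) that lists
-- the given page among its callers.
forwardLinks :: Backlinks -> T.Text -> [T.Text]
forwardLinks db page =
  [ target
  | anchors <- M.elems db
  , (target, callers) <- anchors
  , page `elem` callers
  ]
```

The scan is linear in the size of the database, which is acceptable for occasional queries but would call for a precomputed inverse map if called per page.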

suggestAnchorsToSplitOut :: IO [(Int, T.Text)]

Analyzes the backlinks database to find section anchors (e.g., /page#section) that are linked frequently enough to warrant extraction into standalone pages. Uses sectionizeMinN (default: 3) as the threshold.
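
The thresholding can be sketched as a filter over the database. Several points here are assumptions: that the comparison is "at least minN callers", that only on-site targets (starting with /) containing a # are considered, and that results are sorted most-linked first. The whitelist is omitted, and the name suggestAnchors is illustrative.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import Data.List (sortOn)
import Data.Ord (Down (..))

type Backlinks = M.Map T.Text [(T.Text, [T.Text])]

-- Illustrative: on-site anchors with at least minN callers,
-- paired with their caller count, most-linked first.
suggestAnchors :: Int -> Backlinks -> [(Int, T.Text)]
suggestAnchors minN db =
  sortOn (Down . fst)
    [ (n, target)
    | anchors <- M.elems db
    , (target, callers) <- anchors
    , T.pack "/" `T.isPrefixOf` target   -- on-site only (assumption)
    , T.pack "#" `T.isInfixOf` target    -- must point at a section anchor
    , let n = length callers
    , n >= minN
    ]
```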


Internal Architecture

type Backlinks = M.Map T.Text [(T.Text, [T.Text])]

A nested structure:

  • Outer key: Base URL (without anchor), e.g., "/improvement"
  • Inner pairs: (full URL with anchor, [list of callers])

This nesting preserves anchor-level granularity. A page like /improvement may have callers to the page as a whole and callers to specific anchors like #microsoft:

("/improvement",
[ ("/improvement#microsoft", ["/note/note", "/review/book"])
, ("/improvement", ["/index"])
])
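
To illustrate how the nested structure is consumed, here is a small sketch that collects every caller of a page regardless of which anchor was targeted. The helper callersOf is hypothetical; the type alias matches the one above.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import Data.List (nub)

type Backlinks = M.Map T.Text [(T.Text, [T.Text])]

-- Hypothetical helper: all distinct callers of a base URL,
-- merged across its anchors in database order.
callersOf :: Backlinks -> T.Text -> [T.Text]
callersOf db page = nub (concatMap snd (M.findWithDefault [] page db))
```

For the /improvement entry above, this yields the three callers /note/note, /review/book, and /index.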

URL Encoding Scheme

The getXLink function returns two paths for each URL, one percent-encoded once and one percent-encoded twice:

  1. On-disk path: Single-encoded, e.g., metadata/annotation/backlink/%2Fdocs%2Frl%2Findex.html
  2. HTML href: Double-encoded so browsers decode once to get the filename, e.g., /metadata/annotation/backlink/%252Fdocs%252Frl%252Findex.html

All snippet filenames end with .html to ensure correct MIME type handling.
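
The two-level encoding can be reproduced with a small hand-written percent-encoder. This is a sketch, not getXLink itself: the names percentEncode and snippetPaths are illustrative, the set of unescaped characters is an assumption, and the encoder only handles ASCII input.

```haskell
import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Numeric (showHex)

-- Percent-encode everything except ASCII letters, digits, and "-_.~".
-- Assumes ASCII input; non-ASCII text would need UTF-8 encoding first.
percentEncode :: String -> String
percentEncode = concatMap enc
  where
    enc c
      | isAscii c && (isAlphaNum c || c `elem` "-_.~") = [c]
      | otherwise = '%' : pad (map toUpper (showHex (ord c) ""))
    pad [h] = ['0', h]
    pad hs  = hs

-- Illustrative stand-in for getXLink: 'kind' is the snippet
-- subdirectory, e.g. "backlink".
snippetPaths :: String -> FilePath -> (FilePath, FilePath)
snippetPaths kind url = (onDisk, href)
  where
    once   = percentEncode url
    twice  = percentEncode once      -- '%' itself becomes "%25"
    onDisk = "metadata/annotation/" ++ kind ++ "/" ++ once ++ ".html"
    href   = "/metadata/annotation/" ++ kind ++ "/" ++ twice ++ ".html"
```

Encoding the already-encoded name is what turns %2F into %252F, so a browser's single decoding pass lands on the literal on-disk filename.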


Key Patterns

Anchor Truncation for Page-Level Grouping

For on-site pages (paths starting with /), backlinks strips anchors before lookup to consolidate all links to a page. External URLs retain their anchors:

-- Pages: strip anchor
"/gpt-3#prompt-programming" → "/gpt-3"

-- External: preserve anchor
"arxiv.org/123#google" → "arxiv.org/123#google"

Page Path Detection for Special Handling

The isPagePath check distinguishes top-level pages (which need anchor stripping and context links) from external URLs and annotation targets.
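
The anchor-truncation rule from the previous subsection can be sketched in a few lines. The function normalizeTarget is hypothetical, and it uses "starts with /" as a simplified stand-in for the page check; the real isPagePath may apply further conditions.

```haskell
import qualified Data.Text as T

-- Hypothetical normalizer: drop the anchor from on-site paths,
-- leave every other URL untouched.
normalizeTarget :: T.Text -> T.Text
normalizeTarget url
  | T.pack "/" `T.isPrefixOf` url = T.takeWhile (/= '#') url
  | otherwise                     = url
```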


Configuration

Config | Location | Default | Purpose
sectionizeMinN | Config/Misc.hs | 3 | Minimum backlink count to suggest anchor extraction
sectionizeWhiteList | Config/Misc.hs | ["/danbooru2021#*"] | Anchors exempt from extraction suggestions
backlinkBlackList | Config/Misc.hs | (function) | URLs to exclude from backlink tracking

Blacklist Rules

The backlinkBlackList function filters out:

  • Meta paths: /backlink/, /link-bibliography/, /similar/, /private
  • Special prefixes: $, #, !, mailto:, irc://, bitcoin symbol, /doc/www/
  • Example URLs: https://example.com
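
A predicate along these lines would implement the rules listed above. It is a sketch only: the name isBlacklisted is illustrative, matching meta paths as substrings and the rest as prefixes is an assumption, and "bitcoin symbol" is taken to mean the character U+20BF.

```haskell
import qualified Data.Text as T

-- Illustrative blacklist check; the matching strategy is assumed.
isBlacklisted :: T.Text -> Bool
isBlacklisted url =
     any (`T.isInfixOf` url) metaPaths
  || any (`T.isPrefixOf` url) specialPrefixes
  || T.pack "https://example.com" `T.isPrefixOf` url
  where
    metaPaths = map T.pack
      ["/backlink/", "/link-bibliography/", "/similar/", "/private"]
    specialPrefixes = map T.pack
      ["$", "#", "!", "mailto:", "irc://", "\x20BF", "/doc/www/"]
```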

Integration Points

Database File

  • Path: metadata/backlinks.hs
  • Format: Haskell literal list, parseable via Read typeclass
  • Written by: generateBacklinks.hs during build
  • Read by: This module, any template needing backlink data

Snippet Files

Generated snippets live in metadata/annotation/{type}/:

  • metadata/annotation/ — annotations
  • metadata/annotation/backlink/ — backlink lists
  • metadata/annotation/link-bibliography/ — link bibliographies
  • metadata/annotation/similar/ — similar link lists

generateBacklinks.hs is the main generator. It:

  1. Parses all .md and .html files for links
  2. Extracts links from annotation abstracts and author fields
  3. Updates metadata/backlinks.hs
  4. Writes HTML snippets for each target URL

See Also

  • generateBacklinks.hs - The generator executable that creates and populates the backlinks database
  • hakyll.hs - Build system that uses backlink paths for template rendering
  • LinkMetadata.hs - Annotation database (forward direction); complements backlinks
  • Annotation.hs - URL-to-scraper routing for annotation generation
  • GenerateSimilar.hs - Uses backlinks to enrich embedding context
  • Config.Misc - Contains backlinkBlackList and sectionize settings