LinkBacklink.hs
Path: build/LinkBacklink.hs | Language: Haskell | Lines: ~130
Utility functions for reading and writing the backlinks database
Overview
LinkBacklink.hs is the inverse of Query.hs: while Query extracts forward links from documents (links pointing out), this module handles backlinks (links pointing in to a page). Backlinks enable reverse-citation navigation—showing all pages that reference the current page.
Because backlinks are a global property across all documents, they cannot be computed locally. The module provides IO functions for reading and writing metadata/backlinks.hs, a Haskell-readable database generated by generateBacklinks.hs. The database is also used as a convenient registry of all URLs in the site.
The module also provides path-generation utilities for the four types of "x-link" metadata snippets: annotations, backlinks, link-bibliographies, and similar-links. All snippets live under metadata/annotation/ with URL-encoded filenames: annotations directly in that directory, the other three types in their own subdirectories.
Public API
readBacklinksDB :: IO Backlinks
Reads and parses metadata/backlinks.hs into an in-memory map. Returns M.empty if the file doesn't exist or is empty.
Called by: generateBacklinks.hs:main', any module needing backlink data
Calls: Config.Misc.cd, doesFileExist, TIO.readFile
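The read path described above can be sketched as follows. This is a minimal illustration, not the module's actual code: it assumes the file holds a list literal of (key, value) pairs that `read` can parse, and the names `parseBacklinks` and `readBacklinksSketch` are hypothetical. The `Config.Misc.cd` step is omitted.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.Directory (doesFileExist)

type Backlinks = M.Map T.Text [(T.Text, [T.Text])]

-- Parse the textual database, assumed here to be a Haskell list literal of
-- (base URL, [(target, callers)]) pairs. Blank input yields an empty map.
parseBacklinks :: T.Text -> Backlinks
parseBacklinks txt
  | T.null (T.strip txt) = M.empty
  | otherwise            = M.fromList (read (T.unpack txt))

-- Hypothetical stand-in for readBacklinksDB: a missing file yields M.empty.
readBacklinksSketch :: FilePath -> IO Backlinks
readBacklinksSketch path = do
  exists <- doesFileExist path
  if exists
    then parseBacklinks <$> TIO.readFile path
    else pure M.empty
```

Note that `read` throws on malformed input; a hardened version might use `Text.Read.readMaybe` instead.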
writeBacklinksDB :: Backlinks -> IO ()
Serializes the backlinks map and writes to metadata/backlinks.hs using ppShow for human-readable formatting.
Called by: generateBacklinks.hs:main'
Calls: Utils.writeUpdatedFile
getBackLink :: FilePath -> (FilePath, FilePath)
Converts a URL to its backlink snippet paths. Returns a pair: the on-disk path (URL-encoded once) and the href path (URL-encoded twice, with a leading /).
Example: /improvement → ("metadata/annotation/backlink/%2Fimprovement.html", "/metadata/annotation/backlink/%252Fimprovement.html")
Called by: Templates, hakyll.hs
Calls: getXLink
getAnnotationLink, getLinkBibLink, getSimilarLink :: FilePath -> (FilePath, FilePath)
Similar to getBackLink but for annotation, link-bibliography, and similar-link snippets respectively.
getBackLinkCheck, getAnnotationLinkCheck, getLinkBibLinkCheck, getSimilarLinkCheck :: FilePath -> IO (FilePath, FilePath)
IO versions that check for file existence on disk. Return ("", "") if the snippet doesn't exist.
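The existence check could look roughly like this. It is a sketch under the assumption that the first element of the pair is the on-disk path; `checkXLink` is a hypothetical name, and the real functions may also change directory before checking.

```haskell
import System.Directory (doesFileExist)

-- Keep the (onDisk, href) pair only if the snippet file exists on disk;
-- otherwise return a pair of empty strings, as described above.
checkXLink :: (FilePath, FilePath) -> IO (FilePath, FilePath)
checkXLink paths@(onDisk, _) = do
  exists <- doesFileExist onDisk
  pure (if exists then paths else ("", ""))
```

Callers can then treat an empty first element as "no snippet available".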
getBackLinkCount :: FilePath -> IO Int
Counts backlinks by reading the HTML snippet and counting occurrences of "[backlink context]".
Called by: Template rendering for backlink counts
getSimilarLinkCount :: FilePath -> IO Int
Counts similar links by counting occurrences of class="link-annotated id-not backlink-not" in the similar-links snippet.
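Both counters reduce to counting a marker substring in snippet text. A minimal sketch using `Data.Text.count` follows; the helper name `countMarker` and the two constants are hypothetical, and file reading is left out.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Count non-overlapping occurrences of a marker in snippet HTML.
-- T.count raises an error on an empty needle, so that case maps to 0.
countMarker :: T.Text -> T.Text -> Int
countMarker marker html
  | T.null marker = 0
  | otherwise     = T.count marker html

-- Marker strings taken from the descriptions above.
backlinkMarker, similarMarker :: T.Text
backlinkMarker = "[backlink context]"
similarMarker  = "class=\"link-annotated id-not backlink-not\""
```

For example, `countMarker backlinkMarker html` gives the number of backlink entries in an already-loaded snippet.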
getForwardLinks :: Backlinks -> T.Text -> [T.Text]
Derives forward links from the backlinks database by inverting the map. Returns all URLs that the given page links to.
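One way to picture the inversion: scan every (target, callers) pair and keep the targets whose caller list contains the page. This is a sketch only; `forwardLinksSketch` is a hypothetical name, and whether the real function returns anchored targets or base URLs, or deduplicates, is an assumption not settled by the text above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map.Strict as M
import qualified Data.Text as T

type Backlinks = M.Map T.Text [(T.Text, [T.Text])]

-- Return every target URL whose caller list mentions the given page.
forwardLinksSketch :: Backlinks -> T.Text -> [T.Text]
forwardLinksSketch db page =
  [ target
  | (target, callers) <- concat (M.elems db)
  , page `elem` callers
  ]
```

This is a linear scan over the whole database, which is acceptable for occasional queries but would warrant a precomputed inverse map if called per page.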
suggestAnchorsToSplitOut :: IO [(Int, T.Text)]
Analyzes the backlinks database to find section anchors (e.g., /page#section) that are linked frequently enough to warrant extraction into standalone pages. Uses sectionizeMinN (default: 3) as the threshold.
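The selection logic might be sketched as below. Several details are assumptions: that "linked frequently enough" means the caller count is at least the threshold, that only on-site anchored targets are considered, and that results are sorted by descending count. The whitelist is ignored, and `suggestAnchorsSketch` is a hypothetical pure variant that takes the threshold and database as arguments.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List (sortOn)
import Data.Ord (Down (..))
import qualified Data.Map.Strict as M
import qualified Data.Text as T

type Backlinks = M.Map T.Text [(T.Text, [T.Text])]

-- List on-site anchored targets with at least minN callers, most-linked first.
suggestAnchorsSketch :: Int -> Backlinks -> [(Int, T.Text)]
suggestAnchorsSketch minN db =
  sortOn (Down . fst)
    [ (n, target)
    | (target, callers) <- concat (M.elems db)
    , "/" `T.isPrefixOf` target   -- on-site pages only (assumption)
    , "#" `T.isInfixOf` target    -- must name a section anchor
    , let n = length callers
    , n >= minN
    ]
```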
Internal Architecture
Data Type: Backlinks
type Backlinks = M.Map T.Text [(T.Text, [T.Text])]
A nested structure:
- Outer key: base URL (without anchor), e.g., "/improvement"
- Inner pairs: (full URL with anchor, [list of callers])
This nesting preserves anchor-level granularity. A page like /improvement may have callers to the page as a whole and callers to specific anchors like #microsoft:
("/improvement",
[ ("/improvement#microsoft", ["/note/note", "/review/book"])
, ("/improvement", ["/index"])
])
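To make the nesting concrete, here is a small sketch that builds the entry above and queries it at both page level and anchor level. The helpers `callersOf` and `callersOfAnchor` are hypothetical and are not part of the module's API.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map.Strict as M
import qualified Data.Text as T

type Backlinks = M.Map T.Text [(T.Text, [T.Text])]

-- The example entry shown above, as a one-element database.
exampleDB :: Backlinks
exampleDB = M.fromList
  [ ( "/improvement"
    , [ ("/improvement#microsoft", ["/note/note", "/review/book"])
      , ("/improvement", ["/index"])
      ]
    )
  ]

-- All callers of a page, regardless of which anchor they target.
callersOf :: Backlinks -> T.Text -> [T.Text]
callersOf db base = concatMap snd (M.findWithDefault [] base db)

-- Callers of one specific target (the page itself or a single anchor).
callersOfAnchor :: Backlinks -> T.Text -> T.Text -> [T.Text]
callersOfAnchor db base target =
  concat [cs | (t, cs) <- M.findWithDefault [] base db, t == target]
```

The page-level query flattens the inner pairs, while the anchor-level query filters them first; the nested layout supports both without a second index.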
URL Encoding Scheme
The getXLink function returns two paths with different levels of URL encoding:
- On-disk path: single-encoded, e.g., metadata/annotation/backlink/%2Fdocs%2Frl%2Findex.html
- HTML href: double-encoded so browsers decode once to get the filename, e.g., /metadata/annotation/backlink/%252Fdocs%252Frl%252Findex.html
All snippet filenames end with .html to ensure correct MIME type handling.
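The single and double encoding can be reproduced with a small percent-encoder. This is a sketch, not the module's getXLink: the names `percentEncode` and `getXLinkSketch` are hypothetical, the encoder escapes everything outside the RFC 3986 unreserved set, and only the subdirectory-based snippet types are covered. Any extra normalization the real function performs is not modeled.

```haskell
import qualified Data.ByteString as B
import Data.Char (chr, toUpper)
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
import Data.Word (Word8)
import Numeric (showHex)

-- Percent-encode every UTF-8 byte that is not an unreserved character.
percentEncode :: String -> String
percentEncode = concatMap enc . B.unpack . encodeUtf8 . T.pack
  where
    enc :: Word8 -> String
    enc w
      | isUnreserved c = [c]
      | otherwise      = '%' : pad (map toUpper (showHex w ""))
      where c = chr (fromIntegral w)
    pad [x] = ['0', x]
    pad xs  = xs
    isUnreserved c =
      c `elem` (['A' .. 'Z'] ++ ['a' .. 'z'] ++ ['0' .. '9'] ++ "-_.~")

-- Build (on-disk path, href) for a snippet type such as "backlink".
getXLinkSketch :: String -> FilePath -> (FilePath, FilePath)
getXLinkSketch linkType url = (onDisk, href)
  where
    once   = percentEncode url
    onDisk = "metadata/annotation/" ++ linkType ++ "/" ++ once ++ ".html"
    href   = "/metadata/annotation/" ++ linkType ++ "/" ++ percentEncode once ++ ".html"
```

Encoding the already-encoded name turns each `%` into `%25`, which is why `%2F` on disk appears as `%252F` in the href.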
Key Patterns
Anchor Truncation for Page-Level Grouping
For on-site pages (paths starting with /), anchors are stripped before lookup so that all links to a page are consolidated under one key. External URLs retain their anchors:
-- Pages: strip anchor
"/gpt-3#prompt-programming" → "/gpt-3"
-- External: preserve anchor
"arxiv.org/123#google" → "arxiv.org/123#google"
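The rule above fits in a few lines. The function name `truncateAnchor` is hypothetical, and the test for "on-site" is simply a leading slash, as described in the text.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Strip the #anchor from on-site paths; leave external URLs untouched.
truncateAnchor :: T.Text -> T.Text
truncateAnchor url
  | "/" `T.isPrefixOf` url = T.takeWhile (/= '#') url
  | otherwise              = url
```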
Page Path Detection for Special Handling
The isPagePath check distinguishes top-level pages (which need anchor stripping and context links) from external URLs and annotation targets.
Configuration
| Config | Location | Default | Purpose |
|---|---|---|---|
| sectionizeMinN | Config/Misc.hs | 3 | Minimum backlink count to suggest anchor extraction |
| sectionizeWhiteList | Config/Misc.hs | ["/danbooru2021#*"] | Anchors exempt from extraction suggestions |
| backlinkBlackList | Config/Misc.hs | (function) | URLs to exclude from backlink tracking |
Blacklist Rules
The backlinkBlackList function filters out:
- Meta paths: /backlink/, /link-bibliography/, /similar/, /private
- Special prefixes: $, #, !, mailto:, irc://, bitcoin symbol, /doc/www/
- Example URLs: https://example.com
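A predicate in the spirit of these rules might look like the sketch below. It is not the real backlinkBlackList: the name `isBlacklistedSketch` is hypothetical, all three groups are assumed to be matched as prefixes, and the bitcoin-symbol prefix is left out.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- True if the URL should be excluded from backlink tracking.
-- Assumption: every rule is a prefix match on the URL.
isBlacklistedSketch :: T.Text -> Bool
isBlacklistedSketch url =
     any (`T.isPrefixOf` url) metaPaths
  || any (`T.isPrefixOf` url) specialPrefixes
  || any (`T.isPrefixOf` url) exampleUrls
  where
    metaPaths       = ["/backlink/", "/link-bibliography/", "/similar/", "/private"]
    specialPrefixes = ["$", "#", "!", "mailto:", "irc://", "/doc/www/"]
    exampleUrls     = ["https://example.com"]
```

Keeping the three groups in separate lists mirrors the rule categories above and makes it easy to switch one group to a different matching strategy later.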
Integration Points
Database File
- Path: metadata/backlinks.hs
- Format: Haskell literal list, parseable via the Read typeclass
- Written by: generateBacklinks.hs during build
- Read by: This module, any template needing backlink data
Snippet Files
Generated snippets live in metadata/annotation/{type}/:
- metadata/annotation/: annotations
- metadata/annotation/backlink/: backlink lists
- metadata/annotation/link-bibliography/: link bibliographies
- metadata/annotation/similar/: similar link lists
Related Executables
- generateBacklinks.hs: Main generator that:
  - Parses all .md and .html files for links
  - Extracts links from annotation abstracts and author fields
  - Updates metadata/backlinks.hs
  - Writes HTML snippets for each target URL
See Also
- generateBacklinks.hs - The generator executable that creates and populates the backlinks database
- hakyll.hs - Build system that uses backlink paths for template rendering
- LinkMetadata.hs - Annotation database (forward direction); complements backlinks
- Annotation.hs - URL-to-scraper routing for annotation generation
- GenerateSimilar.hs - Uses backlinks to enrich embedding context
- Config.Misc - Contains backlinkBlackList and sectionize settings