nesi · CallumWalley · Jun 3, 2026
diff --git a/.github/workflows/README.md b/.github/workflows/README.md
@@ -11,20 +11,26 @@ Currently retrieves:
 - Software module list from [modules-list](https://github.com/nesi/modules-list).
 - Glossary, spellcheck dictionary and snippets from [nesi-wordlist](https://github.com/nesi/nesi-wordlist)
 
-It then runs [link_apps_pages.py](#link_apps_pagespy).
+It then runs [compile_tags.py](#compile_tagspy).
 
 All modified files are added to a new branch called `new-assets` and merged into main.
 
 In theory, all this could be done at deployment, but I wanted to make sure that changes to these remote files didn't break anything.
 
-## [link_apps_pages.py](link_apps_pages.py)
+## [compile_tags.py](compile_tags.py)
 
-A Python script used to add a link to the appropriate documentation to [modules-list.json](../../docs/assets/module-list.json).
+Replaces the old `link_apps_pages.py`.
 
-The script checks all titles of input files, and sets the `support` key to be equal to the pages url.
-It also adds whatever tags are on that page to the `domains` key.
+Validates page tags against the canonical vocabulary in [`docs/assets/tags.yml`](../../docs/assets/tags.yml), compiles a tag index, and links app pages to the module list. It writes two files:
 
-_One day I would like to simplify this whole thing._
+- **`docs/assets/tag-index.json`** — maps each canonical tag to the list of pages that carry it. Used by the `pages_with_tag()` macro at render time.
+- **`docs/assets/module-list.json`** — updated with support-page URLs and canonical domain tags for each application.
+
+Any tag not present in `tags.yml` (as a key or alias) produces a CI warning and is left out of both the tag index and the module `domains`. Warnings do not fail CI; review them before merging.
+
+### Tag vocabulary
+
+Tags are defined in [`docs/assets/tags.yml`](../../docs/assets/tags.yml). Each entry has a canonical key (snake\_case), a display label, and optional aliases. Pages should always use canonical keys; aliases are accepted for backwards compatibility but are normalised at compile time.
 
 ## [checks.yml](checks.yml)
 

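To make the `tags.yml` format described above concrete, here is a minimal sketch of alias normalisation. It assumes each entry holds a `label` plus optional `aliases` (the README names a display label but not its field name), and the example tags and aliases are illustrative. `build_alias_map` and `normalise` are hypothetical helpers mirroring `load_vocabulary()` in `compile_tags.py`; the vocabulary is written as the dict `yaml.safe_load()` would return, so the sketch needs only the standard library.

```python
# Illustrative vocabulary, written as the dict yaml.safe_load() would return for:
#
#   file_transfer:
#     label: File transfer
#     aliases: [data transfer]
#   account:
#     label: Accounting
#     aliases: [accounting]
#   slurm:
#     label: Slurm
#
# The "label" field name and the aliases shown are assumptions.
VOCAB = {
    "file_transfer": {"label": "File transfer", "aliases": ["data transfer"]},
    "account": {"label": "Accounting", "aliases": ["accounting"]},
    "slurm": {"label": "Slurm"},
}


def build_alias_map(vocab):
    """Map every lower-cased canonical key and alias to its canonical key."""
    alias_map = {}
    for canonical, entry in vocab.items():
        alias_map[canonical.lower()] = canonical
        # A key with no fields loads as None, so guard before calling .get().
        for alias in (entry or {}).get("aliases") or []:
            alias_map[str(alias).lower()] = canonical
    return alias_map


def normalise(tags, alias_map):
    """Split page tags into canonical keys (deduplicated, in order) and unknown tags."""
    canonical, unknown = [], []
    for tag in tags:
        key = alias_map.get(str(tag).lower())
        if key is None:
            unknown.append(tag)
        elif key not in canonical:
            canonical.append(key)
    return canonical, unknown


alias_map = build_alias_map(VOCAB)
print(normalise(["data transfer", "Accounting", "Slurm", "slurm", "tmpdir"], alias_map))
# (['file_transfer', 'account', 'slurm'], ['tmpdir'])
```

Matching is case-insensitive, so `Slurm` and `slurm` both resolve to the same key, which is why mixed-case tags such as `Slurm` can be normalised rather than rejected.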
diff --git a/.github/workflows/checks.yml b/.github/workflows/checks.yml
@@ -162,15 +162,24 @@ jobs:
         run: |
           shopt -s globstar extglob
           python3 checks/run_slurm_lint.py ${{needs.get.outputs.filelist}}
+  tagcheck:
+    name: Check tags
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v6
+      - run: pip3 install pyyaml
+      - run: python3 .github/workflows/compile_tags.py
+
   testBuild:
     name: Test build
     if: ${{github.event_name != 'workflow_dispatch' || inputs.testBuild}}
     runs-on: ubuntu-24.04
-    needs: get
+    needs: [get, tagcheck]
     steps:
       - uses: actions/checkout@v6
         with:
           fetch-depth: 0
       - run: pip3 install -r requirements.txt
+      - run: python3 .github/workflows/compile_tags.py
       - run: ./checks/run_test_build.py
       - run: export NO_MKDOCS_2_WARNING="1"; python3 checks/run_aria_check.py
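The `tagcheck` job surfaces problems through GitHub Actions workflow commands: a line printed as `::warning file=...,title=...::message` becomes an annotation on the run. As written, `compile_tags.py` exits with status 0 even when it issues warnings, so the job reports rather than blocks. Messages containing `%` or newlines, and property values containing `:` or `,`, need percent-escaping. The hypothetical `gh_warning` helper below sketches that escaping, following the rules used by GitHub's `actions/toolkit`; it is not part of this PR.

```python
def escape_data(value):
    """Escape a workflow-command message: '%', CR and LF are percent-encoded."""
    return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(value):
    """Escape a property value; ':' and ',' would otherwise end the property."""
    return escape_data(value).replace(":", "%3A").replace(",", "%2C")


def gh_warning(message, **properties):
    """Format a ::warning command; keyword args (file, title, line, ...) become properties."""
    props = ",".join(f"{key}={escape_property(str(val))}" for key, val in properties.items())
    command = f"::warning {props}" if props else "::warning"
    return f"{command}::{escape_data(message)}"


print(gh_warning(
    "Unknown tag 'tmpdir' on 'Temporary directories'.",
    file="docs/Batch_Computing/Temporary_directories.md",
    title="tag.unknown",
))
# ::warning file=docs/Batch_Computing/Temporary_directories.md,title=tag.unknown::Unknown tag 'tmpdir' on 'Temporary directories'.
```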
diff --git a/.github/workflows/compile_tags.py b/.github/workflows/compile_tags.py
@@ -0,0 +1,109 @@
+#!/usr/bin/env python3
+
+"""
+Validates page tags, links app pages to the module list, and writes compiled indexes.
+
+Run during CI lint checks to surface tag warnings.
+Run on deploy to ensure tag-index.json and module-list.json are up to date before building.
+
+Replaces: link_apps_pages.py
+"""
+
+import json
+import os
+import re
+from pathlib import Path
+
+import yaml
+
+
+TAGS_VOCAB_PATH = os.getenv("TAGS_VOCAB_PATH", "docs/assets/tags.yml")
+TAG_INDEX_PATH = os.getenv("TAG_INDEX_PATH", "docs/assets/tag-index.json")
+MODULE_LIST_PATH = os.getenv("MODULE_LIST_PATH", "docs/assets/module-list.json")
+DOC_ROOT = os.getenv("DOC_ROOT", "docs")
+APPS_PAGES_PATH = os.getenv("APPS_PAGES_PATH", "Software/Available_Applications")
+BASE_URL = os.getenv("BASE_URL", "https://www.docs.nesi.org.nz")
+
+
+def load_vocabulary(path):
+    vocab = yaml.safe_load(Path(path).read_text(encoding="utf-8")) or {}
+    alias_map = {}
+    for canonical, entry in vocab.items():
+        alias_map[canonical.lower()] = canonical
+        for alias in ((entry or {}).get("aliases") or []):
+            alias_map[str(alias).lower()] = canonical
+    return vocab, alias_map
+
+
+def parse_frontmatter(path):
+    content = path.read_text(encoding="utf-8")
+    match = re.match(r"---\n(.*?)^---\s*$", content, re.DOTALL | re.MULTILINE)
+    if not match:
+        return None
+    return yaml.safe_load(match.group(1)) or {}
+
+
+def title_from_path(md_file):
+    name = md_file.stem.replace("_", " ")
+    return name[0].upper() + name[1:]
+
+
+vocab, alias_map = load_vocabulary(TAGS_VOCAB_PATH)
+module_list = json.loads(Path(MODULE_LIST_PATH).read_text(encoding="utf-8"))
+
+tag_index = {canonical: [] for canonical in vocab}
+warnings = 0
+
+for md_file in sorted(Path(DOC_ROOT).rglob("*.md")):
+    rel = str(md_file.relative_to(DOC_ROOT))
+    meta = parse_frontmatter(md_file)
+
+    if meta is None:
+        print(f"::warning file={md_file},title=meta.parse::Meta block missing or malformed.")
+        warnings += 1
+        continue
+
+    raw_tags = meta.get("tags") or []
+    title = meta.get("title") or title_from_path(md_file)
+    canonical_tags = []
+
+    for tag in raw_tags:
+        canonical = alias_map.get(str(tag).lower())
+        if canonical is None:
+            print(f"::warning file={md_file},title=tag.unknown::Unknown tag '{tag}' on '{title}'. Add to {TAGS_VOCAB_PATH} or use an existing alias.")
+            warnings += 1
+        else:
+            entry = {"title": title, "path": rel}
+            if entry not in tag_index[canonical]:
+                tag_index[canonical].append(entry)
+            canonical_tags.append(canonical)
+
+    # For app pages: update support URL and merge canonical tags into module domains.
+    is_app_page = rel.startswith(APPS_PAGES_PATH.rstrip("/") + "/")
+    if is_app_page and md_file.name != "index.md":
+        if title in module_list:
+            page_link = f"{BASE_URL}/{Path(rel).with_suffix('').as_posix()}"
+            existing = module_list[title].get("support", "")
+            if existing and existing != page_link:
+                print(f"::warning file={md_file},title=docpath.change::Support URL for '{title}' changed from '{existing}' to '{page_link}'.")
+            module_list[title]["support"] = page_link
+            domains = module_list[title].setdefault("domains", [])
+            for canonical in canonical_tags:
+                if canonical not in domains:
+                    domains.append(canonical)
+        else:
+            print(f"::warning file={md_file},title=missing.module::'{md_file.name}' has no corresponding module in {MODULE_LIST_PATH}.")
+            warnings += 1
+
+tag_index = {k: v for k, v in tag_index.items() if v}
+
+with open(TAG_INDEX_PATH, "w") as f:
+    f.write(json.dumps(tag_index, indent=4))
+
+with open(MODULE_LIST_PATH, "w") as f:
+    f.write(json.dumps(module_list, indent=4))
+
+print(f"{TAG_INDEX_PATH}: {len(tag_index)} tags, {sum(len(v) for v in tag_index.values())} entries.")
+print(f"{MODULE_LIST_PATH}: updated support URLs and domains for app pages.")
+if warnings:
+    print(f"::warning::{warnings} warning(s) issued. Review and address before merging.")
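The compiled index is a JSON object keyed by canonical tag, each value a list of `{"title", "path"}` entries with paths relative to `docs/`. A minimal sketch of the indexing step as a pure function, run on made-up pages, shows that shape. `build_tag_index` is a hypothetical refactor, not a function in this PR; as in the script, keys are seeded in vocabulary order (recovered here from the alias map) and tags with no pages are dropped.

```python
import json


def build_tag_index(pages, alias_map):
    """Build {canonical_tag: [{"title": ..., "path": ...}, ...]}.

    pages: iterable of (path, title, raw_tags), path relative to docs/.
    alias_map: lower-cased key or alias -> canonical key.
    """
    # Seed keys in vocabulary order: each canonical key is inserted into
    # alias_map before its aliases, so first appearance follows tags.yml.
    index = {canonical: [] for canonical in dict.fromkeys(alias_map.values())}
    for path, title, raw_tags in pages:
        for tag in raw_tags:
            canonical = alias_map.get(str(tag).lower())
            if canonical is None:
                continue  # compile_tags.py prints a ::warning here
            entry = {"title": title, "path": path}
            if entry not in index[canonical]:
                index[canonical].append(entry)
    # Drop tags that no page uses.
    return {tag: entries for tag, entries in index.items() if entries}


# Illustrative data; "cuda" as an alias of "gpu" is an assumption.
alias_map = {"slurm": "slurm", "gpu": "gpu", "cuda": "gpu", "storage": "storage"}
pages = [
    ("Batch_Computing/Using_GPUs.md", "Using GPUs", ["gpu", "Slurm"]),
    ("Batch_Computing/Hardware.md", "Hardware", ["GPU", "cuda"]),
]
print(json.dumps(build_tag_index(pages, alias_map), indent=4))
```

Here `Hardware.md` appears once under `gpu` even though two of its tags resolve to that key, and `storage` is absent because no page uses it.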
diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
@@ -51,6 +51,8 @@ jobs:
         run: pip install -r requirements.txt
       - name: Fetch Remote Files
         run: bash .github/fetch_includes.sh
+      - name: Compile tag index and link app pages
+        run: python3 .github/workflows/compile_tags.py
       - name: Build documentation
         run: |
               mkdocs build --clean --quiet

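For app pages, the script points each module's `support` field at its documentation page and merges the page's canonical tags into `domains`. A minimal sketch of that update as a standalone function, assuming `module-list.json` maps application names to objects with optional `support` and `domains` keys; `link_app_page` is hypothetical and the module data is made up. It derives the URL from the page path rather than the page title, so titles containing spaces still yield a usable URL, and it creates `domains` when an entry lacks it.

```python
from pathlib import PurePosixPath

# Default taken from BASE_URL in compile_tags.py.
BASE_URL = "https://www.docs.nesi.org.nz"


def link_app_page(module_list, app, rel_path, canonical_tags, base_url=BASE_URL):
    """Update module_list[app] in place; return False if the app has no entry.

    rel_path is the page path relative to docs/,
    e.g. "Software/Available_Applications/ABAQUS.md".
    """
    entry = module_list.get(app)
    if entry is None:
        return False
    # Drop the .md suffix to match the extension-less URLs the script writes.
    page = PurePosixPath(rel_path).with_suffix("").as_posix()
    entry["support"] = f"{base_url}/{page}"
    domains = entry.setdefault("domains", [])
    for tag in canonical_tags:
        if tag not in domains:  # keep existing order, skip duplicates
            domains.append(tag)
    return True


modules = {"ABAQUS": {"domains": ["gpu"]}}
link_app_page(modules, "ABAQUS", "Software/Available_Applications/ABAQUS.md", ["gpu", "parallel"])
print(modules)
```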
diff --git a/docs/Announcements/Accessing_NeSI_Support_during_the_Easter_break.md b/docs/Announcements/Accessing_NeSI_Support_during_the_Easter_break.md
@@ -2,8 +2,7 @@
 description: A page sharing the details of reduced support hours over Easter and ANZAC break
 created_at: '2024-03-20T01:58:22Z'
 tags:
-    - easter
-    - holidays
+    - announcement
 title: Accessing REANNZ HPC Support during the Easter and ANZAC holidays
 search:
   boost: 0.1

diff --git a/docs/Announcements/Identity_Changes_for_Crown_Research_Institutes.md b/docs/Announcements/Identity_Changes_for_Crown_Research_Institutes.md
@@ -4,8 +4,8 @@ status: new
 search:
     boost: 10
 tags:
-    - identity
-    - email
+    - access
+    - slurm
 ---
 
 ## What is happening

diff --git a/docs/Announcements/Known_Issues_HPC3.md b/docs/Announcements/Known_Issues_HPC3.md
@@ -1,10 +1,9 @@
 ---
 created_at: 2025-04-28
 description: List of features currently missing from Mahuika (HPC3).
-tags: 
-    - hpc3
-    - refresh
-    - mahuika
+tags:
+    - release_notes
+    - announcement
 ---
 
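-Below is a list issues that we're actively working on. We hope to have these resolved soon. This is intended to be a temporary page.
+Below is a list of issues that we're actively working on. We hope to have these resolved soon. This is intended to be a temporary page.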

diff --git a/docs/Announcements/Release_Notes/index.md b/docs/Announcements/Release_Notes/index.md
@@ -1,6 +1,7 @@
 ---
 created_at: '2021-02-23T19:52:34Z'
-tags: []
+tags:
+- release_notes
 title: Release Notes
 ---
 

diff --git a/docs/Announcements/Slurm_Job_email.md b/docs/Announcements/Slurm_Job_email.md
@@ -1,10 +1,9 @@
 ---
 created_at: 2026-02-11
 description: Email from Slurm Jobs now available
-tags: 
-    - hpc3
-    - email
-    - mahuika
+tags:
+    - release_notes
+    - slurm
 ---
 
 Sending email from Slurm jobs is now available on Mahuika.  Here is an example of the Slurm parameters required to send email:

diff --git a/docs/Batch_Computing/Batch_Computing_Guide.md b/docs/Batch_Computing/Batch_Computing_Guide.md
@@ -3,7 +3,7 @@ created_at: 2025-12-19
 description: Guide to batch computing
 tags:
     - slurm
-    - ondemand
+    - interactive
 ---
 
 Batch jobs can be submitted via several methods. The most basic is a [simple Slurm job](#slurm-job-basics).

diff --git a/docs/Batch_Computing/Checking_resource_usage.md b/docs/Batch_Computing/Checking_resource_usage.md
@@ -2,7 +2,7 @@
 created_at: '2022-02-15T01:13:51Z'
 tags:
   - slurm
-  - accounting
+  - account
 status: deprecated
 ---
 

diff --git a/docs/Batch_Computing/Fair_Share.md b/docs/Batch_Computing/Fair_Share.md
@@ -1,14 +1,10 @@
 ---
 created_at: '2019-02-05T03:58:21Z'
 tags:
-  - accounting
-  - Slurm
-  - Fairshare
-  - Fair Share
-  - Job priority
-  - Long queue time
-  - Queing
-  - long wait time
+  - account
+  - slurm
+  - fairshare
+  - troubleshooting
 description: How balancing your workload lets you make the most of your allocation.
 ---
 

diff --git a/docs/Batch_Computing/Hardware.md b/docs/Batch_Computing/Hardware.md
@@ -3,7 +3,6 @@ created_at: '2022-06-13T04:54:38Z'
 description:  This page below outlines the available hardware.
 tags:
  - gpu
- - compute
 ---
 
 A list of the currently available hardware.

diff --git a/docs/Batch_Computing/Job_Arrays.md b/docs/Batch_Computing/Job_Arrays.md
@@ -1,10 +1,9 @@
 ---
 created_at: 2025-12-09
 description: How to utilise job arrays. 
-tags: 
+tags:
     - slurm
     - parallel
-    - array
 ---
 
 

diff --git a/docs/Batch_Computing/Job_Limits.md b/docs/Batch_Computing/Job_Limits.md
@@ -1,9 +1,9 @@
 ---
 created_at: 2025-07-17
 description: What limits are there on running jobs.
-tags: 
-    - Slurm
-    - accounting
+tags:
+    - slurm
+    - account
 ---
 
 These are open for review if you find any of them unreasonable or inefficient.

diff --git a/docs/Batch_Computing/Job_prioritisation.md b/docs/Batch_Computing/Job_prioritisation.md
@@ -1,9 +1,9 @@
 ---
 created_at: '2018-05-17T23:35:36Z'
-description: What factors are used to determine a jobs prioroty.
+description: What factors are used to determine a job's priority.
-tags: 
- - Slurm
- - accounting
+tags:
+ - slurm
+ - account
 ---
 
 Each queued job has a priority score. Jobs start when sufficient

diff --git a/docs/Batch_Computing/SLURM-Best_Practice.md b/docs/Batch_Computing/SLURM-Best_Practice.md
@@ -2,7 +2,6 @@
 created_at: '2019-01-18T01:56:15Z'
 tags: 
     - slurm
-    - tips
 title: 'SLURM: Best Practice'
-description: Some tips on how to get more out of the job sceduler.
+description: Some tips on how to get more out of the job scheduler.
 ---

diff --git a/docs/Batch_Computing/Temporary_directories.md b/docs/Batch_Computing/Temporary_directories.md
@@ -1,11 +1,7 @@
 ---
 created_at: '2023-07-21T04:10:04Z'
-tags: 
+tags:
   - storage
-  - tmpdir
-  - tmp
-  - temp
-  - localscratch
 description: How temporary files are utilised on the REANNZ cluster.
 ---
 

diff --git a/docs/Batch_Computing/Using_GPUs.md b/docs/Batch_Computing/Using_GPUs.md
@@ -2,7 +2,7 @@
 created_at: '2020-04-19T22:59:58Z'
 tags:
 - gpu
-- Slurm
+- slurm
 ---
 
 This page provides generic information about how to access GPUs through the Slurm scheduler.
@@ -307,6 +307,6 @@ To record the GPU utilisation and GPU memory, see [Measuring GPU efficiency afte
 
 ## Application and toolbox specific support pages
 
-See the [Supported Applications](../Software/Available_Applications/index.md) for more information on what softwares have GPU support, as well as programming toolkits:
-
-- [NVIDIA GPU Containers](../Software/Containers/NVIDIA_GPU_Containers.md)
+{% for p in pages_with_tag("gpu") %}
+- [{{ p.title }}]({{ p.path }})
+{% endfor %}
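The `pages_with_tag()` macro is not part of this diff. Because `tag-index.json` stores paths relative to `docs/`, a link such as `Batch_Computing/Hardware.md` written into `Batch_Computing/Using_GPUs.md` is resolved by MkDocs relative to the current page, so it breaks unless the macro already rewrites paths. The loop above will also list `Using_GPUs.md` itself, since that page carries the `gpu` tag. Below is a minimal sketch of a helper that handles both, assuming the caller passes the current page's source path; `resolve_tag_pages` is hypothetical, and the registration in the comments assumes the `mkdocs-macros` plugin's `define_env` / `env.macro` hooks.

```python
import posixpath


def resolve_tag_pages(tag, index, current_page=None):
    """Return the index entries for a canonical tag.

    index: the parsed tag-index.json, {tag: [{"title": ..., "path": ...}]}.
    current_page: the calling page's path relative to docs/. When given, the
    page itself is skipped and each path is rewritten relative to the calling
    page's directory, which is how MkDocs resolves relative Markdown links.
    """
    results = []
    for entry in index.get(tag, []):
        if current_page is not None:
            if entry["path"] == current_page:
                continue
            start = posixpath.dirname(current_page) or "."
            entry = {**entry, "path": posixpath.relpath(entry["path"], start)}
        results.append(entry)
    return results


# Hypothetical registration with the mkdocs-macros plugin (main.py):
#
#   def define_env(env):
#       with open("docs/assets/tag-index.json") as f:
#           index = json.load(f)
#
#       @env.macro
#       def pages_with_tag(tag):
#           return resolve_tag_pages(tag, index, env.page.file.src_path)
```

Plain dicts work in the template loop because Jinja falls back to item lookup when `p.title` is not an attribute.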
diff --git a/docs/Data_Transfer/Checksums.md b/docs/Data_Transfer/Checksums.md
@@ -1,11 +1,8 @@
 ---
 created_at: '2020-01-14T22:10:50Z'
-tags: 
+tags:
     - checksum
-    - md5
-    - sha
-    - hash
-    - digest
+    - announcement
 ---
 
 Applying a *checksum function* to a file will return its *message digest* (also simply referred to as a _checksum_), which is akin to a digital fingerprint.

diff --git a/docs/Data_Transfer/Data_Transfer_Overview.md b/docs/Data_Transfer/Data_Transfer_Overview.md
@@ -1,5 +1,7 @@
 ---
 created_at: '2018-11-20T22:41:32Z'
+tags:
+- file_transfer
 ---
 
 !!! prerequisite

diff --git a/docs/Data_Transfer/Data_Transfer_Using_MobaXterm.md b/docs/Data_Transfer/Data_Transfer_Using_MobaXterm.md
@@ -1,8 +1,8 @@
 ---
 created_at: 2026-02-04
 description: How to copy files to the REANNZ HPC using MobaXterm.
-tags: 
-    - data transfer
+tags:
+    - file_transfer
 title:  MobaXterm (Windows)
 ---