
Use github token #42

Open
spoorcc wants to merge 1 commit into main from file-exploration

Conversation


@spoorcc spoorcc commented Mar 11, 2026

Summary by CodeRabbit

  • New Features

    • Added a file browser for inspecting package files directly from GitHub.
    • Introduced a Files tab in the package detail view.
    • Added GitHub token configuration in manifest settings for API access.
    • Support for per-path ignore rules in generated manifests.
  • Improvements

    • Enhanced version sorting to prioritize recent dates.


coderabbitai bot commented Mar 11, 2026

📝 Walkthrough


A single HTML file was updated to add a client-side file browser with GitHub API integration, allowing per-package file inspection. Additionally, manifest generation now supports per-path ignore rules, version sorting logic was improved to prefer newer dates, and new UI elements were introduced for GitHub token configuration and file browsing.

Changes

Cohort / File(s): dfetch_hub/site/index.html (all changes)

GitHub Integration & File Browser:
Added state variables for tracking file trees, ignored paths, and GitHub tokens. Implemented fetchGhData() to augment package data with GitHub-derived content, loadFilesIfNeeded() and related helpers to fetch and render file listings from the GitHub API, and navigation functions for directory traversal.

Ignore Rules & Manifest Generation:
Enhanced buildEntry() to append ignore rules via getIgnoredYaml() when present. Updated buildSnipHtml() and buildFullMfHtml() to optionally embed ignore rules in manifest representations.

Version Sorting & UI Enhancements:
Improved _semverCmp() to prioritize newer dates over semantic versioning. Added a Files tab and a GitHub token configuration section in manifest settings. Extended tab-switching logic to support file browser loading and added folder/file/parent icons for file listings.

Sequence Diagram

sequenceDiagram
    participant User
    participant Client as Client UI
    participant State as Local State
    participant GH as GitHub API

    User->>Client: Select package
    Client->>State: Initialize files state (CUR_FILES, caches)
    Client->>GH: Call fetchGhData() with package info
    GH-->>Client: Return readme, tags with dates, contents
    Client->>State: Store GitHub data (GH_DATA)
    User->>Client: Click Files tab
    Client->>Client: loadFilesIfNeeded()
    Client->>GH: buildGhApiUrl() → fetch repository contents
    GH-->>Client: Return file tree with metadata
    Client->>State: Update CUR_FILES, CUR_FILES_CACHED_PKG
    Client->>Client: renderFilesList() + renderFileRow()
    Client-->>User: Display file browser with navigation
    User->>Client: Click file/folder or navigate path
    Client->>Client: navFilesPath(subPath)
    Client->>GH: Fetch contents for new path
    GH-->>Client: Return updated file tree
    Client->>Client: renderFilesList() again
    Client-->>User: Update display with new directory contents

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Behold, a file browser blooms with GitHub's grace,
Where packages dance through folder space,
Ignores toggle, tokens sparkle bright,
Versions sort by date's own light,
A manifest garden, flourishing delight! 🌿✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name: Title check
Status: ❓ Inconclusive
Explanation: The title 'Use github token' is vague and does not clearly summarize the main changes. The PR introduces a comprehensive file browser with GitHub integration, but the title only references token usage, which is a secondary feature.
Resolution: Consider a more descriptive title like 'Add GitHub-powered file browser with token authentication' to better capture the main functionality being introduced.
✅ Passed checks (2 passed)
Check name: Description Check
Status: ✅ Passed
Explanation: Check skipped - CodeRabbit's high-level summary is enabled.

Check name: Docstring Coverage
Status: ✅ Passed
Explanation: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch_hub/site/index.html`:
- Around line 532-565: The code persists the GitHub personal access token to
localStorage via the setGhToken flow (input id="ghToken",
oninput="setGhToken(this.value)"), which is insecure; change setGhToken and any
storage helpers so the token is kept only in memory (a JS variable/module-scoped
cache) or, if you require reload persistence, in sessionStorage instead of
localStorage; remove any auto-save logic that writes to localStorage and update
the user-facing copy in the element with class "gh-token-note" (and the
hint/status elements "ghTokenHint"/"ghTokenStatus") to reflect that the token is
stored only locally in the browser session (not long-lived localStorage) and is
sent only to GitHub for authenticated requests.
- Around line 976-983: The bug is that RAW[id]._detailLoaded is set before
GitHub enrichment, so fetchGhData(id) won't be retried after a later token is
provided; change the logic in the block that reads ../data/${id}.json and deals
with RAW[id] so that either (a) RAW[id]._detailLoaded is only set after
fetchGhData(c) completes successfully, or (b) introduce and check a separate
flag (e.g., RAW[id]._ghEnriched) and ensure fetchGhData(c) is invoked whenever
_ghEnriched is false even if _detailLoaded is true; also make sure to
clear/unset the relevant flag on rate-limit/error and surface errors from
fetchGhData so retries are possible.
- Around line 867-870: The path model flips because CUR_FILES_PATH is sometimes
set directly to path.split('/') while readers expect index 0 to be a root
sentinel; ensure CUR_FILES_PATH always has the sentinel at index 0 (use ['']
followed by the path segments) and never an empty array—when clearing set
CUR_FILES_PATH = [''] and whenever assigning from a string use a prepend of the
root sentinel (e.g., [''].concat(path === '' ? [] : path.split('/'))); update
every site location that assigns or compares CUR_FILES_PATH (the places that
also touch CUR_FILES, CUR_FILES_CACHED_PKG and CUR_FILES_CACHED_VERSION) so
readers no longer drop the first segment, re-fetch incorrectly, or carry stale
subpaths.
- Around line 357-375: Several CSS rule lines (e.g., selectors like
.mf-settings-drop input, .gh-token-wrap input, .gh-token-steps and others)
exceed the repo's 120-character line length; run the repository formatter or
manually wrap long one-liner rules into multiple lines (one property per line
and break long selector lists) to respect the 120-char limit for HTML/CSS/JS in
{**/*.html,**/*.md,**/*.rst}; reformat the specified hunks (including the other
flagged ranges 986-1004, 1301-1431, 1556) and re-run the test/workflow so the
file passes the line-length/doc8 checks.
- Around line 993-1000: The README base64 is being decoded with atob(d.content)
which produces a Latin1 string and will corrupt non-ASCII; inside the
if(readmeResp.ok) block replace the raw atob assignment to c.readme with a
proper base64→UTF‑8 decode: obtain the binary via atob(d.content), convert each
charCode to a Uint8Array, then use new TextDecoder('utf-8').decode(...) to
produce the correct UTF‑8 text and assign that to c.readme (keep the existing
try/catch and null checks around d.content).
- Around line 1343-1356: The file browser is vulnerable because
renderFileRow/building HTML uses esc() (HTML-escaping) inside inline event
handler attributes and raw innerHTML (fb.innerHTML) so crafted filenames can
break out and execute JS; fix by stopping use of inline JS handlers and
string-concatenated innerHTML: in renderFileRow and where
parentPath/item.name/fullPath are used, construct DOM nodes via
document.createElement, set element.textContent for file names, use
element.addEventListener to call navFilesPath or toggleFileIgnore (referencing
renderFileRow, navFilesPath, toggleFileIgnore, CUR_FILES_PATH, MFS_IGNORES), and
ensure any remaining template insertion (e.g., yamlQ() output) is HTML-escaped
before setting innerHTML or, better, inserted as textContent; do not rely on
esc() for JS contexts—use proper JS string handling only when building JS code
(but prefer event listeners instead).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 40f7e685-90cc-498a-b8d2-a1cce2e6fe0d

📥 Commits

Reviewing files that changed from the base of the PR and between 4d772ab and e802e26.

📒 Files selected for processing (1)
  • dfetch_hub/site/index.html

Comment on lines +357 to +375
.mf-settings-drop input{font-family:var(--mono);font-size:.74rem;background:rgba(255,255,255,.1);border:1px solid rgba(255,255,255,.2);border-radius:var(--rm);color:#fff;padding:.18rem .38rem;width:80px;outline:none;flex:1}
.mf-settings-drop input:focus{border-color:var(--accent)}
.gh-token-row{padding-top:.35rem;border-top:1px solid rgba(255,255,255,.1);margin-top:.35rem}
.gh-token-wrap{display:flex;gap:.25rem;flex:1}
.gh-token-wrap input{width:100%!important}
.gh-token-toggle{background:rgba(255,255,255,.1);border:1px solid rgba(255,255,255,.2);border-radius:var(--rm);color:rgba(255,255,255,.6);cursor:pointer;padding:.12rem .35rem;font-size:.75rem;flex-shrink:0}
.gh-token-toggle:hover{background:rgba(255,255,255,.18);color:#fff}
.gh-token-status{font-size:.68rem;color:rgba(255,255,255,.45);padding-top:.25rem;display:flex;align-items:center;gap:.25rem}
.gh-token-status .gh-token-valid{color:#27ae60}
.gh-token-status .gh-token-invalid{color:#e74c3c}
.gh-token-status .gh-token-clear{color:rgba(255,255,255,.5);cursor:pointer;margin-left:auto;text-decoration:underline}
.gh-token-status .gh-token-clear:hover{color:#fff}
.gh-token-help{font-size:.68rem;color:rgba(255,255,255,.45);padding:.2rem 0 .1rem}
.gh-token-help summary{color:var(--accent);cursor:pointer;font-weight:600;user-select:none;outline:none}
.gh-token-help summary:hover{color:var(--accent2)}
.gh-token-hint{font-size:.65rem;color:rgba(255,255,255,.4);padding:.15rem 0}
.gh-token-steps{margin-top:.4rem;padding:.45rem .5rem;background:rgba(255,255,255,.05);border-radius:var(--rm);font-size:.66rem;line-height:1.7;color:rgba(255,255,255,.7)}
.gh-token-steps strong{color:#fff}
.gh-token-note{color:rgba(255,255,255,.4);font-size:.62rem;margin-top:.25rem;font-style:italic}

⚠️ Potential issue | 🟡 Minor

Run the formatter on these new hunks.

The Test workflow is already failing on this file, and these added CSS/JS one-liners exceed the repo's 120-character limit.

As per coding guidelines, {**/*.html,**/*.md,**/*.rst}: Enforce maximum line length of 120 characters in HTML/CSS/JS and RST/Markdown files with doc8.

Also applies to: 986-1004, 1301-1431, 1556-1556

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch_hub/site/index.html` around lines 357 - 375, Several CSS rule lines
(e.g., selectors like .mf-settings-drop input, .gh-token-wrap input,
.gh-token-steps and others) exceed the repo's 120-character line length; run the
repository formatter or manually wrap long one-liner rules into multiple lines
(one property per line and break long selector lists) to respect the 120-char
limit for HTML/CSS/JS in {**/*.html,**/*.md,**/*.rst}; reformat the specified
hunks (including the other flagged ranges 986-1004, 1301-1431, 1556) and re-run
the test/workflow so the file passes the line-length/doc8 checks.

Comment on lines +532 to +565
<div class="mf-settings-row gh-token-row">
<label for="ghToken"
title="Personal access token for higher GitHub API rate limits.">
GitHub token:
</label>
<div class="gh-token-wrap">
<input id="ghToken"
type="password"
value=""
placeholder="ghp_xxxxxxxxxxxx"
oninput="setGhToken(this.value)"
onkeydown="if(event.key==='Enter'){event.preventDefault();toggleMfSettings()}"
aria-label="GitHub personal access token" />
<button class="gh-token-toggle"
type="button"
onclick="toggleGhTokenVisibility()"
aria-label="Toggle token visibility">
👁
</button>
</div>
</div>
<div class="gh-token-hint" id="ghTokenHint">
<span>Rate: 60/hr → 5000/hr with token</span>
</div>
<details class="gh-token-help">
<summary>How to get a token (click)</summary>
<div class="gh-token-steps">
<div>1. <button type="button" class="gh-token-get" onclick="openGhTokenPage()">Go to GitHub ↗</button></div>
<div>2. Scroll down, click <strong>Generate token</strong></div>
<div>3. Copy the token and paste above (auto-saves)</div>
<div class="gh-token-note">Stored locally in your browser, never sent to any server</div>
</div>
</details>
<div class="gh-token-status" id="ghTokenStatus"></div>

⚠️ Potential issue | 🟠 Major

Avoid persisting a personal access token in localStorage.

This leaves a long-lived credential readable by any script on this origin and on shared machines after the tab closes. Prefer in-memory storage, or at most sessionStorage if you need reload persistence. The helper copy at Line 562 should also be updated, because authenticated requests here do send the token to GitHub.

Also applies to: 872-872, 1400-1427

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch_hub/site/index.html` around lines 532 - 565, The code persists the
GitHub personal access token to localStorage via the setGhToken flow (input
id="ghToken", oninput="setGhToken(this.value)"), which is insecure; change
setGhToken and any storage helpers so the token is kept only in memory (a JS
variable/module-scoped cache) or, if you require reload persistence, in
sessionStorage instead of localStorage; remove any auto-save logic that writes
to localStorage and update the user-facing copy in the element with class
"gh-token-note" (and the hint/status elements "ghTokenHint"/"ghTokenStatus") to
reflect that the token is stored only locally in the browser session (not
long-lived localStorage) and is sent only to GitHub for authenticated requests.

Comment on lines +867 to +870
let CUR_FILES = null; // {path:[{name,type,sha,ignored?}]}
let CUR_FILES_PATH = []; // current breadcrumb path stack
let CUR_FILES_CACHED_PKG = null; // package ID for cached files
let CUR_FILES_CACHED_VERSION = null; // version for cached files

⚠️ Potential issue | 🔴 Critical

The file browser's path model is broken right now.

CUR_FILES_PATH flips between [''] and path.split('/'), but the readers always treat index 0 as a root sentinel. That makes the first directory open refetch the current directory, drops the first segment of nested paths, and can carry a stale subpath into the next package/version load.

Possible fix
-let CUR_FILES_PATH = []; // current breadcrumb path stack
+let CUR_FILES_PATH = ''; // current path relative to repo root
...
-  _pushURL({pkg:id});curPkg=id;CUR_FILES=null;CUR_FILES_CACHED_PKG=null;CUR_FILES_CACHED_VERSION=null;showPage('page-detail');window.scrollTo(0,0);
+  _pushURL({pkg:id});curPkg=id;CUR_FILES=null;CUR_FILES_PATH='';CUR_FILES_CACHED_PKG=null;CUR_FILES_CACHED_VERSION=null;showPage('page-detail');window.scrollTo(0,0);
...
-  CUR_FILES=null;
+  CUR_FILES=null;
+  CUR_FILES_PATH='';
   CUR_FILES_CACHED_PKG=null;
   CUR_FILES_CACHED_VERSION=null;
...
-function buildGhApiUrl(c,version){
+function buildGhApiUrl(c,version,path=''){
   let org=c.org||'';let repo=c.repo||c.name||'';
   const m=(c.url||'').match(/github\.com[\/:]([^\/]+)\/([^\/\s]+)/);
   if(m){org=m[1].toLowerCase();repo=m[2].replace(/\.git$/,'').toLowerCase();}
   const ref=version?(version.kind==='branch'?version.name:version.name):c.default_branch||'main';
-  const pth=CUR_FILES_PATH.slice(1).join('/');
-  const pathParam=pth?'/'+pth:'';
-  return `https://api.github.com/repos/${org}/${repo}/contents${pathParam}?ref=${ref}`;
+  const pathParam=path ? '/' + path.split('/').map(encodeURIComponent).join('/') : '';
+  return `https://api.github.com/repos/${org}/${repo}/contents${pathParam}?ref=${encodeURIComponent(ref)}`;
 }
...
-  const curPath=CUR_FILES_PATH.slice(1).join('/');
+  const curPath=CUR_FILES_PATH;
...
-  if(CUR_FILES[targetPath]){CUR_FILES_PATH=subPath?subPath.split('/'):[''];renderFilesList();return;}
+  if(CUR_FILES[targetPath]){CUR_FILES_PATH=targetPath;renderFilesList();return;}
...
-    const apiUrl=buildGhApiUrl(c,curVersion);
+    const apiUrl=buildGhApiUrl(c,curVersion,targetPath);
...
-    CUR_FILES_PATH=targetPath?targetPath.split('/'):[''];
+    CUR_FILES_PATH=targetPath;

Also applies to: 975-975, 1231-1233, 1325-1333, 1335-1343, 1379-1392

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch_hub/site/index.html` around lines 867 - 870, The path model flips
because CUR_FILES_PATH is sometimes set directly to path.split('/') while
readers expect index 0 to be a root sentinel; ensure CUR_FILES_PATH always has
the sentinel at index 0 (use [''] followed by the path segments) and never an
empty array—when clearing set CUR_FILES_PATH = [''] and whenever assigning from
a string use a prepend of the root sentinel (e.g., [''].concat(path === '' ? []
: path.split('/'))); update every site location that assigns or compares
CUR_FILES_PATH (the places that also touch CUR_FILES, CUR_FILES_CACHED_PKG and
CUR_FILES_CACHED_VERSION) so readers no longer drop the first segment, re-fetch
incorrectly, or carry stale subpaths.
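The review's alternative, a plain string path with no sentinel, can be sketched with a few small helpers; the names here are illustrative and not taken from the actual file:

```javascript
// Model the current directory as a plain string relative to the repo
// root ('' means root), so no root sentinel is needed anywhere.
function joinRepoPath(dir, name) {
  return dir ? `${dir}/${name}` : name;
}

function parentRepoPath(path) {
  const i = path.lastIndexOf('/');
  return i === -1 ? '' : path.slice(0, i);
}

// Build the contents-API path segment with each component URL-encoded,
// as the suggested buildGhApiUrl() diff does.
function contentsPathParam(path) {
  return path ? '/' + path.split('/').map(encodeURIComponent).join('/') : '';
}
```

Because '' unambiguously means root, clearing state is just `CUR_FILES_PATH = ''`, and no reader can drop or duplicate a first segment.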

Comment on lines 976 to 983
if(!c._detailLoaded){
document.getElementById('readmeBody').innerHTML='<div class="readme-empty">Loading…</div>';
document.getElementById('vtbody').innerHTML='<tr><td colspan="4" style="text-align:center;color:var(--subtle)">Loading…</td></tr>';
try{const resp=await fetch(`../data/${id}.json`);if(resp.ok){const d=await resp.json();Object.assign(RAW[id],d);}}catch(e){}
RAW[id]._detailLoaded=true;
const isGh=c.source_type==='github'||(c.source_labels||[]).includes('github');
if(isGh){await fetchGhData(c);}
}

⚠️ Potential issue | 🟠 Major

Make GitHub enrichment retryable.

Once _detailLoaded is set, fetchGhData() never runs again for that package. If the first unauthenticated call is rate-limited and the user adds a token afterward, README/tags/changelog stay stale until a hard refresh.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch_hub/site/index.html` around lines 976 - 983, The bug is that
RAW[id]._detailLoaded is set before GitHub enrichment, so fetchGhData(id) won't
be retried after a later token is provided; change the logic in the block that
reads ../data/${id}.json and deals with RAW[id] so that either (a)
RAW[id]._detailLoaded is only set after fetchGhData(c) completes successfully,
or (b) introduce and check a separate flag (e.g., RAW[id]._ghEnriched) and
ensure fetchGhData(c) is invoked whenever _ghEnriched is false even if
_detailLoaded is true; also make sure to clear/unset the relevant flag on
rate-limit/error and surface errors from fetchGhData so retries are possible.

Comment on lines +993 to +1000
const [readmeResp,tagsResp,contentsResp,repoResp]=await Promise.all([
fetch(`https://api.github.com/repos/${org}/${repo}/readme`,opts),
fetch(`https://api.github.com/repos/${org}/${repo}/tags?per_page=100`,opts),
fetch(`https://api.github.com/repos/${org}/${repo}/contents/`,opts),
fetch(`https://api.github.com/repos/${org}/${repo}`,opts)
]);
if(readmeResp.ok){try{const d=await readmeResp.json();if(d.content){const txt=atob(d.content);c.readme=txt;}}catch(e){}}
if(tagsResp.ok){try{const tags=await tagsResp.json();if(Array.isArray(tags)){const tagList=tags.slice(0,50);const commitResps=await Promise.allSettled(tagList.map(t=>fetch(`https://api.github.com/repos/${org}/${repo}/commits/${t.commit.sha}`,opts).then(r=>r.ok?r.json():null)));tagList.forEach((t,i)=>{if(commitResps[i].status==='fulfilled'&&commitResps[i].value?.commit?.committer?.date)t.date=commitResps[i].value.commit.committer.date;});c.tags=[...new Map(tagList.map(t=>({name:t.name,commit_sha:t.commit?.sha||'',date:t.date||'',...(t.commit?.url?{commit_url:t.commit.url}:{})})).map(t=>[t.name,t])).values()];}}catch(e){}}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

GitHub API README endpoint content encoding base64

💡 Result:

GitHub’s REST README endpoint (GET /repos/{owner}/{repo}/readme) returns a contents JSON object where:

  • encoding is "base64"
  • content is the README bytes base64-encoded (often with \n line breaks inserted) [1]

So to get the README text, remove line breaks from content and base64-decode it.

If you want the README as raw bytes/text instead of base64 JSON, request the raw media type via Accept (GitHub documents application/vnd.github.raw+json) [1]. Older examples commonly use application/vnd.github.v3.raw and work in many clients [2].

Example (curl, raw):

curl -L \
  -H "Accept: application/vnd.github.raw+json" \
  https://api.github.com/repos/OWNER/REPO/readme

Example (decode JSON content in JS):

const decoded = Buffer.from(content.replace(/\n/g, ""), "base64").toString("utf8");

Sources: GitHub REST “Repository contents / Get a repository README” docs [1], Stack Overflow example using the raw Accept header [2].


🏁 Script executed:

# Check the current line lengths in the suggested fix to see if they comply with 120 char limit
python3 << 'EOF'
# Suggested fix lines
lines = [
    "    if(readmeResp.ok){",
    "      try{",
    "        const d=await readmeResp.json();",
    "        if(d.content){",
    "          const bin=atob(d.content.replace(/\\n/g,''));",
    "          const bytes=Uint8Array.from(bin,ch=>ch.charCodeAt(0));",
    "          c.readme=new TextDecoder('utf-8').decode(bytes);",
    "        }",
    "      }catch(e){}",
    "    }"
]

max_len = 120
violations = []
for i, line in enumerate(lines, 1):
    if len(line) > max_len:
        violations.append(f"Line {i}: {len(line)} chars - {line}")

if violations:
    print("Line length violations (>120 chars):")
    for v in violations:
        print(v)
else:
    print("All lines comply with 120 char limit")
    print(f"Longest line: {max(len(line) for line in lines)} chars")
EOF

Repository: dfetch-org/dfetch-hub

Length of output: 124


Decode README content as UTF-8 bytes, not a raw atob() string.

atob() decodes base64 to a Latin1 binary string. Assigning that directly to c.readme will garble any non-ASCII characters in the README (emoji, accented characters, etc.).

Suggested fix
-    if(readmeResp.ok){try{const d=await readmeResp.json();if(d.content){const txt=atob(d.content);c.readme=txt;}}catch(e){}}
+    if(readmeResp.ok){
+      try{
+        const d=await readmeResp.json();
+        if(d.content){
+          const bin=atob(d.content.replace(/\n/g,''));
+          const bytes=Uint8Array.from(bin,ch=>ch.charCodeAt(0));
+          c.readme=new TextDecoder('utf-8').decode(bytes);
+        }
+      }catch(e){}
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch_hub/site/index.html` around lines 993 - 1000, The README base64 is
being decoded with atob(d.content) which produces a Latin1 string and will
corrupt non-ASCII; inside the if(readmeResp.ok) block replace the raw atob
assignment to c.readme with a proper base64→UTF‑8 decode: obtain the binary via
atob(d.content), convert each charCode to a Uint8Array, then use new
TextDecoder('utf-8').decode(...) to produce the correct UTF‑8 text and assign
that to c.readme (keep the existing try/catch and null checks around d.content).
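The decode described in the prompt can be written as a small standalone helper; the function name is illustrative:

```javascript
// Decode a GitHub contents-API base64 payload into UTF-8 text. atob()
// alone yields a Latin-1 binary string, so the bytes must be routed
// through TextDecoder for non-ASCII characters to survive. The API also
// inserts newlines into `content`, which must be stripped first.
function decodeGhContent(b64) {
  const bin = atob(b64.replace(/\n/g, ''));
  const bytes = Uint8Array.from(bin, ch => ch.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}
```

An alternative is to skip base64 entirely by requesting the raw media type (`Accept: application/vnd.github.raw+json`), as the web query above notes.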

Comment on lines +1343 to +1356
html+=`<div class="files-row files-parent-row" onclick="navFilesPath('${esc(parentPath)}')" style="border-bottom:1px solid var(--border)"><span class="files-ico">${iconParent()}</span><span class="files-name">.. (parent directory)</span></div>`;
}
sorted.forEach(item=>{html+=renderFileRow(item);});
html+='</div></div>';
fb.innerHTML=html;
updFilesIgnCnt();
}
function renderFileRow(item){
const ico=item.type==='dir'?iconFolder():iconFile();
const curDir=CUR_FILES_PATH.slice(1).join('/');
const fullPath=curDir?curDir+'/'+item.name:item.name;
const ign=MFS_IGNORES[curPkg]?.has(fullPath)||item.ignored;
const folderNav=item.type==='dir'?`ondblclick="event.stopPropagation();navFilesPath('${esc(item.name)}')" title="Double-click to open"`:' title="Check to ignore"';
return `<div class="files-row"><input type="checkbox" ${ign?'checked':''} onchange="toggleFileIgnore('${esc(fullPath)}',this.checked)"><span class="files-ico">${ico}</span><span class="files-name${ign?' ignored':''}" ${folderNav}>${esc(item.name)}</span></div>`;

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, check the file exists and get line count
wc -l dfetch_hub/site/index.html

Repository: dfetch-org/dfetch-hub

Length of output: 96


🏁 Script executed:

# Read the specific lines mentioned in the review
sed -n '1343,1356p' dfetch_hub/site/index.html

Repository: dfetch-org/dfetch-hub

Length of output: 1123


🏁 Script executed:

# Also check lines 1510-1528
sed -n '1510,1528p' dfetch_hub/site/index.html

Repository: dfetch-org/dfetch-hub

Length of output: 1351


🏁 Script executed:

# Find the esc() function implementation
rg -n "function esc" dfetch_hub/site/index.html | head -20

Repository: dfetch-org/dfetch-hub

Length of output: 141


🏁 Script executed:

# Find the yamlQ() function implementation
rg -n "function yamlQ|const yamlQ|let yamlQ" dfetch_hub/site/index.html | head -20

Repository: dfetch-org/dfetch-hub

Length of output: 186


🏁 Script executed:

# Find the _ESC definition
rg -n "_ESC" dfetch_hub/site/index.html | head -10

Repository: dfetch-org/dfetch-hub

Length of output: 219


🏁 Script executed:

# Get more context around the _ESC definition
rg -B5 -A5 "_ESC\s*=" dfetch_hub/site/index.html

Repository: dfetch-org/dfetch-hub

Length of output: 1249


🏁 Script executed:

# Let's also check how innerHTML is being used with the html variable
rg -n "innerHTML" dfetch_hub/site/index.html | head -20

Repository: dfetch-org/dfetch-hub

Length of output: 2038


🏁 Script executed:

# Search for where yamlQ results are inserted into the DOM
rg -B3 -A3 "yamlQ" dfetch_hub/site/index.html

Repository: dfetch-org/dfetch-hub

Length of output: 2602


🏁 Script executed:

# Check for CSP or other security headers
rg -i "content-security-policy|csp|script.*src" dfetch_hub/site/index.html | head -20

Repository: dfetch-org/dfetch-hub

Length of output: 233


🏁 Script executed:

# Check if DOMPurify is used and how
rg -n "DOMPurify" dfetch_hub/site/index.html

Repository: dfetch-org/dfetch-hub

Length of output: 366


🏁 Script executed:

# Verify the exact context where yamlQ is used in HTML generation (around 1510-1528)
sed -n '1500,1530p' dfetch_hub/site/index.html

Repository: dfetch-org/dfetch-hub

Length of output: 2104


🏁 Script executed:

# Check for any sanitization or token storage details
rg -n "token|localStorage|sessionStorage" dfetch_hub/site/index.html | head -20

Repository: dfetch-org/dfetch-hub

Length of output: 1793


🏁 Script executed:

# Let me check what DOMPurify is used for and if it's applied to the file list HTML
sed -n '1340,1360p' dfetch_hub/site/index.html

Repository: dfetch-org/dfetch-hub

Length of output: 1585


🏁 Script executed:

# Check the function that uses fb.innerHTML to see if any sanitization happens
rg -B10 "fb.innerHTML" dfetch_hub/site/index.html

Repository: dfetch-org/dfetch-hub

Length of output: 5153


🏁 Script executed:

# Verify the exact vulnerability: test if esc() in onclick attributes provides JS-context escaping
python3 << 'EOF'
# Test case: a filename with a quote and JavaScript
filename = "');alert('xss');//"

# Simulate esc() function
def esc(s):
    s = str(s or '')
    escapes = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'}
    return ''.join(escapes.get(c, c) for c in s)

escaped = esc(filename)
print(f"Original filename: {filename}")
print(f"After esc(): {escaped}")

# Now simulate HTML parsing - browsers decode HTML entities in attributes
html_string = f"<div onclick=\"navFilesPath('{escaped}')\"></div>"
print(f"HTML string: {html_string}")

# When parsed, entities are decoded
decoded_onclick = html_string.replace('&#39;', "'")
print(f"After browser parsing entities: {decoded_onclick}")

# Extract what JS sees
import re
match = re.search(r"onclick=\"([^\"]+)\"", decoded_onclick)
if match:
    js_code = match.group(1)
    print(f"JavaScript sees: {js_code}")
EOF

Repository: dfetch-org/dfetch-hub

Length of output: 369


Critical XSS vulnerability in file browser: crafted filenames can execute JavaScript and steal tokens.

Repository paths (parentPath, item.name, fullPath) are interpolated into inline event handlers using esc(), which performs HTML entity escaping (&#39; for quotes) rather than JavaScript string escaping. When innerHTML renders this HTML, the browser decodes entities back to literal characters, allowing attackers to break out of the string context and inject arbitrary code into onclick/ondblclick handlers.

Additionally, at lines 1510–1528, yamlQ() results are wrapped in HTML without escaping, permitting direct HTML injection through filenames.

A crafted filename like ');alert(1);// or <img src=x onerror=alert(1)> can execute JavaScript with access to stored GitHub tokens in localStorage.

Use textContent instead of innerHTML where possible, or apply proper JavaScript string escaping for event attributes (not HTML entity escaping), and HTML escape all content inserted via innerHTML.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch_hub/site/index.html` around lines 1343 - 1356, The file browser is
vulnerable because renderFileRow/building HTML uses esc() (HTML-escaping) inside
inline event handler attributes and raw innerHTML (fb.innerHTML) so crafted
filenames can break out and execute JS; fix by stopping use of inline JS
handlers and string-concatenated innerHTML: in renderFileRow and where
parentPath/item.name/fullPath are used, construct DOM nodes via
document.createElement, set element.textContent for file names, use
element.addEventListener to call navFilesPath or toggleFileIgnore (referencing
renderFileRow, navFilesPath, toggleFileIgnore, CUR_FILES_PATH, MFS_IGNORES), and
ensure any remaining template insertion (e.g., yamlQ() output) is HTML-escaped
before setting innerHTML or, better, inserted as textContent; do not rely on
esc() for JS contexts—use proper JS string handling only when building JS code
(but prefer event listeners instead).
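Why HTML-entity escaping fails in this position can be shown in a few lines; esc() below is a minimal stand-in for the one the review cites, and decodeEntities() models what the browser's attribute parser does:

```javascript
// Minimal HTML-entity escaper, standing in for the site's esc().
function esc(s) {
  return String(s).replace(/[&<>"']/g, c =>
    ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[c]));
}

// Browsers decode entities in attribute values before handing the text
// to the JS engine, so HTML escaping cannot protect a JS string literal.
function decodeEntities(s) {
  return s.replace(/&#39;/g, "'").replace(/&quot;/g, '"')
    .replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&');
}

const payload = "');alert(1);//";
// What the inline handler source looks like after the browser parses it:
const handlerSource = decodeEntities(`navFilesPath('${esc(payload)}')`);
// handlerSource is now "navFilesPath('');alert(1);//')": the payload has
// closed the string literal, which is why the review recommends
// createElement/textContent/addEventListener instead of inline handlers.
```

Building rows with DOM APIs sidesteps both contexts at once: textContent never interprets markup, and addEventListener passes the filename as a real JS value instead of source text.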
