113 changes: 113 additions & 0 deletions CHROME_WEB_STORE.md
@@ -0,0 +1,113 @@
# Chrome Web Store Listing

## Extension Name

Scrape GitHub by Olostep

## Short Description

Extract GitHub data as JSON or CSV, then discover Olostep's Web Data API for scalable scraping and AI workflows.

## Full Description

Scrape GitHub by Olostep is a lightweight Chrome extension that turns GitHub repository pages and personal user profiles into structured JSON or CSV, directly in your browser.

Use it to quickly extract GitHub data such as repository names, descriptions, stars, forks, watchers, topics, programming languages, licenses, README summaries, profile bios, follower counts, social links, and pinned repositories.

This extension is designed as a simple example of structured web data extraction. For developers and teams that need to scrape websites at scale, crawl pages, extract clean Markdown, HTML, PDFs, or structured JSON, and power AI agents with reliable web data, Olostep provides a full Web Data API.

Olostep helps you search, extract, and structure web data from any website through developer-friendly APIs for scrapes, crawls, answers, parsers, batches, and browser automations.

Key features:

- Scrape GitHub repository pages into structured JSON
- Export GitHub data as CSV
- Parse GitHub user profiles and pinned repositories
- Copy or download extracted data from the browser
- Run local parsing from the active GitHub tab
- Discover Olostep for scalable web scraping, crawling, and AI data workflows
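
The CSV export listed above could be sketched roughly as follows. This is a minimal illustration, not the extension's actual exporter: the helper names `escapeCsvCell` and `toCsv`, the one-row layout, and the `"; "` separator for list fields are all assumptions.

```javascript
// Hypothetical helpers: the names and the column layout are assumptions for illustration.
function escapeCsvCell(value) {
  if (value === null || value === undefined) return "";
  // List fields (for example topics) are joined into one cell; the "; " separator is an assumption.
  const text = Array.isArray(value) ? value.join("; ") : String(value);
  // RFC 4180 style quoting: wrap cells that contain commas, quotes, or line breaks,
  // and double any quotes inside the cell.
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(record) {
  const keys = Object.keys(record);
  const header = keys.map(escapeCsvCell).join(",");
  const row = keys.map((key) => escapeCsvCell(record[key])).join(",");
  return `${header}\r\n${row}`;
}

// Made-up sample record, not real GitHub data.
const sample = {
  name: "example-repo",
  description: 'Fast, "tiny" parser',
  stars: 1200,
  topics: ["cli", "json"],
};
console.log(toCsv(sample));
```

Quoting only the cells that need it keeps the output readable while still opening cleanly in spreadsheet tools.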

Scrape GitHub is useful for open-source research, developer sourcing, repository analysis, technical research, competitive intelligence, and structured GitHub data collection.

Supported pages currently include GitHub repository root pages and personal user profile root pages.
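
One way to decide whether a URL is a supported root page is sketched below. `classifyGitHubUrl` and the reserved-name list are hypothetical and deliberately incomplete; the extension's real check may differ. A URL alone also cannot tell a personal profile from an organization page, so a real implementation would likely inspect the page as well.

```javascript
// Illustrative, incomplete list of top-level GitHub paths that are not user names (assumption).
const RESERVED_TOP_LEVEL = new Set([
  "settings", "marketplace", "explore", "notifications", "topics", "orgs", "login",
]);

// Hypothetical classifier: returns "repository", "profile", or "unsupported".
function classifyGitHubUrl(pageUrl) {
  let url;
  try {
    url = new URL(pageUrl);
  } catch {
    return "unsupported"; // not a parseable absolute URL
  }
  if (url.protocol !== "https:" || url.hostname !== "github.com") return "unsupported";

  // "/owner/repo/" -> ["owner", "repo"]; query strings and fragments are ignored.
  const segments = url.pathname.split("/").filter(Boolean);
  if (segments.length === 0 || RESERVED_TOP_LEVEL.has(segments[0].toLowerCase())) {
    return "unsupported";
  }
  if (segments.length === 1) return "profile"; // may also be an organization
  if (segments.length === 2) return "repository";
  return "unsupported"; // deeper paths such as /owner/repo/issues
}

console.log(classifyGitHubUrl("https://github.com/octocat/Hello-World"));
```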

When you are ready to move beyond a browser extension, visit https://olostep.com to build scalable web data pipelines with Olostep's Web Data API for AI.

## Single Purpose Statement

Scrape GitHub by Olostep extracts structured JSON or CSV from the active supported GitHub repository or personal profile page when the user opens the extension popup and clicks "Parse current page".

## Category

Developer Tools

## Language

English

## Website URL

https://olostep.com

## Support URL

https://olostep.com

## Privacy Policy URL

https://www.olostep.com/privacy-policy

## Permission Justifications

### activeTab

Required so the extension can access the currently active tab only after the user opens the extension and requests parsing. The extension uses this access to read the supported GitHub page the user wants to parse.

### scripting

Required to run a small script in the active tab that reads the page URL and HTML. This lets the extension parse the current GitHub repository or profile page locally in the popup.
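
A minimal sketch of how the `activeTab` and `scripting` permissions might be used together is shown below. `readActivePage` is a hypothetical name, and passing the API object in as `chromeApi` is an assumption made so the function can run outside the browser; in the popup you would pass the global `chrome`. The calls themselves (`tabs.query` and `scripting.executeScript` with a `func`) are the standard Manifest V3 promise-based APIs.

```javascript
// Sketch only: reads the URL and HTML of the active tab via an injected function.
async function readActivePage(chromeApi) {
  const [tab] = await chromeApi.tabs.query({ active: true, currentWindow: true });
  if (!tab || tab.id === undefined) return null;

  const [injection] = await chromeApi.scripting.executeScript({
    target: { tabId: tab.id },
    // This function is serialized and executed inside the page,
    // so it can only rely on page globals, not on popup variables.
    func: () => ({
      url: location.href,
      html: document.documentElement.outerHTML,
    }),
  });

  // Each injection result carries the injected function's return value in `result`.
  return injection ? injection.result : null;
}
```

In the popup this would be called as `await readActivePage(chrome)` from the click handler, and the returned `html` string handed to the local parser.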

### Host permission: https://github.com/*

Required to limit extension parsing to GitHub pages. The extension supports GitHub repository root pages and personal user profile root pages.

## Privacy Practices Answers

### Does the extension collect user data?

No. The extension reads the active GitHub page HTML and URL locally only when the user clicks "Parse current page". It does not store, transmit, sell, or share parsed data.

### Does the extension use remote code?

No. All extension code is included in the extension package.

### Does the extension use analytics or tracking?

No. The extension does not include analytics, advertising SDKs, telemetry, tracking pixels, or behavioral tracking.

### Does the extension transfer data to Olostep or third parties?

No. Parsed GitHub page content, JSON, CSV, and URLs are not sent to Olostep or third parties. The popup includes a link to Olostep's website, and clicking it navigates the browser to https://olostep.com.

## Store Assets

Prepared assets:

- `store-assets/screenshot-1280x800.png`
- `store-assets/small-promo-440x280.png`
- `extension/icons/icon-128.png`

## Manual QA Checklist

- Load the `extension/` folder as an unpacked extension in Chrome.
- Open a GitHub repository root page and confirm parsing succeeds.
- Open a GitHub personal profile root page and confirm parsing succeeds.
- Open an unsupported GitHub subpage and confirm parsing is disabled.
- Open a non-GitHub page and confirm parsing is disabled.
- Copy JSON and CSV results.
- Download JSON and CSV results.
- Click the Olostep links and confirm they open https://olostep.com.

## Upload Notes

Upload `dist/scrape-github-by-olostep.zip` to the Chrome Web Store dashboard.
Binary file modified docs/assets/chrome-extension-parse.gif
8 changes: 5 additions & 3 deletions extension/github-parsers.js
@@ -15,7 +15,9 @@ function getMeta(doc, selector) {
 function parseCount(value) {
   if (!value) return null;
   const normalized = value.replace(/,/g, "").trim().toLowerCase();
-  const match = normalized.match(/^([\d.]+)([km])?$/);
+  const match =
+    normalized.match(/^([\d.]+)\s*([km])?$/) ||
+    normalized.match(/([\d.]+)\s*([km])?(?=\s*(stars?|forks?|watch(?:ing|ers)?|followers?|following|$))/);
   if (!match) return null;
   const number = Number(match[1]);
   if (Number.isNaN(number)) return null;
@@ -83,8 +85,8 @@ function parseRepository(htmlString, pageUrl) {
     return href.includes("/LICENSE") || /license/i.test(getText(node) || "");
   });
   const readmeSummary = getText(
-    doc.querySelector("article.markdown-body p, article.markdown-body h1, article.markdown-body h2")
-  );
+    doc.querySelector("#readme article.markdown-body p, #readme article.markdown-body h1, #readme article.markdown-body h2, article.markdown-body p, article.markdown-body h1, article.markdown-body h2, .js-snippet-clipboard-copy-unpositioned pre")
+  ) || getText(doc.querySelector("#readme .markdown-body, #readme .Box-body, #readme, article.markdown-body, .markdown-body, .js-snippet-clipboard-copy-unpositioned"));
 
   return {
     success: true,
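
To see what the widened `parseCount` patterns accept, here is a self-contained sketch of the function. Everything up to the `Number.isNaN` check is copied from the new side of the hunk above; the final two lines are an assumption, because the diff collapses the rest of the function. The `k` and `m` multipliers and the rounding are a guess at the intended behavior.

```javascript
function parseCount(value) {
  if (!value) return null;
  const normalized = value.replace(/,/g, "").trim().toLowerCase();
  const match =
    // Whole string is a number with an optional k/m suffix, e.g. "1.2k" or "1.2 k".
    normalized.match(/^([\d.]+)\s*([km])?$/) ||
    // Otherwise take a number that is followed by a known label, e.g. "3400 stars".
    normalized.match(/([\d.]+)\s*([km])?(?=\s*(stars?|forks?|watch(?:ing|ers)?|followers?|following|$))/);
  if (!match) return null;
  const number = Number(match[1]);
  if (Number.isNaN(number)) return null;
  // ASSUMPTION: the rest of the real function is not visible in the diff.
  const multiplier = match[2] === "k" ? 1e3 : match[2] === "m" ? 1e6 : 1;
  return Math.round(number * multiplier);
}

for (const input of ["1.2k", "3,400 stars", "1.5k followers", "12 watching", "n/a"]) {
  console.log(JSON.stringify(input), "->", parseCount(input));
}
```

The second pattern is unanchored, so it also accepts text that ends in a number after other words; whether that is desirable depends on the labels GitHub actually renders.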
Binary file added extension/icons/icon-128.png
Binary file added extension/icons/icon-16.png
Binary file added extension/icons/icon-32.png
Binary file added extension/icons/icon-48.png
19 changes: 15 additions & 4 deletions extension/manifest.json
@@ -1,8 +1,14 @@
 {
   "manifest_version": 3,
-  "name": "Scrape GitHub",
+  "name": "Scrape GitHub by Olostep",
   "version": "0.1.0",
-  "description": "Parse GitHub repository and user profile pages into structured JSON.",
+  "description": "Extract GitHub repo and profile data as JSON/CSV, then discover Olostep's Web Data API for AI.",
+  "icons": {
+    "16": "icons/icon-16.png",
+    "32": "icons/icon-32.png",
+    "48": "icons/icon-48.png",
+    "128": "icons/icon-128.png"
+  },
   "permissions": [
     "activeTab",
     "scripting"
@@ -11,8 +17,13 @@
     "https://github.com/*"
   ],
   "action": {
-    "default_title": "Scrape GitHub",
+    "default_title": "Scrape GitHub by Olostep",
     "default_popup": "popup.html",
-    "default_icon": "olostep-icon.png"
+    "default_icon": {
+      "16": "icons/icon-16.png",
+      "32": "icons/icon-32.png",
+      "48": "icons/icon-48.png",
+      "128": "icons/icon-128.png"
+    }
   }
 }
70 changes: 69 additions & 1 deletion extension/popup.css
@@ -66,6 +66,7 @@ body {
min-height: 36px;
padding: 0 12px;
text-decoration: none;
white-space: nowrap;
}

.panel {
@@ -121,8 +122,63 @@ body {
color: var(--muted);
}

.actions {
.toolbar {
display: flex;
justify-content: flex-end;
margin-bottom: 12px;
}

.segmented {
display: inline-flex;
align-items: center;
gap: 6px;
background: var(--panel);
border: 1px solid var(--line);
border-radius: 999px;
padding: 6px;
}

.segmented-label {
color: var(--muted);
font-size: 12px;
font-weight: 600;
padding: 0 8px 0 10px;
}

.segmented-option {
border: 1px solid transparent;
background: transparent;
color: var(--ink);
border-radius: 999px;
cursor: pointer;
font: inherit;
font-size: 12px;
letter-spacing: 0.01em;
min-height: 28px;
padding: 0 10px;
transition: background-color 140ms ease, opacity 140ms ease;
}

.segmented-option:hover:enabled {
background: rgba(108, 99, 255, 0.08);
transform: none;
}

.segmented-option:disabled {
cursor: not-allowed;
opacity: 0.55;
}

.segmented-option.is-active {
background: var(--accent-soft);
color: var(--accent-strong);
border-color: rgba(108, 99, 255, 0.18);
font-weight: 700;
}

.actions {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 10px;
}

@@ -170,3 +226,15 @@ button:disabled {
white-space: pre-wrap;
word-break: break-word;
}

.footer-link {
margin: 12px 0 0;
text-align: center;
}

.footer-link a {
color: var(--accent-strong);
font-size: 12px;
font-weight: 600;
text-decoration: none;
}
21 changes: 17 additions & 4 deletions extension/popup.html
@@ -3,22 +3,22 @@
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Scrape GitHub</title>
<title>Scrape GitHub by Olostep</title>
<link rel="stylesheet" href="./popup.css">
</head>
<body>
<main class="popup">
<header class="topbar">
<div>
<p class="eyebrow">olostep</p>
<p class="eyebrow">by Olostep</p>
<h1>Scrape GitHub</h1>
</div>
<a class="docs-link" href="https://olostep.com" target="_blank" rel="noreferrer">API</a>
<a class="docs-link" href="https://olostep.com" target="_blank" rel="noreferrer">Web Data API</a>
</header>

<section class="panel intro-panel">
<p class="intro-title">GitHub parser</p>
<p class="intro-copy">Parse repository roots and personal profile roots into structured JSON with a local extension UI that mirrors the Olostep workflow.</p>
<p class="intro-copy">Extract GitHub repository and profile data locally. Need reliable web data at scale? Use Olostep's Web Data API for scrapes, crawls, parsers, and AI workflows.</p>
</section>

<section class="panel">
@@ -29,13 +29,26 @@ <h1>Scrape GitHub</h1>

<p id="status-text" class="status-text">Open a GitHub repository root or user profile root.</p>

<div class="toolbar">
<div class="segmented" role="group" aria-label="Output format">
<button id="format-json" class="segmented-option is-active" type="button" disabled>JSON</button>
<button id="format-csv" class="segmented-option" type="button" disabled>CSV</button>
</div>
</div>

<div class="actions">
<button id="parse-button" class="primary" type="button" disabled>Parse current page</button>
<button id="copy-button" class="secondary" type="button" disabled>Copy JSON</button>
<button id="download-json-button" class="secondary" type="button" disabled>Download JSON</button>
<button id="download-csv-button" class="secondary" type="button" disabled>Download CSV</button>
</div>

<pre id="output" class="output">No parsed result yet.</pre>
</section>

<footer class="footer-link">
<a href="https://olostep.com" target="_blank" rel="noreferrer">Build scalable extraction workflows with Olostep</a>
</footer>
</main>

<script type="module" src="./popup.js"></script>