I went looking for AI-built code on GitHub. I found a bot-star service for Chinese AI startups, a 3,112-repo brand-squat of an Indian physical-AI company, two republished crypto wallets, and an Allen AI paper laundered four days after release.

Working title. Draft 2.


Cold open

I was building a small corpus for an even smaller experiment: could three forensic tools (entropyx, kraken, vajra) tell AI-authored code apart from human-authored code on real GitHub repositories? The plan was to grab the most-prolific Co-Authored-By: Claude repos by commit count and treat them as a positive class.

The two top repos by that ranking are kyasbalme/Scrapbox and luliguyu/cmbd-book. They have 101 Claude-trailer commits each. They surface at the very top of gh search commits "Co-Authored-By: Claude" sort:author-date-desc because their author dates extend to June 2037.

Those two repos turn out to be byte-identical except for one byte in README.md. Both are the same Japanese Shogi-AI project pushed under different sock-puppet identities by the same operator. The "Claude" trailer is inherited from upstream commits the operator cloned — they didn't add it, they preserved it as free SEO. The fabricated 2030s timestamps are how they sit at the top of the sort.

Pulling that thread gave me, in roughly one afternoon:

  • Three distinct operator clusters running republishing farms on GitHub.
  • At least 16 farm-controlled GitHub accounts (4 in Cluster A, 8 in Cluster B, ≥1 org + suspected affiliates in Cluster C).
  • Cluster C alone publishes 3,112 fake AI-product landing pages, averaging ~140 new repos per day since the org was created, under the name DelbyIntelligence, which appears to brand-squat the real Indian physical-AI startup delby.ai.
  • A paid GitHub-star-inflation service stitched across the whole thing, also serving at least four real Chinese AI startups with combined ~19,000 stars between them.
  • Two real cryptocurrency wallets (Quantum Resistant Ledger, NEAR Protocol) republished as "personal forks" under stolen contributor handles.
  • An Allen Institute for AI paper repository (allenai/WildDet3D) cloned and republished four days after the official release.
  • A separate operator (Cluster B) forging Claude Code <claude-code@anthropic.local> as the author identity on commits in mostly-compromised dormant accounts, including one that impersonates the European Synchrotron Radiation Facility.
  • 43% of a 232-commit sample drawn from the top results for Co-Authored-By: Claude on GitHub commit search trace to just two repos from one operator. Anyone scraping that for research, training data, or market intelligence is being fed adversarial output as the dominant signal.

Below is the forensic case for each layer, the tooling I used, the things I'm still chasing, and the disclosures I plan before publication.


Part 1 — How the GitHub commit-search trick works

GitHub commit search supports sort:author-date-desc. Author date is whatever sits in the commit metadata, and git happily accepts arbitrary Unix timestamps. The farms rewrite commit dates to 2030, 2034, 2036, 2037. Their commits sit at the top of every Co-Authored-By: query forever, or at least until 2037 actually arrives.

The Co-Authored-By: Claude text is not generated by the farm. It's inherited from the upstream commits they cloned. The real upstream authors used Claude. The farm preserved their commit messages while rewriting the author identity to a sock-puppet. Anthropic's trailer becomes free SEO for content the actual humans wrote.
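The date trick described above can be illustrated with a minimal sketch. It mimics a newest-first sort on the self-reported author date and shows the obvious countermeasure of dropping commits dated in the future. The `Commit` type and the repo labels are illustrative assumptions, not GitHub's API schema or real repositories.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Commit:
    repo: str               # hypothetical label, for illustration only
    author_date: datetime   # self-reported author date stored in the commit

def sort_author_date_desc(commits):
    """Mimic a newest-first sort on the self-reported author date."""
    return sorted(commits, key=lambda c: c.author_date, reverse=True)

def drop_future_dated(commits, now):
    """Keep only commits whose author date is not later than `now`."""
    return [c for c in commits if c.author_date <= now]

now = datetime(2026, 4, 25, tzinfo=timezone.utc)
commits = [
    Commit("honest/project", datetime(2026, 4, 24, tzinfo=timezone.utc)),
    Commit("farm/clone", datetime(2037, 6, 1, tzinfo=timezone.utc)),  # forged date
    Commit("honest/other", datetime(2026, 4, 20, tzinfo=timezone.utc)),
]

ranked = sort_author_date_desc(commits)
clean = sort_author_date_desc(drop_future_dated(commits, now))
print(ranked[0].repo)  # the forged commit ranks first
print(clean[0].repo)   # after filtering, the newest honest commit ranks first
```

Anyone building a corpus from commit search would probably want a filter like `drop_future_dated` as a first pass, since a future author date is cheap to check and hard to explain innocently.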

That gets you to the top of the search. The rest is content laundering, in two distinct modes.

Mode 1 — Identity-rewrite

Clone the target repo. Replace single-author identity with a sock-puppet email (e.g. BensonJennifer6145@outlook.com). Scramble all author/committer Unix timestamps uniformly across 2024-2037. Push under a fresh GitHub owner account. Add a junk profile bio.

Forensic signature: entropyx scan returns author_entropy: 0.0 across every file (single sole author), uniform temporal_volatility (no work-day clustering), and produces a metric vector that is bit-identical between two such repos from the same operator.
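To make the author_entropy: 0.0 reading concrete, here is a minimal sketch that assumes the metric is a Shannon entropy over the authors of a file's commits. That definition is an assumption for illustration; this article does not document how entropyx actually computes it.

```python
import math
from collections import Counter

def author_entropy(authors):
    """Shannon entropy (in bits) of the author distribution over a list of commits."""
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy

# One sock-puppet identity on every commit: no uncertainty about who authored what.
rewritten = ["BensonJennifer6145@outlook.com"] * 101
# Hypothetical collaborative history with two equally active authors.
collaborative = ["alice@example.org", "bob@example.org"] * 50

print(author_entropy(rewritten))      # 0.0
print(author_entropy(collaborative))  # 1.0
```

Under this definition a value of exactly zero on every file means one identity authored everything, which is what an identity rewrite produces regardless of how many people wrote the original code.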

Mode 2 — Straight import

Clone the target. Push under a fresh GitHub owner account. Preserve original author identities, original timestamps. The owner account is fake; the commit history isn't.

Forensic signature: harder — looks legitimate at the commit level. The tells are in the owner account profile (created days before the repo, no followers, repo name reuses upstream contributor handle, etc.).

Operator A uses both modes. Mode 1 for repos with one upstream author; Mode 2 for repos with many.


Part 2 — Operator A: Sino-themed republishing farm

The operator network

Four GitHub accounts confirmed via shared sock-puppet author emails:

| Account | Created | Repos | Notes |
|---|---|---|---|
| kyasbalme | 2025-10-25 | 6 | Bio "CS & Philosophy @ Columbia"; profile website http://blu3mo.com/ (a real Japanese Scrapbox researcher's page) |
| luliguyu | 2025-10-21 | 10 | company field contains junk Japanese text "のサポートペ" (automation bleed) |
| tusmart-grouptt ("Eric Pi") | 2022-03-18 | 14 | Older account showing recent farm activity. Working hypothesis: compromised dormant account. If you are the real "Eric Pi" who registered this account in 2022, please contact us; we will issue a correction and remove identifying details. Hosts a 403-star repo. |
| countneurooman ("countneuroman") | 2022-04-18 | 10 | Same pattern: 4-year-old account, recent farm activity. Working hypothesis: compromised dormant account. If you are the real account holder, please contact us. Hosts a 319-star repo. |

The smoking-gun cross-account attribution is the email bmqx9295@163.com, which appears as sole author in both luliguyu/crewrktabletsn and tusmart-grouptt/crewrktabletsn — same email, same repo name, two different "owner" accounts. Same operator. Identical evidence for naobingdz407945@163.com linking luliguyu and countneurooman.
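The attribution step above amounts to grouping owner accounts that share a commit-author email. A minimal sketch, using only the two email links stated in the text plus one hypothetical unrelated account for contrast (the union-find helper is a generic technique, not a description of any tool named here):

```python
from collections import defaultdict

def link_owners_by_email(records):
    """Group owner accounts that share at least one commit-author email.

    `records` is an iterable of (owner, author_email) pairs.
    Returns a list of owner sets (connected components), largest first.
    """
    owners_by_email = defaultdict(set)
    for owner, email in records:
        owners_by_email[email.lower()].add(owner)

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for owners in owners_by_email.values():
        ordered = sorted(owners)
        root = find(ordered[0])            # registers singletons too
        for other in ordered[1:]:
            parent[find(other)] = root

    clusters = defaultdict(set)
    for owner in list(parent):
        clusters[find(owner)].add(owner)
    return sorted(clusters.values(), key=len, reverse=True)

records = [
    ("luliguyu", "bmqx9295@163.com"),
    ("tusmart-grouptt", "bmqx9295@163.com"),
    ("luliguyu", "naobingdz407945@163.com"),
    ("countneurooman", "naobingdz407945@163.com"),
    ("unrelated-dev", "dev@example.org"),  # hypothetical, for contrast
]
clusters = link_owners_by_email(records)
print(clusters[0])
```

The two shared emails are enough to put luliguyu, tusmart-grouptt, and countneurooman in one component, because luliguyu appears on both links.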

kraken independently corroborated this: the behavioural fingerprint vectors of kyasbalme and luliguyu match to the fourth decimal on rhythm_period: 13.0, burst_rate: 0.0, star_concentration: 0.0, and career_hops: 0 — values that reflect automation, not the underlying repo content.

Sock-puppet pool

Nine distinct author emails confirmed across the 16 farm repos:

  • Anglo-name patterns on outlook.com: BensonJennifer6145@outlook.com, DelacruzDawn1338@outlook.com
  • Random alphanumeric on Chinese free providers: bmqx9295@163.com, naobingdz407945@163.com, kuqkz736@yeah.net, sbolr9514@yeah.net, io64083@yeah.net, czmahaixuan@126.com
  • Plus weikaih@allenai.org and 767653317@qq.com: preserved upstream identities of real authors, not sock-puppets

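The concentration of sock-puppet addresses on Chinese free providers (yeah.net, 163.com, 126.com) points to operator infrastructure in China and adjacent markets. The qq.com address does not count towards this, since it belongs to a preserved upstream author.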

Confirmed real-OSS upstream victims

| Farm repo | Real upstream | Domain |
|---|---|---|
| luliguyu/WildDet3D | allenai/WildDet3D (Allen Institute for AI, 505 stars) | Real-time academic IP theft: 4-day lag from official release |
| kyasbalme/AiToEarn | yikart/AiToEarn (9,765 stars) | Chinese AI content-monetization platform; auto-publishes to Douyin, Xiaohongshu, WeChat, TikTok, YouTube, etc. |
| luliguyu/dimatura | theQRL/zond-web3-wallet | Crypto wallet (Quantum Resistant Ledger) |
| luliguyu/ssaavedrad (also countneurooman/ssaavedrad) | Narwallets/narwallets-extension | Crypto wallet (NEAR Protocol) |
| luliguyu/sachinDevloop | matteonerini/5g-network-slicing-for-wifi-networks | Academic 5G research |
| luliguyu/academ | bakercp/ofxEdsdk | openFrameworks Canon SDK |
| luliguyu/statbox2 | tetsu-osaka-physics/statbox2 | Japanese academic LaTeX |
| kyasbalme/robokssay | khsk/AviUtl-LocalFontPlugin + sigma-axis/MediaPipe | Japanese video editor |
| kyasbalme/Scrapbox + luliguyu/cmbd-book | "maou" Shogi-AI (src/maou/... paths) | Japanese AI; description lifted from nota/Scrapbox (Nota Inc. wiki product); profile claims blu3mo.com |

The naming trick

Several farm repo names are the GitHub usernames of real upstream contributors: dimatura, ssaavedrad, sachinDevloop. The intent is personal-fork camouflage: luliguyu/ssaavedrad looks like the real ssaavedrad developer's fork until you check the owner account. This adds an identity-impersonation layer on top of the IP laundering.

Wallet-integrity check

Both laundered crypto wallets were diffed against their real upstreams:

  • luliguyu/ssaavedrad is manifest version 4.0.3 of Narwallets; the upstream is at 4.0.7, four patch releases ahead. The upstream has additional WALLET_SELECTOR_CODES constants and an injected-script.ts that the laundered version lacks.
  • luliguyu/dimatura is from before theQRL's Qrl* → Zond* rename completed, so it's a partial-rename snapshot.

No malicious modifications detected — the laundered wallets aren't actively stealing crypto. But they are stale: anyone installing them instead of the real ones runs out-of-date wallet code missing security fixes. Stale code pretending to be current is itself a security failure mode.
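A content-level comparison like the one described above can be sketched in a few lines. This is a rough Python stand-in for diff -rq that hashes every file, assuming both repos are already checked out locally. The demo builds two throwaway trees whose file names are borrowed from the text and whose contents are invented.

```python
import hashlib
import tempfile
from pathlib import Path

def file_hashes(root):
    """Map each file's path (relative to root) to its SHA-256, skipping .git."""
    root = Path(root)
    hashes = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and ".git" not in rel.parts:
            hashes[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes

def tree_diff(suspect, upstream):
    """Report files unique to either tree and files whose bytes differ."""
    a, b = file_hashes(suspect), file_hashes(upstream)
    return {
        "only_in_suspect": sorted(a.keys() - b.keys()),
        "only_in_upstream": sorted(b.keys() - a.keys()),
        "modified": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }

# Demo on two throwaway trees (invented contents, not the real wallets).
with tempfile.TemporaryDirectory() as tmp:
    suspect, upstream = Path(tmp, "suspect"), Path(tmp, "upstream")
    for tree, version in ((suspect, "4.0.3"), (upstream, "4.0.7")):
        (tree / "src").mkdir(parents=True)
        (tree / "manifest.json").write_text(f'{{"version": "{version}"}}')
        (tree / "src" / "shared.ts").write_text("export {};\n")
    (upstream / "src" / "injected-script.ts").write_text("// placeholder\n")
    result = tree_diff(suspect, upstream)

print(result)
```

Anything that lands in only_in_suspect, or in modified without a matching upstream commit, is where a reviewer would look first for an injected payload.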

Star inflation

tusmart-grouptt/crewrktabletsn reports 403 stars, 302 forks. countneurooman/ssaavedrad reports 319 stars, 254 forks. Both stargazer lists are wall-to-wall obvious bot handles — tASDFG12345m, 7228735902, liiiiiii1i1i1, 8888x82, superdaysk3wom. The handle RPaez09l appears in both lists, starring multiple farm repos in sequence. Star-buying is part of the kit — see Part 5 for the bot network analysis.


Part 3 — Operator B: Anthropic-impersonation across compromised accounts

The signature

Commits forge the author identity Claude Code <claude-code@anthropic.local>. anthropic.local is not a real domain. Anthropic's actual Claude Code does not author commits this way. Someone is forging the identity as a stamp of "AI did this".

23 commits across 8 confirmed repos in our sample.
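One cheap check for this signature: .local is a special-use top-level domain reserved for multicast DNS (RFC 6762), so no public mail domain can end in it. A minimal sketch that flags git identities on reserved TLDs follows. The function names are illustrative, and the second identity is a made-up address on GitHub's real noreply domain.

```python
from email.utils import parseaddr

# Special-use TLDs that never resolve on the public internet
# (.local: RFC 6762; the others: RFC 2606 / RFC 6761).
NON_PUBLIC_TLDS = {"local", "localhost", "invalid", "test", "example"}

def author_domain(ident):
    """Return the lower-cased domain of a git identity like 'Name <user@host>'."""
    _, addr = parseaddr(ident)
    return addr.rpartition("@")[2].lower()

def uses_non_public_domain(ident):
    """True if the identity's email domain ends in a reserved special-use TLD."""
    tld = author_domain(ident).rsplit(".", 1)[-1]
    return tld in NON_PUBLIC_TLDS

forged = "Claude Code <claude-code@anthropic.local>"
ordinary = "Some Dev <12345+somedev@users.noreply.github.com>"  # made-up address

print(uses_non_public_domain(forged))    # True
print(uses_non_public_domain(ordinary))  # False
```

This only catches the lazy version of the forgery; an operator who switched to a plausible public domain would pass this check, so it works as a triage filter rather than as proof.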

The accounts

Account Created Public repos Likely status
CaMaGuee 2021-08-23 11 Compromised dormant
esrfdev 2026-04-12 1 Fresh attacker-created (same day as repo)
gdhughey 2025-11-28 5 Fresh attacker-created
jun564 2021-08-08 2 Compromised dormant
mctils12-arch 2026-04-03 1 Fresh attacker-created
mearley24 2023-02-07 11 Possibly compromised
rvadapally 2021-10-25 7 Compromised dormant
vpneoterra 2024-12-21 15 Coordinated fake-fusion-startup farm (see below)

Mix of fresh-attacker accounts and dormant 2021-2023 accounts now suddenly active. The dormant ones look like account takeover. Two distinct techniques in one cluster: MITRE ATT&CK T1585 (Establish Accounts) plus T1586 (Compromise Accounts).

Notable impersonations

  • esrfdev/ESRF-clean: impersonates the European Synchrotron Radiation Facility's esrfdev developer account name.
  • vpneoterra/forge-ecs-platform: one of 15 coordinated repos under this account, among them forge-stellarator-coils, forge-stellarator-cad, forge-cluster-a-geometry, PicoGK (a real CAD library, almost certainly cloned), and ComfyVoxelizer. Fake fusion-energy AI startup posturing: a stellarator is a type of magnetic-confinement fusion device. There is no public company called Neoterra in the fusion space; this looks like a fictional fundraising-bait front.

Part 4 — Operator C: DelbyIntelligence — vaporware AI lab at industrial scale, brand-squatting a real Indian startup

The org

DelbyIntelligence, GitHub organization, created 2026-04-03. By 2026-04-25 it has:

  • 3,112 public repos
  • 0 followers
  • No description, no email, no website set
  • ~140 repos created per day on average, evenly distributed across all 24 hours
  • 87% of repos are HTML, 87% have GitHub Pages enabled
  • 100% of repos have zero stars — even the bot stargazers don't bother
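The ~140 repos per day figure is consistent with a plain average over the org's lifetime, which can be checked from the two dates and the repo count given above:

```python
from datetime import date

created = date(2026, 4, 3)     # org creation date, from the text
observed = date(2026, 4, 25)   # observation date, from the text
repo_count = 3112

elapsed_days = (observed - created).days
repos_per_day = repo_count / elapsed_days
print(f"{elapsed_days} days, {repos_per_day:.0f} repos/day on average")
# prints "22 days, 141 repos/day on average"
```

This is a lifetime mean. It says nothing about the current rate, which the production curve further down puts higher.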

The naming generator

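All repo names follow demo-{product}-{descriptors} or product-{name}-{descriptors}, all written as HTML landing pages, and most are cut off mid-word. In the sample below the cut falls at 45 characters for every demo-* name and 58 for every product-* name, well short of GitHub's 100-character repo-name limit, so the cap appears to come from the generator rather than from GitHub. Sample, all from the same hour: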

demo-delby-sentinel-real-time-sensor-fusion-h
demo-real-vs-synthetic-sensor-data-validator-
demo-delby-cortex-safety-layer-geometric-hall
demo-delby-glove-grasp-predictor-real-time-de
demo-delby-marl-sentinel-real-time-interactiv
demo-sensor-fusion-puzzle-hunt-viral-recruitm   ← ⚠
demo-sensor-fusion-ctf-interactive-web-demo-p
demo-ai-lab-seed-agents-challenge-entry-long-  ← ⚠
product-choreograph-forge-multi-robot-coordinated-reaching
product-multi-vendor-fleet-orchestration-demo-for-cardinal  ← Cardinal Health
product-federated-physical-ai-training-sandbox-live-demo-f  ← IIT Bombay

Two repos quietly tell you the motive: puzzle-hunt-viral-recruitm[ent] and ai-lab-seed-agents-challenge-entry. This is an automated business-development apparatus dressed up as a polymath AI lab — fake demos targeted at specific named potential customers (Cardinal Health, IIT Bombay), fake CTFs and puzzle hunts as recruitment lures, fake "AI lab seed agent challenge" entries as bait for AI-accelerator funding programs.
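Measuring the sample names listed above shows where the truncation falls. This sketch groups the eleven names by prefix and collects their lengths; the names are copied from the sample with the annotations stripped.

```python
from collections import defaultdict

SAMPLE_NAMES = [
    "demo-delby-sentinel-real-time-sensor-fusion-h",
    "demo-real-vs-synthetic-sensor-data-validator-",
    "demo-delby-cortex-safety-layer-geometric-hall",
    "demo-delby-glove-grasp-predictor-real-time-de",
    "demo-delby-marl-sentinel-real-time-interactiv",
    "demo-sensor-fusion-puzzle-hunt-viral-recruitm",
    "demo-sensor-fusion-ctf-interactive-web-demo-p",
    "demo-ai-lab-seed-agents-challenge-entry-long-",
    "product-choreograph-forge-multi-robot-coordinated-reaching",
    "product-multi-vendor-fleet-orchestration-demo-for-cardinal",
    "product-federated-physical-ai-training-sandbox-live-demo-f",
]

def lengths_by_prefix(names):
    """Map each name prefix (text before the first '-') to the set of name lengths."""
    lengths = defaultdict(set)
    for name in names:
        prefix = name.split("-", 1)[0]
        lengths[prefix].add(len(name))
    return dict(lengths)

print(lengths_by_prefix(SAMPLE_NAMES))
# prints {'demo': {45}, 'product': {58}}
```

A single fixed length per prefix is what a hard character cap in a generator would produce. Human-chosen names would vary in length.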

The Potemkin renderings

product-choreograph-forge-multi-robot-coordinated-reaching is six files: assets/, favicon.svg, index.html, manifest.json, og-image.svg, robots.txt. No code, no README, no tests. Default branch: gh-pages. Description: "Delby AI Product: CHOREOGRAPH-FORGE: Multi-Robot Coordinated Reaching Planner with Graph Neural Network Collision Avoidance."

3,112 of these. All live at https://delbyintelligence.github.io/{repo-name}/.

The brand-squat

delby.ai is a real physical-AI startup based in India, building "data infrastructure to train the next generation of Physical AI and Vision-Language-Action models." They operate a 500+ vehicle real-world data collection network. Active commercial trials with logistics partners. Contact: intelligence@delby.ai.

The real Delby's website does not link to a DelbyIntelligence GitHub org anywhere visible. The DelbyIntelligence org has no description and no website. Nothing on the real delby.ai mentions GitHub at all.

The clincher: the HTML inside delbyintelligence.github.io/{repo}/ lists <link rel="canonical" href="https://delbyai.github.io/delby-agents/" /> — an entirely different account name. delbyai does not exist on GitHub (HTTP 404, both as user and as org). Either the real Delby team had a github.io presence at delbyai.github.io/delby-agents/ and migrated away (deleting the account), at which point someone scooped the templates and republished them under DelbyIntelligence; or the canonical URL is intentional misdirection.

Either way: the GitHub org DelbyIntelligence is almost certainly an impersonation of the real Indian physical-AI company Delby Intelligence. The real company very likely does not know.

Production curve

| Phase | Days | Repos/day |
|---|---|---|
| Warm-up | Apr 3-6 | ~25 |
| Ramp | Apr 7-11 | 130-200 |
| Pause | Apr 12-14 | 51 → 0 → 0 (automation tuning?) |
| Steady state | Apr 15-25 | 200-253 |

Hourly creation distribution is flat (peak hour = 21 repos, others 17-19). This is automation, not human content.


Part 5 — The cross-cutting paid bot-star service

Operator A's farm repos attract stars from a shared pool of accounts that also stars AI-product repos outside the farms; Operator C's repos, as noted above, have no stars at all. We mapped six accounts:

| Account | Created | Public repos | Profile dressing | Tier |
|---|---|---|---|---|
| RPaez09l | 2021-11-12 | 28 | "Ianko Leite" | Aged organic-looking |
| tASDFG12345m | 2022-05-11 | 24 | none | Aged but bot-handle |
| 7228735902 | 2022-06-20 | 18 | "Nadjmou BOINA - Ethical Hacker, B.Tech Student" | Aged organic-looking |
| superdaysk3wom | 2021-12-29 | 17 | "Ethan H - I make stuff." | Aged organic-looking |
| liiiiiii1i1i1 | 2025-04-13 | 5 | none | Fresh bulk-stars |
| 8888x82 | 2025-04-16 | 12 | none | Fresh bulk-stars |

The two fresh accounts (liiiiiii1i1i1, 8888x82) were created three days apart in April 2025, about 12 months before the current farm wave they are starring. The bot factory was provisioned in advance.

Each bot's starred-repos list mixes obvious farm repos with high-quality real OSS (httpie/cli, v2ray/v2ray-core, lodash/lodash, google/grumpy) for camouflage. They do not look like crude star bots; they look like aging hobbyist accounts that sometimes star Chinese AI products.

Cross-bot most-starred targets — the customer list

Sampling 30 starred repos per bot and counting overlap reveals the repos the service is boosting, and by extension its likely customers:

| Customer repo | Stars (real) | Bots (of 6) | What it claims to be |
|---|---|---|---|
| Customer A (name redacted, pending vendor disclosure) | 8,780 | 4/6 | major Chinese AI co; agentic workflow platform |
| Customer B (name redacted, pending vendor disclosure) | 5,681 | 5/6 | Chinese open-source AI portal |
| Customer C (name redacted, pending vendor disclosure) | 3,534 | 5/6 | "Universal memory layer for AI Agents" |
| Galaxy-Dawn/claude-scholar | 3,446 | ≥3 | "Semi-automated research assistant"; name evokes Claude |
| EvoScientist/EvoScientist | 2,590 | ≥3 | "Harness Vibe Research with Self-evolving AI Scientists" |
| Soul-AILab/SoulX-LiveAct | 1,209 | 1+ | Real research code (EACL'26 paper) |
| Customer D (name redacted, pending vendor disclosure) | 1,049 | 4/6 | Java AI app dev platform |
| DaRL-GenAI/instructional_agents | 725 | 2+ | EACL'26 paper, but is a fork of Hyan-Yao/instructional_agents |
| MarilynClarke/Hyperliquid-Copy-Trading-Bot | 415 | 2+ | Crypto Hyperliquid copy-trading bot |
| tusmart-grouptt/crewrktabletsn (known farm) | 403 | 4/6 | Operator A |
| countneurooman/ssaavedrad (known farm) | 319 | 3/6 | Operator A |
| PhyAgentOS/PhyAgentOS | 211 | 1+ | Vibe-coded "embodied AI operating system" |
| xup6jammy/AI-INVOICE-OCR-ENGINE | 163 | 3/6 | Chinese AI invoice OCR |
| Karmacoke/chargen | 151 | 2+ | "AI-powered character generator for TRPG" |
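The overlap count behind the table can be sketched as follows. The bot labels, repo names, and star lists here are entirely synthetic placeholders chosen to show the mechanics; they are not the sampled data.

```python
from collections import Counter

def overlap_counts(starred_by_bot):
    """Count, for each repo, how many distinct bots starred it."""
    counts = Counter()
    for repos in starred_by_bot.values():
        counts.update(set(repos))  # set(): one vote per bot per repo
    return counts

def shared_targets(starred_by_bot, min_bots=3):
    """Repos starred by at least `min_bots` bots, most-shared first."""
    counts = overlap_counts(starred_by_bot)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(repo, n) for repo, n in ranked if n >= min_bots]

# Synthetic example: four placeholder bots and their sampled starred repos.
starred = {
    "bot_a": ["example/target-x", "example/target-y", "example/camouflage-1"],
    "bot_b": ["example/target-x", "example/target-y", "example/camouflage-2"],
    "bot_c": ["example/target-x", "example/camouflage-3"],
    "bot_d": ["example/target-x", "example/target-y", "example/target-x"],
}

print(shared_targets(starred, min_bots=3))
# prints [('example/target-x', 4), ('example/target-y', 3)]
```

Popular real projects starred for camouflage would also score high on overlap, so the threshold only nominates candidates. The table above still needed manual review of what each repo is.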

Note on customer names: Four real Chinese AI startups appear in the bot-network customer base above. We have evidence the same bot pool stars both their repos and known republishing-farm repos — but we cannot prove from the outside whether each company purchased the stars or whether someone else inflated them at the company. Names are withheld until each affected vendor has been directly notified and given an opportunity to respond. Updated catalog will be issued post-disclosure.

The bot service's customer base is mixed:

  • Real Chinese AI startups (iflytek, 53AI, MemMachine, aiflowy): ~19,000 stars combined across just four products, an unknown share of them bot-inflated.
  • Republishing farms (Operator A's tusmart-grouptt, countneurooman).
  • Crypto/Web3 grift products (MarilynClarke/Hyperliquid-Copy-Trading-Bot).
  • Vibe-coded AI vapor projects (Galaxy-Dawn/claude-scholar, PhyAgentOS, EvoScientist).

Direct bot-to-farm connection

Bot account 8888x82 forks both tusmart-grouptt/crewrktabletsn (Operator A farm repo) and 53AI/53AIHub (the 5,681-star real Chinese AI portal). One bot account, both customer types — direct linkage proof.

Bot RPaez09l forked mzbankl/ghaerrb — and mzbankl/ghaerrb was on RPaez09l's starred list too, suggesting a cycle of fork+star pumping that further inflates engagement metrics.


Part 6 — Pollution rate of the Claude trailer corpus

In a 232-commit sample taken across the top 10 pages of gh search commits "Co-Authored-By: Claude" sort:author-date-desc:

  • 43% (100 / 232) of the commits trace to just 2 farm repos (luliguyu/cmbd-book + kyasbalme/Scrapbox).
  • All 30 sampled commits with author-date > 2027 came from those two repos.
  • 1 explicit claude-code@anthropic.local impersonation from Cluster B.
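The headline share is simple arithmetic over the sample, sketched here as a reusable function. Only the totals (100 of 232) come from the text. The even 50/50 split between the two farm repos and the one-commit-per-repo remainder are assumptions made to build a runnable example.

```python
from collections import Counter

def top_repo_share(commit_repos, k):
    """Fraction of sampled commits that come from the k most frequent repos."""
    counts = Counter(commit_repos)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(commit_repos)

sample = (
    ["luliguyu/cmbd-book"] * 50                  # assumed split of the 100
    + ["kyasbalme/Scrapbox"] * 50
    + [f"other/repo-{i}" for i in range(132)]    # placeholder names
)

share = top_repo_share(sample, k=2)
print(f"{share:.1%}")  # prints "43.1%"
```

Running the same function with k=1 or k=5 on a real sample gives a quick concentration profile for any search query, which is a useful sanity check before treating the results as a corpus.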

Operator A alone pollutes a near-majority of the most prominent GitHub query for AI-authored code. Anyone scraping that for research, training data, AI-tool reputation analysis, or market-sizing reports is being fed adversarial output as the dominant signal.

This includes:

  • Academic AI/human classification research using GitHub trailers as labels.
  • Training-data curation for code models filtering on "AI-authored" signal.
  • Reputation systems for AI-coding tools ("Claude appears in N repos this month").
  • Investment / market research counting "AI-built repos" as an adoption proxy.
  • Recruitment pipelines screening candidates by GitHub activity (Cluster B's compromised dormant accounts now show false activity that points back at the original owner).

Part 7 — The tools, briefly

entropyx scan is a deterministic forensic engine for git history. Its author_entropy = 0.0 across every file in kyasbalme/Scrapbox was the quiet "this can't be a real collaborative project" alarm. Its metric vectors for two supposedly-different repos came back bit-identical — same coupling_stress, same semantic_drift, same composite — because the underlying repos are the same. entropyx's --github enrichment also reported zero pull-request activity across hundreds of commits, another red flag.

kraken reads GitHub identity-graph state via GraphQL. It produced behavioral fingerprint vectors that match across kyasbalme and luliguyu to four decimals on rhythm_period, burst_rate, star_concentration, and career_hops. It also surfaced the blu3mo.com profile-website impersonation that I'd missed by eye.

vajra is queued up to compare structural fingerprints across DelbyIntelligence's 3,112 landing pages and across the bot-account starred-repo histograms — work in flight.

The deeper point: most of the load-bearing work was gh api, git log, and diff -rq. When you start looking, the shape of these operations comes apart in your hands inside a few hours. The scary part is that nobody seems to be looking, because the visible front (the Claude trailer, the polished landing pages, the four-digit star counts on real-looking AI-startup repos) discourages it.


Part 8 — What this contaminates

A short list of systems that ingest GitHub at scale and have to assume the data is at least roughly honest:

  • GitHub's own search ranking: sort:author-date-desc has been weaponised; commits from 2037 win every time.
  • Star-based discovery and "trending" pages: four AI-startup products with ~19,000 stars between them are being starred by the bot pool; multiplied across the customer base, this could be a large fraction of the visible Chinese AI ecosystem on GitHub.
  • AI training data scraped from GitHub: pipelines that filter or weight code by "AI-coauthor" or "high-star" signals are now boosting laundered corpora.
  • Code-model evaluation benchmarks that assume repo authorship is what it claims: Operator A's identity-rewrite is invisible to surface metrics.
  • Investor and market-research dashboards that count "AI startups", "AI-built repos", or "trending AI projects": at least four real companies and ≥3,112 fake products are riding inflated signal.
  • Recruiting pipelines screening on GitHub activity: Operator B's compromised dormant accounts now show false high-recency activity.

Part 9 — What's still open

  • Full bot-network enumeration. Six bots is what I have; the real customer list is likely dozens of repos and the bot pool is probably hundreds.
  • The delbyai-account question — was the real Delby Intelligence ever at delbyai.github.io, did they delete the account, did the laundering operation scoop their abandoned templates? Needs an answer from Delby themselves.
  • vpneoterra's 15 repos (forge-stellarator-* and the rest): what's the source codebase being cloned for the fake fusion-startup posture? Is "Neoterra" impersonating any real entity?
  • Crypto wallet code: confirmed no payout-address modification today, but the laundered repos are snapshots; the operator could push a malicious update at any time. Worth setting a watcher.
  • Galaxy-Dawn/claude-scholar — 3,446 stars, name leans on Claude. Is it a real product, or a Claude-grift product, or a farm dressed up as a product?
  • The exact relationship between Cluster A and Cluster C (any overlap in IPs / stargazers / push-times)?
  • Did any of the bot accounts ever push commits into farm repos (cross-staffing)?

Part 10 — Geo-attribution

Operator A is definitively China-based

Four lines of evidence converge:

  1. Author-date timezone is +0800 across all 16 farm repos. When the farm fabricates dates, the timestamp generator runs on the operator's local clock; the +0800 offset on every fake commit is the operator's own timezone leaking through. The only exception is luliguyu/WildDet3D, which preserves the upstream Allen AI author's -0700.
  2. Email-provider distribution is dominated by Chinese free providers: six of the eight sock-puppet addresses are on NetEase's yeah.net, 163.com, and 126.com; the other two are on outlook.com. No gmail. No protonmail. No icloud.
  3. GitHub account creation hours align with China daylight/evening. All four Operator A accounts created at UTC times that map to noon-late-evening China Standard Time.
  4. Scouting behaviour reveals Chinese-AI-ecosystem awareness: luliguyu watches tuya/tuya-openclaw-skills (real Tuya — major Chinese IoT/AI co.), MemMachine/MemMachine, and zhoushisheng001b/Aziiizx — all hits in the Chinese AI corner of GitHub.
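The timezone evidence in point 1 can be reproduced by tallying UTC offsets from author dates. A minimal sketch, assuming the dates were exported with git log --format=%aI (strict ISO 8601, which always carries an offset); the three timestamps below are invented examples.

```python
from collections import Counter
from datetime import datetime

def offset_histogram(iso_author_dates):
    """Tally the UTC offsets found in strict ISO 8601 author dates."""
    hist = Counter()
    for stamp in iso_author_dates:
        offset = datetime.fromisoformat(stamp).utcoffset()
        minutes = int(offset.total_seconds() // 60)
        sign = "+" if minutes >= 0 else "-"
        hh, mm = divmod(abs(minutes), 60)
        hist[f"{sign}{hh:02d}{mm:02d}"] += 1
    return hist

# Invented example timestamps in the format produced by %aI.
stamps = [
    "2037-06-14T09:30:00+08:00",
    "2030-01-02T22:15:00+08:00",
    "2026-04-10T11:00:00-07:00",
]

print(offset_histogram(stamps))
# prints Counter({'+0800': 2, '-0700': 1})
```

An offset is only a hint: +0800 covers several countries besides China, and anyone can set it deliberately, so it carries weight mainly in combination with the other three lines of evidence.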

Operator C: a night-owl operator in China (or possibly India)

The actual human-driven account is delby-ai — created 2026-04-03 at 15:18:59 UTC, exactly 92 seconds before the DelbyIntelligence org. 0 followers, 0 following, 0 listed repos, no profile fields. Pure operator handle. It is the actor on every CreateEvent and PushEvent across the org's 3,112 repos.

Two activity signatures expose the human behind the automation:

  • Hour-of-day (delby-ai's last 100 events): heavily concentrated UTC 17-23 and UTC 00-02, light during UTC 03-15. That maps to 01:00-07:00 China time peak (deep night) plus 08:00-10:00 China morning. Night-owl Asian operator.
  • Day-of-week across the org's 3,112 repo creations: Mon 292 → Tue 382 → Wed 475 → Thu 681 → Fri 647 → Sat 428 → Sun 207. A 3.3× peak/trough ratio, classic human work-week shape with Chinese-style Sat half-day and Sun off.
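Both signatures reduce to small calculations that are easy to re-run. The weekday counts are the ones quoted above. The hour conversion assumes China Standard Time is a fixed UTC+8 with no daylight saving.

```python
weekday_counts = {
    "Mon": 292, "Tue": 382, "Wed": 475, "Thu": 681,
    "Fri": 647, "Sat": 428, "Sun": 207,
}

total = sum(weekday_counts.values())
peak_day = max(weekday_counts, key=weekday_counts.get)
trough_day = min(weekday_counts, key=weekday_counts.get)
ratio = weekday_counts[peak_day] / weekday_counts[trough_day]

def utc_hour_to_cst(hour_utc):
    """Convert an hour of day in UTC to China Standard Time (UTC+8, no DST)."""
    return (hour_utc + 8) % 24

busy_utc_hours = list(range(17, 24)) + [0, 1, 2]
busy_cst_hours = [utc_hour_to_cst(h) for h in busy_utc_hours]

print(total, peak_day, trough_day, round(ratio, 1))  # prints "3112 Thu Sun 3.3"
print(busy_cst_hours)  # prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The counts sum to the org's 3,112 repos, so the weekday breakdown covers every creation rather than a sample.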

The brand-squat target — delby.ai — is an Indian physical-AI company. Cross-border targeting is plausible, but the operator's own schedule fingerprint reads China.

Operator B: mixed origin, attribution requires push-IP forensics not in public API

Eight accounts mix fresh-attacker creations and 2021-2023 dormant takeovers. Profile location fields almost all empty. Without push-IP data we cannot attribute the operator(s).

Domain WHOIS

| Domain | Registered | Registrar | Notable |
|---|---|---|---|
| delby.ai (real Indian co.) | 2025-09-08 | GoDaddy + Domains By Proxy (Tempe, AZ) | AWS US-East-1 hosting; real company, real infra |
| memmachine.ai (bot-net customer) | 2025-08-15 | GoDaddy + Domains By Proxy | Same day as the GitHub repo creation; entire site hosted on GitHub Pages |
| evoscientist.ai (bot-net customer) | 2026-02-15 | Cloudflare | "DATA REDACTED, Country: GB"; hosted on GitHub Pages |
| aiflowy.tech, astron.ai, 53ai.com | various | Hichina + others | Alibaba Cloud Beijing/Singapore; real Chinese cloud infrastructure |

The customer base of the bot-star service splits between Alibaba Cloud (aiflowy.tech, astron.ai, 53ai.com: real infrastructure) and GitHub Pages (memmachine.ai, evoscientist.ai: entire web presence is github.io). The latter group are shoestring operations whose business model depends on GitHub-visibility metrics, exactly the customers most exposed to a star-buying service.

Wayback Machine confirms the phantom

http://archive.org/wayback/available?url=delbyai.github.io returns archived_snapshots: {} for every variant. The Internet Archive never indexed delbyai.github.io — meaning either the URL never existed, or it existed only briefly with a noindex directive. Either way: the canonical URL embedded in DelbyIntelligence's HTML pointing at delbyai.github.io/delby-agents/ is misdirection. The real Indian Delby Intelligence almost certainly never had this GitHub Pages presence.
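The availability lookup above is easy to script. In this sketch the response parsing is separated from the network call so it can be exercised offline. The two payloads are hand-written in the shape the availability API returns, and the `check` helper is an illustrative wrapper, not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"

def has_snapshot(payload):
    """True if an availability-API response reports an archived capture."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))

def check(url, timeout=10):
    """Query the availability API for `url` (performs a network request)."""
    with urlopen(f"{API}?{urlencode({'url': url})}", timeout=timeout) as resp:
        return has_snapshot(json.load(resp))

# Offline illustration with hand-written payloads.
empty = {"url": "delbyai.github.io", "archived_snapshots": {}}
hit = {
    "url": "example.com",
    "archived_snapshots": {
        "closest": {
            "available": True,
            "status": "200",
            "timestamp": "20240101000000",
            "url": "http://web.archive.org/web/20240101000000/http://example.com/",
        }
    },
}

print(has_snapshot(empty), has_snapshot(hit))  # prints "False True"
```

Calling `check("delbyai.github.io/delby-agents/")` and its variants reproduces the lookup. An empty result shows only that the Archive holds no capture, which is weaker than showing the page never existed.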


Part 11 — Would GitHub care?

Short answer: yes, and they have a published policy framework that names every behaviour we documented. Realistic answer: enforcement is reactive and uneven. Tactical answer: bulk-flagged campaigns get cleaned, but the bot infrastructure rotates faster than account suspensions catch up.

Policy violations, line by line

GitHub's Acceptable Use Policies (current 2026 text) prohibit:

"automated excessive bulk activity and coordinated inauthentic activity, such as creation of or participation in secondary markets for the purpose of the proliferation of inauthentic activity"

This covers the bot-star service (labelled Cluster D here), DelbyIntelligence's 3,112-repo automation (Cluster C), and Operator A's 16-repo coordinated farm.

"content or activity that impersonates any person or entity, including any of GitHub's employees or representatives"

This covers the claude-code@anthropic.local forgery (B), the esrfdev impersonation (B), kyasbalme claiming blu3mo.com (A), the Scrapbox description lifted from Nota Inc. (A), repo names matching real upstream contributor handles (A), and the DelbyIntelligence brand-squat (C).

GitHub retains "full discretion" to suspend accounts, terminate accounts, or remove content for any of these.

Reporting paths

  • General abuse / impersonation: github.com/contact/report-abuse → choose "Impersonation" or "Malware or phishing"
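  • Trademark / brand impersonation (Delby, Anthropic, ESRF): GitHub's trademark-policy complaint form. The DMCA form at github.com/contact/dmca covers copyright, so it fits the republished code rather than the borrowed names.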
  • Malicious repository at scale: same abuse form with bulk evidence pack

What we can predict — the StarScout precedent

The closest published benchmark is the ICSE 2026 paper "Six Million (Suspected) Fake Stars on GitHub" by Hao He, Haoqin Yang, Philipp Burckhardt, Alexandros Kapravelos, Bogdan Vasilescu, and Christian Kästner (CMU + NCSU + Socket). Their tool StarScout flagged 18,617 repos and ~301,000 accounts as fake-star participants over 20 TB of GitHub metadata.

After they reported, GitHub removed:

  • 90.42% of flagged repositories
  • 57.07% of flagged accounts

Per StarScout's authors, GitHub does not publish transparency reports on star manipulation and has never published an engineering blog post on its detection methods. Enforcement is reactive — driven by external reports — and the bot-account infrastructure largely persists for future campaigns because account-removal lags repo-removal.

The StarScout authors also found that the majority of repos with fake-star campaigns distribute malware — typically disguised as piracy tools, game cheats, or cryptocurrency bots. Our finding of MarilynClarke/Hyperliquid-Copy-Trading-Bot (a crypto copy-trading bot heavily promoted by the same bot pool that promotes our farms) is a textbook fit for the StarScout-identified profile.

Realistic outcome if our enumeration is reported in bulk

| Cluster / target | Predicted action | Confidence |
|---|---|---|
| Operator A's 16 farm repos | Removal | High |
| kyasbalme, luliguyu (purpose-built fresh accounts) | Suspension | High |
| tusmart-grouptt, countneurooman (dormant accounts that became active) | Investigation, possible original-owner notification | Medium |
| DelbyIntelligence org (3,112 repos) | Bulk impersonation removal | High (clear brand-squat) |
| delby-ai operator account | Suspension | High |
| 6 mapped bot accounts (D) | ~50% removal per StarScout precedent | Medium |
| Operator B's 8 compromised accounts | Action on impersonation but tied to Anthropic / ESRF complaints | Medium-high |
| Customer companies (iflytek, 53AI, MemMachine, aiflowy) | Unlikely action: GitHub historically reluctant to suspend legitimate-looking companies that purchase fake-star services rather than provide them. | Low |

Regulatory escalation

The FTC's 2024 final rule on fake reviews and fake social-influence indicators (16 CFR Part 465) prohibits buying or selling fake indicators of social media influence for commercial purposes, with penalties exceeding $50,000 per violation. The rule does not name "GitHub stars" specifically, but the principle squarely applies: paid-star inflation is fake-social-influence purchasing.

Customer companies reachable in US jurisdiction are theoretically exposed. Customer companies in mainland China are not, in practice — the regulatory pathway runs through US-touchpoint enforcement.

The article should explicitly invite the FTC and GitHub Trust & Safety as audiences alongside the named upstream victims.


Disclosures planned before publication

  • Allen Institute for AI: allenai/WildDet3D cloned as luliguyu/WildDet3D four days after release.
  • The Quantum Resistant Ledger: theQRL/zond-web3-wallet republished as luliguyu/dimatura.
  • Narwallets: narwallets-extension republished twice.
  • yikart: AiToEarn republished by kyasbalme and tusmart-grouptt.
  • Anthropic: claude-code@anthropic.local impersonation across at least 8 repos.
  • European Synchrotron Radiation Facility: esrfdev account-name impersonation.
  • The real blu3mo: profile-website impersonation by kyasbalme.
  • The real Delby Intelligence (intelligence@delby.ai): 3,112-repo brand-squat under DelbyIntelligence org.
  • The real dimatura, ssaavedrad, sachinDevloop: usernames being used as personal-fork camouflage.
  • The four real Chinese AI startups currently anonymized as Customers A/B/C/D: your stars are inflated by the same bot service that promotes republishing farms; you may want to know. Direct disclosure being attempted before names are released.
  • GitHub Trust & Safety: entire enumeration.

I'd rather get a "yes, that's actually our account" or a "no, please don't name them" before any of these names go to print.


Codename for the adversary cluster: Long Shadow, after the operators' signature trick of casting forward-dated commits as far ahead as June 2037 to bubble laundered repos to the top of every search. Forensic evidence and reproduction commands live in this repository.