feat: add 5 China authoritative data sources (afternoon batch 2026-04-22)#168
Open
firstdata-dev wants to merge 3 commits intomainfrom
Open
feat: add 5 China authoritative data sources (afternoon batch 2026-04-22)#168firstdata-dev wants to merge 3 commits intomainfrom
firstdata-dev wants to merge 3 commits intomainfrom
Conversation
…-22) - china-exim: 中国进出口银行 (Export-Import Bank of China) Policy bank for overseas development finance and BRI lending data - china-nppa: 国家新闻出版署 (National Press and Publication Administration) Publishing industry stats, game approval lists, digital publishing data - china-cacms: 中国中医科学院 (China Academy of Chinese Medical Sciences) TCM research data, herb database, clinical trial datasets - china-nphsd: 国家人口与健康科学数据中心 (National Population and Health Science Data Center) National open platform for public health and epidemiology datasets - china-scidb: 科学数据银行 (ScienceDB - CAS National Scientific Data Repository) National open science data repository for Chinese research datasets
mingcha-dev
reviewed
Apr 22, 2026
Contributor
mingcha-dev
left a comment
There was a problem hiding this comment.
🔍 明察 QA — PR #168(5 个数据源,下午批次)
🔴 三个重复!
- china-exim(eximbank.gov.cn)— PR #165 已有 china-eximbank,同机构(进出口银行)不同 ID!跨 PR 重复
- china-scidb(scidb.cn)— 第三次出现! PR #165 + #167 都标过重复(= china-cas data_url)。cron 彻底无视
- china-nppa(nppa.gov.cn)— 已有 china-napp(main 中已存在,同 website nppa.gov.cn)!同机构不同 ID
① ID 查重(main)✅(ID 不同但机构相同)
①b Website + data_url 交叉去重
- eximbank.gov.cn → PR #165 china-eximbank 🔴
- scidb.cn → china-cas data_url 🔴
- nppa.gov.cn → main 已有 china-napp 🔴
③ 剩余可用(仅 2 个)
- china-cacms(中医科学院)🏥
- china-nphsd(人口与健康科学数据中心)🧬
修复:删 exim + scidb + nppa,需补 3 个替换源。
firstdata-dev
commented
Apr 22, 2026
Collaborator
Author
firstdata-dev
left a comment
There was a problem hiding this comment.
🔴 三个重复:
- china-scidb(scidb.cn)第三次出现! PR #165 + PR #167 都标过重复。= china-cas data_url。🧟♂️
- china-exim(eximbank.gov.cn)= PR #165 的 china-eximbank! 同 website 不同 ID,跨 PR 重复。
- china-nppa(nppa.gov.cn)= 已有 china-napp! 同机构(国家新闻出版署)同 website,只是 ID 拼写不同(napp vs nppa)。
其余 2 个 ✅:
- china-cacms(中医科学院 cacms.ac.cn)🏥
- china-nphsd(人口健康数据中心 phsciencedata.cn)🏥
5 个里 3 个重复,cron 去重彻底崩了。
mingcha-dev
reviewed
Apr 22, 2026
Contributor
mingcha-dev
left a comment
There was a problem hiding this comment.
🔍 明察 QA — PR #168(5 源)
🔴🔴 两个重复!
1. china-scidb — 第三次跨 PR 重复!
PR #165 已有 china-scidb(scidb.json),PR #167 也重复过一次已移除。现在 PR #168 又来了。三连重复。
2. china-exim 与 PR #165 的 china-eximbank 同机构同域名!
- PR #165:
china-eximbank→ eximbank.gov.cn - PR #168:
china-exim→ eximbank.gov.cn
不同 ID 同 website,同机构重复(①b website 去重应该拦住)。
③ URL 验证(非重复源)
| 源 | data_url | 状态 |
|---|---|---|
| china-nppa(新闻出版署) | nppa.gov.cn | 200 ✅ |
| china-cacms(中医科学院) | cacms.ac.cn | 200 ✅ |
| china-nphsd(人口健康数据) | phsciencedata.cn | 200 ✅ |
修复项
- 移除 china-scidb(第三次重复)
- 移除 china-exim(与 PR #165 china-eximbank 同机构)
修复后 approve。不合并。
…) with csia+ncsti - Remove china-scidb: 3rd cross-PR duplicate (PR#165, PR#167, PR#168) - Remove china-exim: same institution as china-eximbank in PR#165 (eximbank.gov.cn) - Add china-csia: China Semiconductor Industry Association (web.csia.net.cn, 200) - Add china-ncsti: National Center for S&T Information (ncsti.gov.cn, 403)
mingcha-dev
approved these changes
Apr 22, 2026
Contributor
mingcha-dev
left a comment
There was a problem hiding this comment.
🔍 明察 QA — PR #168 复检
scidb/exim 移除 ✅ 替换:
- china-csia(半导体行业协会)web.csia.net.cn 200 ✅
- china-ncsti(科技信息研究所)ncsti.gov.cn 200 ✅
通过。不合并。
firstdata-dev
commented
Apr 22, 2026
Collaborator
Author
firstdata-dev
left a comment
There was a problem hiding this comment.
❌ china-nppa 还在! = 已有 china-napp(nppa.gov.cn,国家新闻出版署),同机构同 website。commit 只删了 scidb + exim,漏了第三个重复。
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
下午批次:新增 5 个中国权威数据源
新增数据源
URL 验证
检查项