
Extract LinkedIn data, including profiles, job postings, company details, connections, and posts, with the #1-ranked LinkedIn Scraper API. Start your free trial today!


bright-cn/LinkedIn-Scraper


LinkedIn Scraper


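This repository provides two ways to collect data from LinkedIn:

  1. Free scraper: suited to small projects, experiments, and learning.
  2. LinkedIn Scraper API: designed for large-scale, highly reliable, real-time data collection.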


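Method 1: Free LinkedIn Scraper

The free tool provides two main features:

  1. LinkedIn Jobs Scraper: collects job listings with rich metadata
  2. LinkedIn Profile Validator: checks whether LinkedIn profile and company URLs are valid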


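1. Jobs Scraper

Collects job postings from LinkedIn job search.

Key features

  • Collects job details (title, company, location, link, posting date, and more)
  • Built-in rate limiting and error handling
  • Outputs clean JSON

2. Profile Validator

Checks whether a LinkedIn profile or company page URL is valid.

Key features

  • Checks profile and company URLs
  • Automatically retries failed requests
  • Reports a detailed status for each URL
  • Checks multiple URLs in a single batch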
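
The retry and batch-checking behaviour listed above could be sketched roughly as follows. This is a minimal illustration and not the repository's actual profile_checker.py: the names check_url, check_urls, and format_result, the injectable fetch callable, and the choice to retry only when fetch raises are all assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, Optional


def check_url(url: str, fetch: Callable[[str], int], max_retries: int = 3) -> Optional[int]:
    """Return the HTTP status for url, retrying when fetch raises.

    fetch is a hypothetical callable that returns a status code or raises
    on a network failure. None means every attempt failed.
    """
    for _ in range(max_retries):
        try:
            return fetch(url)
        except Exception:
            continue  # failed attempt: try again until retries run out
    return None


def check_urls(
    urls: Iterable[str], fetch: Callable[[str], int], max_retries: int = 3
) -> Dict[str, Optional[int]]:
    """Check several URLs in one call and map each one to its final status."""
    return {url: check_url(url, fetch, max_retries) for url in urls}


def format_result(url: str, status: Optional[int]) -> str:
    """Render one result line: a check mark for 200, a cross otherwise."""
    mark = "✓" if status == 200 else "✗"
    return f"{mark} {url} - Status: {status}"
```

In real use, fetch might wrap something like requests.get(url, timeout=10).status_code; passing it in as a parameter keeps the retry logic testable without network access.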

Quick start

The following steps get you up and running in a few minutes:

Prerequisites

Installation

Three commands are all it takes:

git clone https://github.com/luminati-io/LinkedIn-Scraper.git
cd LinkedIn-Scraper
pip install -r requirements.txt

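Usage examples

The examples below show how to use each tool:

1. Jobs Scraper

Configure the search parameters:

# In jobs_scraper.py
params = {
    "keywords": "AI/ML Engineer",  # job title or keywords to search for
    "location": "London",          # location to search in
    "max_jobs": 100                # maximum number of jobs to collect
}

# Run: python jobs_scraper.py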

The scraper writes a JSON file of job details, with entries such as:

{
  "title": "Research Engineer, AI/Machine Learning",
  "company": "Google",
  "location": "London, England, United Kingdom",
  "job_link": "https://uk.linkedin.com/jobs/view/research-engineer-ai-machine-learning-at-google-4086259724",
  "posted_date": "3 weeks ago"
}

2. Profile Validator

Configure the URLs to check:

# In profile_checker.py
test_urls = [
    "https://www.linkedin.com/company/bright-data/",
    "https://www.linkedin.com/company/aabbccdd/"
]

# Run: python profile_checker.py

You get a validation result for each URL:

✓ linkedin.com/company/bright-data - Status: 200
✗ linkedin.com/company/aabbccdd - Status: 400

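Common challenges with the free approach

When you scrape LinkedIn yourself, you are likely to run into its anti-scraping measures. The main challenges are:

  1. Rate limiting: LinkedIn closely monitors the request rate of each IP address. Exceeding the limit can get the IP blocked temporarily or permanently.
  2. CAPTCHAs: when LinkedIn detects unusual traffic, it presents a CAPTCHA to stop automated scrapers.
  3. Login walls: most valuable LinkedIn data requires signing in, and the platform blocks automated login attempts it detects.
  4. Technical hurdles: pagination, dynamically loaded content, incomplete data points, on-page ads, and similar issues.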
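
One common way to cope with the first challenge is to pause between requests and wait progressively longer after each rejection. The sketch below computes such an exponential backoff schedule. The function name backoff_delays and the default base delay, growth factor, and cap are illustrative assumptions, not limits published by LinkedIn or values used by this repository.

```python
from typing import List


def backoff_delays(
    attempts: int, base: float = 1.0, factor: float = 2.0, cap: float = 60.0
) -> List[float]:
    """Return the wait time in seconds before each retry attempt.

    Delays grow geometrically (base, base * factor, ...) and never exceed cap.
    """
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays
```

A caller would typically loop over the returned delays and call time.sleep(delay) before each retry. Backoff reduces the chance of a block but does nothing about CAPTCHAs or login walls.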

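Manual scraping works for small projects, but it becomes difficult and resource-intensive as volume grows. For efficient, scalable, and more reliable data collection, Bright Data's solution can save significant time and resources while delivering higher-quality data.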


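Method 2: Bright Data LinkedIn Scraper API

If you need a more robust and scalable way to collect LinkedIn data, consider the Bright Data LinkedIn Scraper API.

Key benefits

  • No infrastructure to maintain: proxies, CAPTCHAs, and rate limits are handled automatically.
  • Scalable and reliable: optimized for large-scale, real-time data collection.
  • Broad coverage: collects data from profiles, jobs, companies, and posts.
  • Global availability: supports all regions and languages.
  • Privacy compliance: adheres to GDPR and CCPA.
  • Pay as you go: you pay only for successful responses.
  • Free trial: includes 20 free API calls to get started.

Getting started with the LinkedIn Scraper API

Bright Data's LinkedIn Scraper API lets developers programmatically collect public data from LinkedIn profiles, company pages, job listings, and posts. This enterprise-grade service handles the underlying infrastructure automatically, including proxy management, request throttling, and data parsing.

Before you start, you need:

  • A Bright Data account
    • Sign up for a free trial on the Bright Data website and log in.
    • Add a payment method on the "Billing" page to activate the account.
  • An API token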
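
To show where the API token fits, here is a minimal sketch that builds, but does not send, an authenticated request. The endpoint URL, the dataset_id query parameter, the Bearer authorization scheme, and the function name build_trigger_request are assumptions for illustration. Confirm all of them against the official API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumed endpoint; confirm it against the official API documentation.
TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"


def build_trigger_request(
    api_token: str, dataset_id: str, inputs: List[Dict[str, str]]
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a list of input records."""
    query = urllib.parse.urlencode({"dataset_id": dataset_id})
    return urllib.request.Request(
        url=f"{TRIGGER_URL}?{query}",
        data=json.dumps(inputs).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. The input lists shown in the sections below (companies, profiles, posts, and so on) have the shape expected by the inputs argument.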

1. Company information

Collects detailed company information from the company's LinkedIn URL.


Input parameters

Field | Type | Required | Description
url | string | | LinkedIn URL of the company to collect

Sample response

{
    "name": "Kraft Heinz",
    "about": "The Kraft Heinz Company is one of the largest food and beverage companies in the world...",
    "key_info": {
        "headquarters": "Chicago, IL",
        "founded": 2015,
        "company_size": "10,001+ employees",
        "organization_type": "Public Company",
        "industries": "Food and Beverage Services",
        "website": "https://www.careers.kraftheinz.com/"
    },
    "metrics": {"linkedin_followers": 1557451, "linkedin_employees": 25254},
    "stock_info": {
        "ticker": "KHC",
        "exchange": "NASDAQ",
        "price": "$30.52",
        "last_updated": "December 21, 2024"
    },
    "specialties": "Food, Fast Moving Consumer Packaged Goods, CPG, and Consumer Packaged Goods",
    "locations": ["200 E. Randolph St. Suite 7600 Chicago, IL 60601, US"],
    "slogan": "Let's make life delicious!"
}

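👉 Only key fields are shown here; see the JSON sample for the full data

Code example

Edit the company URLs in the list to collect data for the companies you want: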

companies = [
    {"url": "https://il.linkedin.com/company/ibm"},
    {"url": "https://www.linkedin.com/company/stalkit"},
    {
        "url": "https://www.linkedin.com/organization-guest/company/the-kraft-heinz-company"
    },
    {"url": "https://il.linkedin.com/company/bright-data"},
]

👉 See the full Python example

2. Profiles by URL

Retrieves detailed information from LinkedIn profiles.


Input parameters

Parameter | Type | Required | Description
url | string | | URL of the profile to collect

Sample response

{
    "name": "Richard Branson",
    "profile_info": {
        "position": "Founder at Virgin Group",
        "followers": 18730516,
        "connections": 2,
        "avatar": "https://media.licdn.com/dms/image/v2/C4D03AQHh6_Wth5f3rQ/..."
    },
    "experience": [
        {
            "title": "Founder",
            "company": "Virgin Group",
            "duration": "Jan 1968 - Present (57 years)",
            "description": "Tie-loathing adventurer..."
        }
    ],
    "current_company": {"name": "Virgin Group", "title": "Founder at Virgin Group"},
    "url": "https://www.linkedin.com/in/rbranson/"
}

👉 Only key fields are shown; see the JSON sample for the full data

Code example

Replace the URLs below with the profiles you are interested in:

profiles = [
    {"url": "https://www.linkedin.com/in/williamhgates"},
    {"url": "https://www.linkedin.com/in/rbranson/"},
    {"url": "https://www.linkedin.com/in/justinwelsh/"},
    {"url": "https://www.linkedin.com/in/simonsinek/"},
]

👉 See the full Python example

3. Profile search by name

Finds LinkedIn profiles by name.


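Input parameters

Parameter | Type | Required | Description
first_name | string | | First name to search for
last_name | string | | Last name to search for

Sample response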

{
    "profile_info": {
        "id": "richard-branson-8a38866",
        "name": "Richard Branson",
        "location": {"city": "Cincinnati", "state": "Ohio", "country": "US"},
        "about": "Respiratory therapist with 40 years of experience...",
        "metrics": {"followers": 868, "connections": 500, "recommendations": 1}
    },
    "professional": {
        "current_position": {
            "company": "University of Cincinnati",
            "company_link": "https://www.linkedin.com/school/university-of-cincinnati"
        },
        "education": {
            "school": "The George Washington University School of Medicine and Health Sciences",
            "years": "2001-2003"
        }
    },
    "recommendations": [
        "Tracy OConnell Well known pro active valuable assett..."
    ],
    "similar_professionals": [
        {
            "name": "Walter J. Jones, PhD, MHSA",
            "title": "Professor at Medical University of South Carolina",
            "location": "Mount Pleasant, SC"
        }
    ],
    "url": "https://www.linkedin.com/in/richard-branson-8a38866"
}

👉 See the full JSON sample

Code example

Change first_name and last_name below to look up specific people:

people = [
    {"first_name": "Richard", "last_name": "Branson"},
    {"first_name": "Bill", "last_name": "Gates"},
]

👉 See the full Python example

4. Posts by URL

Retrieves detailed information about specific LinkedIn posts.


Input parameters

Parameter | Type | Required | Description
url | string | | URL of the LinkedIn post

Sample response

{
    "post_info": {
        "id": "7176601589682434049",
        "url": "https://www.linkedin.com/posts/karin-dodis_web-data-collection-for-businesses-bright-activity-7176601589682434049-Aakz",
        "date_posted": "2024-03-21T15:32:33.770Z",
        "post_type": "post",
        "engagement": {"num_likes": 12, "num_comments": 4}
    },
    "content": {
        "title": "Karin Dodis on LinkedIn: Web data collection for Businesses. Bright Data",
        "text": "Hey data enthusiasts, Bright Data has an awesome collection of free datasets..."
    },
    "author": {
        "user_id": "karin-dodis",
        "profile_url": "https://il.linkedin.com/in/karin-dodis",
        "followers": 4131,
        "total_posts": 28
    },
    "repost_info": {
        "original_author": "Or Lenchner",
        "original_author_id": "orlenchner",
        "original_text": "Free Datasets! Not just samples, but complete datasets with millions of records...",
        "original_date": "2024-03-27T15:39:54.497Z",
        "original_post_id": "7176470998987214848"
    }
}

👉 See the full JSON sample

Code example

Replace the post URLs below with the ones you want to collect:

posts = [
    {
        "url": "https://www.linkedin.com/pulse/ab-test-optimisation-earlier-decisions-new-readout-de-b%C3%A9naz%C3%A9"
    },
    {
        "url": "https://www.linkedin.com/posts/orlenchner_scrapecon-activity-7180537307521769472-oSYN"
    },
    {
        "url": "https://www.linkedin.com/posts/karin-dodis_web-data-collection-for-businesses-bright-activity-7176601589682434049-Aakz"
    },
    {
        "url": "https://www.linkedin.com/pulse/getting-value-out-sunburst-guillaume-de-b%C3%A9naz%C3%A9"
    },
]

👉 See the full Python example

5. Discover posts by author or article URL

Collects detailed post information for a given author or article.


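Input parameters

Parameter | Type | Required | Description
url | string | | LinkedIn URL of the author or article
limit | number | | Maximum number of articles to retrieve

Sample response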

{
    "article_info": {
        "id": "fare-business-con-la-propria-identità-cristian-brunori",
        "url": "https://it.linkedin.com/pulse/fare-business-con-la-propria-identità-cristian-brunori",
        "title": "Fare Business con la propria Identità",
        "date_posted": "2017-03-01T17:27:26.000Z",
        "post_type": "article",
        "engagement": {"num_likes": 18, "num_comments": 0}
    },
    "author": {
        "user_id": "cristianbrunori",
        "profile_url": "https://it.linkedin.com/in/cristianbrunori",
        "followers": 5205
    },
    "content": {
        "headline": "Quali sono i fattori che permettono ad un prodotto...",
        "text": "Quali sono i fattori che permettono ad un prodotto..."
    },
    "related_articles": [
        {
            "headline": "La differenza tra Marketing e Branding",
            "date_posted": "2017-06-29T00:00:00.000Z"
        }
    ]
}

👉 See the full JSON sample

Code example

Update url and limit below to fetch the articles a given author has published on LinkedIn:

authors = [
    {
        "url": "https://www.linkedin.com/today/author/cristianbrunori?trk=public_post_follow-articles",
        "limit": 50
    },
    {
        "url": "https://www.linkedin.com/today/author/stevenouri?trk=public_post_follow-articles"
    },
]

👉 See the full Python example

6. Posts by profile

Discovers all posts that a LinkedIn user has published or interacted with.


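Input parameters

Parameter | Type | Required | Description
url | string | | URL of the LinkedIn profile
start_date | date | | Start date for filtering posts (ISO 8601 format)
end_date | date | | End date for filtering posts (ISO 8601 format)

Sample response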

{
    "article_info": {
        "id": "fare-business-con-la-propria-identità-cristian-brunori",
        "url": "https://it.linkedin.com/pulse/fare-business-con-la-propria-identità-cristian-brunori",
        "title": "Fare Business con la propria Identità",
        "date_posted": "2017-03-01T17:27:26.000Z",
        "post_type": "article",
        "engagement": {"num_likes": 18, "num_comments": 0}
    },
    "author": {
        "user_id": "cristianbrunori",
        "profile_url": "https://it.linkedin.com/in/cristianbrunori",
        "followers": 5205
    },
    "content": {
        "headline": "Quali sono i fattori che permettono ad un prodotto...",
        "text": "Quali sono i fattori che permettono ad un prodotto..."
    },
    "related_articles": [
        {
            "headline": "La differenza tra Marketing e Branding",
            "date_posted": "2017-06-29T00:00:00.000Z"
        }
    ]
}

👉 See the full JSON sample

Code example

Adjust the profile URLs and date ranges as needed:

profiles = [
    {
        "url": "https://www.linkedin.com/in/luca-rossi-0aa497bb",
        "start_date": "2024-10-01T00:00:00.000Z",
        "end_date": "2024-10-09T00:00:00.000Z"
    },
    {
        "url": "https://www.linkedin.com/in/srijith-gomattam-401059214",
        "start_date": "2024-09-01T00:00:00.000Z",
        "end_date": "2024-10-01T00:00:00.000Z"
    },
    {
        "url": "https://www.linkedin.com/in/anna-clarke-0a342513",
        "start_date": "2024-10-01T00:00:00.000Z"
    },
]

👉 See the full Python example
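
The start_date and end_date values in the list above are ISO 8601 timestamps in UTC with millisecond precision and a trailing Z. A small helper can generate them from datetime objects instead of hand-typing strings. The sketch below is illustrative: the names to_iso_millis and profile_input are hypothetical, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def to_iso_millis(dt: datetime) -> str:
    """Format a datetime as UTC ISO 8601 with milliseconds and a Z suffix."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are already UTC
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def profile_input(url: str, start: datetime, end: Optional[datetime] = None) -> Dict[str, str]:
    """Build one input record; end_date is left out when no end is given."""
    record = {"url": url, "start_date": to_iso_millis(start)}
    if end is not None:
        record["end_date"] = to_iso_millis(end)
    return record
```

For example, profile_input("https://www.linkedin.com/in/luca-rossi-0aa497bb", datetime(2024, 10, 1), datetime(2024, 10, 9)) reproduces the first record in the list.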

7. Posts by company page

Collects posts and updates from a given company page.


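Input parameters

Parameter | Type | Required | Description
url | string | | URL of the company's LinkedIn page
start_date | date | | Start date for filtering posts (ISO 8601 format)
end_date | date | | End date for filtering posts (ISO 8601 format)

Sample response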

{
    "post_info": {
        "id": "7254476883906482179",
        "url": "https://it.linkedin.com/posts/lanieri_lanieri-torna-in-lussemburgo...",
        "date_posted": "2024-10-22T13:01:10.754Z",
        "post_type": "post"
    },
    "content": {
        "title": "Lanieri on LinkedIn: Lanieri torna in Lussemburgo...",
        "text": "Lanieri torna in Lussemburgo. Siamo lieti di annunciare..."
    },
    "engagement": {"likes": 12, "comments": 0},
    "company_info": {
        "name": "Lanieri",
        "followers": 5768,
        "account_type": "Organization",
        "profile_url": "https://it.linkedin.com/company/lanieri"
    }
}

👉 See the full JSON sample

Code example

Customize the company URLs and date ranges to collect posts from specific companies:

companies = [
    {"url": "https://www.linkedin.com/company/green-philly"},
    {"url": "https://www.linkedin.com/company/lanieri"},
    {"url": "https://www.linkedin.com/company/effortel"},
]

👉 See the full Python example

8. Job listings by URL

Retrieves the full details of a job from its URL.


Input parameters

Parameter | Type | Required | Description
url | string | | URL of the LinkedIn job posting

Sample response

{
    "job_info": {
        "id": "4073552631",
        "title": "Data Platform Engineer",
        "location": "Tel Aviv-Yafo, Tel Aviv District, Israel",
        "posted_date": "2024-11-22T09:41:10.107Z",
        "posted_time": "1 month ago",
        "employment_type": "Full-time",
        "function": "Engineering and Information Technology",
        "seniority_level": "Not Applicable",
        "industries": "Computer and Network Security",
        "applicants": 85,
        "apply_link": "https://www.linkedin.com/jobs/view/externalApply/4073552631?url=https%3A%2F%2Fcycode%2Ecom..."
    },
    "company": {
        "name": "Cycode | Complete ASPM",
        "id": "40789623",
        "logo": "https://media.licdn.com/dms/image/v2/D4D0BAQFsSsfzqEVWtw/company-logo_100_100/...",
        "url": "https://www.linkedin.com/company/cycode"
    },
    "description": {
        "summary": "This is a unique opportunity to join an exciting early-stage startup...",
        "requirements": [
            "Bachelor's degree in a relevant field...",
            "Proven experience in building, deploying...",
            "Proficiency in data analysis tools such as SQL, Python, Pandas..."
        ],
        "advantages": ["MongoDB", "AWS Cloud", "CICD, Docker Kubernetes"]
    }
}

👉 See the full JSON sample

Code example

Replace the job URLs below to collect the jobs you want:

job_searches = [
    {"url": "https://www.linkedin.com/jobs/view/4073552631"},
    {"url": "https://www.linkedin.com/jobs/view/4073729630"},
]

👉 See the full Python example

9. Job search by keyword

Collects relevant jobs using advanced search and filter criteria.


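Input parameters

Parameter | Type | Required | Description
location | string | | City or region to search for jobs in
keyword | string | | Search keyword or job title (e.g. "Product Manager"); wrap it in quotes for an exact match
country | string | | Country code (e.g. US, FR)
time_range | string | | How recently the job was posted (e.g. past 24 hours, past week)
job_type | string | | Employment type (e.g. full-time, part-time, contract)
experience_level | string | | Required experience (e.g. entry level, mid level, senior)
remote | string | | Whether the job is remote
company | string | | Restricts the search to a specific company
selective_search | boolean | | When true, excludes jobs whose titles do not contain the keyword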
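
To make the effect of selective_search concrete, the sketch below applies the same idea locally to a list of job records: it keeps only the jobs whose title contains the keyword. This is an approximation for illustration only. The function name selective_filter is hypothetical, and the case-insensitive substring match is an assumption; the service's actual matching rules are not described here.

```python
from typing import Dict, Iterable, List


def selective_filter(jobs: Iterable[Dict[str, str]], keyword: str) -> List[Dict[str, str]]:
    """Keep only jobs whose title contains the keyword (case-insensitive).

    Surrounding double quotes, used for exact-match keywords, are stripped first.
    """
    needle = keyword.strip().strip('"').lower()
    return [job for job in jobs if needle in job.get("title", "").lower()]
```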

Sample response

{
    "job_info": {
        "id": "4096670538",
        "title": "Remote Part-Time Focus Group Participants...",
        "posted_date": "2024-12-15T09:16:55.932Z",
        "posted_time": "1 week ago",
        "location": {"city": "Bronx", "state": "NY", "country": "US"},
        "type": {
            "employment": "Part-time",
            "level": "Entry level",
            "function": "Other",
            "industry": "Market Research",
            "remote": true
        },
        "applicants": 25,
        "apply_link": "https://www.linkedin.com/jobs/view/externalApply/4096670538?url=https%3A%2F%2Fwww%2Ecollegerecruiter%2Ecom..."
    },
    "company": {
        "name": "Apex Focus Group",
        "id": "89885194",
        "logo": "https://media.licdn.com/dms/image/v2/C560BAQHmbh3iXrrrEA/company-logo_100_100/...",
        "url": "https://www.linkedin.com/company/apex-focus-group"
    },
    "compensation": {
        "per_session": "$75-$150 (1 hour)",
        "multi_session": "$300-$750",
        "frequency": "weekly"
    },
    "requirements": {
        "technical": [
            "Smartphone with working camera...",
            "High speed internet connection"
        ],
        "responsibilities": [
            "Show up 10 mins before discussion...",
            "Complete written and oral instructions..."
        ]
    },
    "search_parameters": {
        "keyword": "data analyst",
        "location": "New York",
        "job_type": "Part-time",
        "experience": "Entry level",
        "remote": "Remote",
        "country": "US"
    }
}

👉 See the full JSON sample

Code example

Customize the search parameters to look for jobs with different locations and requirements:

search_criteria = [
    {
        "location": "New York",
        "keyword": "data analyst",
        "country": "US",
        "time_range": "Any time",
        "job_type": "Part-time",
        "experience_level": "Entry level",
        "remote": "Remote",
        "company": ""
    },
    {
        "location": "paris",
        "keyword": "product manager",
        "country": "FR",
        "time_range": "Past month",
        "job_type": "Full-time",
        "experience_level": "Internship",
        "remote": "On-site",
        "company": ""
    },
    {
        "location": "New York",
        "keyword": "\"python developer\"",
        "country": "",
        "time_range": "",
        "job_type": "",
        "experience_level": "",
        "remote": "",
        "company": ""
    },
]

👉 See the full Python example
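
Each record in search_criteria carries the same eight keys, with an empty string meaning "no filter". A small builder avoids repeating the empty fields by hand. This is a sketch: the names make_criteria and OPTIONAL_FILTERS are hypothetical, and rejecting unknown filter names is a design choice made here, not documented API behaviour.

```python
from typing import Dict

# Filter fields taken from the search_criteria examples; "" means no filter.
OPTIONAL_FILTERS = ("country", "time_range", "job_type", "experience_level", "remote", "company")


def make_criteria(location: str, keyword: str, **filters: str) -> Dict[str, str]:
    """Build one search record, filling every unspecified filter with ""."""
    unknown = set(filters) - set(OPTIONAL_FILTERS)
    if unknown:
        raise ValueError(f"Unknown filter(s): {sorted(unknown)}")
    criteria = {"location": location, "keyword": keyword}
    for name in OPTIONAL_FILTERS:
        criteria[name] = filters.get(name, "")
    return criteria
```

With this helper, the third record in the list is just make_criteria("New York", '"python developer"'), and a typo such as job_typ fails immediately instead of being sent along unnoticed.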

10. Job listings by search URL

Extracts job listings directly from a LinkedIn search URL.


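Input parameters

Parameter | Type | Required | Description
url | string | | LinkedIn search URL (for example, a company search or a keyword search)
selective_search | boolean | | When true, excludes jobs whose titles do not contain the keyword

Note: to filter by time range, set the f_TPR parameter in the search URL to r followed by the window length in seconds (hours × 3600):

  • f_TPR=r3600 means the past hour
  • f_TPR=r86400 means the past 24 hours
  • f_TPR=r604800 means the past week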
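
The hours × 3600 rule is easy to automate. The sketch below computes the f_TPR value and sets it on a search URL, replacing any value already present. The function names f_tpr_value and with_time_range are hypothetical; only the r<seconds> format comes from the note above.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def f_tpr_value(hours: int) -> str:
    """Return the f_TPR value for a window of the given number of hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return f"r{hours * 3600}"


def with_time_range(search_url: str, hours: int) -> str:
    """Return search_url with f_TPR set, replacing any existing f_TPR."""
    parts = urlsplit(search_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "f_TPR"
    ]
    query.append(("f_TPR", f_tpr_value(hours)))
    # quote_via=quote keeps spaces encoded as %20 rather than "+"
    return urlunsplit(parts._replace(query=urlencode(query, quote_via=quote)))
```

For instance, f_tpr_value(24) gives r86400 and f_tpr_value(7 * 24) gives r604800, matching the values listed above.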

Sample response

{
    "job_info": {
        "id": "4107998267",
        "title": "Software Engineer, Professional Services",
        "location": "Tel Aviv District, Israel",
        "posted": {"date": "2024-12-22T08:39:21.666Z", "time_ago": "1 hour ago"},
        "type": {
            "employment": "Full-time",
            "level": "Entry level",
            "function": "Information Technology",
            "industry": "Software Development"
        },
        "applicants": 25,
        "apply_link": "https://www.linkedin.com/jobs/view/externalApply/4107998267?url=https%3A%2F%2Fwww%2Efireblocks%2Ecom%2Fcareers..."
    },
    "company": {
        "name": "Fireblocks",
        "id": "14824547",
        "logo": "https://media.licdn.com/dms/image/v2/C4D0BAQEyT6gpuwTpPg/company-logo_100_100/...",
        "url": "https://www.linkedin.com/company/fireblocks"
    },
    "requirements": {
        "core": [
            "2+ years of software development experience",
            "Proficiency in JavaScript, TypeScript, and Python",
            "Strong understanding of frontend and backend technologies"
        ],
        "nice_to_have": [
            "Experience with Fireblocks or similar crypto platforms",
            "Knowledge of cloud platforms..."
        ]
    },
    "responsibilities": [
        "Collaborate with clients on technical requirements...",
        "Build custom tools and integrations..."
    ]
}

👉 See the full JSON sample

Code example

Change the search URLs below to collect job listings for specific companies or search results:

search_urls = [
    {
        "url": "https://www.linkedin.com/jobs/search?keywords=Software&location=Tel%20Aviv-Yafo&geoId=101570771&trk=public_jobs_jobs-search-bar_search-submit&position=1&pageNum=0&f_TPR=r3600"
    },
    {"url": "https://www.linkedin.com/jobs/semrush-jobs?f_C=2821922"},
    {"url": "https://www.linkedin.com/jobs/reddit-inc.-jobs-worldwide?f_C=150573"}
]

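👉 See the full Python example

Data collection options

Use the following parameters to refine the results:

Parameter | Type | Description | Example
limit | integer | Maximum number of results returned per input | limit=10
include_errors | boolean | Whether to include error reports for troubleshooting | include_errors=true
notify | url | URL that receives a callback notification when the scrape finishes | notify=https://notify-me.com/
format | enum | Output format (e.g. JSON, NDJSON, JSONL, CSV) | format=json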
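
The examples in the table (limit=10, format=json) suggest these options travel as URL query parameters. Under that assumption, the sketch below assembles them into a query string. The function name delivery_query is hypothetical, and accepting only the four formats named in the table is a simplification, since the table introduces them as examples.

```python
from typing import Optional
from urllib.parse import urlencode

# Formats named in the table above; the real API may accept others.
KNOWN_FORMATS = ("json", "ndjson", "jsonl", "csv")


def delivery_query(
    limit: Optional[int] = None,
    include_errors: Optional[bool] = None,
    notify: Optional[str] = None,
    fmt: Optional[str] = None,
) -> str:
    """Encode the chosen options as a query string, skipping unset ones."""
    params = {}
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        params["limit"] = limit
    if include_errors is not None:
        params["include_errors"] = "true" if include_errors else "false"
    if notify is not None:
        params["notify"] = notify
    if fmt is not None:
        if fmt.lower() not in KNOWN_FORMATS:
            raise ValueError(f"unsupported format: {fmt}")
        params["format"] = fmt.lower()
    return urlencode(params)
```

The result can be appended to a request URL after a ? or joined with & to an existing query string.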

💡 Tip: you can also have the data delivered to external storage or sent to a webhook.

Need more detail? See the official API documentation
