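This repository offers two ways to collect data from LinkedIn:
- Free method: suited to small projects, experiments, and learning.
- LinkedIn Scraper API: built for large-scale, highly reliable, real-time data collection.

Contents:
- Method 1: Free LinkedIn scraper
- Common challenges with the free method
- Method 2: Bright Data LinkedIn Scraper API
- Getting started with the LinkedIn Scraper API
- Data collection methods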
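The free tool provides two main features:
- LinkedIn job scraper: collects job listings with rich metadata
- LinkedIn profile validator: checks whether LinkedIn profile and company URLs are valid

The job scraper collects job postings from LinkedIn job search.
Key features:
- Collects job details (title, company, location, link, posted date, and more)
- Built-in rate limiting and error handling
- Outputs clean JSON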
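
To make the output shape concrete, here is a minimal sketch of a job record and a JSON serializer. It is not the repository's actual code: the `Job` class and the `jobs_to_json` helper are hypothetical names, and the field names are taken from the sample output in this README. Skipping duplicate links is an added assumption, not documented behavior.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Job:
    """One scraped job listing (hypothetical record type)."""

    title: str
    company: str
    location: str
    job_link: str
    posted_date: str


def jobs_to_json(jobs):
    """Serialize jobs to a JSON array, keeping the first record per job link."""
    seen = set()
    unique = []
    for job in jobs:
        if job.job_link in seen:
            continue
        seen.add(job.job_link)
        unique.append(asdict(job))
    # ensure_ascii=False keeps non-ASCII company or city names readable.
    return json.dumps(unique, indent=2, ensure_ascii=False)
```

Calling `jobs_to_json` on a list of `Job` objects returns a string that can be written to a file as-is.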
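
The profile validator checks whether LinkedIn profile or company page URLs are valid.
Key features:
- Checks profile and company URLs
- Automatically retries failed requests
- Shows a detailed status for each URL
- Checks multiple URLs in one batch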
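
The behavior described above (batch checking, retries, a status per URL) could be sketched as follows. This is an illustration under stated assumptions, not the contents of `profile_checker.py`: `fetch_status`, `check_urls`, and `format_result` are hypothetical names, and treating only HTTP 200 as valid is an assumption.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url; network failures raise OSError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses still carry a status code worth reporting.
        return err.code


def check_urls(urls, fetch=fetch_status, retries=3, delay=1.0, sleep=time.sleep):
    """Return {url: status code, or None if every attempt failed}."""
    results = {}
    for url in urls:
        status = None
        for attempt in range(1, retries + 1):
            try:
                status = fetch(url)
                break
            except OSError:
                # Wait before retrying, but not after the final attempt.
                if attempt < retries:
                    sleep(delay)
        results[url] = status
    return results


def format_result(url, status):
    """Render one result line, marking only HTTP 200 as valid (an assumption)."""
    mark = "✓" if status == 200 else "✗"
    shown = status if status is not None else "no response"
    return f"{mark} {url} - Status: {shown}"
```

Passing `fetch` and `sleep` as parameters keeps the retry logic testable without touching the network.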

The steps below get you up and running in a few minutes. You will need:
- Python 3.9 or later
- The dependencies listed in requirements.txt

Getting started takes three steps:
git clone https://github.com/luminati-io/LinkedIn-Scraper.git
cd LinkedIn-Scraper
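pip install -r requirements.txt

The following examples show how to use the scrapers.
Configure the search parameters:
# In jobs_scraper.py
params = {
    "keywords": "AI/ML Engineer",  # job title or keywords to search for
    "location": "London",  # location to search in
    "max_jobs": 100  # maximum number of jobs to collect
}
# Run: python jobs_scraper.py

The scraper writes a JSON file of job details, for example: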
{
"title": "Research Engineer, AI/Machine Learning",
"company": "Google",
"location": "London, England, United Kingdom",
"job_link": "https://uk.linkedin.com/jobs/view/research-engineer-ai-machine-learning-at-google-4086259724",
"posted_date": "3 weeks ago"
}

Configure the URLs to validate:
# In profile_checker.py
test_urls = [
"https://www.linkedin.com/company/bright-data/",
"https://www.linkedin.com/company/aabbccdd/"
]
# Run: python profile_checker.py

You get a validation result for each URL:
✓ linkedin.com/company/bright-data - Status: 200
✗ linkedin.com/company/aabbccdd - Status: 400

Scraping LinkedIn means running into several anti-scraping mechanisms. The main challenges are:
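- Rate limiting: LinkedIn closely monitors the request rate of each IP address; exceeding the limit can get the IP blocked temporarily or permanently.
- CAPTCHAs: when LinkedIn detects unusual access patterns, it presents a CAPTCHA to stop automated scrapers.
- Login walls: most valuable LinkedIn data requires signing in, and the platform blocks automated login attempts it detects.
- Technical hurdles: pagination, dynamically loaded content, incomplete data points, and on-page ads.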
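
A common way to stay under a rate limit is to retry with exponential backoff and jitter. The sketch below shows that general technique; it is not taken from this repository, and `RateLimited`, `backoff_delays`, and `call_with_backoff` are hypothetical names. The caller is assumed to raise `RateLimited` when the server signals throttling.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function when the server throttles it."""


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=5):
    """Yield capped exponential delays: base, base*factor, ... up to max_delay."""
    delay = base
    for _ in range(attempts):
        yield min(delay, max_delay)
        delay *= factor


def call_with_backoff(request, attempts=5, sleep=time.sleep, rng=random.random):
    """Call request() until it succeeds; re-raise after the last attempt."""
    delays = list(backoff_delays(attempts=attempts))
    for index, delay in enumerate(delays):
        try:
            return request()
        except RateLimited:
            if index == len(delays) - 1:
                raise
            # Full jitter: sleep a random fraction of the current delay so
            # parallel workers do not retry in lockstep.
            sleep(delay * rng())
```

Backoff only reduces the chance of a block; it does not help with CAPTCHAs or login walls.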
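
The manual approach works for small projects but becomes difficult and resource-intensive at volume. For efficient, scalable, and more reliable data collection, Bright Data's solution saves time and resources and delivers higher-quality data.
If you need a more robust and scalable way to scrape LinkedIn, consider the Bright Data LinkedIn Scraper API. Its advantages:
- No infrastructure to maintain: proxies, CAPTCHAs, and rate limits are handled automatically.
- Scalable and reliable: optimized for large-scale, real-time data collection.
- Broad coverage: collects data from profiles, jobs, companies, and posts.
- Available worldwide: supports all regions and languages.
- Privacy compliant: adheres to GDPR and CCPA.
- Pay as you go: you pay only for successful responses.
- Free trial: 20 free API calls to get started right away.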
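Bright Data's LinkedIn Scraper API lets developers programmatically collect public data from LinkedIn profiles, company pages, job listings, and posts. This enterprise-grade solution handles the underlying infrastructure for you, including proxy management, request throttling, and data parsing.
Before you start, you need:
- A Bright Data account
  - Sign up for a free trial on the Bright Data website and log in.
  - Add a payment method on the "Billing" page to activate your account.
- An API token
  - Follow this guide to obtain your API token.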
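
Once you have a token, each scraper below is driven by a list of input records sent in an authenticated request. The sketch below builds such a request without sending it. Treat the details as assumptions to verify against the official API documentation: the endpoint URL, the `dataset_id` query parameter, the Bearer authorization scheme, and the `BRIGHT_DATA_API_TOKEN` environment variable name are not stated in this README.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed endpoint; confirm it in the official API documentation.
TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"


def build_trigger_request(inputs, dataset_id, token=None):
    """Build (but do not send) a POST request carrying the input records."""
    # BRIGHT_DATA_API_TOKEN is a hypothetical variable name for illustration.
    token = token or os.environ.get("BRIGHT_DATA_API_TOKEN")
    if not token:
        raise ValueError("API token is missing")
    query = urllib.parse.urlencode({"dataset_id": dataset_id})
    return urllib.request.Request(
        f"{TRIGGER_URL}?{query}",
        data=json.dumps(inputs).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen`. Reading the token from the environment keeps it out of source control.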

Collect detailed company information from a company's LinkedIn URL.

| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | LinkedIn URL of the company to collect |

{
"name": "Kraft Heinz",
"about": "The Kraft Heinz Company is one of the largest food and beverage companies in the world...",
"key_info": {
"headquarters": "Chicago, IL",
"founded": 2015,
"company_size": "10,001+ employees",
"organization_type": "Public Company",
"industries": "Food and Beverage Services",
"website": "https://www.careers.kraftheinz.com/"
},
"metrics": {"linkedin_followers": 1557451, "linkedin_employees": 25254},
"stock_info": {
"ticker": "KHC",
"exchange": "NASDAQ",
"price": "$30.52",
"last_updated": "December 21, 2024"
},
"specialties": "Food, Fast Moving Consumer Packaged Goods, CPG, and Consumer Packaged Goods",
"locations": ["200 E. Randolph St. Suite 7600 Chicago, IL 60601, US"],
"slogan": "Let's make life delicious!"
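}

👉 Only key fields are shown here; see the JSON sample for the complete data.

Change the company URLs in the list to collect data for your target companies: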
companies = [
{"url": "https://il.linkedin.com/company/ibm"},
{"url": "https://www.linkedin.com/company/stalkit"},
{
"url": "https://www.linkedin.com/organization-guest/company/the-kraft-heinz-company"
},
{"url": "https://il.linkedin.com/company/bright-data"},
]

👉 See the full Python example

Collect detailed information from a LinkedIn profile.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL of the profile to collect |

{
"name": "Richard Branson",
"profile_info": {
"position": "Founder at Virgin Group",
"followers": 18730516,
"connections": 2,
"avatar": "https://media.licdn.com/dms/image/v2/C4D03AQHh6_Wth5f3rQ/..."
},
"experience": [
{
"title": "Founder",
"company": "Virgin Group",
"duration": "Jan 1968 - Present (57 years)",
"description": "Tie-loathing adventurer..."
}
],
"current_company": {"name": "Virgin Group", "title": "Founder at Virgin Group"},
"url": "https://www.linkedin.com/in/rbranson/"
}

👉 Only key fields are shown; see the JSON sample for the complete data.

Replace the URLs below with the profiles you want to collect:
profiles = [
{"url": "https://www.linkedin.com/in/williamhgates"},
{"url": "https://www.linkedin.com/in/rbranson/"},
{"url": "https://www.linkedin.com/in/justinwelsh/"},
{"url": "https://www.linkedin.com/in/simonsinek/"},
]

👉 See the full Python example

Find LinkedIn profiles by name.

| Parameter | Type | Required | Description |
|---|---|---|---|
| first_name | string | Yes | First name to search for |
| last_name | string | Yes | Last name to search for |

{
"profile_info": {
"id": "richard-branson-8a38866",
"name": "Richard Branson",
"location": {"city": "Cincinnati", "state": "Ohio", "country": "US"},
"about": "Respiratory therapist with 40 years of experience...",
"metrics": {"followers": 868, "connections": 500, "recommendations": 1}
},
"professional": {
"current_position": {
"company": "University of Cincinnati",
"company_link": "https://www.linkedin.com/school/university-of-cincinnati"
},
"education": {
"school": "The George Washington University School of Medicine and Health Sciences",
"years": "2001-2003"
}
},
"recommendations": [
"Tracy OConnell Well known pro active valuable assett..."
],
"similar_professionals": [
{
"name": "Walter J. Jones, PhD, MHSA",
"title": "Professor at Medical University of South Carolina",
"location": "Mount Pleasant, SC"
}
],
"url": "https://www.linkedin.com/in/richard-branson-8a38866"
}

👉 See the full JSON sample

Change first_name and last_name below to look up specific people:
people = [
{"first_name": "Richard", "last_name": "Branson"},
{"first_name": "Bill", "last_name": "Gates"},
]

👉 See the full Python example

Collect detailed information about a specific LinkedIn post.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | LinkedIn post URL |

{
"post_info": {
"id": "7176601589682434049",
"url": "https://www.linkedin.com/posts/karin-dodis_web-data-collection-for-businesses-bright-activity-7176601589682434049-Aakz",
"date_posted": "2024-03-21T15:32:33.770Z",
"post_type": "post",
"engagement": {"num_likes": 12, "num_comments": 4}
},
"content": {
"title": "Karin Dodis on LinkedIn: Web data collection for Businesses. Bright Data",
"text": "Hey data enthusiasts, Bright Data has an awesome collection of free datasets..."
},
"author": {
"user_id": "karin-dodis",
"profile_url": "https://il.linkedin.com/in/karin-dodis",
"followers": 4131,
"total_posts": 28
},
"repost_info": {
"original_author": "Or Lenchner",
"original_author_id": "orlenchner",
"original_text": "Free Datasets! Not just samples, but complete datasets with millions of records...",
"original_date": "2024-03-27T15:39:54.497Z",
"original_post_id": "7176470998987214848"
}
}

👉 See the full JSON sample

Replace the post URLs below with the ones you want to collect:
posts = [
{
"url": "https://www.linkedin.com/pulse/ab-test-optimisation-earlier-decisions-new-readout-de-b%C3%A9naz%C3%A9"
},
{
"url": "https://www.linkedin.com/posts/orlenchner_scrapecon-activity-7180537307521769472-oSYN"
},
{
"url": "https://www.linkedin.com/posts/karin-dodis_web-data-collection-for-businesses-bright-activity-7176601589682434049-Aakz"
},
{
"url": "https://www.linkedin.com/pulse/getting-value-out-sunburst-guillaume-de-b%C3%A9naz%C3%A9"
},
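]

👉 See the full Python example

Collect detailed post information for a specific author or article.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL of the author or article on LinkedIn |
| limit | number | No | Maximum number of articles to fetch |
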
{
"article_info": {
"id": "fare-business-con-la-propria-identità-cristian-brunori",
"url": "https://it.linkedin.com/pulse/fare-business-con-la-propria-identità-cristian-brunori",
"title": "Fare Business con la propria Identità",
"date_posted": "2017-03-01T17:27:26.000Z",
"post_type": "article",
"engagement": {"num_likes": 18, "num_comments": 0}
},
"author": {
"user_id": "cristianbrunori",
"profile_url": "https://it.linkedin.com/in/cristianbrunori",
"followers": 5205
},
"content": {
"headline": "Quali sono i fattori che permettono ad un prodotto...",
"text": "Quali sono i fattori che permettono ad un prodotto..."
},
"related_articles": [
{
"headline": "La differenza tra Marketing e Branding",
"date_posted": "2017-06-29T00:00:00.000Z"
}
]
}

👉 See the full JSON sample

Update url and limit below to fetch the articles a given author has published on LinkedIn:
authors = [
{
"url": "https://www.linkedin.com/today/author/cristianbrunori?trk=public_post_follow-articles",
"limit": 50
},
{
"url": "https://www.linkedin.com/today/author/stevenouri?trk=public_post_follow-articles"
},
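]

👉 See the full Python example

Discover all posts that a LinkedIn user has published or interacted with.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | LinkedIn profile URL |
| start_date | date | No | Start date for filtering posts (ISO 8601) |
| end_date | date | No | End date for filtering posts (ISO 8601) |
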
{
"article_info": {
"id": "fare-business-con-la-propria-identità-cristian-brunori",
"url": "https://it.linkedin.com/pulse/fare-business-con-la-propria-identità-cristian-brunori",
"title": "Fare Business con la propria Identità",
"date_posted": "2017-03-01T17:27:26.000Z",
"post_type": "article",
"engagement": {"num_likes": 18, "num_comments": 0}
},
"author": {
"user_id": "cristianbrunori",
"profile_url": "https://it.linkedin.com/in/cristianbrunori",
"followers": 5205
},
"content": {
"headline": "Quali sono i fattori che permettono ad un prodotto...",
"text": "Quali sono i fattori che permettono ad un prodotto..."
},
"related_articles": [
{
"headline": "La differenza tra Marketing e Branding",
"date_posted": "2017-06-29T00:00:00.000Z"
}
]
}

👉 See the full JSON sample

Adjust the profile URLs and date ranges as needed:
profiles = [
{
"url": "https://www.linkedin.com/in/luca-rossi-0aa497bb",
"start_date": "2024-10-01T00:00:00.000Z",
"end_date": "2024-10-09T00:00:00.000Z"
},
{
"url": "https://www.linkedin.com/in/srijith-gomattam-401059214",
"start_date": "2024-09-01T00:00:00.000Z",
"end_date": "2024-10-01T00:00:00.000Z"
},
{
"url": "https://www.linkedin.com/in/anna-clarke-0a342513",
"start_date": "2024-10-01T00:00:00.000Z"
},
]

👉 See the full Python example
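
Because `start_date` and `end_date` are ISO 8601 strings, it can help to validate them locally before sending a request. The following is a minimal sketch: `parse_iso8601` and `validate_post_filter` are hypothetical helpers, and rejecting a start date later than the end date is an assumption about what makes a sensible input, not documented API behavior.

```python
from datetime import datetime


def parse_iso8601(value):
    """Parse timestamps such as '2024-10-01T00:00:00.000Z' into aware datetimes."""
    # datetime.fromisoformat accepts a trailing 'Z' only from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def validate_post_filter(record):
    """Check one input record and return it unchanged if it looks valid."""
    if not record.get("url"):
        raise ValueError("url is required")
    start = record.get("start_date")
    end = record.get("end_date")
    # Parsing also rejects malformed dates when only one bound is given.
    parsed_start = parse_iso8601(start) if start else None
    parsed_end = parse_iso8601(end) if end else None
    if parsed_start and parsed_end and parsed_start > parsed_end:
        raise ValueError(f"start_date {start} is after end_date {end}")
    return record
```

Running each record through `validate_post_filter` before submission surfaces typos in dates early instead of in the API response.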
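
Collect posts and updates from a company page.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL of the company's LinkedIn page |
| start_date | date | No | Start date for filtering posts (ISO 8601) |
| end_date | date | No | End date for filtering posts (ISO 8601) |
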
{
"post_info": {
"id": "7254476883906482179",
"url": "https://it.linkedin.com/posts/lanieri_lanieri-torna-in-lussemburgo...",
"date_posted": "2024-10-22T13:01:10.754Z",
"post_type": "post"
},
"content": {
"title": "Lanieri on LinkedIn: Lanieri torna in Lussemburgo...",
"text": "Lanieri torna in Lussemburgo. Siamo lieti di annunciare..."
},
"engagement": {"likes": 12, "comments": 0},
"company_info": {
"name": "Lanieri",
"followers": 5768,
"account_type": "Organization",
"profile_url": "https://it.linkedin.com/company/lanieri"
}
}

👉 See the full JSON sample

Customize the company URLs and date ranges to collect posts from specific companies:
companies = [
{"url": "https://www.linkedin.com/company/green-philly"},
{"url": "https://www.linkedin.com/company/lanieri"},
{"url": "https://www.linkedin.com/company/effortel"},
]

👉 See the full Python example

Collect the full details of a job posting from its URL.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | LinkedIn job URL |

{
"job_info": {
"id": "4073552631",
"title": "Data Platform Engineer",
"location": "Tel Aviv-Yafo, Tel Aviv District, Israel",
"posted_date": "2024-11-22T09:41:10.107Z",
"posted_time": "1 month ago",
"employment_type": "Full-time",
"function": "Engineering and Information Technology",
"seniority_level": "Not Applicable",
"industries": "Computer and Network Security",
"applicants": 85,
"apply_link": "https://www.linkedin.com/jobs/view/externalApply/4073552631?url=https%3A%2F%2Fcycode%2Ecom..."
},
"company": {
"name": "Cycode | Complete ASPM",
"id": "40789623",
"logo": "https://media.licdn.com/dms/image/v2/D4D0BAQFsSsfzqEVWtw/company-logo_100_100/...",
"url": "https://www.linkedin.com/company/cycode"
},
"description": {
"summary": "This is a unique opportunity to join an exciting early-stage startup...",
"requirements": [
"Bachelor's degree in a relevant field...",
"Proven experience in building, deploying...",
"Proficiency in data analysis tools such as SQL, Python, Pandas..."
],
"advantages": ["MongoDB", "AWS Cloud", "CICD, Docker Kubernetes"]
}
}

👉 See the full JSON sample

Replace the job URLs below to collect specific postings:
job_searches = [
{"url": "https://www.linkedin.com/jobs/view/4073552631"},
{"url": "https://www.linkedin.com/jobs/view/4073729630"},
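]

👉 See the full Python example

Collect job postings using advanced search and filter criteria.

| Parameter | Type | Required | Description |
|---|---|---|---|
| location | string | Yes | City or region to search for jobs in |
| keyword | string | No | Search keyword or job title (e.g. "Product Manager"); wrap it in quotes for an exact match |
| country | string | No | Country code (e.g. US, FR) |
| time_range | string | No | Posting time range (e.g. past 24 hours, past week) |
| job_type | string | No | Job type (e.g. full-time, part-time, contract) |
| experience_level | string | No | Experience level (e.g. entry level, mid level, senior) |
| remote | string | No | Whether the job is remote |
| company | string | No | Restrict the search to a specific company |
| selective_search | boolean | No | When true, excludes jobs whose titles do not contain the keyword |
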
{
"job_info": {
"id": "4096670538",
"title": "Remote Part-Time Focus Group Participants...",
"posted_date": "2024-12-15T09:16:55.932Z",
"posted_time": "1 week ago",
"location": {"city": "Bronx", "state": "NY", "country": "US"},
"type": {
"employment": "Part-time",
"level": "Entry level",
"function": "Other",
"industry": "Market Research",
"remote": true
},
"applicants": 25,
"apply_link": "https://www.linkedin.com/jobs/view/externalApply/4096670538?url=https%3A%2F%2Fwww%2Ecollegerecruiter%2Ecom..."
},
"company": {
"name": "Apex Focus Group",
"id": "89885194",
"logo": "https://media.licdn.com/dms/image/v2/C560BAQHmbh3iXrrrEA/company-logo_100_100/...",
"url": "https://www.linkedin.com/company/apex-focus-group"
},
"compensation": {
"per_session": "$75-$150 (1 hour)",
"multi_session": "$300-$750",
"frequency": "weekly"
},
"requirements": {
"technical": [
"Smartphone with working camera...",
"High speed internet connection"
],
"responsibilities": [
"Show up 10 mins before discussion...",
"Complete written and oral instructions..."
]
},
"search_parameters": {
"keyword": "data analyst",
"location": "New York",
"job_type": "Part-time",
"experience": "Entry level",
"remote": "Remote",
"country": "US"
}
}

👉 See the full JSON sample

Customize the search parameters to look for jobs in different locations and with different requirements:
search_criteria = [
{
"location": "New York",
"keyword": "data analyst",
"country": "US",
"time_range": "Any time",
"job_type": "Part-time",
"experience_level": "Entry level",
"remote": "Remote",
"company": ""
},
{
"location": "paris",
"keyword": "product manager",
"country": "FR",
"time_range": "Past month",
"job_type": "Full-time",
"experience_level": "Internship",
"remote": "On-site",
"company": ""
},
{
"location": "New York",
"keyword": "\"python developer\"",
"country": "",
"time_range": "",
"job_type": "",
"experience_level": "",
"remote": "",
"company": ""
},
]

👉 See the full Python example
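
The sample inputs for the keyword search pass an empty string for every unused filter. A small helper can build records in that shape from just the fields you care about. This is a sketch under assumptions: `make_search_criteria` and `exact_phrase` are hypothetical names, and whether the API actually requires empty strings for unused filters (rather than omitting them) is not confirmed by this README.

```python
OPTIONAL_FIELDS = (
    "keyword",
    "country",
    "time_range",
    "job_type",
    "experience_level",
    "remote",
    "company",
)


def exact_phrase(text):
    """Wrap a keyword in double quotes to request an exact match."""
    return f'"{text}"'


def make_search_criteria(location, selective_search=None, **filters):
    """Build one search record, filling unused optional filters with ''."""
    if not location:
        raise ValueError("location is required")
    unknown = set(filters) - set(OPTIONAL_FIELDS)
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    criteria = {"location": location}
    for field in OPTIONAL_FIELDS:
        criteria[field] = filters.get(field, "")
    # selective_search is a boolean, so it is only included when set.
    if selective_search is not None:
        criteria["selective_search"] = bool(selective_search)
    return criteria
```

For example, `make_search_criteria("New York", keyword=exact_phrase("python developer"))` produces a record with the same fields as the third sample input.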
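
Extract job listings directly from a LinkedIn search URL.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | LinkedIn search URL (for example a company search or keyword search) |
| selective_search | boolean | No | When true, excludes jobs whose titles do not contain the keyword |

Note: to filter by time range, add the &f_TPR parameter to the search URL, set to r followed by the number of seconds (hours × 3600):
- f_TPR=r3600: past hour
- f_TPR=r86400: past 24 hours
- f_TPR=r604800: past week
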
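
The hours-to-seconds rule in the note can be applied programmatically. The helper below is a minimal sketch with a hypothetical name, `with_time_range`; it adds or replaces `f_TPR` on an existing search URL using only the standard library.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def with_time_range(search_url, hours):
    """Return search_url with f_TPR set to 'r' + hours converted to seconds."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    seconds = int(hours * 3600)
    parts = urlsplit(search_url)
    # Drop any existing f_TPR so the parameter is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "f_TPR"
    ]
    query.append(("f_TPR", f"r{seconds}"))
    # quote_via=quote keeps spaces encoded as %20 rather than '+'.
    return urlunsplit(parts._replace(query=urlencode(query, quote_via=quote)))
```

`with_time_range(url, 24)` yields `f_TPR=r86400`, and `with_time_range(url, 24 * 7)` yields `f_TPR=r604800`, matching the values listed in the note.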
{
"job_info": {
"id": "4107998267",
"title": "Software Engineer, Professional Services",
"location": "Tel Aviv District, Israel",
"posted": {"date": "2024-12-22T08:39:21.666Z", "time_ago": "1 hour ago"},
"type": {
"employment": "Full-time",
"level": "Entry level",
"function": "Information Technology",
"industry": "Software Development"
},
"applicants": 25,
"apply_link": "https://www.linkedin.com/jobs/view/externalApply/4107998267?url=https%3A%2F%2Fwww%2Efireblocks%2Ecom%2Fcareers..."
},
"company": {
"name": "Fireblocks",
"id": "14824547",
"logo": "https://media.licdn.com/dms/image/v2/C4D0BAQEyT6gpuwTpPg/company-logo_100_100/...",
"url": "https://www.linkedin.com/company/fireblocks"
},
"requirements": {
"core": [
"2+ years of software development experience",
"Proficiency in JavaScript, TypeScript, and Python",
"Strong understanding of frontend and backend technologies"
],
"nice_to_have": [
"Experience with Fireblocks or similar crypto platforms",
"Knowledge of cloud platforms..."
]
},
"responsibilities": [
"Collaborate with clients on technical requirements...",
"Build custom tools and integrations..."
]
}

👉 See the full JSON sample

Change the search URLs below to collect job listings for a specific company or search:
search_urls = [
{
"url": "https://www.linkedin.com/jobs/search?keywords=Software&location=Tel%20Aviv-Yafo&geoId=101570771&trk=public_jobs_jobs-search-bar_search-submit&position=1&pageNum=0&f_TPR=r3600"
},
{"url": "https://www.linkedin.com/jobs/semrush-jobs?f_C=2821922"},
{"url": "https://www.linkedin.com/jobs/reddit-inc.-jobs-worldwide?f_C=150573"}
]

👉 See the full Python example

Use the following parameters to refine the results:

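| Parameter | Type | Description | Example |
|---|---|---|---|
| limit | integer | Maximum number of results returned per input | limit=10 |
| include_errors | boolean | Whether to include error reports to help troubleshoot failures | include_errors=true |
| notify | url | URL that receives a callback notification when the scrape finishes | notify=https://notify-me.com/ |
| format | enum | Output format (JSON, NDJSON, JSONL, or CSV) | format=json |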
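
These parameters are written as `name=value` pairs, so they can be assembled into a query string. The sketch below does that with the standard library. `build_delivery_query` is a hypothetical helper; the lowercase format values and the `true`/`false` spelling follow the examples in the table, and anything beyond those examples is an assumption.

```python
from urllib.parse import urlencode

ALLOWED_FORMATS = {"json", "ndjson", "jsonl", "csv"}


def build_delivery_query(limit=None, include_errors=None, notify=None, fmt=None):
    """Return a URL-encoded query string containing only the options that are set."""
    params = {}
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        params["limit"] = limit
    if include_errors is not None:
        # Booleans are spelled in lowercase, as in include_errors=true.
        params["include_errors"] = "true" if include_errors else "false"
    if notify:
        params["notify"] = notify
    if fmt:
        if fmt.lower() not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {fmt}")
        params["format"] = fmt.lower()
    return urlencode(params)
```

The returned string can be appended to a request URL after a `?` or `&`; `urlencode` percent-encodes the callback URL so it survives as a single parameter value.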

💡 Tip: you can also deliver the data to external storage or receive it through a webhook.

Need more details? See the official API documentation.











