GitHub - KW0SS/data_collection_A

Data pipeline execution order

1. f_fetch_induty_codes.py → generates data/etc/delisted_induty_codes.csv
2. f_make_delisted_input.py → generates data/input/{sector}_companies_final.csv

Input files (not included in Git; downloaded from KIND)

data/etc/상장폐지현황.xlsx (delisting status)
data/etc/상장법인목록.xlsx (list of listed companies)
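
Because these two files are not tracked in Git, it may help to confirm they are in place before running the pipeline. The following is a minimal sketch, not part of the repository; the function name `missing_inputs` is hypothetical, and only the two paths come from this README.

```python
from pathlib import Path

# Input files named in this README; they are not tracked in Git.
REQUIRED_INPUTS = [
    Path("data/etc/상장폐지현황.xlsx"),
    Path("data/etc/상장법인목록.xlsx"),
]


def missing_inputs(root="."):
    """Return the required input files that do not exist under root (hypothetical helper)."""
    return [p for p in REQUIRED_INPUTS if not (Path(root) / p).is_file()]
```

Calling `missing_inputs()` from the repository root returns an empty list when both files are present.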

Script description

`f_make_delisted_input.py`

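This script takes the delisted companies, keeps only those delisted because of financial distress, filters them by GICS sector, and merges the result with the list of normal (still listed) companies to build the final training dataset.

Internally, it runs in the following order: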

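| Step | Description |
| --- | --- |
| 1. Load industry codes | Read the induty_code of each delisted company from `delisted_induty_codes.csv` |
| 2. Remove SPACs | Name-based filtering on keywords such as 스펙 (SPAC) and 기업인수목적 (special purpose acquisition company) |
| 3. Filter by delisting reason | Exclude mergers, voluntary delistings, etc. → keep only financial-distress cases such as default, disclaimer of audit opinion, and capital impairment |
| 4. GICS mapping | Classify each company into an assigned sector by the leading digits of induty_code; drop rows with no match |
| 5. Assign label | Delisted companies get `label=1`, normal companies get `label=0` |
| 6. Merge and deduplicate | Merge with the normal companies (`{sector}_companies.csv`); for duplicate stock codes, the normal-company row takes precedence |
| 7. Save | Write `{sector}_companies_final.csv` |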
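
Steps 2 through 6 can be sketched in pandas roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the column names (`company_name`, `stock_code`, `delist_reason`, `induty_code`), the keyword lists, and the function name `build_dataset` are all hypothetical, and only a 2-digit prefix lookup is shown.

```python
import pandas as pd

# Everything below is an illustrative assumption, not taken from
# f_make_delisted_input.py: column names, keyword lists, and the mapping.
SPAC_KEYWORDS = ["스펙", "스팩", "기업인수목적"]
DISTRESS_KEYWORDS = ["부도", "감사의견", "자본잠식"]
PREFIX_TO_GICS = {"58": "Information Technology", "62": "Information Technology"}


def build_dataset(delisted, normal):
    # Step 2: drop SPACs by company name.
    is_spac = delisted["company_name"].str.contains("|".join(SPAC_KEYWORDS), na=False)
    df = delisted[~is_spac].copy()

    # Step 3: keep only delisting reasons that indicate financial distress.
    distressed = df["delist_reason"].str.contains("|".join(DISTRESS_KEYWORDS), na=False)
    df = df[distressed].copy()

    # Step 4: map the 2-digit prefix of induty_code to a GICS sector
    # and drop rows whose prefix is not in the mapping.
    df["gics"] = df["induty_code"].str[:2].map(PREFIX_TO_GICS)
    df = df.dropna(subset=["gics"])

    # Step 5: label delisted companies 1 and normal companies 0.
    df["label"] = 1
    normal = normal.assign(label=0)

    # Step 6: put normal rows first so that keep="first" lets the
    # normal-company row win when a stock code appears in both frames.
    merged = pd.concat([normal, df], ignore_index=True)
    return merged.drop_duplicates(subset="stock_code", keep="first").reset_index(drop=True)
```

Placing the normal companies first in the concatenation is what makes the "normal company takes precedence" rule fall out of `drop_duplicates(keep="first")`.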

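Per-sector customization guide

f_make_delisted_input.py is written for sector A. To adapt it to your assigned sector, change only the two parts below.

Modification 1: GICS mapping tables (`INDUTY_GICS_3`, `INDUTY_GICS_2`)

These tables map the leading digits of induty_code to the GICS sector names that your sector covers.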

# Current example for sector A
INDUTY_GICS_2 = {
    "58": "Information Technology",
    "62": "Information Technology",
    "59": "Communication Services",
    "14": "Consumer Discretionary",
    ...
}
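
As a rough illustration of how two tables of this kind might be consulted, the sketch below tries the 3-digit prefix first and falls back to the 2-digit prefix. The lookup order, the function name `map_gics`, and the table entries shown here are assumptions made for the example; check f_make_delisted_input.py for the actual behavior.

```python
# Hypothetical excerpts for illustration; the real tables live in
# f_make_delisted_input.py and may contain different entries.
INDUTY_GICS_3 = {"582": "Communication Services"}  # illustrative entry only
INDUTY_GICS_2 = {
    "58": "Information Technology",
    "62": "Information Technology",
    "59": "Communication Services",
}


def map_gics(induty_code):
    """Return the GICS sector for a code, or None when no prefix matches."""
    code = str(induty_code).strip()
    # Assumed order: the more specific 3-digit prefix wins over the 2-digit one.
    return INDUTY_GICS_3.get(code[:3]) or INDUTY_GICS_2.get(code[:2])
```

A `None` result corresponds to the "drop rows with no match" case in step 4.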

You can check the distribution of induty_code prefixes with the code below:

import pandas as pd
df = pd.read_csv("data/etc/delisted_induty_codes.csv", dtype={"induty_code": str})
print(df["induty_code"].str[:2].value_counts().sort_index())
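
After editing the mapping table, a similar check can show which 2-digit prefixes are still unmapped and would therefore be dropped. This is a small sketch; the helper name `unmapped_prefix_counts` is hypothetical and does not exist in the repository.

```python
import pandas as pd


def unmapped_prefix_counts(codes, mapping):
    """Count 2-digit induty_code prefixes that have no key in mapping."""
    prefixes = codes.astype(str).str[:2]
    unmapped = prefixes[~prefixes.isin(list(mapping))]
    return unmapped.value_counts().sort_index()
```

It could be called as `unmapped_prefix_counts(df["induty_code"], INDUTY_GICS_2)` with the DataFrame loaded above.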

Modification 2: Output file names

# Current (sector A)
OUTPUT_FILE = Path("data/input/A_companies_final.csv")
NORMAL_FILE = Path("data/input/A_companies.csv")

# Change to match your assigned sector (e.g. sector B)
OUTPUT_FILE = Path("data/input/B_companies_final.csv")
NORMAL_FILE = Path("data/input/B_companies.csv")
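
One possible way to avoid editing two paths by hand is to derive both from a single sector variable. The `SECTOR` and `INPUT_DIR` names below are hypothetical and are not defined in the original script; this is only a sketch of the idea.

```python
from pathlib import Path

SECTOR = "B"  # hypothetical variable; set this to your assigned sector
INPUT_DIR = Path("data/input")  # hypothetical variable for the shared folder

# Both file names follow the {sector}_companies*.csv pattern from this README.
NORMAL_FILE = INPUT_DIR / f"{SECTOR}_companies.csv"
OUTPUT_FILE = INPUT_DIR / f"{SECTOR}_companies_final.csv"
```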

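Reference: main mappings for induty_code leading digits

| Leading digits | Industry | GICS sector (for reference) |
| --- | --- | --- |
| 20~21 | Food manufacturing | Consumer Staples |
| 24~25 | Chemicals | Materials |
| 26 | Electronics and semiconductors | IT / Industrials |
| 27~28 | Medical devices and pharmaceuticals | Health Care |
| 29~30 | Machinery and automobiles | Industrials |
| 35 | Electricity and gas | Utilities |
| 41~42 | Construction | Real Estate |
| 46~47 | Wholesale and retail | Consumer Discretionary |
| 58~63 | IT and software | Information Technology |
| 64~66 | Finance | Financials |
| 68 | Real estate | Real Estate |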
