The dataset used in this predictive model is the Kaggle Breast Cancer Wisconsin Diagnostic (WBCD) dataset, which may exhibit several forms of bias that can impact the fairness and accuracy of predictions:
- Lack of demographic representation: The dataset may underrepresent specific groups (age, ethnicity, socioeconomic status), leading to unfair outcomes for those populations (Wilkerson et al., 2024).
- Selection bias: The dataset likely includes only patients with access to healthcare, excluding those from underserved communities.
- Data quality issues: Missing, inconsistent, or erroneous data can reduce the model's performance and exacerbate unfairness (Nugroho, 2023).
To mitigate such biases, IBM AI Fairness 360 provides several fairness-enhancing algorithms:
- Fairness-aware model selection: Selects models optimized for fairness (Varshney, 2018).
- Model parameter adjustment: Modifies learning parameters to reduce biases.
- Bias-reducing data preprocessing: Normalizes, scales, and balances the dataset.
To build fairer models:
- 📊 Data collection: Use diverse sources to ensure broad demographic coverage.
- 🧹 Preprocessing: Normalize data, handle missing values, and remove outliers.
- ⚖️ Fair model selection: Choose algorithms that prioritize fairness.
- 🔧 Adjust models: Tune parameters to align with fairness objectives.
By integrating fairness tools and improving dataset diversity, we can create AI systems that make accurate and equitable predictions for all users.
- Nugroho, H. (2023). A Review: Data Quality Problem in Predictive Analytics. IJAIT, 7(2), 79. https://doi.org/10.25124/ijait.v7i02.5980
- Wilkerson, A. D., Gentle, C. K., Ortega, C., & Al-Hilli, Z. (2024). Disparities in Breast Cancer Care. Healthcare, 12(4), 462. https://doi.org/10.3390/healthcare12040462
- Varshney, K. R. (2018). IBM AI Fairness 360: A Comprehensive Toolkit. IBM Research.