A dataset of 2K Amazon product reviews from 2019 to August 2023 (the time of publication) was analyzed for AI Content and its relationship with other features.
Approximately 26K product review records were retrieved from Amazon. After cleaning and the removal of sensitive and identifying customer information, 2K records were processed with Originality.AI's AI Detection API. The state-of-the-art model analyzes text for the likelihood of AI Content and returns a probability score from 0 to 100%.
helpfulScore was derived by standardizing helpfulCount on the geometric mean of totalCategoryRatings and totalCategoryReviews. ratingSeverity was derived by binarizing ratingScore into Extreme and Moderate values.
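The two derived features can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the exact formula, so "standardizing on the geometric mean" is read here as dividing helpfulCount by the geometric mean of the two category totals, and the Extreme/Moderate cut-off (1 and 5 stars versus 2 to 4) is a guess. The function names are hypothetical.

```python
import math


def helpful_score(helpful_count, total_category_ratings, total_category_reviews):
    """Divide helpfulCount by the geometric mean of the two category totals.

    Assumed form: the study does not spell out the exact formula.
    """
    geo_mean = math.sqrt(total_category_ratings * total_category_reviews)
    # Guard against categories with no ratings or reviews.
    return helpful_count / geo_mean if geo_mean > 0 else 0.0


def rating_severity(rating_score):
    """Binarize a 1-5 star rating.

    Assumed cut-off: 1 and 5 stars are Extreme, 2 to 4 are Moderate.
    """
    return "Extreme" if rating_score in (1, 5) else "Moderate"
```

Dividing by the geometric mean keeps the score comparable across categories of very different sizes, which is presumably why the count was not used directly.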
All functions are from the scipy.stats package.
- Analysis of aiContent against categorical features was performed with f_oneway (one-way ANOVA). When comparing a categorical feature against a numerical one, it tests the null hypothesis that the mean of the numerical feature is the same in every category, i.e. that the two features are unrelated.
- Analysis of aiContent against numerical features was performed with pearsonr, kendalltau and spearmanr. They test the null hypothesis that the features are not correlated.
- Analysis of the distribution of the sample data, to confirm that it is representative of the raw data, was performed with ks_2samp, chisquare and entropy. ks_2samp tests the null hypothesis that two independent samples are drawn from the same continuous distribution, and chisquare tests the null hypothesis that the observed category frequencies match the expected ones. entropy is not a hypothesis test; given two distributions, it measures how much they differ (the relative entropy, or Kullback-Leibler divergence).
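A minimal sketch of how the listed scipy.stats functions might be called is shown below. All data here are synthetic stand-ins generated with NumPy, not the study's data, and the choice of which columns to feed into each test is an assumption made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real columns (hypothetical values).
ai_content = rng.uniform(0, 1, size=300)
helpful_score = rng.normal(0, 1, size=300)
is_verified = rng.integers(0, 2, size=300)

# Categorical feature: one-way ANOVA on aiContent grouped by category.
groups = [ai_content[is_verified == level] for level in (0, 1)]
f_stat, p_anova = stats.f_oneway(*groups)

# Numerical feature: three correlation coefficients with their p-values.
r_pearson, p_pearson = stats.pearsonr(ai_content, helpful_score)
tau, p_kendall = stats.kendalltau(ai_content, helpful_score)
rho, p_spearman = stats.spearmanr(ai_content, helpful_score)

# Representativeness of a sample: continuous feature, two-sample KS test.
raw = rng.normal(3.0, 1.0, size=5000)
sample = rng.choice(raw, size=400, replace=False)
ks_stat, p_ks = stats.ks_2samp(sample, raw)

# Representativeness of a sample: star-rating counts, chi-square test.
raw_ratings = rng.integers(1, 6, size=5000)
sample_ratings = rng.choice(raw_ratings, size=400, replace=False)
raw_counts = np.bincount(raw_ratings, minlength=6)[1:]
sample_counts = np.bincount(sample_ratings, minlength=6)[1:]
# Expected counts: raw proportions scaled to the sample size.
expected = raw_counts / raw_counts.sum() * sample_counts.sum()
chi2, p_chi2 = stats.chisquare(sample_counts, f_exp=expected)

# Relative entropy (KL divergence) between sample and raw distributions.
kl_divergence = stats.entropy(sample_counts, expected)
```

A small p-value from f_oneway or the correlation tests argues against the null hypothesis of no relationship, whereas for ks_2samp and chisquare a large p-value is the reassuring outcome, since it gives no evidence that the sample differs from the raw data.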
Trend Analysis was done by converting aiContent into a binary feature and aggregating its annual averages.
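The binarize-then-aggregate step could look like the sketch below. The 50% threshold is an assumption, since the cut-off used in the study is not stated, and the function name and review scores are hypothetical.

```python
from collections import defaultdict

AI_THRESHOLD = 50.0  # assumed cut-off on the 0-100% score; not given in the study


def annual_ai_rate(records, threshold=AI_THRESHOLD):
    """Return {year: share of reviews flagged as AI} from (year, aiContent) pairs."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for year, ai_content in records:
        total[year] += 1
        # Binarize: a review counts as AI Content at or above the threshold.
        flagged[year] += int(ai_content >= threshold)
    return {year: flagged[year] / total[year] for year in sorted(total)}


# Hypothetical scores, for illustration only.
reviews = [
    (2022, 10.0), (2022, 90.0), (2022, 20.0), (2022, 5.0),
    (2023, 80.0), (2023, 70.0), (2023, 30.0), (2023, 95.0),
]
rates = annual_ai_rate(reviews)
```

Averaging a 0/1 feature per year gives the share of reviews flagged in that year, so year-over-year changes can be read directly from the returned mapping.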
- Roughly 400% increase in the aiContent trend from 2022 to 2023. ChatGPT launched on November 30, 2022.
- Correlation between aiContent and ratingSeverity, ratingScore and isVerified.
- Negative correlation between helpfulScore and aiContent.
Studies are still ongoing. More data will be required for deeper analysis and for findings with statistical confidence. The results so far are interesting, and it would be worthwhile to examine data from an e-commerce website with less rigorous review monitoring.




