An Airflow DAG that performs sentiment analysis on news articles fetched from an API. The DAG is designed to be deployed on GCP Composer and to run daily. During each DAG run, the steps below are executed serially:
1. A table is idempotently created on Google BigQuery. Each row represents a news article, with the table's columns being `title`, `creator`, `description`, `country`, `category`, `description_sentiment`, `topic` and `retrievaldate`.
2. News articles are fetched using the `newsdata.io` API client. The topic and volume of the articles fetched are configurable via the YAML file.
3. The articles' descriptions are processed with standard NLP methods and their sentiment is calculated with the Vader sub-module of NLTK. Sentiment scores range from 0 (negative sentiment) to 1 (positive sentiment).
4. The results of step 3 are stored in the BigQuery table created in step 1.
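
The idempotent table creation and the sentiment scoring described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the column types, the helper names (`build_create_table_sql`, `rescale_compound`, `score_description`) and the rescaling formula are all hypothetical. VADER's `compound` score natively ranges from -1 to 1, so the sketch assumes a linear mapping onto the 0 to 1 range; the project may derive its scores differently.

```python
# Column names are taken from the description above; the types are assumptions.
COLUMNS = [
    ("title", "STRING"),
    ("creator", "STRING"),
    ("description", "STRING"),
    ("country", "STRING"),
    ("category", "STRING"),
    ("description_sentiment", "FLOAT64"),
    ("topic", "STRING"),
    ("retrievaldate", "DATE"),
]


def build_create_table_sql(table_id: str) -> str:
    """Return DDL that is safe to re-run: IF NOT EXISTS makes it idempotent."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in COLUMNS)
    return f"CREATE TABLE IF NOT EXISTS `{table_id}` (\n  {cols}\n)"


def rescale_compound(compound: float) -> float:
    """Map a VADER compound score from [-1, 1] onto [0, 1] (assumed mapping)."""
    return (compound + 1.0) / 2.0


def score_description(text: str) -> float:
    """Score one article description with NLTK's VADER analyzer.

    Requires the nltk package and a one-time nltk.download("vader_lexicon").
    """
    # Imported lazily so the helpers above work without nltk installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
    return rescale_compound(compound)
```

With the `google-cloud-bigquery` client installed, the generated statement could be executed with `bigquery.Client().query(sql).result()`. Because of the `IF NOT EXISTS` clause, running it on every daily DAG run leaves an existing table untouched.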