A machine learning project that automatically categorizes news articles using natural language processing.
The project takes a news article as input and predicts which category it belongs to. It uses word embeddings to represent the text and a neural network to make the prediction.
- Text cleaning - Removes unwanted characters and formats the text
- NLP processing - Uses Stanford CoreNLP to tokenize sentences and lemmatize words (reduce them to their root forms)
- Word embeddings - Maps words to numeric vectors using pre-trained GloVe embeddings
- Neural network - Trains a model to classify articles into different categories
- Java - Main programming language
- Stanford CoreNLP - For text processing and lemmatization
- DeepLearning4J - Neural network framework
- GloVe - Pre-trained word vectors
- Handles multiple news categories
- Custom vector operations for semantic analysis
- Training with the Adam optimizer
- Text preprocessing pipeline with stop-word removal
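
The preprocessing and vector steps listed above can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: the class name `NewsTextSketch`, the tiny stop-word list, and the toy two-dimensional embeddings are all hypothetical. Averaging word vectors into a document vector and comparing vectors by cosine similarity are assumptions about what the "custom vector operations" might look like. Lemmatization with Stanford CoreNLP and loading real GloVe vectors are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: text cleaning, stop-word removal, and simple vector operations.
class NewsTextSketch {
    // Tiny illustrative stop-word list (assumption); a real list would be much longer.
    static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "of", "and", "to", "in", "is");

    // Lowercases the text, replaces non-letter characters with spaces,
    // splits on whitespace, and drops stop words.
    static List<String> cleanAndTokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z\\s]", " ");
        List<String> tokens = new ArrayList<>();
        for (String token : cleaned.trim().split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Averages the vectors of all tokens found in the embedding map.
    // Unknown tokens are skipped; returns a zero vector if none are found.
    static double[] averageVector(List<String> tokens, Map<String, double[]> embeddings, int dim) {
        double[] sum = new double[dim];
        int found = 0;
        for (String token : tokens) {
            double[] vector = embeddings.get(token);
            if (vector == null) {
                continue;
            }
            for (int i = 0; i < dim; i++) {
                sum[i] += vector[i];
            }
            found++;
        }
        if (found > 0) {
            for (int i = 0; i < dim; i++) {
                sum[i] /= found;
            }
        }
        return sum;
    }

    // Cosine similarity of two equal-length vectors; 0.0 if either has zero length.
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 2-dimensional embeddings standing in for GloVe vectors (assumption).
        Map<String, double[]> toyEmbeddings = Map.of(
                "markets", new double[] {1.0, 0.0},
                "stocks", new double[] {0.0, 1.0});
        List<String> tokens = cleanAndTokenize("The Markets rallied, and stocks rose!");
        double[] docVector = averageVector(tokens, toyEmbeddings, 2);
        System.out.println(tokens + " -> " + java.util.Arrays.toString(docVector));
    }
}
```

In a full pipeline, the averaged document vector (or the sequence of word vectors) would be the input to the neural network; the toy map would be replaced by vectors parsed from a GloVe file.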
This project was built as part of university coursework to learn about machine learning and NLP techniques.