A high-performance FastAPI microservice that extracts content from web articles using newspaper3k, providing structured data through a RESTful API.
- Article Extraction: Extracts comprehensive article data including:
  - Title and main content
  - Authors and publication date
  - Images and videos
  - Meta information (keywords, description, language)
  - Additional metadata
- Clean Architecture: Modular design with clear separation of concerns
- FastAPI Framework: High performance, automatic OpenAPI documentation
- Error Handling: Handles invalid URLs, unreachable sites, parsing failures, and server errors
- Type Safety: Full type hints and Pydantic models for request/response validation
- Python 3.8 or higher
- pip package manager
- Clone the repository:
git clone https://github.com/yourusername/content-scraper-api.git
cd content-scraper-api
- Install dependencies:
pip install -r requirements.txt
- Start the server:
python main.py
Or using uvicorn directly:
uvicorn main:app --reload
The API will be available at http://localhost:8000
Fetches and parses an article from the provided URL.
Request:
{
"url": "https://example.com/news/article"
}
Response:
{
"url": "https://example.com/news/article",
"title": "Example Article Title",
"content": "Article content text...",
"top_image": "https://example.com/images/top.jpg",
"authors": ["Author Name"],
"images": [
"https://example.com/images/1.jpg",
"https://example.com/images/2.jpg"
],
"movies": ["https://example.com/videos/1.mp4"]
}
FastAPI automatically generates interactive API documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
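To illustrate the request and response shapes shown above, here is a minimal client sketch that uses only the Python standard library. The endpoint path `/scrape` is an assumption (this README does not name the route, so check the Swagger UI for the real one), and `ArticleResult`, `build_request`, `parse_response`, and `scrape` are hypothetical helper names, not part of this project.

```python
import json
import urllib.request
from dataclasses import dataclass, field, fields
from typing import List, Optional

BASE_URL = "http://localhost:8000"
SCRAPE_PATH = "/scrape"  # ASSUMPTION: hypothetical path; look up the real one at /docs


@dataclass
class ArticleResult:
    """Client-side mirror of the response fields shown in the example above."""
    url: str
    title: str
    content: str
    top_image: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)
    movies: List[str] = field(default_factory=list)


def build_request(article_url: str) -> urllib.request.Request:
    """Build the POST request with the JSON body {"url": ...}."""
    body = json.dumps({"url": article_url}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + SCRAPE_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw: bytes) -> ArticleResult:
    """Decode the JSON response, ignoring any fields this sketch does not model."""
    data = json.loads(raw)
    known = {f.name: data[f.name] for f in fields(ArticleResult) if f.name in data}
    return ArticleResult(**known)


def scrape(article_url: str, timeout: float = 30.0) -> ArticleResult:
    """Send the request to a running server and return the parsed article."""
    with urllib.request.urlopen(build_request(article_url), timeout=timeout) as resp:
        return parse_response(resp.read())
```

With the server running, `scrape("https://example.com/news/article").title` would return the extracted title, provided the assumed path matches the actual route.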
The application follows a clean architecture approach with the following components:
- API Layer: Handles HTTP requests and responses
- Service Layer: Contains the business logic for article extraction
- Core: Configuration and shared utilities
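The layering above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `ArticleData`, `Extractor`, `ArticleService`, and `handle_scrape` are hypothetical names, and the API layer is shown as a plain function rather than a FastAPI route so the sketch stays framework-free. The newspaper3k calls (`Article`, `download()`, `parse()`, and the `title`, `text`, `authors` attributes) are that library's documented interface.

```python
from dataclasses import asdict, dataclass, field
from typing import Callable, List


@dataclass
class ArticleData:
    """Core: data shape shared by the service and API layers (hypothetical)."""
    url: str
    title: str
    content: str
    authors: List[str] = field(default_factory=list)


# Core: anything that turns a URL into ArticleData can serve as an extractor.
Extractor = Callable[[str], ArticleData]


def newspaper_extractor(url: str) -> ArticleData:
    """Extractor backed by newspaper3k."""
    # Imported lazily so the sketch can be loaded without newspaper3k installed.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the HTML
    article.parse()     # populate title, text, authors, ...
    return ArticleData(
        url=url,
        title=article.title,
        content=article.text,
        authors=list(article.authors),
    )


class ArticleService:
    """Service layer: business logic, independent of HTTP concerns."""

    def __init__(self, extractor: Extractor = newspaper_extractor) -> None:
        self._extractor = extractor

    def extract(self, url: str) -> ArticleData:
        # Normalize input before handing it to the extractor.
        return self._extractor(url.strip())


def handle_scrape(payload: dict, service: ArticleService) -> dict:
    """API layer: translate a request body into a response body."""
    article = service.extract(payload["url"])
    return asdict(article)
```

Passing the extractor into the service keeps the business logic testable without network access, since a test can substitute a fake extractor for `newspaper_extractor`.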
The API handles various error scenarios:
- Invalid URLs
- Unreachable sites
- Parsing failures
- Server errors
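One way to organize these cases is to give each failure its own exception type and map it to an HTTP status in a single place. The sketch below is an assumption about how that might look: the exception names, the `validate_url` and `to_http_error` helpers, and the specific status codes (400, 502, 422, 500) are illustrative choices, not behavior this README confirms.

```python
from typing import Tuple
from urllib.parse import urlparse


class InvalidURLError(ValueError):
    """The submitted URL is not an absolute http(s) URL."""


class SiteUnreachableError(Exception):
    """The target site could not be fetched (DNS failure, timeout, ...)."""


class ParsingError(Exception):
    """The page was fetched but no article could be extracted."""


def validate_url(url: str) -> str:
    """Return the URL unchanged, or raise InvalidURLError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise InvalidURLError(f"not an absolute http(s) URL: {url!r}")
    return url


def to_http_error(exc: Exception) -> Tuple[int, str]:
    """Map an exception to (status code, detail). Status codes are assumptions."""
    if isinstance(exc, InvalidURLError):
        return 400, str(exc)
    if isinstance(exc, SiteUnreachableError):
        return 502, "Could not reach the target site"
    if isinstance(exc, ParsingError):
        return 422, "Could not extract an article from the page"
    # Anything unexpected is reported as a generic server error.
    return 500, "Internal server error"
```

In a FastAPI application, the returned pair could be passed to `HTTPException(status_code=..., detail=...)` inside a route or an exception handler.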
The modular architecture makes it easy to extend the API:
- Add new endpoints in api/routes.py
- Add new services in the services package
- Modify the data models in api/models.py
License: MIT