6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
__pycache__/
.vscode/
final_task/rss_reader/parser.log
final_task/rss_reader/client/parser.log
final_task/rss_reader/client/__pycache__/
final_task/env/
5 changes: 1 addition & 4 deletions README.md
@@ -30,7 +30,4 @@ Congrats! You have successfully forked our repository.
2. Pull request name *MUST* be in format: `YourFirstName_YourLastName_EmailYouUsedWhileRegisteringOnThisCourse`
3. Pull request which have any other name format, or invalid e-mail *will be ignored completely until you fix it*. So make sure you specified correct e-mail.
4. In pull request description specify your current iteration. You also can add there any other info you want us to know before we start code review.
5. *Pull request must NOT contain any .pyc files, any virtual environment files/folders, any IDE technical files*.
100 changes: 97 additions & 3 deletions final_task/README.md
@@ -1,3 +1,97 @@
# Your readme here
Some text.
Checkout how to write this file using *markdown*.
# RSS-READER

## Command-line utility which receives RSS URL and prints results in human-readable format.

### **Example:**
python rss_reader.py https://news.yahoo.com/rss --limit 1

### **Output**:

**Feed: Yahoo News - Latest News & Headlines**

**Title**: Families come from across U.S. to grieve relatives slain in Mexico

**Date**: Thu, 07 Nov 2019 01:06:45 -0500

**Link**: https://news.yahoo.com/under-armed-escort-mourner-convoys-060645935.html

**Description**: An American man whose grandchildren were slain in a massacre in Mexico demanded justice on Thursday for other victims of the country's drug war, as relatives gathered from across the United States for a funeral guarded by heavily armed military. Kenneth Miller lost his daughter-in-law and four grandchildren, all dual citizens, in an ambush on Monday in the northern border state of Sonora that killed three mothers and six children. The attack on members of breakaway Mormon communities who settled in Mexico decades ago prompted U.S. President Donald Trump to urge Mexico and the United States to "wage war" together on drug cartels.

**Links**:

```
[1]: https://news.yahoo.com/under-armed-escort-mourner-conv... (link)
[2]: http://l.yimg.com/uu/api/res/1.2/rRx_J3xHKYzIQ4EsiCPRT...
```

```
positional arguments:
  source         RSS URL

optional arguments:
  -h, --help     show this help message and exit
  --version      Print version info
  --json         Print result as JSON in stdout
  --verbose      Outputs verbose status messages
  --limit LIMIT  Limit news topics if this parameter provided
```

## Installation

The recommended way to install rss-reader is with pip:


```
pip install rssreaderih
```

or from source distribution:

```
python setup.py install
```

## Data caching

The program stores fetched news in a **PostgreSQL** database, which serves as a cache. Images are stored in the same database in binary format.
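
As a rough illustration of this kind of storage, the sketch below saves a news item together with its image bytes and reads it back by date. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a database server; the project itself uses PostgreSQL. The table, the column names and the helpers `cache_item` and `items_for_date` are hypothetical and are not the project's schema.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news (link TEXT PRIMARY KEY, title TEXT, published TEXT, image BLOB)"
)


def cache_item(link, title, published, image_bytes):
    """Insert a news item, skipping links that are already cached."""
    conn.execute(
        "INSERT OR IGNORE INTO news (link, title, published, image) VALUES (?, ?, ?, ?)",
        (link, title, published, image_bytes),
    )
    conn.commit()


def items_for_date(published):
    """Return (title, image) pairs cached for the given date string."""
    rows = conn.execute(
        "SELECT title, image FROM news WHERE published = ? ORDER BY title",
        (published,),
    )
    return rows.fetchall()


cache_item("https://example.com/a", "First story", "20191107", b"\x89PNG...")
cache_item("https://example.com/a", "First story", "20191107", b"\x89PNG...")  # duplicate link, ignored
print(items_for_date("20191107"))
```

Using the link as the primary key keeps repeated runs from storing the same item twice; a real schema may well use a different key.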

## Converting

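To convert data to HTML format, I used the **dominate** library.

Example:
```
import dominate
from dominate.tags import div, h2, p

html_document = dominate.document(title="HTML document")

# news_title, news_link and news_description are strings taken from a feed entry
with html_document:
    with div():
        h2("Title: " + news_title)
        p("Link: " + news_link)
        p("Description: " + news_description)
```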

To convert the HTML document to PDF format, I used the **xhtml2pdf** library.

Example:
```
from xhtml2pdf import pisa

pdf_file = pisa.CreatePDF(sourceHtmlFile)
```

## Deploying

The application includes a **Dockerfile** for building the application image and a **docker-compose.yml** file that links the application and database containers.

To deploy the application, run:
```
docker-compose up
```

If you have changed the application code, rebuild the image with:
```
docker-compose up --build
```
3 changes: 3 additions & 0 deletions final_task/rss_reader/.dockerignore
@@ -0,0 +1,3 @@
__pycache__/
.vscode/
client/__pycache__/
1 change: 1 addition & 0 deletions final_task/rss_reader/.env.dev
@@ -0,0 +1 @@
FLASK_ENV=development
File renamed without changes.
50 changes: 50 additions & 0 deletions final_task/rss_reader/app.py
@@ -0,0 +1,50 @@
from flask import Flask, request, Response, send_from_directory, abort
import os
import sys

from . import collect_news, version, get_cache, logg


app = Flask(__name__)


@app.route('/print/', methods=['GET', 'POST'])
def getNews():
    req = request.get_json()
    news = collect_news.collectNews(req['limit'], req['tojson'], req['tohtml'], req['topdf'], req['color'], req['url'])
    return sendResponse(req, news)


@app.route('/getcache/')
def getCacheNews():
    req = request.get_json()
    if(req['tohtml'] or req['topdf']):
        news = get_cache.createHtmlFromDB(req['limit'], req['tohtml'], req['topdf'], req['date'])
    else:
        news = get_cache.collectNewsFromDB(req['limit'], req['tojson'], req['color'], req['date'])
    return sendResponse(req, news)


def sendResponse(req, news):
    if(req['topdf']):
        try:
            return send_from_directory(req['topdf'], filename=news, as_attachment=True)
        except FileNotFoundError:
            abort(404)
    else:
        return Response(news)


@app.route('/version/', methods=['GET', 'POST'])
def getVersion():
    # The client sends no JSON body here, so the request is not parsed
    return version.VERSION


@app.route('/verbose/', methods=['GET', 'POST'])
def setLogging():
    logg.makeVerbose()
    # A Flask view must return a response; returning None raises an error
    return Response('Verbose logging enabled')


if __name__ == '__main__':
    app.run()
Empty file.
19 changes: 19 additions & 0 deletions final_task/rss_reader/client/arg_parser.py
@@ -0,0 +1,19 @@
import argparse


def createArgparser(vers):
    '''Add argument commands'''
    arguments = argparse.ArgumentParser(description='Pure Python command-line RSS reader')

    arguments.add_argument('source', type=str, nargs='?', help='RSS URL')
    arguments.add_argument('--version', action='version', version=f'{vers}',
                           help='Print version info')
    arguments.add_argument('--json', action='store_true', help='Print result as JSON in stdout')
    arguments.add_argument('--verbose', action='store_true', help='Outputs verbose')
    arguments.add_argument('--limit', action='store', type=int, help='Limit news topics')
    arguments.add_argument('--date', action='store', help='Print news from the specified day')
    arguments.add_argument('--tohtml', action='store', help='Convert news in html format')
    arguments.add_argument('--topdf', action='store', help='Convert news in pdf format')
    arguments.add_argument('--colorize', action='store_true', help='print news in colorized mode')

    return arguments.parse_args()
19 changes: 19 additions & 0 deletions final_task/rss_reader/client/logg.py
@@ -0,0 +1,19 @@
import logging
import sys


# Set basic configs for logging
stdoutHandler = logging.StreamHandler(sys.stdout)
fileHandler = logging.FileHandler("parser.log", "a", encoding="utf-8")
logging.basicConfig(format=u'%(levelname)-8s [%(asctime)s] %(message)s',
                    level=logging.DEBUG,
                    handlers=[fileHandler])


def makeVerbose():
    '''
    1. print logs in stdout if there is --verbose argument
    '''
    # Reuse the stdout handler defined above so that verbose output goes to
    # stdout, as the docstring says (a bare StreamHandler() writes to stderr)
    stdoutHandler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
    logging.getLogger().addHandler(stdoutHandler)
112 changes: 112 additions & 0 deletions final_task/rss_reader/client/rss_reader.py
@@ -0,0 +1,112 @@
from urllib.request import Request, urlopen
from datetime import datetime
from colored import stylize
import arg_parser
import feedparser
import requests
import colored
import html
import sys
import os

import logg


def main(version):

    args = arg_parser.createArgparser(version)
    params = dict()

    if (args.verbose):
        logg.makeVerbose()
        requests.get('http://127.0.0.1:5000/verbose/')

    if(args.colorize):
        color = [colored.fg(150), colored.fg(50), colored.fg(189)]
    else:
        color = [colored.attr('reset'), colored.attr('reset'), colored.attr('reset')]

    params = {'limit': args.limit, 'tojson': args.json,
              'tohtml': args.tohtml, 'topdf': args.topdf, 'color': color}

    if (args.date):
        params['date'] = args.date
        r = requests.get('http://127.0.0.1:5000/getcache/', json=params)
        news = r.text

        if(args.tohtml):
            saveHTML(news, args.tohtml)
        elif(args.topdf):
            # r.content holds the raw PDF bytes; re-encoding r.text would corrupt binary data
            savePDF(r.content, args.topdf)
        else:
            print(news)
    else:
        try:
            checkConnection(args.source)
            params['url'] = args.source
            r = requests.get('http://127.0.0.1:5000/print/', json=params)
            news = r.text

            if(args.tohtml):
                saveHTML(news, args.tohtml)
            elif(args.topdf):
                # r.content holds the raw PDF bytes; re-encoding r.text would corrupt binary data
                savePDF(r.content, args.topdf)
            else:
                print(news)
        except Exception as e:
            logg.logging.error("Connection error: " + str(e))
            print("Connection error: ", e)



def saveHTML(html_document, html_path):
    '''
    1. create folder with html file
    2. write html structure in file
    '''
    if not os.path.exists(html_path):
        os.makedirs(html_path)
    time_name = datetime.strftime(datetime.now(), "%H%M%S")
    file_name = 'NewsFeed' + '-' + time_name + '.html'
    html_file = os.path.join(html_path, file_name)

    with open(html_file, 'w', encoding='utf-8') as f:
        f.write(str(html_document))


def savePDF(doc, pdf_path):
    '''
    1. create folder with pdf file
    2. write pdf in file
    '''
    if not os.path.exists(pdf_path):
        os.makedirs(pdf_path)
    time_name = datetime.strftime(datetime.now(), "%H%M%S")
    file_name = 'NewsFeed' + '-' + time_name + '.pdf'
    pdf_file = os.path.join(pdf_path, file_name)

    with open(pdf_file, "w+b") as resultFile:
        resultFile.write(doc)


def checkConnection(source):
    '''Check connection to server'''
    try:
        source = Request(source)
        response = urlopen(source)
    except Exception as e:
        raise Exception(e)
    else:
        logg.logging.info('Website is working')


if __name__ == "__main__":
    # Check connection to server
    try:
        version = (requests.get('http://127.0.0.1:5000/version/')).text
        main(version)
    except requests.exceptions.ConnectionError as error:
        print("ConnectionError: " + str(error))
        logg.logging.error("ConnectionError: " + str(error))
51 changes: 51 additions & 0 deletions final_task/rss_reader/collect_news.py
@@ -0,0 +1,51 @@
import feedparser
import html

from . import logg, converter, news_parser


def collectNews(limit, tojson, tohtml, topdf, color, source):
    '''
    1. cache news
    2. create html or pdf document
    3. or return news in json or normal format
    '''
    news = list()

    channel = feedparser.parse(source)
    news.append(color[0] + "Feed: " + channel.feed.title + '\n')
    limit = limit or len(channel.entries)

    news_parser.cacheNews(channel)

    if (tohtml or topdf):
        html_doc = converter.createHtmlStructure(channel, limit, tohtml, topdf)
        return html_doc
    else:
        for index, item in enumerate(channel.entries):
            if (index == limit):
                break
Comment on lines +26 to +27

@HenadziStantchik HenadziStantchik Nov 26, 2019


It is better to use slices instead of comparing index with limit on each iteration

Same for other occurrences.
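
The suggestion above can be sketched as follows. This is a minimal illustration rather than the project's code: `take_entries` is a hypothetical helper, and the list of strings stands in for `channel.entries`.

```python
def take_entries(entries, limit=None):
    """Return the first `limit` entries, or all of them when no limit is given."""
    # A slice whose upper bound is None returns the whole list, so neither a
    # len() fallback nor a per-iteration index check is needed.
    return entries[:limit or None]


entries = ["first", "second", "third"]  # stand-in for channel.entries

for index, item in enumerate(take_entries(entries, limit=2)):
    print(index, item)
```

A limit larger than the number of entries is also safe, because slicing past the end of a list simply returns everything available.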


            if(index%2==0):
                news.append(color[1])
            else:
                news.append(color[2])

            logg.createLogs(item)

            if (tojson):
                news.append(news_parser.intoJson(item))
            else:
                news.append("\nTitle: " + html.unescape(item.title))
                news.append("\nDate: " + item.published)
                news.append("\nLink: " + item.link + '\n')
                description = news_parser.getDescription(item.description)
                if(description):
                    news.append(color[0] + "Description: " + description + '\n')
                news.append(color[1] + "Links:" + "\n[1]: " + item.link + "(link)")
                media_content = news_parser.checkMediaContent(item)
                if(media_content):
                    news.append("\n[2]: " + media_content + '\n')

    return news
