22b4987
Iteration 1
IlyaTorch Nov 5, 2019
dd8f4a2
Iteration 1
IlyaTorch Nov 5, 2019
e00637a
Iteration 1
IlyaTorch Nov 5, 2019
4d2524e
Iteration 1
IlyaTorch Nov 5, 2019
7facd9f
Iteration 1
IlyaTorch Nov 6, 2019
ed70243
Iteration 1
IlyaTorch Nov 6, 2019
c040344
Iteration 1
IlyaTorch Nov 6, 2019
7b6e763
Simplify the method parse_html
IlyaTorch Nov 7, 2019
dbecad7
Iteration 1
IlyaTorch Nov 8, 2019
266d868
Iteration 2
IlyaTorch Nov 8, 2019
3660a72
Iteration 2
IlyaTorch Nov 8, 2019
c51b5f7
Iteration 2
IlyaTorch Nov 8, 2019
17dc6b0
Iteration 2
IlyaTorch Nov 8, 2019
802f757
Iteration 2
IlyaTorch Nov 9, 2019
5fd4b38
Iteration 2
IlyaTorch Nov 9, 2019
fa2827a
Iteration 2
IlyaTorch Nov 9, 2019
26687c9
Iteration 2
IlyaTorch Nov 10, 2019
87b9c67
Iteration 3
IlyaTorch Nov 13, 2019
10e5685
Iteration 3
IlyaTorch Nov 18, 2019
02bdfe9
Iteration 4
IlyaTorch Nov 18, 2019
36239ab
Iteration 4
IlyaTorch Nov 18, 2019
cf9b602
Iteration 4
IlyaTorch Nov 19, 2019
777bb7e
Iteration 4
IlyaTorch Nov 19, 2019
5aa8697
Iteration 4
IlyaTorch Nov 21, 2019
d975822
Iteration 4
IlyaTorch Nov 21, 2019
9657f9d
Iteration 4
IlyaTorch Nov 21, 2019
da6074f
Iteration 4
IlyaTorch Nov 21, 2019
d606b35
Merge branch 'master' of https://github.com/IlyaTorch/FinalTaskRssParser
IlyaTorch Nov 21, 2019
01cc3e1
Iteration 4
IlyaTorch Nov 22, 2019
baf04ad
Iteration 4
IlyaTorch Nov 22, 2019
59bcd55
Iteration 4
IlyaTorch Nov 22, 2019
3c9cbf7
Iteration 4
IlyaTorch Nov 23, 2019
e71a8c6
Iteration 4
IlyaTorch Nov 23, 2019
0a375a8
Iteration 4
IlyaTorch Nov 23, 2019
763216c
Iteration 4
IlyaTorch Nov 24, 2019
3079c0e
Iteration 4
IlyaTorch Nov 24, 2019
33efade
Merge branch 'master' of https://github.com/IlyaTorch/FinalTaskRssParser
IlyaTorch Nov 24, 2019
7bb3514
Iteration 4
IlyaTorch Nov 24, 2019
179f3dd
Iteration 4
IlyaTorch Nov 25, 2019
e520a39
Iteration 4
IlyaTorch Nov 25, 2019
c472866
Iteration 4
IlyaTorch Nov 25, 2019
d41073c
Iteration 4
IlyaTorch Nov 25, 2019
541618b
add fonts
IlyaTorch Nov 25, 2019
b752259
add fonts
IlyaTorch Nov 25, 2019
4926564
Iteration 4
IlyaTorch Jan 2, 2020
0a3d262
Iteration 4
IlyaTorch Jan 2, 2020
82ef291
Iteraton 4
IlyaTorch Jan 2, 2020
15f02fd
Iteration 5
IlyaTorch Jan 3, 2020
6e6cf4e
Update README.md
IlyaTorch Jan 12, 2020
e64005a
Add some changes to structure of classes according to the mechanism o…
IlyaTorch Jan 19, 2020
196 changes: 193 additions & 3 deletions final_task/README.md
@@ -1,3 +1,193 @@
# Your readme here
Some text.
Checkout how to write this file using *markdown*.
# Python RSS-reader
Python RSS-reader is a command-line utility which receives an RSS URL and prints the results in a human-readable format.

REQUIREMENTS:
-- feedparser 5.2.1
-- fpdf 1.7.2
-- dominate 2.4.0

The project consists of 5 main files:
* rss_reader.py - the file which runs the application
* ConsoleParse.py - contains the code which parses command-line arguments
* Entry.py - contains the class Entry, which represents an article
* Handler.py - contains the class Handler, which processes Entry objects
* Logging.py - contains a decorator for printing logs to stdout
Comment on lines +9 to +14

Collaborator


It would be nice if you kept it up to date.

*
To start Python RSS-reader, run one of the following commands in the command line:
```shell
$ python rss_reader.py "https://news.yahoo.com/rss/" --limit 1
```
```shell
$ python rss_reader.py "https://timesofindia.indiatimes.com/rssfeedstopstories.cms" --json --limit 1
```

Structure of output when `--json` is selected:
```
{
"Feed": "Yahoo News - Latest News & Headlines",
"Title": "PHOTOS: #MenToo: The hidden tragedy of male sexual abuse in the military",
"DateInt": "20200102",
"Date": "Tue, 1 Dec 2019 ",
"Link": "https://news.yahoo.com/photos-men-too-the-hidden-tragedy-of-male-sexual-abuse-in-the-military-005342483.html",
"Summary": "[image 1: PHOTOS: #MenToo: The hidden tragedy of male sexual abuse in the military][1] Award-winning photojournalist Mary F. Calvert has spent six years documenting the prevalence of rape in the military and the effects on victims. She began with a focus on female victims but more recently has examined the underreported incidence of sexual assaults on men and the lifelong trauma it can inflict.",
"Links": [
"https://news.yahoo.com/photos-men-too-the-hidden-tragedy-of-male-sexual-abuse-in-the-military-005342483.html",
"http://l1.yimg.com/uu/api/res/1.2/LR4Vdg0MD6osVIDtZW75aA--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/https://media-mbst-pub-ue1.s3.amazonaws.com/creatr-uploaded-images/2019-12/316fa7e0-2c23-11ea-bed7-1ebe74b8c372"
],
"Source": "https://news.yahoo.com/rss/"
}
```
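As a rough illustration of how one article could be turned into the structure above, the sketch below builds the dictionary by hand and serializes it with the standard `json` module. The function name `entry_to_json` and its parameters are hypothetical and not taken from the project. Deriving `DateInt` from the parsed publication time is also an assumption; how the project computes it is not shown in this README.

```python
import json
import time


def entry_to_json(feed, title, date, link, summary, links, source, published_parsed):
    """Hypothetical helper: serialize one article to the JSON layout shown above."""
    record = {
        "Feed": feed,
        "Title": title,
        # Assumption: DateInt is a YYYYMMDD string built from the parsed publication time
        "DateInt": time.strftime("%Y%m%d", published_parsed),
        "Date": date,
        "Link": link,
        "Summary": summary,
        "Links": list(links),
        "Source": source,
    }
    # ensure_ascii=False keeps non-ASCII titles (for example Cyrillic) readable
    return json.dumps(record, ensure_ascii=False, indent=4)


if __name__ == "__main__":
    published = time.strptime("2020-01-02", "%Y-%m-%d")
    print(entry_to_json("Example feed", "Example title", "Thu, 2 Jan 2020",
                        "https://example.com/a", "Example summary",
                        ["https://example.com/a"], "https://example.com/rss", published))
```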
## Iteration 2
If Python is already installed, follow these steps to install the rss-reader CLI utility:
1. Clone this repository
```
$ git clone https://github.com/IlyaTorch/FinalTaskRssParser.git
```
2. Go to the directory FinalTaskRssParser\final_task
3. Run ```$ python setup.py sdist```
4. Go to the directory dist
```
$ cd dist
```
5. Install CLI utility rss-reader:
```
$ pip install rss-reader-4.0.tar.gz
```
The rss-reader CLI utility can now be used:
```
rss-reader "https://news.yahoo.com/rss/" --limit 1
```
```
Feed: Yahoo News - Latest News & Headlines

Title: Graham now says Trump's Ukraine policy was too 'incoherent' for quid pro quo
Date: Wed, 06 Nov 2019 14:22:10 -0500
Link: https://news.yahoo.com/graham-trump-ukraine-incoherent-quid-pro-quo-192210175.html


[image 1: Graham now says Trump's Ukraine policy was too 'incoherent' for quid pro quo][1] A day after saying he wouldn’t bother to read the testimony, Sen. Lindsey Graham now says he did read it, and his conclusion is that the Trump administration’s Ukraine policy was too "incoherent" for it to have orchestrated the quid pro quo at the heart of the impeachment inquiry.


Links:
[0] https://news.yahoo.com/graham-trump-ukraine-incoherent-quid-pro-quo-192210175.html (link)
[1] http://l2.yimg.com/uu/api/res/1.2/aWhGys7_IW5qIjKaiJpPfg--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/https://media-mbst-pub-ue1.s3.amazonaws.com/creatr-uploaded-images/2019-11/5527ffe0-00ca-11ea-9f7d-d1e736c1315d (image)
```
If Python is not installed, follow these steps:
1. Download and install Python from https://www.python.org/downloads/
2. Clone this repository
```
$ git clone https://github.com/IlyaTorch/FinalTaskRssParser.git
```
3. Go to the directory FinalTaskRssParser\final_task
4. Run ```$ python setup.py sdist```
5. Go to the directory dist
```
$ cd dist
```
6. Install CLI utility rss-reader:
```
$ pip install rss-reader-1.0.tar.gz
```
The utility can now be used:
```
rss-reader "https://news.yahoo.com/rss/" --limit 1
```
```
Feed: Yahoo News - Latest News & Headlines

Title: Graham now says Trump's Ukraine policy was too 'incoherent' for quid pro quo
Date: Wed, 06 Nov 2019 14:22:10 -0500
Link: https://news.yahoo.com/graham-trump-ukraine-incoherent-quid-pro-quo-192210175.html


[image 1: Graham now says Trump's Ukraine policy was too 'incoherent' for quid pro quo][1] A day after saying he wouldn’t bother to read the testimony, Sen. Lindsey Graham now says he did read it, and his conclusion is that the Trump administration’s Ukraine policy was too "incoherent" for it to have orchestrated the quid pro quo at the heart of the impeachment inquiry.


Links:
[0] https://news.yahoo.com/graham-trump-ukraine-incoherent-quid-pro-quo-192210175.html (link)
[1] http://l2.yimg.com/uu/api/res/1.2/aWhGys7_IW5qIjKaiJpPfg--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/https://media-mbst-pub-ue1.s3.amazonaws.com/creatr-uploaded-images/2019-11/5527ffe0-00ca-11ea-9f7d-d1e736c1315d (image)
```
## Iteration 3
News is stored in the local file cache.json as a list of JSON objects.
The rss-reader app accepts the optional argument `--date` (in `YYYYMMDD` form, as in the examples below) to print cached news for that day:
```
$ python rss_reader.py "https://news.tut.by/rss/" --date 20200102
```
```
Feed: TUT.BY: Новости ТУТ

Title: Кристин Килер, любовница британского министра и советского шпиона: кем она была на самом деле?
Date: Fri, 2 Jan 2020
Link: https://news.tut.by/culture/667279.html?utm_campaign=news-feed&utm_medium=rss&utm_source=rss-news

[image 2: Фото: bbc.com][2] Кристин Килер было всего 19, когда она оказалась в центре секс-скандала, приведшего к отставке британского кабинета министров. Ее выставили злодейкой, и затем всю оставшуюся жизнь эта история преследовала ее. Впервые ее трактовка событий была воплощена в сериале, созданном Би-би-си.

Links:
[0] https://news.tut.by/culture/667279.html?utm_campaign=news-feed&utm_medium=rss&utm_source=rss-news (link)
[1] https://img.tyt.by/n/kultura/0c/9/kristin_killer3.jpg (image)
[2] https://img.tyt.by/thumbnails/n/kultura/0c/9/kristin_killer3.jpg (image)
```
```
$ python rss_reader.py --date 20200102
```
```
Feed: Yahoo News - Latest News & Headlines

Title: PHOTOS: #MenToo: The hidden tragedy of male sexual abuse in the military
Date: Tue, 1 Dec 2019
Link: https://news.yahoo.com/photos-men-too-the-hidden-tragedy-of-male-sexual-abuse-in-the-military-005342483.html

[image 1: PHOTOS: #MenToo: The hidden tragedy of male sexual abuse in the military][1] Award-winning photojournalist Mary F. Calvert has spent six years documenting the prevalence of rape in the military and the effects on victims. She began with a focus on female victims but more recently has examined the underreported incidence of sexual assaults on men and the lifelong trauma it can inflict.

Links:
[0] https://news.yahoo.com/photos-men-too-the-hidden-tragedy-of-male-sexual-abuse-in-the-military-005342483.html (link)
[1] http://l1.yimg.com/uu/api/res/1.2/LR4Vdg0MD6osVIDtZW75aA--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/https://media-mbst-pub-ue1.s3.amazonaws.com/creatr-uploaded-images/2019-12/316fa7e0-2c23-11ea-bed7-1ebe74b8c372 (image)


Feed: TUT.BY: Новости ТУТ

Title: Кристин Килер, любовница британского министра и советского шпиона: кем она была на самом деле?
Date: Fri, 2 Jan 2020
Link: https://news.tut.by/culture/667279.html?utm_campaign=news-feed&utm_medium=rss&utm_source=rss-news

[image 2: Фото: bbc.com][2] Кристин Килер было всего 19, когда она оказалась в центре секс-скандала, приведшего к отставке британского кабинета министров. Ее выставили злодейкой, и затем всю оставшуюся жизнь эта история преследовала ее. Впервые ее трактовка событий была воплощена в сериале, созданном Би-би-си.

Links:
[0] https://news.tut.by/culture/667279.html?utm_campaign=news-feed&utm_medium=rss&utm_source=rss-news (link)
[1] https://img.tyt.by/n/kultura/0c/9/kristin_killer3.jpg (image)
[2] https://img.tyt.by/thumbnails/n/kultura/0c/9/kristin_killer3.jpg (image)
```
The `--date` argument works with all the other arguments:
```
$ python rss_reader.py --date 20191113 --json --verbose
```
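The cache lookup described in this iteration can be sketched roughly as follows. This is only a sketch under assumptions: it supposes that each cached object carries the `DateInt` and `Source` keys shown in the `--json` output, and the helper names `load_cache` and `filter_by_date` are hypothetical, not taken from the project.

```python
import json


def load_cache(cache_path="cache.json"):
    """Read the cached list of articles; an absent cache file is treated as empty."""
    try:
        with open(cache_path, encoding="utf-8") as cache_file:
            return json.load(cache_file)
    except FileNotFoundError:
        return []


def filter_by_date(cached, date, source=""):
    """Keep articles whose DateInt equals `date`; optionally restrict to one source URL."""
    wanted = str(date)  # --date is parsed as int, while DateInt is stored as a string
    return [
        article for article in cached
        if article.get("DateInt") == wanted
        and (not source or article.get("Source") == source)
    ]
```

With an empty `source` the filter returns matching news from every cached feed, which mirrors the second example above, where `--date` is given without a URL.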
## Iteration 4
News can be converted to HTML and PDF formats.
Example:
```
$ python rss_reader.py "https://news.yahoo.com/rss/" --to-html "F:/Path/to/your/folder" --to-pdf "F:/Path/to/your/folder"
```
These options work with all the other arguments.
```
$ python rss_reader.py --date 20191118 --to-html "F:/Path/to/your/folder" --limit 1
```
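To give an idea of what an HTML export might involve, here is a minimal sketch that uses only the standard library. The project lists `dominate` among its requirements, so its real implementation most likely differs; the function names `render_news_html` and `save_news_html`, the default file name and the dictionary keys are all assumptions made for this illustration.

```python
import html
from pathlib import Path


def render_news_html(articles):
    """Build a minimal HTML page from a list of article dictionaries."""
    parts = [
        "<!DOCTYPE html>",
        "<html>",
        '<head><meta charset="utf-8"><title>RSS news</title></head>',
        "<body>",
    ]
    for article in articles:
        # html.escape prevents feed text from being interpreted as markup
        link = html.escape(article["Link"], quote=True)
        parts.append(f"<h2>{html.escape(article['Title'])}</h2>")
        parts.append(f"<p>{html.escape(article['Date'])}</p>")
        parts.append(f'<p><a href="{link}">{link}</a></p>')
        parts.append(f"<p>{html.escape(article['Summary'])}</p>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


def save_news_html(articles, folder, filename="news.html"):
    """Write the page into `folder` (hypothetical file name) and return its path."""
    target = Path(folder) / filename
    target.write_text(render_news_html(articles), encoding="utf-8")
    return target
```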
## Iteration 5
A new optional argument `--colorize` is available. It prints the news in colorized mode.
The option works with all the other arguments except `--to-html` and `--to-pdf`.
```
$ python rss_reader.py --date 20200102 --colorize
```
```diff
+ Feed: Yahoo News - Latest News & Headlines

+ Title: PHOTOS: #MenToo: The hidden tragedy of male sexual abuse in the military
+ Date: Tue, 1 Dec 2019
+ Link: https://news.yahoo.com/photos-men-too-the-hidden-tragedy-of-male-sexual-abuse-in-the-military-005342483.html

+[image 1: PHOTOS: #MenToo: The hidden tragedy of male sexual abuse in the military][1] Award-winning photojournalist Mary F. Calvert has spent six years documenting the prevalence of rape in the military and the effects on victims. She began with a focus on female victims but more recently has examined the underreported incidence of sexual assaults on men and the lifelong trauma it can inflict.

+ Links:
+ [0] https://news.yahoo.com/photos-men-too-the-hidden-tragedy-of-male-sexual-abuse-in-the-military-005342483.html (link)
+ [1] http://l1.yimg.com/uu/api/res/1.2/LR4Vdg0MD6osVIDtZW75aA--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/https://media-mbst-pub-ue1.s3.amazonaws.com/creatr-uploaded-images/2019-12/316fa7e0-2c23-11ea-bed7-1ebe74b8c372 (image)
```
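How the colors are produced is not shown in this README. One common approach, sketched below purely as an assumption, is to wrap the text in ANSI escape sequences; the names `GREEN`, `RESET` and `colorize` are hypothetical, and some older Windows consoles may not render these codes.

```python
# ANSI escape sequences: switch the foreground color to green, then reset all attributes
GREEN = "\033[32m"
RESET = "\033[0m"


def colorize(text, color=GREEN):
    """Wrap `text` so that a terminal supporting ANSI codes prints it in `color`."""
    return f"{color}{text}{RESET}"


if __name__ == "__main__":
    print(colorize("Feed: Yahoo News - Latest News & Headlines"))
```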
17 changes: 17 additions & 0 deletions final_task/rss_reader/ConsoleParse.py
@@ -0,0 +1,17 @@
import argparse


def get_arguments_from_console():
    """Reading command line arguments"""
    arg_parser = argparse.ArgumentParser(description="Pure Python command-line RSS reader.")
    # "source" is optional so that cached news can be read with --date alone
    arg_parser.add_argument("source", nargs='?', type=str, default="", help="RSS URL")
    arg_parser.add_argument("--version", action="store_true", help="Print version info")
    arg_parser.add_argument("--json", action="store_true", help="Print result as JSON in stdout")
    arg_parser.add_argument("--verbose", action="store_true", help="Outputs verbose status messages")
    arg_parser.add_argument("--limit", type=int, help="Limit news topics if this parameter provided")
    arg_parser.add_argument("--to-html", type=str, help="Output to html format")
    arg_parser.add_argument("--to-pdf", type=str, help="Output to pdf format")
    arg_parser.add_argument("--date", type=int, help="The news from the specified day will be printed out")
    arg_parser.add_argument("--colorize", action="store_true", help="Print news in colorized mode")

    return arg_parser.parse_args()
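To show what the parsed namespace looks like, the sketch below rebuilds a reduced copy of this parser and feeds it an explicit argument list instead of reading `sys.argv`. The function name `build_parser` and the sample values are hypothetical; the argument definitions are copied from the file above.

```python
import argparse


def build_parser():
    """Reduced copy of the parser above, limited to four arguments for brevity."""
    parser = argparse.ArgumentParser(description="Pure Python command-line RSS reader.")
    parser.add_argument("source", nargs="?", type=str, default="", help="RSS URL")
    parser.add_argument("--limit", type=int, help="Limit news topics if this parameter provided")
    parser.add_argument("--to-html", type=str, help="Output to html format")
    parser.add_argument("--date", type=int, help="The news from the specified day will be printed out")
    return parser


# Passing a list to parse_args avoids touching the real command line
args = build_parser().parse_args(["--date", "20200102", "--to-html", "out"])

print(repr(args.source))  # '' because the positional URL was omitted
print(args.date)          # 20200102, converted to int by type=int
print(args.to_html)       # out; argparse maps "--to-html" to the attribute "to_html"
print(args.limit)         # None, since --limit was not given
```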
Binary file added final_task/rss_reader/DejaVuSans.ttf
Binary file not shown.
120 changes: 120 additions & 0 deletions final_task/rss_reader/Entry.py
@@ -0,0 +1,120 @@
import logging
import time
from Logging import logging_decorator
import html


class Entry:
    """class for every article from http:link...link.rss"""
    @logging_decorator
    def __init__(self, feed: str = "", title: str = "", date: str = "", article_link: str = "",
                 summary: str = "", links: tuple = (), published_parsed: time.struct_time = ()):
        self.__feed = feed
        # links must be assigned before parse_html runs, because parse_html appends image links to it
        self.__links = links
        self.__title = self.parse_html(title)
        self.__article_link = article_link
        self.__summary = self.parse_html(summary)
        if published_parsed:
            self.__publish_year = published_parsed.tm_year
            self.__publish_month = published_parsed.tm_mon
            self.__publish_day = published_parsed.tm_mday
            # sometimes there is a problem when in the attribute published entries have day that is wrong
            # then code below corrects it and truncates date-string
            self.__date = (date[:date.find(",")+2] + str(self.__publish_day) + date[date[5:].find(' ') + 5:]
                           )[:len("Fri, 22 Nov 2019")]
        else:
            self.__date = date[:len("Fri, 22 Nov 2019")]
        logging.info("Entry object created")

    @logging_decorator
    def get_feed(self) -> str:
        return self.__feed

    @logging_decorator
    def get_title(self) -> str:
        return self.__title

    @logging_decorator
    def get_article_link(self) -> str:
        return self.__article_link

    @logging_decorator
    def get_links(self) -> tuple:
        return self.__links

    @logging_decorator
    def get_summary(self) -> str:
        return self.__summary

    @logging_decorator
    def get_publish_year(self):
        return self.__publish_year

    @logging_decorator
    def get_publish_month(self):
        return self.__publish_month

    @logging_decorator
    def get_publish_day(self):
        return self.__publish_day

    @logging_decorator
    def get_date(self) -> str:
        return self.__date

    @logging_decorator
    def print_feed(self) -> None:
        print(f"Feed: {self.__feed}\n")

    @logging_decorator
    def print_title(self) -> None:
        print(f"Title: {self.__title}")

    @logging_decorator
    def print_date(self) -> None:
        print(f"Date: {self.__date}")

    @logging_decorator
    def print_link(self) -> None:
        print(f"Link: {self.__article_link}")

    @logging_decorator
    def print_summary(self) -> None:
        print(f"\n{self.__summary}\n")

    @logging_decorator
    def print_links(self) -> None:
        print("Links:")
        # the first link is the article itself, all the following ones are images
        for num_link, link in enumerate(self.__links):
            if num_link == 0:
                print(f'[{num_link}] {link} (link)')
            else:
                print(f'[{num_link}] {link} (image)')
        print('\n')

    @logging_decorator
    def parse_html(self, summary: str) -> str:
        """selects alt and src attributes from <img> and removes all the html tags from the entry.summary"""
        while summary.count('<img'):
            src = summary[summary.find("src=\"") + len("src=\""):
                          summary.find('"', summary.find("src=\"") + len("src=\""))
                          ]
            if src != "":
                self.__links = list(self.__links)
                self.__links.append(src)
                self.__links = tuple(self.__links)
            alt = summary[summary.find("alt=\"") + len("alt=\""):
                          summary.find('"', summary.find("alt=\"") + len("alt=\""))
                          ]
            start_cut = summary.find("<img")
            # replace the whole <img ...> tag with a "[image N: alt][N] " marker
            summary = summary[: start_cut] \
                + f"[image {len(self.__links) - 1}: " + alt + "]" \
                + f"[{len(self.__links) - 1}] " \
                + summary[summary.find(">", start_cut + len("<img")) + 1:]

        # strip every remaining tag
        while summary.count('<'):
            summary = summary[:summary.find('<')] + summary[summary.find('>') + 1:]

        # html.unescape is the documented function; html.parser.unescape is not a public API
        summary = html.unescape(summary)
        return summary
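The string slicing in `parse_html` depends on attribute order and quoting. As an alternative sketch, not the approach used in this pull request, the same idea can be expressed with the standard `html.parser.HTMLParser`. The class name `SummaryParser`, the function `strip_html` and the `first_index` parameter are hypothetical; the marker format `[image N: alt][N] ` follows the output shown in the README, and starting at 1 assumes that index 0 is taken by the article link. Unlike the original, this sketch skips images that have no `src`.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Collects plain text and replaces each <img> with an "[image N: alt][N] " marker."""

    def __init__(self, first_index=1):
        # convert_charrefs=True makes the parser unescape entities such as &amp; in text
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.image_links = []
        self.next_index = first_index

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return  # every other tag is dropped, only its text content is kept
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = attributes.get("alt") or ""
        if src:
            self.image_links.append(src)
            self.parts.append(f"[image {self.next_index}: {alt}][{self.next_index}] ")
            self.next_index += 1

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(summary, first_index=1):
    """Return (plain_text, image_links) for one HTML summary."""
    parser = SummaryParser(first_index)
    parser.feed(summary)
    parser.close()
    return "".join(parser.parts), tuple(parser.image_links)
```

The caller would append the returned image links to the article's own link, which keeps the parsing step free of side effects on the `Entry` object.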