epam-python-courses-7-bsu · ElizabethUniverse · Nov 10, 2019 · Nov 10, 2019 · Nov 17, 2019 · Nov 17, 2019
diff --git a/final_task/README.md b/final_task/README.md
@@ -1,3 +1,70 @@
-# Your readme here
-Some text.
-Checkout how to write this file using *markdown*.
+## Iteration 1
+RSS reader is a command-line utility that receives an RSS URL and prints the feed's news in a human-readable format.
+
+Input data has the following interface:
+
+`rss_reader.py source [-h] [--version] [--verbose] [--json] [--limit LIMIT]`
+````
+positional arguments:
+source - URL of an RSS feed
+optional arguments:
+-h - prints this help page
+--version - prints the current version to stdout
+--verbose - prints all logs to stdout
+--json - prints news in JSON format
+--limit LIMIT - limits the number of news entries in the output
+````
+JSON structure:
+```
+[
+	{
+		"title":   "A black man was put in handcuffs after a police officer stopped him on a trainplatform because he was eating",
+		"article": "Bay Area Rapid Transit police said Steve Foster, of Concord, California,violated state law by eating a sandwich on a BART station's platform.  ",
+		"links": [
+			"https://news.yahoo.com/black-man-put-handcuffs-police-170516695.html",
+			"http://l.yimg.com/uu/api/res/1.2/iLcp4eQPeHI64PZ9LpeQcw--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/https://media.zenfs.com/en-US/insider_articles_922/e4254e78d7432dae4387d72624ee3086"
+		],
+		"link": "https://news.yahoo.com/black-man-put-handcuffs-police-170516695.html",
+		"date": "Mon, 11 Nov 2019 17:06:55 -0500"
+	},
+	{
+		...
+	},
+	...
+]
+```
+
+## Iteration 2
+To run the RSS parser on your computer:
+1) clone the repository from https://github.com/ElizabethUniverse/FinalTaskRssParser
+2) `$cd final_task`
+3) `$python setup.py sdist bdist_wheel`
+4) `$cd dist`
+5) `$pip install rss_reader-1.1.tar.gz`
+6) run `$rss_reader https://news.yahoo.com/rss --limit 2 --verbose`
+
+
+## Iteration 3
+News is stored in a CSV cache, one entry per row, with tab-delimited fields in the following order:
+
+`date    title    link   article   list_links`
+
+Searching the cache is currently a linear scan, O(n) in the number of cached entries. We plan to optimize this in the near future.
+
+To get the news for 15/11/2019, run the following command:
+
+`$python rss_reader.py https://news.yahoo.com/rss --date 20191115`
+
+The `--date` argument works without an internet connection and can be combined with `--verbose`, `--json` and `--limit LIMIT` in the usual way.
+
+## Iteration 4 
+
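+News can be converted to PDF or HTML.
+
+To convert news to PDF (`path` must be an existing directory; the output file is named `articles.pdf`):
+
+`$python rss_reader.py https://news.yahoo.com/rss --to-pdf path`
+
+To convert news to HTML (the output file is named `articles.html`):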
+
+`$python rss_reader.py https://news.yahoo.com/rss --to-html path`
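The README lists the command-line flags added across the four iterations, but `rss_reader.py` itself is not part of the diff shown here. As a rough sketch only, the interface could be expressed with `argparse` as below. The function name `build_parser`, the `dest` names and the help strings are assumptions for illustration, and the version string is taken from `PKG-INFO`; the actual parser may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags listed in the README."""
    parser = argparse.ArgumentParser(prog="rss_reader.py", description="RSS reader")
    parser.add_argument("source", help="URL of an RSS feed")
    # Version string assumed from PKG-INFO in this diff
    parser.add_argument("--version", action="version", version="%(prog)s 1.4")
    parser.add_argument("--verbose", action="store_true", help="print all logs to stdout")
    parser.add_argument("--json", action="store_true", help="print news in JSON format")
    parser.add_argument("--limit", type=int, default=None, help="maximum number of entries")
    parser.add_argument("--date", help="read cached news for a date given as YYYYMMDD")
    parser.add_argument("--to-pdf", dest="to_pdf", metavar="PATH", help="directory for articles.pdf")
    parser.add_argument("--to-html", dest="to_html", metavar="PATH", help="directory for articles.html")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["https://news.yahoo.com/rss", "--limit", "2", "--date", "20191115"]
)
print(args.limit, args.date, args.json)
```

Declaring `--limit` with `type=int` means the limit reaches the comparisons in `CSVEntities.py` and `ClassNews.py` as an integer rather than a string.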
diff --git a/final_task/rss_reader.egg-info/PKG-INFO b/final_task/rss_reader.egg-info/PKG-INFO
@@ -0,0 +1,80 @@
+Metadata-Version: 1.2
+Name: rss-reader
+Version: 1.4
+Summary: RSS parser
+Home-page: https://github.com/ElizabethUniverse/FinalTaskRssParser
+Author: Elizaveta Lapunova
+Author-email: liza.lapunova99@gmail.com
+License: BSD
+Description: ## Iteration 1
+        RSS reader is a command-line utility that receives an RSS URL and prints the feed's news in a human-readable format.
+
+        Input data has the following interface:
+
+        `rss_reader.py source [-h] [--version] [--verbose] [--json] [--limit LIMIT]`
+        ````
+        positional arguments:
+        source - URL which provides a RSS feed
+        optional arguments:
+        -h - prints this help page
+        --version - prints in stdout current version
+        --verbose - prints all logs in stdout
+        --json - prints news in JSON format
+        --limit LIMIT - limits the amount of news entries in the output 
+        ````
+        JSON structure:
+        ```
+        [
+        	{
+        		"title":   "A black man was put in handcuffs after a police officer stopped him on a trainplatform because he was eating",
+        		"article": "Bay Area Rapid Transit police said Steve Foster, of Concord, California,violated state law by eating a sandwich on a BART station's platform.  ",
+        		"links": [
+        			"https://news.yahoo.com/black-man-put-handcuffs-police-170516695.html",
+        			"http://l.yimg.com/uu/api/res/1.2/iLcp4eQPeHI64PZ9LpeQcw--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/https://media.zenfs.com/en-US/insider_articles_922/e4254e78d7432dae4387d72624ee3086"
+        		],
+        		"link": "https://news.yahoo.com/black-man-put-handcuffs-police-170516695.html",
+        		"date": "Mon, 11 Nov 2019 17:06:55 -0500"
+        	},
+        	{
+        		...
+        	},
+        	...
+        ]
+        ```
+
+        ## Iteration 2
+        To run the RSS parser on your computer:
+        1) clone the repository from https://github.com/ElizabethUniverse/FinalTaskRssParser
+        2) `$cd final_task`
+        3) `$python setup.py sdist bdist_wheel`
+        4) `$cd dist`
+        5) `$pip install rss_reader-1.1.tar.gz`
+        6) run `$rss_reader https://news.yahoo.com/rss --limit 2 --verbose`
+
+
+        ## Iteration 3
+        News is stored in a CSV cache, one entry per row, with tab-delimited fields in the following order:
+
+        `date    title    link   article   list_links`
+
+        Searching the cache is currently a linear scan, O(n) in the number of cached entries. We plan to optimize this in the near future.
+
+        To get the news for 15/11/2019, run the following command:
+
+        `$python rss_reader.py https://news.yahoo.com/rss --date 20191115`
+
+        The `--date` argument works without an internet connection and can be combined with `--verbose`, `--json` and `--limit LIMIT` in the usual way.
+
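+        ## Iteration 4
+
+        News can be converted to PDF or HTML.
+
+        To convert news to PDF (`path` must be an existing directory; the output file is named `articles.pdf`):
+
+        `$python rss_reader.py https://news.yahoo.com/rss --to-pdf path`
+
+        To convert news to HTML (the output file is named `articles.html`):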
+
+        `$python rss_reader.py https://news.yahoo.com/rss --to-html path`
+Platform: any
+Requires-Python: >=3.7.0
diff --git a/final_task/rss_reader.egg-info/SOURCES.txt b/final_task/rss_reader.egg-info/SOURCES.txt
@@ -0,0 +1,19 @@
+README.md
+setup.py
+rss_reader/CSVEntities.py
+rss_reader/ClassNews.py
+rss_reader/ToHTML.py
+rss_reader/ToPDF.py
+rss_reader/__init__.py
+rss_reader/__main__.py
+rss_reader/requirements.txt
+rss_reader/rss_reader.py
+rss_reader.egg-info/PKG-INFO
+rss_reader.egg-info/SOURCES.txt
+rss_reader.egg-info/dependency_links.txt
+rss_reader.egg-info/entry_points.txt
+rss_reader.egg-info/not-zip-safe
+rss_reader.egg-info/requires.txt
+rss_reader.egg-info/top_level.txt
+test/RssUnitTest.py
+test/__init__.py
diff --git a/final_task/rss_reader.egg-info/dependency_links.txt b/final_task/rss_reader.egg-info/dependency_links.txt
@@ -0,0 +1 @@
+
diff --git a/final_task/rss_reader.egg-info/entry_points.txt b/final_task/rss_reader.egg-info/entry_points.txt
@@ -0,0 +1,3 @@
+[console_scripts]
+rss_reader=rss_reader.rss_reader:main
+
diff --git a/final_task/rss_reader.egg-info/not-zip-safe b/final_task/rss_reader.egg-info/not-zip-safe
@@ -0,0 +1 @@
+
diff --git a/final_task/rss_reader.egg-info/requires.txt b/final_task/rss_reader.egg-info/requires.txt
@@ -0,0 +1,4 @@
+html2text==2019.9.26
+python-dateutil==2.8.0
+jinja2==2.10.1
+fpdf==1.7.2
diff --git a/final_task/rss_reader.egg-info/top_level.txt b/final_task/rss_reader.egg-info/top_level.txt
@@ -0,0 +1,2 @@
+rss_reader
+test
diff --git a/final_task/rss_reader/CSVEntities.py b/final_task/rss_reader/CSVEntities.py
@@ -0,0 +1,56 @@
+import csv
+import os
+from dataclasses import asdict
+from datetime import date
+from dateutil.parser import parse
+
+import ClassNews
+
+FIELDNAMES = ['date', 'title', 'link', 'article', 'links']
+
+
+def csv_to_python(articles_list, csv_file):
+    """Append to the CSV cache those articles from articles_list that it does not already contain."""
+    if not os.path.exists(csv_file):
+        open(csv_file, 'x', encoding='utf-8').close()
+
+    articles_list_from_csv = []
+    with open(csv_file, "r", encoding='utf-8', newline='') as file:
+        reader = csv.DictReader(file, FIELDNAMES, delimiter='\t')
+        for item in reader:
+            r = ClassNews.Article(**item)
+            articles_list_from_csv.append(r)
+
+    union_list = articles_list_from_csv[:]
+    for item in articles_list:
+        if item not in articles_list_from_csv:
+            union_list.append(item)
+
+    # newline='' lets the csv module control line endings, as its documentation recommends
+    with open(csv_file, "w", encoding='utf-8', newline='') as file:
+        writer = csv.DictWriter(file, fieldnames=FIELDNAMES, delimiter='\t')
+        for item in union_list:
+            writer.writerow(asdict(item))
+    return True
+
+def return_news_to_date(input_date, csv_file, limit):
+    """Return cached articles published on input_date (a 'YYYYMMDD' string), stopping after `limit` matches."""
+    article_list_by_date = []
+    datetime_input = date(int(input_date[0:4]), int(input_date[4:6]), int(input_date[6:8]))
+    with open(csv_file, "r", encoding='utf-8', newline='') as file:
+        reader = csv.DictReader(file, FIELDNAMES, delimiter='\t')
+        match_counter = 0
+        for item in reader:
+            article_from_file = ClassNews.Article(**item)
+
+            date_time = parse(article_from_file.date)
+            date_from_file = date_time.date()
+
+            if date_from_file == datetime_input:
+                match_counter += 1
+                article_list_by_date.append(article_from_file)
+
+            if limit == match_counter:
+                return article_list_by_date
+
+    return article_list_by_date
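The README notes that the cache lookup is a linear scan and that an optimization is planned. One possible direction, sketched here under assumptions, is to group rows by publication date once, so that each later query is a dictionary lookup. The helpers `build_date_index` and `news_for_date` are hypothetical names, rows are plain dictionaries rather than `Article` objects, and the standard library's `email.utils.parsedate_to_datetime` stands in for `dateutil.parser.parse` so the sketch has no third-party dependency.

```python
from collections import defaultdict
from datetime import datetime
from email.utils import parsedate_to_datetime


def build_date_index(rows):
    """Group cached rows by publication date (hypothetical helper)."""
    index = defaultdict(list)
    for row in rows:
        # RSS pubDate values look like "Mon, 11 Nov 2019 17:06:55 -0500"
        published = parsedate_to_datetime(row["date"]).date()
        index[published].append(row)
    return index


def news_for_date(index, yyyymmdd, limit=None):
    """Return up to `limit` rows for a date given as 'YYYYMMDD'."""
    wanted = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    # .get avoids creating empty entries in the defaultdict; [:None] keeps all rows
    return index.get(wanted, [])[:limit]


rows = [
    {"date": "Mon, 11 Nov 2019 17:06:55 -0500", "title": "a"},
    {"date": "Fri, 15 Nov 2019 09:00:00 +0000", "title": "b"},
    {"date": "Fri, 15 Nov 2019 23:30:00 +0000", "title": "c"},
]
index = build_date_index(rows)
print([row["title"] for row in news_for_date(index, "20191115")])
```

Building the index still reads every row once, so this only pays off when several dates are queried in one run or the index is persisted. For a single `--date` query per invocation, the existing scan with its early exit on `limit` does comparable work.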
diff --git a/final_task/rss_reader/ClassNews.py b/final_task/rss_reader/ClassNews.py
@@ -0,0 +1,77 @@
+import re
+import html2text
+from dataclasses import dataclass
+
+
+LINKS_TEMPLATE = r'"((?:http|https)://.+?)"'
+
+
+def xml_arguments_for_class(xml_string, limit):
+    """Take the parsed XML tree and a limit on the number of articles; return a list of dictionaries, one per item."""
+    dict_article_list = []
+    text = html2text.HTML2Text()
+    text.ignore_images = True
+    text.ignore_links = True
+    text.ignore_emphasis = True
+    for counter, xml_news in enumerate(xml_string.iter('item')):
+        parser_dictionary = {}
+        for xml_news_item in xml_news:
+            # Build the article dict; join on spaces so html2text's wrapped lines do not glue words together
+            if xml_news_item.tag == 'title':
+                parser_dictionary['title'] = ' '.join(text.handle(xml_news_item.text).split())
+
+            if xml_news_item.tag == 'pubDate':
+                parser_dictionary['date'] = xml_news_item.text
+
+            if xml_news_item.tag == 'link':
+                parser_dictionary['link'] = xml_news_item.text
+
+            if xml_news_item.tag == 'description':
+                parser_dictionary['article'] = ' '.join(text.handle(xml_news_item.text).split())
+                parser_dictionary['links'] = xml_news_item.text.replace('\n', '')
+
+        dict_article_list.append(parser_dictionary)
+
+        if limit == counter + 1:
+            return dict_article_list
+    return dict_article_list
+
+def dicts_to_articles(dict_list):
+    """Convert a list of dictionaries into a list of Article objects."""
+    article_list = []
+    for item in dict_list:
+        article_list.append(Article(**item))
+    return article_list
+
+def html_text_to_list_links(html_links):
+    html_links = html_links.replace("\'", "\"")
+    list_links = []
+    for group1 in re.finditer(LINKS_TEMPLATE, html_links):
+        list_links.append(group1.group(1))
+    return list_links
+
+@dataclass
+class Article:
+    """A news article with title, date, link, article and links fields; links is parsed from raw HTML into a list."""
+    title: str
+    date: str
+    link: str
+    article: str
+    links: str
+
+    def __post_init__(self):
+        self.links = html_text_to_list_links(self.links)
+
+    def __str__(self):
+        result_string_article = "\nTitle: %s\nDate: %s\nLink: %s\n\n%s\n\n" % (self.title, self.date, self.link,
+                                                                                  self.article)
+        for link_idx, link in enumerate(self.links):
+            result_string_article += "[%d]: %s\n" % (link_idx + 1, link)
+        result_string_article += '\n'
+        return result_string_article
+
+    def __eq__(self, other):
+        if self.article == other.article and self.title == other.title and self.link == other.link and \
+                self.date == other.date:
+            return True
+        return False
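`html_text_to_list_links` has no docstring, so here is a small standalone sketch of what it appears to do: turn single quotes into double quotes, then collect every double-quoted `http`/`https` URL. The name `extract_links` and the sample strings are made up for illustration. The second sample assumes that cached rows store `links` as the text of a Python list, which is what writing `asdict(item)` through `csv.DictWriter` would produce.

```python
import re

# Raw string avoids invalid-escape warnings; group 1 is the URL between the quotes
LINK_PATTERN = re.compile(r'"(https?://.+?)"')


def extract_links(html_fragment):
    """Return every quoted http(s) URL found in an HTML fragment."""
    # Single-quoted attributes are normalised so one pattern covers both styles
    normalized = html_fragment.replace("'", '"')
    return [match.group(1) for match in LINK_PATTERN.finditer(normalized)]


sample = '<p><a href="https://example.com/story"><img src=\'http://example.com/pic.jpg\'></a>Text</p>'
print(extract_links(sample))

# A list written to the CSV cache comes back as its text form, which the same pattern handles
cached = "['https://example.com/story', 'http://example.com/pic.jpg']"
print(extract_links(cached))
```

Both calls print `['https://example.com/story', 'http://example.com/pic.jpg']`.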
diff --git a/final_task/rss_reader/ToHTML.py b/final_task/rss_reader/ToHTML.py
@@ -0,0 +1,19 @@
+from jinja2 import Environment, FileSystemLoader
+import os
+
+FILENAME_HTML = "articles.html"
+
+
+def print_article_list_to_html(list_articles, path):
+    if not os.path.exists(path):
+        raise FileNotFoundError
+    html_stream = print_article_list(list_articles)
+    with open(os.path.join(path, FILENAME_HTML), "w", encoding='utf-8') as html:
+        html.write(html_stream)
+
+
+def print_article_list(list_articles):
+    # directory with templates
+    env = Environment(loader=FileSystemLoader('.'))
+    template = env.get_template('template.html')
+    return template.render(articles=list_articles)
diff --git a/final_task/rss_reader/ToPDF.py b/final_task/rss_reader/ToPDF.py
@@ -0,0 +1,44 @@
+import os
+from fpdf import FPDF
+
+FILENAME_PDF = "articles.pdf"
+
+
+def conv_str(input_str):
+    # Drop the ellipsis, curly quotes and en dash before the text is passed to FPDF
+    return input_str.translate({ord(ch): None for ch in '\u2026\u2019\u201c\u201d\u2013\u2018'})
+
+
+class PDF(FPDF):
+
+    # Page footer
+    def footer(self):
+        # Position at 1.5 cm from bottom
+        self.set_y(-15)
+        # Arial italic 8
+        self.set_font('Arial', 'I', 8)
+        # Page number
+        self.cell(0, 10, 'Page ' + str(self.page_no()) + '/{nb}', 0, 0, 'C')
+
+
+def print_article_list_to_pdf(list_articles, path):
+
+    if not os.path.exists(path):
+        raise FileNotFoundError
+    path = os.path.join(path, FILENAME_PDF)
+
+    pdf = PDF()
+    pdf.alias_nb_pages()
+    pdf.add_page()
+    pdf.set_font('Arial', '', 12)
+
+    for item in list_articles:
+        pdf.cell(0, 10, "Title: %s" % (conv_str(item.title)), 0, 1)
+        pdf.cell(0, 10, "Date: %s" % (conv_str(item.date)), 0, 1)
+        pdf.cell(0, 10, "Link: %s" % (conv_str(item.link)), 0, 1)
+        pdf.multi_cell(0, 10, conv_str(item.article), 0, 'L')
+        for idx, link in enumerate(item.links):
+            pdf.multi_cell(0, 10, "[%d]: %s" % (idx + 1, conv_str(link)), 0, 'L')
+        pdf.cell(0, 10, "", 0, 1)
+    pdf.output(path, 'F')
+    return True
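`conv_str` deletes typographic punctuation outright, so "It's" loses its apostrophe in the PDF. A possible alternative, shown only as a sketch, maps each character to a plain ASCII stand-in with `str.translate`. The name `to_ascii_punctuation` and the chosen replacements are assumptions, not part of this change set.

```python
# Map each typographic character handled by conv_str to an ASCII stand-in
ASCII_FALLBACKS = str.maketrans({
    "\u2026": "...",  # horizontal ellipsis
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
})


def to_ascii_punctuation(text):
    """Replace typographic punctuation with ASCII instead of deleting it."""
    return text.translate(ASCII_FALLBACKS)


print(to_ascii_punctuation("\u201cIt\u2019s late\u2026\u201d \u2013 BART"))
```

This covers only the six characters that `conv_str` already lists; any other character outside the font's encoding would still need separate handling.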
diff --git a/final_task/rss_reader/__init__.py b/final_task/rss_reader/__init__.py
@@ -0,0 +1 @@
+