Elizaveta_Lapunova_liza.lapunova99@gmail.com #17
ElizabethUniverse wants to merge 11 commits into epam-python-courses-7-bsu:master from ElizabethUniverse:my-branch
79e4671 iteration1_without_tests (ElizabethUniverse)
1c4c6ea iteration1_without_tests (ElizabethUniverse)
eee223c fixed iteration 1 and iteration 2 (ElizabethUniverse)
765f0f8 fixed iteration 1 and iteration 2 (ElizabethUniverse)
beb1e79 fixed iteration 1 and iteration 2 (ElizabethUniverse)
4dbda64 fixed iteration 2 and readme (ElizabethUniverse)
cec85b6 iteration 3, fixed readme and tests (ElizabethUniverse)
4f66310 fixed iteration 3 and iteration 4 (ElizabethUniverse)
d349efd fixed iteration 4 and tests (ElizabethUniverse)
d64cfc4 final commit (ElizabethUniverse)
b6205e0 Update README.md (ElizabethUniverse)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,70 @@ | ||
| # Your readme here | ||
| Some text. | ||
| Checkout how to write this file using *markdown*. | ||
| ## Iteration 1 | ||
| RSS reader is a command-line utility that receives an RSS feed URL and prints the news in a human-readable format. | ||
|
|
||
| The utility has the following command-line interface: | ||
|
|
||
| `rss_reader.py source [-h] [--version] [--verbose] [--json] [--limit LIMIT]` | ||
| ```` | ||
| positional arguments: | ||
| source - URL which provides an RSS feed | ||
| optional arguments: | ||
| -h - prints this help page | ||
| --version - prints the current version to stdout | ||
| --verbose - prints all logs to stdout | ||
| --json - prints news in JSON format | ||
| --limit LIMIT - limits the amount of news entries in the output | ||
| ```` | ||
| JSON structure: | ||
| ``` | ||
| [ | ||
| { | ||
| "title": "A black man was put in handcuffs after a police officer stopped him on a trainplatform because he was eating", | ||
| "article": "Bay Area Rapid Transit police said Steve Foster, of Concord, California,violated state law by eating a sandwich on a BART station's platform. ", | ||
| "links": [ | ||
| "https://news.yahoo.com/black-man-put-handcuffs-police-170516695.html", | ||
| "http://l.yimg.com/uu/api/res/1.2/iLcp4eQPeHI64PZ9LpeQcw--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/https://media.zenfs.com/en-US/insider_articles_922/e4254e78d7432dae4387d72624ee3086" | ||
| ], | ||
| "link": "https://news.yahoo.com/black-man-put-handcuffs-police-170516695.html", | ||
| "date": "Mon, 11 Nov 2019 17:06:55 -0500" | ||
| }, | ||
| { | ||
| ... | ||
| }, | ||
| ... | ||
| ] | ||
| ``` | ||
|
|
||
| ## Iteration 2 | ||
| To run the RSS parser on your computer: | ||
| 1) clone the repository from https://github.com/ElizabethUniverse/FinalTaskRssParser | ||
| 2) `$cd final_task` | ||
| 3) `$python setup.py sdist bdist_wheel` | ||
| 4) `$cd dist` | ||
| 5) `$pip install rss_reader-1.1.tar.gz` | ||
| 6) run `$rss_reader https://news.yahoo.com/rss --limit 2 --verbose` | ||
|
|
||
|
|
||
| ## Iteration 3 | ||
| News is stored in a CSV cache file, one entry per row, with tab-delimited fields in the following order: | ||
|
|
||
| `date title link article list_links` | ||
|
|
||
| Cache lookup is currently a linear scan, O(n) in the number of cached entries; we plan to optimize it in the near future. | ||
|
|
||
| To get the news for 15/11/2019, run the following command: | ||
|
|
||
| `$python rss_reader.py https://news.yahoo.com/rss --date 20191115` | ||
|
|
||
| The --date argument works without an internet connection and can be combined with the --verbose, --json, and --limit LIMIT arguments, which behave the same way as for online feeds. | ||
|
|
||
| ## Iteration 4 | ||
|
|
||
| News can be converted to PDF or HTML. | ||
|
|
||
| To convert news to PDF: | ||
|
|
||
| `$python rss_reader.py https://news.yahoo.com/rss --to-pdf path` | ||
|
|
||
| To convert news to HTML: | ||
|
|
||
| `$python rss_reader.py https://news.yahoo.com/rss --to-html path` |
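The interface described in this README could be declared with `argparse` roughly as follows. This is a minimal sketch, not the parser from `rss_reader.py` (which is not shown in this part of the diff): the function name `build_parser`, the help strings, and the placeholder version string are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the interface described in the README (sketch)."""
    parser = argparse.ArgumentParser(prog="rss_reader", description="RSS reader")
    parser.add_argument("source", help="URL which provides an RSS feed")
    # The version string is a placeholder, not the project's real version.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0")
    parser.add_argument("--verbose", action="store_true", help="print logs to stdout")
    parser.add_argument("--json", action="store_true", help="print news as JSON")
    parser.add_argument("--limit", type=int, default=None,
                        help="limit the number of news entries")
    parser.add_argument("--date", help="read cached news for a date given as YYYYMMDD")
    # argparse turns "--to-pdf" into the attribute name "to_pdf".
    parser.add_argument("--to-pdf", metavar="PATH", help="directory for the PDF output")
    parser.add_argument("--to-html", metavar="PATH", help="directory for the HTML output")
    return parser


args = build_parser().parse_args(
    ["https://news.yahoo.com/rss", "--limit", "2", "--date", "20191115"]
)
print(args.source, args.limit, args.date, args.json)
```

Declaring `--limit` with `type=int` lets `argparse` reject non-numeric values before any feed is fetched.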
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,80 @@ | ||
| Metadata-Version: 1.2 | ||
| Name: rss-reader | ||
| Version: 1.4 | ||
| Summary: RSS parser | ||
| Home-page: https://github.com/ElizabethUniverse/FinalTaskRssParser | ||
| Author: Elizaveta Lapunova | ||
| Author-email: liza.lapunova99@gmail.com | ||
| License: BSD | ||
| Description: ## Iteration 1 | ||
| RSS reader is a command-line utility that receives an RSS feed URL and prints the news in a human-readable format. | ||
|
|
||
| The utility has the following command-line interface: | ||
|
|
||
| `rss_reader.py source [-h] [--version] [--verbose] [--json] [--limit LIMIT]` | ||
| ```` | ||
| positional arguments: | ||
| source - URL which provides an RSS feed | ||
| optional arguments: | ||
| -h - prints this help page | ||
| --version - prints the current version to stdout | ||
| --verbose - prints all logs to stdout | ||
| --json - prints news in JSON format | ||
| --limit LIMIT - limits the amount of news entries in the output | ||
| ```` | ||
| JSON structure: | ||
| ``` | ||
| [ | ||
| { | ||
| "title": "A black man was put in handcuffs after a police officer stopped him on a trainplatform because he was eating", | ||
| "article": "Bay Area Rapid Transit police said Steve Foster, of Concord, California,violated state law by eating a sandwich on a BART station's platform. ", | ||
| "links": [ | ||
| "https://news.yahoo.com/black-man-put-handcuffs-police-170516695.html", | ||
| "http://l.yimg.com/uu/api/res/1.2/iLcp4eQPeHI64PZ9LpeQcw--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/https://media.zenfs.com/en-US/insider_articles_922/e4254e78d7432dae4387d72624ee3086" | ||
| ], | ||
| "link": "https://news.yahoo.com/black-man-put-handcuffs-police-170516695.html", | ||
| "date": "Mon, 11 Nov 2019 17:06:55 -0500" | ||
| }, | ||
| { | ||
| ... | ||
| }, | ||
| ... | ||
| ] | ||
| ``` | ||
|
|
||
| ## Iteration 2 | ||
| To run the RSS parser on your computer: | ||
| 1) clone the repository from https://github.com/ElizabethUniverse/FinalTaskRssParser | ||
| 2) `$cd final_task` | ||
| 3) `$python setup.py sdist bdist_wheel` | ||
| 4) `$cd dist` | ||
| 5) `$pip install rss_reader-1.1.tar.gz` | ||
| 6) run `$rss_reader https://news.yahoo.com/rss --limit 2 --verbose` | ||
|
|
||
|
|
||
| ## Iteration 3 | ||
| News is stored in a CSV cache file, one entry per row, with tab-delimited fields in the following order: | ||
|
|
||
| `date title link article list_links` | ||
|
|
||
| Cache lookup is currently a linear scan, O(n) in the number of cached entries; we plan to optimize it in the near future. | ||
|
|
||
| To get the news for 15/11/2019, run the following command: | ||
|
|
||
| `$python rss_reader.py https://news.yahoo.com/rss --date 20191115` | ||
|
|
||
| The --date argument works without an internet connection and can be combined with the --verbose, --json, and --limit LIMIT arguments, which behave the same way as for online feeds. | ||
|
|
||
| ## Iteration 4 | ||
|
|
||
| News can be converted to PDF or HTML. | ||
|
|
||
| To convert news to PDF: | ||
|
|
||
| `$python rss_reader.py https://news.yahoo.com/rss --to-pdf path` | ||
|
|
||
| To convert news to HTML: | ||
|
|
||
| `$python rss_reader.py https://news.yahoo.com/rss --to-html path` | ||
| Platform: any | ||
| Requires-Python: >=3.7.0 |
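The JSON structure listed in the package description could be produced from dataclass instances along these lines. It is only a sketch: `NewsItem` is a hypothetical stand-in for the project's `Article` class, `news_to_json` is an illustrative name, and the field order simply mirrors the sample output above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class NewsItem:  # hypothetical stand-in for ClassNews.Article
    title: str
    article: str
    links: List[str] = field(default_factory=list)
    link: str = ""
    date: str = ""


def news_to_json(items):
    """Serialize a list of NewsItem objects to a JSON array string."""
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps([asdict(item) for item in items], indent=4, ensure_ascii=False)


sample = [NewsItem(title="Example title", article="Example text.",
                   links=["https://example.com/a"], link="https://example.com/a",
                   date="Mon, 11 Nov 2019 17:06:55 -0500")]
print(news_to_json(sample))
```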
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| README.md | ||
| setup.py | ||
| rss_reader/CSVEntities.py | ||
| rss_reader/ClassNews.py | ||
| rss_reader/ToHTML.py | ||
| rss_reader/ToPDF.py | ||
| rss_reader/__init__.py | ||
| rss_reader/__main__.py | ||
| rss_reader/requirements.txt | ||
| rss_reader/rss_reader.py | ||
| rss_reader.egg-info/PKG-INFO | ||
| rss_reader.egg-info/SOURCES.txt | ||
| rss_reader.egg-info/dependency_links.txt | ||
| rss_reader.egg-info/entry_points.txt | ||
| rss_reader.egg-info/not-zip-safe | ||
| rss_reader.egg-info/requires.txt | ||
| rss_reader.egg-info/top_level.txt | ||
| test/RssUnitTest.py | ||
| test/__init__.py |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| [console_scripts] | ||
| rss_reader=rss_reader.rss_reader:main | ||
|
|
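The `console_scripts` line above maps the command name `rss_reader` to the `main` function in the `rss_reader.rss_reader` module. How such a `name=module:attr` string resolves to a callable can be illustrated as below. This is a simplified sketch of the idea, not the wrapper that setuptools generates; `resolve_entry_point` is a hypothetical helper, and a standard-library target is used because the `rss_reader` package is not assumed to be installed.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'name=module.path:attr' and return (name, resolved object)."""
    name, _, target = spec.partition("=")
    module_path, _, attr = target.partition(":")
    module = importlib.import_module(module_path.strip())
    return name.strip(), getattr(module, attr.strip())


# Demonstrated with a stdlib target, since rss_reader is not assumed to be installed.
name, func = resolve_entry_point("dump=json:dumps")
print(name, func([1, 2]))
```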
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| html2text==2019.9.26 | ||
| python-dateutil==2.8.0 | ||
| jinja2==2.10.1 | ||
| fpdf==1.7.2 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| rss_reader | ||
| test |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,56 @@ | ||
| import csv | ||
| from datetime import date | ||
| from dateutil.parser import parse | ||
| from dataclasses import dataclass, asdict | ||
| import os | ||
|
|
||
| import ClassNews | ||
|
|
||
| FIELDNAMES = ['date', 'title', 'link', 'article', 'links'] | ||
|
|
||
|
|
||
| def csv_to_python(articles_list, csv_file): | ||
|     """Merge articles_list into the CSV cache file, appending only the articles that are not stored in it yet.""" | ||
|     if not os.path.exists(csv_file): | ||
|         open(csv_file, 'x', encoding='utf-8').close() | ||
|
|
||
|     articles_list_from_csv = [] | ||
|     with open(csv_file, "r", encoding='utf-8') as file: | ||
|         reader = csv.DictReader(file, FIELDNAMES, delimiter='\t') | ||
|         for item in reader: | ||
|             r = ClassNews.Article(**item) | ||
|             articles_list_from_csv.append(r) | ||
|
|
||
|     union_list = articles_list_from_csv[:] | ||
|     for item in articles_list: | ||
|         if item not in articles_list_from_csv: | ||
|             union_list.append(item) | ||
|
|
||
|     with open(csv_file, "w", encoding='utf-8') as file: | ||
|         writer = csv.DictWriter(file, fieldnames=FIELDNAMES, delimiter='\t') | ||
|         for item in union_list: | ||
|             writer.writerow(asdict(item)) | ||
|     return True | ||
|
|
||
| def return_news_to_date(input_date, csv_file, limit): | ||
|     """Read from the cache file the articles whose publication date matches input_date (YYYYMMDD).""" | ||
|     article_list_by_date = [] | ||
|     datetime_input = date(int(input_date[0:4]), int(input_date[4:6]), int(input_date[6:8])) | ||
|     with open(csv_file, "r", encoding='utf-8') as file: | ||
|         reader = csv.DictReader(file, FIELDNAMES, delimiter='\t') | ||
|         match_counter = 0 | ||
|         for item in reader: | ||
|             article_from_file = ClassNews.Article(**item) | ||
|
|
||
|             date_time = parse(article_from_file.date) | ||
|             date_from_file = date_time.date() | ||
|
|
||
|             if date_from_file == datetime_input: | ||
|                 match_counter += 1 | ||
|                 article_list_by_date.append(article_from_file) | ||
|
|
||
|             if limit == match_counter: | ||
|                 return article_list_by_date | ||
|
|
||
|     return article_list_by_date |
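The date comparison in `return_news_to_date` can be tried in isolation. The sketch below uses only the standard library (`email.utils.parsedate_to_datetime` in place of `dateutil.parser.parse`), so it is a swapped-in technique rather than the code under review; `matches_date` is a hypothetical helper, and it assumes the cached dates are RFC 822 style strings like the `pubDate` values shown in the README.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def matches_date(pub_date, yyyymmdd):
    """Return True if an RFC 822 style pubDate falls on the given YYYYMMDD day."""
    wanted = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    # The comparison uses the date in the feed's own UTC offset, with no conversion.
    return parsedate_to_datetime(pub_date).date() == wanted


print(matches_date("Mon, 11 Nov 2019 17:06:55 -0500", "20191111"))
print(matches_date("Mon, 11 Nov 2019 17:06:55 -0500", "20191115"))
```

Parsing the argument with `strptime` also rejects malformed input such as `2019-11-15` with a `ValueError`, instead of failing inside the string slicing.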
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| import re | ||
| import html2text | ||
| from dataclasses import dataclass | ||
|
|
||
|
|
||
| LINKS_TEMPLATE = r'"((http|https)://(\w|.)+?)"' | ||
|
|
||
|
|
||
| def xml_arguments_for_class(xml_string, limit): | ||
|     """Receive the parsed XML and a limit on the number of articles, and return a list of dictionaries.""" | ||
|     dict_article_list = [] | ||
|     text = html2text.HTML2Text() | ||
|     text.ignore_images = True | ||
|     text.ignore_links = True | ||
|     text.ignore_emphasis = True | ||
|     for counter, xml_news in enumerate(xml_string.iter('item')): | ||
|         parser_dictionary = {} | ||
|         for xml_news_item in xml_news: | ||
|             # Here we create the article in the form of a dictionary | ||
|             if xml_news_item.tag == 'title': | ||
|                 parser_dictionary['title'] = text.handle(xml_news_item.text).replace('\n', "") | ||
|
|
||
|             if xml_news_item.tag == 'pubDate': | ||
|                 parser_dictionary['date'] = xml_news_item.text | ||
|
|
||
|             if xml_news_item.tag == 'link': | ||
|                 parser_dictionary['link'] = xml_news_item.text | ||
|
|
||
|             if xml_news_item.tag == 'description': | ||
|                 parser_dictionary['article'] = text.handle(xml_news_item.text).replace('\n', '') | ||
|                 parser_dictionary['links'] = xml_news_item.text.replace('\n', '') | ||
|
|
||
|         dict_article_list.append(parser_dictionary) | ||
|
|
||
|         if limit == counter + 1: | ||
|             return dict_article_list | ||
|     return dict_article_list | ||
|
|
||
| def dicts_to_articles(dict_list): | ||
|     """Receive a list of dictionaries and convert it to a list of articles.""" | ||
|     article_list = [] | ||
|     for item in dict_list: | ||
|         article_list.append(Article(**item)) | ||
|     return article_list | ||
|
|
||
| def html_text_to_list_links(html_links): | ||
|     html_links = html_links.replace("\'", "\"") | ||
|     list_links = [] | ||
|     for group1 in re.finditer(LINKS_TEMPLATE, html_links): | ||
|         list_links.append(group1.group(1)) | ||
|     return list_links | ||
|
|
||
| @dataclass | ||
| class Article: | ||
|
HenadziStantchik marked this conversation as resolved.
|
||
|     """News article with title, date, link, article and links fields; links is passed in as text and converted to a list of URLs after initialisation.""" | ||
|     title: str | ||
|     date: str | ||
|     link: str | ||
|     article: str | ||
|     links: str | ||
|
|
||
|     def __post_init__(self): | ||
|         self.links = html_text_to_list_links(self.links) | ||
|
|
||
|     def __str__(self): | ||
|         result_string_article = "\nTitle: %s\nDate: %s\nLink: %s\n\n%s\n\n" % (self.title, self.date, self.link, | ||
|                                                                                  self.article) | ||
|         for link_idx, link in enumerate(self.links): | ||
|             result_string_article += "[%d]: %s\n" % (link_idx + 1, link) | ||
|         result_string_article += '\n' | ||
|         return result_string_article | ||
|
|
||
|     def __eq__(self, other): | ||
|         if self.article == other.article and self.title == other.title and self.link == other.link and \ | ||
|                 self.date == other.date: | ||
|             return True | ||
|         return False
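The link extraction performed in `__post_init__` can be illustrated on a small HTML fragment. This is a sketch under assumptions: `LINK_PATTERN` is a simplified pattern intended to capture the same strings as `LINKS_TEMPLATE`, `extract_links` is a hypothetical stand-in for `html_text_to_list_links`, and the sample description is made up.

```python
import re

# Simplified pattern intended to capture the same URLs as LINKS_TEMPLATE.
LINK_PATTERN = re.compile(r'"(https?://.+?)"')


def extract_links(html_fragment):
    """Return every double- or single-quoted http(s) URL in the fragment."""
    # Single quotes are normalized to double quotes before matching.
    normalized = html_fragment.replace("'", '"')
    return [match.group(1) for match in LINK_PATTERN.finditer(normalized)]


description = (
    '<p><a href="https://example.com/story.html">'
    "<img src='http://example.com/thumb.png'></a>Story text</p>"
)
print(extract_links(description))
```

Because the pattern only looks for quoted URLs, it would also pick up links from a stringified Python list such as `"['https://example.com/a']"`.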
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| from jinja2 import Environment, FileSystemLoader | ||
| import os | ||
|
|
||
| FILENAME_HTML = "articles.html" | ||
|
|
||
|
|
||
| def print_article_list_to_html(list_articles, path): | ||
|     if not os.path.exists(path): | ||
|         raise FileNotFoundError | ||
|     html_stream = print_article_list(list_articles) | ||
|     with open(os.path.join(path, FILENAME_HTML), "w", encoding='utf-8') as html: | ||
|         html.write(html_stream) | ||
|
|
||
|
|
||
| def print_article_list(list_articles): | ||
|     # directory with templates | ||
|     env = Environment(loader=FileSystemLoader('.')) | ||
|     template = env.get_template('template.html') | ||
|     return template.render(articles=list_articles) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| import os | ||
| from fpdf import FPDF | ||
|
|
||
| FILENAME_PDF = "articles.pdf" | ||
|
|
||
|
|
||
| def conv_str(input_str): | ||
|     return (input_str.replace('\u2026', '').replace('\u2019', '').replace('\u201c', '').replace('\u201d', '') | ||
|             .replace('\u2013', '').replace('\u2018', '')) | ||
|
|
||
|
|
||
| class PDF(FPDF): | ||
|
|
||
|     # Page footer | ||
|     def footer(self): | ||
|         # Position at 1.5 cm from bottom | ||
|         self.set_y(-15) | ||
|         # Arial italic 8 | ||
|         self.set_font('Arial', 'I', 8) | ||
|         # Page number | ||
|         self.cell(0, 10, 'Page ' + str(self.page_no()) + '/{nb}', 0, 0, 'C') | ||
|
|
||
|
|
||
| def print_article_list_to_pdf(list_articles, path): | ||
|
|
||
|     if not os.path.exists(path): | ||
|         raise FileNotFoundError | ||
|     path = os.path.join(path, FILENAME_PDF) | ||
|
|
||
|     pdf = PDF() | ||
|     pdf.alias_nb_pages() | ||
|     pdf.add_page() | ||
|     pdf.set_font('Arial', '', 12) | ||
|
|
||
|     for item in list_articles: | ||
|         pdf.cell(0, 10, "Title: %s" % (conv_str(item.title)), 0, 1) | ||
|         pdf.cell(0, 10, "Date: %s" % (conv_str(item.date)), 0, 1) | ||
|         pdf.cell(0, 10, "Link: %s" % (conv_str(item.link)), 0, 1) | ||
|         pdf.multi_cell(0, 10, '%s' % (conv_str(item.article)), 0, 1) | ||
|         # Links are numbered from 1, matching the console output of Article.__str__ | ||
|         for idx, link in enumerate(item.links): | ||
|             pdf.multi_cell(0, 10, "[%d]:%s" % (idx + 1, (conv_str(link))), 0, 1) | ||
|         pdf.cell(0, 10, "", 0, 1) | ||
|     pdf.output(path, 'F') | ||
|     return True |
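`conv_str` deletes a fixed set of typographic characters, presumably because the core Arial font used here only handles Latin-1 text. A more general variant could map those characters to ASCII look-alikes and mask everything else that Latin-1 cannot encode. The sketch below is one possible approach, not the code under review: `ASCII_FALLBACKS` and `to_latin1_text` are hypothetical names, and the Latin-1 limitation is stated as an assumption.

```python
# Typographic characters mapped to plain ASCII look-alikes (illustrative selection).
ASCII_FALLBACKS = str.maketrans({
    "\u2026": "...",
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
    "\u2013": "-",
})


def to_latin1_text(text):
    """Replace common typographic characters, then mask anything outside Latin-1."""
    replaced = text.translate(ASCII_FALLBACKS)
    # errors="replace" substitutes "?" for characters Latin-1 cannot encode.
    return replaced.encode("latin-1", errors="replace").decode("latin-1")


print(to_latin1_text("It\u2019s \u201cfine\u201d \u2013 really\u2026 \u4e2d"))
```

Unlike a fixed list of `replace` calls, this keeps apostrophes and quotes visible in the PDF and does not fail on characters that were not anticipated.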
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
|
|