Zaharadniuk_Aliaksei_fiz.zagorodnAA@gmail.com by AlexSpaceBy · Pull Request #2 · epam-python-courses-7-bsu/FinalTaskRssParser

AlexSpaceBy · 2019-11-03T10:41:50Z

First version of the RssReader program. Since the program (and its logical architecture) was built over the previous week, its final layout (folder locations, file names) does not match the requirements in README.md. README.txt and README.md contain a full description of how the program works, instructions for installing the package (iteration 2, /dist folder, zip archive), and the program structure. A module with unit tests will be added later (I plan to add it after the third iteration). The program uses two third-party libraries: feedparser.py and html2text (a package of several modules). These libraries are already configured for use with the program, so it is recommended to use these copies rather than similar ones downloaded from other repositories.

dzhigailo · 2019-11-03T14:06:12Z

+import sys
+
+
+def get_parse(args_in=''):


Here and below:
It's good practice to add type hints to your function signatures (e.g. def get_parse(args_in: str = '') -> Dict[str, Any]:)

Please take a look at the typing lib.

All lines were found and fixed where appropriate. Will be updated during the next "push" iteration.
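
A minimal sketch of what an annotated signature could look like. The option set mirrors the dict returned later in this PR; the parser body and the choice of a list (rather than a string) for `args_in` are assumptions for illustration, not the author's actual code.

```python
import argparse
from typing import Any, Dict, List, Optional


def get_parse(args_in: Optional[List[str]] = None) -> Dict[str, Any]:
    """Parse command-line arguments and return them as a plain dict."""
    parser = argparse.ArgumentParser(description="RSS reader (illustrative)")
    parser.add_argument("url", help="RSS feed URL")
    parser.add_argument("--json", action="store_true", help="print news as JSON")
    parser.add_argument("--verbose", action="store_true", help="print logs to stdout")
    parser.add_argument("--limit", type=int, default=None, help="max news items")
    # Passing None makes argparse fall back to sys.argv[1:].
    args = parser.parse_args(args_in)
    return vars(args)
```

With the annotations in place, a checker such as mypy can flag callers that pass the wrong type.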

dzhigailo · 2019-11-03T14:08:23Z

+            logs.log_err_exit()
+            parser.exit()
+
+    return {'url': args.url, 'json': args.json, 'verbose': args.verbose, 'limit': args.limit}


Please, take a look at dataclasses lib. It might be better to replace this dict with dataclass object.

Since it may touch the fundamental logic of the program, I'll try to implement dataclasses when I have a working program with at least first four complete iterations.
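
For reference, a dataclass version of that dict might look roughly like this. The class name `Settings` is hypothetical; the field names follow the dict keys in the diff above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Parsed command-line options (field names follow the dict keys above)."""
    url: str
    json: bool = False
    verbose: bool = False
    limit: Optional[int] = None


settings = Settings(url="https://example.com/rss", limit=5)
```

Attribute access (`settings.limit`) replaces string keys, so a misspelled name fails immediately instead of silently returning a wrong value.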

dzhigailo · 2019-11-03T14:39:55Z

+    """This function receives the answer from server"""
+
+    news = feedparser.parse(url)
+    if news.entries:


return news if news.entries else None

The line was found and fixed. It will be updated during the next "push" iteration.

dzhigailo · 2019-11-03T14:45:13Z

+
+    for i in range(3):
+        try:
+            a = urllib.request.urlopen(url).getcode()


Please do not use single-character variable names.

I have checked all one letter variables in the program and fixed it. It will be updated during the next "push" iteration.

dzhigailo · 2019-11-03T14:51:06Z

+    In case of failure, it reconnects in 10 seconds
+    """
+
+    for i in range(3):


This logic retries the request whenever the status code is anything other than 200. But it is useless to retry on, for example, 400 or 500 HTTP errors, invalid urls and the like. You can think about a more optimal solution. urllib3 has its own Retry included.

The function was fixed. It will be updated during the next "push" iteration.
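
One possible shape for such a fix, sketched with the standard library only. The function name `fetch`, the `opener` parameter and the exact set in `TRANSIENT_CODES` are assumptions for illustration; the fixed function itself is not shown in this thread.

```python
import time
import urllib.error
import urllib.request

# Status codes that usually signal a temporary server-side problem (assumed set).
TRANSIENT_CODES = frozenset({502, 503, 504})


def fetch(url, opener=urllib.request.urlopen, attempts=3, delay=10.0):
    """Return the response body, retrying only on transient HTTP errors."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Other 4xx/5xx answers will not change on retry, so fail fast.
            if err.code not in TRANSIENT_CODES or attempt == attempts:
                raise
            time.sleep(delay)
```

Invalid URLs raise `ValueError` or `URLError` from `urlopen`, which this sketch deliberately lets propagate on the first attempt.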

HenadziStantchik · 2019-11-03T15:17:16Z

Since the program (and its logical architecture) was built over the previous week, its final layout (folder locations, file names) does not match the requirements in README.md.

Please make sure that the project structure is the same as specified in our README.md. This is important for CI checks that will be added in the future. Note that we only require that the application entry point should be in final_task/rss_reader/rss_reader.py and that the setup.py file for setuptools should be in final_task/setup.py. Other than that, you are free to structure your project as you see fit.

README.txt and README.md contain a full description of how the program works, instructions for installing the package (iteration 2, /dist folder, zip archive), and the program structure.

Your documentation looks great, but you were supposed to put it in a separate file, final_task/README.md, not in the file containing instructions for PR creation. Please move your documentation there.

The program uses two third-party libraries: feedparser.py and html2text (a package of several modules). These libraries are already configured for use with the program, so it is recommended to use these copies rather than similar ones downloaded from other repositories.

While we allow using 3rd party libraries in your final project, it is really bad practice to copy their code directly, even if you changed it a little. Usually you want to use it through imports. If you need to change its functionality a little, implement your own class that inherits from the one in the library and reimplement the required behavior there.

Please fix these issues, especially the last one.
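
To illustrate the inheritance approach, here is a small sketch. It uses the standard library's `HTMLParser` as a stand-in for a third-party class; `LinkCollector` is a hypothetical name, and the same pattern would apply to a class imported from an installed package.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Extend the library class instead of editing a copied source file."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Override one hook; everything else is inherited unchanged.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


collector = LinkCollector()
collector.feed('<p>Read <a href="https://example.com/news">more</a></p>')
```

The library stays an ordinary dependency, so upstream fixes arrive with a normal upgrade.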

AlexSpaceBy · 2019-11-07T16:14:56Z

New version of the program. It includes the second and third iterations. The program structure was changed to fully match the requirements, and the code was updated. Most of the review comments were addressed. Some functions were rewritten completely. Added an option to save news to local storage and to retrieve news from local storage for printing to the console. The setup.py file was rewritten, and the feedparser and html2text libraries were removed. On installation, the program automatically downloads the libraries it needs to work correctly. It was tested on Windows 10 (1903) and on Linux Fedora 30 with Python 3.8.

HenadziStantchik · 2019-11-07T21:34:59Z

The project structure still does not exactly match the requirements: the rss_reader folder is supposed to be inside the final_task folder.
Please describe how to launch your program in your documentation, because I could not do it. I kept getting ModuleNotFoundError.

AlexSpaceBy · 2019-11-08T11:13:26Z

Version 3.1 (first three iterations). Fixed the ModuleNotFoundError by adding the file search path to the environment variables. Addressed all comments about imports and changed the program structure. Added a chapter to README.MD on launching the application, with example commands. It describes running the "raw" uninstalled package from the rss_reader directory, building the package for installation, installing the package, and running the installed package.

HenadziStantchik · 2019-11-08T12:37:32Z

Here is feedback on your application work:

If the user does not specify a limit, they get only 1 piece of news. They should get the whole available feed in this case.
--verbose is supposed to output logs while the script is running, not after it completes its work.
Date: in the news is supposed to be the publication date, not the date/time when you parsed the news.
The same applies to storing the news in the cache.
--limit does not work with the --date argument.
If you specify a limit larger than the feed size, you get an IndexError exception with a traceback printed in stdout.
Your news cache just appends new entries to the end of the file, even if the same news is already in it. As a result, if I run your app 5 times with --limit 1 and then run it with --date <current_date>, I get the same piece of news 5 times.

The 3rd iteration is not as trivial as it looks; that is exactly why we ask you to describe your method of storing cached news.
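
One way to avoid duplicate entries, sketched under assumptions: the cache is a JSON file keyed by each item's link, and every item carries `link` and `published` (YYYY-MM-DD) keys. These names and the file layout are hypothetical and not taken from the PR.

```python
import json
from pathlib import Path


def save_news(cache_path, items):
    """Merge items into a JSON cache, using each item's link as a unique key."""
    path = Path(cache_path)
    cache = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    for item in items:
        # Re-running the reader overwrites the same key instead of appending.
        cache[item["link"]] = item
    path.write_text(json.dumps(cache, ensure_ascii=False, indent=2), encoding="utf-8")


def load_news(cache_path, date, limit=None):
    """Return cached items published on `date` (YYYY-MM-DD), at most `limit`."""
    path = Path(cache_path)
    cache = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    matches = [item for item in cache.values() if item["published"] == date]
    # Slicing never raises IndexError, even when limit exceeds the list length.
    return matches[:limit]
```

Filtering on the stored publication date also means `--date` and `--limit` can be combined without special cases.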

HenadziStantchik · 2019-11-08T12:39:57Z

+        except SystemExit:
+            logs.log_err_init_args(args_in)
+            logs.log_err_exit()
+            parser.exit()


Here and below: It is better to have only one exit point in the program. For example you can raise your custom exception here and handle it in the main function.

It was fixed for major cycles. The rest will be fixed after fourth iteration.
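
A rough sketch of the single-exit-point idea. `RssReaderError`, `parse_args` and the body of `main` are hypothetical names and code, shown only to illustrate the pattern.

```python
import argparse
import sys


class RssReaderError(Exception):
    """Base class for errors the application reports to the user."""


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="rss_reader")
    parser.add_argument("url")
    parser.add_argument("--limit", type=int)
    try:
        return parser.parse_args(argv)
    except SystemExit as exc:
        if exc.code in (0, None):
            raise  # --help exits successfully; let it through
        # argparse exits on bad input; convert that into an ordinary exception.
        raise RssReaderError("invalid command-line arguments") from exc


def main(argv=None):
    """The only place that decides the exit status."""
    try:
        args = parse_args(sys.argv[1:] if argv is None else argv)
    except RssReaderError as err:
        print(f"Error: {err}", file=sys.stderr)
        return 1
    print(f"Would read {args.url}")
    return 0
```

A script entry point would then call `sys.exit(main())` once, so no helper has to call `exit()` itself.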

HenadziStantchik · 2019-11-08T12:48:58Z

+                    flag = True
+    except FileNotFoundError:
+        print('Error: File not found. (Maybe this is a first time you are running a program)')
+        return 1


Here and below: while returning custom 'statuses' is sometimes common practice, it really dampens code readability. It is better to raise/reraise an exception here and handle it in the main thread. If everything went well, just return True.

It will be fixed after the fourth iteration.
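
For illustration, the cache-reading branch could reraise instead of returning 1. `CacheError` and `read_cache` are hypothetical names; the actual function is only partly visible in the diff.

```python
class CacheError(Exception):
    """Raised when the local news cache cannot be read."""


def read_cache(path):
    """Return the cache text or raise CacheError; never return a status code."""
    try:
        with open(path, encoding="utf-8") as cache_file:
            return cache_file.read()
    except FileNotFoundError as err:
        # Keep the original exception attached for debugging.
        raise CacheError(
            f"cache file {path!r} not found; run the reader with a URL first"
        ) from err
```

The caller no longer has to remember what a return value of 1 means; it either gets data or handles one well-named exception.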

AlexSpaceBy · 2019-11-14T19:20:12Z

Program version 3.5. Fixed the main comments about the limit, date, and verbose flags and their combinations. The third iteration was extended: before saving a news item, the program checks the journal for a duplicate. The fourth iteration was implemented: added conversion to pdf format, following the recent recommendations in slack. Using the conversion is described in detail in README.md.

HenadziStantchik · 2019-11-15T10:40:11Z

+        # Everything deals with time delay will be repeated
+        except urllib.error.HTTPError as http_err:
+            if http_err.code in (503, 504, 522, 524):
+                print('')


Here, above and below: It is better to think of some other way to make an empty line instead of printing ''

It will be fixed during final commit.
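
Two common alternatives, as a small sketch. The helper `format_retry_notice` and its message text are made up for the example.

```python
def format_retry_notice(code, delay):
    """Build the whole message, blank line included, as one string."""
    return f"\nServer answered {code}; retrying in {delay} seconds"


print(format_retry_notice(503, 10))
print()  # print() with no arguments also emits just a newline
```

Putting the newline into the message keeps the layout next to the text it belongs to.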

HenadziStantchik · 2019-11-15T11:14:43Z

Here is some feedback on your application:

--date does not work as expected: date user specifies is supposed to be publication date, not parsing date:

--verbose argument is supposed to print logs together with news, not instead of them.
Currently it is impossible to read news from local cache without specifying RSS url.
ARIALUNI.TTF file not found while converting news to pdf format. Got an exception with traceback in stdout.
Ensure that you have no encoding problems in the converted pdf file.
--verbose in tandem with -p is not supposed to convert log journal to pdf, just to print logs as you convert news to pdf in stdout.
You should convert news from local cache instead of internet source only when user specified --date option. Right now you do it every time.

AlexSpaceBy · 2019-11-20T18:15:05Z

Fourth iteration: added conversion to html format. Improved conversion to pdf. Fixed the date display bug and improved log output (logs are now printed together with the news). Improved conversion of news from local storage (conversion no longer requires Internet access). Fixed the font error during news conversion. Improved argument input (entering a url for the positional argument is no longer required to use the program). Addressed a number of code comments.
P.S.
Because of a mistake while adding a commit to the branch, two identical commits were pushed a few minutes apart. Please consider only the last commit when reviewing.

HenadziStantchik · 2019-11-21T10:28:03Z

Because of a mistake while adding a commit to the branch, two identical commits were pushed a few minutes apart. Please consider only the last commit when reviewing.

You can squash these commits

HenadziStantchik · 2019-11-21T10:41:14Z

Here is some feedback on your app work:

Your app has encoding problems:

Same for --json, and in pdf converted file

--date does not work with --json
HTML converted file has encoding problems. (Firefox browser, Ubuntu OS):
(optional) It would be nice to make pdf converted file name same as html converted file name (e.g. with date)
Logs should not be converted to pdf or html. If --verbose option is specified then logs should be printed in stdout as the conversion goes.
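
The thread does not say what caused the encoding problems. One common cause with non-ASCII feeds is relying on platform defaults, so the following is only a sketch of the usual precautions; all function names here are hypothetical.

```python
import json
from html import escape


def news_to_json(items):
    # ensure_ascii=False keeps Cyrillic text readable instead of \uXXXX escapes.
    return json.dumps(items, ensure_ascii=False, indent=2)


def news_to_html(items):
    rows = "\n".join(f"<h2>{escape(item['title'])}</h2>" for item in items)
    # Declaring the charset lets the browser decode the file as UTF-8.
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head>\n'
        f"<body>\n{rows}\n</body></html>"
    )


def write_text(path, text):
    # An explicit encoding avoids depending on the platform default.
    with open(path, "w", encoding="utf-8") as out:
        out.write(text)
```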

HenadziStantchik · 2019-11-21T10:43:37Z

+    return {'url': args.url,
+            'json': args.json,
+            'verbose': args.verbose,
+            'limit': args.limit,
+            'date': args.date,
+            'pdf': args.pdf,
+            'html': args.html}


A question:

You already have all arguments inside args object. Do you really need to make this dict here? Why not just pass args everywhere instead?

I use this data in the rss_reader.py module. It is more convenient for me to use it in this form. In my opinion, it makes my code easier to read.
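
For comparison, a sketch of both options side by side. The parser definition is a shortened, assumed version of the real one; `vars()` on an argparse `Namespace` is standard behavior.

```python
import argparse

parser = argparse.ArgumentParser(prog="rss_reader")
parser.add_argument("url", nargs="?")
parser.add_argument("--json", action="store_true")
parser.add_argument("--limit", type=int)

args = parser.parse_args(["https://example.com/rss", "--limit", "2"])

# Attribute access on the Namespace works without any extra dict...
print(args.url, args.limit)

# ...and vars() yields a dict when a mapping is really needed.
commands = vars(args)
```

Either way, a newly added option shows up automatically instead of having to be copied into a hand-written dict.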

HenadziStantchik · 2019-11-21T10:52:33Z

+    pdf = FPDF()
+    pdf.set_font('Arial', 'B', size=14)
+    pdf.set_margins(10, 10, 10)
+    pdf.add_page(('P', 'A4'))


You had the same chunk of code above, it is better to move it to a separate function

This code is not the same as the chunk above. The main difference is the font: arial_uni in the first case, Arial in the second.

HenadziStantchik · 2019-11-21T10:54:08Z

+                url = line[73:]
+                info = line[0:66]


These numbers are confusingly specific, please leave a comment here why exactly it is 73 and 66

The function deals with logs that follow a specific template: the first 66 characters hold technical information, the rest holds the url. It is convenient for me to slice it this way.
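
The same slicing with the explanation attached to the numbers might look like this. The constant names and `split_log_line` are hypothetical; what sits between columns 66 and 73 is not described in the thread.

```python
# Assumed log layout: a fixed-width info field, a separator, then the URL.
INFO_WIDTH = 66   # technical information occupies columns 0..65
URL_OFFSET = 73   # the URL starts at column 73


def split_log_line(line):
    """Split one log line into its (info, url) parts."""
    info = line[:INFO_WIDTH]
    url = line[URL_OFFSET:]
    return info, url
```

If the log template ever changes, only the two constants need updating.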

HenadziStantchik · 2019-11-21T10:55:14Z

+def log_log_html():
+    """This function logs the conversion of log journal to html"""
+
+    logging.info('Log journal was converted to html')


Consider using logging decorator instead of this module.
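
A minimal sketch of such a decorator. The name `logged` and the decorated `convert_to_html` placeholder are assumptions, not code from the PR.

```python
import functools
import logging


def logged(message):
    """Log `message` at INFO level after the wrapped function returns."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            logging.info(message)
            return result
        return wrapper
    return decorator


@logged("News was converted to html")
def convert_to_html(news):
    # Placeholder body; the real conversion lives elsewhere.
    return f"<p>{news}</p>"
```

One decorator replaces a family of one-line `log_*` functions, and the message sits next to the function it describes.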

HenadziStantchik · 2019-11-21T10:57:30Z

+def convert_date(date: str):
+    """This function converts date"""
+    month = {'Jan': '1',
+             'Feb': '2',
+             'Mar': '3',
+             'Apr': '4',
+             'May': '5',
+             'Jun': '6',
+             'Jul': '7',
+             'Aug': '8',
+             'Sep': '9',
+             'Oct': '10',
+             'Nov': '11',
+             'Dec': '12'}
+    day = date[5:7]
+    month_num = month[date[8:11]]
+    year = date[12:16]


same: datetime lib

Since the data from argparse is a string, it is more convenient to use a separate function instead of datetime.

This function is still redundant. It is better to use an existing, optimized solution that works in any situation than to implement a new one.
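
As a sketch of the standard-library route: the slices in `convert_date` suggest an RFC 822 style date such as "Thu, 21 Nov 2019 10:57:30 +0000", which `email.utils.parsedate_to_datetime` parses directly. The function names and the YYYYMMDD form of the `--date` value are assumptions.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def published_date(raw):
    """Convert an RFC 822 date string to 'YYYY-MM-DD'."""
    return parsedate_to_datetime(raw).strftime("%Y-%m-%d")


def parse_date_argument(value):
    """Parse a --date value in the assumed YYYYMMDD form."""
    return datetime.strptime(value, "%Y%m%d").date()
```

Both calls raise on malformed input, which is easier to handle than a silent `KeyError` from a month lookup table.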

HenadziStantchik · 2019-11-21T10:58:21Z

+    logs.new_session()
+    commands = args_parser.get_parse()
+
+    while True:


It is better to just remove user input above and get rid of this while completely. This is not how app is supposed to work

The structure of the program was fixed, there is no more while cycle. It will be updated during the next commit.

HenadziStantchik · 2019-11-21T10:59:21Z

+
+
+if __name__ == '__main__':
+    main()


The main function is too large. It is better to split it into smaller functions.

AlexSpaceBy · 2019-11-24T13:48:21Z

Final version of the RSS Reader program (four iterations): fixed the main comments, changed the program logic, extended the functionality.

HenadziStantchik · 2019-12-03T10:10:42Z

+    data['link'] = rss.entries[limit].link
+    data['title'] = rss.entries[limit].title
+
+    """news_date for third iteration, yyyy-mm-dd"""


Comments are usually written using #

HenadziStantchik · 2019-12-03T10:11:19Z

+    if data['image'] is not None:
+        print('Image: ' + data['image'])
+        print('. . . . . . . . . . . . . . . . . . . . . .')
+        print('')


It is better to think of some other way for printing an empty line instead of using print("")

HenadziStantchik · 2019-12-03T10:33:20Z

Here is feedback on your application:
Iteration 1:
Everything works fine.
Iteration 2:
Installs fine. All iterations work.
Iteration 3:
Everything works fine.
Iteration 4:
Everything works fine.

The code is readable, decently structured, mostly corresponds to the PEP8 guidelines.
Tests are decent, but the coverage percentage is not high enough.
Commit messages are somewhat informative, but do not correspond to the recommended commit message style.
Overall aside from some commented issues, good job.

RssReader version 1.0 first and second iteration

bd69c4f

dzhigailo reviewed Nov 3, 2019

HenadziStantchik added the reviewed and question labels and removed the question and reviewed labels Nov 5, 2019

Third iteration with main fixes

a0089bd

HenadziStantchik reviewed Nov 7, 2019

Comment thread README.md Outdated

HenadziStantchik reviewed Nov 7, 2019

Comment thread rss_reader/args_parser.py Outdated

HenadziStantchik reviewed Nov 7, 2019

Comment thread rss_reader/rss_reader.py Outdated

Improved version 3.1 with resolved import dependencies and expanded r…

bba4bf6

…eadme.md

HenadziStantchik reviewed Nov 8, 2019

Comment thread final_task/rss_reader/rss_reader.py Outdated

HenadziStantchik reviewed Nov 8, 2019

Comment thread final_task/rss_reader/args_parser.py

HenadziStantchik reviewed Nov 8, 2019

Comment thread final_task/rss_reader/other.py Outdated

v 3.5 All major problems were fixed. PDF converter added.

99730a9

HenadziStantchik reviewed Nov 15, 2019

Comment thread final_task/rss_reader/rss_parser.py Outdated

HenadziStantchik reviewed Nov 15, 2019

Comment thread final_task/rss_reader/rss_parser.py Outdated

HenadziStantchik reviewed Nov 15, 2019

Comment thread final_task/rss_reader/news.py Outdated

Fourth iteration. Improved pdf and html conversion.

fe93945

Fourth Iteration. Improved pdf and html convertion

9821c26

AlexSpaceBy changed the title ~~Zaharadniuk_Aliaksei_fiz.zagorodnAA@gmail.com~~ Zaharadniuk_Aliaksei_fiz.zagorodnAA@gmail.com Please Review Nov 20, 2019