diff --git a/README.md b/README.md
deleted file mode 100644
index f3171ab..0000000
--- a/README.md
+++ /dev/null
@@ -1,36 +0,0 @@
-# FinalTaskRssReader
-For final task pull requests.
-
-
-## How to create a pull request
-
-1. Create github account. *Preferrably using email you used when registerer on this course*
-2. Fork this repository. ('Fork' button at the top right of this repository page)
-3. Open the page of your *new repository* that was created when you forked this repo.
-4. Press button clone or download at the middle right of the page and CTRL-C the url.
-5. On your machine go to the directory you want.
-6. Depending on the OS you are working with, open GitBash(Windows)/Command Line or Terminal(Linux) there
-7. Use command `git clone `
-
-Congrats! You have successfully forked our repository.
-
-
-## Additional project structure requirements
-
-1. `setup.py` file for setuptools *must* be in the root of `final_task` folder. Use `setup.py` that is already there. (that means path to this file must end with `final_task/setup.py` )
-2. Entry point to your application, aka its main module *must* be named as `rss_reader.py` . Use `rss_reader.py` that is already in `rss_reader` folder.
-3. You should describe how does your project work, how to launch it and etc in README.md in the `final_task/README.md` file.
-4. If you used any non-standart libraries they must be listed in `rss_reader/requirements.txt` file.
-5. All unit test files should be in separate folder called `tests`.
-
-
-## Pull request requirements(!!!)
-
-1. When creating pull request make sure that `target branch` is `master` on OUR repo, not yours.
-2. Pull request name *MUST* be in format: `YourFirstName_YourLastName_EmailYouUsedWhileRegisteringOnThisCourse`
-3. Pull request which have any other name format, or invalid e-mail *will be ignored completely until you fix it*. So make sure you specified correct e-mail.
-4. In pull request description specify your current iteration.
You also can add there any other info you want us to know before we start code review.
-5. *Pull request must NOT contain any .pyc files, any virtual environment files/folders, any IDE technical files*.
-
-
-
diff --git a/final_task/FinalTask.md b/final_task/FinalTask.md
deleted file mode 100644
index 1515169..0000000
--- a/final_task/FinalTask.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# Introduction to Python. Hometask
-You are proposed to implement Python RSS-reader using **python 3.8**.
-
-The task consists of few iterations. Do not start new iteration if the previous one is not implemented yet.
-
-## Common requirements
-* It is mandatory to use `argparse` module.
-* Codebase must be covered with unit tests with at least 50% coverage.
-* In case of any mistakes utility should print human-readable
-error explanation. Exception tracebacks in stdout are prohibited in final version of application.
-* Docstrings are mandatory for all methods, classes, functions and modules.
-* Code must correspond to `pep8` (use `pycodestyle` utility for self-check).
-  * You can set line length up to 120 symbols.
-* Commit messages should provide correct and helpful information about changes in commit. Messages like `Fix bug`,
-`Tried to make workable`, `Temp commit` and `Finally works` are prohibited.
-* Usage of external APIs is prohibited (except of APIs for receiving RSS)
-
-## [Iteration 1] One-shot command-line RSS reader.
-RSS reader should be a command-line utility which receives [RSS](wikipedia.org/wiki/RSS) URL and prints results in human-readable format.
-
-You are free to choose format of the news console output.
The textbox below provides an example of how it can be implemented:
-
-```shell
-$ rss_reader.py "https://news.yahoo.com/rss/" --limit 1
-
-Feed: Yahoo News - Latest News & Headlines
-
-Title: Nestor heads into Georgia after tornados damage Florida
-Date: Sun, 20 Oct 2019 04:21:44 +0300
-Link: https://news.yahoo.com/wet-weekend-tropical-storm-warnings-131131925.html
-
-[image 2: Nestor heads into Georgia after tornados damage Florida][2]Nestor raced across Georgia as a post-tropical cyclone late Saturday, hours after the former tropical storm spawned a tornado that damaged
-homes and a school in central Florida while sparing areas of the Florida Panhandle devastated one year earlier by Hurricane Michael. The storm made landfall Saturday on St. Vincent Island, a nature preserve
-off Florida's northern Gulf Coast in a lightly populated area of the state, the National Hurricane Center said. Nestor was expected to bring 1 to 3 inches of rain to drought-stricken inland areas on its
-march across a swath of the U.S. Southeast.
-
-
-Links:
-[1]: https://news.yahoo.com/wet-weekend-tropical-storm-warnings-131131925.html (link)
-[2]: http://l2.yimg.com/uu/api/res/1.2/Liyq2kH4HqlYHaS5BmZWpw--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/https://media.zenfs.com/en/ap.org/5ecc06358726cabef94585f99050f4f0 (image)
-
-```
-
-Utility should provide the following interface:
-```shell
-usage: rss_reader.py [-h] [--version] [--json] [--verbose] [--limit LIMIT]
-                     source
-
-Pure Python command-line RSS reader.
-
-positional arguments:
-  source         RSS URL
-
-optional arguments:
-  -h, --help     show this help message and exit
-  --version      Print version info
-  --json         Print result as JSON in stdout
-  --verbose      Outputs verbose status messages
-  --limit LIMIT  Limit news topics if this parameter provided
-
-```
-
-In case of using `--json` argument your utility should convert the news into [JSON](https://en.wikipedia.org/wiki/JSON) format.
-You should come up with the JSON structure on you own and describe it in the README.md file for your repository or in a separate documentation file.
-
-The `--limit` argument should also affect JSON generation.
-
-With the argument `--verbose` your program should print all logs in stdout.
-
-Withe the argument `--version` your program should print in stdout it's current version and complete it's work. The version supposed to change with every iteration.
-
-
-## [Iteration 2] Distribution
-
-* Utility should be wrapped into distribution package with `setuptools`.
-* This package should export CLI utility named `rss-reader`.
-
-> Note: Double-check, that your utility works correctly after its new package was installed on a clean machine.
-
-## [Iteration 3] News caching
-The RSS news should be stored in a local storage while reading. The way and format of this storage you can choose yourself.
-Please describe it in a separate section of README.md or in the documentation.
-
-New optional argument `--date` must be added to your utility. It should take a date in `%Y%m%d` format.
-For example: `--date 20191020`
-
-The cashed news can be read with it. The new from the specified day will be printed out.
-If the news are not found return an error.
-
-If the `--date` argument is not provided, the utility should work like in the previous iterations.
-
-## [Iteration 4] Format converter
-
-You should implement the conversion of news in at least two of the suggested format: `.mobi`, `.epub`, `.fb2`, `.html`, `.pdf`
-
-New optional argument must be added to your utility. This argument receives the path where new file will be saved. The arguments should represents which format will be generated.
-
-For example: `--to-mobi` or `--to-fb2` or `--to-epub`
-
-
-You can choose yourself the way in which the news will be displayed, but the final text result should contain pictures and links, if they exist in the original article and if the format permits to store this type of data.
- ## * [Iteration 5] Output colorization
-> Note: An optional iteration, it is not necessary to implement it. You can move on with it only if all the previous iterations (from 1 to 4) are completely implemented.
-
-You should add new optional argument `--colorize`, that will print the result of the utility in colorized mode.
-
-If the argument is not provided, the utility should work like in the previous iterations.
-
-> Note: Take a look at the [colorize](https://pypi.org/project/colorize/) library
-
-## * [Iteration 6] Web-server
-> Note: An optional iteration, it is not necessary to implement it. You can move on with it only if all the previous iterations (from 1 to 4) are completely implemented. Introduction to Python course does not cover the topics that are needed for the implementation of this part.
-
-There are several mandatory requirements in this iteration:
-* `Docker` + `docker-compose` usage (at least 2 containers: one for web-application, one for DB)
-* Web application should provide all the implemented in the previous parts of the task functionality, using the REST API:
-  - One-shot conversion from RSS to Human readable format
-  - Server-side news caching
-  - Conversion in epub, mobi, fb2 or other formats
-
-Feel free to choose the way of implementation, libraries and frameworks. (We suggest you `Django Rest Framework` + `PostgreSQL` combination)
-
-You can implement any functionality that you want. The only requirement is to add the description into README file or update project documentation, for example:
-* authorization/authentication
-* automatic scheduled news update
-* adding new RSS sources using API
-
-
-
----
-Implementations will be checked with the latest cPython interpreter of 3.8 branch.
----
-
-
-> Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. Code for readability. **John F.
Woods**
diff --git a/final_task/README.md b/final_task/README.md
index 7af281f..c7bab73 100644
--- a/final_task/README.md
+++ b/final_task/README.md
@@ -1,3 +1,437 @@
-# Your readme here
-Some text.
-Checkout how to write this file using *markdown*.
+# RSS Reader version 4.0 by AlexSpaceBy (fiz.zagorodnAA@gmail.com)
+
+First, second, third and fourth iterations.
+
+## General description:
+
+The RssReader program takes an RSS URL, receives the news and prints it to the console in the following form:
+
+*========================================== RSS Reader ==============================*
+*Feed: Channel feed*
+*Title: Title of the news*
+*Date: Day, Month year hh.mm.ss*
+*Link: Link to full news*
+
+*TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT*
+*TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT*
+*TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT*
+
+*Image: Image for the news, if available.*
+*(News separator). . . . . . . . . . . . . . . . .*
+
+*Next news*
+
+*...*
+
+*=================================End for news=======================================*
+
+To run the program, do the following:
+
+*For Windows:*
+ 1. Run the console as Administrator (mandatory).
+ 2. Type the following: `rss-reader [url] [flags] [parameters]`
+*For Linux:*
+ 1. Type the following: `sudo rss-reader [url] [flags] [parameters]`
+
+The program has a number of flags, described below, that control its behavior.
+The program logs all events to the `logJournal.log` file and writes all news to the `news.log` file.
+
+## Program flags
+
+The program's behavior is controlled by the following flags:
+
+`-v` - Print the version of the program to the console (use: `-v`).
+`-h` - Print the help message (use: `-h`).
+`-b` - Print logs from logJournal to the console (use: `[url] -b`).
+`-j` - Print news in JSON format to the console (use: `[url] -j`).
+`-l` - Limit the number of news topics (use: `[url] -l [number]`).
+`-d` - Print the news for the given date from the local history (use: `[url] -d [YYYYmmdd]`).
+`-p` - Convert news to pdf and store it locally (use: `[url] -p [destination folder]`).
+`-hl` - Convert news to html and store it locally (use: `[url] -hl [destination folder]`).
+
+
+## logJournal
+
+There are three main types of events:
+
+*1. INFO*
+*2. WARNING*
+*3. ERROR*
+
+To see the log journal, use the `[-b]` flag.
+
+*For Windows:* Log Journal is stored in the default Python directory (in my case: `C:\Program Files\Python38\Lib\site-packages\rss_reader`).
+
+
+*For Linux:* Log Journal is stored in the default Python directory (in my case: `/usr/local/lib/python3.7/site-packages/rss_reader`).
+
+## Program logic description
+
+The program has a built-in checker that verifies whether `[url]` is an actual URL.
+The URL must have the following format: `https://nameofthesite.domain[/rest part of the link]` (`http://` is accepted as well).
+If `[url]` is wrong, the program suggests that you either change `[url]` or quit.
+If `[url]` looks like a URL, the program checks whether there is a server on the other side.
+If the server is not available, the program tries to reconnect to it after 10 seconds. Three
+attempts will be made. If the server is still not available, the program asks you either to change `[url]`
+or to quit. If there is a server on the other side, the program tries to fetch the RSS feed. If `[url]` leads to a
+site, server, or anything else that does not have an RSS feed, the program asks you either to change `[url]` or to quit.
+If everything is OK, the program prints the news to the console. The `[-l]` flag sets the limit for the news printed to the console.
+By default the program stores the news in the local news.log file. The `[-d]` flag with a date in `YYYYmmdd` format prints all news corresponding to
+the `YYYYmmdd` date. News storage is described in detail below. There is also an option to convert news to pdf or html. The detailed
+description of this option is provided below.
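The URL check and the reconnection loop described above can be illustrated with a minimal sketch. It is not the program's actual code: `looks_like_url()` reuses the regular expression from `validate_url()` in `args_parser.py` (written as a raw string and checked with `re.fullmatch` instead of comparing lengths), while `fetch_with_retries()`, its parameters and the use of `urllib.request` for downloading are assumptions made for illustration; the program itself may fetch the feed differently (for example through feedparser). The sketch also assumes that "three attempts" means three attempts in total.

```python
import re
import time
import urllib.request
from typing import Callable, Optional

# Same pattern as validate_url() in args_parser.py, written as a raw string.
URL_PATTERN = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')


def looks_like_url(url: str) -> bool:
    """Return True only if the whole string matches the URL pattern."""
    return URL_PATTERN.fullmatch(url) is not None


def _download(url: str) -> bytes:
    """Hypothetical default downloader based on the standard library."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def fetch_with_retries(url: str, attempts: int = 3, delay: float = 10.0,
                       fetch: Callable[[str], bytes] = _download) -> Optional[bytes]:
    """Try to download url up to `attempts` times, sleeping `delay` seconds between tries.

    Returns the raw response body, or None if the server never answered.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError as error:  # URLError and socket timeouts are OSError subclasses
            print(f'Attempt {attempt} of {attempts} failed: {error}')
            if attempt < attempts:
                time.sleep(delay)  # wait before reconnecting
    return None
```

A `None` result corresponds to the point where the program asks you to change `[url]` or quit. Passing `fetch` as a parameter keeps the retry logic testable without a network connection.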
+
+## JSON format
+
+There is a built-in JSON converter that converts the output to the format specified below. The converter is invoked with the `[-j]` flag.
+It prints the news in JSON format to the console (using the json library). The program uses the following JSON format for a single news item:
+
+{
+"feed": string,
+"link": string,
+"title": string,
+"date": datetime,
+"description": string,
+"image": string
+}
+
+The whole feed can be stored in JSON format using the following structure:
+
+{
+int: {
+ "feed": string,
+ "link": string,
+ "title": string,
+ "date": datetime,
+ "description": string,
+ "image": string
+ },
+int: {
+ "feed": string,
+ "link": string,
+ "title": string,
+ "date": datetime,
+ "description": string,
+ "image": string
+ },
+...
+int: {
+ "feed": string,
+ "link": string,
+ "title": string,
+ "date": datetime,
+ "description": string,
+ "image": string
+ }
+}
+
+The maximum int corresponds either to the `[-l]` value or to the total number of news items provided by the server (for some feeds it can be 50, for others 100).
+
+## newsLog journal
+
+By default RssReader stores all news taken from the Internet in the news.log file in the following format:
+
+*!!!#####STARTOFNEWS#####!!!YYYYmmdd*
+*Feed: (line with feed)*
+*Title: (line with title)*
+*Date: (line with date)*
+*Link: (line with link)*
+*Image: (line with image)*
+
+*TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT*
+*TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT*
+*TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT*
+
+*. . . . . . . . . . . . . . . . . . .*
+*!!!#####ENDOFNEWS#####!!!YYYYmmdd*
+
+The first and the last lines mark the beginning and the end of a news item respectively. When you send the `-d YYYYmmdd` command, the program searches
+through the whole journal for news that start with `!!!#####STARTOFNEWS#####!!!YYYYmmdd` and end with `!!!#####ENDOFNEWS#####!!!YYYYmmdd` and
+prints all lines in between.
There is a built-in function that performs a preliminary check of the date. If the date is in the future, the program will tell you about it.
+If there is no news.log file (this may happen if you run the program for the first time, because news.log does not exist yet) or there is
+no news that corresponds to the specified `YYYYmmdd` date, the program will tell you about it.
+There is a built-in function that checks whether the news has already been added to the local storage. If so, the program logs the attempt and does not add it to
+the local storage again. To do this, the program takes the link of the news and checks whether a news item with the same link is already stored.
+
+Typical usage (provided you have the news log in your local storage; if not, just provide a valid RSS URL to create one):
+
+ `rss-reader -d 20191119` - prints all news for 2019.11.19 to the console from local storage
+
+ `rss-reader -d 20191119 -l 10` - prints 10 news for 2019.11.19 to the console from local storage
+
+*For Windows:* News Journal is stored in the default Python directory (in my case: `C:\Program Files\Python38\Lib\site-packages\rss_reader`).
+
+*For Linux:* News Journal is stored in the default Python directory (in my case: `/usr/local/lib/python3.7/site-packages/rss_reader`).
+
+## Convert news to pdf or html
+
+The program can convert the news to pdf or html. It can take news either from the Internet or from local storage. The program converts news to pdf when the `[-p]` flag is given.
+The program converts news to html when the `[-hl]` flag is given. In the document the program provides all data about the news, including the image and link. If the image is unavailable or corrupted,
+the program will let you know about it. The image and link are interactive and can be clicked to open the source URL. To save disk space, the program caches only the text
+information and the link to the news. When you convert news from local storage, the program downloads the picture from the Internet.
If there is no Internet connection, the program still converts the
+news from local storage, but instead of the picture it draws a notice that there is no Internet connection.
+
+Typical usage of the pdf and html converters (if the program was installed from the package):
+
+ `rss-reader https://news.yahoo.com/rss/ -p C:\Users\User\Destination_Folder\` - convert all news from the feed to pdf.
+
+ `rss-reader https://news.yahoo.com/rss/ -p C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the feed to pdf.
+
+ `rss-reader https://news.yahoo.com/rss/ -p C:\Users\User\Destination_Folder\ -l 10 -j` - convert 10 news from the json feed to pdf.
+
+ `rss-reader -b -p C:\Users\User\Destination_Folder\` - convert log journal to pdf.
+
+ `rss-reader -d 20191119 -p C:\Users\User\Destination_Folder\` - convert all news from the local storage to pdf for 20191119.
+
+ `rss-reader -d 20191119 -p C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the local storage to pdf for 20191119.
+
+ `rss-reader https://news.yahoo.com/rss/ -hl C:\Users\User\Destination_Folder\` - convert all news from the feed to html.
+
+ `rss-reader https://news.yahoo.com/rss/ -hl C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the feed to html.
+
+ `rss-reader -b -hl C:\Users\User\Destination_Folder\` - convert log journal to html.
+
+ `rss-reader -d 20191119 -hl C:\Users\User\Destination_Folder\` - convert all news from the local storage to html for 20191119.
+
+ `rss-reader -d 20191119 -hl C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the local storage to html for 20191119.
+
+
+
+The `Destination_Folder\` is the folder where the converted files will be stored. The separator at the end of the path (`\` on Windows, `/` on Linux) is mandatory for the program. If `Destination_Folder` does not exist,
+or you have no permission to create files inside it, the program will let you know. Since pdf conversion is a heavyweight procedure, it can take time. If there are no errors, just wait until the conversion finishes.
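The html conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code of the program's `converter.py`: the functions `news_to_html()` and `write_html()`, the output file name `news.html` and the `image_available` switch (standing in for the check whether the picture could be downloaded) are hypothetical; the item keys follow the JSON format described in this README.

```python
import html
import os


def news_to_html(item: dict, image_available: bool) -> str:
    """Render one news item (keys as in the JSON format) into an HTML fragment."""
    title = html.escape(item.get('title', ''))
    link = html.escape(item.get('link', ''))
    parts = [
        f'<h2>{title}</h2>',
        f'<p>Feed: {html.escape(item.get("feed", ""))}</p>',
        f'<p>Date: {html.escape(str(item.get("date", "")))}</p>',
        f'<p><a href="{link}">{link}</a></p>',
    ]
    image = item.get('image')
    if image and image_available:
        # Clicking the picture opens the original article.
        parts.append(f'<a href="{link}"><img src="{html.escape(image)}" alt="{title}"></a>')
    elif image:
        # Notice drawn instead of the picture when it cannot be downloaded.
        parts.append('<p><i>Image is not available: no Internet connection.</i></p>')
    parts.append(f'<p>{html.escape(item.get("description", ""))}</p>')
    return '\n'.join(parts)


def write_html(items: list, folder: str, image_available: bool = True,
               filename: str = 'news.html') -> str:
    """Write all items into one HTML file inside folder and return its path."""
    if not os.path.isdir(folder):
        raise FileNotFoundError(f'Destination folder does not exist: {folder}')
    if not os.access(folder, os.W_OK):
        raise PermissionError(f'No permission to create files in: {folder}')
    body = '\n<hr>\n'.join(news_to_html(item, image_available) for item in items)
    path = os.path.join(folder, filename)  # no trailing separator needed here
    with open(path, 'w', encoding='utf-8') as file:
        file.write('<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body>\n'
                   f'{body}\n</body></html>\n')
    return path
```

Because the path is assembled with `os.path.join`, this sketch does not need the trailing separator that the program requires; the folder and permission checks mirror the errors the program reports.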
+
+## General issues
+
+Since every RSS `[url]` may implement its own format that does not fully comply with the RSS specification, the program uses the
+html2text library to make the news text more readable. Since the program uses the `argparse` standard library, there are some difficulties with exceptions.
+Namely, on invalid arguments (and after `--help`/`--version`) argparse calls `sys.exit()`, which raises `SystemExit`. `SystemExit` inherits from `BaseException`, not from `Exception`, so a plain `except Exception` does not catch it and it has to be handled explicitly.
+
+## Installation
+
++++++++++++++++++++++++++++++++++++++++
+CREATION OF PACKAGE
+++++++++++++++++++++++++++++++++++++++
+How to create package using setup.py
+======================================
+1. Go to the */final_task* folder which has the `setup.py` file.
+
+2.1 Windows: Run console as Administrator
+ 2.1.1 Run the following code: `python setup.py sdist --formats=zip`
+
+2.2 Linux: Run the following command: `sudo python3 setup.py sdist --formats=zip` (provided your python v3.8 has a shortcut `python3`)
+
+3. The zip package will be inside the `.../final_task/dist` folder.
+
++++++++++++++++++++++++++++++++++++++++
+INSTALLATION FOR WINDOWS (tested on Windows 10 1903)
++++++++++++++++++++++++++++++++++++++++
+Package installation:
+=========================================
+zip Installation:
+=========================================
+1. Run Console with administrator privileges (run as Administrator).
+2. In the console, go to the directory with the *rss-reader-4.0.zip* package.
+3. Run the following command: `pip install rss-reader-4.0.zip`
+=========================================
+
+
++++++++++++++++++++++++++++++++++++++++
+INSTALLATION FOR LINUX (tested on Fedora 30)
++++++++++++++++++++++++++++++++++++++++
+Package installation:
+=========================================
+zip Installation:
+=========================================
+1. In the console, go to the directory with the *rss-reader-4.0.zip* package.
+2. Run the following command: `sudo pip3 install rss-reader-4.0.zip`.
+=========================================
+
+
+
+## How to run the program
+
+DIRECT RUN FROM rss_reader FOLDER:
+====================================================================================================================
+WARNING:
+If you run the program directly from the folder and have never installed it from the package, YOU MUST INSTALL the following packages:
+
+Windows:
+1. feedparser version 5.2.1 (`pip install feedparser==5.2.1`)
+2. html2text version 2019.9.26 (`pip install html2text==2019.9.26`)
+3. fpdf version 1.7.2 (`pip install fpdf==1.7.2`)
+
+To convert the news to pdf, it is highly recommended to check that `ARIALUNI.TTF` is inside the rss_reader folder.
+
+Linux:
+1. feedparser version 5.2.1 (`sudo pip3 install feedparser==5.2.1`)
+2. html2text version 2019.9.26 (`sudo pip3 install html2text==2019.9.26`)
+3. fpdf version 1.7.2 (`sudo pip3 install fpdf==1.7.2`)
+
+To convert the news to pdf, it is highly recommended to check that `ARIALUNI.TTF` is inside the rss_reader folder.
+====================================================================================================================
+
+Windows:
+========
+1. Run console as Administrator.
+2. Go to the */rss_reader* folder with the `rss_reader.py` file.
+3.
Run: `python rss_reader.py [url] [flags] [parameters]`
+
+Example:
+ `python rss_reader.py https://news.yahoo.com/rss/` - will show you the RSS feed from news.yahoo.com
+ `python rss_reader.py https://news.yahoo.com/rss/ -l 10` - will show you the RSS feed with 10 news
+ `python rss_reader.py https://news.yahoo.com/rss/ -v` - will show you the version of the program
+ `python rss_reader.py https://news.yahoo.com/rss/ -l 10 -j` - will show you the RSS feed with 10 news in JSON format
+
+ `python rss_reader.py -b` - will show you the log journal
+
+ `python rss_reader.py -d 20191119` - prints all news for 2019.11.19 to the console from local storage
+ `python rss_reader.py -d 20191119 -l 10` - prints 10 news for 2019.11.19 to the console from local storage
+
+ `python rss_reader.py https://news.yahoo.com/rss/ -p C:\Users\User\Destination_Folder\` - convert all news from the feed to pdf.
+ `python rss_reader.py https://news.yahoo.com/rss/ -p C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the feed to pdf.
+ `python rss_reader.py https://news.yahoo.com/rss/ -p C:\Users\User\Destination_Folder\ -l 10 -j` - convert 10 news from the json feed to pdf.
+ `python rss_reader.py -b -p C:\Users\User\Destination_Folder\` - convert log journal to pdf.
+ `python rss_reader.py -d 20191119 -p C:\Users\User\Destination_Folder\` - convert all news from the local storage to pdf for 20191119.
+ `python rss_reader.py -d 20191119 -p C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the local storage to pdf for 20191119.
+
+ `python rss_reader.py https://news.yahoo.com/rss/ -hl C:\Users\User\Destination_Folder\` - convert all news from the feed to html.
+ `python rss_reader.py https://news.yahoo.com/rss/ -hl C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the feed to html.
+ `python rss_reader.py -b -hl C:\Users\User\Destination_Folder\` - convert log journal to html.
+ `python rss_reader.py -d 20191119 -hl C:\Users\User\Destination_Folder\` - convert all news from the local storage to html for 20191119.
+ `python rss_reader.py -d 20191119 -hl C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the local storage to html for 20191119.
+
+Linux:
+======
+1. Open the console.
+2. Go to the */rss_reader* folder with the `rss_reader.py` file.
+3. Run: `sudo python3 rss_reader.py [url] [flags] [parameters]` (provided your python v3.8 has a shortcut `python3`)
+
+Example:
+ `sudo python3 rss_reader.py https://news.yahoo.com/rss/` - will show you the RSS feed from news.yahoo.com
+ `sudo python3 rss_reader.py https://news.yahoo.com/rss/ -l 10` - will show you the RSS feed with 10 news
+ `sudo python3 rss_reader.py https://news.yahoo.com/rss/ -v` - will show you the version of the program
+ `sudo python3 rss_reader.py https://news.yahoo.com/rss/ -l 10 -j` - will show you the RSS feed with 10 news in JSON format
+
+ `sudo python3 rss_reader.py -d 20191119` - prints all news for 2019.11.19 to the console from local storage
+ `sudo python3 rss_reader.py -d 20191119 -l 10` - prints 10 news for 2019.11.19 to the console from local storage
+
+ `sudo python3 rss_reader.py https://news.yahoo.com/rss/ -p /Users/User/Destination_Folder/` - convert all news from the feed to pdf.
+ `sudo python3 rss_reader.py https://news.yahoo.com/rss/ -p /Users/User/Destination_Folder/ -l 10` - convert 10 news from the feed to pdf.
+ `sudo python3 rss_reader.py https://news.yahoo.com/rss/ -p /Users/User/Destination_Folder/ -l 10 -j` - convert 10 news from the json feed to pdf.
+ `sudo python3 rss_reader.py -b -p /Users/User/Destination_Folder/` - convert log journal to pdf.
+ `sudo python3 rss_reader.py -d 20191119 -p /Users/User/Destination_Folder/` - convert all news from the local storage to pdf for 20191119.
+ `sudo python3 rss_reader.py -d 20191119 -p /Users/User/Destination_Folder/ -l 10` - convert 10 news from the local storage to pdf for 20191119.
+
+ `sudo python3 rss_reader.py https://news.yahoo.com/rss/ -hl /Users/User/Destination_Folder/` - convert all news from the feed to html.
+ `sudo python3 rss_reader.py https://news.yahoo.com/rss/ -hl /Users/User/Destination_Folder/ -l 10` - convert 10 news from the feed to html.
+ `sudo python3 rss_reader.py -b -hl /Users/User/Destination_Folder/` - convert log journal to html.
+ `sudo python3 rss_reader.py -d 20191119 -hl /Users/User/Destination_Folder/` - convert all news from the local storage to html for 20191119.
+ `sudo python3 rss_reader.py -d 20191119 -hl /Users/User/Destination_Folder/ -l 10` - convert 10 news from the local storage to html for 20191119.
+
+All files like logJournal and news.log will be inside the rss_reader directory.
+=====================================================================================================================
+
+
+PACKAGE RUN (assuming you completed every step from Installation and installed rss-reader-4.0.zip)
+=====================================================================================================================
+Windows:
+========
+1. Run console as Administrator.
+2. Run: `rss-reader [url] [flags] [parameters]`
+
+Example:
+ `rss-reader https://news.yahoo.com/rss/` - will show you the RSS feed from news.yahoo.com
+ `rss-reader https://news.yahoo.com/rss/ -l 10` - will show you the RSS feed with 10 news
+ `rss-reader https://news.yahoo.com/rss/ -v` - will show you the version of the program
+ `rss-reader https://news.yahoo.com/rss/ -l 10 -j` - will show you the RSS feed with 10 news in JSON format
+
+ `rss-reader -b` - will show you the log journal
+
+ `rss-reader -d 20191119` - prints all news for 2019.11.19 to the console from local storage
+ `rss-reader -d 20191119 -l 10` - prints 10 news for 2019.11.19 to the console from local storage
+
+ `rss-reader https://news.yahoo.com/rss/ -p C:\Users\User\Destination_Folder\` - convert all news from the feed to pdf.
+ `rss-reader https://news.yahoo.com/rss/ -p C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the feed to pdf.
+ `rss-reader https://news.yahoo.com/rss/ -p C:\Users\User\Destination_Folder\ -l 10 -j` - convert 10 news from the json feed to pdf.
+ `rss-reader -b -p C:\Users\User\Destination_Folder\` - convert log journal to pdf.
+ `rss-reader -d 20191119 -p C:\Users\User\Destination_Folder\` - convert all news from the local storage to pdf for 20191119.
+ `rss-reader -d 20191119 -p C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the local storage to pdf for 20191119.
+
+ `rss-reader https://news.yahoo.com/rss/ -hl C:\Users\User\Destination_Folder\` - convert all news from the feed to html.
+ `rss-reader https://news.yahoo.com/rss/ -hl C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the feed to html.
+ `rss-reader -b -hl C:\Users\User\Destination_Folder\` - convert log journal to html.
+ `rss-reader -d 20191119 -hl C:\Users\User\Destination_Folder\` - convert all news from the local storage to html for 20191119.
+ `rss-reader -d 20191119 -hl C:\Users\User\Destination_Folder\ -l 10` - convert 10 news from the local storage to html for 20191119.
+
+All files like logJournal and news.log will be inside the default Python directory (in my case: `C:\Program Files\Python38\Lib\site-packages\rss_reader`).
+
+Linux:
+======
+1. Run the console.
+2.
Run: `sudo rss-reader [url] [flags] [parameters]` (provided your python v3.8 has a shortcut `python3`)
+
+Example:
+ `sudo rss-reader https://news.yahoo.com/rss/` - will show you the RSS feed from news.yahoo.com
+ `sudo rss-reader https://news.yahoo.com/rss/ -l 10` - will show you the RSS feed with 10 news
+ `sudo rss-reader https://news.yahoo.com/rss/ -v` - will show you the version of the program
+ `sudo rss-reader https://news.yahoo.com/rss/ -l 10 -j` - will show you the RSS feed with 10 news in JSON format
+
+ `sudo rss-reader -b` - will show you the log journal
+
+ `sudo rss-reader -d 20191119` - prints all news for 2019.11.19 to the console from local storage
+ `sudo rss-reader -d 20191119 -l 10` - prints 10 news for 2019.11.19 to the console from local storage
+
+ `sudo rss-reader https://news.yahoo.com/rss/ -p /Users/User/Destination_Folder/` - convert all news from the feed to pdf.
+ `sudo rss-reader https://news.yahoo.com/rss/ -p /Users/User/Destination_Folder/ -l 10` - convert 10 news from the feed to pdf.
+ `sudo rss-reader https://news.yahoo.com/rss/ -p /Users/User/Destination_Folder/ -l 10 -j` - convert 10 news from the json feed to pdf.
+ `sudo rss-reader -b -p /Users/User/Destination_Folder/` - convert log journal to pdf.
+ `sudo rss-reader -d 20191119 -p /Users/User/Destination_Folder/` - convert all news from the local storage to pdf for 20191119.
+ `sudo rss-reader -d 20191119 -p /Users/User/Destination_Folder/ -l 10` - convert 10 news from the local storage to pdf for 20191119.
+
+ `sudo rss-reader https://news.yahoo.com/rss/ -hl /Users/User/Destination_Folder/` - convert all news from the feed to html.
+ `sudo rss-reader https://news.yahoo.com/rss/ -hl /Users/User/Destination_Folder/ -l 10` - convert 10 news from the feed to html.
+ `sudo rss-reader -b -hl /Users/User/Destination_Folder/` - convert log journal to html.
+ `sudo rss-reader -d 20191119 -hl /Users/User/Destination_Folder/` - convert all news from the local storage to html for 20191119.
+ `sudo rss-reader -d 20191119 -hl /Users/User/Destination_Folder/ -l 10` - convert 10 news from the local storage to html for 20191119.
+
+All files like logJournal and news.log will be inside the default Python directory (in my case: `/usr/local/lib/python3.7/site-packages/rss_reader`).
+
+## Project structure
+
+final_task
+|---rss_reader
+| |
+| |---args_parser.py
+| |---json_converter.py
+| |---logs.py
+| |---news.py
+| |---rss_reader.py
+| |---rss_parser.py
+| |---converter.py
+| |---ARIALUNI.TTF
+| |---requirements.txt
+| |---tests
+| |-----test_args_parser.py
+| |-----test_rss_parser.py
+| |-----test_news.py
+|
+|---setup.py
+|---README.md
+|---__init__.py
+
+
+## Resources for testing
+
+RssReader was tested with the following RSS feeds:
+
+https://news.yahoo.com/rss/
+https://3dnews.ru/news/rss/
+https://www.tomshardware.com/feeds/all
+https://people.onliner.by/feed
+http://feeds.bbci.co.uk/news/rss.xml?edition=int
+http://static.feed.rbc.ru/rbc/logical/footer/news.rss
+https://www.gazeta.ru/export/rss/first.xml
+https://rss.nytimes.com/services/xml/rss/nyt/World.xml
+https://www.reddit.com/r/worldnews/.rss
+https://www.aljazeera.com/xml/rss/all.xml
+http://feeds.washingtonpost.com/rss/world
+https://www.engadget.com/rss.xml
diff --git a/final_task/__init__.py b/final_task/__init__.py
index e69de29..dbb1a73 100644
--- a/final_task/__init__.py
+++ b/final_task/__init__.py
@@ -0,0 +1,4 @@
+from rss_reader.rss_reader import main
+
+if __name__ == 'rss_reader':
+    main()
diff --git a/final_task/rss_reader/ARIALUNI.TTF b/final_task/rss_reader/ARIALUNI.TTF
new file mode 100644
index 0000000..51a18bc
Binary files /dev/null and b/final_task/rss_reader/ARIALUNI.TTF differ
diff --git a/final_task/rss_reader/args_parser.py b/final_task/rss_reader/args_parser.py
new file mode 100644
index 0000000..35769bb
--- /dev/null
+++ b/final_task/rss_reader/args_parser.py
@@ -0,0 +1,89 @@
+"""
+This module processes the input arguments
+"""
+
+import argparse
+import re
+import logs
+import sys +import datetime +import os + + +def get_parse(args_in='') -> dict: + """This function parses the command-line arguments (or args_in, if given) into a dict""" + + parser = argparse.ArgumentParser(prog='RSS reader', description='RSS reader. Takes arguments from command line.') + + parser.add_argument('url', type=str, nargs='?', default='url', help='Link to the RSS channel (without spaces).') + parser.add_argument('-v', '--version', action='version', version='%(prog)s 4.0', help='Print version info.') + parser.add_argument('-j', '--json', action='store_const', const=True, help='Print result as json in stdout.') + parser.add_argument('-b', '--verbose', action='store_const', const=True, help='Print all logs in stdout.') + parser.add_argument('-l', '--limit', type=int, help='Limit of news topics (natural number).') + parser.add_argument('-d', '--date', type=int, help='Date to print news from history, yyyymmdd (natural number).') + parser.add_argument('-p', '--pdf', type=str, help='Convert news to pdf and save it to PATH (use -p PATH).') + parser.add_argument('-hl', '--html', type=str, help='Convert news to html and save it to PATH (use -hl PATH).') + + if args_in: + try: + args = parser.parse_args(args_in) + logs.log_init_args(args) + except SystemExit: + logs.log_err_init_args(args_in) + logs.log_err_exit() + parser.exit() + else: + try: + args = parser.parse_args() + logs.log_init_args(args) + except SystemExit: + logs.log_err_init_args(sys.argv) + logs.log_err_exit() + parser.exit() + + return {'url': args.url, + 'json': args.json, + 'verbose': args.verbose, + 'limit': args.limit, + 'date': args.date, + 'pdf': args.pdf, + 'html': args.html} + + +def validate_url(url: str) -> bool: + """This function checks whether the RSS link is a well-formed URL""" + + url_check = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', url) + + if len(url_check) == 0: + return False + + return len(url) == len(url_check[0]) + + +def validate_args(data: dict) -> bool: + """This
function validates that the arguments are correct""" + + date_time = datetime.datetime.now() + + if data['limit'] is not None: + if data['limit'] <= 0: + print('Limit is invalid: it must be a natural number.') + return False + + if data['date']: + if data['date'] > int(date_time.strftime("%Y%m%d")): + print('Date is invalid: it is later than today (' + date_time.strftime("%Y%m%d") + ').') + return False + + if data['pdf']: + if not os.access(data['pdf'], os.W_OK): + print('The path does not exist or is not writable.') + return False + + if data['html']: + if not os.access(data['html'], os.W_OK): + print('The path does not exist or is not writable.') + return False + + return True diff --git a/final_task/rss_reader/converter.py b/final_task/rss_reader/converter.py new file mode 100644 index 0000000..0d4820f --- /dev/null +++ b/final_task/rss_reader/converter.py @@ -0,0 +1,188 @@ +"""This module converts data to pdf and html""" + +from fpdf import FPDF +import urllib.request as url +import os +import sys +import datetime +import re + +THIS_DIRECTORY = os.path.abspath(os.path.dirname(__file__)) +FONT = 'ARIALUNI.TTF' + +WIN_FONT = r'C:\Windows\Fonts\arial.ttf' +LIN_FONT = r'/usr/share/fonts/dejavu/DejaVuSansCondensed.ttf' + +if sys.platform == 'win32': + if os.path.isfile(WIN_FONT): + FONTPATH = WIN_FONT + else: + FONTPATH = os.path.join(THIS_DIRECTORY, FONT) + +if sys.platform != 'win32': # Linux, macOS and other non-Windows platforms + if os.path.isfile(LIN_FONT): + FONTPATH = LIN_FONT + else: + FONTPATH = os.path.join(THIS_DIRECTORY, FONT) + + +def convert_pdf(rss_news_clean: dict, path: str): + """This function creates pdf""" + + date_time = datetime.datetime.now() + + pdf = FPDF() + pdf.add_font('arial_uni', '', FONTPATH, True) + pdf.set_margins(10, 10, 10) + counter = 1 + + for rss_news in rss_news_clean.values(): + pdf.add_page(('P', 'A4')) + + if rss_news['image']: + try: + url.urlretrieve(rss_news['image'], str(counter)+'imageTemp.jpg') + pdf.image(str(counter)+'imageTemp.jpg', 120, 12, 50, 50, link=rss_news['image']) + 
os.remove(str(counter)+'imageTemp.jpg') + counter += 1 + except RuntimeError: + create_image_template(pdf) + pdf.multi_cell(w=50, h=12, align='C', txt=' \n IMAGE IS CORRUPTED') + except url.HTTPError: + create_image_template(pdf) + pdf.multi_cell(w=50, h=12, align='C', txt=' \n IMAGE IS CORRUPTED') + except url.URLError: + create_image_template(pdf) + pdf.multi_cell(w=50, h=12, align='C', txt=' \n NO INTERNET CONNECTION') + except ValueError: + create_image_template(pdf) + pdf.multi_cell(w=50, h=12, align='C', txt=' \n NO PICTURE AVAILABLE') + + else: + create_image_template(pdf) + pdf.multi_cell(w=50, h=12, align='C', txt=' \n NO PICTURE AVAILABLE') + + pdf.set_font("arial_uni", size=12) + + title_string = str(rss_news['title'].encode('utf-8', 'ignore').decode('utf-8')) + feed_string = str(rss_news['feed'].encode('utf-8', 'ignore').decode('utf-8')) + news_string = str(rss_news['description'].encode('utf-8', 'ignore').decode('utf-8')) + + # The most common problematic character of unicode + if ''' in title_string: + title_string = re.sub(''', "'", title_string) + + if '"' in title_string: + title_string = re.sub('"', "'", title_string) + + if ''' in feed_string: + feed_string = re.sub(''', "'", feed_string) + + if ''' in news_string: + news_string = re.sub(''', "'", news_string) + + if '"' in news_string: + news_string = re.sub('"', "'", news_string) + + string = 'Feed: ' + feed_string + '\n' + ' ' + '\n' 'Title: ' + title_string + '\n' + ' ' + '\n' + string += 'Date: ' + rss_news['date'] + '\n' + ' ' + '\n' + pdf.multi_cell(w=100, h=6, txt=string) + + pdf.set_font("arial_uni", size=12) + string = 'News: ' + '\n' + ' \n\n' + news_string + pdf.write(h=6, txt=string) + + pdf.set_font("Arial", "B", size=12) + pdf.write(h=6, txt='Link: ' + rss_news['link'], link=rss_news['link']) + + pdf.output(path + 'news' + date_time.strftime("%Y%m%d-%H-%M-%S") + '.pdf', 'F') + + +def convert_log_pdf(log_journal: dict, path: str): + """This function converts log journal to pdf""" + 
+ pdf = FPDF() + pdf.set_font('Arial', 'B', size=14) + pdf.set_margins(10, 10, 10) + pdf.add_page(('P', 'A4')) + + pdf.write(h=6, txt='Log journal\n') + + pdf.set_font('Arial', size=10) + + for line in log_journal.values(): + pdf.write(h=6, txt=line) + + pdf.output(path+'log.pdf', 'F') + + +def convert_html(rss_news_clean: dict, path: str): + """This function creates html""" + + date_time = datetime.datetime.now() + + with open(path+'news' + date_time.strftime("%Y%m%d-%H-%M-%S") + '.html', 'w+', encoding='utf-8') as file: + file.write('') + file.write('') + file.write(' News for ' + date_time.strftime("%Y.%m.%d - %H:%M:%S") + '') + file.write('') + file.write('') + + for rss_news in rss_news_clean.values(): + + title_string = str(rss_news['title'].encode('utf-8', 'ignore').decode('utf-8')) + news_string = str(rss_news['description'].encode('utf-8', 'ignore').decode('utf-8')) + + # The most common problematic character of unicode + if ''' in title_string: + title_string = re.sub(''', "'", title_string) + + if '"' in title_string: + title_string = re.sub('"', "'", title_string) + + if ''' in news_string: + news_string = re.sub(''', "'", news_string) + + if '"' in news_string: + news_string = re.sub('"', "'", news_string) + + file.write('
') + file.write(title_string + '

') + file.write('' + rss_news['date'] + '

') + file.write('
' + news_string + '

') + + if rss_news['image'] is not None: + file.write('
' + '
') + else: + file.write(' Image is not available

') + + file.write('
Link: ' + rss_news['link'] + '

') + file.write('') + file.write('


') + + file.write(' ') + + +def convert_log_html(log_journal: dict, path: str): + """This function converts log to html""" + + with open(path+'log.html', 'w+', encoding='utf-8') as file: + file.write('') + file.write('') + file.write(' Log journal output ') + file.write('') + file.write('') + + for line in log_journal.values(): + file.write('

' + line + '

') + + file.write(' ') + + +def create_image_template(pdf): + """This function creates image template""" + + pdf.set_line_width(1) + pdf.rect(120, 12, 50, 50) + pdf.set_xy(120, 12) + pdf.set_font("Arial", "B", size=14) diff --git a/final_task/rss_reader/json_converter.py b/final_task/rss_reader/json_converter.py new file mode 100644 index 0000000..29116c8 --- /dev/null +++ b/final_task/rss_reader/json_converter.py @@ -0,0 +1,15 @@ +"""This module provides conversion of news to JSON format""" + +import json + + +def convert_json(data: dict) -> str: + """This function converts data to a JSON string""" + + return json.dumps(data) + + +def print_json(data: dict): + """This function prints data in JSON""" + + print(json.dumps(data, ensure_ascii=False, indent=4)) diff --git a/final_task/rss_reader/logs.py b/final_task/rss_reader/logs.py new file mode 100644 index 0000000..fa94bef --- /dev/null +++ b/final_task/rss_reader/logs.py @@ -0,0 +1,221 @@ +"""This module creates logs, writes logs, reads logs.""" + +import logging +import os +import news + +THIS_DIRECTORY = os.path.abspath(os.path.dirname(__file__)) + +JOURNAL_LOG = 'logJournal.log' +DIRECTORY = os.path.join(THIS_DIRECTORY, JOURNAL_LOG) + +args = '%(asctime)s - %(levelname)s - %(message)s' +logging.basicConfig(filename=DIRECTORY, format=args, level=logging.INFO, filemode='a+') + + +def new_session(): + """This function logs the beginning of a new session""" + + logging.info('+++++ STARTING NEW SESSION +++++') + + +def end_session(): + """This function logs the end of the session""" + + logging.info('+++++ ENDING THE SESSION +++++') + + +def log_url(name: str): + """This function logs url""" + + logging.info('Initial url success: ' + name) + + +def log_connection(name: str): + """This function logs connection""" + + logging.info('Server connection success: ' + name) + + +def log_rss(name: str): + """This function logs RSS URL""" + + logging.info('RSS connection success: ' + name) + + +def log_wrong_url(name: str): + """This function 
logs the wrong url""" + + logging.warning('Potentially wrong URL: ' + name) + + +def log_connection_failed(name: str): + """This function logs the failed connection""" + + logging.error('Cannot create connection with server: ' + name) + + +def log_wrong_rss(name: str): + """This function logs wrong RSS URL""" + + logging.warning('Cannot get RSS feed: ' + name) + + +def log_init_args(name: str): + """This function logs initial arguments""" + + logging.info('Initial arguments: %s', name) + + +def log_err_init_args(name: str): + """This function logs wrong initial arguments""" + + logging.error('Wrong initial arguments: %s', name) + + +def log_err_exit(): + """This function logs the error exit""" + + logging.error('+++++ ERROR EXIT +++++') + + +def print_log(): + """This function prints the log journal, together with the stored news each storage entry refers to""" + + with open(DIRECTORY, 'r') as log: + for line in log: + if 'INFO - News was written to local storage: url -' in line: + """ + This code cuts the line into two slices: + url - slice with the link + info - slice with the information + Since these lines have the same pattern, the slice positions are always the same + """ + url = line[73:] + info = line[0:66] + print(info) + print('Link for the news: ' + url, end='') + news.news_log_add(url) + else: + print(line, end='') + + +def log_prepare() -> dict: + """This function prepares log journal for conversion""" + log_journal = dict() + counter = 0 + + with open(DIRECTORY, 'r') as log: + for line in log: + log_journal[counter] = line + counter += 1 + + return log_journal + + +def log_print(): + """This function logs print operation""" + + logging.info('Log journal printed') + + +def log_news_store(url: str): + """This function logs news storage operation""" + + logging.info('News was written to local storage: url - '+url) + + +def log_news_print(): + """This function logs news print from storage""" + + logging.info('News was printed from storage') + + +def log_news_print_err(): + """This function logs the error if news is not in storage""" + + logging.error('The news
is not in storage') + + +def log_news_filenotfound(): + """This function logs if local news storage is not found""" + + logging.error('Local news storage is not found') + + +def log_invalid_arguments(args: str): + """This function logs invalid arguments""" + + logging.error('Arguments not valid: ' + args) + + +def log_news_limit(limit: int): + """This function logs when news limit is reached""" + + logging.info('News limit for print is reached: ' + str(limit)) + + +def log_news_copycat(url: str): + """This function logs an attempt to write duplicate news""" + + logging.warning('Attempt to write duplicate news: url - ' + url) + + +def log_news_local_storage_pdf(): + """This function logs the conversion from local storage""" + + logging.info('News from local storage were converted to pdf') + + +def log_news_pdf(): + """This function logs the conversion from rss feed""" + + logging.info('News from rss feed were converted to pdf') + + +def log_news_local_storage_html(): + """This function logs the conversion from local storage""" + + logging.info('News from local storage were converted to html') + + +def log_news_html(): + """This function logs the conversion from rss feed""" + + logging.info('News from rss feed were converted to html') + + +def log_log_pdf(): + """This function logs the conversion of log journal to pdf""" + + logging.info('Log journal was converted to pdf') + + +def log_log_html(): + """This function logs the conversion of log journal to html""" + + logging.info('Log journal was converted to html') + + +def print_log_verbose_pdf(): + """This function prints the log lines about news converted to pdf""" + + str_1 = 'News from rss feed were converted to pdf' + str_2 = 'News from local storage were converted to pdf' + + with open(DIRECTORY, 'r') as log: + for line in log: + if (str_1 in line) or (str_2 in line): + print(line, end='') + + +def print_log_verbose_html(): + """This function prints the log lines about news converted to html""" + + str_1 = 'News from rss feed were converted to html' + str_2 = 'News from local storage were
converted to html' + + with open(DIRECTORY, 'r') as log: + for line in log: + if (str_1 in line) or (str_2 in line): + print(line, end='') diff --git a/final_task/rss_reader/news.py b/final_task/rss_reader/news.py new file mode 100644 index 0000000..5b45100 --- /dev/null +++ b/final_task/rss_reader/news.py @@ -0,0 +1,160 @@ +""" +This module handles caching and storing of the news +""" + +import os + +THIS_DIRECTORY = os.path.abspath(os.path.dirname(__file__)) +NEWS_LOG = 'news.log' +DIRECTORY = os.path.join(THIS_DIRECTORY, NEWS_LOG) + + +def news_check(rss_news: dict) -> bool: + """This function returns True if the news is not yet in the local storage""" + try: + with open(DIRECTORY, encoding='utf-8') as file: + for line in file: + if rss_news['link'] in line: + return False + return True + except FileNotFoundError: + return True + + +def news_store(rss_news: dict): + """This function appends the news to the local storage file""" + with open(DIRECTORY, 'a+', encoding='utf-8') as file: + file.write('!!!#####STARTOFNEWS#####!!!' + rss_news['news_date'] + '\n') + file.write('Feed: ' + rss_news['feed'] + '\n') + file.write('Title: ' + rss_news['title'] + '\n') + file.write('Date: ' + rss_news['date'] + '\n') + file.write('Link: ' + rss_news['link'] + '\n') + + try: + file.write('Image: ' + rss_news['image'] + '\n') + except TypeError: + file.write('Image: ' + ' image is not available.' + '\n') + + file.write(' ' + '\n') + file.write(rss_news['description']) + file.write('. . . . . . . . . . . . . . . . . . .' + '\n') + file.write('!!!#####ENDOFNEWS#####!!!'
+ rss_news['news_date'] + '\n') + + +def news_print(date: str, limit: int) -> int: + """This function prints the news from local storage with specified [date]""" + flag = False + empty = True + counter = 0 + try: + with open(DIRECTORY, 'r', encoding='utf-8') as file: + for line in file: + if '!!!#####ENDOFNEWS#####!!!'+date in line: + flag = False + + if flag: + print(line, end='') + empty = False + + if '!!!#####STARTOFNEWS#####!!!'+date in line: + flag = True + + if counter == limit: + return 3 + + if limit != -1: + counter += 1 + except FileNotFoundError: + print('Error: File not found. (Maybe this is a first time you are running a program)') + return 1 + + if empty: + print('Error: news with ' + date + ' not found.') + return 2 + + return 0 + + +def news_decompose(date: str, limit: int) -> dict: + """This function takes the news from local storage""" + flag = False + empty = True + counter_news = 0 + counter_lines = 0 + counter = 0 + news = dict() + dict_buffer = dict() + news_buffer = list() + string_buffer = '' + + try: + with open(DIRECTORY, 'r', encoding='utf-8') as file: + for line in file: + if '!!!#####ENDOFNEWS#####!!!' 
+ date in line: + flag = False + + if string_buffer: + dict_buffer['description'] = string_buffer + news[counter_news] = dict_buffer + counter_news += 1 + + counter_lines = 0 + news_buffer = list() + dict_buffer = dict() + string_buffer = '' + + if flag: + empty = False + if counter_lines < 5: + news_buffer.append(line) + counter_lines += 1 + elif counter_lines == 5: + dict_buffer['feed'] = news_buffer[0].replace('Feed: ', '').replace('\n', '') + dict_buffer['title'] = news_buffer[1].replace('Title: ', '').replace('\n', '') + dict_buffer['date'] = news_buffer[2].replace('Date: ', '').replace('\n', '') + dict_buffer['link'] = news_buffer[3].replace('Link: ', '').replace('\n', '') + dict_buffer['image'] = news_buffer[4].replace('Image: ', '').replace('\n', '') + counter_lines += 1 + else: + string_buffer += line + + if '!!!#####STARTOFNEWS#####!!!' + date in line: + flag = True + + if counter == limit: + return news + + if limit != -1: + counter += 1 + + if empty: + print('Error: news with ' + date + ' not found.') + news[0] = 2 + return news + + return news + + except FileNotFoundError: + print('Error: File not found. (Maybe this is a first time you are running a program)') + news[0] = 1 + return news + + +def news_log_add(url: str): + """This function takes the news by url""" + flag_1 = False + counter = 0 + try: + with open(DIRECTORY, 'r', encoding='utf-8') as file: + for line in file: + if url in line: + flag_1 = True + if flag_1: + counter += 1 + if counter > 3: + if '!!!#####ENDOFNEWS#####!!!' 
in line: + return 0 + else: + print(line, end='') + except FileNotFoundError: + return 1 diff --git a/final_task/rss_reader/requirements.txt b/final_task/rss_reader/requirements.txt index e69de29..67f3c04 100644 --- a/final_task/rss_reader/requirements.txt +++ b/final_task/rss_reader/requirements.txt @@ -0,0 +1,3 @@ +feedparser == 5.2.1 +html2text == 2019.9.26 +fpdf == 1.7.2 \ No newline at end of file diff --git a/final_task/rss_reader/rss_parser.py b/final_task/rss_reader/rss_parser.py new file mode 100644 index 0000000..80af0ba --- /dev/null +++ b/final_task/rss_reader/rss_parser.py @@ -0,0 +1,167 @@ +""" +This module handles the exchange between the Internet and the program +""" + +import feedparser +import html2text +import re +import datetime +import time +import urllib.request +import urllib.error +import http.client + + +SERVER_ANSWER = http.client.responses + + +def get_rss(url: str) -> dict: + """This function downloads and parses the RSS feed; it returns None if the feed has no entries""" + + news = feedparser.parse(url) + return news if news.entries else None + + +def process_rss(rss: dict, limit: int) -> dict: + """This function processes the news entry with index [limit] of the parsed feed""" + + data = dict() + + try: + rss.entries[limit] + except (AttributeError, IndexError): + return False + + data['feed'] = rss.feed.title + data['link'] = rss.entries[limit].link + data['title'] = rss.entries[limit].title + + """news_date for the third iteration, format yyyymmdd""" + + try: + data['date'] = rss.entries[limit].published + date_time = datetime.datetime.now() + data['news_date'] = convert_date(rss.entries[limit].published) + except AttributeError: + date_time = datetime.datetime.now() + data['date'] = date_time.strftime("%d/%m/%Y %H:%M:%S") + data['news_date'] = date_time.strftime("%Y%m%d") + + try: + data['description'] = rss.entries[limit].summary_detail['value'] + data['image'] = rss.entries[limit].summary_detail['value'] + except AttributeError: + data['description'] = rss.entries[limit].title_detail['value'] + data['image'] = rss.entries[limit].title_detail['value'] + + 
data['description'] = re.sub('

', '', data['description']) + data['description'] = re.sub('', '', data['description']) + data['description'] = re.sub('', '', data['description']) + data['description'] = html2text.html2text(data['description']) + + if ''' in data['title']: + data['title'] = re.sub(''', "'", data['title']) + + if '"' in data['title']: + data['title'] = re.sub('"', "'", data['title']) + + if ''' in data['description']: + data['description'] = re.sub(''', "'", data['description']) + + if '"' in data['description']: + data['description'] = re.sub('"', "'", data['description']) + + img_raw = re.search('> bool: + """This function tries to connect to RSS url + In case of failure, it reconnects in 10 seconds + """ + for connection_tryouts in range(3): + + try: + tryout = urllib.request.urlopen(url).getcode() + return tryout == 200 + + # Everything deals with time delay will be repeated + except urllib.error.HTTPError as http_err: + if http_err.code in (503, 504, 522, 524): + print('') + print('The server is not available:') + print('Trying to reconnect') + + """We wait between connection tryouts""" + for time_delay in range(10): + print('. ', end='') + time.sleep(1) + print('') + + # If server answer is a common code or something is not a common code + elif http_err.code in SERVER_ANSWER.keys(): + print('The server can not be reached: Reason: %s' % SERVER_ANSWER[http_err.code]) + return False + else: + print('Unknown error: %s' % http_err.code) + return False + + # In case HTTPError is not working + except urllib.error.URLError as url_err: + print('The server can not be reached. 
Reason: %s' % url_err.reason) + + return False + + +def convert_date(date: str): + """This function converts an RFC 822 date such as 'Fri, 22 Nov 2019 15:47:25 -0500' to yyyymmdd""" + month = {'Jan': '01', + 'Feb': '02', + 'Mar': '03', + 'Apr': '04', + 'May': '05', + 'Jun': '06', + 'Jul': '07', + 'Aug': '08', + 'Sep': '09', + 'Oct': '10', + 'Nov': '11', + 'Dec': '12'} + day = date[5:7] + month_num = month[date[8:11]] + year = date[12:16] + + return year+month_num+day diff --git a/final_task/rss_reader/rss_reader.py b/final_task/rss_reader/rss_reader.py index e69de29..c8c9c15 100644 --- a/final_task/rss_reader/rss_reader.py +++ b/final_task/rss_reader/rss_reader.py @@ -0,0 +1,214 @@ +"""Main module of the program""" + +import sys +import os + +THIS_DIRECTORY = os.path.abspath(os.path.dirname(__file__)) +sys.path.append(THIS_DIRECTORY) + + +import args_parser +import rss_parser +import json_converter +import converter +import logs +import news +import ast + + +def main(): + + """This is the main function""" + print('============================== RSS Reader ==================================') + logs.new_session() + commands = args_parser.get_parse() + + """Check if all input arguments are valid""" + if args_parser.validate_args(commands): + pass + else: + print('Invalid input arguments.') + print('Check your input parameters') + logs.log_invalid_arguments(str(commands)) + + return 1 + + """Print log journal""" + if commands['verbose'] and (not commands['pdf']) and (not commands['html']): + logs.log_print() + logs.print_log() + + return 0 + + """Convert log to pdf""" + if commands['verbose'] and commands['pdf']: + logs.print_log_verbose_pdf() + log_journal = logs.log_prepare() + converter.convert_log_pdf(log_journal, commands['pdf']) + logs.log_log_pdf() + print('Log journal was converted to pdf') + + return 0 + + """Convert log to html""" + if commands['verbose'] and commands['html']: + logs.print_log_verbose_html() + log_journal = logs.log_prepare() + converter.convert_log_html(log_journal, commands['html']) + logs.log_log_html() +
print('Log journal was converted to html') + + return 0 + + """Print news log from history""" + if commands['date'] and (commands['pdf'] is None) and (commands['html'] is None) and (commands['json'] is None): + if commands['limit']: + print_tryout = news.news_print(str(commands['date']), commands['limit']) + else: + print_tryout = news.news_print(str(commands['date']), -1) + + if print_tryout == 0: + logs.log_news_print() + elif print_tryout == 1: + logs.log_news_filenotfound() + elif print_tryout == 2: + logs.log_news_print_err() + elif print_tryout == 3: + logs.log_news_limit(commands['limit']) + + return 0 + + """Print news from history in json format""" + if commands['date'] and (commands['json']) and (commands['pdf'] is None) and (commands['html'] is None): + if commands['limit']: + news_tryout = news.news_decompose(str(commands['date']), commands['limit']) + else: + news_tryout = news.news_decompose(str(commands['date']), -1) + + json_converter.print_json(news_tryout) + + return 0 + + """Convert news from history to pdf""" + if commands['date'] and commands['pdf']: + if commands['limit']: + decompose_tryout = news.news_decompose(str(commands['date']), commands['limit']) + else: + decompose_tryout = news.news_decompose(str(commands['date']), -1) + + if decompose_tryout[0] == 1: + logs.log_news_filenotfound() + elif decompose_tryout[0] == 2: + logs.log_news_print_err() + else: + converter.convert_pdf(decompose_tryout, commands['pdf']) + + logs.log_news_local_storage_pdf() + print('News from local storage were converted to pdf.') + + return 0 + + """Convert news from history to html""" + if commands['date'] and commands['html']: + if commands['limit']: + decompose_tryout = news.news_decompose(str(commands['date']), commands['limit']) + else: + decompose_tryout = news.news_decompose(str(commands['date']), -1) + + if decompose_tryout[0] == 1: + logs.log_news_filenotfound() + elif decompose_tryout[0] == 2: + logs.log_news_print_err() + else: + 
converter.convert_html(decompose_tryout, commands['html']) + + logs.log_news_local_storage_html() + print('News from local storage were converted to html.') + + return 0 + + """Check if URL looks like URL""" + if args_parser.validate_url(commands['url']): + logs.log_url(commands['url']) + else: + print('Invalid URL: check your URL.') + print('(Example of valid URL: https://news.yahoo.com/rss/)') + print('Enter valid URL') + logs.log_wrong_url(commands['url']) + + return 1 + + """Check if there is a server on other side""" + if rss_parser.connect_rss(commands['url']): + logs.log_connection(commands['url']) + else: + print('Enter new URL') + logs.log_connection_failed(commands['url']) + + return 1 + + """Check if URL is a RSS URL""" + rss_news_raw = rss_parser.get_rss(commands['url']) + if rss_news_raw: + logs.log_rss(commands['url']) + news_limit = len(rss_news_raw.entries) + else: + print('The RSS feed is not responding.') + print('Check your URL') + logs.log_wrong_rss(commands['url']) + + return 1 + + """Print the news and prepare data for further processing + "i" is a running index. 
+ """ + rss_news_clean = dict() + + """Check if we out of range in news limit, if yes we print all news, if no, we print 'limit' news""" + if commands['limit'] and (commands['limit'] <= news_limit): + limit = commands['limit'] + else: + limit = news_limit + + for i in range(limit): + if rss_parser.process_rss(rss_news_raw, i): + rss_news_clean[i] = rss_parser.process_rss(rss_news_raw, i) + if commands['json'] and (not commands['pdf']): + json_converter.print_json((rss_news_clean[i])) + elif not commands['pdf'] and not commands['html']: + rss_parser.print_rss(rss_news_clean[i]) + else: + print('Limit for news is reached.') + + """Write the news to file""" + for value in rss_news_clean.values(): + if news.news_check(value): + news.news_store(value) + logs.log_news_store(value['link']) + else: + logs.log_news_copycat(value['link']) + + """Create pdf file""" + if commands['pdf'] and rss_news_clean and (not commands['json']): + converter.convert_pdf(rss_news_clean, commands['pdf']) + logs.log_news_pdf() + print("News were converted to pdf") + + if commands['pdf'] and commands['json']: + json_news = ast.literal_eval(json_converter.convert_json(rss_news_clean)) + converter.convert_pdf(json_news, commands['pdf']) + logs.log_news_pdf() + print("News were converted to pdf") + + """Create html file""" + if commands['html'] and rss_news_clean: + converter.convert_html(rss_news_clean, commands['html']) + logs.log_news_html() + print("News were converted to html") + + logs.end_session() + print('============================= End for news =================================') + + +if __name__ == '__main__': + main() diff --git a/final_task/rss_reader/tests/test_args_parser.py b/final_task/rss_reader/tests/test_args_parser.py new file mode 100644 index 0000000..ea32ef8 --- /dev/null +++ b/final_task/rss_reader/tests/test_args_parser.py @@ -0,0 +1,69 @@ +""" +This module tests args_parser module +""" + +import re +import os +import sys + +THIS_DIRECTORY = 
os.path.abspath(os.path.dirname(__file__)) + +DIR = 'tests' + +THIS_DIRECTORY = re.sub(DIR, '', THIS_DIRECTORY) + +sys.path.append(THIS_DIRECTORY) + + +import unittest +import args_parser + + +class ARGSparser(unittest.TestCase): + def test_get_parse(self): + + test_dict_1 = { + 'url': 'http://news.yahoo.com/rss/', + 'json': None, + 'verbose': None, + 'limit': None, + 'date': None, + 'pdf': None, + 'html': None} + + input_1 = ['http://news.yahoo.com/rss/'] + + test_dict_2 = { + 'url': 'http://news.yahoo.com/rss/', + 'json': True, + 'verbose': True, + 'limit': 9, + 'date': 20191107, + 'pdf': 'C://', + 'html': 'C:/'} + + input_2 = ['http://news.yahoo.com/rss/', '-j', '-b', '-l', '9', '-d', '20191107', '-p', 'C://', '-hl', 'C:/'] + + input_3 = ['url', '-j', '999', '-b', '999', '-l', 'aaa'] + + self.assertDictEqual(args_parser.get_parse(input_1), test_dict_1) + self.assertDictEqual(args_parser.get_parse(input_2), test_dict_2) + with self.assertRaises(SystemExit): args_parser.get_parse(input_3) + + def test_validate_url(self): + + self.assertEqual(args_parser.validate_url('https://news.yahoo.com/rss/'), True) + self.assertEqual(args_parser.validate_url('www.wrongurl.com'), False) + + def test_validate_args(self): + + self.assertEqual(args_parser.validate_args({'limit': 9, 'date': 20191107, 'pdf': None, 'html': None}), True) + self.assertEqual(args_parser.validate_args({'limit': -9, 'date': 20191107, 'pdf': None, 'html': None}), False) + self.assertEqual(args_parser.validate_args({'limit': 9, 'date': 22191107, 'pdf': None, 'html': None}), False) + self.assertEqual(args_parser.validate_args({'limit': -9, 'date': 21191107, 'pdf': None, 'html': None}), False) + + +if __name__ == '__main__': + unittest.main() + + diff --git a/final_task/rss_reader/tests/test_news.py b/final_task/rss_reader/tests/test_news.py new file mode 100644 index 0000000..9c66662 --- /dev/null +++ b/final_task/rss_reader/tests/test_news.py @@ -0,0 +1,47 @@ +""" +This module tests news module +""" + 
+import re +import os +import sys + +THIS_DIRECTORY = os.path.abspath(os.path.dirname(__file__)) + +DIR = 'tests' + +THIS_DIRECTORY = re.sub(DIR, '', THIS_DIRECTORY) + +sys.path.append(THIS_DIRECTORY) + + +import unittest +import news + + +class NewsCheck(unittest.TestCase): + def setUp(self) -> None: + + with open('news.log', 'w', encoding='utf-8') as file: + file.write('Good url') + + def test_news_check(self): + news_link = {'link': 'Good url'} + + self.assertEqual(news.news_check(news_link), True) + + def tearDown(self) -> None: + os.remove('news.log') + + +class NewsPrint(unittest.TestCase): + + def test_news_print(self): + self.assertEqual(news.news_print('21191125', 1), 2) + + def test_news_log(self): + self.assertEqual(news.news_log_add('url'), None) + + +if __name__ == '__main__': + unittest.main() \ No newline at end of file diff --git a/final_task/rss_reader/tests/test_rss_parser.py b/final_task/rss_reader/tests/test_rss_parser.py new file mode 100644 index 0000000..3d5cffd --- /dev/null +++ b/final_task/rss_reader/tests/test_rss_parser.py @@ -0,0 +1,44 @@ +"""This module tests rss_parser module +""" +import re +import os +import sys + +THIS_DIRECTORY = os.path.abspath(os.path.dirname(__file__)) + +DIR = 'tests' + +THIS_DIRECTORY = re.sub(DIR, '', THIS_DIRECTORY) + +sys.path.append(THIS_DIRECTORY) + + +import unittest +import rss_parser + + +class RSSParser(unittest.TestCase): + def test_get_rss(self): + self.assertEqual(rss_parser.get_rss('https://www.google.com'), None) + self.assertNotEqual(rss_parser.get_rss('https://news.yahoo.com/rss/'), None) + + def test_connect_url(self): + self.assertEqual(rss_parser.connect_rss('https://httpstat.us/200'), True) + self.assertEqual(rss_parser.connect_rss('https://httpstat.us/504'), False) + self.assertEqual(rss_parser.connect_rss('https://httpstat.us/405'), False) + self.assertEqual(rss_parser.connect_rss('https://httpstat.us/511'), False) + + def test_convert_date(self): + 
self.assertEqual(rss_parser.convert_date('Fri, 22 Nov 2019 15:47:25 -0500'), '20191122') + self.assertNotEqual(rss_parser.convert_date('Fri, 22 Nov 2019 15:47:25 -0500'), '20192211') + self.assertEqual(rss_parser.convert_date('Sat, 23 Nov 2019 18:50:07 -0500'), '20191123') + self.assertNotEqual(rss_parser.convert_date('Sat, 23 Nov 2019 18:50:07 -0500'), '20192311') + + +if __name__ == '__main__': + unittest.main() + + + + + diff --git a/final_task/setup.py b/final_task/setup.py index e69de29..e15b970 100644 --- a/final_task/setup.py +++ b/final_task/setup.py @@ -0,0 +1,15 @@ +from setuptools import setup + +setup( + name='rss-reader', + version='v4.0', + packages=['rss_reader'], + package_data={'rss_reader': ['ARIALUNI.TTF']}, + install_requires=['feedparser == 5.2.1', 'html2text == 2019.9.26', 'fpdf==1.7.2'], + url='www.github.com', + license='LICENCE.txt', + author='AlexSpaceBy', + author_email='fiz.zagorodnAA@gmail.com', + description='RSS Reader', + entry_points={'console_scripts': ['rss-reader = rss_reader.rss_reader:main']} +)
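`convert_date` in `rss_parser.py` reads the day, month and year from fixed character positions, so it assumes a two-digit day in the feed's date. A possible alternative, sketched under the assumption that feeds publish RFC 822 dates like the ones in `test_convert_date`, is to let the standard library parse the value. `convert_date_rfc822` is a hypothetical name and is not part of the project.

```python
from email.utils import parsedate_to_datetime


def convert_date_rfc822(date: str) -> str:
    """Convert an RFC 822 date string to the 'yyyymmdd' storage key (hypothetical helper)."""
    # parsedate_to_datetime raises TypeError (Python 3.8) or ValueError (3.10+) for unparsable input
    parsed = parsedate_to_datetime(date)
    # Keep the calendar date exactly as published by the feed, without converting time zones
    return parsed.strftime('%Y%m%d')


print(convert_date_rfc822('Fri, 22 Nov 2019 15:47:25 -0500'))  # 20191122
print(convert_date_rfc822('Sun, 5 Jan 2020 09:00:00 +0000'))   # 20200105
```

Because the parser reads the day as a number, single-digit days such as `5 Jan 2020` are handled as well, and `strftime('%Y%m%d')` always yields the eight-digit key that the `-d` option compares against.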
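The pdf and html converters and `process_rss` each undo a few HTML entities one at a time with `re.sub`. If the goal is plain text with every HTML character reference decoded, the standard library's `html.unescape` handles named, decimal and hexadecimal references in a single call. The sketch below is a suggestion only: `clean_text` is a hypothetical helper, and its final quote replacement simply imitates what the existing code appears to do.

```python
import html


def clean_text(text: str) -> str:
    """Decode HTML character references in text (hypothetical helper)."""
    # Handles named (&quot;, &amp;), decimal (&#39;) and hex (&#x27;) references
    decoded = html.unescape(text)
    # Imitate the existing code, which swaps double quotes for single quotes
    return decoded.replace('"', "'")


print(clean_text('It&#39;s &quot;breaking&quot; news'))  # It's 'breaking' news
```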