diff --git a/Groupwork (Group 5)/Collecting data Group Project - building the reddit scraper.ipynb b/Groupwork (Group 5)/Collecting data Group Project - building the reddit scraper.ipynb
new file mode 100644
index 0000000..aa56ab0
--- /dev/null
+++ b/Groupwork (Group 5)/Collecting data Group Project - building the reddit scraper.ipynb	
@@ -0,0 +1,1425 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "4e6409e2",
+   "metadata": {},
+   "source": [
+    "# The creation and exploration of the r/Feminism dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "884380e1",
+   "metadata": {},
+   "source": [
+    "This file is a guide to creating a Reddit web scraper with PRAW. We use our own research question, which you can find in our report, as an example of how to approach your own research with a scraper. After creating a dataframe with pandas, we show you how to inspect and clean it for further use. In our report we illustrate some examples of further use with brief visualizations made in Flourish. Of course, the output you produce will depend on your own research. \n",
+    "\n",
+    "In the workbook file on our GitHub you can find a notebook in which you build a Reddit scraper of your own choice. You can use this file as a reference and as an example for the assignments in that workbook. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bdb227e8",
+   "metadata": {},
+   "source": [
+    "1) Go to https://www.reddit.com/prefs/apps to create a Reddit app and choose 'Create App'. Here you fill in a name, a description and a redirect URI. As described in the PRAW documentation (https://praw.readthedocs.io/en/latest/getting_started/authentication.html#script-application), \n",
+    "you should use http://localhost:8080 as the redirect URI. \n",
+    "\n",
+    "Avoid words like 'scraping' or 'bot' in the name: Reddit may refuse to authorize your app if you use them. Finally, select 'script' (for personal use) and press 'create app'. \n",
+    "\n",
+    "The client_id is the code shown underneath 'personal use script'. The client_secret is shown next to 'secret'. The user_agent is a descriptive name you choose yourself, for example the name of your app. \n",
+    "\n",
+    "Our scraper uses a read-only Reddit instance, which we named `reddit_read_only`. Because we only pass a client_id, client_secret and user_agent (no Reddit username or password), PRAW can read public data but cannot post, comment or vote. \n",
+    "\n",
+    "For a more in-depth explanation of creating the Reddit app, see the tutorial section in our report or this article: https://towardsdatascience.com/scraping-reddit-data-1c0af3040768."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "ff2a759e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# If you have not installed these libraries yet, install them first, e.g. with: !pip install praw pandas\n",
+    "import praw\n",
+    "import pandas as pd"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "cb19e399",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Display Name: Feminism\n",
+      "Title: Feminism - “the personal is political” \n",
+      "Description: >#\n",
+      "\n",
+      ">* [Library](http://www.reddit.com/r/Feminism/search?q=flair%3A%22full+text%22&sort=new&restrict_sr=on)\n",
+      "\n",
+      ">#\n",
+      "\n",
+      ">* [Tags](http://redd.it/209vts)\n",
+      "\n",
+      ">#\n",
+      "\n",
+      ">* [FAQ and resources](https://docs.google.com/document/d/1TpHPEo3pG-QlB7dWCF-fcJFgayFmlQia-RjpKQGFP4A/edit#bookmark=id.p80ha3e7jbzv)\n",
+      "* [Concepts](http://redd.it/1fkhq4)\n",
+      "* [Studies] (https://docs.google.com/document/d/1TpHPEo3pG-QlB7dWCF-fcJFgayFmlQia-RjpKQGFP4A/edit#bookmark=id.abrf8mm38svw)\n",
+      "* [Feminist works](https://docs.google.com/document/d/1TpHPEo3pG-QlB7dWCF-fcJFgayFmlQia-RjpKQGFP4A/edit#bookmark=id.jsay6nakas1s)\n",
+      "* [Organizations](https://docs.google.com/document/d/1TpHPEo3pG-QlB7dWCF-fcJFgayFmlQia-RjpKQGFP4A/edit#bookmark=id.jsay6nakas1s)\n",
+      "\n",
+      ">#\n",
+      "\n",
+      ">* [Currents](http://redd.it/166i8a)\n",
+      "\n",
+      ">#\n",
+      "\n",
+      ">* [Definition](http://redd.it/1fkhkf)\n",
+      "\n",
+      "\n",
+      "\n",
+      "\n",
+      "### [](#h3-blue)\n",
+      ">**Feminism** is the pursuit of equality in regards to women's rights. It has manifested across centuries and continents through [various movements, currents and ideologies](http://www.reddit.com/r/Feminism/comments/166i8a/a_short_introduction_to_feminist_movements/).\n",
+      "\n",
+      "Welcome to the feminism community!  This is a space for discussing and promoting awareness of issues related to equality for women.\n",
+      "\n",
+      "####Recommended introductory reading:\n",
+      "\n",
+      "- a selection of **[feminist works](https://docs.google.com/document/d/1TpHPEo3pG-QlB7dWCF-fcJFgayFmlQia-RjpKQGFP4A/edit#heading=h.vyclhseeefrl)**\n",
+      "\n",
+      "- on the **[history of feminism](https://docs.google.com/document/d/1TpHPEo3pG-QlB7dWCF-fcJFgayFmlQia-RjpKQGFP4A/edit#heading=h.x60lc44f3gn2)**\n",
+      "\n",
+      "- feminist **[blogs and websites](http://redd.it/19l1wv)**\n",
+      "\n",
+      "- **[recurrent questions](http://www.reddit.com/r/AskFeminists/search?q=flair%3ARecurrent_questions&restrict_sr=on)**\n",
+      "\n",
+      "- tagged browsing: posted **[studies](http://www.reddit.com/r/Feminism/search?q=flair%3Astudy&restrict_sr=on&sort=relevance&t=all)**, **[classic works](http://www.reddit.com/r/Feminism/search?q=flair%3Aclassic&restrict_sr=on&sort=relevance&t=all)**\n",
+      "\n",
+      "\n",
+      "####Issues related to [women's rights](https://docs.google.com/document/d/1TpHPEo3pG-QlB7dWCF-fcJFgayFmlQia-RjpKQGFP4A/edit?pli=1#bookmark=id.v1jecz3p3hpc):\n",
+      "\n",
+      "- [bodily integrity and autonomy](http://redd.it/142nzm)\n",
+      "\n",
+      "- [fair wages and equal career opportunities](http://redd.it/142o2s)\n",
+      "\n",
+      "- [the right to vote and the representation of women in politics](http://redd.it/145z0n)\n",
+      "\n",
+      "- [the right to own property](http://redd.it/146es0)\n",
+      "\n",
+      "- [the right to education](http://redd.it/1475xh)\n",
+      "\n",
+      "Our FAQ also has sections on issues related to [LGBT](https://docs.google.com/document/d/1TpHPEo3pG-QlB7dWCF-fcJFgayFmlQia-RjpKQGFP4A/edit?pli=1#bookmark=id.gs6n32up92ey) rights and [men's](https://docs.google.com/document/d/1TpHPEo3pG-QlB7dWCF-fcJFgayFmlQia-RjpKQGFP4A/edit#bookmark=id.xf29a9z1r0tt) rights.\n",
+      "\n",
+      "####Other Recommended Subreddits  \n",
+      "\n",
+      " |  | \n",
+      ":--| :--|\n",
+      "/r/twoXchromosomes | /r/AskFeminists|\n",
+      "/r/CriticalTheory| /r/domesticviolence  |\n",
+      "/r/MeToo | /r/relationship_advice |\n",
+      "/r/rapecounseling | /r/ainbow |\n",
+      "/r/BodyAcceptance | /r/SexPositive |\n",
+      " |  | \n",
+      "\n",
+      "\n",
+      "For a larger selection of civic issues subreddits, click [here](https://docs.google.com/document/d/1TpHPEo3pG-QlB7dWCF-fcJFgayFmlQia-RjpKQGFP4A/edit#bookmark=id.h456x5acmpv9)\n",
+      "\n",
+      "####Posting Rules\n",
+      "\n",
+      "\\- all posts and discussions must be relevant to women's issues\n",
+      "\n",
+      "\\- all posts must come from an educated perspective\n",
+      "\n",
+      "\\- promoting regressive agendas is not permitted\n",
+      "\n",
+      "\\- be respectful and courteous\n",
+      "\n",
+      "\\- respect the \"assume good faith\" principle\n",
+      " \n",
+      "[Click here for more info](https://www.reddit.com/r/Feminism/about/rules)\n",
+      "\n",
+      "**Rules regarding debating**:\n",
+      "\n",
+      "Criticism of feminist concepts/organizations/persons is **welcomed** if it meets the following criteria:\n",
+      "\n",
+      "\\- it is topical/directly relevant to the topic at hand;\n",
+      "\n",
+      "\\- it is verifiably sourced (i.e. it doesn’t rely on mere dismissiveness/speculation, non-feminist preferences or anecdotal evidence. In particular, pure anti-feminist propaganda is not allowed, since personal non-/anti-feminist preferences are deemed as not informative or relevant); furthermore, presentation of relevant data must not be biased against the feminist position (i.e. there should be a best effort to include the evidence/arguments supportive of the feminist position);\n",
+      "\n",
+      "\\- it is properly qualified: i.e. it correctly identifies the problem at the appropriate level, instead of unwarrantably generalizing it, especially if it does so for the whole collection of movements that constitute feminism;\n",
+      "\n",
+      "\\- all ideological considerations must contribute to understanding the feminist perspective, and be consistent with an attitude of encouragement towards further learning.\n"
+     ]
+    }
+   ],
+   "source": [
+    "reddit_read_only = praw.Reddit(client_id=\"\",       #your client id  \n",
+    "                               client_secret=\"\",   #your client secret \n",
+    "                               user_agent=\"\")      # your user agent\n",
+    "subreddit = reddit_read_only.subreddit(\"Feminism\") #The name of the subreddit, in our case: (r/)Feminism.\n",
+    " \n",
+    "# With these lines of code you can check whether PRAW is connected to the subreddit of your choice.\n",
+    "\n",
+    "# Display the name of the Subreddit\n",
+    "print(\"Display Name:\", subreddit.display_name)\n",
+    " \n",
+    "# Display the title of the Subreddit\n",
+    "print(\"Title:\", subreddit.title)\n",
+    " \n",
+    "# Display the description of the Subreddit\n",
+    "print(\"Description:\", subreddit.description)"
+   ]
+  },
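+  {
+   "cell_type": "markdown",
+   "id": "5d1e7a01",
+   "metadata": {},
+   "source": [
+    "Typing your client_id and client_secret directly into the notebook makes it easy to share them by accident. One option is to read them from environment variables instead, as in the minimal sketch below. The helper `load_reddit_credentials` and the variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT` are hypothetical choices of our own, not part of PRAW; you would set these variables yourself before starting Jupyter. This is not the approach we used to build our dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5d1e7a02",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "# Hypothetical mapping: PRAW argument name -> environment variable (our own naming convention)\n",
+    "CREDENTIAL_VARIABLES = {\n",
+    "    'client_id': 'REDDIT_CLIENT_ID',\n",
+    "    'client_secret': 'REDDIT_CLIENT_SECRET',\n",
+    "    'user_agent': 'REDDIT_USER_AGENT',\n",
+    "}\n",
+    "\n",
+    "\n",
+    "def load_reddit_credentials(environ=os.environ):\n",
+    "    \"\"\"Return the PRAW credentials as a dict read from environment variables.\"\"\"\n",
+    "    missing = [var for var in CREDENTIAL_VARIABLES.values() if not environ.get(var)]\n",
+    "    if missing:\n",
+    "        raise KeyError('Please set these environment variables: ' + ', '.join(missing))\n",
+    "    return {arg: environ[var] for arg, var in CREDENTIAL_VARIABLES.items()}\n",
+    "\n",
+    "\n",
+    "# Usage (needs valid credentials and an internet connection):\n",
+    "# reddit_read_only = praw.Reddit(**load_reddit_credentials())"
+   ]
+  },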
+  {
+   "cell_type": "markdown",
+   "id": "64c20a41",
+   "metadata": {},
+   "source": [
+    "Once you are connected to the subreddit, it is time to build a dataframe with pandas. To do this, start with an empty list and append the values of each post to it. In our example we collected the 'hot' posts. You can set the `limit` yourself, which is especially important if you scrape a larger subreddit. In our case, asking for up to 2000 hot posts returned 796 posts, so the limit was never reached. Keep in mind that Reddit listings return at most around 1,000 posts, so a listing does not necessarily contain every post ever made in a subreddit. \n",
+    "\n",
+    "It is also possible to scrape the top posts (the highest-scoring posts of the subreddit within a period). To do this you can use the line: \n",
+    "\n",
+    "`for post in subreddit.top(time_filter=\"month\"):` \n",
+    "\n",
+    "With `time_filter` you specify the period: \"hour\", \"day\", \"week\", \"month\", \"year\" or \"all\".\n",
+    "\n",
+    "You decide which attributes of each post to collect. Finally, you create a dataframe in which you specify your own column names. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "fb6152d2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "                                                 title  score       id  \\\n",
+      "0    This is a comprehensive list of resources for ...   2547   phrcrn   \n",
+      "1    On New Year's Eve, Iranian women express their...    189  10134y8   \n",
+      "2    She led two historic victories for abortion ri...    204  101106p   \n",
+      "3    Amal(12) | Pregnant child bride: A story of ma...     84  1014ivy   \n",
+      "4         My brother watches Hamza and it’s scaring me     76  1010yt6   \n",
+      "..                                                 ...    ...      ...   \n",
+      "791  A 31-year-old woman who had her tubes removed ...   1030   xuiv4z   \n",
+      "792      Husband subscribed to Jordan P podcast. Help!    190   xuvhd1   \n",
+      "793  Journalist/Writer Julia Ioffe posts the ultima...      0   xvox9v   \n",
+      "794  Canada significantly undercounts maternal deat...    220   xuiix4   \n",
+      "795  Men under 30 are less accepting of women’s rig...    129   xum6lz   \n",
+      "\n",
+      "    subreddit                                                url  \\\n",
+      "0    Feminism  https://www.reddit.com/r/Feminism/comments/phr...   \n",
+      "1    Feminism                    https://v.redd.it/r6d7rq8b4k9a1   \n",
+      "2    Feminism  https://www.theguardian.com/world/2023/jan/01/...   \n",
+      "3    Feminism                    https://v.redd.it/w4k473wkyg9a1   \n",
+      "4    Feminism  https://www.reddit.com/r/Feminism/comments/101...   \n",
+      "..        ...                                                ...   \n",
+      "791  Feminism  https://www.businessinsider.com/new-york-woman...   \n",
+      "792  Feminism  https://www.reddit.com/r/Feminism/comments/xuv...   \n",
+      "793  Feminism  https://twitter.com/juliaioffe/status/15772880...   \n",
+      "794  Feminism  https://www.cbc.ca/news/canada/canada-maternal...   \n",
+      "795  Feminism  https://www.msn.com/en-gb/news/world/men-under...   \n",
+      "\n",
+      "     num_comments                                               body  \\\n",
+      "0             236  **Update** I guess I've been mass reported for...   \n",
+      "1               3                                                      \n",
+      "2               4                                                      \n",
+      "3               6                                                      \n",
+      "4              46  My little brother (14M) listens to a lot of “r...   \n",
+      "..            ...                                                ...   \n",
+      "791            74                                                      \n",
+      "792           102  He told me this morning he subscribed. What ca...   \n",
+      "793             2                                                      \n",
+      "794             2                                                      \n",
+      "795            17                                                      \n",
+      "\n",
+      "          created  \n",
+      "0    1.630761e+09  \n",
+      "1    1.672633e+09  \n",
+      "2    1.672627e+09  \n",
+      "3    1.672638e+09  \n",
+      "4    1.672627e+09  \n",
+      "..            ...  \n",
+      "791  1.664802e+09  \n",
+      "792  1.664831e+09  \n",
+      "793  1.664913e+09  \n",
+      "794  1.664801e+09  \n",
+      "795  1.664810e+09  \n",
+      "\n",
+      "[796 rows x 8 columns]\n"
+     ]
+    }
+   ],
+   "source": [
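+    "posts = []\n",
+    "\n",
+    "# Request up to 2000 posts from the 'hot' listing and store the chosen attributes of each post\n",
+    "for post in subreddit.hot(limit=2000):\n",
+    "    posts.append([post.title, post.score, post.id, post.subreddit, post.url, post.num_comments, post.selftext, post.created])\n",
+    "\n",
+    "# One row per post; the post text (selftext) is stored in the column 'body'\n",
+    "feminism_df = pd.DataFrame(posts, columns=['title', 'score', 'id', 'subreddit', 'url', 'num_comments', 'body', 'created'])\n",
+    "print(feminism_df)"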
+   ]
+  },
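+  {
+   "cell_type": "markdown",
+   "id": "5d1e7a03",
+   "metadata": {},
+   "source": [
+    "If you want to compare listings, for example the hot posts with the top posts of a month, you could wrap the loop above in a small function. The helper `listing_to_dataframe` below is a hypothetical sketch, not part of PRAW, that builds the same eight columns as our dataframe from any iterable of PRAW submissions. The commented usage lines assume the `subreddit` object created earlier and an internet connection."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5d1e7a04",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Column name in our dataframe -> attribute of a PRAW submission\n",
+    "COLUMNS = {\n",
+    "    'title': 'title',\n",
+    "    'score': 'score',\n",
+    "    'id': 'id',\n",
+    "    'subreddit': 'subreddit',\n",
+    "    'url': 'url',\n",
+    "    'num_comments': 'num_comments',\n",
+    "    'body': 'selftext',\n",
+    "    'created': 'created',\n",
+    "}\n",
+    "\n",
+    "\n",
+    "def listing_to_dataframe(listing):\n",
+    "    \"\"\"Hypothetical helper: build a dataframe with one row per post in a listing.\"\"\"\n",
+    "    rows = [[getattr(post, attr) for attr in COLUMNS.values()] for post in listing]\n",
+    "    return pd.DataFrame(rows, columns=list(COLUMNS))\n",
+    "\n",
+    "\n",
+    "# Usage with the subreddit object from above:\n",
+    "# hot_df = listing_to_dataframe(subreddit.hot(limit=2000))\n",
+    "# top_month_df = listing_to_dataframe(subreddit.top(time_filter='month', limit=100))"
+   ]
+  },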
+  {
+   "cell_type": "markdown",
+   "id": "6fcfe574",
+   "metadata": {},
+   "source": [
+    "`shape` returns the number of rows and columns in the dataframe. In our case the r/Feminism dataset has 796 rows (one per post) and 8 columns. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "e91f9263",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(796, 8)"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "feminism_df.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c635b952",
+   "metadata": {},
+   "source": [
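+    "Next, we add a 'date' column with the date and time each post was created, converted from the Unix timestamp (seconds since 1 January 1970, UTC) in the 'created' column:"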
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "081ec7c4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Convert the Unix timestamps in 'created' to timezone-aware UTC datetimes\n",
+    "feminism_df['date'] = pd.to_datetime(feminism_df['created'], utc=True, unit='s')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "b5c6a6e2",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "0     2021-09-04 13:15:02+00:00\n",
+       "1     2023-01-02 04:23:21+00:00\n",
+       "2     2023-01-02 02:37:39+00:00\n",
+       "3     2023-01-02 05:35:48+00:00\n",
+       "4     2023-01-02 02:35:44+00:00\n",
+       "                 ...           \n",
+       "791   2022-10-03 13:02:01+00:00\n",
+       "792   2022-10-03 21:06:27+00:00\n",
+       "793   2022-10-04 19:56:50+00:00\n",
+       "794   2022-10-03 12:47:30+00:00\n",
+       "795   2022-10-03 15:14:54+00:00\n",
+       "Name: date, Length: 796, dtype: datetime64[ns, UTC]"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "feminism_df['date']"
+   ]
+  },
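+  {
+   "cell_type": "markdown",
+   "id": "5d1e7a05",
+   "metadata": {},
+   "source": [
+    "As a quick check of the conversion: the first post's date, 2021-09-04 13:15:02 UTC, corresponds to the Unix timestamp 1630761302, which matches the rounded value 1.630761e+09 printed for that post earlier. The small example below converts this timestamp, and the timestamp 0 (the start of the Unix epoch), in the same way as the cell above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5d1e7a06",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Unix timestamps count the seconds since 1970-01-01 00:00:00 UTC\n",
+    "timestamps = pd.Series([1630761302, 0])\n",
+    "dates = pd.to_datetime(timestamps, unit='s', utc=True)\n",
+    "print(dates.iloc[0])  # 2021-09-04 13:15:02+00:00\n",
+    "print(dates.iloc[1])  # 1970-01-01 00:00:00+00:00"
+   ]
+  },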
+  {
+   "cell_type": "markdown",
+   "id": "755f4feb",
+   "metadata": {},
+   "source": [
+    "After collection, save the dataset as a CSV file. It is important to do this before your analysis, especially if you work with others: the code above needs the client_id and client_secret of your Reddit app, which are private and should definitely not be made public. You can share the CSV file instead. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "2b68c222",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "feminism_df.to_csv(\"feminism reddit dataset.csv\", index=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee229384",
+   "metadata": {},
+   "source": [
+    "To explore the r/Feminism dataset, we load it again from the CSV file, because the code above does not run without a client_id, client_secret and user_agent:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "c6609a59",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "feminism_df = pd.read_csv('feminism reddit dataset.csv', delimiter=',', encoding='utf-8')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b2cd33ec",
+   "metadata": {},
+   "source": [
+    "To start exploring your dataset, you can look at the data type of each column:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "2341223f",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Unnamed: 0        int64\n",
+       "title            object\n",
+       "score             int64\n",
+       "id               object\n",
+       "subreddit        object\n",
+       "url              object\n",
+       "num_comments      int64\n",
+       "body             object\n",
+       "created         float64\n",
+       "date             object\n",
+       "dtype: object"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "feminism_df.dtypes"
+   ]
+  },
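+  {
+   "cell_type": "markdown",
+   "id": "5d1e7a07",
+   "metadata": {},
+   "source": [
+    "The data types above show two side effects of saving and reloading a CSV file: the saved index comes back as the extra column 'Unnamed: 0', and 'date' comes back as text (object). As an alternative to cleaning this up afterwards, `pd.read_csv` can restore both while loading, with `index_col=0` and `parse_dates=['date']`. The sketch below shows this on a tiny stand-in dataframe (its titles are made up; its timestamps match the first two dates shown earlier). For the real file the call would be `pd.read_csv('feminism reddit dataset.csv', index_col=0, parse_dates=['date'])`; in this notebook we keep our original loading and cleaning steps."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5d1e7a08",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import io\n",
+    "\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Tiny stand-in for feminism_df with a timezone-aware 'date' column\n",
+    "df = pd.DataFrame({\n",
+    "    'title': ['first post', 'second post'],\n",
+    "    'date': pd.to_datetime([1630761302, 1672633401], unit='s', utc=True),\n",
+    "})\n",
+    "\n",
+    "# Save with the index (index=True, as in this notebook) to an in-memory file\n",
+    "buffer = io.StringIO()\n",
+    "df.to_csv(buffer, index=True)\n",
+    "buffer.seek(0)\n",
+    "\n",
+    "# index_col=0 restores the index instead of adding 'Unnamed: 0';\n",
+    "# parse_dates converts the text in 'date' back into datetimes\n",
+    "restored = pd.read_csv(buffer, index_col=0, parse_dates=['date'])\n",
+    "print(restored.dtypes)"
+   ]
+  },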
+  {
+   "cell_type": "markdown",
+   "id": "18c448b6",
+   "metadata": {},
+   "source": [
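+    "Note that 'date' now has the type object: a CSV file stores dates as plain text, so use `pd.to_datetime` again if you need them as datetimes. Next, let's see how many observations the dataset has and how many non-null values each column contains:"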
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "0ed0fde2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<class 'pandas.core.frame.DataFrame'>\n",
+      "RangeIndex: 796 entries, 0 to 795\n",
+      "Data columns (total 10 columns):\n",
+      " #   Column        Non-Null Count  Dtype  \n",
+      "---  ------        --------------  -----  \n",
+      " 0   Unnamed: 0    796 non-null    int64  \n",
+      " 1   title         796 non-null    object \n",
+      " 2   score         796 non-null    int64  \n",
+      " 3   id            796 non-null    object \n",
+      " 4   subreddit     796 non-null    object \n",
+      " 5   url           796 non-null    object \n",
+      " 6   num_comments  796 non-null    int64  \n",
+      " 7   body          282 non-null    object \n",
+      " 8   created       796 non-null    float64\n",
+      " 9   date          796 non-null    object \n",
+      "dtypes: float64(1), int64(3), object(6)\n",
+      "memory usage: 62.3+ KB\n"
+     ]
+    }
+   ],
+   "source": [
+    "feminism_df.info()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "afc93f5e",
+   "metadata": {},
+   "source": [
+    "Only the 'body' column has null values: 282 of the 796 posts have a body. This is because Reddit users can make posts with only a title, or with only a link, picture or video. In that case the link to the image or video is stored in the 'url' column. PRAW returns an empty string as the body (selftext) of such posts, and pandas reads empty fields back from the CSV file as missing values (NaN)."
+   ]
+  },
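+  {
+   "cell_type": "markdown",
+   "id": "5d1e7a09",
+   "metadata": {},
+   "source": [
+    "Posts without text get an empty string as their body (selftext) from PRAW. The sketch below, on a made-up two-row dataframe, shows why these posts appear as missing values here: an empty body is written to the CSV file as an empty field, and `pd.read_csv` reads empty fields back as NaN by default. If you would rather work with empty strings, you can fill the missing values with `fillna('')` after loading."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5d1e7a0a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import io\n",
+    "\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Made-up example: the video post has an empty body, as PRAW returns for non-text posts\n",
+    "df = pd.DataFrame({'title': ['text post', 'video post'], 'body': ['Some text', '']})\n",
+    "\n",
+    "buffer = io.StringIO()\n",
+    "df.to_csv(buffer, index=False)\n",
+    "buffer.seek(0)\n",
+    "\n",
+    "# read_csv treats empty fields as missing values by default\n",
+    "restored = pd.read_csv(buffer)\n",
+    "was_missing = restored['body'].isnull().tolist()\n",
+    "print(was_missing)  # [False, True]\n",
+    "\n",
+    "# Fill the missing bodies if you prefer empty strings again\n",
+    "restored['body'] = restored['body'].fillna('')\n",
+    "print(restored['body'].tolist())  # ['Some text', '']"
+   ]
+  },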
+  {
+   "cell_type": "markdown",
+   "id": "06047e28",
+   "metadata": {},
+   "source": [
+    "The column 'Unnamed: 0' is the same as the index (it was written to the CSV file because we saved it with `index=True`), so we can drop it. The 'created' column is also redundant: it stores the creation time as a Unix timestamp, and since we already added the readable 'date' column, we no longer need it. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "b92d98ee",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Drop the Unix timestamp column and the old index column that was saved to the CSV file\n",
+    "feminism_df = feminism_df.drop(columns=['created', 'Unnamed: 0'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7c1a2621",
+   "metadata": {},
+   "source": [
+    "Here we get a quick overview of the rows with missing values; in this dataset these are all posts without a 'body':"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "15d3cf8a",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Unnamed: 0</th>\n",
+       "      <th>title</th>\n",
+       "      <th>score</th>\n",
+       "      <th>id</th>\n",
+       "      <th>subreddit</th>\n",
+       "      <th>url</th>\n",
+       "      <th>num_comments</th>\n",
+       "      <th>body</th>\n",
+       "      <th>created</th>\n",
+       "      <th>date</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>1</td>\n",
+       "      <td>On New Year's Eve, Iranian women express their...</td>\n",
+       "      <td>189</td>\n",
+       "      <td>10134y8</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://v.redd.it/r6d7rq8b4k9a1</td>\n",
+       "      <td>3</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.672633e+09</td>\n",
+       "      <td>2023-01-02 04:23:21+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>2</td>\n",
+       "      <td>She led two historic victories for abortion ri...</td>\n",
+       "      <td>204</td>\n",
+       "      <td>101106p</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.theguardian.com/world/2023/jan/01/...</td>\n",
+       "      <td>4</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.672627e+09</td>\n",
+       "      <td>2023-01-02 02:37:39+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>3</td>\n",
+       "      <td>Amal(12) | Pregnant child bride: A story of ma...</td>\n",
+       "      <td>84</td>\n",
+       "      <td>1014ivy</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://v.redd.it/w4k473wkyg9a1</td>\n",
+       "      <td>6</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.672638e+09</td>\n",
+       "      <td>2023-01-02 05:35:48+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>5</td>\n",
+       "      <td>Women are more critical of female toplessness ...</td>\n",
+       "      <td>146</td>\n",
+       "      <td>100vlhd</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.psypost.org/2022/10/women-are-more...</td>\n",
+       "      <td>39</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.672613e+09</td>\n",
+       "      <td>2023-01-01 22:35:35+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>6</td>\n",
+       "      <td>This has bothered me for a long time</td>\n",
+       "      <td>2609</td>\n",
+       "      <td>100c37f</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://i.redd.it/2xwf53zf3d9a1.png</td>\n",
+       "      <td>88</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.672548e+09</td>\n",
+       "      <td>2023-01-01 04:46:51+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>...</th>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>790</th>\n",
+       "      <td>790</td>\n",
+       "      <td>Infosys to face age, gender bias suit by forme...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>xvsmln</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.theregister.com/2022/10/04/infosys...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.664922e+09</td>\n",
+       "      <td>2022-10-04 22:23:45+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>791</th>\n",
+       "      <td>791</td>\n",
+       "      <td>A 31-year-old woman who had her tubes removed ...</td>\n",
+       "      <td>1030</td>\n",
+       "      <td>xuiv4z</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.businessinsider.com/new-york-woman...</td>\n",
+       "      <td>74</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.664802e+09</td>\n",
+       "      <td>2022-10-03 13:02:01+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>793</th>\n",
+       "      <td>793</td>\n",
+       "      <td>Journalist/Writer Julia Ioffe posts the ultima...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>xvox9v</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://twitter.com/juliaioffe/status/15772880...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.664913e+09</td>\n",
+       "      <td>2022-10-04 19:56:50+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>794</th>\n",
+       "      <td>794</td>\n",
+       "      <td>Canada significantly undercounts maternal deat...</td>\n",
+       "      <td>220</td>\n",
+       "      <td>xuiix4</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.cbc.ca/news/canada/canada-maternal...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.664801e+09</td>\n",
+       "      <td>2022-10-03 12:47:30+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>795</th>\n",
+       "      <td>795</td>\n",
+       "      <td>Men under 30 are less accepting of women’s rig...</td>\n",
+       "      <td>129</td>\n",
+       "      <td>xum6lz</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.msn.com/en-gb/news/world/men-under...</td>\n",
+       "      <td>17</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.664810e+09</td>\n",
+       "      <td>2022-10-03 15:14:54+00:00</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>514 rows × 10 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "     Unnamed: 0                                              title  score  \\\n",
+       "1             1  On New Year's Eve, Iranian women express their...    189   \n",
+       "2             2  She led two historic victories for abortion ri...    204   \n",
+       "3             3  Amal(12) | Pregnant child bride: A story of ma...     84   \n",
+       "5             5  Women are more critical of female toplessness ...    146   \n",
+       "6             6               This has bothered me for a long time   2609   \n",
+       "..          ...                                                ...    ...   \n",
+       "790         790  Infosys to face age, gender bias suit by forme...      3   \n",
+       "791         791  A 31-year-old woman who had her tubes removed ...   1030   \n",
+       "793         793  Journalist/Writer Julia Ioffe posts the ultima...      0   \n",
+       "794         794  Canada significantly undercounts maternal deat...    220   \n",
+       "795         795  Men under 30 are less accepting of women’s rig...    129   \n",
+       "\n",
+       "          id subreddit                                                url  \\\n",
+       "1    10134y8  Feminism                    https://v.redd.it/r6d7rq8b4k9a1   \n",
+       "2    101106p  Feminism  https://www.theguardian.com/world/2023/jan/01/...   \n",
+       "3    1014ivy  Feminism                    https://v.redd.it/w4k473wkyg9a1   \n",
+       "5    100vlhd  Feminism  https://www.psypost.org/2022/10/women-are-more...   \n",
+       "6    100c37f  Feminism                https://i.redd.it/2xwf53zf3d9a1.png   \n",
+       "..       ...       ...                                                ...   \n",
+       "790   xvsmln  Feminism  https://www.theregister.com/2022/10/04/infosys...   \n",
+       "791   xuiv4z  Feminism  https://www.businessinsider.com/new-york-woman...   \n",
+       "793   xvox9v  Feminism  https://twitter.com/juliaioffe/status/15772880...   \n",
+       "794   xuiix4  Feminism  https://www.cbc.ca/news/canada/canada-maternal...   \n",
+       "795   xum6lz  Feminism  https://www.msn.com/en-gb/news/world/men-under...   \n",
+       "\n",
+       "     num_comments body       created                       date  \n",
+       "1               3  NaN  1.672633e+09  2023-01-02 04:23:21+00:00  \n",
+       "2               4  NaN  1.672627e+09  2023-01-02 02:37:39+00:00  \n",
+       "3               6  NaN  1.672638e+09  2023-01-02 05:35:48+00:00  \n",
+       "5              39  NaN  1.672613e+09  2023-01-01 22:35:35+00:00  \n",
+       "6              88  NaN  1.672548e+09  2023-01-01 04:46:51+00:00  \n",
+       "..            ...  ...           ...                        ...  \n",
+       "790             0  NaN  1.664922e+09  2022-10-04 22:23:45+00:00  \n",
+       "791            74  NaN  1.664802e+09  2022-10-03 13:02:01+00:00  \n",
+       "793             2  NaN  1.664913e+09  2022-10-04 19:56:50+00:00  \n",
+       "794             2  NaN  1.664801e+09  2022-10-03 12:47:30+00:00  \n",
+       "795            17  NaN  1.664810e+09  2022-10-03 15:14:54+00:00  \n",
+       "\n",
+       "[514 rows x 10 columns]"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "feminism_df[feminism_df.isnull().any(axis=1)]"
+   ]
+  },
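+  {
+   "cell_type": "markdown",
+   "id": "5d1e7a0b",
+   "metadata": {},
+   "source": [
+    "Instead of listing the rows, you can also count the missing values per column by summing the boolean mask returned by `isnull()` (each True counts as 1). For `feminism_df` this would report 514 missing values for 'body': the 796 posts minus the 282 non-null bodies reported by `info()`, which matches the 514 rows listed above. The sketch below shows the idea on a tiny made-up dataframe; `isnull().mean()` gives the share of missing values instead of the count."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5d1e7a0c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Made-up example: one text post and two posts without a body\n",
+    "df = pd.DataFrame({\n",
+    "    'title': ['text post', 'video post', 'link post'],\n",
+    "    'body': ['Some text', np.nan, np.nan],\n",
+    "})\n",
+    "\n",
+    "missing_counts = df.isnull().sum()   # number of missing values per column\n",
+    "missing_share = df.isnull().mean()   # fraction of missing values per column\n",
+    "print(missing_counts)\n",
+    "print(missing_share.round(2))"
+   ]
+  },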
+  {
+   "cell_type": "markdown",
+   "id": "1749098f",
+   "metadata": {},
+   "source": [
+    "To keep only the posts with actual text in the body (so without posts that consist of only an image, link or video):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "b86d50da",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Unnamed: 0</th>\n",
+       "      <th>title</th>\n",
+       "      <th>score</th>\n",
+       "      <th>id</th>\n",
+       "      <th>subreddit</th>\n",
+       "      <th>url</th>\n",
+       "      <th>num_comments</th>\n",
+       "      <th>body</th>\n",
+       "      <th>created</th>\n",
+       "      <th>date</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>0</td>\n",
+       "      <td>This is a comprehensive list of resources for ...</td>\n",
+       "      <td>2547</td>\n",
+       "      <td>phrcrn</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.reddit.com/r/Feminism/comments/phr...</td>\n",
+       "      <td>236</td>\n",
+       "      <td>**Update** I guess I've been mass reported for...</td>\n",
+       "      <td>1.630761e+09</td>\n",
+       "      <td>2021-09-04 13:15:02+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>4</td>\n",
+       "      <td>My brother watches Hamza and it’s scaring me</td>\n",
+       "      <td>76</td>\n",
+       "      <td>1010yt6</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.reddit.com/r/Feminism/comments/101...</td>\n",
+       "      <td>46</td>\n",
+       "      <td>My little brother (14M) listens to a lot of “r...</td>\n",
+       "      <td>1.672627e+09</td>\n",
+       "      <td>2023-01-02 02:35:44+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7</th>\n",
+       "      <td>7</td>\n",
+       "      <td>Being a teen girl sucks</td>\n",
+       "      <td>236</td>\n",
+       "      <td>100ofn7</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.reddit.com/r/Feminism/comments/100...</td>\n",
+       "      <td>12</td>\n",
+       "      <td>As I am in my last year of highschool and am g...</td>\n",
+       "      <td>1.672594e+09</td>\n",
+       "      <td>2023-01-01 17:28:45+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8</th>\n",
+       "      <td>8</td>\n",
+       "      <td>Is the whole thing a lie?</td>\n",
+       "      <td>31</td>\n",
+       "      <td>100yxd9</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.reddit.com/r/Feminism/comments/100...</td>\n",
+       "      <td>10</td>\n",
+       "      <td>I've gone through hell this year. It has made ...</td>\n",
+       "      <td>1.672621e+09</td>\n",
+       "      <td>2023-01-02 01:00:42+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>12</th>\n",
+       "      <td>12</td>\n",
+       "      <td>Girls fighting for the future of the environme...</td>\n",
+       "      <td>11</td>\n",
+       "      <td>100yiuh</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://youtu.be/_HTdyorjL0E</td>\n",
+       "      <td>0</td>\n",
+       "      <td>This documentary is a year old but these girls...</td>\n",
+       "      <td>1.672620e+09</td>\n",
+       "      <td>2023-01-02 00:42:40+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>...</th>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>771</th>\n",
+       "      <td>771</td>\n",
+       "      <td>Yes or no to bras</td>\n",
+       "      <td>4</td>\n",
+       "      <td>xx10f8</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.reddit.com/r/Feminism/comments/xx1...</td>\n",
+       "      <td>9</td>\n",
+       "      <td>Hello, hi.\\n\\nSo I am that kind of person that...</td>\n",
+       "      <td>1.665049e+09</td>\n",
+       "      <td>2022-10-06 09:33:52+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>772</th>\n",
+       "      <td>772</td>\n",
+       "      <td>Why do schools need to know female athletes cy...</td>\n",
+       "      <td>26</td>\n",
+       "      <td>xwpqup</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.reddit.com/r/Feminism/comments/xwp...</td>\n",
+       "      <td>19</td>\n",
+       "      <td>Why would schools and coaches need to know an ...</td>\n",
+       "      <td>1.665012e+09</td>\n",
+       "      <td>2022-10-05 23:25:06+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>775</th>\n",
+       "      <td>775</td>\n",
+       "      <td>Mobile PP clinics launched today</td>\n",
+       "      <td>4</td>\n",
+       "      <td>xwr4ad</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.reddit.com/r/Feminism/comments/xwr...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>Mobile PP clinic launched today, bringing much...</td>\n",
+       "      <td>1.665016e+09</td>\n",
+       "      <td>2022-10-06 00:27:15+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>784</th>\n",
+       "      <td>784</td>\n",
+       "      <td>Please God tell me this is Not true. If it is ...</td>\n",
+       "      <td>56</td>\n",
+       "      <td>xvxxsb</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.reddit.com/r/Feminism/comments/xvx...</td>\n",
+       "      <td>35</td>\n",
+       "      <td>—Florida’s state government finds itself in th...</td>\n",
+       "      <td>1.664936e+09</td>\n",
+       "      <td>2022-10-05 02:21:09+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>792</th>\n",
+       "      <td>792</td>\n",
+       "      <td>Husband subscribed to Jordan P podcast. Help!</td>\n",
+       "      <td>190</td>\n",
+       "      <td>xuvhd1</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://www.reddit.com/r/Feminism/comments/xuv...</td>\n",
+       "      <td>102</td>\n",
+       "      <td>He told me this morning he subscribed. What ca...</td>\n",
+       "      <td>1.664831e+09</td>\n",
+       "      <td>2022-10-03 21:06:27+00:00</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>282 rows × 10 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "     Unnamed: 0                                              title  score  \\\n",
+       "0             0  This is a comprehensive list of resources for ...   2547   \n",
+       "4             4       My brother watches Hamza and it’s scaring me     76   \n",
+       "7             7                            Being a teen girl sucks    236   \n",
+       "8             8                          Is the whole thing a lie?     31   \n",
+       "12           12  Girls fighting for the future of the environme...     11   \n",
+       "..          ...                                                ...    ...   \n",
+       "771         771                                  Yes or no to bras      4   \n",
+       "772         772  Why do schools need to know female athletes cy...     26   \n",
+       "775         775                   Mobile PP clinics launched today      4   \n",
+       "784         784  Please God tell me this is Not true. If it is ...     56   \n",
+       "792         792      Husband subscribed to Jordan P podcast. Help!    190   \n",
+       "\n",
+       "          id subreddit                                                url  \\\n",
+       "0     phrcrn  Feminism  https://www.reddit.com/r/Feminism/comments/phr...   \n",
+       "4    1010yt6  Feminism  https://www.reddit.com/r/Feminism/comments/101...   \n",
+       "7    100ofn7  Feminism  https://www.reddit.com/r/Feminism/comments/100...   \n",
+       "8    100yxd9  Feminism  https://www.reddit.com/r/Feminism/comments/100...   \n",
+       "12   100yiuh  Feminism                       https://youtu.be/_HTdyorjL0E   \n",
+       "..       ...       ...                                                ...   \n",
+       "771   xx10f8  Feminism  https://www.reddit.com/r/Feminism/comments/xx1...   \n",
+       "772   xwpqup  Feminism  https://www.reddit.com/r/Feminism/comments/xwp...   \n",
+       "775   xwr4ad  Feminism  https://www.reddit.com/r/Feminism/comments/xwr...   \n",
+       "784   xvxxsb  Feminism  https://www.reddit.com/r/Feminism/comments/xvx...   \n",
+       "792   xuvhd1  Feminism  https://www.reddit.com/r/Feminism/comments/xuv...   \n",
+       "\n",
+       "     num_comments                                               body  \\\n",
+       "0             236  **Update** I guess I've been mass reported for...   \n",
+       "4              46  My little brother (14M) listens to a lot of “r...   \n",
+       "7              12  As I am in my last year of highschool and am g...   \n",
+       "8              10  I've gone through hell this year. It has made ...   \n",
+       "12              0  This documentary is a year old but these girls...   \n",
+       "..            ...                                                ...   \n",
+       "771             9  Hello, hi.\\n\\nSo I am that kind of person that...   \n",
+       "772            19  Why would schools and coaches need to know an ...   \n",
+       "775             0  Mobile PP clinic launched today, bringing much...   \n",
+       "784            35  —Florida’s state government finds itself in th...   \n",
+       "792           102  He told me this morning he subscribed. What ca...   \n",
+       "\n",
+       "          created                       date  \n",
+       "0    1.630761e+09  2021-09-04 13:15:02+00:00  \n",
+       "4    1.672627e+09  2023-01-02 02:35:44+00:00  \n",
+       "7    1.672594e+09  2023-01-01 17:28:45+00:00  \n",
+       "8    1.672621e+09  2023-01-02 01:00:42+00:00  \n",
+       "12   1.672620e+09  2023-01-02 00:42:40+00:00  \n",
+       "..            ...                        ...  \n",
+       "771  1.665049e+09  2022-10-06 09:33:52+00:00  \n",
+       "772  1.665012e+09  2022-10-05 23:25:06+00:00  \n",
+       "775  1.665016e+09  2022-10-06 00:27:15+00:00  \n",
+       "784  1.664936e+09  2022-10-05 02:21:09+00:00  \n",
+       "792  1.664831e+09  2022-10-03 21:06:27+00:00  \n",
+       "\n",
+       "[282 rows x 10 columns]"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "body_df = feminism_df[feminism_df['body'].notna()]\n",
+    "body_df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "daac6082",
+   "metadata": {},
+   "source": [
+    "Lastly, let's look at the summary statistics for every column with `describe(include='all')`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "99a0467f",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Unnamed: 0</th>\n",
+       "      <th>title</th>\n",
+       "      <th>score</th>\n",
+       "      <th>id</th>\n",
+       "      <th>subreddit</th>\n",
+       "      <th>url</th>\n",
+       "      <th>num_comments</th>\n",
+       "      <th>body</th>\n",
+       "      <th>created</th>\n",
+       "      <th>date</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>count</th>\n",
+       "      <td>796.000000</td>\n",
+       "      <td>796</td>\n",
+       "      <td>796.000000</td>\n",
+       "      <td>796</td>\n",
+       "      <td>796</td>\n",
+       "      <td>796</td>\n",
+       "      <td>796.000000</td>\n",
+       "      <td>282</td>\n",
+       "      <td>7.960000e+02</td>\n",
+       "      <td>796</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>unique</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>792</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>796</td>\n",
+       "      <td>1</td>\n",
+       "      <td>792</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>282</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>795</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>top</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>Yes we can.</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>phrcrn</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://theconversation.com/women-in-antarctic...</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>**Update** I guess I've been mass reported for...</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2022-12-09 03:43:49+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>freq</th>\n",
+       "      <td>NaN</td>\n",
+       "      <td>3</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1</td>\n",
+       "      <td>796</td>\n",
+       "      <td>2</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>mean</th>\n",
+       "      <td>397.500000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>258.898241</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>23.077889</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.668580e+09</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>std</th>\n",
+       "      <td>229.929699</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>442.125368</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>40.938321</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2.703460e+06</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>min</th>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.630761e+09</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>25%</th>\n",
+       "      <td>198.750000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>15.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.666444e+09</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>50%</th>\n",
+       "      <td>397.500000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>84.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>6.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.668573e+09</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>75%</th>\n",
+       "      <td>596.250000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>328.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>23.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.670594e+09</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>max</th>\n",
+       "      <td>795.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>3388.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>278.000000</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>1.672650e+09</td>\n",
+       "      <td>NaN</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "        Unnamed: 0        title        score      id subreddit  \\\n",
+       "count   796.000000          796   796.000000     796       796   \n",
+       "unique         NaN          792          NaN     796         1   \n",
+       "top            NaN  Yes we can.          NaN  phrcrn  Feminism   \n",
+       "freq           NaN            3          NaN       1       796   \n",
+       "mean    397.500000          NaN   258.898241     NaN       NaN   \n",
+       "std     229.929699          NaN   442.125368     NaN       NaN   \n",
+       "min       0.000000          NaN     0.000000     NaN       NaN   \n",
+       "25%     198.750000          NaN    15.000000     NaN       NaN   \n",
+       "50%     397.500000          NaN    84.000000     NaN       NaN   \n",
+       "75%     596.250000          NaN   328.000000     NaN       NaN   \n",
+       "max     795.000000          NaN  3388.000000     NaN       NaN   \n",
+       "\n",
+       "                                                      url  num_comments  \\\n",
+       "count                                                 796    796.000000   \n",
+       "unique                                                792           NaN   \n",
+       "top     https://theconversation.com/women-in-antarctic...           NaN   \n",
+       "freq                                                    2           NaN   \n",
+       "mean                                                  NaN     23.077889   \n",
+       "std                                                   NaN     40.938321   \n",
+       "min                                                   NaN      0.000000   \n",
+       "25%                                                   NaN      2.000000   \n",
+       "50%                                                   NaN      6.000000   \n",
+       "75%                                                   NaN     23.000000   \n",
+       "max                                                   NaN    278.000000   \n",
+       "\n",
+       "                                                     body       created  \\\n",
+       "count                                                 282  7.960000e+02   \n",
+       "unique                                                282           NaN   \n",
+       "top     **Update** I guess I've been mass reported for...           NaN   \n",
+       "freq                                                    1           NaN   \n",
+       "mean                                                  NaN  1.668580e+09   \n",
+       "std                                                   NaN  2.703460e+06   \n",
+       "min                                                   NaN  1.630761e+09   \n",
+       "25%                                                   NaN  1.666444e+09   \n",
+       "50%                                                   NaN  1.668573e+09   \n",
+       "75%                                                   NaN  1.670594e+09   \n",
+       "max                                                   NaN  1.672650e+09   \n",
+       "\n",
+       "                             date  \n",
+       "count                         796  \n",
+       "unique                        795  \n",
+       "top     2022-12-09 03:43:49+00:00  \n",
+       "freq                            2  \n",
+       "mean                          NaN  \n",
+       "std                           NaN  \n",
+       "min                           NaN  \n",
+       "25%                           NaN  \n",
+       "50%                           NaN  \n",
+       "75%                           NaN  \n",
+       "max                           NaN  "
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "feminism_df.describe(include='all')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "25c09a63",
+   "metadata": {},
+   "source": [
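+    "Here we can see that the average score (upvotes) is around 259 per post and the average number of comments per post is about 23. The maximum score is 3388 and the maximum number of comments is 278. The medians are much lower (84 upvotes and 6 comments), so a small number of very popular posts pulls the averages up."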
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1e77fa97",
+   "metadata": {},
+   "source": [
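+    "We also see that the 'title' column has only 792 unique values for 796 rows, so 4 rows repeat a title that already appears earlier in the dataset. A repeated title does not necessarily mean a repeated post. Let's check which rows these are (by default `duplicated()` flags only the later occurrences of a title, not the first one):"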
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "7a53ef76",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>title</th>\n",
+       "      <th>score</th>\n",
+       "      <th>id</th>\n",
+       "      <th>subreddit</th>\n",
+       "      <th>url</th>\n",
+       "      <th>num_comments</th>\n",
+       "      <th>body</th>\n",
+       "      <th>date</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>118</th>\n",
+       "      <td>Yes we can.</td>\n",
+       "      <td>813</td>\n",
+       "      <td>zqn682</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://i.redd.it/021o2vz4r17a1.jpg</td>\n",
+       "      <td>8</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2022-12-20 12:29:06+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>121</th>\n",
+       "      <td>Women Heavily Underrepresented in Political De...</td>\n",
+       "      <td>3</td>\n",
+       "      <td>zrfcid</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://web-mind.io/artificial-intelligence/wo...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2022-12-21 09:15:24+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>210</th>\n",
+       "      <td>Yes we can.</td>\n",
+       "      <td>895</td>\n",
+       "      <td>zfibca</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://i.redd.it/2csazo9cck4a1.jpg</td>\n",
+       "      <td>19</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2022-12-07 23:47:35+00:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>774</th>\n",
+       "      <td>Women in Antarctica face assault and harassmen...</td>\n",
+       "      <td>9</td>\n",
+       "      <td>xwlrzs</td>\n",
+       "      <td>Feminism</td>\n",
+       "      <td>https://theconversation.com/women-in-antarctic...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>NaN</td>\n",
+       "      <td>2022-10-05 20:43:51+00:00</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                                                 title  score      id  \\\n",
+       "118                                        Yes we can.    813  zqn682   \n",
+       "121  Women Heavily Underrepresented in Political De...      3  zrfcid   \n",
+       "210                                        Yes we can.    895  zfibca   \n",
+       "774  Women in Antarctica face assault and harassmen...      9  xwlrzs   \n",
+       "\n",
+       "    subreddit                                                url  \\\n",
+       "118  Feminism                https://i.redd.it/021o2vz4r17a1.jpg   \n",
+       "121  Feminism  https://web-mind.io/artificial-intelligence/wo...   \n",
+       "210  Feminism                https://i.redd.it/2csazo9cck4a1.jpg   \n",
+       "774  Feminism  https://theconversation.com/women-in-antarctic...   \n",
+       "\n",
+       "     num_comments body                       date  \n",
+       "118             8  NaN  2022-12-20 12:29:06+00:00  \n",
+       "121             0  NaN  2022-12-21 09:15:24+00:00  \n",
+       "210            19  NaN  2022-12-07 23:47:35+00:00  \n",
+       "774             0  NaN  2022-10-05 20:43:51+00:00  "
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "feminism_df[feminism_df.duplicated(subset=['title'])]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "433d41d6",
+   "metadata": {},
+   "source": [
+    "The title 'Yes we can.' appears three times, so these could be reposts. However, these posts have no body text, only an image, and the image URLs shown above differ, so they may also be different images. The other two repeated titles belong to link posts. According to the `describe()` output, the Antarctica article URL occurs twice, so that article seems to have been shared in two separate posts. Every `id` is unique, so no post was scraped twice. For our research we do not have to remove these posts.\n\nYou have to decide for yourself, based on your research, whether rows have to be removed. In this case the easiest way is to drop rows by their index label, for example `df.drop([0, 1])`, or to use `df.drop_duplicates()`. See also: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
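The cleaning steps in the notebook above (keeping only posts with body text, looking up repeated titles or URLs, and dropping rows) can be sketched in plain pandas. This is a minimal sketch, not the notebook's own code. It runs on a small made-up DataFrame that only mimics the `id`, `title`, `url` and `body` columns of `feminism_df`, and `text_posts` and `repeated_values` are hypothetical helper names. It assumes that posts without body text arrive either as NaN (after reading a CSV) or as an empty or whitespace-only string (straight from PRAW's `selftext`).

```python
import pandas as pd


def text_posts(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only rows whose 'body' contains text.

    NaN (e.g. after a CSV round trip) and empty or whitespace-only strings
    are both treated as 'no body text'.
    """
    body = df["body"].fillna("").astype(str).str.strip()
    return df[body != ""]


def repeated_values(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: return every row whose value in `column` repeats.

    keep=False flags all occurrences, including the first one.
    """
    return df[df.duplicated(subset=[column], keep=False)]


# Made-up example data, not taken from the r/Feminism dataset.
posts = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4", "e5"],
    "title": ["Yes we can.", "A question", "Yes we can.",
              "Women in science", "Women in science"],
    "url": ["https://i.redd.it/one.jpg",
            "https://www.reddit.com/r/Feminism/comments/b2",
            "https://i.redd.it/two.jpg",
            "https://example.org/article",
            "https://example.org/article"],
    "body": [None, "Some text", "", None, "   "],
})

print(text_posts(posts)["id"].tolist())                 # ['b2']
print(repeated_values(posts, "title")["id"].tolist())   # ['a1', 'c3', 'd4', 'e5']
print(repeated_values(posts, "url")["id"].tolist())     # ['d4', 'e5']

# Drop rows by index label, as with df.drop([0, 1]) in the notebook.
print(len(posts.drop([0, 1])))                          # 3

# Or drop rows that repeat an earlier URL, keeping the first occurrence.
deduplicated = posts.drop_duplicates(subset=["url"], keep="first")
print(len(deduplicated))                                # 4
```

Using `keep=False` shows every occurrence of a repeated title, including the first one. That makes it easier to compare candidate reposts side by side than the default `keep='first'` used in the notebook.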
diff --git a/Groupwork (Group 5)/Data Management Plan (DMP).pdf b/Groupwork (Group 5)/Data Management Plan (DMP).pdf
new file mode 100644
index 0000000..9e3efea
Binary files /dev/null and b/Groupwork (Group 5)/Data Management Plan (DMP).pdf differ
diff --git a/Groupwork (Group 5)/README.md b/Groupwork (Group 5)/README.md
new file mode 100644
index 0000000..eb7969d
--- /dev/null
+++ b/Groupwork (Group 5)/README.md	
@@ -0,0 +1,9 @@
+# Collecting data from r/Feminism: A Reddit webscraper for research in the Digital Humanities
+
+This repository contains a group project we created for research in the field of Digital Humanities. We wanted to explore the different ways in which scholars in this field can collect various forms of data for their research. To do so, we set up a brief project that helped us better understand the process of data collection. This project also gave us the opportunity to create a guide for further research.
+
+We decided to explore the social media platform Reddit, focusing on a topic analysis of r/Feminism. To gather our data we needed to build our own scraper. The guide in this repository contains the step-by-step process of how we programmed the scraper. For this we used PRAW (the Python Reddit API Wrapper), a user-friendly way to collect data from Reddit. We are aware that there are other ways of scraping Reddit, for example the Pushshift API, which is also a useful tool. However, we found PRAW more consistent in the data it gathers and the servers it connects to.
+
+The workbook in this repository allows researchers to make their own Reddit scraper. It takes you through the whole process again, but this time with a subreddit of your own choosing. It contains various example assignments, but you can also adapt it freely to your own needs. The code in the guide is reusable and, with a few adjustments, suitable for many projects.
+
+The report in this repository contains the foundations of our own research project, which guided our data collection. It also documents, in the form of a tutorial, the notebook in which we created our scraper. To explore our r/Feminism dataset we created a data visualization with Flourish. This visualization is described in the Tutorial section of the report and is also available through a link in this repository. It gives a brief example of how the r/Feminism dataset could be used, and the tutorial follows up with a few recommendations for further, more in-depth data analysis of the dataset. The workbook, which serves as an active learning exercise, is also explained in more depth in the report. Finally, this repository contains a Data Management Plan (DMP), in which we explain why our dataset can be considered FAIR and justify our methods, tools and research in more depth.
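The README above describes collecting r/Feminism posts with PRAW and storing them in a pandas DataFrame. A minimal sketch of that conversion step follows. It assumes PRAW `Submission` objects with the attributes `title`, `score`, `id`, `subreddit`, `url`, `num_comments`, `selftext` and `created_utc`. The helpers `submissions_to_dataframe` and `fetch_posts`, the choice of `.hot()` and the `limit` value are hypothetical, not the notebook's own code. The credentials are the `client_id`, `client_secret` and `user_agent` of your own Reddit app. Because the conversion only reads attributes, it can be tried offline on a stand-in object.

```python
from types import SimpleNamespace

import pandas as pd

# Column names mirroring the r/Feminism dataset (plus 'date', added below).
COLUMNS = ["title", "score", "id", "subreddit", "url",
           "num_comments", "body", "created"]


def submissions_to_dataframe(submissions) -> pd.DataFrame:
    """Hypothetical helper: turn PRAW-like submissions into a DataFrame."""
    rows = [
        {
            "title": s.title,
            "score": s.score,
            "id": s.id,
            "subreddit": str(s.subreddit),  # PRAW returns a Subreddit object
            "url": s.url,
            "num_comments": s.num_comments,
            "body": s.selftext,             # empty string for image/link posts
            "created": s.created_utc,       # Unix timestamp in seconds (UTC)
        }
        for s in submissions
    ]
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Readable UTC timestamps, like the 'date' column in the notebook.
    df["date"] = pd.to_datetime(df["created"], unit="s", utc=True)
    return df


def fetch_posts(client_id, client_secret, user_agent,
                name="Feminism", limit=100):
    """Hypothetical live fetch: needs network access and valid app credentials."""
    import praw  # imported here so the offline example runs without PRAW

    reddit = praw.Reddit(client_id=client_id,
                         client_secret=client_secret,
                         user_agent=user_agent)
    return submissions_to_dataframe(reddit.subreddit(name).hot(limit=limit))


# Offline example with a made-up stand-in for a PRAW submission.
fake = SimpleNamespace(title="Example", score=10, id="abc123",
                       subreddit="Feminism", url="https://example.org",
                       num_comments=2, selftext="", created_utc=1672633401)
df = submissions_to_dataframe([fake])
print(df.loc[0, "date"])  # 2023-01-02 04:23:21+00:00
```

Keeping the PRAW call separate from the conversion means the DataFrame logic can be checked without network access or credentials.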
diff --git a/Groupwork (Group 5)/Report_Collecting data from rFeminism.pdf b/Groupwork (Group 5)/Report_Collecting data from rFeminism.pdf
new file mode 100644
index 0000000..db576d8
Binary files /dev/null and b/Groupwork (Group 5)/Report_Collecting data from rFeminism.pdf differ
diff --git a/Groupwork (Group 5)/Workbook- Making your own scraper.ipynb b/Groupwork (Group 5)/Workbook- Making your own scraper.ipynb
new file mode 100644
index 0000000..c4a8ba4
--- /dev/null
+++ b/Groupwork (Group 5)/Workbook- Making your own scraper.ipynb	
@@ -0,0 +1,358 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "c766bdfa",
+   "metadata": {},
+   "source": [
+    "# Workbook: Making your own Reddit scraper with PRAW"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c87b65f9",
+   "metadata": {},
+   "source": [
+    "In this file we will go step-by-step through the whole process of making a Reddit web scraper. If you want to create a dataset of any subreddit you like, you can simply fill in the empty code cells. This way, any time you need a web scraper for Reddit you can come back to this file and fill everything in. We will also guide you through cleaning and inspecting the dataset. You can skip this part if you think it is not needed, although we highly recommend doing it to get a better understanding of your dataset. \n",
+    "\n",
+    "If anything is unclear, you can look at the other file in our GitHub repository, where we created the dataset of r/Feminism. You can also consult the tutorial in our report. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ea57770c",
+   "metadata": {},
+   "source": [
+    "## 1. Installing PRAW"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4d6b33de",
+   "metadata": {},
+   "source": [
+    "First, make sure PRAW is installed on your computer:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d4d48ff0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install praw"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6214d481",
+   "metadata": {},
+   "source": [
+    "## 2. Importing the PRAW and pandas libraries"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c30629d7",
+   "metadata": {},
+   "source": [
+    "Next, import PRAW to build the scraper and pandas to analyze the dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b78a3bc2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import praw\n",
+    "import pandas as pd"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c3ed37ca",
+   "metadata": {},
+   "source": [
+    "## 3. Creating a Reddit App and connecting to the subreddit"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "26642b8f",
+   "metadata": {},
+   "source": [
+    "Go to https://www.reddit.com/prefs/apps to create a Reddit app and choose 'Create App'. Here you fill in a name (which also serves as your user agent), a description and a redirect URI. As described in the PRAW documentation (https://praw.readthedocs.io/en/latest/getting_started/authentication.html#script-application), you should use http://localhost:8080 as the redirect URI.\n",
+    "\n",
+    "Avoid words such as 'scraping' or 'bot' in the name, as Reddit may refuse to authorize your app if you use them. Finally, select 'script' (for personal use) and press 'create app'.\n",
+    "\n",
+    "The client_id is the code shown underneath 'personal use script', the client_secret is shown next to 'secret', and the user_agent is the name you chose yourself.\n",
+    "\n",
+    "We name the Reddit instance `reddit_read_only`: because no Reddit username or password is passed, PRAW runs in read-only mode, so the scraper can only gather data.\n",
+    "\n",
+    "For a more in-depth explanation of creating the Reddit app, see the tutorial section in our report or this article: https://towardsdatascience.com/scraping-reddit-data-1c0af3040768."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0862f704",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "reddit_read_only = praw.Reddit(client_id=\"\",       #your client id  \n",
+    "                               client_secret=\"\",   #your client secret \n",
+    "                               user_agent=\"\")      # your user agent\n",
+    "subreddit = reddit_read_only.subreddit(\"\")  # the name of the subreddit; to scrape all subreddits, use 'all'\n",
+    " \n",
+    "#With these lines of code you can check if PRAW is connected to the subreddit of your choice.\n",
+    "\n",
+    "# Display the name of the Subreddit\n",
+    "print(\"Display Name:\", subreddit.display_name)\n",
+    " \n",
+    "# Display the title of the Subreddit\n",
+    "print(\"Title:\", subreddit.title)\n",
+    " \n",
+    "# Display the description of the Subreddit\n",
+    "print(\"Description:\", subreddit.description)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d9cde1d2",
+   "metadata": {},
+   "source": [
+    "## 4. Scraping data and creating a dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a1337aea",
+   "metadata": {},
+   "source": [
+    "Now it is time to collect the data and put it into a pandas DataFrame. To do so, follow the three steps explained in our guide:\n",
+    "\n",
+    "1. Create an empty list.\n",
+    "2. Write a loop that appends the desired values of each post to your list. Think about the information you need: usernames, titles, upvotes, the name of the subreddit, etc. (PRAW makes these available as attributes of each post).\n",
+    "3. Create a pandas DataFrame and specify the column names.\n",
+    "\n",
+    "Also think about which type of posts you need (for example top posts or hot posts) and how many (the limit).\n",
+    "\n",
+    "Example assignment: collect the 50 top posts from all subreddits. For each post you also want the username, the title of the thread, the number of upvotes, the number of comments, the creation date, the text of the post and the name of the subreddit."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0bc61da5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "posts = []\n",
+    "\n",
+    "#your code here:\n",
+    "for post in ...:\n",
+    "    posts.append([...])\n",
+    "df = pd.DataFrame(posts,columns=[...])\n",
+    "print(df)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ca734636",
+   "metadata": {},
+   "source": [
+    "## 5. Inspecting and cleaning the dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9bc32b89",
+   "metadata": {},
+   "source": [
+    "It is important to know what is in the dataset you created. You can check this with a few simple pandas commands:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "509be0c3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Check the number of rows and columns:\n",
+    "df.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f3312867",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Check the data type of each column:\n",
+    "df.dtypes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5537dd5d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Check the number of non-null observations and the data type per column:\n",
+    "df.info()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f809c6ad",
+   "metadata": {},
+   "source": [
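+    "You probably noticed that the created column does not show readable dates: it stores the creation time as a Unix timestamp (seconds since 1 January 1970, UTC). Let's change this.\n",
+    "\n",
+    "Example assignment: convert the created column to a new column with readable dates, then drop the original created column."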
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ee6887d0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Convert the Unix timestamps (in seconds) to timezone-aware UTC dates\n",
+    "df['...'] = pd.to_datetime(df['...'], utc=True, unit='s')\n",
+    "df = df.drop(columns=['...'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6b14a03",
+   "metadata": {},
+   "source": [
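+    "Now let's look at the observations in your dataset again. Does it contain any null values? The column with the text of the thread is likely to contain some, because Reddit users can post threads without text. Note that PRAW stores an empty string, not a null value, in `selftext` for posts without text, so depending on how you built your dataset you may also need to check for empty strings.\n",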
+    "\n",
+    "Example assignment: create an overview of the rows with missing values in this column and think about how this affects your dataset and further research. Does it matter? How can you interpret it?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce44b29e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df[df.isnull().any(axis=1)]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0394ce10",
+   "metadata": {},
+   "source": [
+    "Example assignment: now suppose you only want posts that actually contain text. Create a new DataFrame that filters out the other posts."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1d0f82b5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "body_df = df[df['...'].notna()]\n",
+    "body_df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7d8f5773",
+   "metadata": {},
+   "source": [
+    "Now it's time to take a closer look at the values in your dataset.\n",
+    "\n",
+    "Example assignment: interpret the values. What is the average number of upvotes? What are the maximum and minimum? Answer the same questions for the number of comments."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "97745b6b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.describe(include='all')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3bb5f263",
+   "metadata": {},
+   "source": [
+    "Example assignment: as you saw in our guide, the title column very likely does not consist only of unique values. Check this for yourself. If your dataset also contains duplicate titles, look at those rows. How can you interpret the duplicates? Do you need to remove them from your dataset?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e29b3ae1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df[df.duplicated(subset=['title'], keep=False)]  # keep=False shows every row that shares a title, not only the repeats"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b3bbebbf",
+   "metadata": {},
+   "source": [
+    "## 6. Saving your dataset to a CSV file"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "65a59547",
+   "metadata": {},
+   "source": [
+    "Now it's time to save your dataset to a CSV file:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "af2fb073",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.to_csv(\"...\", index=True)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/Groupwork (Group 5)/feminism-reddit dataset.csv b/Groupwork (Group 5)/feminism-reddit dataset.csv
new file mode 100644
index 0000000..61a811d
--- /dev/null
+++ b/Groupwork (Group 5)/feminism-reddit dataset.csv	
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fa7b42f00ec9040f6dc49c9857d003d63d8969bf9b5c9ef45553be858bb3ab01
+size 506811
diff --git a/Groupwork (Group 5)/flourish plot b/Groupwork (Group 5)/flourish plot
new file mode 100644
index 0000000..629352a
--- /dev/null
+++ b/Groupwork (Group 5)/flourish plot	
@@ -0,0 +1,2 @@
+https://public.flourish.studio/visualisation/12415427/
+