forked from VHP4Safety/github_intro_training
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy path006_knowledgebase.Rmd
More file actions
328 lines (215 loc) · 26.8 KB
/
006_knowledgebase.Rmd
File metadata and controls
328 lines (215 loc) · 26.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
# Knowledge base
```{r, include=FALSE}
knitr::opts_chunk$set(echo = T, class.source="Rchunk", class.output="Rout")
```
This part of the reader contains all sorts of information you may need later, or info we can refer to during the workshop.
Usually, these are catered towards coders
## Licences
### Types of software licences
Software licenses are legal agreements that determine how a software application can be used by individuals or organizations. There are various types of licenses that software developers use to protect their intellectual property and control the distribution and usage of their software. Some of the common types of software licenses include:
Proprietary License: A proprietary license is a restrictive license that allows the software to be used only as per the terms and conditions specified by the software developer. This type of license often limits the number of users, devices, or installations, and prohibits modifications or redistribution of the software without explicit permission from the developer. Proprietary licenses are often used by commercial software vendors who sell their software for a fee and retain full control over its usage.
Open Source License: An open source license is a type of license that allows users to access, use, modify, and distribute the software freely. The key characteristic of open source software is that the source code, or the human-readable version of the software, is available for anyone to view and modify. Examples of open source licenses include the GNU General Public License (GPL), the MIT License, and the Apache License. Open source licenses promote collaboration, transparency, and community-driven development.
Freeware: Freeware is a type of license that allows users to download, install, and use the software for free without any monetary cost. However, freeware licenses may impose certain restrictions on the usage, such as prohibiting commercial use or distribution without permission from the developer. Freeware is often used for consumer-oriented software applications, such as antivirus software, media players, and web browsers.
Shareware: Shareware is a type of license that allows users to try the software for free for a limited period, typically 30 days, before requiring them to purchase a license to continue using the software. Shareware licenses usually allow users to install and use the software on multiple devices, but restrict certain advanced features or functionalities until a license is purchased. Shareware is commonly used for software applications that target individual users, such as productivity tools, games, and multimedia software.
Subscription License: A subscription license is a type of license that grants users access to the software for a specified period, typically monthly or annually, in exchange for a recurring fee. Subscription licenses often include regular updates, maintenance, and support from the software developer during the subscription period. Subscription licenses are commonly used for cloud-based software applications, such as software-as-a-service (SaaS) products, where users access the software over the internet rather than installing it locally on their devices.
Software licenses play a crucial role in defining how software can be used, modified, and distributed. Proprietary licenses offer strict control and limitations, open source licenses promote collaboration and transparency, freeware allows free usage but may impose certain restrictions, shareware allows trial usage with purchase requirement, and subscription licenses provide access for a specified period in exchange for recurring fees. It's important for users to understand the terms and conditions of software licenses before using or distributing software to ensure compliance with legal requirements and respect for intellectual property rights.
### Creative Commons Flavours
Creative Commons (CC) licenses are a type of open source license that provide a standardized and flexible way for content creators to share their work with others while retaining certain rights. There are several flavors of Creative Commons licenses, each with its own set of permissions and restrictions.
The most permissive CC license is the **CC0 (CC Zero) license**, which allows content to be used, modified, and distributed without any restrictions. This means that the content can be used for any purpose, including commercial use, without requiring attribution to the original creator.
The **Attribution (BY)** license requires that the original creator be credited when the content is used, modified, or distributed. This license allows for maximum flexibility in using the content while still acknowledging the original creator's contribution.
The **ShareAlike (SA)** license requires that any modifications or derivatives of the content be shared under the same license. This means that if someone modifies the content, they must also release their modifications under the same Creative Commons license. This license promotes a collaborative approach and ensures that derivative works remain open source.
The **NonCommercial (NC)** license restricts the use of the content for commercial purposes. This means that the content can be used, modified, and distributed for non-commercial purposes only, such as for personal or educational use, but not for commercial gain.
Finally, the **NoDerivatives (ND)** license prohibits any modifications or derivatives of the content. This means that the content must be used and distributed without any modifications, and derivative works cannot be created from it.
Creative Commons licenses provide a flexible and standardized framework for content creators to share their work with others while retaining certain rights. These licenses allow for a range of permissions and restrictions, from permissive licenses that allow for maximum freedom of use to more restrictive licenses that protect certain rights. Understanding the different flavors of Creative Commons licenses is important for both content creators and users to ensure compliance with the terms and conditions of the licenses and promote open source collaboration in the creative community.
### adding a license from Rstudio with a package
Add an appropriate licence to a new Github repo, according this workflow:
## Licence in RStudio Project / R package
When working in R, we can collect our R code easily in the most suitable vehicle for R code to travel in: an R package. It sounds complicated, but creating an R package from your existing (or new) code is actually quite easy. We will see how this goed in more detail in chapter `r-package`. Relevant to the topic at hand is how to add a licence to your RStudio Project. There is two steps:
1. Initate an R-package from your exiting RStudio Project
2. Add a LICENCE file to this R-package
The code below show you how to do this.
Execute this code in a new and empty Github repo. Clone the repo to RStudio from Github, as shown earlier.
### Install and Load the 'usethis' Package
In your RStudio console, install the 'usethis' package if you haven't already done so using the following commands:
```{r, eval=FALSE}
install.packages("usethis")
```
Then, load the package into your R session using the following command:
```{r}
library(usethis)
```
### Initiate the Package
Next, use the create_package() function from the 'usethis' package to initiate the package structure in your project directory. This function sets up the necessary files and folders for a standard R package. You can customize the package details by passing arguments to the function. Here's an example:
```{r, eval=FALSE}
create_package(path = ".")
```
This creates an R package in the current directory (which should be your project directory), initializes a git repository, opens the package in RStudio, and generates a .gitignore file.
### Add a licence
To add a licence run one of the following, or choose another build-in licence from usethis.
```{r, eval=FALSE}
usethis::use_agpl3_license()
usethis::use_apache_license()
usethis::use_cc0_license()
usethis::use_ccby_license()
```
## Managing change in a stable way
Creating a stable release in GitHub involves several steps to ensure that the codebase is reliable, thoroughly tested, and ready for production use. First, it's essential to have a well-organized and maintained Git repository, with a clear branching and versioning strategy. Once the development work for the release is complete and thoroughly reviewed, the next step is to create a release branch or tag, following the established versioning conventions. This ensures that the codebase is in a stable state and ready for deployment.
Before creating the release, it's crucial to thoroughly test the codebase to catch any potential issues. This may include running automated tests, conducting manual testing, and performing quality assurance (QA) checks to ensure the code is free from bugs and meets the intended functionality. Additionally, documentation, including release notes, should be updated to provide clear instructions on how to use and deploy the new release.
Once the codebase has passed all the necessary tests and documentation is up-to-date, the release can be created in GitHub. This typically involves creating a new release using the "Releases" tab in the GitHub repository and specifying the release version, title, and release notes. It's important to ensure that the release is properly tagged and associated with the corresponding commit or branch in the repository.
After creating the release, it's crucial to conduct thorough testing of the release version to ensure that it works as expected in the intended environment. This may involve deploying the release to a staging environment or conducting additional testing in a production-like setup.
Once the release has been thoroughly tested and validated, it can be marked as stable, and the release can be officially communicated to users or customers. This may involve updating release notes, documentation, and other relevant information to inform users about the new stable release.
Creating a stable release in GitHub requires careful planning, thorough testing, and proper documentation. By following established release management best practices, developers can ensure that their code is stable, reliable, and ready for production use, providing a solid foundation for software development projects.
## Writing tests in R
For deploying a test-deriven software development strategy (or in short: Unit Testing) in R, the `{testthat}` package forms a great place to start. For an extensive documentation, see [here](https://testthat.r-lib.org/). Below we show a short example on how to write a test for a function in R, using this framwork.
Suppose we have a simple function called `addition()` that takes two arguments and returns their sum. We want to write a test for this function to ensure that it behaves correctly. Here's how we can do that using the 'testthat' package:
```{r, eval=FALSE}
# Load the testthat package
library(testthat)
# Define the function to be tested
addition <- function(a, b) {
return(a + b)
}
# Write a test for the addition() function
test_that("addition() function works correctly", {
# Test case 1
expect_equal(addition(2, 3), 5, "Addition of 2 and 3 should be 5.")
# Test case 2
expect_equal(addition(-4, 8), 4, "Addition of -4 and 8 should be 4.")
# Test case 3
expect_equal(addition(0, 0), 0, "Addition of 0 and 0 should be 0.")
})
```
In this example, we use the `test_that()` function from the `{testthat}` package to define a test that contains multiple test cases. Within each test case, we use the expect_equal() function to compare the output of the addition() function with the expected result. If the expected result matches the actual result, the test passes. If not, the test fails, and an error message is displayed.
Normally, you would store these tests in specific folder that runs the tests upon 'build-time'. An easy way to setup the required infrastructure in your R-package is by using again the `{usethis}` package.
```{r, eval=FALSE}
usethis::use_test("test_name")
```
The test-code above would be written in the .R file, created by this call.
Running all tests in a package can be done by:
```{r, eval=FALSE}
devtools::test()
```
It's important to write meaningful test cases that cover different scenarios and cases to thoroughly test the function's behavior. By running these tests using the 'testthat' package, we can quickly detect and fix any issues or regressions in our code, ensuring that it behaves correctly and reliably.
## Github actions
Running tests and other automated workflows can be further automated, using Github actions. Github Actions is an incredibly useful tool for R package development as it allows for automated workflows and continuous integration (CI) processes. With Github Actions, developers can automate various tasks, such as building and testing R packages, checking code quality, and deploying packages to CRAN or other package repositories. This automation helps to catch potential issues early in the development process, ensuring that the package is reliable and robust.
Github Actions also provides a seamless integration with the Git and GitHub workflow, allowing for easy setup and configuration of CI workflows directly in the repository. This makes it convenient for R package developers to incorporate automated testing and other development tasks into their existing workflow.
Another key benefit of Github Actions is its flexibility and customization options. Developers can define custom workflows tailored to their specific needs, and take advantage of a wide range of pre-defined actions from the Github Actions marketplace. This allows for extensive customization and scalability, making it a powerful tool for R package development in projects of all sizes.
Furthermore, Github Actions offers excellent visibility and traceability, allowing developers to easily track the status of their workflows and identify any failures or issues. Detailed logs and reports help in troubleshooting and diagnosing problems, enabling swift resolution and ensuring that the R package remains in a stable and releasable state.
In summary, Github Actions is a valuable tool for R package development, providing automated workflows, continuous integration, customization options, and excellent visibility. It helps in ensuring code quality, catching issues early, and streamlining the development process, making it an indispensable tool for modern R package development workflows.
## Collaboration workflow in bash and Rstudio
Collaborating on a GitHub code repository is a fundamental aspect of modern software development, enabling teams to work together seamlessly, regardless of location. Here are some steps and code examples to collaborate effectively on GitHub:
There are a few different workflow that you can use to collaborate in github. We will show you one example: `clone->branch->commit->push->pull-request->merge`. This workflow consists of the following steps. Below the workflow, we continue with how the commands would look like, when working from a terminal. You could also combine these commands with working from the GUI in RStudio.
### Workflow for collaboration
Clone the repository: To start collaborating, clone the repository to your local machine using the following command in your terminal:
```
# Bash
git clone <repository_url>
```
try this out with the Github [repo that contains the Reader for this course](https://github.com/VHP4Safety/github_intro_training)
1. Create a new branch: Create a new branch for your changes to isolate them from the main branch. Choose a descriptive name for the branch and create it using the following command:
```
# Bash
git checkout -b <myinitials_mytask>
```
2. Make changes: Make your desired changes to the codebase, such as adding new features, fixing bugs, or improving documentation, using your preferred code editor. Here: Try to find a typo and correct it. You should be fine. For sure there are sure some typos to be found! Save the file.
3. Stage and commit changes: Stage and commit your changes locally with descriptive commit messages using the following commands:
```
# Bash
git add .
git commit -m "Your commit message here"
```
4. Push changes to the remote repository: Push your changes to the remote repository on GitHub using the following command:
```
# Bash
git push origin <branch_name>
```
5. Create a pull request: Go to the repository's GitHub page, navigate to the "Pull Requests" tab, and click on the "New Pull Request" button. Choose the appropriate base and compare branches, and provide a descriptive title and comment for your pull request.
6. Review and merge: Collaborators can review your changes, provide feedback, and approve the pull request. Once approved, the repository owner or an authorized collaborator can merge the changes into the main branch.
7. Update local repository: After the changes are merged, update your local repository with the latest changes from the main branch using the following commands:
```
# Bash
git checkout main
git pull origin main
```
8. Repeat steps 3-8: Continue collaborating by repeating steps 3-8 for subsequent changes and updates to the repository.
By following these steps and utilizing Git commands, teams can effectively collaborate on GitHub to develop high-quality software together.
### RStudio GUI
Some of the commands above can also be run from the RStudio Git client. In the top right corner of the GUI in the `Git Pane`, you will find some options to `Commmit`, `Pull` and `Push` from the GUI.
```{r}
knitr::include_graphics("https://book.cds101.com/img/stage_step_3.png")
```
### Visualize the history of a Github repo
To get an idea on who contributed what, and how the repo was developed, we can create visulizations of the github reopo from a terminal or R. If you are interested in how to visualize the history of github repo, including commits to all the branches etc. Check out [this blog](https://www.r-bloggers.com/2018/03/guide-to-tidy-git-analysis/). To get (a simple) graphical representation of a github repo run the following command from the terminal.
```
# bash
git log --graph --full-history --all --color --pretty=format:"%x1b[31m%h%x09%x1b[32m%d%x1b[0m%x20%s"
```
```
* 38cb99a (HEAD -> main, origin/main, origin/HEAD) adapted layout
* c1cfbc6 Merge branch 'main' of https://github.com/VHP4Safety/github_intro_training
|\
| * 1998a66 Merge pull request #2 from VHP4Safety/ontwikkelbranche_Alyanne1
| |\
| | * 2c74145 (origin/ontwikkelbranche_Alyanne1, origin/git-collaboration, git-collaboration) oplettennietvergeten
| | * debb613 data en github
```
There are also tools that help you visualize the repo's history is a more user friendly way. If you like to learn more, see e.g. [this blogpost](https://livablesoftware.com/tools-to-visualize-the-history-of-a-git-repository/)
The easiest way to visually inspect your repos commit history is by inspecting the `History` in the `Git Pane` of your RStudio GUI. See the image below:
```{r}
knitr::include_graphics("https://carpentries-incubator.github.io/git-Rstudio-course/fig/history-rstudio2.png")
```
Licenced CC-BY 4.0 (from https://carpentries-incubator.github.io/git-Rstudio-course)
## Git security and privacy
### A Github token {#tokens}
GitHub tokens are access credentials that serve as a secure way to authenticate and interact with the GitHub API. The purpose of GitHub tokens is to provide users with a secure and efficient means to programmatically access and manipulate GitHub repositories, gists, and other resources without exposing sensitive information such as passwords. These tokens are widely used in software development workflows, including automation scripts, continuous integration/continuous deployment (CI/CD) pipelines, and other applications that require interacting with GitHub programmatically.
One popular way to use GitHub tokens in R is through the credentials package, which provides an easy and convenient method for setting up and managing GitHub tokens for authentication. Here's an example of how to set up a GitHub token using the credentials package in RMarkdown syntax:
```{r setup-github-token, eval=FALSE}
# Load the credentials package
library(credentials)
# Set a GitHub token
set_github_pat()
```
In the code above, we first load the `credentials` package, which provides a comprehensive set of functions for managing different types of authentication credentials. Then, we use the `token_create()` function from the `credentials` package to create a GitHub token, specifying the token type as `"github"`. This function will prompt the user to enter their GitHub personal access token, which can be obtained from GitHub by following the relevant documentation. The created token is stored in the `github_token` object. Finally, we use the RMarkdown syntax to print the details of the created token using the `github_token` object.
Using GitHub tokens in R has several advantages. First, it enhances security by eliminating the need to store passwords in code or configuration files, reducing the risk of exposing sensitive information. GitHub tokens are single-use tokens that can be easily revoked or regenerated, providing better security compared to using long-lived personal access tokens or passwords. Additionally, GitHub tokens enable automation and integration of GitHub workflows into R-based projects, making it easier to incorporate GitHub functionalities into data analysis, code deployment, and version control processes.
The `credentials` package in R provides a user-friendly way to manage GitHub tokens and other authentication credentials. It supports various types of tokens, including GitHub, Google Cloud, Azure, and many others, making it a versatile choice for managing authentication in R projects that require interaction with different APIs or services. The `credentials` package also provides functions for securely storing and retrieving tokens, allowing users to easily reuse tokens across different sessions or projects without exposing them in plain text.
In conclusion, GitHub tokens are valuable access credentials that provide a secure and efficient way to authenticate and interact with GitHub resources programmatically. They are widely used in software development workflows and can be easily managed in R using the `credentials` package. By using GitHub tokens in R, users can enhance security, automate workflows, and integrate GitHub functionalities into their R-based projects, making it a powerful tool for seamless interaction with GitHub resources.
### Storing and using passwords and other keys in your code
Storing keys, passwords, or any sensitive information directly in code can pose a security risk, as it increases the likelihood of accidental exposure or unauthorized access. This is especially true when working with APIs that require authentication, such as the RStudio API. One best practice is to separate sensitive information, such as keys or tokens, from the code and manage them securely. In R, one approach is to use environment variables or external configuration files to store and manage keys securely, and then access them in the code using appropriate methods.
#### R
Here's an example in R using the dotenv package, which allows you to store and manage keys as environment variables in a .env file:
```{r, eval=FALSE}
# Load the dotenv package
library(dotenv)
# Load the keys from .env file
dotenv::load_dot_env()
# Access the key in the code
my_key <- Sys.getenv("EXAMPLE_KEY")
# Use the key in API calls
api_call(api_key = my_key, ...)
```
In the code above, we first load the dotenv package, which provides functions to load and manage environment variables from a .env file. We then use the load_dot_env() function to load the keys from the .env file, which could contain sensitive information such as API keys or passwords. Next, we use the Sys.getenv() function to access the value of the key in the code, which can be used in API calls or other operations. This way, the sensitive key is not exposed in the code and is managed securely in a separate location.
#### bash
Similarly, in bash, you can manage keys by exporting them as system variables, which can be accessed by other scripts or processes running on the same system:
```
# bash:
# Export the key as a system variable
export EXAMPLE_KEY="your_key_value"
```
```
# Use the key in other scripts or processes
./script.sh
```
In the example above, we use the export command to set the value of the key "EXAMPLE_KEY" as a system variable, making it accessible to other scripts or processes running on the same system. This way, the sensitive key is not hard-coded in the script, but rather managed as a system variable, providing a more secure way to store and access sensitive information.
In conclusion, storing sensitive information such as keys in code can pose security risks, and it's important to follow best practices for managing keys securely. Using environment variables, external configuration files, or system variables can help separate sensitive information from the code and provide a more secure approach to managing keys. By implementing these security measures, you can protect sensitive information and reduce the risk of accidental exposure or unauthorized access, ensuring the confidentiality and integrity of your applications or workflows.
## Interactive asking for passwords {#passwords}
The `{rstudioapi}` package provides the `askForSecret()` function, which allows you to securely prompt the user for sensitive information, such as passwords or API keys, in RStudio. Here's an example of how you can use the `askForSecret()` function in RMarkdown syntax:
```{r, eval=FALSE}
library(rstudioapi)
# Prompt user for a secret
my_secret <- rstudioapi::askForSecret("Please enter your API key:")
# Use the secret in the code
api_call(api_key = my_secret, ...)
```
In the RMarkdown document above, we first load the `{rstudioapi}` package, which provides the `askForSecret()` function. We then use the `askForSecret()` function to prompt the user for a secret, in this case, an API key. The user will see a secure dialog box where they can enter the secret, and the value will be stored in the `my_secret` variable. We can then use the `my_secret` variable in the code, such as passing it as a parameter in an API call.
Note that `askForSecret()` is specifically designed to work with RStudio and is only available when running R code within RStudio. It provides an added layer of security as the entered secret is not visible in the R console or stored in the R environment, reducing the risk of accidental exposure. However, it's important to keep in mind that this function is not intended for storing secrets permanently and should be used for temporary or interactive use only.
Using `askForSecret()` in combination with other best practices, such as storing secrets in environment variables or external configuration files, can help you securely manage sensitive information in your R code, reducing the risk of unauthorized access or accidental exposure of sensitive information.