The applications were developed by Elias Carlsson as a master's thesis project in 2022 at the Department of Immunotechnology, LTH.
Make sure the following packages are installed (they are required by the scripts):
dependencies <- c("rio","dplyr","tidyverse","shiny","corrplot","ggrepel","ggplot2","pheatmap","lmerTest","patchwork","digest","plotly","reshape2","Hmisc","psych","stats","reshape", "caTools", "randomForest", "caret", "survminer", "ggpubr", "janitor", "coxme", "survival", "forcats")
req <- lapply(dependencies, require, character.only = TRUE)
install.packages(dependencies[!unlist(req)])
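To see which packages are missing without attaching or installing anything, a check along the following lines may help. This is only a sketch: the shortened package list and the variable name `missing_pkgs` are illustrative and not part of the original scripts.

```r
# Shortened list for illustration; use the full `dependencies` vector in practice.
dependencies <- c("stats", "shiny", "survival")

# requireNamespace() returns TRUE if the package can be loaded,
# without attaching it to the search path.
installed <- vapply(dependencies, requireNamespace, logical(1), quietly = TRUE)

# Illustrative variable name: packages that still need install.packages()
missing_pkgs <- dependencies[!installed]
print(missing_pkgs)
```

An empty result means every listed package can be loaded.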
It might be necessary to define the working directory for the apps to work.
your_directory <- "C:/change/this/to/directory/to/this/folder"
setwd(your_directory)
If an error message containing "No such file or directory" appears, setting the working directory this way might solve the problem.
Right now the data-folder contains an example-dataset of randomly generated data. Try the apps first with this dataset to see that installation works as it should. There are four apps:
- app_normalization
- app_lmem
- app_survival
- app_ML
Try to start them all and click around to ensure that everything works as it should.
The apps depend on a fixed folder and script structure and may fail if it is not intact.
The following files and folders are necessary.
Top layer
- app_survival.R
- app_normalization.R
- app_ML.R
- app_lmem.R
- configsetup.R
- README.md
- cache (folder)
- functions (folder)
- data (folder)
- miscellaneous (folder)
cache (folder)
- survival_functions (folder)
- lmem_functions (folder)
functions (folder)
- load_data.R
- survival_functions.R
- lmem_functions.R
- normalization_functions.R
- ML_functions.R
data (folder)
- setup.RData
- your datasets
miscellaneous
- normalyzerDE_matrices.R
- configsetup.R
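Before starting an app, the structure listed above can be checked from R. The following is a minimal sketch and not part of the original scripts: the helper name `check_structure` is hypothetical, the paths are copied from the list above, and it assumes the working directory (or `root`) is the top-level folder.

```r
# Hypothetical helper (not part of the apps): returns the required
# files and folders that are missing under `root`.
check_structure <- function(root = ".") {
  required <- c(
    "app_survival.R", "app_normalization.R", "app_ML.R", "app_lmem.R",
    "configsetup.R",
    "cache/survival_functions", "cache/lmem_functions",
    "functions/load_data.R", "functions/survival_functions.R",
    "functions/lmem_functions.R", "functions/normalization_functions.R",
    "functions/ML_functions.R",
    "data/setup.RData",
    "miscellaneous/normalyzerDE_matrices.R", "miscellaneous/configsetup.R"
  )
  # file.exists() is TRUE for existing files as well as existing folders
  missing <- required[!file.exists(file.path(root, required))]
  if (length(missing) > 0) {
    message("Missing: ", paste(missing, collapse = ", "))
  }
  invisible(missing)
}

missing_paths <- check_structure(".")
```

If `missing_paths` is empty, every listed file and folder was found.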
It is possible to add GeoMx datasets to the applications. To do this, there are essentially two steps. 1) Add your data, 2) Configure the setup-file. These steps are described below.
Add the data files at your desired location (preferably in the "data" folder, or in a subfolder of it). The requirements for the data are as follows:
GeoMx data and data normalized in GeoMx
- The apps use raw GeoMx data and load the Excel file exactly as it is exported from GeoMx.
NormalyzerDE data
- The apps can also handle data normalized in NormalyzerDE. Just place the text files it outputs in a folder.
- There is a script (./miscellaneous/normalyzerDE_matrices.R) that creates design and data matrices for the GeoMx data. If this script is used with NormalyzerDE, everything should work fine. The script requires you to enter the names of all your proteins, so it is recommended to run it after 2.2.
Note: The data matrix passed to NormalyzerDE is scaled up by a factor of 10 and is therefore scaled down again when loaded.
Next, it is necessary to provide the apps with some information about your dataset, for instance the location of your data files and the names of your variables. There is an R script (./miscellaneous/configsetup.R) that aids in this process.
setup.RData is an RData file providing this information. It should be stored in the data folder, as follows: ./data/setup.RData
The following table defines which values setup.RData needs to contain, and is followed by a table of examples:
| Number | Description | Name | Type | Note |
|---|---|---|---|---|
| 1 | Vector containing all available proteins | proteins | a character vector | |
| 2 | Vector containing all housekeeping and negative control proteins | cn | a named character vector | HK proteins named "HK", negative controls named "NegC" |
| 3 | Vector containing all types relevant for analysis | vec_type | a character vector | |
| 4 | Vector containing locations of datasets | loc | a named character vector | |
| 5 | Vector containing locations of normalyzerDE datasets | loc_nDE | a named character vector | Needs to contain the element: "None" = 0 |
| 6 | Vector defining the names of relevant features | feature_names | a named character vector | * (See note) |

| Number | Example |
|---|---|
| 1 | proteins <- c("BIM", "PanCk") |
| 2 | cn <- c("HK" = "GAPDH", "NegC" = "Ms IgG2a") |
| 3 | vec_type <- c("Main_type", "Type", "Stage_1") |
| 4 | loc <- c("H3 Normalized" = "./data/H3data.xlsx", "Non normalized" = "./data/NNdata.xlsx") |
| 5 | loc_nDE <- c("None" = 0, "VSN Normalized" = "./data/nDE/VSNdata.txt") |
| 6 | feature_names <- c("Diagnosis date" = "Diagnosis_date", "Area" = "AOI surface area"...) |
* The following must be included in feature_names:
- "Diagnosis date" =
- "Cause of death" =
- "Last record alive" =
- "Date of death" =
- "Area" =
- "Nuclei count" =
- "PatientID" =
If there is already a time column and event column in the dataset, make sure they are named "time" and "event". Then:
- "Diagnosis date" = ""
- "Cause of death" = ""
- "Last record alive" = ""
- "Date of death" = ""
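As an illustration of how the six objects fit together, the following sketch defines them with the values from the example table and saves them with base R's `save()`. It is not the supported route (that is ./miscellaneous/configsetup.R) and it assumes the apps read these objects by name from the RData file. All column names on the right-hand side of `feature_names` are hypothetical except "Diagnosis_date" and "AOI surface area", which appear in the example table. The file is written to a temporary location so that an existing ./data/setup.RData is not overwritten.

```r
# Values copied from the example table; replace them with your own.
proteins <- c("BIM", "PanCk")
cn <- c("HK" = "GAPDH", "NegC" = "Ms IgG2a")
vec_type <- c("Main_type", "Type", "Stage_1")
loc <- c("H3 Normalized" = "./data/H3data.xlsx",
         "Non normalized" = "./data/NNdata.xlsx")
# Mixing 0 with strings coerces the whole vector to character ("None" = "0")
loc_nDE <- c("None" = 0, "VSN Normalized" = "./data/nDE/VSNdata.txt")

# Right-hand sides are hypothetical column names (see the text above).
# If the dataset already has "time" and "event" columns, set the four
# date/death entries to "" instead.
feature_names <- c(
  "Diagnosis date"    = "Diagnosis_date",
  "Cause of death"    = "Cause_of_death",
  "Last record alive" = "Last_record_alive",
  "Date of death"     = "Date_of_death",
  "Area"              = "AOI surface area",
  "Nuclei count"      = "Nuclei_count",
  "PatientID"         = "PatientID"
)

# Check that every mandatory feature name is present
required <- c("Diagnosis date", "Cause of death", "Last record alive",
              "Date of death", "Area", "Nuclei count", "PatientID")
stopifnot(all(required %in% names(feature_names)))

# Temporary path for the sketch; use "./data/setup.RData" once the values are correct
out_file <- file.path(tempdir(), "setup.RData")
save(proteins, cn, vec_type, loc, loc_nDE, feature_names, file = out_file)
```

Loading the file again with `load(out_file)` restores all six objects under the same names.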
Which dataset is preselected within the applications has to be changed manually in every app:
- Search (Ctrl+F) for "selected = loc_nDE[" to change which nDE dataset is selected.
- Search (Ctrl+F) for "selected = loc[" to change which dataset is selected.
Choosing a dataset and filtering it can be done in every app, as shown in the image below. Selecting a dataset allows comparison of different normalization approaches or similar projects. Filtering restricts the data to specific values in selected columns, for example only ROI-type = tumor or Therapy_sucessful = Yes.
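Outside the apps, the same kind of filtering can be reproduced in plain R. The sketch below illustrates the concept and is not the apps' own filtering code; the data frame and the column name `ROI_type` are hypothetical, while the values follow the example in the text.

```r
# Hypothetical annotation table; column names are not guaranteed to
# match your dataset.
df <- data.frame(
  ROI_type = c("tumor", "stroma", "tumor", "tumor"),
  Therapy_sucessful = c("Yes", "Yes", "No", "Yes"),
  BIM = c(1.2, 0.8, 2.1, 1.7),
  stringsAsFactors = FALSE
)

# Keep only rows where both conditions hold
filtered <- df[df$ROI_type == "tumor" & df$Therapy_sucessful == "Yes", ]
print(filtered)
```

Only the first and fourth rows satisfy both conditions, so `filtered` has two rows.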
The scatterplots in the normalization app are limited to plotting three negative controls and three housekeepers.
If the error message "Inf value..." appears (usually with two or more random effects), change the method to nelder-mead.
Did not work? Did you define the columns correctly in feature_names? Are your event and time columns named "event" and "time"?
An RF regression is fitted for all biomarkers. If regression is wanted for parameters as well, this can be edited in ML_functions.R.
Ctrl+F for "Add which parameters should be regression manually"