GCLr is an R package designed to streamline genetic data analyses
performed by Gene Conservation Lab (GCL) staff. Many of the functions in
this package require data pulled directly from the GCL Oracle database
Loki. Because of this, only Alaska Department of Fish and Game staff
with credentials for accessing Loki will be able to use this package to
its full extent. However, the package does allow for users without Loki
credentials to read in genetic data contained in GENEPOP- and rubias-formatted
files and convert the data into the objects used by GCLr (see
GCLr::genepop2gcl() and GCLr::base2gcl()).
Here are some examples of things that this package can be used for:
- Laboratory workflow
  - pull data reports directly from Loki into R
    - get_asl_data() gets age, sex, and length data
    - get_extraction_info() gets DNA extraction information
    - get_geno() gets raw genotypes
    - get_gtseq_metadata() gets GTseq metadata
    - get_tissue_data() gets genetic tissue data
    - get_tissue_locations() gets location of archived tissues
  - create lab project sample sheets
    - get_gtseq_sample_sheet() creates a GTseq project sample sheet
  - break up Loki import files
    - split_gtscore_loki_import() splits a GTscore import file into multiple files
- Laboratory quality control (QC)
  - qc_template is a template .RMD file that compares the original (project) and reanalyzed (QC) sample genotypes to find lab errors so they can be corrected, and calculates failure and error rates for the genotypic data
- Quality assurance (baseline and mixed stock analyses)
  - remove_ind_miss_loci() removes samples with missing genotypes
  - dupcheck_within_sillys() identifies duplicated samples
  - remove_dups() removes one sample from each duplicate set
  - find_alt_species() checks for wrong species
  - read_genepop_hwe() reads in GENEPOP Hardy-Weinberg equilibrium test results
- Baseline analysis
  - Population structure
    - collections_map() creates an interactive map of collection locations
    - fishers_test() tests for homogeneity of allele frequencies
    - pool_collections() combines collections into populations
    - read_genepop_dis() reads in GENEPOP linkage disequilibrium test results
    - summarize_LD() summarizes GENEPOP linkage disequilibrium test results
    - create_pwfst_tree() creates a phylogenetic tree based on pairwise FST
    - create_mds_plot() creates an interactive multidimensional scaling (MDS) plot
    - locus_stats() calculates observed heterozygosity, FIS, and FST by locus
  - Baseline evaluation
    - create_rubias_base_eval() creates rubias mixture and baseline objects/files for evaluating baseline reporting groups
    - run_rubias_base_eval() runs tests to evaluate the identifiability of baseline reporting units for genetic mixed stock analysis
    - plot_baseline_eval() plots the results of the baseline evaluation tests
  - Individual assignment (IA) evaluation
    - loo_rate_calc() calculates leave-one-out (LOO) error rates for each reporting group
    - plot_loo_prec_rec() plots LOO results as a precision-recall curve
    - IA_thresholds() calculates IA probability thresholds based on precision-recall standards
- Genetic mixed stock analysis (MSA)
  - create_rubias_base() creates a rubias reference (aka “baseline”) object/file
  - create_rubias_mix() creates a rubias mixture object/file
  - run_rubias_mix() runs an MSA in rubias
  - custom_comb_rubias_output() summarizes the rubias MSA results
  - stratified_estimator_rubias() combines estimates for multiple strata into a single set of estimates
  - summarize_rubias_individual_assign() summarizes the rubias individual assignment results
- Data conversion
  - gcl2fstat() creates a genotypes file in FSTAT format
  - gcl2nexus() creates a genotypes file in NEXUS format
  - gcl2genepop() creates a genotypes file in GENEPOP format
  - genepop2colony() creates a genotypes file in COLONY format
  - genepop2gcl() reads in a GENEPOP file and creates .gcl objects
  - base2gcl() takes a rubias baseline object and creates .gcl objects
You can install the package from GitHub using pak.

    install.packages("pak")
    pak::pak("commfish/GCLr")

If you have any issues running the functions in this package, please file an issue on GitHub.
Issues can also be filed to request enhancements to existing functions or new functions for the package.
This package generally follows the git-flow branching model, using semantic versioning to document releases. Below is a quick summary of the different branches.
- main: stable version of the package; commits/merges to main trigger a new version number
- develop: ongoing, general improvements to the package, including minor bug fixes
- feature-branches: specific improvements to the package (e.g., creating functions for individual assignment)
- hotfix:
  - for fixing a serious bug found on main in the latest version of the package
  - references an issue
  - merges back into main and develop, triggers a new version number
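Under semantic versioning, a version number has the form MAJOR.MINOR.PATCH, and each release increments one part while resetting the parts to its right to zero. The shell sketch below only illustrates that arithmetic; bump_version is a hypothetical helper, not part of GCLr or usethis, and it assumes a plain three-part version string.

```shell
# bump_version VERSION LEVEL
# Hypothetical helper (not part of GCLr or usethis): prints VERSION with the
# given LEVEL (major, minor, or patch) incremented.
bump_version() {
  version=$1
  level=$2
  # Split "MAJOR.MINOR.PATCH" into its three numeric parts.
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$level" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "level must be major, minor, or patch" >&2; return 1 ;;
  esac
  echo "$major.$minor.$patch"
}

bump_version 1.2.3 patch   # prints 1.2.4
bump_version 1.2.3 minor   # prints 1.3.0
bump_version 1.2.3 major   # prints 2.0.0
```

A hotfix would normally be a patch bump, while merges from develop may be any of the three levels depending on the size of the change.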
Below is a generalized protocol for updating the package version by
merging changes from the develop branch into the main branch. If
working from a feature-branch, follow this same workflow for
feature-branch –> develop, then develop –> main.
- commit changes on the develop branch
- update NEWS.md, but without the version header #1, commit
- update the package version with usethis::use_version(), choose major, minor, or patch, have it commit for you
- push to develop
- create a pull request (base: main, compare: develop) with a meaningful title (e.g., merging to version 1.X.X) and brief description
  - pull requests to main require review so we can keep main stable!
- merge after folks approve, confirm the merge, do not delete the develop branch
- e-mail all GCL staff to notify everyone about the update
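The command-line side of the develop steps above might look roughly like the sketch below. It is only an illustration: commit_and_push_develop is a hypothetical helper (not part of GCLr), and it assumes the GitHub remote is named "origin". The version bump itself is still done in R with usethis::use_version().

```shell
# commit_and_push_develop MESSAGE
# Hypothetical helper (not part of GCLr): stages all changes, commits them
# with MESSAGE, and pushes develop. Assumes the remote is named "origin".
commit_and_push_develop() {
  # Refuse to run from any branch other than develop.
  if [ "$(git symbolic-ref --short HEAD)" != "develop" ]; then
    echo "switch to develop first: git checkout develop" >&2
    return 1
  fi
  git add -A &&
    git commit -m "$1" &&
    git push origin develop
}
```

For example, after editing NEWS.md you could run commit_and_push_develop "Update NEWS.md" from the package root. The pull request into main is then opened on GitHub; if the GitHub CLI happens to be installed, gh pr create --base main --head develop is one way to do that from the terminal.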
For a hotfix (a serious bug found on main), the protocol is:
- pull from main to make sure you have the latest and greatest version
- if your serious bug still exists, create an issue
- create a hotfix_issue_XX branch (referencing the issue #) from main
- working on the hotfix_issue_XX branch, commit the changes needed to resolve the issue; note that you can include keywords in your commit message that will auto-magically close the issue once the hotfix_issue_XX branch is merged back into main
- update NEWS.md, but without the version header #1, commit
- update the package version with usethis::use_version(), choose patch, have it commit for you
- push to hotfix_issue_XX
- create a pull request (base: main, compare: hotfix_issue_XX) with a meaningful title (e.g., Hotfix issue #) and brief description
  - pull requests to main require review so we can keep main stable!
- merge after folks approve, confirm the merge, do not delete the hotfix_issue_XX branch yet
- create another pull request (base: develop, compare: hotfix_issue_XX) with a meaningful title (e.g., Hotfix issue # merging to develop) and brief description
- merge, confirm the merge, now you can delete the hotfix_issue_XX branch
- e-mail all GCL staff to notify everyone about the update
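The first few hotfix steps above can be sketched on the command line as follows. Both functions are hypothetical helpers, not part of GCLr. The sketch assumes the remote is named "origin" and that main is the repository's default branch on GitHub, which is what makes a closing keyword such as "Fixes #XX" in a commit message close the issue after the merge.

```shell
# start_hotfix ISSUE_NUMBER
# Hypothetical helper: update main, then create hotfix_issue_<N> from it.
# Assumes the remote is named "origin".
start_hotfix() {
  git checkout main &&
    git pull origin main &&
    git checkout -b "hotfix_issue_$1"
}

# commit_hotfix ISSUE_NUMBER MESSAGE
# Hypothetical helper: commit all changes with MESSAGE as the subject and
# "Fixes #<N>" as the body, a GitHub closing keyword for that issue.
commit_hotfix() {
  git add -A &&
    git commit -m "$2" -m "Fixes #$1"
}
```

For issue 12, for instance, you would run start_hotfix 12, make the fix, run commit_hotfix 12 "Short description of the fix", and then push with git push origin hotfix_issue_12 before opening the two pull requests on GitHub.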
ADF&G Division of Sport Fisheries Introduction to Git provides a good overview of Git; note, however, that it generally assumes you will be working off of the shared network drive rather than cloning to your local C:/ drive.
ADF&G’s Reproducible Research R Best Practices
GitKraken is a nice GUI alternative to GitHub for visualizing the commit/branch network
