# path.yml
---
Grails:
- Wander, then get lost, in a massive global web of linked data, just because
- Link a taxonomist's phenotypes to a genomicist's genotypes
- Create the Graph of Life (on just 2 axes, no, no, not that kind of graph)
- Identify the Earth's species
- Pick a list of species to send to Mars when The Apocalypse hits, justify the list
- Create an environment where scientists do science and code simultaneously
- Engineer a tool to sort 7k insects (collected in 3 days in Panama), ranging from 4 mm down to ~0.2 mm, from a single ethanol vial into morpho-species in individual microtubes, sequencing and scanning them at the same time (that's 1 vial, 3 days' work... good luck scaling)
Stages:
Level 1:
- Learn Git. OK, that is maybe Level 4, maybe 5, but really, you're likely not going to read that far, so just learn it and reap the benefits
- Transcribe data into a field as a first year college student hourly, by chance, realize that things look pretty cool under a microscope
- Don't even make it to a lab, transcribe verbatim text of some weird specimen label as a mechanical turk
- Read a funny name for some animal and plant and realize it is "scientific"
- Stumble on the <a href="https://www.biodiversitylibrary.org/">BHL</a>, get lost for an evening.
Level 2:
- Write a macro in Excel to transform your PI's data, get furious when your PI sorts the columns and saves the data, overwriting your work, write another to fix the mistake, copy the file and only pass along the copy
Level 3:
- You learned Git, right?
- Read a file via a script you wrote, a FASTA file
- Bang your head on regular expressions because you want to convert the DNA in that FASTA file to AA
- Write a file via script
- You grok <a href="http://drivendata.github.io/cookiecutter-data-science/">Cookie Cutter Data Science</a>
- IRL, feel BLUE, <a href="https://www.biodiversityliteracy.com/">Biodiversity Literacy in Undergraduate Education</a>
Level 4:
- Git, remember?
- Use R to learn about mapping to Darwin Core thanks to the <a href="https://github.com/trias-project/checklist-recipe/wiki/Getting-started">Checklist Recipe Tutorial</a>
- Use what you learned in the Software Carpentry class to finish your tenure-seeking prof's request in a little under an hour, reap the rewards of their appreciation
- Realize there is no way in hell you're formatting those 1k specimens into a material examined section by hand every time you publish a paper, write a database report or 4k line script to do it for you, rejoice when the reviewer tells you to sort by "A" when you did "B" and it takes you three seconds to finish
- Help move your lab's documents to Google Docs/Sheets/Drive, worry about Big Brother reading everything, but have too much fun doing science to think about it
- Wonder if your lab has a backup for all this digital stuff (skip to 8)
- Realize the week you spent on your FASTA parser was for naught, CPAN had you covered
- Use Access or FileMaker to store data in a relational format
- Use a Jupyter notebook
- YARE (yet another regular expression) foray, discover positive lookahead
- The Loop, it has such power, not just one graph, 1k, not just testing 1 value but 10k values... now what result to present to your graduate committee? <AHA moment> OH wowwww, I can *NEST LOOPS*! <forgets committee, deadlines, and goes back to hacking>
Level 5:
- You split your 4k line script into 4 files, each one doing a different part
- You start using "foo" and "bar" in comments within your code, and when you inadvertently use these words in your weekly lab-meeting you get funny looks from your lab-mates
- Realize things have advanced slightly since CPAN, but not much
- Build a website that serves your Lab's homepage
- Take a Software Carpentry workshop and feel really good about the result
- Get blamed for asking for the answers to your homework on Stack Overflow or Perl Monks because you're trying to customize a regex to parse this weird sequence data, rage in response
- Open an issue on a GitHub/GitLab issue tracker
- Spend 2 days tweaking the OCR parameters of Tesseract, give up, and use the defaults.
- Take 2 days to write a script that parses 1500 records, feel awesome until you realize it would have taken you 2 hours to do it manually, and your script is so custom it won't be useful for anything else
- Hear about the concept of SOP, standard operating procedures, help your lab to start writing these
Level 6:
- Your 4k script is now many small files, and you discover Unit Tests, and with them you absolutely crush it when it's time for YARE, this one super-duper complex
- Write a custom pipeline that reads from multiple local resources (e.g. CSV, Excel, Databases)
- Implement JBrowse or other "static" website plugins in your lab's website
- Write the engine that parses your fellow PhD students' data and plots a pretty graph (Neal Stephenson/Cryptonomicon reference)
- Move away from FileMaker and Access to Postgres or MySQL
- Develop a lab website that sits on top of a relational database
- Migrate a 20 year old database to a new system
- Provide an answer to a GitHub/GitLab issue that lets the issue be closed
- Propose to use a Google technology to solve a biological question
- Assign yourself an ORCID ID, wonder why you didn't do that earlier
- Finally get the reference to VIM and Emacs
- After the 10k request to change color or font or references cited style or your figure legend or or or ... conclude that some folks (who happen to be biologists) are waaay more interested in how things look than what they *mean*
- Build an AI-based classifier to classify the images you took of your specimens' heads, wonder why it fails, realize you have to look at them manually to build up a training dataset
- Come across some weird biodiversity informatics project and realize its interfaces SUCK, use your degree in Design to radically improve the software as an hourly
- "ruby -e 'require \"nokogiri\"; doc = File.open(\"file_scrapped_from_the_web_containing_semi-structured_list_of_names_you_want_to_parse.html\") { |f| Nokogiri::XML(f) }; doc.errors'"
Level 7:
- Generalize and restructure your custom scripts as an R function or package and share with the world
- Share your Regular Expression skill as a library for others to use
- Realize you are writing code that supports science, and that science takes a while, and maybe it would be good if your code had Unit Tests so you could come back to it in 5 years and have a chance of it working as you expect
- Be bored at the content of a Software Carpentry workshop, realize you'd have more fun teaching the class, then teach the class after writing a new module and sharing it to the world
- Contribute a video that describes the use of your favourite open-source software, get a million likes from biologists because you share it to the world
- Lose sleep at night because... something about backups, and what does that field *REAALLY* mean, and shouldn't that code run faster, and 3 weeks for the tree to finish, 3 weeks!?!
- Write a custom likelihood model for phylogenetic inference in a format compatible with others
- Master Vim or Emacs as your primary code editor, fret when the key-binds you use aren't available in your collaborator's collaborative document editor of choice
- Contribute to the definitions of classes in an OBO Foundry ontology
- Adopt LSIDs
- Discover <a href="https://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>
- Wonder why folks get all worked up about Citation Metrics, i.e. another way bad actors will game the system, when admins should focus on getting to know their scientists, it's not rocket science, good people are good
- Share the results of that 20 year old database to the rest of the world via an API
- Find OpenRefine, wonder why you didn't do this at Level 2
- Rage when Google end-of-lifes the technology you built during your grant
- Your PyPI/Gem/CRAN library to map the "foo" from "bar" in a D3.js graph is used to publish a graph in an <del>Science</del> open-access journal
- Annotate your data with URIs from an OBO Foundry ontology
- Ponder turning a silly list of ideas in YML format into something more robust, with resolvable URIs, versioning, rules for deprecation, and edges with domains and ranges
- Play with SPARQL endpoints
Level 8:
- Lead the development of an OBO Foundry-accepted ontology
- Replace a whole pile of your day to day scripts with sed, awk, and friends
- You remember backups, and employ a system to deploy your infrastructure (data and apps) to Docker, your institution's Library, some university across the ocean as an image, your tape backup, your brother-in-law's Synology Unit
- Gain sleep at night because you used Continuous Integration to test your software
- Produce a (biological) <em>Gold Standard</em> dataset that becomes the source for engineering competitions
- Worry about the difference between functional and object-oriented languages, and what impact both may have on the science you do
- Worry about the philosophical limitations of identifiers, and whether membership changes in a set mean you're talking about a new entity, say something interesting to the public about your thoughts
- Question your adoption of LSIDs
- Look at the Perl code you wrote during your dissertation and wonder who the hell did that?!
- Start worrying about caching results of long-running GIS queries.
- Start claiming that the argument "technology! because we need to do things faster" is a false premise, and rather the argument is "technology! because we need to improve the meaning behind our data"
- After you plot that super cool graph or follow the reviewer's suggestion to perhaps not use COMIC SANS, realize that how things look can help communicate what they mean
- Optimize your library from prior Levels using Rust or Go
- Wonder why the recommended standard for displaying a DOI is not followed by one of the largest DOI providers, fork your scraping code to accommodate
- Write a protein folding algorithm
- Actually build an application on SPARQL endpoints
- Contribute to <a href="http://uberon.github.io/about.html">UBERON</a>
- AI-augmented measurements of specimen morphology
Level 9:
- Become the director of the global biodiversity informatics organization
- You assume the maintenance of a well established open source library, the community loves you
- Develop the core of a system that maps all specimens ever digitized, and returns the results *very* quickly
- Write the engine that reads custom models for phylogenetic inference
- Take a sabbatical to serve on a funding agency's team
- Write a sequence aligner that is faster, more accurate, and parallelizes better than BLAST, HMMER, etc.
- You write systems to scrape *all* biological data and put it on a single machine
- Write code to parse everything, all the text ever published, in a week
- Explain to others why they shouldn't use Google technologies as the core of their scientific software if they want their software to last more than a couple years
- Lead the development of a TDWG standard
- Implement "Manual Mondays" in your lab wherein only wet-cycles are allowed.
- Worry about ORCID IDs becoming the Facebook for scientists
- Wonder how Darwin did all that stuff without GUIDs?
- Test limits to triple stores because wouldn't it be nice if everything is a triple? Realize biology is kinda grand in scale.
- Wiki, Graph database, or Relational? Yes.
- Participate in an R/Python/Ruby for biodiversity informatics workshop to see what those crazy kids are doing these days (teach an old dog new tricks)
Level 10:
- Taxonomies used by global biodiversity platforms seamlessly integrate the true phylogenetic information contained within local uses of informal (aggregate) names across space and time
- Design and implement a universal persistent identifier model that works for biology (assumes you passed on to another plane of existence)
- 42