IMGT update #34
It looks like I need to update the index, since things were added after I made the branch, and then re-add the new terms. I'll do this, rebuild, and commit.
In the Python script, I think the file paths should be passed as command-line arguments. You could just pass the path to the `ontology/` directory and the path to `index.tsv`.
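A minimal sketch of what that could look like with the standard-library `argparse` module. The argument names `ontology_dir` and `index`, the helper `table_path`, and the assumption that each table sits directly inside `ontology/` (e.g. `ontology/chain.tsv`) are illustrative, not taken from the script.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two input paths; argv=None means argparse reads sys.argv."""
    parser = argparse.ArgumentParser(
        description="Check which IMGT alleles are missing from the ontology tables."
    )
    parser.add_argument("ontology_dir", type=Path, help="path to the ontology/ directory")
    parser.add_argument("index", type=Path, help="path to index.tsv")
    return parser.parse_args(argv)


def table_path(args, name):
    """Build the path to one table, e.g. table_path(args, "chain")."""
    # Hypothetical layout: every table lives directly under ontology_dir.
    return args.ontology_dir / f"{name}.tsv"
```

Calling `args = parse_args()` inside the script's `if __name__ == "__main__":` block would then allow something like `python check_missing_alleles.py ontology/ index.tsv`, and a Makefile target could pass the same two paths.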
Additionally, whenever you're writing to one of the TSVs, it would probably be better to use a `csv.writer` with a tab delimiter instead of joining fields with `\t`.
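For comparison, a minimal sketch using the standard-library `csv` module; the helper name `write_tsv` is hypothetical. Unlike `"\t".join(...)`, `csv.writer` converts non-string values with `str()` and quotes a field that itself contains a tab or newline instead of silently shifting the columns.

```python
import csv


def write_tsv(handle, header, rows):
    """Write a header row followed by data rows as tab-separated values."""
    # lineterminator="\n" avoids the "\r\n" that csv.writer emits by default.
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
```

When writing to a real file, open it with `newline=""` as the `csv` docs recommend, e.g. `with open(path, "w", newline="") as handle: write_tsv(handle, header, rows)`.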
I left inline comments on most of them, but I stopped around line 300 since the rest was all the same thing :)
Also, can we add this task to the Makefile? Maybe as `make update-alleles`.
I want to make some decisions about #37 before merging this. If we decide to move large
I think squashing would still keep it in the Git history (I might be wrong, though), but there are other ways to remove files: https://docs.github.com/en/github/managing-large-files/removing-files-from-a-repositorys-history
If we remove a large file from the branch and then squash-merge into master, I think it will not appear in the history: we should end up with one commit that skips over the large files we don't want. But I'm certain that if we create a clean branch to merge, and delete the dev branch with the large files without merging it, then the history will not include the large files.
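Either way, Git can report which commits reachable from a branch touched a given path, which makes the outcome easy to check after merging. A minimal sketch wrapping `git rev-list --full-history <ref> -- <path>` in Python; the helper name `commits_touching` is hypothetical. `--full-history` makes Git follow every parent of a merge, so a file that an ordinary merge brought in from a side branch is still reported even if that branch later deleted it.

```python
import subprocess
from typing import List


def commits_touching(path: str, ref: str = "HEAD", repo: str = ".") -> List[str]:
    """Return the hashes of commits reachable from `ref` that changed `path`."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--full-history", ref, "--", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails (e.g. unknown ref)
    )
    # rev-list prints one commit hash per line; an empty list means no match.
    return result.stdout.split()
```

For example, an empty list from `commits_touching("path/to/large-file.owl", ref="master")` (placeholder path) would mean no commit on `master` ever touched that file; the commits that do touch it stay on the unmerged dev branch until that branch is deleted.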
I created a clean PR in #39 to get rid of the "big" MRO.
Closing in favour of #39.
Adds a large amount of data from IMGT: all alleles are pulled into `chain.tsv`, `chain-sequence.tsv`, `genetic-locus.tsv`, and `molecule.tsv`. Per Randi's suggestion, `molecule.tsv` only includes HLA-A, -B, -C, -DQ, -DR, and -DP. To update to the current IMGT release, just run `python check_missing_alleles.py`, but the output sheets then need to be sorted.
I ran the validation and built locally to make sure everything is okay, and I didn't run into any issues. I also tried to follow some style guidelines from the knocean documentation, such as formatting with Black and using f-strings. Let me know if there are any other improvements or changes to make.
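Until the script sorts its own output, a minimal post-processing sketch, assuming each sheet has a fixed number of header rows followed by data rows to be sorted on one column. The helper name `sort_tsv` and its parameters are hypothetical, and it sorts in plain string order, so allele names whose numeric fields differ in width may not come out in numeric order.

```python
import csv
from pathlib import Path


def sort_tsv(path, header_rows=1, key_column=0):
    """Sort a TSV file in place by one column, keeping its header rows on top."""
    path = Path(path)
    with path.open(newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    header = rows[:header_rows]
    # Drop blank lines so they cannot end up in the middle of the table.
    body = [row for row in rows[header_rows:] if row]
    body.sort(key=lambda row: row[key_column])
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(header + body)
```

Assuming the sheets live under `ontology/`, this could be run as `for name in ["chain", "chain-sequence", "genetic-locus", "molecule"]: sort_tsv(f"ontology/{name}.tsv")`; pass `header_rows=2` if a sheet carries a second header row.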