@acrinklaw

Adds a large amount of data from IMGT: all alleles are pulled into the chain, chain-sequence, genetic-locus, and molecule TSVs. molecule.tsv only has HLA-A, B, C, DQ, DR, and DP, per Randi's suggestion. Updating to the current IMGT release can be done by calling python check_missing_alleles.py, although the output sheets need to be sorted.

I ran the validation and built locally to make sure it's okay, and I didn't run into any issues. I also tried to adhere to some style guidelines I saw in the knocean documentation, such as using Black and f-string syntax. Let me know if there are any other improvements or changes to make.

@acrinklaw
Author

It looks like I need to update the index, since things were added after I made the branch, and then re-add the new terms. I'll do this, then rebuild and commit.

Collaborator

@beckyjackson beckyjackson left a comment

In the python script, I think the paths to files should be passed as command line args. You could just pass the path to the ontology/ directory and path to index.tsv.
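A minimal sketch of that suggestion, using argparse from the standard library. The argument names (ontology_dir, index) and the table_path helper are hypothetical and are not taken from check_missing_alleles.py:

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two paths; both argument names are assumptions."""
    parser = argparse.ArgumentParser(
        description="Check the ontology tables for missing alleles"
    )
    parser.add_argument(
        "ontology_dir", type=Path, help="path to the ontology/ directory"
    )
    parser.add_argument("index", type=Path, help="path to index.tsv")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)


def table_path(args, name):
    """Build the path of one table, e.g. 'molecule' -> <ontology_dir>/molecule.tsv."""
    return args.ontology_dir / f"{name}.tsv"
```

With something like this, the script could be invoked as python check_missing_alleles.py ontology/ index.tsv, and every table path would be derived from the first argument rather than hard-coded.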

Additionally, any time you're writing to one of the TSVs, it would probably be better to use Python's csv.writer than to join fields with \t.
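As a rough illustration of the difference, here is a sketch that writes tab-separated rows through the csv module. The write_rows helper and the sample rows are made up for the example; the point is that csv.writer quotes a field that itself contains a tab, which a plain "\t".join would not do:

```python
import csv
import io


def write_rows(handle, rows):
    """Write rows as TSV; fields containing tabs or newlines get quoted."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Hypothetical rows, for illustration only.
rows = [["Label", "Note"], ["example", "has\ta tab"]]

# An in-memory buffer stands in for a real file here. When writing to disk,
# open the file with newline="" as the csv documentation recommends.
buffer = io.StringIO()
write_rows(buffer, rows)

# Reading the text back with csv.reader recovers the original fields.
round_trip = list(csv.reader(io.StringIO(buffer.getvalue()), delimiter="\t"))
```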

I left comments in-line for most of them, but I stopped around line 300 since it was all just the same thing :)

Also, can we add this task to the Makefile? Maybe as make update-alleles.

@beckyjackson beckyjackson mentioned this pull request Aug 27, 2020
@jamesaoverton
Collaborator

I want to make some decisions about #37 before merging this. If we decide to move large mro.owl etc. out of version control, then I'll want a new branch (or just a squash commit?) that does not include the large files in our git history forever.

@beckyjackson
Collaborator

I think squashing would still keep it in the git history (I might be wrong though?), but there are other ways to remove files: https://docs.github.com/en/github/managing-large-files/removing-files-from-a-repositorys-history

@jamesaoverton
Collaborator

If we remove a large file from the branch, then squash merge to master, I think it will not appear in the history. I think we should end up with one commit that jumps over the large files that we don't want.

But I'm certain that if we create a clean branch to merge, and delete the dev branch with the large files without merging it, then the history will not include the large files.

@beckyjackson
Collaborator

I created a clean PR in #39 to get rid of the "big" MRO.

@jamesaoverton
Collaborator

Closing in favour of #39.

@beckyjackson beckyjackson deleted the imgt-update branch October 7, 2020 16:19
4 participants