Introduction and observations on local setup. #868
Replies: 3 comments
Hi, @NgangaKamau3! I don't think I have observed the tests fail due to the discrepancies in the MD5 columns that you mention. Can you create an issue documenting the test failures? We have generally ignored the scatter_mapbox warnings recently, but we would be grateful if you could propose a solution in a PR.
@jonbrenas After further investigating the failures I saw earlier, I realized that my initial local environment wasn't properly set up and had miscued the WGS catalog. I've now worked on the deprecation warnings, which were triggered by the legacy scatter_mapbox and append_trace methods in the anoph suite. I've submitted a PR, #889, that migrates these to scatter_map and add_trace. This resolved the perceived failures in my environment, and the local test suite now passes fully.
Hi @jonbrenas @tristanpwdennis, hope you're both well! I'm continuing to explore the malariagen-data-python internals as I think through the taxon classifier integration. While profiling metadata ingestion locally, I noticed that _cache_files in AnophelesBase appears to grow without an eviction boundary, as read_files adds entries to the dictionary over time. Combined with the async cat() usage for GCS latency optimization, this seems to produce steady memory growth in longer sessions. Before drawing conclusions, I wanted to check whether the unbounded cache is intentional for session-wide performance, or whether introducing something like an LRU-style cap might make sense as datasets scale. Happy to share profiling traces (mprof) if useful, or to look into a potential fix if this is an area you'd like addressed.
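To make the LRU-style cap concrete, here is a minimal sketch of the idea, not the library's actual code. The class name BoundedFileCache, the max_entries parameter, and the loader callback are all hypothetical and only stand in for however _cache_files is really populated. It caps the cache by entry count; a cap on total bytes would be a natural variation.

```python
from collections import OrderedDict
from typing import Callable


class BoundedFileCache:
    """Hypothetical LRU cache keyed by file path, capped by entry count."""

    def __init__(self, max_entries: int = 128) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        # Insertion order doubles as recency order: oldest entry first.
        self._data: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, path: str, loader: Callable[[str], bytes]) -> bytes:
        """Return cached content for path, calling loader only on a miss."""
        if path in self._data:
            # Cache hit: mark this entry as most recently used.
            self._data.move_to_end(path)
            return self._data[path]
        content = loader(path)
        self._data[path] = content
        if len(self._data) > self._max_entries:
            # Over the cap: evict the least recently used entry.
            self._data.popitem(last=False)
        return content

    def __contains__(self, path: str) -> bool:
        return path in self._data

    def __len__(self) -> int:
        return len(self._data)
```

With max_entries=2, reading "a", "b", "a", "c" in that order evicts "b" and keeps "a" and "c", so recently used files stay resident while the cache size stays fixed.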
Hi @jonbrenas @tristanpwdennis! I'm Ng'ang'a Kamau, a student interested in the GSoC taxonomic identification project. While setting up locally (WSL2/Conda/Poetry), I noticed that the simulator tests fail due to the new MD5 columns in the WGS catalog. I've drafted a fix that makes the assertions more robust to these schema changes. I also noticed a lot of scatter_mapbox warnings in the logs. Would you prefer a PR for this, or is the simulator supposed to be kept in strict sync with the expected columns list?
Thanks.