APP-356 Convert POTNTL_DUP_INV_SUM (Potential Duplicate Investigations)#3131
krista-skylight wants to merge 63 commits into `main` from
Conversation
…into kc/convert_POTNTL_DUP_INV_SUM merge branches
…into kc/convert_POTNTL_DUP_INV_SUM merge main
…into kc/convert_POTNTL_DUP_INV_SUM merge from main
…into kc/convert_POTNTL_DUP_INV_SUM merge with main
…into kc/convert_POTNTL_DUP_INV_SUM merge changes from main
…into kc/convert_POTNTL_DUP_INV_SUM merge from main
JordanGuinn
left a comment
This is looking good!! 🔥
| library_name: str = Field(min_length=1) | ||
| data_source_name: str = Field(min_length=1) | ||
| subset_query: str = Field(min_length=1) | ||
| days_value: int | None = None # Specific to potntl_dup_inv_sum |
(thought, nb): Probably not worth addressing at this moment, but if there end up being multiple reports that require unique properties like this, maybe we end up sketching out a custom_props Object field or similar to capture them all? Just in the spirit of minimizing the amount of library-specific bits on the ReportSpec model.
Yea def a good idea to revisit as we work through more translations!
| sas: | ||
| image: ghcr.io/cdcent/nbs7-sas-linux:v1.0.4 | ||
| platform: linux/amd64 | ||
| container_name: "${COMPOSE_PROJECT_NAME}-sas" |
(q, nb): Any particular reason for this addition?
I do not know enough to understand why but when I was troubleshooting issues with @mcmcgrath13 doing sas translations, she suggested making this change to get around an error message that I can no longer recall lol.
Mary, do you think I should push this for everyone or leave it as something only I modify?
I think it's fine to push - makes working with the sas container easier via compose
| db_tables = [t['table_name'] for t in schema['tables']] | ||
| fk_tables = schema['config']['nbs']['fk_tables'] | ||
| fk_tables = schema['config'].get('nbs', {}).get('fk_tables', []) |
(q, nb): What's this change for?
So if no fk tables are specified (not always relevant), we default to an empty list.
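A quick standalone sketch of that fallback behavior (the sample schemas here are made up, not the library's actual config):

```python
# Two sample configs: one with fk_tables specified, one without the 'nbs' block.
with_fk = {'config': {'nbs': {'fk_tables': ['patient', 'investigation']}}}
without = {'config': {}}

def fk_tables(schema):
    # Chained .get calls so a missing 'nbs' block or 'fk_tables' key
    # falls back to an empty list instead of raising KeyError.
    return schema['config'].get('nbs', {}).get('fk_tables', [])

assert fk_tables(with_fk) == ['patient', 'investigation']
assert fk_tables(without) == []
```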
| ] | ||
| def test_execute_report_with_days_value(self, snapshot): | ||
| """Test with a specific days value (e.g., 365 days).""" |
(q, nb): Can/should we do any actual date comparisons of the underlying results returned by the report lib, to ensure they actually match the days value provided?
No, unfortunately, the report doesn't retain data about the two event dates for the potential duplicate or how close in days they are to each other. That's why I did a comparison between 3650 and 30 days to check that 3650 has more potential duplicates than 30.
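That 3650-vs-30 comparison could be sketched roughly like this, with a stub standing in for the real library call (the function name and return shape are hypothetical, not the actual report-execution API):

```python
def run_report(days_value):
    # Stub for the real report execution; in the actual test this would
    # call the library's entry point with the given days_value and
    # return the potential-duplicate rows it found.
    candidates_per_month = 1
    return list(range(days_value // 30 * candidates_per_month))

wide = run_report(days_value=3650)
narrow = run_report(days_value=30)
# A wider lookback window should surface at least as many potential duplicates.
assert len(wide) >= len(narrow)
```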
| - column_name: ILLNESS_ONSET_DATE | ||
| type: string | ||
| data: None | ||
| null_percentage: 1.0 |
(q, nb): why include the column, but have everything be null?
I was just trying to mimic everything I saw in the original table as much as possible, since I wasn't sure what downstream effects empty columns might have on this library or on others using this table.
| data: f"{fake.date_between(start_date='-1y', end_date='today').strftime('%Y-%m-%d')} 12:00:00.000" | ||
| null_percentage: 0.05 | ||
| - column_name: EVENT_DATE_TYPE |
should this be linked off of the non-nullness of EVENT_DATE? (similar to how in phdc the state is dependent on state cd)
oooh great point! I just modified accordingly
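The dependency being suggested can be sketched in plain Python (column names from the diff, generation logic hypothetical): generate EVENT_DATE first, then derive EVENT_DATE_TYPE so it is null exactly when EVENT_DATE is null.

```python
import random

# Hypothetical sketch of dependent null generation: EVENT_DATE_TYPE
# should be null if and only if EVENT_DATE is null.
random.seed(0)
rows = []
for _ in range(100):
    # ~5% null rate on EVENT_DATE, mirroring the null_percentage above.
    event_date = None if random.random() < 0.05 else "2024-01-01 12:00:00.000"
    event_date_type = None if event_date is None else "ONSET"
    rows.append((event_date, event_date_type))

# The dependency holds for every generated row.
assert all((d is None) == (t is None) for d, t in rows)
```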
| # KLUDGE: NULL writing is not always correct | ||
| result = result.replace(' nan,', ' NULL,') | ||
| result = result.replace('nan', ' NULL') |
(q, nb): Why did we need to add this one? There's a risk that a valid part of a string containing `nan` will now be turned into `NULL`. Is it for the opening-paren case of `(nan,`?
I remember adding some of these to get `get_faker_sql` to not break for my data, but now that I'm trying it again without that line, it still works, so I have removed it!
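If a bare-substring replacement like this is ever needed again, a word-boundary regex is a safer sketch (hypothetical, not part of this PR) since it leaves `nan` alone inside legitimate string values:

```python
import re

# '\bnan\b' only matches 'nan' as a whole token, so 'banana' is untouched
# while the bare nan values still become NULL.
result = "(nan, 'banana', nan)"
safe = re.sub(r'\bnan\b', 'NULL', result)
assert safe == "(NULL, 'banana', NULL)"
```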
Co-authored-by: Mary McGrath <m.c.mcgrath13@gmail.com>
…Cgov/NEDSS-Modernization into kc/convert_POTNTL_DUP_INV_SUM pull upstream
…tntl_dup_inv_sum/test_execute_report_with_days_value/snapshot.yml
…s_potntl_dup_inv_sum/test_execute_report_with_days_value/snapshot.yml
❌ The last analysis has failed.
Description
Please include a summary of the changes and any key information a reviewer may need.
Tickets
Checklist for adding a library:
- `apps/modernization-api/src/main/resources/db/changelog/report/execution/03_ODSE_Data_Report_Library_Init.sql`
- `apps/modernization-api/src/main/resources/db/report/execution/libraries` named `<your library name>.sql`
- `apps/modernization-api/src/main/resources/db/report-execution-changelog.yml`. It should be added to the latest `changeSet` since the last release; this could require a new `changeSet` if there isn't one since the last release
- `apps/report-execution/src/libraries` named in lowercase, but generally following the naming convention of SAS (needs to be human recognizable as the same library)
  - the `execute` function is the required method and its signature will (someday) be checked for validity
- `apps/report-execution/tests/libraries/<your_library_here>` `py` file following the conventions established for other libraries
  - `subset_sql` can be assumed to be well tested by the modernization-api, so focus on any logic and additional joins/queries/analysis that is added in the library