Conversation
…erifying dataframe shapes.
|
Also, wanted to note that in this PR, I resolved an issue with the |
|
I'm expecting to make some changes to this PR, since I wanted to add some tests, and figured I could make several fixes. Note that I'll be making some comments to help track what needs to be modified, compared to the usual PR where I use them for feedback. |
There was a problem hiding this comment.
Note that .DS_Store should have been part of the .gitignore, I'll throw it into a commit.
tests/gsheet/test_gsheet.py
Outdated
| monkeypatch.setenv("ROSTER_SHEET_KEY", environ.get("TEST_SHEET_KEY")) | ||
| response = client.post("/api/gsheet/group-attendance", | ||
| params={ | ||
| "spreadsheet_id": environ.get("TEST_SHEET_KEY"), |
There was a problem hiding this comment.
Note that this defeats the point of the monkey patch; additionally env variables should be called from config.py (Not environ.get())
|
|
||
| @pytest.mark.integration | ||
| @pytest.mark.gsheet | ||
| def testDefaultLookbackDays(self, monkeypatch, client): |
There was a problem hiding this comment.
Apologies, for some reason I missed that the integration test was implemented here. I'll still be pushing some commits over to fix other minor issues.
| ) | ||
| ) | ||
|
|
||
| print(attendance_query) |
There was a problem hiding this comment.
Print statements shouldn't be here in production code. I'll also be removing these in commits.
| Copy database records of select students from a specified Google Sheet and record them onto the same Sheet given a date range | ||
| """ | ||
| try: | ||
| if not start_date or not end_date: |
There was a problem hiding this comment.
Note that this means that if only 1 parameter is fulfilled, the argument will be overridden. Should not be a problem for our use case. but goes against the optional declaration, since they're dependent on each other. May create confusion down the line, may considering setting the defaults separately.
| date_grid = np.zeros((len(cti_ids), len(dates)), dtype=bool) | ||
| pandas_grid = pandas.DataFrame(date_grid, index=cti_data.index, columns=dates) | ||
|
|
||
| result_grid = pandas.concat([cti_data, pandas_grid], axis=1) |
There was a problem hiding this comment.
Issue with Implementation: The date columns should be generated by the sessions, not the actual dates. Consider a student that attends 2 sessions on the same date, but different times. They would meet the DW requirement, but appear wrong on this sheet. This needs to be resolved before the PR merges. If I'm understanding this wrong, correct me.
NickGuerrero
left a comment
There was a problem hiding this comment.
You did pretty good for a first major PR to this project. My comments are primarily criticisms because I'm looking for problems, but you overall stuck to the general requirements well. I submitted some fixes mentioned, but I would like to see some changes before approving the PR.
- The columns should be based by session, not by dates. Since we can have multiple sessions in a day, that should be represented. I believe I used MM/DD - 1,2,3... as an example.
- Move tests/gsheet/test_gsheet.py to its own folder, i.e. tests/gsheet/group_attendance/test_group_attendance.py. I didn't move it since I didn't want to disturb anything without ensuring the tests would still pass.
- I also left the print statements in there, since I believe you'll use them again for development/testing. Make sure to remove them before re-submitting the PR.
…-sys into sergio/group_attendance
| """ | ||
| Fetch attendance records and create an attendance matrix of select cti_ids and a date range, | ||
| given the associated Accelerate tables | ||
| @param eng: A SQLAlchemy Engine object that connects to the database |
There was a problem hiding this comment.
It's a bit odd to document only 1 of the parameters; since this is the main function, all parameters should be filled in, perhaps with some high-level details on execution (e.g. how the output matrix is structured, what happens if only 1 date is used, some of the defaults).
| session_cols: Dict[int, str] = {} | ||
| col_order: List[str] = [] | ||
|
|
||
| for session_date, group in by_date.groupby("session_date", sort=True): |
There was a problem hiding this comment.
This feels like a somewhat complicated way to enumerate the dates, especially since the dates are already sorted (I believe really only need to see the previous date to determine whether or not to add the number if you were running sequentially). It does work with the data frame methods, so I don't have any requested changes.
On the other hand, I don't like the variable name "group", since it's not immediately clear what's in the group. "date_group" or "session_group" may be better. Not a big issue, but an issue with readability.
| result_grid.insert( | ||
| 0, | ||
| "email", | ||
| [id_to_email.get(cid, "NOT FOUND") for cid in result_grid.index], |
There was a problem hiding this comment.
I like the "NOT FOUND" addition - it makes it very clear if something is wrong with the attendance processing (No primary email with a cti_id means no attendance).
| try: | ||
| column_index = headers.index("cti_id") + 1 | ||
| except ValueError: | ||
| print("Column name not found") |
There was a problem hiding this comment.
print() is acceptable for now, but ideally a logger to console is used for more reliability.
| # Skip | ||
| continue | ||
|
|
||
| worksheet.clear() |
There was a problem hiding this comment.
I'm not a huge fan of clearing the worksheet once values are extracted, since the attendance matrix still needs to be created by this point. This could possibly result in the ids being cleared on the worksheet, something goes wrong in the session processing, and the original worksheet stays clear.
It means that we should probably keep id backups just in case. Additionally, id group data shouldn't live only on the spreadsheet (e.g. Accountability Groups can be generated from SA assignments, then this group attendance is run for an SA view. Opposed to keeping the id data in a spreadsheet).
|
After PR Review discussion:
|

Pull Request Name
I created a new router for the
api/gsheet/group-attendanceendpoint.Workflow:
a. If not, calculate the start/end dates using our
default_attendance_lookback_days.cti_idcolumn and extract all values.Issues Fixed
Tests
I created two tests for the endpoint:
fetch_cti_ids_from_sheetandfetch_group_attendance. The test guarantees that the/group-attendanceendpoint output in our Google Sheet matches the shape of the dataframe returned by our service functions.default_attendance_lookback_dayslogic works correctly if we are missing both the start/end dates.P.S., the test was run locally, but I did face an issue with the
database/mongo/core.pyfile. I believe it was an issue with the CTI_MONGO_URL variable not being found, so I had to manually export it into my own shell usingexport CTI_MONGO_URL=...to run the test.Real endpoint test using my personal Heroku dyno. I created a small script on my own copy of the "Test Worksheet" sheet:

After reviewing the tables in StudentAttendance, I gathered the few cti_ids and verified that they accurately printed True for the correct ids/dates. However, there wasn't too much data to test it on. Result:

Checklist