Skip to content

Sergio/group attendance#79

Open
sezavala wants to merge 11 commits intomainfrom
sergio/group_attendance
Open

Sergio/group attendance#79
sezavala wants to merge 11 commits intomainfrom
sergio/group_attendance

Conversation

@sezavala
Copy link
Copy Markdown
Collaborator

@sezavala sezavala commented Feb 1, 2026

Pull Request Name

I created a new router for the api/gsheet/group-attendance endpoint.
Workflow:

  1. Check if both the start/end dates are provided.
      a. If not, calculate the start/end dates using our default_attendance_lookback_days.
  2. Connect to Google Spread depending on app_env (JSON or Heroku env variables).
  3. Based on the current sheet you are on, it will search for a cti_id column and extract all values.
  4. Match every cti_id with its primary email ("NOT FOUND" if none).
  5. Create a dataframe of all dates between the start/end dates of type=bool, and rows=len(cti_ids).
  6. After an SQL query, try to locate the cti_id and date in the dataframe. If found, set the value to True.
  7. Finally, the algorithm clears the sheet (to ensure a clean output) and writes the dataframe data to the sheet.

Issues Fixed

Tests

I created two tests for the endpoint:

  1. The first test verifies that the endpoint returns a 201 (CREATED) status code after being given a start/end date. Then, we run our service functions fetch_cti_ids_from_sheet and fetch_group_attendance. The test guarantees that the /group-attendance endpoint output in our Google Sheet matches the shape of the dataframe returned by our service functions.
  2. The next test ensures the default_attendance_lookback_days logic works correctly if we are missing both the start/end dates.
Screenshot 2026-02-01 at 2 11 31 PM

P.S., the test was run locally, but I did face an issue with the database/mongo/core.py file. I believe it was an issue with the CTI_MONGO_URL variable not being found, so I had to manually export it into my own shell using export CTI_MONGO_URL=... to run the test.

Real endpoint test using my personal Heroku dyno. I created a small script on my own copy of the "Test Worksheet" sheet:
Screenshot 2026-02-01 at 3 01 46 PM

After reviewing the tables in StudentAttendance, I gathered the few cti_ids and verified that they accurately printed True for the correct ids/dates. However, there wasn't too much data to test it on. Result:
Screenshot 2026-02-01 at 2 59 25 PM

Checklist

  • I've reviewed the contribution guide
  • I've confirmed the changes work on my machine
  • This pull request is ready for review

@sezavala
Copy link
Copy Markdown
Collaborator Author

sezavala commented Feb 5, 2026

Also, wanted to note that in this PR, I resolved an issue with the write_to_gsheet function. Instead of connecting to the Google worksheet using the provided configuration key (Prod key or Dev key), the function always used the Prod key when connecting to a Google worksheet.

@NickGuerrero
Copy link
Copy Markdown
Owner

I'm expecting to make some changes to this PR, since I wanted to add some tests, and figured I could make several fixes. Note that I'll be making some comments to help track what needs to be modified, compared to the usual PR where I use them for feedback.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that .DS_Store should have been part of the .gitignore, I'll throw it into a commit.

monkeypatch.setenv("ROSTER_SHEET_KEY", environ.get("TEST_SHEET_KEY"))
response = client.post("/api/gsheet/group-attendance",
params={
"spreadsheet_id": environ.get("TEST_SHEET_KEY"),
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this defeats the point of the monkey patch; additionally env variables should be called from config.py (Not environ.get())


@pytest.mark.integration
@pytest.mark.gsheet
def testDefaultLookbackDays(self, monkeypatch, client):
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, for some reason I missed that the integration test was implemented here. I'll still be pushing some commits over to fix other minor issues.

)
)

print(attendance_query)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Print statements shouldn't be here in production code. I'll also be removing these in commits.

Copy database records of select students from a specified Google Sheet and record them onto the same Sheet given a date range
"""
try:
if not start_date or not end_date:
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this means that if only 1 parameter is fulfilled, the argument will be overridden. Should not be a problem for our use case. but goes against the optional declaration, since they're dependent on each other. May create confusion down the line, may considering setting the defaults separately.

date_grid = np.zeros((len(cti_ids), len(dates)), dtype=bool)
pandas_grid = pandas.DataFrame(date_grid, index=cti_data.index, columns=dates)

result_grid = pandas.concat([cti_data, pandas_grid], axis=1)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue with Implementation: The date columns should be generated by the sessions, not the actual dates. Consider a student that attends 2 sessions on the same date, but different times. They would meet the DW requirement, but appear wrong on this sheet. This needs to be resolved before the PR merges. If I'm understanding this wrong, correct me.

Copy link
Copy Markdown
Owner

@NickGuerrero NickGuerrero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did pretty good for a first major PR to this project. My comments are primarily criticisms because I'm looking for problems, but you overall stuck to the general requirements well. I submitted some fixes mentioned, but I would like to see some changes before approving the PR.

  • The columns should be based by session, not by dates. Since we can have multiple sessions in a day, that should be represented. I believe I used MM/DD - 1,2,3... as an example.
  • Move tests/gsheet/test_gsheet.py to its own folder, i.e. tests/gsheet/group_attendance/test_group_attendance.py. I didn't move it since I didn't want to disturb anything without ensuring the tests would still pass.
  • I also left the print statements in there, since I believe you'll use them again for development/testing. Make sure to remove them before re-submitting the PR.

@sezavala
Copy link
Copy Markdown
Collaborator Author

Changed the function so the columns are only session dates (since it is on the test database, there isn't much). On days where there is more than one session, they are enumerated.

Screenshot 2026-02-16 at 7 57 36 PM

"""
Fetch attendance records and create an attendance matrix of select cti_ids and a date range,
given the associated Accelerate tables
@param eng: A SQLAlchemy Engine object that connects to the database
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit odd to document only 1 of the parameters; since this is the main function, all parameters should be filled in, perhaps with some high-level details on execution (e.g. how the output matrix is structured, what happens if only 1 date is used, some of the defaults).

session_cols: Dict[int, str] = {}
col_order: List[str] = []

for session_date, group in by_date.groupby("session_date", sort=True):
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a somewhat complicated way to enumerate the dates, especially since the dates are already sorted (I believe really only need to see the previous date to determine whether or not to add the number if you were running sequentially). It does work with the data frame methods, so I don't have any requested changes.

On the other hand, I don't like the variable name "group", since it's not immediately clear what's in the group. "date_group" or "session_group" may be better. Not a big issue, but an issue with readability.

result_grid.insert(
0,
"email",
[id_to_email.get(cid, "NOT FOUND") for cid in result_grid.index],
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the "NOT FOUND" addition - it makes it very clear if something is wrong with the attendance processing (No primary email with a cti_id means no attendance).

try:
column_index = headers.index("cti_id") + 1
except ValueError:
print("Column name not found")
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print() is acceptable for now, but ideally a logger to console is used for more reliability.

# Skip
continue

worksheet.clear()
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of clearing the worksheet once values are extracted, since the attendance matrix still needs to be created by this point. This could possibly result in the ids being cleared on the worksheet, something goes wrong in the session processing, and the original worksheet stays clear.

It means that we should probably keep id backups just in case. Additionally, id group data shouldn't live only on the spreadsheet (e.g. Accountability Groups can be generated from SA assignments, then this group attendance is run for an SA view. Opposed to keeping the id data in a spreadsheet).

@NickGuerrero
Copy link
Copy Markdown
Owner

After PR Review discussion:

  • worksheet.clear() is clearing data between tests unintentionally (endpoint logic & service logic are run without the write-back).
    • Find a better place for the clear. It should happen as close to the write-back as possible.
      • Could use a select operation instead of a full clear
    • Between tests, find a way to reset the group attendance sheet (Test Group), clean-up, maybe with a fixture

@sezavala sezavala requested a review from NickGuerrero March 6, 2026 01:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[TASK] Add feature for group attendance

2 participants