Sergio/group attendance by sezavala · Pull Request #79 · NickGuerrero/cti-sys

sezavala · 2026-02-01T23:06:37Z

Pull Request Name

I created a new router for the api/gsheet/group-attendance endpoint.
Workflow:

Check if both the start/end dates are provided.
a. If not, calculate the start/end dates using our default_attendance_lookback_days.
Connect to Google Spread depending on app_env (JSON or Heroku env variables).
Based on the current sheet you are on, it will search for a cti_id column and extract all values.
Match every cti_id with its primary email ("NOT FOUND" if none).
Create a dataframe of all dates between the start/end dates of type=bool, and rows=len(cti_ids).
After an SQL query, try to locate the cti_id and date in the dataframe. If found, set the value to True.
Finally, the algorithm clears the sheet (to ensure a clean output) and writes the dataframe data to the sheet.

Issues Fixed

Fixed: [TASK] Add feature for group attendance #76

Tests

I created two tests for the endpoint:

The first test verifies that the endpoint returns a 201 (CREATED) status code after being given a start/end date. Then, we run our service functions fetch_cti_ids_from_sheet and fetch_group_attendance. The test guarantees that the /group-attendance endpoint output in our Google Sheet matches the shape of the dataframe returned by our service functions.
The next test ensures the default_attendance_lookback_days logic works correctly if we are missing both the start/end dates.

P.S., the test was run locally, but I did face an issue with the database/mongo/core.py file. I believe it was an issue with the CTI_MONGO_URL variable not being found, so I had to manually export it into my own shell using export CTI_MONGO_URL=... to run the test.

Real endpoint test using my personal Heroku dyno. I created a small script on my own copy of the "Test Worksheet" sheet:

After reviewing the tables in StudentAttendance, I gathered the few cti_ids and verified that they accurately printed True for the correct ids/dates. However, there wasn't too much data to test it on. Result:

Checklist

I've reviewed the contribution guide
I've confirmed the changes work on my machine
This pull request is ready for review

…erifying dataframe shapes.

sezavala · 2026-02-05T02:52:36Z

Also, wanted to note that in this PR, I resolved an issue with the write_to_gsheet function. Instead of connecting to the Google worksheet using the provided configuration key (Prod key or Dev key), the function always used the Prod key when connecting to a Google worksheet.

NickGuerrero · 2026-02-11T08:37:25Z

I'm expecting to make some changes to this PR, since I wanted to add some tests, and figured I could make several fixes. Note that I'll be making some comments to help track what needs to be modified, compared to the usual PR where I use them for feedback.

NickGuerrero · 2026-02-11T08:38:21Z

docs/.DS_Store

Note that .DS_Store should have been part of the .gitignore, I'll throw it into a commit.

NickGuerrero · 2026-02-11T08:40:42Z

tests/gsheet/test_gsheet.py

+        monkeypatch.setenv("ROSTER_SHEET_KEY", environ.get("TEST_SHEET_KEY"))
+        response = client.post("/api/gsheet/group-attendance",
+                               params={
+                                   "spreadsheet_id": environ.get("TEST_SHEET_KEY"),


Note that this defeats the point of the monkey patch; additionally env variables should be called from config.py (Not environ.get())

NickGuerrero · 2026-02-11T08:44:52Z

tests/gsheet/group_attendance/test_gsheet.py

+
+    @pytest.mark.integration
+    @pytest.mark.gsheet
+    def testDefaultLookbackDays(self, monkeypatch, client):


Apologies, for some reason I missed that the integration test was implemented here. I'll still be pushing some commits over to fix other minor issues.

NickGuerrero · 2026-02-11T08:48:31Z

src/gsheet/group_attendance/service.py

+        )
+    )
+
+    print(attendance_query)


Print statements shouldn't be here in production code. I'll also be removing these in commits.

NickGuerrero · 2026-02-11T08:52:37Z

src/gsheet/group_attendance/router.py

+    Copy database records of select students from a specified Google Sheet and record them onto the same Sheet given a date range
+    """
+    try:
+        if not start_date or not end_date:


Note that this means that if only 1 parameter is fulfilled, the argument will be overridden. Should not be a problem for our use case. but goes against the optional declaration, since they're dependent on each other. May create confusion down the line, may considering setting the defaults separately.

NickGuerrero · 2026-02-11T09:12:53Z

src/gsheet/group_attendance/service.py

+    date_grid = np.zeros((len(cti_ids), len(dates)), dtype=bool)
+    pandas_grid = pandas.DataFrame(date_grid, index=cti_data.index, columns=dates)
+
+    result_grid = pandas.concat([cti_data, pandas_grid], axis=1)


Issue with Implementation: The date columns should be generated by the sessions, not the actual dates. Consider a student that attends 2 sessions on the same date, but different times. They would meet the DW requirement, but appear wrong on this sheet. This needs to be resolved before the PR merges. If I'm understanding this wrong, correct me.

NickGuerrero

You did pretty good for a first major PR to this project. My comments are primarily criticisms because I'm looking for problems, but you overall stuck to the general requirements well. I submitted some fixes mentioned, but I would like to see some changes before approving the PR.

The columns should be based by session, not by dates. Since we can have multiple sessions in a day, that should be represented. I believe I used MM/DD - 1,2,3... as an example.
Move tests/gsheet/test_gsheet.py to its own folder, i.e. tests/gsheet/group_attendance/test_group_attendance.py. I didn't move it since I didn't want to disturb anything without ensuring the tests would still pass.
I also left the print statements in there, since I believe you'll use them again for development/testing. Make sure to remove them before re-submitting the PR.

…oup_attendance

…-sys into sergio/group_attendance

sezavala · 2026-02-17T04:04:21Z

Changed the function so the columns are only session dates (since it is on the test database, there isn't much). On days where there is more than one session, they are enumerated.

NickGuerrero · 2026-02-26T08:10:32Z

src/gsheet/group_attendance/service.py

+    """
+    Fetch attendance records and create an attendance matrix of select cti_ids and a date range,
+    given the associated Accelerate tables
+    @param eng: A SQLAlchemy Engine object that connects to the database


It's a bit odd to document only 1 of the parameters; since this is the main function, all parameters should be filled in, perhaps with some high-level details on execution (e.g. how the output matrix is structured, what happens if only 1 date is used, some of the defaults).

NickGuerrero · 2026-02-26T08:23:28Z

src/gsheet/group_attendance/service.py

+    session_cols: Dict[int, str] = {}
+    col_order: List[str] = []
+
+    for session_date, group in by_date.groupby("session_date", sort=True):


This feels like a somewhat complicated way to enumerate the dates, especially since the dates are already sorted (I believe really only need to see the previous date to determine whether or not to add the number if you were running sequentially). It does work with the data frame methods, so I don't have any requested changes.

On the other hand, I don't like the variable name "group", since it's not immediately clear what's in the group. "date_group" or "session_group" may be better. Not a big issue, but an issue with readability.

NickGuerrero · 2026-02-26T08:25:58Z

src/gsheet/group_attendance/service.py

+    result_grid.insert(
+        0,
+        "email",
+        [id_to_email.get(cid, "NOT FOUND") for cid in result_grid.index],


I like the "NOT FOUND" addition - it makes it very clear if something is wrong with the attendance processing (No primary email with a cti_id means no attendance).

NickGuerrero · 2026-02-26T08:27:25Z

src/gsheet/group_attendance/service.py

+    try:
+        column_index = headers.index("cti_id") + 1
+    except ValueError:
+        print("Column name not found")


print() is acceptable for now, but ideally a logger to console is used for more reliability.

NickGuerrero · 2026-02-26T08:31:40Z

src/gsheet/group_attendance/service.py

+                # Skip
+                continue
+
+    worksheet.clear()


I'm not a huge fan of clearing the worksheet once values are extracted, since the attendance matrix still needs to be created by this point. This could possibly result in the ids being cleared on the worksheet, something goes wrong in the session processing, and the original worksheet stays clear.

It means that we should probably keep id backups just in case. Additionally, id group data shouldn't live only on the spreadsheet (e.g. Accountability Groups can be generated from SA assignments, then this group attendance is run for an SA view. Opposed to keeping the id data in a spreadsheet).

NickGuerrero · 2026-02-28T01:49:55Z

After PR Review discussion:

worksheet.clear() is clearing data between tests unintentionally (endpoint logic & service logic are run without the write-back).
- Find a better place for the clear. It should happen as close to the write-back as possible.
  - Could use a select operation instead of a full clear
- Between tests, find a way to reset the group attendance sheet (Test Group), clean-up, maybe with a fixture

…ted tests

sezavala added 5 commits January 25, 2026 18:40

Merge branch 'main' of github.com:NickGuerrero/cti-sys

bec6bcc

group attendance prototype

d00646a

group attendance prototype

00e6185

Full implementation of gsheet/group-attendance endpoint with a test v…

6032c80

…erifying dataframe shapes.

Final implementation of group-attendance router with tests

ac522c2

Ignore .DS_Store files

1385c77

NickGuerrero reviewed Feb 11, 2026

View reviewed changes

docs/.DS_Store

Copy link
Copy Markdown

Owner

NickGuerrero Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that .DS_Store should have been part of the .gitignore, I'll throw it into a commit.

NickGuerrero reviewed Feb 11, 2026

View reviewed changes

Remove direct env calls, use the config.py settings

da924be

NickGuerrero reviewed Feb 11, 2026

View reviewed changes

NickGuerrero requested changes Feb 11, 2026

View reviewed changes

sezavala added 3 commits February 16, 2026 19:51

only returns session dates with duplicates enumerated

7d75760

Merge branch 'main' of github.com:NickGuerrero/cti-sys into sergio/gr…

8df567f

…oup_attendance

Merge branch 'sergio/group_attendance' of github.com:NickGuerrero/cti…

c9f0047

…-sys into sergio/group_attendance

NickGuerrero reviewed Feb 26, 2026

View reviewed changes

Added parameters to fetch function, updated worksheet clear, and upda…

5a42dc2

…ted tests

sezavala requested a review from NickGuerrero March 6, 2026 01:52

Conversation

sezavala commented Feb 1, 2026

Pull Request Name

Issues Fixed

Tests

Checklist

Uh oh!

sezavala commented Feb 5, 2026

Uh oh!

NickGuerrero commented Feb 11, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NickGuerrero left a comment

Choose a reason for hiding this comment

Uh oh!

sezavala commented Feb 17, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NickGuerrero commented Feb 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants