Skip to content

Cookie ids#146

Open
sandragjacinto wants to merge 3 commits intomainfrom
cookie-ids
Open

Cookie ids#146
sandragjacinto wants to merge 3 commits intomainfrom
cookie-ids

Conversation

@sandragjacinto
Copy link
Collaborator

Description

Why?

How?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My code is tested.
  • I have updated the documentation accordingly.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates session tracking from the X-Session-ID header to an x-session-id cookie, centralizing request parsing and adding utilities to resolve/create users & sessions based on that cookie across API endpoints and services.

Changes:

  • Introduce extract_session_cookie() / extract_origin_from_request() helpers and apply them across routers + middleware.
  • Add resolve_user_and_session() utility and a new /user_and_session endpoint that sets the session cookie.
  • Update data-collection plumbing, tests, and environment/configuration to support cookie-based session IDs and an ENV setting.

Reviewed changes

Copilot reviewed 21 out of 24 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/app/user/utils/utils.py Adds async helper to resolve/create user+session from a session UUID.
src/app/user/api/router.py Adds cookie-setting endpoint and refactors bookmark routes to use cookie-derived identity.
src/app/tutor/api/router.py Switches session ID sourcing from header to cookie.
src/app/search/api/router.py Switches session ID sourcing from header to cookie.
src/app/api/api_v1/endpoints/chat.py Switches session ID sourcing from header to cookie for data-collection.
src/app/api/api_v1/endpoints/metric.py Switches session ID sourcing from header to cookie for syllabus download tracking.
src/app/middleware/monitor_requests.py Switches request monitoring session ID from header to cookie.
src/app/shared/utils/requests.py New shared request helpers for cookie + origin extraction.
src/app/shared/domain/constants.py Defines cookie name and TTL constant for session cookie.
src/app/services/sql_db/queries_user.py Adjusts session lookup signature and adds new exception/logging behavior.
src/app/services/data_collection.py Changes session_id typing to UUID and removes string-to-UUID conversion.
src/app/api/api_v1/api.py Updates user router import wiring.
src/app/core/config.py Adds required ENV setting for environment detection.
src/app/tests/test_utils.py Adds unit tests for resolve_user_and_session.
src/app/tests/api/api_v1/test_user.py Updates user API tests to use cookies and new bookmark routes.
src/app/tests/api/api_v1/test_search.py Updates search API tests to send session via Cookie header.
pytest.ini Sets ENV=test for test runs.
k8s/welearn-api/values.dev.yaml Provides ENV config for dev deployment.
k8s/welearn-api/values.staging.yaml Provides ENV config for staging deployment.
k8s/welearn-api/values.prod.yaml Provides ENV config for production deployment.
.env.example Documents ENV for local development.
Comments suppressed due to low confidence (2)

src/app/user/api/router.py:135

  • The route path contains a literal ":document_id" segment, but document_id is not declared as a path parameter (FastAPI uses {document_id}), so it will be treated as a query param and the URL will literally include :document_id. Either change the route to use a proper path param (/bookmarks/{document_id}) or remove :document_id from the path and keep document_id as a query parameter.
    src/app/user/api/router.py:162
  • Same issue as above: the route path includes a literal :document_id segment. This is easy to confuse with a real path parameter and will likely break clients expecting /bookmarks/<uuid>. Consider switching to /bookmarks/{document_id} (path param) or /bookmarks with document_id only as a query parameter.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if user:
return user.id
else:
logger.error(f"User with id {user_id} not found,")
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error log message has a trailing comma ("not found,"). If this is meant to be a full sentence, remove the trailing comma to keep logs clean and consistent.

Suggested change
logger.error(f"User with id {user_id} not found,")
logger.error(f"User with id {user_id} not found")

Copilot uses AI. Check for mistakes.
Comment on lines 258 to 263
response = client.post(
f"{settings.API_V1_STR}/user/:user_id/bookmarks/:document_id",
params={"user_id": user_id, "document_id": document_id},
f"{settings.API_V1_STR}/user/bookmarks/:document_id",
params={"document_id": document_id},
headers={"X-API-Key": "test"},
cookies={"x-session-id": str(session_id)},
)
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test calls /user/bookmarks/:document_id (literal :document_id) and passes document_id via query params. If the intent is to have a path parameter, the route should be /user/bookmarks/{document_id} and the test should call /user/bookmarks/<uuid>. If the intent is a query parameter, remove :document_id from the URL to avoid confusion.

Copilot uses AI. Check for mistakes.
Comment on lines 84 to 92
with session_maker() as s:
session = s.execute(select(Session).where(Session.id == session_id)).first()
if session:
logger.info(
"Valid session. user_id=%s session_id=%s",
session[0].inferred_user_id,
session_id,
)
return session[0].inferred_user_id
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_user_from_session_id currently treats any session row as valid and does not check Session.end_at > now (unlike get_or_create_session_sync). This can make expired sessions resolve to a user indefinitely, especially problematic now that cookies can persist long-term. Consider filtering by end_at in the query (and/or deleting/invalidating expired sessions).

Copilot uses AI. Check for mistakes.


SESSION_COOKIE_NAME = "x-session-id"
SESSION_TTL_SECONDS = 60 * 60 * 24 * 400 # 400 days
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SESSION_TTL_SECONDS is set to ~400 days, but get_or_create_session_sync currently creates DB sessions with end_at = now + 24h. This mismatch will cause long-lived cookies to reference expired DB sessions (and can lead to frequent session re-creation / unexpected identity behavior). Consider aligning cookie TTL with DB session validity (or renaming this to a long-lived user-tracking cookie and adjusting the DB model accordingly).

Suggested change
SESSION_TTL_SECONDS = 60 * 60 * 24 * 400 # 400 days
SESSION_TTL_SECONDS = 60 * 60 * 24 # 1 day

Copilot uses AI. Check for mistakes.
else:
logger.warning(
f"No X-Session-ID header provided for {request.url.path}"
f"No X-Session-ID cookie provided for {request.url.path}"
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning message still refers to "X-Session-ID" even though the code now reads the session from a cookie (x-session-id). Consider updating the log message to reference the actual cookie name (ideally using SESSION_COOKIE_NAME) to avoid confusion during ops/debugging.

Suggested change
f"No X-Session-ID cookie provided for {request.url.path}"
f"No x-session-id cookie provided for {request.url.path}"

Copilot uses AI. Check for mistakes.
AZURE_API_VERSION: str

LLM_MODEL_NAME: str
ENV: str
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENV is now a required Settings field. Because settings = Settings() is instantiated at import time, missing ENV will prevent the app from starting (ValidationError). If you want backward compatibility / safer local runs, consider giving ENV a default (e.g., "development") and/or validating it as a constrained enum.

Suggested change
ENV: str
ENV: str = "development"

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +28
user_id = await run_in_threadpool(get_user_from_session_id, session_uuid)

if not user_id:
logger.info("No user found. Creating new user and session.")
user_id = await run_in_threadpool(get_or_create_user_sync, None, referer)
session_uuid = await run_in_threadpool(
get_or_create_session_sync, user_id, None, host, referer
)
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve_user_and_session creates a new user+session whenever get_user_from_session_id returns falsy. When used from endpoints like bookmarks, an invalid/unknown cookie will silently create a brand-new user and then read/write bookmarks on that new user, which is likely unintended. Consider splitting this into two flows (e.g., resolve_existing_user_from_session that 401s on missing/invalid session vs. an explicit "create user/session" endpoint).

Copilot uses AI. Check for mistakes.
Comment on lines +93 to +96
else:
HTTPException(
status_code=404, detail=f"Session with id {session_id} not found"
)
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_user_from_session_id instantiates an HTTPException in the else branch but never raises it, so the exception is effectively ignored and the function falls through returning None. Either raise the exception or remove it and just return None (and let callers decide how to handle missing sessions).

Suggested change
else:
HTTPException(
status_code=404, detail=f"Session with id {session_id} not found"
)

Copilot uses AI. Check for mistakes.
Comment on lines 21 to +35
def get_or_create_user_sync(
user_id: uuid.UUID | None = None, referer: str | None = None
) -> uuid.UUID:
with session_maker() as s:
if user_id:
user = s.execute(
select(InferredUser.id).where(InferredUser.id == user_id)
).first()
if user:
return user.id
else:
logger.error(f"User with id {user_id} not found,")
raise HTTPException(
status_code=404, detail=f"User with id {user_id} not found"
)
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_or_create_user_sync is a low-level DB helper but it now raises fastapi.HTTPException. This couples the SQL layer to the web layer and can also be mishandled by callers that catch broad exceptions (turning a 404 into a 500). Prefer raising a domain/ValueError here and translate to HTTPException in the API layer.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants