Course Recommendation sync #266

Merged: Ar-temis merged 24 commits into main from 225-course-recs on Apr 21, 2026
Ar-temis commented Apr 21, 2026

Summary

Adds an agentic course recommendation pipeline (Issue #225) that replaces 20+ executor iterations with
a single structured, policy-aware tool call.

New: CourseRecommender tool (chatdku/core/tools/course_recommender.py)

Given a major and completed courses, it:

  • Fuzzy-matches the major requirements file and parses required course codes from major + common-core
    requirements
  • Diffs against completed courses to get remaining requirements
  • Batch-checks schedule availability against cleaned_classdata.csv
  • Evaluates prerequisite satisfaction per offered course (with anti-requisite stripping and OR/AND
    logic)
  • Returns a grouped Markdown report: recommended / eligible-but-not-offered / prereqs-not-met /
    no-schedule-data

Executor: dynamic agenda accumulation (executor.py)

  • Renames plan → current_agenda in AssessSignature / _ActSignatureBase
  • Adds agenda_extensions output so the assessor can surface newly discovered investigation areas
    mid-trajectory (e.g. a policy doc names a mandatory course not in the original plan)
  • Extensions accumulate into current_agenda each iteration and flow through to Act and Distill
  • Fuses assessment into the executor call and removes the separate assessment step
  • Token ratios updated for the growable agenda; get_token_limits() keeps legacy plan= kwarg for
    back-compat

Planner: policy-first course planning (plan.py)

  • Requires policy retrieval (VectorRetriever/KeywordRetriever) before CourseRecommender so
    year-specific mandatory courses surface first
  • Directs the Executor to use CourseRecommender as the single baseline eligibility tool
  • Adds a full Class-of-2027 Data Science demo showing the policy-first → recommender →
    agenda-extension flow

Conversation memory

  • Keeps only the last 3 exchanges in history; earlier exchanges are summarized
  • Summary is now placed before history (chronological order)
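
A minimal sketch of the memory layout described above, assuming exchanges are stored as (question, answer) pairs. The `summarize` callable is a hypothetical placeholder for whatever summarizer the agent actually uses, and `build_context` is an illustrative name, not a function from this PR.

```python
def build_context(exchanges, summarize, keep_last=3):
    """Return prompt context: summary of older exchanges first, then recent ones."""
    split = max(len(exchanges) - keep_last, 0)
    older, recent = exchanges[:split], exchanges[split:]
    parts = []
    if older:
        # The summary comes first so the context reads in chronological order.
        parts.append("Summary of earlier conversation: " + summarize(older))
    parts.extend(f"User: {q}\nAssistant: {a}" for q, a in recent)
    return "\n\n".join(parts)


def join_questions(older):
    # Stand-in summarizer for the demo: just lists the older questions.
    return "; ".join(q for q, _ in older)


history = [(f"q{i}", f"a{i}") for i in range(1, 6)]
context = build_context(history, join_questions)
print(context)
```

With five exchanges, the first two are folded into the summary and the last three appear verbatim after it.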

Agent / config / infra

  • Registers CourseRecommenderOuter with paths from config
  • Raises default max_iterations 3 → 5 to fit policy retrieval + recommender + extensions
  • config.py: None-safe quote_plus(db_password or '')
  • devsync.sh: replaces the misleading ~/.env symlink with a REDIS_HOST presence check pointing to
    add_user.sh
  • Renames syllabi_tool/ → syllabi/; removes unnecessary outer tool wrappers
  • Disables thinking in model configs
  • Adds TUI with ASCII logo on startup
  • major_requirements._best_match: adds _MIN_MATCH_SCORE = 40 so nonsense queries (e.g. "astrology")
    return None instead of a spurious match

Ar-temis and others added 21 commits April 15, 2026 14:58
fix: reduce agent cold-start time from ~30s to ~9s
  The agent was hanging for roughly 30 seconds between the end of uv sync and the first line of output. A startup timing diagnostic (utils/startup_timer.py) revealed that virtually all of this time was spent
  in Python's import phase, not in initialization.

  Root cause

  import dspy triggers dspy.clients, which unconditionally imports litellm. On every process start, litellm was fetching a remote model-pricing database from the internet, adding roughly 21 seconds of network I/O
  before a single line of agent code ran. Setting the environment variable LITELLM_LOCAL_MODEL_COST_MAP=True before import dspy instructs litellm to use its bundled local cost map instead, dropping that import
  from ~24s to ~2.6s.
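
  The ordering constraint can be sketched as follows. The dspy import is left commented out so the snippet runs without the dependency installed.

```python
import os

# Must run before the first `import dspy`, because the variable has to be
# set by the time litellm is imported.
# setdefault keeps any value the operator has already exported.
os.environ.setdefault("LITELLM_LOCAL_MODEL_COST_MAP", "True")

# import dspy  # noqa: E402  (deliberately placed after the environment setup)

print(os.environ["LITELLM_LOCAL_MODEL_COST_MAP"])
```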

  Secondary bottlenecks

  Two smaller but measurable import-time costs were also deferred. First, keyword_retriever.py was calling nltk.data.find() and importing nltk.corpus.stopwords and nltk.tokenize.word_tokenize at module load
  time, costing ~2.9s on every startup even before any retrieval happened. These imports are now deferred to KeywordRetriever.__init__(), so they are paid once when the agent is constructed rather than when
  the module is first imported. The loaded references are stored as instance attributes so query calls pay no additional import cost. Second, major_requirements.py and course_schedule.py both imported
  use_phoenix from chatdku.setup at module level solely for use in their if __name__ == "__main__" smoke-test blocks. This pulled in llama_index.core and llama_index.embeddings at import time unnecessarily.
  The import was moved inside the __main__ guard where it belongs.
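
  A minimal sketch of the deferred-import pattern, assuming a simplified retriever. `DeferredImportRetriever` and `query_terms` are hypothetical names, and `re` plus a tiny stopword set stand in for the heavier NLTK imports.

```python
class DeferredImportRetriever:
    """Hypothetical stand-in for KeywordRetriever (names are illustrative)."""

    def __init__(self) -> None:
        # Deferred import: executed when the object is built, not when this
        # module is imported. `re` stands in for the NLTK imports.
        import re

        # Store references as instance attributes so queries do no import work.
        self._tokenize = re.compile(r"[a-z0-9]+").findall
        self._stopwords = frozenset({"the", "a", "of", "for"})

    def query_terms(self, text: str) -> list[str]:
        # Uses the stored references; no import statement runs here.
        return [t for t in self._tokenize(text.lower()) if t not in self._stopwords]


retriever = DeferredImportRetriever()
print(retriever.query_terms("Prerequisites for the Data Science major"))
```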

  Changes

  chatdku/core/agent.py — added os.environ.setdefault("LITELLM_LOCAL_MODEL_COST_MAP", "True") immediately before import dspy with a comment explaining why order matters.

  chatdku/core/tools/retriever/keyword_retriever.py — removed top-level import nltk, from nltk.corpus import stopwords, and from nltk.tokenize import word_tokenize; moved NLTK resource checks and corpus
  imports into KeywordRetriever.__init__(), storing stopwords and word_tokenize as instance attributes for zero-cost reuse in query().

  chatdku/core/tools/major_requirements.py and chatdku/core/tools/course_schedule.py — removed from chatdku.setup import use_phoenix from module scope; added it inside the if __name__ == "__main__" block where
   it is actually needed.

  utils/startup_timer.py — added as a diagnostic utility that times each import and initialization step individually so future regressions can be caught and attributed precisely.
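
  The idea behind a per-import timing diagnostic can be sketched as below. This is not the contents of utils/startup_timer.py; `time_imports` is a hypothetical helper, and the stdlib module names are placeholders for the agent's real dependencies.

```python
import importlib
import time


def time_imports(module_names):
    """Import each module in order and return (name, seconds) pairs."""
    timings = []
    for name in module_names:
        start = time.perf_counter()
        # Modules already in sys.modules return almost instantly, so the
        # measurement is only meaningful in a fresh interpreter.
        importlib.import_module(name)
        timings.append((name, time.perf_counter() - start))
    return timings


for name, seconds in time_imports(["json", "decimal", "csv"]):
    print(f"{name:<10} {seconds * 1000:8.2f} ms")
```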

## CourseRecommender tool (chatdku/core/tools/course_recommender.py) [NEW]
Adds a deterministic Python tool that replaces 20+ individual executor
iterations with a single structured call. Given a student's major and
completed courses, it:
- Fuzzy-matches the major requirements Markdown file (reuses _best_match)
- Parses course codes from both major and common-core requirement files
- Computes remaining required courses by diffing against completed set
- Batch-checks schedule availability against cleaned_classdata.csv
- Checks prerequisite satisfaction for each offered course using the
  DKUHub prereq CSV, with anti-requisite stripping and OR/AND logic
- Returns a grouped Markdown report: recommended, eligible-but-not-offered,
  prerequisites-not-met, and no-schedule-data sections
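
The parse-and-diff steps above might look roughly like this. It is a sketch under stated assumptions: the regex, the normalization to "SUBJECT NNN", the `remaining_courses` helper, and the sample course codes are all illustrative, and `_KNOWN_SUBJECTS` here is a small made-up subset rather than the real list.

```python
import re

# Illustrative subset; the real _KNOWN_SUBJECTS list is longer.
_KNOWN_SUBJECTS = {"COMPSCI", "STATS", "MATH", "GLOCHALL"}
# Assumed code shape: subject letters, optional space, three digits.
_COURSE_CODE = re.compile(r"\b([A-Z]{2,10})\s?(\d{3})\b")


def parse_course_codes(text):
    """Return normalized codes in first-seen order, without duplicates."""
    codes = {}
    for subject, number in _COURSE_CODE.findall(text):
        if subject in _KNOWN_SUBJECTS:  # unknown subjects are skipped
            codes.setdefault(f"{subject} {number}", None)
    return list(codes)


def remaining_courses(required, completed):
    """Keep required courses that are not in the completed set."""
    done = set(completed)
    return [code for code in required if code not in done]


requirements_text = (
    "Take COMPSCI 201, STATS 101 and MATH 105. "
    "GLOCHALL 201 is common core; STATS101 repeats."
)
required = parse_course_codes(requirements_text)
print(remaining_courses(required, ["STATS 101", "MATH 105"]))
```

Filtering on a known-subject set is why a missing entry silently drops every course under that subject.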

Key fixes during development:
- Schedule CSV uses Mon/Tues/Wed/Thurs/Fri + Mtg Start/Mtg End columns
  (not Days/Start Time), discovered by inspecting the real server CSV
- GLOCHALL added to _KNOWN_SUBJECTS (missing subject code)
- Anti-requisite course codes were being treated as prerequisites; fixed
  by splitting prereq text at "Anti-requisite" before extracting codes
- Integer Catalog column breaks .str accessor; fixed with .astype(str)
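
The anti-requisite fix and the OR/AND evaluation can be sketched as follows. The prerequisite text format is an assumption (AND-separated clauses, each holding OR alternatives), the sample course codes are invented, and this `prerequisites_met` is a simplified illustration rather than the tool's implementation.

```python
import re

_CODE = re.compile(r"\b[A-Z]{2,10} \d{3}\b")


def strip_anti_requisites(prereq_text):
    """Keep only the text before the first "Anti-requisite" marker."""
    return re.split(r"anti-?requisite", prereq_text, maxsplit=1, flags=re.IGNORECASE)[0]


def prerequisites_met(prereq_text, completed):
    text = strip_anti_requisites(prereq_text)
    # Assumed grammar: AND-separated clauses, each a set of OR alternatives.
    for clause in re.split(r"\band\b", text, flags=re.IGNORECASE):
        alternatives = _CODE.findall(clause)
        if alternatives and not any(code in completed for code in alternatives):
            return False
    return True


rule = "Prerequisite: MATH 105 and (STATS 101 or STATS 102). Anti-requisite: MATH 205"
print(prerequisites_met(rule, {"MATH 105", "STATS 102"}))
print(prerequisites_met(rule, {"MATH 105"}))
```

Without the split, `MATH 205` would be extracted as one more alternative in the last clause, which is the bug described above.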

## major_requirements.py — fuzzy match threshold
Adds _MIN_MATCH_SCORE = 40 to _best_match so that queries with no
meaningful overlap (e.g. "astrology") return None instead of a
spurious low-confidence match.
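
A sketch of the thresholding idea. To stay dependency-free it swaps in the standard library's `difflib.SequenceMatcher` for thefuzz, so the scores are not comparable to the ones `_best_match` sees; `best_match` and the list of majors are illustrative.

```python
from difflib import SequenceMatcher
from typing import Optional

_MIN_MATCH_SCORE = 40  # threshold from this PR, on a 0-100 scale


def best_match(query: str, candidates: list[str]) -> Optional[str]:
    """Return the highest-scoring candidate, or None below the threshold."""
    best_name, best_score = None, 0.0
    for name in candidates:
        # difflib stand-in for the fuzzy scorer; ratio() is in [0, 1].
        score = 100 * SequenceMatcher(None, query.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= _MIN_MATCH_SCORE else None


majors = ["data science", "computer science", "applied mathematics"]
print(best_match("data sci", majors))
print(best_match("astrology", majors))
```

Returning None lets the caller report "no matching major" instead of silently using an unrelated requirements file.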

## Executor: dynamic agenda accumulation (executor.py)
The Executor was strictly plan-following; it now supports on-the-fly
agenda extension as tool results reveal new requirements:
- Renames plan → current_agenda in AssessSignature and _ActSignatureBase
- Adds agenda_extensions output field to AssessSignature so the assessor
  can report newly discovered investigation areas (e.g. a policy document
  names a mandatory course not in the original plan)
- forward() accumulates extensions into current_agenda each iteration;
  all subsequent Assess and Act steps see the full extended agenda
- Distill step receives the final extended agenda, not the original plan
- get_token_limits() accepts both current_agenda and legacy plan= kwarg
  for backwards compatibility with the agent.py call site
- Token ratios updated: current_agenda gets 3/16 in act (was 2/15 for plan)
  to reflect that the agenda can grow
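
The accumulation step might be sketched like this. The agenda is modeled as a list of strings for clarity, which is an assumption about its representation, and the de-duplication is an illustrative choice rather than confirmed behavior. `extend_agenda` is a hypothetical helper, and the canned extension lists stand in for assessor output.

```python
def extend_agenda(current_agenda, agenda_extensions):
    """Append newly discovered items, skipping blanks and repeats."""
    seen = {item.strip().lower() for item in current_agenda}
    extended = list(current_agenda)
    for item in agenda_extensions:
        key = item.strip().lower()
        if key and key not in seen:
            seen.add(key)
            extended.append(item.strip())
    return extended


agenda = ["Retrieve Class of 2027 policy", "Run CourseRecommender"]
# Canned stand-ins for the assessor's agenda_extensions on two iterations.
per_iteration_extensions = [
    ["Check the mandatory course named in the policy document"],
    ["check the mandatory course named in the policy document", ""],
]
for extensions in per_iteration_extensions:
    # Each iteration sees the agenda as extended by all earlier iterations.
    agenda = extend_agenda(agenda, extensions)
print(agenda)
```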

## Planner: policy-first course planning (plan.py)
The Planner previously instructed the Executor to call individual
requirements/schedule/prereq tools. Updated to:
- Require policy retrieval FIRST (VectorRetriever or KeywordRetriever)
  to surface year-specific mandatory courses before calling CourseRecommender
- Instruct the Executor to use CourseRecommender as the single baseline tool for eligibility
- Add a full schedule planning demo (Class of 2027 Data Science student)
  showing the policy-first → CourseRecommender → agenda extension flow

## agent.py — tool registration and config fixes
- Registers CourseRecommenderOuter in build_agent() with paths from config
- Raises default max_iterations from 3 → 5 (policy retrieval +
  CourseRecommender + potential agenda extensions need the headroom)
- Fixes get_token_limits() call to pass current_agenda="" (not plan="")
  to match the renamed Executor field
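
One way to keep a renamed keyword backwards compatible is sketched below. The real signature and return value of get_token_limits() are not shown in this PR, so `get_token_limits_sketch`, its `total_tokens` parameter, and the returned dict are hypothetical; only the 3/16 ratio comes from the description above.

```python
from typing import Optional

ACT_AGENDA_RATIO = (3, 16)  # share given to current_agenda in act, per this PR


def get_token_limits_sketch(
    total_tokens: int,
    current_agenda: Optional[str] = None,
    *,
    plan: Optional[str] = None,
) -> dict:
    """Hypothetical signature: accept the new keyword and the legacy one."""
    if current_agenda is not None and plan is not None:
        raise TypeError("pass current_agenda or the legacy plan=, not both")
    # Old call sites that still pass plan="" land on the same code path.
    agenda = current_agenda if current_agenda is not None else (plan or "")
    numerator, denominator = ACT_AGENDA_RATIO
    return {
        "current_agenda": agenda,
        "current_agenda_limit": total_tokens * numerator // denominator,
    }


print(get_token_limits_sketch(16000, current_agenda=""))
print(get_token_limits_sketch(16000, plan=""))
```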

## config.py — None-safe DB password
quote_plus(db_password) raises TypeError when DB_PASSWORD env var is
unset. Fixed to quote_plus(db_password or '').
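
The failure and the fix can be reproduced with the standard library alone. `encode_password` is an illustrative wrapper, not a function from config.py, and the sample password is made up.

```python
from typing import Optional
from urllib.parse import quote_plus


def encode_password(db_password: Optional[str]) -> str:
    # `or ""` turns None (unset env var) into an empty string before encoding.
    return quote_plus(db_password or "")


try:
    quote_plus(None)  # what the old code did when DB_PASSWORD was unset
except TypeError as exc:
    print(f"TypeError: {exc}")

print(repr(encode_password(None)))
print(encode_password("p@ss word/1"))
```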

## devsync.sh — shared secrets verification
Removed the ~/.env symlink logic, which was misleading: credentials are
injected via /etc/profile.d/chatdku.sh for chatdku_devs group members,
not through a .env file. Replaced with a REDIS_HOST presence check that
points developers to add_user.sh if secrets are missing.

## pyproject.toml
Adds thefuzz>=0.22.1 (used by major_requirements.py for fuzzy matching).

## Tests (90 tests, all passing locally and on server)
- tests/test_course_recommender.py [NEW]: 28 tests covering
  parse_course_codes (9), prerequisites_met (7), full recommendation
  pipeline scenarios TC1-TC7 (7), and infrastructure/span tests (5)
- tests/test_agent_configuration.py [NEW]: 11 structural tests verifying
  Planner instructions require policy retrieval before CourseRecommender,
  AssessSignature has agenda_extensions output, current_agenda field
  naming is correct, and max_iterations default >= 5
- tests/conftest.py: adds sample_classdata_real_csv fixture with actual
  server column layout; patches course_recommender.span_ctx_start
- tests/test_major_requirements.py: removes imports of _jaccard and
  _tokenize which were replaced by thefuzz
- black reformatted conftest.py, test_agent_configuration.py,
  course_recommender.py, test_course_recommender.py
- removed unused _run_recommendation import from test_course_recommender.py
- removed assigned-but-never-used ethldr_not_eligible variable in TC7
- E402 warnings in agent.py are pre-existing (intentional os.environ
  setdefault before dspy/litellm imports) — not touched
Ar-temis requested a review from pomegranar April 21, 2026 06:41
Ar-temis added the agent-pipeline and feature-add labels Apr 21, 2026
Comment thread chatdku/core/tools/course_recommender.py Fixed
Comment thread chatdku/core/tools/retriever/keyword_retriever.py Fixed
Comment thread tests/test_llama_index_tools.py Fixed
Comment thread tests/test_sql_agent.py
    ):
        """Runs without error when config has no tracer (uses nullcontext)."""
-       import chatdku.core.tools.syllabi_tool.query_curriculum_db as mod
+       import chatdku.core.tools.syllabi.syllabi_tool as mod

        prereq_df = _load_prereq_df(prereq_csv_path)
        prereq_available = True
    except Exception:
        prereq_df = None

        return
    _ensure_nltk_resource("corpora/stopwords", "stopwords")
    _ensure_nltk_resource("tokenizers/punkt_tab", "punkt_tab")
    _nltk_ready = True
Ar-temis merged commit 409a6aa into main Apr 21, 2026 (4 checks passed)
pomegranar deleted the 225-course-recs branch May 1, 2026 08:10