Course Recommendation sync #266
Merged
fast forward 225
fix: reduce agent cold-start time from ~30s to ~9s
The agent was hanging for roughly 30 seconds between the end of uv sync and the first line of output. A startup timing diagnostic (utils/startup_timer.py) revealed that virtually all of this time was spent
in Python's import phase, not in initialization.
Root cause
import dspy triggers dspy.clients, which unconditionally imports litellm. On every process start, litellm was fetching a remote model-pricing database from the internet, adding ~21 seconds of network I/O
before a single line of agent code ran. Setting the environment variable LITELLM_LOCAL_MODEL_COST_MAP=True before import dspy instructs litellm to use its bundled local cost map instead, dropping that import
from ~24s to ~2.6s.
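The ordering constraint can be sketched as follows. This is a minimal illustration rather than the actual contents of agent.py; the `import dspy` line is left commented out so the snippet runs without the package installed.

```python
import os

# This must run before the first `import dspy`. Per the description above,
# litellm decides which cost map to use while it is being imported, so
# setting the variable afterwards would not help the current process.
# setdefault() keeps any value an operator has already exported.
os.environ.setdefault("LITELLM_LOCAL_MODEL_COST_MAP", "True")

# import dspy  # noqa: E402  (deliberately placed after the environment setup)

print(os.environ["LITELLM_LOCAL_MODEL_COST_MAP"])
```

Using `setdefault` instead of plain assignment means a developer who wants the remote cost map can still opt in by exporting the variable with a different value.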
Secondary bottlenecks
Two smaller but measurable import-time costs were also deferred. First, keyword_retriever.py was calling nltk.data.find() and importing nltk.corpus.stopwords and nltk.tokenize.word_tokenize at module load
time, costing ~2.9s on every startup even before any retrieval happened. These imports are now deferred to KeywordRetriever.__init__(), so they are paid once when the agent is constructed rather than when
the module is first imported. The loaded references are stored as instance attributes so query calls pay no additional import cost. Second, major_requirements.py and course_schedule.py both imported
use_phoenix from chatdku.setup at module level solely for use in their if __name__ == "__main__" smoke-test blocks. This pulled in llama_index.core and llama_index.embeddings at import time unnecessarily.
The import was moved inside the __main__ guard where it belongs.
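The deferred-import pattern described above might look roughly like this. `DeferredImportRetriever` is a hypothetical stand-in, not the project's KeywordRetriever, and the standard-library `shlex` module plays the role of NLTK so the sketch runs anywhere.

```python
import importlib


class DeferredImportRetriever:
    """Hypothetical stand-in for KeywordRetriever (illustration only)."""

    def __init__(self, tokenizer_module: str = "shlex") -> None:
        # The import cost is paid here, once, when the object is constructed,
        # rather than when the enclosing module is first imported.
        module = importlib.import_module(tokenizer_module)
        # Keep references on the instance so query() never re-imports.
        self._tokenize = module.split
        self._stopwords = frozenset({"a", "an", "of", "the"})

    def query(self, text: str) -> list[str]:
        # Tokenize with the stored callable and drop stopwords.
        tokens = self._tokenize(text.lower())
        return [t for t in tokens if t not in self._stopwords]


retriever = DeferredImportRetriever()
print(retriever.query("The history of a course"))  # ['history', 'course']
```

Importing the module that defines the class stays cheap; only code paths that actually build a retriever pay for the heavy dependency.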
Changes
chatdku/core/agent.py — added os.environ.setdefault("LITELLM_LOCAL_MODEL_COST_MAP", "True") immediately before import dspy with a comment explaining why order matters.
chatdku/core/tools/retriever/keyword_retriever.py — removed top-level import nltk, from nltk.corpus import stopwords, and from nltk.tokenize import word_tokenize; moved NLTK resource checks and corpus
imports into KeywordRetriever.__init__(), storing stopwords and word_tokenize as instance attributes for zero-cost reuse in query().
chatdku/core/tools/major_requirements.py and chatdku/core/tools/course_schedule.py — removed from chatdku.setup import use_phoenix from module scope; added it inside the if __name__ == "__main__" block where
it is actually needed.
utils/startup_timer.py — added as a diagnostic utility that times each import and initialization step individually so future regressions can be caught and attributed precisely.
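A per-import timer of the kind described for utils/startup_timer.py could be sketched as below. The function names `time_import` and `report` are assumptions made for illustration; the real utility may be structured differently.

```python
import importlib
import time


def time_import(module_name: str) -> float:
    """Return the wall-clock seconds spent importing `module_name`."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


def report(module_names: list[str]) -> dict[str, float]:
    """Time each import and print the results, slowest first."""
    timings = {name: time_import(name) for name in module_names}
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<20} {seconds:8.4f}s")
    return timings


# Standard-library modules are used here so the sketch runs anywhere.
timings = report(["json", "decimal", "email.parser"])
```

Modules already present in `sys.modules` report close to zero, so measurements of this kind are only meaningful in a fresh interpreter. CPython's built-in `python -X importtime` flag gives a similar per-module breakdown without any custom code.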
## CourseRecommender tool (chatdku/core/tools/course_recommender.py) [NEW]

Adds a deterministic Python tool that replaces 20+ individual executor iterations with a single structured call. Given a student's major and completed courses, it:

- Fuzzy-matches the major requirements Markdown file (reuses _best_match)
- Parses course codes from both major and common-core requirement files
- Computes remaining required courses by diffing against completed set
- Batch-checks schedule availability against cleaned_classdata.csv
- Checks prerequisite satisfaction for each offered course using the DKUHub prereq CSV, with anti-requisite stripping and OR/AND logic
- Returns a grouped Markdown report: recommended, eligible-but-not-offered, prerequisites-not-met, and no-schedule-data sections

Key fixes during development:

- Schedule CSV uses Mon/Tues/Wed/Thurs/Fri + Mtg Start/Mtg End columns (not Days/Start Time), discovered by inspecting the real server CSV
- GLOCHALL added to _KNOWN_SUBJECTS (missing subject code)
- Anti-requisite course codes were being treated as prerequisites; fixed by splitting prereq text at "Anti-requisite" before extracting codes
- Integer Catalog column breaks .str accessor; fixed with .astype(str)

## major_requirements.py: fuzzy match threshold

Adds _MIN_MATCH_SCORE = 40 to _best_match so that queries with no meaningful overlap (e.g. "astrology") return None instead of a spurious low-confidence match.

## Executor: dynamic agenda accumulation (executor.py)

The Executor was strictly plan-following; it now supports on-the-fly agenda extension as tool results reveal new requirements:

- Renames plan → current_agenda in AssessSignature and _ActSignatureBase
- Adds agenda_extensions output field to AssessSignature so the assessor can report newly discovered investigation areas (e.g. a policy document names a mandatory course not in the original plan)
- forward() accumulates extensions into current_agenda each iteration; all subsequent Assess and Act steps see the full extended agenda
- Distill step receives the final extended agenda, not the original plan
- get_token_limits() accepts both current_agenda and legacy plan= kwarg for backwards compatibility with the agent.py call site
- Token ratios updated: current_agenda gets 3/16 in act (was 2/15 for plan) to reflect that the agenda can grow

## Planner: policy-first course planning (plan.py)

The Planner previously instructed the Executor to call individual requirements/schedule/prereq tools. Updated to:

- Require policy retrieval FIRST (VectorRetriever or KeywordRetriever) to surface year-specific mandatory courses before calling CourseRecommender
- Instruct CourseRecommender as the single baseline tool for eligibility
- Add a full schedule planning demo (Class of 2027 Data Science student) showing the policy-first → CourseRecommender → agenda extension flow

## agent.py: tool registration and config fixes

- Registers CourseRecommenderOuter in build_agent() with paths from config
- Raises default max_iterations from 3 → 5 (policy retrieval + CourseRecommender + potential agenda extensions need the headroom)
- Fixes get_token_limits() call to pass current_agenda="" (not plan="") to match the renamed Executor field

## config.py: None-safe DB password

quote_plus(db_password) raises TypeError when DB_PASSWORD env var is unset. Fixed to quote_plus(db_password or '').

## devsync.sh: shared secrets verification

Removed the ~/.env symlink logic, which was misleading: credentials are injected via /etc/profile.d/chatdku.sh for chatdku_devs group members, not through a .env file. Replaced with a REDIS_HOST presence check that points developers to add_user.sh if secrets are missing.

## pyproject.toml

Adds thefuzz>=0.22.1 (used by major_requirements.py for fuzzy matching).
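The config.py fix noted above can be reproduced in isolation. The wrapper name `encode_password` is hypothetical; only `quote_plus(db_password or '')` comes from the change description.

```python
from typing import Optional
from urllib.parse import quote_plus


def encode_password(db_password: Optional[str]) -> str:
    # quote_plus(None) raises TypeError, so an unset DB_PASSWORD is mapped
    # to an empty string before URL-encoding.
    return quote_plus(db_password or "")


print(repr(encode_password(None)))        # ''
print(repr(encode_password("p@ss w/d")))  # 'p%40ss+w%2Fd'
```

An empty password will still fail to authenticate against a real database, but the failure then surfaces at connection time with a clearer error rather than as a TypeError during configuration loading.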
## Tests (90 tests, all passing locally and on server)

- tests/test_course_recommender.py [NEW]: 28 tests covering parse_course_codes (9), prerequisites_met (7), full recommendation pipeline scenarios TC1-TC7 (7), and infrastructure/span tests (5)
- tests/test_agent_configuration.py [NEW]: 11 structural tests verifying Planner instructions require policy retrieval before CourseRecommender, AssessSignature has agenda_extensions output, current_agenda field naming is correct, and max_iterations default >= 5
- tests/conftest.py: adds sample_classdata_real_csv fixture with actual server column layout; patches course_recommender.span_ctx_start
- tests/test_major_requirements.py: removes imports of _jaccard and _tokenize which were replaced by thefuzz
- black reformatted conftest.py, test_agent_configuration.py, course_recommender.py, test_course_recommender.py
- removed unused _run_recommendation import from test_course_recommender.py
- removed assigned-but-never-used ethldr_not_eligible variable in TC7
- E402 warnings in agent.py are pre-existing (intentional os.environ setdefault before dspy/litellm imports) and were not touched
      ):
          """Runs without error when config has no tracer (uses nullcontext)."""
    -     import chatdku.core.tools.syllabi_tool.query_curriculum_db as mod
    +     import chatdku.core.tools.syllabi.syllabi_tool as mod
Updated Flake8 configuration to ignore E402 error.
            prereq_df = _load_prereq_df(prereq_csv_path)
            prereq_available = True
        except Exception:
            prereq_df = None
            return
        _ensure_nltk_resource("corpora/stopwords", "stopwords")
        _ensure_nltk_resource("tokenizers/punkt_tab", "punkt_tab")
        _nltk_ready = True
Summary
Adds an agentic course recommendation pipeline (Issue #225) that replaces 20+ executor iterations with
a single structured, policy-aware tool call.
New: CourseRecommender tool (chatdku/core/tools/course_recommender.py)
Given a major and completed courses, it:
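- Parses course codes from both the major and common-core requirements
- Checks prerequisite satisfaction for each offered course (anti-requisite stripping, OR/AND logic)
- Returns a grouped report: recommended, eligible-but-not-offered, prerequisites-not-met, and no-schedule-data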
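The anti-requisite stripping and OR/AND prerequisite logic mentioned in the commit notes could be sketched as below. This is not the project's implementation: the function name `prereqs_satisfied`, the course-code regex, and the assumption that AND separates groups of OR alternatives are all illustrative guesses about the prerequisite text format.

```python
import re

# Assumed course-code shape: an uppercase subject followed by a 3-digit number.
_CODE_RE = re.compile(r"\b([A-Z]{2,8})\s?(\d{3}[A-Z]?)\b")


def prereqs_satisfied(prereq_text: str, completed: set[str]) -> bool:
    # Keep only the text before "Anti-requisite" so that anti-requisite codes
    # are never mistaken for prerequisites.
    prereq_part = re.split(
        r"anti-?requisite", prereq_text, maxsplit=1, flags=re.IGNORECASE
    )[0]
    # Every AND-separated group must be satisfied.
    for group in re.split(r"\bAND\b", prereq_part, flags=re.IGNORECASE):
        codes = {f"{subj} {num}" for subj, num in _CODE_RE.findall(group)}
        # A group is satisfied when any one of its OR alternatives is completed.
        if codes and not (codes & completed):
            return False
    return True


text = "Prerequisite: MATH 101 or MATH 105, and COMPSCI 201. Anti-requisite: STATS 102"
print(prereqs_satisfied(text, {"MATH 105", "COMPSCI 201"}))  # True
print(prereqs_satisfied(text, {"MATH 101"}))                 # False
```

Splitting before extracting codes is what keeps `STATS 102` out of the prerequisite set in this example; extracting first and filtering afterwards would need a second pass to work out which codes came from which clause.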
Executor: dynamic agenda accumulation (executor.py)
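- The assessor can report newly discovered investigation areas mid-trajectory (e.g. a policy doc names a mandatory course not in the original plan)
- get_token_limits() still accepts the legacy plan= kwarg for back-compat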
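The accumulation step can be pictured with a small stand-alone loop. This is a sketch of the idea only: `accumulate_agenda` and the dictionary shape of each assess result are hypothetical, and de-duplicating repeated extensions is an assumption that the PR text does not confirm.

```python
def accumulate_agenda(plan: list[str], assess_results: list[dict]) -> list[str]:
    """Fold each iteration's agenda_extensions into the running agenda."""
    current_agenda = list(plan)  # start from the Planner's original plan
    for result in assess_results:
        for item in result.get("agenda_extensions", []):
            # Skipping duplicates is an assumption of this sketch.
            if item not in current_agenda:
                current_agenda.append(item)
    return current_agenda


plan = ["Retrieve Class of 2027 policy", "Run CourseRecommender"]
assess_results = [
    {"agenda_extensions": ["Check mandatory course named in policy"]},
    {"agenda_extensions": []},
    {"agenda_extensions": ["Check mandatory course named in policy"]},
]
final_agenda = accumulate_agenda(plan, assess_results)
print(final_agenda)
```

Because the agenda only grows, later Assess and Act steps see every item discovered so far, which matches the description that the Distill step receives the final extended agenda and not the original plan.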
Planner: policy-first course planning (plan.py)
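- Policy retrieval runs before CourseRecommender so that year-specific mandatory courses surface first
- Adds a full schedule-planning demo showing the policy-first → CourseRecommender → agenda-extension flow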
Conversation memory
Agent / config / infra
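- devsync.sh: a REDIS_HOST presence check points developers with missing secrets to add_user.sh
- major_requirements.py: queries with no meaningful overlap return None instead of a spurious match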
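The effect of a minimum-score threshold can be shown with a short sketch. The project uses thefuzz for scoring; the standard-library `difflib.SequenceMatcher` is swapped in here so the example has no third-party dependency, which means the scores will differ from thefuzz's. The names `best_match` and `MIN_MATCH_SCORE` mirror, but are not, the project's `_best_match` and `_MIN_MATCH_SCORE`.

```python
from difflib import SequenceMatcher
from typing import Optional

MIN_MATCH_SCORE = 40  # same cutoff value as the PR; the scoring function differs


def best_match(query: str, candidates: list[str]) -> Optional[str]:
    """Return the closest candidate, or None if nothing scores high enough."""

    def score(candidate: str) -> float:
        # Similarity on a 0-100 scale (difflib stand-in for thefuzz).
        return 100 * SequenceMatcher(None, query.lower(), candidate.lower()).ratio()

    if not candidates:
        return None
    best = max(candidates, key=score)
    return best if score(best) >= MIN_MATCH_SCORE else None


majors = ["Data Science", "Computer Science"]
print(best_match("data science", majors))  # Data Science
print(best_match("astrology", majors))     # None
```

Without the cutoff, `max` would always return some candidate, which is how an unrelated query could previously produce a low-confidence match.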