Course Recommendation sync by Ar-temis · Pull Request #266 · Edge-Intelligence-Lab/ChatDKU

Ar-temis · 2026-04-21T06:41:38Z

Summary

Adds an agentic course recommendation pipeline (Issue #225) that replaces 20+ executor iterations with
a single structured, policy-aware tool call.

New: CourseRecommender tool (chatdku/core/tools/course_recommender.py)

Given a major and completed courses, it:

Fuzzy-matches the major requirements file and parses required course codes from major + common-core requirements
Diffs against completed courses to get remaining requirements
Batch-checks schedule availability against cleaned_classdata.csv
Evaluates prerequisite satisfaction per offered course (with anti-requisite stripping and OR/AND logic)
Returns a grouped Markdown report: recommended / eligible-but-not-offered / prereqs-not-met / no-schedule-data
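The steps above can be illustrated with a minimal, self-contained sketch. It is not the code in course_recommender.py. The subject list, the assumed prerequisite-text format (clauses joined by "and", alternatives joined by "or", exclusions after "Anti-requisite"), the offered mapping (True/False per course, missing when the schedule has no row), and the names parse_codes, recommend, and Report are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical subject list; the real tool keeps its own _KNOWN_SUBJECTS set.
KNOWN_SUBJECTS = ("COMPSCI", "MATH", "STATS", "GLOCHALL")
CODE_RE = re.compile(r"\b(" + "|".join(KNOWN_SUBJECTS) + r")\s*(\d{3}[A-Z]?)\b")


def parse_codes(text: str) -> list[str]:
    """Extract normalized course codes such as 'MATH 201' from free text."""
    return [f"{subject} {number}" for subject, number in CODE_RE.findall(text)]


def prerequisites_met(prereq_text: str, completed: set[str]) -> bool:
    """AND across clauses, OR within a clause (assumed text format)."""
    # Codes after "Anti-requisite" are exclusions, not prerequisites.
    prereq_text = prereq_text.split("Anti-requisite", 1)[0]
    for clause in re.split(r"\band\b", prereq_text, flags=re.IGNORECASE):
        options = parse_codes(clause)
        # A clause that names courses is satisfied if any one of them is completed.
        if options and not any(code in completed for code in options):
            return False
    return True


@dataclass
class Report:
    recommended: list[str] = field(default_factory=list)
    eligible_not_offered: list[str] = field(default_factory=list)
    prereqs_not_met: list[str] = field(default_factory=list)
    no_schedule_data: list[str] = field(default_factory=list)


def recommend(required, completed, offered, prereqs) -> Report:
    """Diff required against completed, then bucket each remaining course."""
    done = set(completed)
    report = Report()
    for code in sorted(set(required) - done):
        status = offered.get(code)  # None means no schedule row for this course
        if status is None:
            report.no_schedule_data.append(code)
        elif not status:
            report.eligible_not_offered.append(code)
        elif prerequisites_met(prereqs.get(code, ""), done):
            report.recommended.append(code)
        else:
            report.prereqs_not_met.append(code)
    return report
```

Splitting at "Anti-requisite" before extracting codes keeps excluded courses from being read as prerequisites, which mirrors the fix described later in the PR.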

Executor: dynamic agenda accumulation (executor.py)

Renames plan → current_agenda in AssessSignature / _ActSignatureBase
Adds an agenda_extensions output so the assessor can surface newly discovered investigation areas mid-trajectory (e.g. a policy doc names a mandatory course not in the original plan)
Extensions accumulate into current_agenda each iteration and flow through to Act and Distill
Fuses assessment into the executor call and removes the separate assessment step
Updates token ratios for the growing agenda; get_token_limits() keeps the legacy plan= kwarg for backward compatibility
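The accumulation loop can be modeled in a few lines. This sketch assumes a simplified assessor that returns a done flag plus a list of agenda_extensions; run_executor, AssessResult, and the duplicate check are illustrative assumptions, and the real Executor fuses assessment into the executor call rather than calling a separate function.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AssessResult:
    """Hypothetical assessor output; only agenda_extensions mirrors the PR's field name."""
    done: bool
    agenda_extensions: list[str] = field(default_factory=list)


def run_executor(
    plan: list[str],
    assess: Callable[[list[str]], AssessResult],
    max_iterations: int = 5,
) -> list[str]:
    current_agenda = list(plan)  # start from the Planner's plan
    for _ in range(max_iterations):
        result = assess(current_agenda)  # later iterations see the extended agenda
        for item in result.agenda_extensions:
            if item not in current_agenda:  # assumed de-duplication
                current_agenda.append(item)
        if result.done:
            break
    return current_agenda  # what a Distill step would receive
```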

Planner: policy-first course planning (plan.py)

Requires policy retrieval (VectorRetriever/KeywordRetriever) before CourseRecommender so year-specific mandatory courses surface first
Designates CourseRecommender as the single baseline eligibility tool
Adds a full Class-of-2027 Data Science demo showing the policy-first → recommender → agenda-extension flow
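One way to check the policy-first ordering on a recorded trajectory is a small predicate over the sequence of tool names. The helper below is hypothetical (the PR's own tests inspect the Planner instructions instead); it treats a trajectory as valid when a VectorRetriever or KeywordRetriever call precedes the first CourseRecommender call.

```python
POLICY_TOOLS = {"VectorRetriever", "KeywordRetriever"}


def is_policy_first(tool_calls: list[str]) -> bool:
    """True if a policy retriever runs before the first CourseRecommender call."""
    for name in tool_calls:
        if name in POLICY_TOOLS:
            return True
        if name == "CourseRecommender":
            return False  # recommender ran before any policy retrieval
    return True  # no CourseRecommender call, so there is nothing to order
```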

Conversation memory

Keeps only the last 3 exchanges in history; earlier exchanges are summarized
Summary is now placed before history (chronological order)
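A minimal sketch of the windowing described above, assuming exchanges are (user, assistant) pairs and summarize is any callable that condenses the older ones (in the agent this would be a model call; here it is a stand-in). build_context and render_prompt are hypothetical names.

```python
from typing import Callable

Exchange = tuple[str, str]  # (user message, assistant reply)


def build_context(
    exchanges: list[Exchange],
    summarize: Callable[[list[Exchange]], str],
    keep: int = 3,
) -> tuple[str, list[Exchange]]:
    """Summarize everything except the last `keep` exchanges, which stay verbatim."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    older, recent = exchanges[:-keep], exchanges[-keep:]
    summary = summarize(older) if older else ""
    return summary, recent


def render_prompt(summary: str, recent: list[Exchange]) -> str:
    """Place the summary first, since it covers the earliest part of the conversation."""
    parts = [f"Summary of earlier conversation: {summary}"] if summary else []
    parts += [f"User: {u}\nAssistant: {a}" for u, a in recent]
    return "\n".join(parts)
```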

Agent / config / infra

Registers CourseRecommenderOuter with paths from config
Raises default max_iterations 3 → 5 to fit policy retrieval + recommender + extensions
config.py: None-safe quote_plus(db_password or '')
devsync.sh: replaces the misleading ~/.env symlink with a REDIS_HOST presence check pointing to add_user.sh
Renames syllabi_tool/ → syllabi/; removes unnecessary outer tool wrappers
Disables thinking in model configs
Adds a TUI with an ASCII logo on startup
major_requirements._best_match: adds _MIN_MATCH_SCORE = 40 so nonsense queries (e.g. "astrology") return None instead of a spurious match
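The threshold can be sketched as follows. The PR scores matches with thefuzz; to keep the sketch dependency-free it swaps in the standard library's difflib.SequenceMatcher, rescaled to 0-100, so its scores will not equal thefuzz's. best_match and _score are illustrative names; only _MIN_MATCH_SCORE = 40 comes from the PR.

```python
from difflib import SequenceMatcher
from typing import Optional

_MIN_MATCH_SCORE = 40  # threshold from the PR, on a 0-100 scale


def _score(query: str, candidate: str) -> int:
    """Stand-in scorer: difflib similarity ratio rescaled to 0-100."""
    return round(100 * SequenceMatcher(None, query.lower(), candidate.lower()).ratio())


def best_match(query: str, candidates: list[str]) -> Optional[str]:
    """Return the highest-scoring candidate, or None if it falls below the threshold."""
    scored = [(_score(query, c), c) for c in candidates]
    if not scored:
        return None
    score, best = max(scored)
    return best if score >= _MIN_MATCH_SCORE else None
```

With a list of major names, a near-miss spelling still resolves to the intended major, while a query with no real overlap returns None instead of the least-bad candidate.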

fix: reduce agent cold-start time from ~30s to ~9s

The agent was hanging for roughly 30 seconds between the end of uv sync and the first line of output. A startup timing diagnostic (utils/startup_timer.py) revealed that virtually all of this time was spent in Python's import phase, not in initialization.

Root cause

import dspy triggers dspy.clients, which unconditionally imports litellm. On every process start, litellm was fetching a remote model-pricing database from the internet, adding ~42 seconds of network I/O before a single line of agent code ran. Setting the environment variable LITELLM_LOCAL_MODEL_COST_MAP=True before import dspy instructs litellm to use its bundled local cost map instead, dropping that import from ~24s to ~2.6s.

Secondary bottlenecks

Two smaller but measurable import-time costs were also deferred.

First, keyword_retriever.py was calling nltk.data.find() and importing nltk.corpus.stopwords and nltk.tokenize.word_tokenize at module load time, costing ~2.9s on every startup even before any retrieval happened. These imports are now deferred to KeywordRetriever.__init__(), so they are paid once when the agent is constructed rather than when the module is first imported. The loaded references are stored as instance attributes so query calls pay no additional import cost.

Second, major_requirements.py and course_schedule.py both imported use_phoenix from chatdku.setup at module level solely for use in their if __name__ == "__main__" smoke-test blocks. This pulled in llama_index.core and llama_index.embeddings at import time unnecessarily. The import was moved inside the __main__ guard where it belongs.

Changes

- chatdku/core/agent.py: added os.environ.setdefault("LITELLM_LOCAL_MODEL_COST_MAP", "True") immediately before import dspy, with a comment explaining why order matters.
- chatdku/core/tools/retriever/keyword_retriever.py: removed top-level import nltk, from nltk.corpus import stopwords, and from nltk.tokenize import word_tokenize; moved NLTK resource checks and corpus imports into KeywordRetriever.__init__(), storing stopwords and word_tokenize as instance attributes for zero-cost reuse in query().
- chatdku/core/tools/major_requirements.py and chatdku/core/tools/course_schedule.py: removed from chatdku.setup import use_phoenix from module scope; added it inside the if __name__ == "__main__" block where it is actually needed.
- utils/startup_timer.py: added as a diagnostic utility that times each import and initialization step individually so future regressions can be caught and attributed precisely.
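A rough sketch of the two techniques in this commit: setting LITELLM_LOCAL_MODEL_COST_MAP before anything imports dspy, and timing imports one at a time. It is not the contents of utils/startup_timer.py; time_import and report are hypothetical names, and modules already in sys.modules report near-zero times because Python caches imports.

```python
import importlib
import os
import sys
import time

# Must run before dspy (and therefore litellm) is imported, as the commit notes.
os.environ.setdefault("LITELLM_LOCAL_MODEL_COST_MAP", "True")


def time_import(module_name: str) -> float:
    """Return the seconds spent importing module_name."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


def report(module_names: list[str]) -> dict[str, float]:
    """Time each import in order and print one line per module."""
    timings = {}
    for name in module_names:
        cached = name in sys.modules  # cached imports cost almost nothing
        timings[name] = time_import(name)
        note = " (already imported)" if cached else ""
        print(f"{name:<30} {timings[name]:8.3f}s{note}")
    return timings
```

In the project one might pass heavy modules such as "dspy" in the order the agent imports them; with only the standard library, report(["json", "decimal"]) shows the output format. Order matters because a module imported earlier in the list makes later imports that depend on it look cheaper.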

## CourseRecommender tool (chatdku/core/tools/course_recommender.py) [NEW]

Adds a deterministic Python tool that replaces 20+ individual executor iterations with a single structured call. Given a student's major and completed courses, it:

- Fuzzy-matches the major requirements Markdown file (reuses _best_match)
- Parses course codes from both major and common-core requirement files
- Computes remaining required courses by diffing against the completed set
- Batch-checks schedule availability against cleaned_classdata.csv
- Checks prerequisite satisfaction for each offered course using the DKUHub prereq CSV, with anti-requisite stripping and OR/AND logic
- Returns a grouped Markdown report: recommended, eligible-but-not-offered, prerequisites-not-met, and no-schedule-data sections

Key fixes during development:

- The schedule CSV uses Mon/Tues/Wed/Thurs/Fri + Mtg Start/Mtg End columns (not Days/Start Time), discovered by inspecting the real server CSV
- GLOCHALL added to _KNOWN_SUBJECTS (missing subject code)
- Anti-requisite course codes were being treated as prerequisites; fixed by splitting prereq text at "Anti-requisite" before extracting codes
- An integer Catalog column breaks the .str accessor; fixed with .astype(str)

## major_requirements.py: fuzzy match threshold

Adds _MIN_MATCH_SCORE = 40 to _best_match so that queries with no meaningful overlap (e.g. "astrology") return None instead of a spurious low-confidence match.

## Executor: dynamic agenda accumulation (executor.py)

The Executor was strictly plan-following; it now supports on-the-fly agenda extension as tool results reveal new requirements:

- Renames plan → current_agenda in AssessSignature and _ActSignatureBase
- Adds an agenda_extensions output field to AssessSignature so the assessor can report newly discovered investigation areas (e.g. a policy document names a mandatory course not in the original plan)
- forward() accumulates extensions into current_agenda each iteration; all subsequent Assess and Act steps see the full extended agenda
- The Distill step receives the final extended agenda, not the original plan
- get_token_limits() accepts both current_agenda and the legacy plan= kwarg for backward compatibility with the agent.py call site
- Token ratios updated: current_agenda gets 3/16 in act (was 2/15 for plan) to reflect that the agenda can grow

## Planner: policy-first course planning (plan.py)

The Planner previously instructed the Executor to call individual requirements/schedule/prereq tools. Updated to:

- Require policy retrieval FIRST (VectorRetriever or KeywordRetriever) to surface year-specific mandatory courses before calling CourseRecommender
- Designate CourseRecommender as the single baseline tool for eligibility
- Add a full schedule planning demo (Class of 2027 Data Science student) showing the policy-first → CourseRecommender → agenda extension flow

## agent.py: tool registration and config fixes

- Registers CourseRecommenderOuter in build_agent() with paths from config
- Raises default max_iterations from 3 to 5 (policy retrieval + CourseRecommender + potential agenda extensions need the headroom)
- Fixes the get_token_limits() call to pass current_agenda="" (not plan="") to match the renamed Executor field

## config.py: None-safe DB password

quote_plus(db_password) raises TypeError when the DB_PASSWORD env var is unset. Fixed to quote_plus(db_password or '').

## devsync.sh: shared secrets verification

Removed the ~/.env symlink logic, which was misleading: credentials are injected via /etc/profile.d/chatdku.sh for chatdku_devs group members, not through a .env file. Replaced with a REDIS_HOST presence check that points developers to add_user.sh if secrets are missing.

## pyproject.toml

Adds thefuzz>=0.22.1 (used by major_requirements.py for fuzzy matching).
## Tests (90 tests, all passing locally and on server)

- tests/test_course_recommender.py [NEW]: 28 tests covering parse_course_codes (9), prerequisites_met (7), full recommendation pipeline scenarios TC1-TC7 (7), and infrastructure/span tests (5)
- tests/test_agent_configuration.py [NEW]: 11 structural tests verifying that Planner instructions require policy retrieval before CourseRecommender, AssessSignature has an agenda_extensions output, current_agenda field naming is correct, and the max_iterations default is >= 5
- tests/conftest.py: adds a sample_classdata_real_csv fixture with the actual server column layout; patches course_recommender.span_ctx_start
- tests/test_major_requirements.py: removes imports of _jaccard and _tokenize, which were replaced by thefuzz
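The config.py fix above can be reproduced with the standard library: urllib.parse.quote_plus(None) raises TypeError, while an empty-string fallback yields a URL with an empty password. build_db_url and the postgresql:// scheme are assumptions for illustration, not the project's actual config code.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_db_url(user: str, host: str, db: str, password: Optional[str]) -> str:
    """Build a connection URL; an unset password becomes an empty string."""
    # quote_plus(None) raises TypeError, hence the `or ''` fallback from the PR.
    return f"postgresql://{user}:{quote_plus(password or '')}@{host}/{db}"
```

For example, passing os.environ.get("DB_PASSWORD") as the password no longer fails when the variable is unset, and characters such as @ or spaces are still escaped.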

- black reformatted conftest.py, test_agent_configuration.py, course_recommender.py, test_course_recommender.py
- removed unused _run_recommendation import from test_course_recommender.py
- removed the assigned-but-never-used ethldr_not_eligible variable in TC7
- E402 warnings in agent.py are pre-existing (intentional os.environ.setdefault before dspy/litellm imports) and were not touched

```diff
     ):
         """Runs without error when config has no tracer (uses nullcontext)."""
-        import chatdku.core.tools.syllabi_tool.query_curriculum_db as mod
+        import chatdku.core.tools.syllabi.syllabi_tool as mod
```

```diff
+        prereq_df = _load_prereq_df(prereq_csv_path)
+        prereq_available = True
+    except Exception:
+        prereq_df = None
```


```diff
+        return
+    _ensure_nltk_resource("corpora/stopwords", "stopwords")
+    _ensure_nltk_resource("tokenizers/punkt_tab", "punkt_tab")
+    _nltk_ready = True
```


Ar-temis and others added 21 commits April 15, 2026 14:58

Merge pull request #260 from Edge-Intelligence-Lab/main

3ea1628

fast forward 225

upgraded agent to take positional argument as a user query.

a78cdc4

added a cool TUI for agent

a952666

llama-index dependency bump (fixes warnings on agent run)

e1d42e3

Added a logo to TUI after startup sequence.

f7eee1c

deleted 2-year-old log file nohup.out

091f9f1

Ignore CSV files

1db485e

added thefuzz and pymupdf4llm to toml and added a few reasons to the toml

eb194a6

black formatted

0282677

Renamed query lookup to syllabi lookup

cd6ac07

Refactor - Removed unnecessary tool outer functions

d1bfbfe

fused both the assessment and executor together

544e54e

Updated the model's configs and disabled thinking.

c06ac34

Only 3 conversation exchanges are saved in history and rest is summarized

9bb3df5

conversation summary before the history since it is the earlier exchange that gets summarized

f13fe9e

Removed assessment and infused it into the executor call

a4467e5

used Path() classes for input paths

e96b56c

Added back in thought

d714ec5

Ar-temis requested a review from pomegranar April 21, 2026 06:41

Ar-temis assigned pomegranar and Ar-temis Apr 21, 2026

Ar-temis added the agent-pipeline and feature-add labels Apr 21, 2026

github-code-quality Bot found potential problems Apr 21, 2026

Ar-temis and others added 3 commits April 21, 2026 14:46

Merge branch 'main' into 225-course-recs

0eb00e1

Modify Flake8 ignore rules in lint.yml

ce206cb

Updated Flake8 configuration to ignore E402 error.

Fixing black and flake8 complaints

8f5797d

github-code-quality Bot found potential problems Apr 21, 2026

Ar-temis merged commit 409a6aa into main Apr 21, 2026
4 checks passed

pomegranar deleted the 225-course-recs branch May 1, 2026 08:10

Ar-temis merged 24 commits into main from 225-course-recs

2 participants