
Add score extraction from CI logs#39

Merged
markpolyak merged 30 commits into main from claude/add-commit-verification-019ZNeWTDFAHcFiFjVGTocvc
Jan 9, 2026

Conversation

@markpolyak
Owner

Implements configurable score extraction from GitHub Actions logs with support for multiple patterns and automatic decimal separator detection.

Features:

  • Score extraction from CI logs using configurable regex patterns
  • Multiple pattern support (tried in order until match found)
  • Automatic decimal separator detection from Google Sheets locale
  • Score validation (all occurrences must match)
  • Format: v@10.5 or v@10,5 depending on locale
  • Combined format with penalty: v@10.5-3 or v@10,5-3
  • Frontend display of extracted scores
  • Comprehensive test coverage

Backend changes:

  • New module: grading/score.py with score extraction logic
  • Updated grading/grader.py with check_score() method
  • Updated grading/sheets_client.py with get_decimal_separator()
  • Updated main.py to integrate score processing
  • Updated grading/__init__.py exports

Frontend changes:

  • Updated RegistrationForm to display score information

Configuration:

  • Add 'score.patterns' list to lab config in YAML
  • Patterns are regex with first capturing group = score
  • Optional feature (backward compatible)

Documentation:

  • Updated CLAUDE.md with score configuration examples
  • Added test suite in tests/test_score.py

Example config:

```yaml
labs:
  "1":
    score:
      patterns:
        - '##\[notice\]Points\s+(\d+(?:[.,]\d+)?)/\d+'
        - 'Score\s+is\s+(\d+(?:[.,]\d+)?)'
```
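
For orientation, here is a minimal sketch of how such a pattern list and the locale-aware separator could be applied; the function names are illustrative and do not claim to mirror the exact grading/score.py API:

```python
import re

def extract_score(log_text: str, patterns: list[str]) -> str | None:
    """Try each configured regex in order; return the first captured score.

    If a pattern matches several times, all captured values must agree,
    mirroring the "all occurrences must match" validation described above.
    """
    for pattern in patterns:
        matches = re.findall(pattern, log_text)
        if matches:
            if len(set(matches)) > 1:
                raise ValueError(f"Inconsistent scores in CI logs: {matches}")
            return matches[0]
    return None

def format_score_cell(score: str, penalty: int, decimal_separator: str) -> str:
    """Build the spreadsheet value, e.g. v@10.5, v@10,5 or v@10,5-3."""
    score = score.replace(".", decimal_separator).replace(",", decimal_separator)
    return f"v@{score}-{penalty}" if penalty else f"v@{score}"
```

With the config above, a log line such as `##[notice]Points 10.5/20` would yield `10.5`, written to the sheet as `v@10,5` when the spreadsheet locale uses a comma as the decimal separator.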

claude and others added 30 commits on December 3, 2025 at 00:21
Implements configurable score extraction from GitHub Actions logs with support for multiple patterns and automatic decimal separator detection.

Features:
- Score extraction from CI logs using configurable regex patterns
- Multiple pattern support (tried in order until match found)
- Automatic decimal separator detection from Google Sheets locale
- Score validation (all occurrences must match)
- Format: v@10.5 or v@10,5 depending on locale
- Combined format with penalty: v@10.5-3 or v@10,5-3
- Frontend display of extracted scores
- Comprehensive test coverage

Backend changes:
- New module: grading/score.py with score extraction logic
- Updated grading/grader.py with check_score() method
- Updated grading/sheets_client.py with get_decimal_separator()
- Updated main.py to integrate score processing
- Updated grading/__init__.py exports

Frontend changes:
- Updated RegistrationForm to display score information

Configuration:
- Add 'score.patterns' list to lab config in YAML
- Patterns are regex with first capturing group = score
- Optional feature (backward compatible)

Documentation:
- Updated CLAUDE.md with score configuration examples
- Added test suite in tests/test_score.py

Example config:
```yaml
labs:
  "1":
    score:
      patterns:
        - '##\[notice\]Points\s+(\d+(?:[.,]\d+)?)/\d+'
        - 'Score\s+is\s+(\d+(?:[.,]\d+)?)'
```
- Added detailed section on score extraction from CI logs
- Included multiple pattern examples for different log formats
- Documented flexible pattern syntax with .*? for robust matching
- Added tips for creating reliable regex patterns
- Explained decimal separator auto-detection
- Updated process description with score extraction step
- Provided frontend display examples

Key improvements:
- Flexible pattern: 'ПРЕДВАРИТЕЛЬНАЯ.*?ОЦЕНКА.*?ЖУРНАЛ:\s*(\d+(?:[.,]\d+)?)'
- Handles variable whitespace and formatting in logs
- Better guidance for YAML regex configuration
Updated pattern to use .*? (non-greedy match of any characters) for robust matching, demonstrated in the sketch below:
- 'ПРЕДВАРИТЕЛЬНАЯ.*?ОЦЕНКА.*?ЖУРНАЛ:\s*(\d+(?:[.,]\d+)?)'

This handles:
- Variable whitespace after GitHub Actions timestamp
- Any formatting between key words
- Optional whitespace before the score value

Added additional patterns for common formats:
- 'ИТОГО:\s*(\d+(?:[.,]\d+)?)\s*баллов' for total score format
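
A quick, self-contained demonstration of the non-greedy pattern against an illustrative log line (the line itself is made up; only the pattern comes from the config above):

```python
import re

PATTERN = r'ПРЕДВАРИТЕЛЬНАЯ.*?ОЦЕНКА.*?ЖУРНАЛ:\s*(\d+(?:[.,]\d+)?)'

# GitHub Actions prefixes every line with a timestamp, and the key words may
# be separated by variable whitespace or extra text; .*? absorbs all of that.
line = "2025-12-03T05:22:34.1234567Z  ПРЕДВАРИТЕЛЬНАЯ ОЦЕНКА ЗА ЖУРНАЛ: 10,5"

match = re.search(PATTERN, line)
print(match.group(1) if match else "no match")  # prints: 10,5
```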
Enhanced debugging to diagnose score pattern matching issues:
- Show first/last 500 chars of logs being searched
- Log each pattern attempt with detailed results
- Show sample lines containing keywords (ОЦЕНКА, ЖУРНАЛ)
- Number patterns and jobs for easier tracking
- Display matched values when pattern succeeds
- Indicate when no keyword lines found in logs

This will help identify:
- If logs contain expected text
- Encoding or formatting issues
- Which pattern (if any) should match
- Which jobs are being checked
The issue appears to be that the Python script output (ИТОГОВЫЙ ОТЧЁТ)
is not present in the logs fetched from the GitHub API, even though it's
visible in the GitHub UI.

This commit adds diagnostics to:
- Search for common report keywords (ИТОГОВЫЙ ОТЧЁТ, ИТОГО:, баллов, etc.)
- Show context around any found keywords
- Check for timestamps around the expected output time (05:22:34)
- Warn if no report keywords found

This will help confirm whether the issue is:
1. Logs incomplete/truncated by GitHub API
2. Output in a different job or step
3. Some other fetching issue
Problem: The debug output was only showing the first/last 500 chars of the logs,
while the logs contain 87K+ chars. The Python script output is in the middle.

Changes:
- Show middle 500 chars sample (around position len/2)
- Add case-insensitive keyword search (both exact and lowercase)
- Search for timestamp lines around 05:22:34 (expected score time)
- Show sample lines from 25%, 50%, 75% positions in logs
- Add ОЦЕНКА, ЖУРНАЛ, ПРЕДВАРИТЕЛЬНАЯ to keyword search list
- Show position and context when keyword found

This will reveal exactly where the score output is in the logs
and why the pattern isn't matching it.
Added byte-level debugging to diagnose score pattern matching:
- Test simple string search for 'ПРЕДВАРИТЕЛЬНАЯ' to verify Cyrillic works
- Show UTF-8 bytes of both pattern and actual log line
- Show number of matches found by each pattern attempt
- Display sample line containing the target text with its bytes

This will reveal if the issue is:
- Encoding mismatch between pattern and logs
- Pattern syntax error
- Invisible characters in logs
Root cause: the GitHub API returns job logs in UTF-8, but without a proper
charset in the Content-Type header. The requests library was auto-detecting
the encoding incorrectly (likely as Latin-1/ISO-8859-1), turning Cyrillic
characters into mojibake (e.g., 'ПРЕДВАРИТЕЛЬНАЯ' became 'Ð\x9fÑ\x80ед...').

Solution: Explicitly set resp.encoding = 'utf-8' before accessing resp.text
to force proper UTF-8 decoding of GitHub Actions logs.

This fixes:
- Pattern matching for Cyrillic text in score extraction
- TASKID extraction with Cyrillic output
- Any other log parsing with non-ASCII characters

Evidence from debug logs:
- Simple string search for 'ПРЕДВАРИТЕЛЬНАЯ' was failing
- Line 731 showed mojibake: 'ЦенÑ\x82Ñ\x80...' instead of Cyrillic
- Pattern bytes were correct UTF-8 (b'\xd0\x9f\xd0\xa0...')
- But log text was decoded wrong
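
The fix itself is a one-liner; a sketch of the log-fetching call, with illustrative function and parameter names rather than the project's exact ones:

```python
import requests

def fetch_job_logs(logs_url: str, token: str) -> str:
    resp = requests.get(
        logs_url,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    # GitHub serves the logs as UTF-8 but omits the charset, so requests
    # falls back to ISO-8859-1 and Cyrillic turns into mojibake.
    # Force UTF-8 *before* reading resp.text.
    resp.encoding = "utf-8"
    return resp.text
```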
Cleaned up verbose debug output now that encoding issue is resolved:
- Removed log samples (first/middle/last 500 chars)
- Removed keyword search diagnostics
- Removed byte-level encoding checks
- Removed pattern bytes display
- Kept essential logging: log size, pattern attempts, match results

The encoding fix (UTF-8) resolved the core issue, so detailed
diagnostics are no longer needed for normal operation.
Created docs/COURSE_CONFIG.md with complete reference for all
supported YAML configuration options:

**Top-level sections:**
- course: General course info (name, semester, university, etc.)
- course.github: GitHub integration (organization, teachers)
- course.google: Google Sheets integration (spreadsheet, columns)
- course.staff: Teaching staff list
- course.labs: Lab configurations (per lab settings)
- misc: System settings (timeouts, etc.)

**Lab-level options:**
- Basic: github-prefix, short-name
- CI/CD: ci (workflows/jobs configuration)
- TASKID: taskid-max, taskid-shift, ignore-task-id
- Penalties: penalty-max, penalty-strategy (weekly/daily)
- Score extraction: score.patterns (regex list for extracting points)
- File checks: files, forbidden-modifications
- MOSS: language, max-matches, local-path, additional, basefiles
- Requirements: report (required sections)
- Validation: commits, issues (custom validation rules)

**Key features documented:**
- Score extraction with regex patterns and decimal separator auto-detection
- Penalty strategies (weekly vs daily)
- TASKID validation with shift calculation
- CI workflow filtering
- MOSS plagiarism detection configuration
- Custom validation rules for commits/issues

Updated CLAUDE.md to reference new documentation.
Problem: Deadlines specified as dates only (e.g., '19.11.2025') were
parsed as midnight (00:00:00) instead of end of day (23:59:59).

This caused incorrect penalty calculations:

Example 1 (incorrect before fix):
- Deadline in sheet: 19.11.2025
- Parsed as: 19.11.2025 00:00:00 UTC+3
- Tests passed: 2025-11-18T22:55:34Z = 19.11.2025 01:55:34 UTC+3
- Result: 01:55:34 > 00:00:00 → late by ~2 hours → penalty -1
- Expected: No penalty (submitted before end of deadline day)

Example 2 (incorrect before fix):
- Deadline: 19.11.2025 00:00:00
- Submitted: 03.12.2025 10:00
- Delta: 14 days 10 hours
- Calculation: 14 // 7 = 2, 14 % 7 = 0, but seconds > 0 → rounds up
- Result: 2 + 1 = 3 weeks → penalty -3
- Expected: penalty -2

Solution: When deadline is parsed without explicit time, set time to
23:59:59 (end of day). This makes deadlines like '19.11.2025' mean
'until 23:59:59 on that day' as users expect.

After fix:
- Deadline: 19.11.2025 23:59:59 UTC+3
- Example 1: 01:55:34 < 23:59:59 → no penalty ✓
- Example 2: 13 days delay → (13//7 + 1) = 2 weeks → penalty -2 ✓

The weekly penalty calculation formula is correct and unchanged:
- 1-7 days late = -1 point
- 8-14 days late = -2 points
- 15-21 days late = -3 points
etc.
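
A sketch of the fixed parsing and the (unchanged) weekly formula; the date formats and helper names here are illustrative, and the real sheet parsing may accept more formats:

```python
import math
from datetime import datetime, time, timedelta, timezone

UTC3 = timezone(timedelta(hours=3))

def parse_deadline(raw: str) -> datetime:
    """Date-only deadlines now mean end of day, not midnight."""
    try:
        dt = datetime.strptime(raw, "%d.%m.%Y %H:%M:%S")
    except ValueError:
        # No explicit time given: treat '19.11.2025' as 19.11.2025 23:59:59.
        date_only = datetime.strptime(raw, "%d.%m.%Y").date()
        dt = datetime.combine(date_only, time(23, 59, 59))
    return dt.replace(tzinfo=UTC3)

def weekly_penalty(submitted: datetime, deadline: datetime, penalty_max: int) -> int:
    """1-7 days late = -1 point, 8-14 days = -2, ..., capped at penalty_max."""
    delay = submitted - deadline
    if delay <= timedelta(0):
        return 0
    return min(math.ceil(delay / timedelta(weeks=1)), penalty_max)
```

With these semantics, Example 1 above yields no penalty and Example 2 yields -2, matching the expected results.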
Added ability to view archived courses for students with academic debt.

Frontend changes:
- Added status parameter to fetchCourses() API call (active/archived/all)
- Added course status toggle buttons above course list
- Active courses (default) - shows currently running courses
- Archived courses - shows past courses for debt resolution
- All courses - shows both (admin only)
- Status selection persists across course list updates

UI implementation:
- ButtonGroup with 3 status options (active/archived/all)
- Highlighted active selection with colors.selected
- Admin-only "All courses" option for full visibility
- Automatically refetch courses when status changes

Translations added (ru/en/zh):
- activeCourses: "Активные курсы" / "Active Courses" / "活跃课程"
- archivedCourses: "Архив курсов" / "Archived Courses" / "归档课程"
- allCourses: "Все курсы" / "All Courses" / "所有课程"

Backend support:
- Backend already supports ?status= query parameter
- active (default), archived, all

This allows students with academic debt to access archived course
materials while keeping main page focused on current courses.
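
A hedged sketch of what the backend-side status filtering amounts to (the real course model and field names in main.py will differ):

```python
def filter_courses(courses: list[dict], status: str = "active") -> list[dict]:
    """Filter the course list by the ?status= query value.

    'active' (default) keeps current courses, 'archived' keeps past ones,
    and 'all' returns everything (exposed to admins only in the UI).
    """
    if status == "all":
        return courses
    if status == "archived":
        return [c for c in courses if c.get("archived")]
    if status == "active":
        return [c for c in courses if not c.get("archived")]
    raise ValueError(f"Unknown course status: {status!r}")
```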
- Move from center to fixed top-right corner (below language selector)
- Change to compact vertical button group (120px width)
- Fix color contrast: selected=#3c3c43 (dark) with white text, unselected=#f5f5f5 (light) with dark text
- Reduce prominence: smaller font (12px), less padding
- Position at top:60px, right:16px, z-index:2999

This makes the toggle less obtrusive since archived courses are used by few students while active courses serve hundreds.
Changed position from top:60px to top:110px to prevent overlap with the "Для преподавателей" ("For teachers") button.
Increased z-index from 3000 to 3200 and added MenuProps to ensure the
dropdown menu appears above the admin button (z-index 3100) and the course
status toggle (z-index 2999).
Using disablePortal renders the dropdown in the normal DOM hierarchy
instead of a Portal, which should apply z-index stacking properly.
markpolyak merged commit e09891e into main on Jan 9, 2026
3 checks passed