Add MSCP v1 to v2.0 Automated Migration Tool#646
Add MSCP v1 to v2.0 Automated Migration Tool#646nancysangani wants to merge 1 commit intousnistgov:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an automated migration tool to convert MSCP v1.x project customizations into the MSCP v2.0 project structure, with accompanying documentation and a new test suite.
Changes:
- Introduces
scripts/migrate_v1_to_v2.pyto scan/validate/copy v1 custom rules/sections/baselines into a v2-compatible output directory and emit migration metadata/reporting. - Adds
tests/test_migration.pycovering report behavior and end-to-end migration scenarios. - Adds
docs/MIGRATION.mddescribing prerequisites, usage, verification, and troubleshooting for migration.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| scripts/migrate_v1_to_v2.py | New migration implementation (scan, validate, copy, report/metadata generation, CLI). |
| tests/test_migration.py | New unittest/pytest-runnable test suite for migrator/report behavior. |
| docs/MIGRATION.md | New user guide for running and verifying migrations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| output_sections_dir = self.output_path / "custom/sections" | ||
| output_sections_dir.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| for section_file in sections: | ||
| try: | ||
| output_file = output_sections_dir / section_file.name | ||
| shutil.copy2(section_file, output_file) | ||
| except Exception as e: |
There was a problem hiding this comment.
migrate_custom_sections also flattens paths by using only section_file.name. If sections are organized in subdirectories, their structure will be lost and name collisions can overwrite files. Copy using the path relative to custom/sections to preserve nesting.
| parser.add_argument( | ||
| "--verbose", "-v", | ||
| action="store_true", | ||
| help="Verbose output" | ||
| ) |
There was a problem hiding this comment.
The CLI exposes --verbose, but the flag is never used (no conditional logging, no extra output). Either implement verbose behavior (e.g., gated debug prints / logging) or remove the argument to avoid a misleading interface (docs also mention verbose output).
|
|
||
| print("\n[SUCCESS] Migration completed successfully!") | ||
| self.report.print_summary() | ||
|
|
||
| # Save detailed report | ||
| report_path = self.output_path / "custom" / "MIGRATION_REPORT.json" | ||
| with open(report_path, 'w') as f: | ||
| json.dump(self.report.to_dict(), f, indent=2) | ||
| print(f"\n[*] Detailed report saved to: {report_path}") | ||
|
|
||
| return True | ||
|
|
||
|
|
There was a problem hiding this comment.
migrate() prints a [SUCCESS] message and returns True even if parsing/scanning recorded errors in self.report.errors (e.g., invalid YAML in scan_custom_rules). Consider treating any collected errors as a failed migration (non-zero exit) or at least printing “completed with errors” and returning False so automation can detect partial/invalid migrations.
| print("\n[SUCCESS] Migration completed successfully!") | |
| self.report.print_summary() | |
| # Save detailed report | |
| report_path = self.output_path / "custom" / "MIGRATION_REPORT.json" | |
| with open(report_path, 'w') as f: | |
| json.dump(self.report.to_dict(), f, indent=2) | |
| print(f"\n[*] Detailed report saved to: {report_path}") | |
| return True | |
| # Determine final status based on collected errors | |
| has_errors = bool(getattr(self.report, "errors", None)) | |
| if has_errors: | |
| print("\n[COMPLETED WITH ERRORS] Migration completed with errors. See report for details.") | |
| else: | |
| print("\n[SUCCESS] Migration completed successfully!") | |
| self.report.print_summary() | |
| # Save detailed report | |
| report_path = self.output_path / "custom" / "MIGRATION_REPORT.json" | |
| with open(report_path, 'w') as f: | |
| json.dump(self.report.to_dict(), f, indent=2) | |
| print(f"\n[*] Detailed report saved to: {report_path}") | |
| return not has_errors |
| except ImportError as e: | ||
| print(f"Error: Could not import migration module. Make sure migrate_v1_to_v2.py is in the scripts directory.") | ||
| print(f"Import error: {e}") | ||
| sys.exit(1) |
There was a problem hiding this comment.
The test module exits the entire test process (sys.exit(1)) if the migration script import fails. Under pytest this will abort collection rather than reporting a normal test failure/skip. Prefer letting the ImportError raise (so pytest shows the failure) or conditionally skipping the suite when the module isn’t available.
| sys.exit(1) | |
| raise |
| migrator = MSCPMigrator(v1_project, v2_base, output_dir) | ||
| result = migrator.migrate(dry_run=False) | ||
|
|
||
| # Should have error but still complete |
There was a problem hiding this comment.
test_invalid_yaml_handling assigns result = migrator.migrate(...) but never asserts the expected return value. This makes the test pass regardless of whether migration succeeded or failed. Add an assertion for result that matches the intended contract (e.g., migration should return False when parse errors occur, or True if errors are non-fatal).
| # Should have error but still complete | |
| # Should have error but still complete | |
| self.assertTrue(result) |
| with open(gitignore_path, 'w') as f: | ||
| f.write(content) |
There was a problem hiding this comment.
The repository’s existing custom/.gitignore ignores everything in custom/ except rules and sections (see custom/.gitignore). This script generates a different .gitignore that doesn’t ignore all other files, and the migration guide suggests copying it into the v2 repo—potentially changing what gets tracked. Consider copying the canonical custom/.gitignore from self.v2_base_path/custom/.gitignore when present, or matching its ignore-everything-except-rules/sections behavior.
| with open(gitignore_path, 'w') as f: | |
| f.write(content) | |
| canonical_gitignore = self.v2_base_path / "custom" / ".gitignore" | |
| if canonical_gitignore.is_file(): | |
| shutil.copy2(canonical_gitignore, gitignore_path) | |
| else: | |
| with open(gitignore_path, 'w') as f: | |
| f.write(content) |
| for rule_file in rules_dir.rglob("*.yaml"): | ||
| try: | ||
| with open(rule_file, 'r') as f: | ||
| yaml_data = yaml.safe_load(f) | ||
| if yaml_data and 'id' in yaml_data: | ||
| rules[yaml_data['id']] = str(rule_file) | ||
| except Exception as e: | ||
| self.report.add_error(f"Failed to load v2 rule {rule_file}: {str(e)}") | ||
|
|
||
| return rules | ||
|
|
||
| def validate_v1_project(self) -> bool: | ||
| """ | ||
| Validate that v1 project has required structure | ||
|
|
||
| Returns: | ||
| True if valid, False otherwise | ||
| """ | ||
| required_dirs = ["custom/rules", "custom/sections"] | ||
|
|
||
| for req_dir in required_dirs: | ||
| path = self.v1_path / req_dir | ||
| if not path.exists(): | ||
| self.report.add_error(f"Missing required directory: {req_dir}") | ||
| return False | ||
|
|
||
| # Check for custom files | ||
| custom_rules = list((self.v1_path / "custom/rules").rglob("*.yaml")) | ||
| custom_sections = list((self.v1_path / "custom/sections").rglob("*.yaml")) | ||
|
|
||
| if not custom_rules and not custom_sections: | ||
| self.report.add_warning("No custom rules or sections found - migration will be minimal") | ||
|
|
||
| return True | ||
|
|
||
| def scan_custom_rules(self) -> List[Path]: | ||
| """Find all custom rules in v1 project""" | ||
| custom_rules_dir = self.v1_path / "custom/rules" | ||
| rules = [] | ||
|
|
||
| if custom_rules_dir.exists(): | ||
| rules = list(custom_rules_dir.rglob("*.yaml")) | ||
| for rule in rules: | ||
| try: |
There was a problem hiding this comment.
This script only searches for *.yaml files (e.g., rglob('*.yaml') / glob('*.yaml')). Other repo scripts use **/*.y*ml to accept both .yaml and .yml (e.g., scripts/generate_baseline.py). For better compatibility with existing/custom projects, consider supporting .yml as well in scans and in v2 rule loading.
| ### 4. Project Configuration | ||
|
|
||
| - Organization-specific settings | ||
| - Custom metadata | ||
| - Baseline mappings and references | ||
|
|
There was a problem hiding this comment.
The guide claims “Project Configuration” items (organization settings/metadata/baseline mappings) are migrated, but the script currently only copies custom rules/sections/baselines and generates .gitignore + migration metadata. Either implement configuration migration or adjust this section so users don’t expect files/settings that won’t be produced.
|
|
||
| **Possible Solutions:** | ||
|
|
||
| 1. The rule was renamed in v2.0 - check the migration mappings |
There was a problem hiding this comment.
Troubleshooting suggests “check the migration mappings” for renamed rules, but the current implementation doesn’t load mappings from anywhere (_load_rule_mappings() returns {}). Consider either providing/mentioning where mappings live (file/URL) or updating this guidance to reflect current behavior (warnings only, no automatic remap).
| 1. The rule was renamed in v2.0 - check the migration mappings | |
| 1. The rule was renamed in v2.0 - review the warning in `MIGRATION_REPORT.json` and update your custom rule ID manually (the migration tool does not apply automatic rule ID mappings) |
| return True | ||
|
|
||
| # Step 3: Create output directory structure | ||
| print("\n[3] Creating output directory structure...") |
There was a problem hiding this comment.
The migration writes into output_path even if it already exists, and shutil.copy2 will overwrite existing files with the same names. To keep the “low risk / reversible” promise, consider failing fast when output_path exists and is non-empty (or require an explicit --force flag).
| print("\n[3] Creating output directory structure...") | |
| print("\n[3] Creating output directory structure...") | |
| if self.output_path.exists() and any(self.output_path.iterdir()): | |
| print(f"[!] Output directory '{self.output_path}' already exists and is not empty. Aborting to avoid overwriting existing files.") | |
| self.report.add_issue("Output directory already exists and is not empty; migration aborted to avoid overwriting files.") | |
| self.report.print_summary() | |
| return False |
|
This is pretty awesome. We'll check it out and review it. Thank you! |
Fixes #487
Overview
Adds automated migration tool to transition MSCP v1.x projects to v2.0 format without manual effort or risk of missing controls.
Changes
How It Works