@skrawcz commented on Dec 28, 2025

Adds more license headers

Changes

  1. Updated headers
  • examples/ - 537 files
    • 343 Python files (.py)
    • 105 Markdown files (.md)
    • 73 Jupyter notebooks (.ipynb)
    • 2 Shell scripts (.sh)
    • 2 SQL files (.sql)
    • 9 Dockerfiles
    • 4 README files (no extension)
  • ui/ - 23 files
    • 22 test files in ui/sdk/tests/
    • 1 file in ui/backend/server/trackingserver_base/
  • docs/ - 3 files
    • conf.py
    • data_adapters_extension.py
    • make_testimonials.py
  • Other directories - 5 files
    • 2 files in scripts/
    • 1 file in hamilton/experimental/
    • 1 file in writeups/
    • 1 file in ui/backend/

Total: 568 files with Apache 2 license headers added

  2. Created Tooling
  • scripts/check_license_headers.py - Script to find files missing Apache 2 license headers
    • Supports: .py, .md, .ipynb, .sh, .sql, Dockerfile, and README files
    • Can check specific file types or all at once
    • Excludes build artifacts, node_modules, and documentation snippets
  • scripts/add_license_headers.py - Script to automatically add license headers
    • Adds appropriate comment formats based on file type:
      • Python/Shell/Dockerfile: # comments
      • Markdown/README: HTML comments
      • SQL: -- comments
      • Jupyter notebooks: Markdown cell at the beginning
    • Handles special cases (shebangs, existing headers)
    • Dry-run mode for testing
  3. Updated Notebook Validation
  • examples/validate_examples.py - Updated pre-commit hook validator
    • Now correctly handles notebooks with license headers
    • Checks for license at cell 0, setup at cell 1, title at cell 2 (if license present)
    • Maintains backward compatibility with notebooks without license headers
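
The cell-order rule above can be sketched roughly as follows. This is only an illustration, not the code in examples/validate_examples.py: the names `cell_text` and `expected_cell_indices`, the marker string, and the assumption that notebooks without a license cell keep setup at cell 0 and title at cell 1 are all hypothetical.

```python
# Hypothetical sketch; not the actual validate_examples.py implementation.
LICENSE_MARKER = "Licensed to the Apache Software Foundation"  # assumed marker text


def cell_text(cell: dict) -> str:
    """Return a cell's source as one string (notebook JSON stores a list or a str)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def expected_cell_indices(notebook: dict) -> dict:
    """Work out where the setup and title cells should sit.

    Assumption: a license cell, when present, is a markdown cell at index 0
    and shifts the remaining expected cells down by one.
    """
    cells = notebook.get("cells", [])
    has_license = (
        bool(cells)
        and cells[0].get("cell_type") == "markdown"
        and LICENSE_MARKER in cell_text(cells[0])
    )
    offset = 1 if has_license else 0
    return {"has_license": has_license, "setup": offset, "title": offset + 1}
```

Computing an offset once keeps the rest of a validator identical for notebooks with and without a license cell, which is one way to get the backward compatibility described above.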

Exclusions

  • Documentation snippets in docs/code-comparisons/*snippets/ - These are embedded in docs via literalinclude and adding headers would clutter the rendered documentation
  • examples/plotly/interactive.html - Has a different license
  • dev_tools/language_server/tests/ (2 files) - Already have appropriate Apache 2.0 licenses from third-party sources (Open Law Library, Palantir Technologies)

Testing

  • Verified all files with scripts/check_license_headers.py
  • Pre-commit notebook validation passes for all example notebooks
  • License headers use correct comment format for each file type

Notes

  • License header format follows Apache Software Foundation standard
  • Headers are added at the beginning of files (after shebang if present)
  • No functional code changes - only license header additions
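
A minimal sketch of the "after shebang if present" placement, assuming the header is already rendered as comment lines ending in a newline. The function name `insert_header` is hypothetical and this is not the code from scripts/add_license_headers.py.

```python
def insert_header(content: str, header: str) -> str:
    """Place `header` at the top of `content`, keeping a shebang as line one.

    Hypothetical helper for illustration only.
    """
    lines = content.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        # Make sure the shebang line is newline-terminated before appending.
        shebang = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return shebang + header + "".join(lines[1:])
    return header + content
```

Keeping the shebang first matters because the kernel only honours `#!` when it is the first two bytes of the file.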

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@skrawcz changed the title from "Add more licenses" to "Adds more licenses and utilities for license checking" on Dec 28, 2025
@jernejfrank left a comment

Thanks for the scripts, makes things a lot easier.

I did not go through the 600+ modified files, just the new scripts.

It makes sense to me to add check_license_headers.py to CI so that we get reminded when we add new files.

@@ -0,0 +1,297 @@
#!/usr/bin/env python3
Contributor

I think this is fine for now, but maybe worth adding a TODO to simplify this in case we get more file types with different comment behaviour. Since the licence text (including line breaks) stays the same, we really just need different char insertion helpers based on file extension.
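
The simplification suggested here could look something like the sketch below: one shared list of license lines, rendered with a comment style chosen per file type. The names `STYLE_BY_SUFFIX`, `comment_style`, and `render_header` are hypothetical, and the mapping only mirrors the file types listed in the PR description.

```python
from pathlib import Path

# Assumed mapping, taken from the comment formats listed in the PR description.
STYLE_BY_SUFFIX = {".py": "hash", ".sh": "hash", ".sql": "dash", ".md": "html"}


def comment_style(path: Path) -> str:
    """Pick a comment style from the file name (hypothetical helper)."""
    if path.name.startswith("Dockerfile"):
        return "hash"
    if path.suffix == "" and path.name.startswith("README"):
        return "html"
    return STYLE_BY_SUFFIX[path.suffix.lower()]  # KeyError for unsupported types


def render_header(path: Path, license_lines: list[str]) -> str:
    """Render the same license lines in the comment syntax for `path`."""
    style = comment_style(path)
    if style == "html":
        return "<!--\n" + "\n".join(license_lines) + "\n-->\n"
    prefix = "#" if style == "hash" else "--"
    # rstrip() avoids trailing whitespace on blank license lines.
    return "".join(f"{prefix} {line}".rstrip() + "\n" for line in license_lines)
```

With this shape, supporting another line-comment file type would be a one-entry change to the mapping rather than a new helper function.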

Contributor Author

added.

Comment on lines 169 to 172
# Check if file already has a license header
if "Licensed to the Apache Software Foundation" in content or "Apache License" in content:
    print(f" ↷ Skipping {file_path} (already has license header)")
    return False
Contributor

Here we check the whole file, so a comment in the middle of the file that mentions "Apache License" could trigger this. It would be better to check only the first and second lines of the file (the second in case of a shebang).

Contributor

Just saw there's already a helper below that checks for the licence header. It would make sense to just reuse it here.
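
The existing helper is not shown in this conversation. As a rough sketch of the head-only check being discussed, assuming the marker string from the snippet above and a small, hypothetical `max_lines` window that leaves room for a shebang or an opening `<!--`:

```python
LICENSE_MARKER = "Licensed to the Apache Software Foundation"


def has_license_header(content: str, max_lines: int = 5) -> bool:
    """Return True only if the marker appears near the top of the file.

    Hypothetical helper; the window size is an assumption, not a value
    taken from the PR.
    """
    head = content.splitlines()[:max_lines]
    return any(LICENSE_MARKER in line for line in head)
```

Limiting the scan to the top of the file avoids the false positive described above, where a later comment or string that mentions the license causes the file to be skipped.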

Contributor Author

Makes sense.

Contributor Author

done.

"""


def add_license_to_python(file_path: Path, content: str) -> str:
Contributor

All add_license_to_* helpers take file_path but only use content.

Contributor Author

fixed

@@ -0,0 +1,182 @@
#!/usr/bin/env python3
Contributor

Here too, the file is missing a licence header ;)

Contributor Author

done.

@skrawcz skrawcz requested a review from jernejfrank December 28, 2025 11:23
@jernejfrank left a comment

Great, thanks!

@skrawcz skrawcz merged commit b0f71a5 into main Dec 29, 2025
1 of 5 checks passed
@skrawcz skrawcz deleted the add_more_licenses branch December 29, 2025 01:01
