Enable event-driven Lambda execution for S3 Python index generation#3737
araravik-psd wants to merge 15 commits into main
Conversation
@amd-shiraz Can you point me to the location I could use in the infra repo to save a copy of the lambda function wrapper used here?
Please put the scripts inside tools: https://github.com/ROCm/TheRock-Infra/tree/main/tools (create a folder if you think multiple scripts can go in for this). For Terraform-related Lambda code, put it under projects: https://github.com/ROCm/TheRock-Infra/tree/main/projects; base modules go under modules: https://github.com/ROCm/TheRock-Infra/tree/main/modules/aws .
HereThereBeDragons
left a comment
As you are already touching the file, it would be good to give it a proper freshening up and improve the overall QoL when using it :)
# bucket for TheRock
# We also manage `therock-nightly-python` (not the default to make the script safer to test)
- BUCKET_NAME = getenv("S3_BUCKET_PY", "therock-dev-python")
+ BUCKET_NAME = getenv("S3_BUCKET_PY")
As this is a bit of cleanup, I personally would appreciate if this could also be supported by a CLI arg --bucket?
Unless @ScottTodd has a different opinion?
This has its origin in the upstream script. It depends on how the Lambda wrapper script will handle it and pass arguments. In general I prefer to pass explicit arguments instead of using environment variables. However, this can also be a follow-up; command-line args are preferred for manual use.
This would however mean updating our workflows as well, so I'd rather do it in a follow-up once we've switched over. Less refactoring (as calling the script in workflows will be dropped anyway).
Good points. I added --bucket, but kept S3_BUCKET_PY as a fallback so we don’t break any existing workflows or Lambda usage.
Right now it works like this:
- If --bucket is passed, we use that (better for manual runs)
- Otherwise we fall back to S3_BUCKET_PY
- If neither is set, it fails fast
I also didn’t hardcode a default bucket on purpose — didn’t want to risk accidentally writing to prod.
Could you have a quick look at the initialize_bucket() logic and confirm this makes sense? Mainly want to make sure it’s fine from your side and doesn’t cause any surprises.
# TODO: bucket mirror just to hold index used with CDN
# BUCKET_CDN = S3.Bucket('therock-nightly-python-testing')
- # TODO: bucket mirror just to hold index used with CDN
- # BUCKET_CDN = S3.Bucket('therock-nightly-python-testing')
We don't have this.
marbre
left a comment
Some additional comments from my agent:
Critical Issues
1. Missing lambda_function.py — the PR's primary goal is absent
The PR description says "Introduced a thin lambda_function.py wrapper to
handle S3 events" but the diff only modifies manage.py. The Lambda
integration layer never ships. Without it, this PR is a pure CLI refactor
with no Lambda functionality delivered.
2. with_metadata behavior regression — SHA256 checksums dropped from normal uploads
Before:

idx = S3Index.from_S3(prefix=prefix, with_metadata=True or args.compute_sha256)
# Note: `True or X` always evaluates True, so metadata was always fetched

After (update_pep503_index):

idx = S3Index.from_S3(prefix=prefix, with_metadata=compute_sha256)

Metadata (SHA256 checksums) is now only fetched when --compute-sha256
is passed. For normal upload runs, compute_sha256=False, so
with_metadata=False, meaning uploaded index.html files will have no
#sha256=... fragments. This silently breaks PEP 503 checksum links in the
package index.
The original True or args.compute_sha256 was itself a bug (always True),
but the fix should be with_metadata=True for normal upload paths, not
with_metadata=compute_sha256.
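The suggested fix can be condensed into a small helper (the helper name is hypothetical; the real change would set the flag inline in update_pep503_index):

```python
def resolve_with_metadata(upload: bool, compute_sha256: bool) -> bool:
    # Upload runs must always fetch metadata so the generated index.html
    # carries the PEP 503 #sha256=... fragments; dry runs only fetch it
    # when --compute-sha256 is passed explicitly.
    return True if upload else compute_sha256
```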
3. Uninitialized globals — Lambda invocations will crash with NameError
The original module-level assignments:

BUCKET_NAME = getenv("S3_BUCKET_PY", "therock-dev-python")
BUCKET = S3.Bucket(BUCKET_NAME)
INDEX_BUCKETS = {BUCKET}

...are removed and replaced with initialize_bucket(), which is only called
from main(). But BUCKET_NAME, BUCKET, and INDEX_BUCKETS are no longer
declared at module scope at all. Any code path (Lambda handler, import-time
side effects) that uses these before initialize_bucket() runs will get
NameError: name 'BUCKET_NAME' is not defined. The globals should at least
be declared as None at module level.
Moderate Issues
4. Loss of prefix validation / all behavior
Before: choices=PREFIXES + ["all"] enforced known prefixes; "all" was
treated as "process all prefixes."
After: nargs="?" with no choices — any string is accepted silently,
and "all" is now treated as a literal prefix name (will process a
non-existent all/ prefix). Old callers passing "all" will silently do the
wrong thing. At minimum, add a deprecation note or handle "all" explicitly.
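One way to restore the old validation and expand the "all" sentinel explicitly, sketched with placeholder prefix values:

```python
import argparse

PREFIXES = ["v2/", "v2-staging/"]  # placeholder values for illustration


def resolve_cli_prefixes(argv: list[str]) -> list[str]:
    """Validate the prefix argument and expand the legacy "all" sentinel."""
    parser = argparse.ArgumentParser()
    # choices rejects unknown prefixes instead of silently accepting any string
    parser.add_argument("prefix", nargs="?", choices=PREFIXES + ["all"])
    args = parser.parse_args(argv)
    if args.prefix is None or args.prefix == "all":
        # "all" means every known prefix, not a literal all/ prefix
        return list(PREFIXES)
    return [args.prefix]
```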
5. detect_prefixes_from_bucket uses BUCKET_NAME global
paginator.paginate(Bucket=BUCKET_NAME, ...)

Fine for CLI (because initialize_bucket() runs first in main()), but for
Lambda use it depends on call order. Given that lambda_function.py is
missing, this call order is unverified.
Minor Issues
6. update_pep503_index docstring is a comment, not a docstring
def update_pep503_index(prefix: str, compute_sha256: bool = False, upload: bool = True):
    # Regenerates the PEP 503 simple index...

Should use """...""" for consistency with other functions in the file.
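For consistency, the comment can simply become the docstring (body elided in this sketch):

```python
def update_pep503_index(prefix: str, compute_sha256: bool = False, upload: bool = True):
    """Regenerate the PEP 503 simple index for the given prefix.

    Used by both the CLI (via main()) and the AWS Lambda wrapper; the
    actual index-generation body is elided here.
    """
    ...
```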
7. No tests for new auto-detection path
resolve_prefixes() and detect_prefixes_from_bucket() have no unit or
integration test coverage.
8. Blank line removed between PREFIXES and CUSTOM_PREFIX
Cosmetic, inconsistent with surrounding style.
Summary
| Area | Status |
|---|---|
| Deliverable | lambda_function.py missing — core goal unmet |
| Regression | SHA256 checksums dropped from uploaded indexes |
| Stability | Uninitialized globals will crash non-CLI code paths |
| Compatibility | all prefix and choices validation silently removed |
The update_pep503_index() extraction and initialize_bucket() refactor are
in the right direction for Lambda support, but the PR needs: the Lambda file
itself, the with_metadata fix, module-level global declarations, and
handling of the all prefix case before it's mergeable.
@marbre Thanks a lot for pointing these out. I’ve already pushed updates addressing them:
For the auto-detection logic, I am running it locally and will add the outputs as a git gist of CLI test output for both v2/ and v2-staging/ in this PR to show it working as expected. I can add tests for these in a subsequent PR if that works. All of this is already committed in the latest revision.
marbre
left a comment
The PR description is still misleading:
Introduced a thin lambda_function.py wrapper to handle S3 events.
This is not part of the PR, thus it shouldn't be mentioned as part of it; rather, it looks like a missing file. It is okay to use an agent for your PRs, but you still own the code as well as any description. That said, please follow https://llvm.org/docs/AIToolPolicy.html#llvm-ai-tool-use-policy:
it is strongly recommended that contributors write PR descriptions themselves
| """ | ||
| NOTE: | ||
| This function is used both by the CLI (via main()) and by the AWS Lambda | ||
| wrapper (lambda_function.py). It must remain callable without relying on | ||
| CLI argument parsing or main() initialization. | ||
| """ |
| """ | |
| NOTE: | |
| This function is used both by the CLI (via main()) and by the AWS Lambda | |
| wrapper (lambda_function.py). It must remain callable without relying on | |
| CLI argument parsing or main() initialization. | |
| """ | |
| # NOTE: | |
| # This function is used both by the CLI (via main()) and by the AWS Lambda | |
| # wrapper (lambda_function.py). It must remain callable without relying on | |
| # CLI argument parsing or main() initialization. |
Should not be a string.
Could this also be used in the function docstring, wdyt?
# NOTE:
# initialize_bucket() is required for both CLI and Lambda usage.
# Do not move bucket initialization exclusively into main(), as the
# Lambda wrapper imports and calls update_pep503_index() directly.
Why not move this into the function description (currently there is none).
I think you forgot to delete this after adding it into the function. I see the same sentence in the docstring.
| "Base prefix for auto-detection (e.g. v2/, v2-staging/, v3/whl/). " | ||
| "Required when using --auto-detect-prefixes." |
Missing indentation:

- "Base prefix for auto-detection (e.g. v2/, v2-staging/, v3/whl/). "
- "Required when using --auto-detect-prefixes."
+ "Base prefix for auto-detection (e.g. v2/, v2-staging/, v3/whl/). "
+ "Required when using --auto-detect-prefixes."
@erman-gurses @HereThereBeDragons @amd-shiraz please review before I do a final round.
initialize_bucket() is required for both CLI and Lambda usage.
Do not move bucket initialization exclusively into main(), as the
Lambda wrapper imports and calls update_pep503_index() directly.
'''
Nit: indentation slightly off for '''
# Must be provided via --bucket (CLI) or S3_BUCKET_PY (env).

# Bucket globals configured via initialize_bucket()
BUCKET_NAME: Optional[str] = None
Based on style guide, we should not use Optional https://github.com/ROCm/TheRock/blob/main/docs/development/style_guides/python_style_guide.md#add-specific-type-hints-liberally
Updated the new annotations to use T | None per the style guide. Optional is still used elsewhere in the file, so I’ve kept the import to avoid unrelated refactors.
# NOTE:
# initialize_bucket() is required for both CLI and Lambda usage.
# Do not move bucket initialization exclusively into main(), as the
# Lambda wrapper imports and calls update_pep503_index() directly.
I think you forgot to delete this after adding it into the function. I see the same sentence in the docstring.
HereThereBeDragons
left a comment
Only some minor improvements. Please also have a look at Erman's comments.
@araravik-psd is this PR ready to land after approval? If yes, I can do another pass.
for obj in all_sorted_packages:
    full_package_name = path.basename(obj)
    package_name = full_package_name.split('-')[0]
    print(f"[DEBUG] Evaluating package: {package_name}")
Is this temporary? If not, we might remove it.
Yes, this should be removed
- print(f"[DEBUG] Evaluating package: {package_name}")
# Allow new ROCm multi-arch + device packages
ROCM_PACKAGE_PREFIXES = (
    "rocm_",
    "rocm-sdk",  # safety for normalized names
    "rocm_sdk_device",
    "amd_torch_device",
)
What's the value of having this in an extra list? Some of those are already in PACKAGE_ALLOW_LIST and amd_torch_device is actually not a ROCm package.
We further need to add amd_torchvision_device but the place here seems wrong.
Thanks @marbre. Removed the extra list and extended the existing allow-list logic instead. Added explicit handling for amd_torchvision_device* and similar dynamic packages.
# Allow legacy packages + explicitly handle dynamic multi-arch packages
if (
    pkg in PACKAGE_ALLOW_LIST
    or pkg.startswith("rocm_sdk_device_")
why not add those to PACKAGE_ALLOW_LIST?
These are dynamic arch-suffixed packages (_gfx), so adding them to PACKAGE_ALLOW_LIST would require enumerating all variants. Using startswith avoids needing updates for every new architecture. That was my thinking here.
If this is the case how have the rocm-sdk-libraries-gfx* packages ever worked?
Good catch. Looking at the actual wheel names again, the single-arch rocm_sdk_libraries_gfx* packages are also dynamically suffixed, similar to the rocm_sdk_device_* wheels.
I am testing this now without the startswith() handling to verify whether the existing normalization/filtering logic already covers these correctly before keeping the additional prefix checks.
@marbre @HereThereBeDragons I re-checked the original logic and rocm_sdk_libraries_gfx* packages were already being special-cased via:
match(r"rocm_sdk_libraries_gfx", package_name.lower())
https://github.com/ROCm/TheRock/blob/main/build_tools/third_party/s3_management/manage.py#L216
So they were never handled through PACKAGE_ALLOW_LIST.
The new startswith() handling is effectively extending the same pattern to the newer dynamically suffixed multi-arch package families like rocm_sdk_device_* and amd_torch_device_*.
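The combined check being discussed could look like the sketch below (the allow-list contents and the exact pattern are illustrative; the real script matches rocm_sdk_libraries_gfx via re.match):

```python
from re import match

PACKAGE_ALLOW_LIST = {"torch", "rocm"}  # illustrative subset only

# Dynamically arch-suffixed families are matched by prefix instead of
# enumerating every gfx variant in the allow-list.
DYNAMIC_FAMILY_PATTERN = (
    r"^(rocm_sdk_libraries_gfx"
    r"|rocm_sdk_device_"
    r"|amd_torch_device_"
    r"|amd_torchvision_device_)"
)


def is_allowed_package(package_name: str) -> bool:
    pkg = package_name.lower()
    return pkg in PACKAGE_ALLOW_LIST or match(DYNAMIC_FAMILY_PATTERN, pkg) is not None
```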
This reverts commit c784a0f.
445bb8c to
c4f34fe
Compare
marbre
left a comment
There are still a few minor issues to address. @araravik-psd please follow up and let's land this latest tomorrow.
for obj in all_sorted_packages:
    full_package_name = path.basename(obj)
    package_name = full_package_name.split('-')[0]
    print(f"[DEBUG] Evaluating package: {package_name}")
Yes, this should be removed
- print(f"[DEBUG] Evaluating package: {package_name}")
# Allow legacy packages + explicitly handle dynamic multi-arch packages
if (
    pkg in PACKAGE_ALLOW_LIST
    or pkg.startswith("rocm_sdk_device_")
If this is the case how have the rocm-sdk-libraries-gfx* packages ever worked?
5471a43 to
2360e78
Compare

This PR refactors the S3 index generation logic to support event-driven execution via AWS Lambda, in addition to existing CLI/CI workflows.
Key Changes
Result
The Lambda wrapper PR is here:
https://github.com/ROCm/TheRock-Infra/pull/107/changes
Test results for newly added auto-detection logic:
https://gist.github.com/araravik-psd/4f2869e071ea1ee384cd3587f31a6660