
Enable event-driven Lambda execution for S3 Python index generation #3737

Open

araravik-psd wants to merge 15 commits into main from users/arravikum/manage_lambda_changes

Conversation

@araravik-psd (Contributor) commented Mar 3, 2026

This PR refactors the S3 index generation logic to support event-driven execution via AWS Lambda, in addition to existing CLI/CI workflows.

Key Changes

  1. Refactored manage.py to expose reusable index update logic.
  2. Regenerated indexes dynamically based on the uploaded/deleted object prefix.
  3. Removed reliance on static PREFIXES for Lambda execution.
  4. Made the S3_BUCKET_PY environment variable mandatory for safer bucket targeting.
  5. Improved Lambda observability by returning the processed prefix and object count.
  6. Ensured recursion protection by skipping index.html events (see the sketch after this list).
  7. Added support for .whl, .zip, and .tar.gz artifacts.
  8. Updated package selection to include changes from the multi-arch package structure.
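
For context, a minimal sketch of the event-driven flow this enables, assuming a standard S3 put/delete notification payload. The actual wrapper is not part of this PR (it lives in the TheRock-Infra PR linked below), and initialize_bucket()'s exact signature is an assumption:

import posixpath

import manage  # the refactored build_tools/third_party/s3_management/manage.py


def lambda_handler(event, context):
    processed = []
    for record in event.get("Records", []):
        # Real S3 events URL-encode keys; decoding is omitted in this sketch.
        key = record["s3"]["object"]["key"]
        # Recursion protection: regenerating an index writes index.html,
        # which would re-trigger this Lambda without this guard.
        if key.endswith("index.html"):
            continue
        # Derive the index prefix from the uploaded/deleted object key
        # instead of relying on a static PREFIXES list.
        prefix = posixpath.dirname(key) + "/"
        manage.initialize_bucket()  # assumed to resolve the mandatory S3_BUCKET_PY env var
        manage.update_pep503_index(prefix=prefix)
        processed.append(prefix)
    # Return the processed prefixes and a count for observability.
    return {"prefixes": processed, "count": len(processed)}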

Result

  • Clear separation between indexing logic and infrastructure.
  • Cleaner and more maintainable long term.
  • Reusable logic for both CI and server-side execution.
  • Enables S3 retention policy compatibility (index regenerates on object deletion).

The Lambda wrapper PR is here:

https://github.com/ROCm/TheRock-Infra/pull/107/changes

Test results for newly added auto-detection logic:

https://gist.github.com/araravik-psd/4f2869e071ea1ee384cd3587f31a6660

@araravik-psd (Contributor, Author) commented

@amd-shiraz Can you point me to the location in the infra repo where I could save a copy of the Lambda function wrapper used here?

CC: @marbre @HereThereBeDragons

@amd-shiraz (Contributor) commented Mar 4, 2026

> @amd-shiraz Can you point me to the location in the infra repo where I could save a copy of the Lambda function wrapper used here?

Please put the scripts inside tools: https://github.com/ROCm/TheRock-Infra/tree/main/tools (create a folder if you think multiple scripts will go in for this). Terraform-related Lambda code goes under projects: https://github.com/ROCm/TheRock-Infra/tree/main/projects, and base modules go under modules: https://github.com/ROCm/TheRock-Infra/tree/main/modules/aws.

@HereThereBeDragons (Contributor) left a comment

As you are already touching the file, it would be good to do a proper freshening up of the file and improve the overall QoL when using it :)

# bucket for TheRock
# We also manage `therock-nightly-python` (not the default to make the script safer to test)
BUCKET_NAME = getenv("S3_BUCKET_PY", "therock-dev-python")
BUCKET_NAME = getenv("S3_BUCKET_PY")
Contributor:

As this is a bit of a cleanup, I personally would appreciate it if this could also be supported by a CLI arg --bucket?

unless @ScottTodd has a different opinion?

@marbre (Member) commented Mar 4, 2026

This has its origin in the upstream script. It depends on how the Lambda wrapper script will handle it and pass arguments. In general I prefer passing explicit arguments over using environment variables. However, this can also be a follow-up; command line args are preferred for manual use.

This would, however, mean updating our workflows as well. I'd therefore rather do it in a follow-up once we've switched over. Less refactoring (as calling the script in workflows will be dropped anyway).

Contributor Author:

Good points. I added --bucket, but kept S3_BUCKET_PY as a fallback so we don’t break any existing workflows or Lambda usage.

Right now it works like this:

  1. If --bucket is passed, we use that (better for manual runs)
  2. Otherwise we fall back to S3_BUCKET_PY
  3. If neither is set, it fails fast

I also didn’t hardcode a default bucket on purpose — didn’t want to risk accidentally writing to prod.

Could you have a quick look at the initialize_bucket() logic and confirm this makes sense? Mainly want to make sure it’s fine from your side and doesn’t cause any surprises.
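
For reference, a minimal sketch of that resolution order, assuming initialize_bucket() receives the parsed --bucket value (the exact signature in manage.py may differ):

# Sketch of the --bucket / S3_BUCKET_PY resolution order described above;
# the real initialize_bucket() in manage.py may differ in signature.
from os import getenv

import boto3

S3 = boto3.resource("s3")

BUCKET_NAME: str | None = None
BUCKET = None
INDEX_BUCKETS: set = set()


def initialize_bucket(cli_bucket: str | None = None) -> None:
    global BUCKET_NAME, BUCKET, INDEX_BUCKETS
    # 1. --bucket wins if passed (better for manual runs).
    # 2. Otherwise fall back to the S3_BUCKET_PY environment variable.
    bucket_name = cli_bucket or getenv("S3_BUCKET_PY")
    # 3. No hardcoded default: fail fast rather than risk writing to prod.
    if not bucket_name:
        raise RuntimeError("Pass --bucket or set S3_BUCKET_PY")
    BUCKET_NAME = bucket_name
    BUCKET = S3.Bucket(BUCKET_NAME)
    INDEX_BUCKETS = {BUCKET}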

Comment on lines +42 to +43
# TODO: bucket mirror just to hold index used with CDN
# BUCKET_CDN = S3.Bucket('therock-nightly-python-testing')
Member:

Suggested change
# TODO: bucket mirror just to hold index used with CDN
# BUCKET_CDN = S3.Bucket('therock-nightly-python-testing')

We don't have this.

@araravik-psd requested a review from marbre, March 16, 2026 21:15
@marbre (Member) left a comment

Some additional comments from my agent:


Critical Issues

1. Missing lambda_function.py — the PR's primary goal is absent

The PR description says "Introduced a thin lambda_function.py wrapper to
handle S3 events" but the diff only modifies manage.py. The Lambda
integration layer never ships. Without it, this PR is a pure CLI refactor
with no Lambda functionality delivered.

2. with_metadata behavior regression — SHA256 checksums dropped from normal uploads

Before:

idx = S3Index.from_S3(prefix=prefix, with_metadata=True or args.compute_sha256)
# Note: `True or X` always evaluates True — metadata always fetched

After (update_pep503_index):

idx = S3Index.from_S3(prefix=prefix, with_metadata=compute_sha256)

Metadata (SHA256 checksums) is now only fetched when --compute-sha256
is passed. For normal upload runs, compute_sha256=False, so
with_metadata=False, meaning uploaded index.html files will have no
#sha256=... fragments. This silently breaks PEP 503 checksum links in the
package index.

The original True or args.compute_sha256 was itself a bug (always True),
but the fix should be with_metadata=True for normal upload paths, not
with_metadata=compute_sha256.
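
For illustration, one shape the fix could take; the S3Index stub below stands in for the real class in manage.py, and honoring compute_sha256 on non-upload runs is an assumption:

class S3Index:  # stub standing in for the real class in manage.py
    @classmethod
    def from_S3(cls, prefix, with_metadata):
        print(f"fetching {prefix} (with_metadata={with_metadata})")
        return cls()


def update_pep503_index(prefix: str, compute_sha256: bool = False, upload: bool = True):
    # Upload runs always fetch metadata so the generated index.html keeps
    # its PEP 503 #sha256=... fragments; non-upload runs opt in explicitly.
    with_metadata = True if upload else compute_sha256
    return S3Index.from_S3(prefix=prefix, with_metadata=with_metadata)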

3. Uninitialized globals — Lambda invocations will crash with NameError

The original module-level assignments:

BUCKET_NAME = getenv("S3_BUCKET_PY", "therock-dev-python")
BUCKET = S3.Bucket(BUCKET_NAME)
INDEX_BUCKETS = {BUCKET}

...are removed and replaced with initialize_bucket(), which is only called
from main(). But BUCKET_NAME, BUCKET, and INDEX_BUCKETS are no longer
declared at module scope at all. Any code path (Lambda handler, import-time
side effects) that uses these before initialize_bucket() runs will get
NameError: name 'BUCKET_NAME' is not defined. The globals should at least
be declared as None at module level.


Moderate Issues

4. Loss of prefix validation / all behavior

Before: choices=PREFIXES + ["all"] enforced known prefixes; "all" was
treated as "process all prefixes."

After: nargs="?" with no choices — any string is accepted silently,
and "all" is now treated as a literal prefix name (will process a
non-existent all/ prefix). Old callers passing "all" will silently do the
wrong thing. At minimum, add a deprecation note or handle "all" explicitly.

5. detect_prefixes_from_bucket uses BUCKET_NAME global

paginator.paginate(Bucket=BUCKET_NAME, ...)

Fine for CLI (because initialize_bucket() runs first in main()), but for
Lambda use it depends on call order. Given that lambda_function.py is
missing, this call order is unverified.
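
For illustration, a sketch addressing both points: handle "all" explicitly and pass the bucket name to detect_prefixes_from_bucket() rather than relying on a global and call order. Both helper signatures are assumed from the names quoted in this review:

import boto3

CLIENT = boto3.client("s3")  # assumed module-level client


def detect_prefixes_from_bucket(bucket_name: str) -> list[str]:
    # Taking the bucket name explicitly means Lambda callers do not depend
    # on initialize_bucket() having populated the BUCKET_NAME global first.
    paginator = CLIENT.get_paginator("list_objects_v2")
    prefixes: list[str] = []
    for page in paginator.paginate(Bucket=bucket_name, Delimiter="/"):
        prefixes.extend(p["Prefix"] for p in page.get("CommonPrefixes", []))
    return prefixes


def resolve_prefixes(prefix_arg: str | None, bucket_name: str) -> list[str]:
    if prefix_arg is None or prefix_arg == "all":
        # Preserve the legacy behavior: "all" means every detected prefix,
        # never a literal all/ prefix in the bucket.
        return detect_prefixes_from_bucket(bucket_name)
    return [prefix_arg]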


Minor Issues

6. update_pep503_index docstring is a comment, not a docstring

def update_pep503_index(prefix: str, compute_sha256: bool = False, upload: bool = True):
    # Regenerates the PEP 503 simple index...

Should use """...""" for consistency with other functions in the file.

7. No tests for new auto-detection path

resolve_prefixes() and detect_prefixes_from_bucket() have no unit or
integration test coverage.

8. Blank line removed between PREFIXES and CUSTOM_PREFIX

Cosmetic, inconsistent with surrounding style.


Summary

Area           Status
Deliverable    lambda_function.py missing — core goal unmet
Regression     SHA256 checksums dropped from uploaded indexes
Stability      Uninitialized globals will crash non-CLI code paths
Compatibility  `all` prefix and `choices` validation silently removed

The update_pep503_index() extraction and initialize_bucket() refactor are
in the right direction for Lambda support, but the PR needs: the Lambda file
itself, the with_metadata fix, module-level global declarations, and
handling of the all prefix case before it's mergeable.

@araravik-psd changed the title from "Move S3 Python index generation to event-driven Lambda with thin wrapper architecture" to "Enable event-driven Lambda execution for S3 Python index generation", Mar 25, 2026
@araravik-psd (Contributor, Author) commented Mar 25, 2026

@marbre Thanks a lot for pointing these out.

I’ve already pushed updates addressing them:

  1. Switched back to with_metadata=True so SHA256 fragments are preserved during normal uploads.
  2. Reintroduced safe module-level initialization for BUCKET_NAME, BUCKET, and INDEX_BUCKETS to avoid any NameError before initialize_bucket() runs.
  3. Added back explicit handling for "all" to keep previous behavior intact.
  4. Converted the inline comment into a proper docstring for consistency.
  5. Fixed a small bug in the S3 class and added details about it in the description.

For the auto-detection logic, I am running it locally and will add the CLI test output for both v2/ and v2-staging/ as a gist in this PR to show it working as expected. I can add tests for these in a subsequent PR if that works.

All of this is already committed in the latest revision.

@araravik-psd requested a review from marbre, March 25, 2026 23:53
@marbre (Member) left a comment

The PR description is still misleading:

> Introduced a thin lambda_function.py wrapper to handle S3 events.

This is not part of the PR, so it shouldn't be mentioned as being part of it. Rather, it looks like a missing file. It is okay to use an agent for your PRs, but you still own the code as well as any description. That said, please follow https://llvm.org/docs/AIToolPolicy.html#llvm-ai-tool-use-policy:

> it is strongly recommended that contributors write PR descriptions themselves

Comment on lines +489 to +494
"""
NOTE:
This function is used both by the CLI (via main()) and by the AWS Lambda
wrapper (lambda_function.py). It must remain callable without relying on
CLI argument parsing or main() initialization.
"""
Member:

Suggested change
"""
NOTE:
This function is used both by the CLI (via main()) and by the AWS Lambda
wrapper (lambda_function.py). It must remain callable without relying on
CLI argument parsing or main() initialization.
"""
# NOTE:
# This function is used both by the CLI (via main()) and by the AWS Lambda
# wrapper (lambda_function.py). It must remain callable without relying on
# CLI argument parsing or main() initialization.

Should not be a string.

Member:

Could this also be used in the function docstring, wdyt?

Comment on lines +47 to +50
# NOTE:
# initialize_bucket() is required for both CLI and Lambda usage.
# Do not move bucket initialization exclusively into main(), as the
# Lambda wrapper imports and calls update_pep503_index() directly.
Member:

Why not move this into the function description (currently there is none)?

Contributor:

I think you forgot to delete this after adding it into the function; I see the same sentence in the docstring.

Comment on lines +539 to +540
"Base prefix for auto-detection (e.g. v2/, v2-staging/, v3/whl/). "
"Required when using --auto-detect-prefixes."
Member:

Missing indentation:

Suggested change
"Base prefix for auto-detection (e.g. v2/, v2-staging/, v3/whl/). "
"Required when using --auto-detect-prefixes."
    "Base prefix for auto-detection (e.g. v2/, v2-staging/, v3/whl/). "
    "Required when using --auto-detect-prefixes."

@araravik-psd requested a review from marbre, March 30, 2026 14:36
@marbre (Member) commented Apr 1, 2026

@erman-gurses @HereThereBeDragons @amd-shiraz please review before I do a final round.

@erman-gurses (Contributor) left a comment

Added some comments - they are minor issues in general.

initialize_bucket() is required for both CLI and Lambda usage.
Do not move bucket initialization exclusively into main(), as the
Lambda wrapper imports and calls update_pep503_index() directly.
'''
Contributor:

Nit: indentation slightly off for '''

# Must be provided via --bucket (CLI) or S3_BUCKET_PY (env).

# Bucket globals configured via initialize_bucket()
BUCKET_NAME: Optional[str] = None

Contributor Author:

Updated the new annotations to use T | None per the style guide. Optional is still used elsewhere in the file, so I’ve kept the import to avoid unrelated refactors.

@HereThereBeDragons (Contributor) left a comment

Only some minor improvements. Please also have a look at erman's comments.

@amd-shiraz (Contributor) commented

@araravik-psd is this PR ready to land after approval? If yes, I can do another pass.

@erman-gurses (Contributor) left a comment

LGTM, only some minor issues and questions.
Also, did we test it fully after the new changes were applied?

for obj in all_sorted_packages:
    full_package_name = path.basename(obj)
    package_name = full_package_name.split('-')[0]
    print(f"[DEBUG] Evaluating package: {package_name}")
Contributor:

Is this temporary? If not, we might remove it.

Member:

Yes, this should be removed

Suggested change
print(f"[DEBUG] Evaluating package: {package_name}")

Comment on lines +183 to +189
# Allow new ROCm multi-arch + device packages
ROCM_PACKAGE_PREFIXES = (
    "rocm_",
    "rocm-sdk",  # safety for normalized names
    "rocm_sdk_device",
    "amd_torch_device",
)
Member:

What's the value of having this in an extra list? Some of those are already in PACKAGE_ALLOW_LIST and amd_torch_device is actually not a ROCm package.

Member:

We further need to add amd_torchvision_device but the place here seems wrong.

Contributor Author:

Thanks @marbre. Removed the extra list and extended the existing allow-list logic instead. Added explicit handling for amd_torchvision_device* and similar dynamic packages.

@araravik-psd requested a review from marbre, April 28, 2026 16:11
# Allow legacy packages + explicitly handle dynamic multi-arch packages
if (
    pkg in PACKAGE_ALLOW_LIST
    or pkg.startswith("rocm_sdk_device_")
Contributor:

Why not add those to PACKAGE_ALLOW_LIST?

Contributor Author:

These are dynamic arch-suffixed packages (_gfx*), so adding them to PACKAGE_ALLOW_LIST would require enumerating all variants. Using startswith avoids needing updates for every new architecture. That was my thinking here.

Member:

If this is the case how have the rocm-sdk-libraries-gfx* packages ever worked?

Contributor Author:

Good catch. Looking at the actual wheel names again, the single-arch rocm_sdk_libraries_gfx* packages are also dynamically suffixed, similar to the rocm_sdk_device_* wheels.

I am testing this now without the startswith() handling to verify whether the existing normalization/filtering logic already covers these correctly before keeping the additional prefix checks.

Contributor Author:

@marbre @HereThereBeDragons I re-checked the original logic and rocm_sdk_libraries_gfx* packages were already being special-cased via:

match(r"rocm_sdk_libraries_gfx", package_name.lower())
https://github.com/ROCm/TheRock/blob/main/build_tools/third_party/s3_management/manage.py#L216

So they were never handled through PACKAGE_ALLOW_LIST.

The new startswith() handling is effectively extending the same pattern to the newer dynamically suffixed multi-arch package families like rocm_sdk_device_* and amd_torch_device_*.
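
For illustration, a sketch of the unified selection this converges on; the DYNAMIC_PACKAGE_PREFIXES grouping and the helper name are assumptions, not the final manage.py code:

# Hypothetical sketch; the grouping and helper are illustrative only.
PACKAGE_ALLOW_LIST: set[str] = set()  # populated elsewhere in manage.py

DYNAMIC_PACKAGE_PREFIXES = (
    "rocm_sdk_libraries_gfx",  # legacy regex special case
    "rocm_sdk_device_",
    "amd_torch_device_",
    "amd_torchvision_device_",
)


def is_allowed_package(package_name: str) -> bool:
    pkg = package_name.lower()
    # str.startswith accepts a tuple, so one check covers all dynamic families.
    return pkg in PACKAGE_ALLOW_LIST or pkg.startswith(DYNAMIC_PACKAGE_PREFIXES)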

@marbre force-pushed the users/arravikum/manage_lambda_changes branch from 445bb8c to c4f34fe, April 29, 2026 19:43
@marbre (Member) left a comment

The currently deployed Lambda creates duplicates:

[image]

This was fixed with commit 319acc7 but the branch was never updated. I rebased and force pushed as we need to re-deploy the updated Lambda asap.

@marbre (Member) left a comment

There are still a few minor issues to address. @araravik-psd please follow up, and let's land this tomorrow at the latest.

@araravik-psd force-pushed the users/arravikum/manage_lambda_changes branch from 5471a43 to 2360e78, May 7, 2026 15:11