Add download files to example Python Large Output Files#503

Open
saimanikant wants to merge 5 commits into main from mguntupa/example

Conversation

@saimanikant

Collaborator

Description

Please provide a brief description of the changes in this pull request.

Checklist

  • I have tested these changes locally.
  • I have added unit tests (if appropriate).
  • I have added necessary documentation or updated existing documentation.
  • I have linked the issue(s) addressed by this PR if any.

@codecov

codecov Bot commented Nov 28, 2024

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.45%. Comparing base (b7f1e48) to head (cd35015).
⚠️ Report is 194 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #503   +/-   ##
=======================================
  Coverage   92.45%   92.45%           
=======================================
  Files          64       64           
  Lines        2599     2599           
=======================================
  Hits         2403     2403           
  Misses        196      196           

Comment on lines +23 to +25
"""
Example to query resources from a project.

- Query values from evaluated jobs, computing some simple statistics on parameter values.
- Download files from the project

"""

Contributor

Please update the docstring; this example doesn't query any statistics.

log.info(
    f"=== Example 1: Downloading output files of {num} jobs using ProjectApi.download_file()"
)
for job in jobs[0:num]:

Contributor

Suggested change
- for job in jobs[0:num]:
+ for job in jobs:

num = len(jobs)

log.info(
    f"=== Example 1: Downloading output files of {num} jobs using ProjectApi.download_file()"

Contributor

There's no Example 2, so please clean up the message.

for f in files:
    fpath = os.path.join(out_path, f"task_{task.id}")
    log.info(f"Download output file {f.evaluation_path} to {fpath}")
    start = time.process_time()

Contributor

Best to use the same timing function you use two lines later; otherwise you get meaningless numbers.

Suggested change
- start = time.process_time()
+ start = time.time()
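
To illustrate why mixing the two clocks gives meaningless numbers, here is a minimal standalone sketch (not part of the PR). It assumes the download is mostly I/O wait, which `time.sleep()` stands in for. `time.process_time()` counts only CPU time of the current process and excludes sleeping, while `time.time()` measures wall-clock time.

```python
import time


def timed(clock, seconds=0.2):
    """Return the elapsed time reported by `clock` around a short sleep."""
    start = clock()
    time.sleep(seconds)  # stands in for a network-bound download
    return clock() - start


# Same workload, measured with the two different clocks.
wall = timed(time.time)          # wall-clock seconds, includes the wait
cpu = timed(time.process_time)   # CPU seconds, excludes the wait

print(f"wall clock: {wall:.3f}s, CPU time: {cpu:.3f}s")
```

Subtracting a `process_time()` start from a `time()` end, as the current code would, compares values from unrelated clocks. `time.perf_counter()` is another option for measuring intervals, as long as it is used for both the start and the end.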

args = parser.parse_args()

logger = logging.getLogger()
logging.basicConfig(format="%(message)s", level=logging.DEBUG)

Contributor

Maybe default the log level to INFO to avoid the many DT client debug messages.
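
One possible way to do this, as a sketch only: expose the level as a CLI option that defaults to INFO. The `--log-level` option name and both helper functions are hypothetical and do not appear in the PR.

```python
import argparse
import logging


def build_parser():
    """Build a parser with a hypothetical --log-level option (default INFO)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--log-level",  # assumed option name, not from the PR
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    )
    return parser


def configure_logging(level_name):
    """Configure the root logger and return the numeric level that was set."""
    level = getattr(logging, level_name)
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(format="%(message)s", level=level, force=True)
    return level


args = build_parser().parse_args([])  # empty argv: fall back to the default
configure_logging(args.log_level)
```

Debug output from the client libraries would then appear only when the script is run with `--log-level DEBUG`.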


def download_files(client, project_name):
    """Download files."""
    out_path = os.path.join(os.path.dirname(__file__), "downloads")

Contributor

I'd make the download directory a configurable CLI argument and default it to a temporary directory when unset, since the downloaded files are of no further use.
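
A rough sketch of what that could look like, assuming an `argparse`-based script. The `--download-dir` option and the `resolve_download_dir` helper are hypothetical names chosen for illustration.

```python
import argparse
import tempfile
from pathlib import Path


def resolve_download_dir(argv=None):
    """Return the directory to download into, creating it if needed."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--download-dir", default=None)  # assumed option name
    args = parser.parse_args(argv)

    if args.download_dir is None:
        # No directory given: fall back to a fresh temporary directory.
        return Path(tempfile.mkdtemp(prefix="downloads_"))

    path = Path(args.download_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path


out_path = resolve_download_dir([])  # empty argv: uses the temp-dir default
print(f"Downloading to {out_path}")
```

Note that `tempfile.mkdtemp()` leaves cleanup to the caller. Wrapping the download loop in `tempfile.TemporaryDirectory()` would remove the files automatically when the example finishes.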

for f in files:
    fpath = os.path.join(out_path, f"task_{task.id}")
    log.info(f"Download output file {f.evaluation_path} to {fpath}")
    start = time.process_time()

Contributor

The first timing won't be accurate because it will include initialization of the DT client. Is there a way to force that to happen outside of the loop?
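
The general pattern being asked about can be sketched as follows. Whether the real client exposes a way to trigger its initialization is not known from this thread, so `LazyClient` and its `ensure_ready()` method are purely hypothetical stand-ins that simulate a one-time setup cost.

```python
import time


class LazyClient:
    """Hypothetical client that pays a one-time setup cost on first use."""

    def __init__(self):
        self._ready = False

    def ensure_ready(self):
        if not self._ready:
            time.sleep(0.2)  # simulated one-time initialization
            self._ready = True

    def download(self, name):
        self.ensure_ready()
        time.sleep(0.01)  # simulated transfer
        return name


def timed_downloads(client, names, warm_up=True):
    """Time each download, optionally initializing the client beforehand."""
    if warm_up:
        client.ensure_ready()  # pay the setup cost outside the timed loop
    durations = []
    for name in names:
        start = time.time()
        client.download(name)
        durations.append(time.time() - start)
    return durations


warm = timed_downloads(LazyClient(), ["a", "b"])
cold = timed_downloads(LazyClient(), ["a", "b"], warm_up=False)
print(f"first download, warmed up: {warm[0]:.3f}s, cold: {cold[0]:.3f}s")
```

If no such hook exists, an alternative is to report the first measurement separately or exclude it from any averages.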

2 participants