diff --git a/docs/examples/getting_started/Dockerfile b/docs/examples/getting_started/Dockerfile
index 1ca9458..de1fc9b 100644
--- a/docs/examples/getting_started/Dockerfile
+++ b/docs/examples/getting_started/Dockerfile
@@ -1,2 +1,11 @@
-FROM python:3.11-slim
-RUN pip install pandas scikit-learn duckdb
+# Use the official Python 3.11 image
+FROM python:3.11
+
+# Set the working directory inside the container
+WORKDIR /app
+
+# Install Python dependencies
+RUN pip install --no-cache-dir pandas duckdb
+
+# Define the command to run your application
+CMD ["python3", "-c", "print('Hello!')"]
diff --git a/docs/examples/getting_started/cloudclient_walkthrough.ipynb b/docs/examples/getting_started/cloudclient_walkthrough.ipynb
index dea8f9f..1ae91a8 100644
--- a/docs/examples/getting_started/cloudclient_walkthrough.ipynb
+++ b/docs/examples/getting_started/cloudclient_walkthrough.ipynb
@@ -38,7 +38,7 @@
    "id": "b60f671d",
    "metadata": {},
    "source": [
-    "The initialization below is the simplest way to create and instance of the `CloudClient` class. If a variable called AZURE_KEYVAULT_NAME is saved to your environment, the `CloudClient` will initialize based on some Azure values stored in the Key Vault. Otherwise it will use environment variables or values stored in a .env file to authenticate, like the .env file stored [here](../../files/sample.env), and a managed identity credential based on your local working environment. The .env file should be stored at the same level in the directory in which you're working."
+    "The initialization below is the simplest way to create an instance of the `CloudClient` class. If a variable called AZURE_KEYVAULT_NAME is saved to your environment, the `CloudClient` will initialize based on some Azure values stored in the Key Vault. Otherwise it will use environment variables or values stored in a .env file to authenticate, like the .env file stored [here](../../files/sample.env), and a managed identity credential based on your local working environment. The .env file should be stored at the same level in the directory in which you're working. **Make sure to update your .env file based on the sample with values relevant to your Azure environment.**"
    ]
   },
   {
@@ -61,7 +61,7 @@
    "id": "a2643149",
    "metadata": {},
    "source": [
-    "We could also specify the Key Vault directly."
+    "We could also specify the Key Vault directly. If a Key Vault is specified, a .env file is no longer needed. This is the easiest way to authenticate using CFA's Key Vault."
    ]
   },
   {
@@ -139,7 +139,24 @@
    "\n",
    "There are plenty of times when local files would need to be uploaded to Blob Storage. Files can be referenced from within a running job via a mount in the pool. Scripts in Blob Storage can also be referenced in the command line for the task execution.\n",
    "\n",
-    "For example, we have the `main.py` file that we want to upload to the Blob container 'input-test' in order to use it for a future task. The following code will upload to the root of the specified container."
+    "For example, we have the `main.py` file that we want to upload to the Blob container 'input-test' in order to use it for a future task. The following code will upload to the root of the specified container. *Note that the container must already exist in Blob Storage.*\n",
+    "\n",
+    "For experimentation, you should create a new testing container (like \"input-test-\" for example) and be sure not to overwrite anything important."
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fe7a4c5b",
+   "metadata": {
+    "vscode": {
+     "languageId": "plaintext"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# uncomment below and input your username to create a new blob container\n",
+    "#cc.create_blob_container(\"input-test-\")"
+   ]
+  },
   {
@@ -153,6 +170,7 @@
    },
    "outputs": [],
    "source": [
+    "# upload main.py to container \"input-test\"\n",
    "cc.upload_files(\n",
    "    \"main.py\",\n",
    "    container_name = \"input-test\"\n",
@@ -164,9 +182,11 @@
    "id": "1f65c7b6",
    "metadata": {},
    "source": [
-    "## Upload Image to Container Registry\n",
+    "## Upload Image to Azure Container Registry\n",
+    "\n",
+    "Batch pools can use images from Azure Container Registry, GitHub Container Registry, or Docker Hub. Suppose we want to package a local Dockerfile (a python image with a few requirements) and upload it to the Azure Container Registry for use by the pool. The following code would do the trick if your Dockerfile exists at the root of your working directory; otherwise, you can specify the path to the Dockerfile. Make sure to reference the correct registry name.\n",
    "\n",
-    "Batch pools can use images from Azure Container Registry, GitHub Container Registry, or Docker Hub. Suppose we want to package the local Dockerfile (python image with a few requirements) and upload to the Azure Container Registry for use by the pool. The following code would do the trick. Make sure to reference the correct registry name."
+    "Your Dockerfile can be the same Dockerfile for running your code in a container locally. See the [Docker Docs](https://docs.docker.com/) for help getting started with Docker. You can also find an example python Dockerfile [here](./Dockerfile)."
    ]
   },
   {
@@ -183,7 +203,8 @@
    "container_name = cc.package_and_upload_dockerfile(\n",
    "    registry_name = \"my_azure_registry\",\n",
    "    repo_name = \"simple_test\",\n",
-    "    tag = \"latest\"\n",
+    "    tag = \"latest\",\n",
+    "    path_to_dockerfile = \"./Dockerfile\" #this line only needed if Dockerfile not at root of working directory\n",
    ")"
    ]
   },
   {
@@ -194,7 +215,13 @@
    "source": [
    "## Create a Pool\n",
    "\n",
-    "Pools are usually created for each team or per project. It spins up nodes when necessary based on the container you specify. The following would create a pool based on the Docker image we just uploaded, autoscaling to 5 nodes, mounting to the 'input-test' container we uploaded to, an 8 core CPU, and call it 'getting-started-pool'. "
+    "Pools are usually created for each team or per project. A pool spins up nodes when necessary based on the container you specify. \n",
+    "\n",
+    "It's at this point that we specify which Blob Containers we mount to the pool. This will make blobs in Blob Storage readable and writable for the containers that we mount. The mounts are then accessible in your code at the root of the node, i.e. a mounted container called 'input-test' will be accessible in your code via `/input-test`. \n",
+    "\n",
+    "The following would create a pool based on the Docker image we just uploaded, autoscaling to 5 nodes, mounting the 'input-test' container to which we uploaded, using an 8 core CPU, and calling it 'getting-started-pool'. \n",
+    "\n",
+    "You could also specify vm_size from a list of xsmall, small, medium, large, and xlarge. These will use 2, 4, 8, 16, or 32 cores, respectively."
    ]
   },
   {
@@ -224,7 +251,7 @@
    "source": [
    "## Create a Job\n",
    "\n",
-    "Now we can create a job to run our set of tasks. Let's call it 'getting-started-job'."
+    "Now we can create a job to run our set of tasks. Let's call it 'getting-started-job'. Jobs are meant to capture all the tasks for one goal. For example, if we wanted to run a model for each state and then compile the outputs, this whole process would make up one job. Each model run and the compilation would be an individual task in the job."
    ]
   },
   {
@@ -252,7 +279,9 @@
    "source": [
    "## Add Tasks to Job\n",
    "\n",
-    "At this point we are ready to add tasks to the job we created. We can run the `main.py` python script that we uploaded to the 'input-test' container. It takes an argument called '--user' and prints a welcome message to the console. We will add two tasks to our job for two different users. In general, any number of tasks can be added to a job."
+    "At this point we are ready to add tasks to the job we created. We can run the `main.py` python script that we uploaded to the 'input-test' container. It takes an argument called '--user' and prints a welcome message to the console. We will add two tasks to our job for two different users. In general, any number of tasks can be added to a job.\n",
+    "\n",
+    "Notice that we reference the mounted Blob Container with /input-test (the leading / is important)."
    ]
   },
   {
diff --git a/docs/files/sp_sample.env b/docs/files/sp_sample.env
index 4a5f24b..aa25fb1 100644
--- a/docs/files/sp_sample.env
+++ b/docs/files/sp_sample.env
@@ -2,7 +2,6 @@
 AZURE_TENANT_ID="your azure tenant id"
 AZURE_SUBSCRIPTION_ID="your subscription id"
 AZURE_CLIENT_ID="your azure service principal client id"
-AZURE_SP_CLIENT_ID="your azure service principal client id"
 AZURE_CLIENT_SECRET="your client secret" #pragma: allowlist secret
 
 # Azure account info
diff --git a/docs/overview.md b/docs/overview.md
index 386f37c..17e97aa 100644
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -31,3 +31,5 @@ There are several components of this repo that provide benefits to developers in
 - CloudClient object for easy interaction with the cloud
   - more info found [here](./CloudClient/index.md)
 - automation component to run jobs/tasks from a configuration file
+- ContainerAppClient object for easy interaction with Container App Jobs
+  - more info found [here](./ContainerAppClient/index.md)
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index e61b2ef..d41073f 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -7,3 +7,14 @@ The default authentication method for the `CloudClient` is a Managed Identity.
 If your Managed Identity on your VM is not setup at all or not setup correctly, you will experience issues authenticating. Solution: confirm your VM has the right Managed Identity setup for the Azure environment. If working at CFA, please reach out to the CFA Tools Teams.
 
 An easy way to check your Managed Identity is to run `az login --identity` in your terminal.
+
+
+### Error Instantiating CloudClient
+
+If you experience errors when creating an instance of `CloudClient()` using a .env file, it's possible the issue is coming from the .env file itself. Make sure the keys in your .env file match the keys in the sample .env exactly. If all keys are present, it's likely an issue with a value in the .env. Confirm all values are correct.
+
+### File Not Found During Job
+
+If you are interacting with files during a job and getting errors saying a file is not found, the problem can originate from two places:
+1. Incorrect mount reference. The blob container should be mounted during pool creation and referenced at the root of the Docker container. For example, a container called `my-container` would be referenced as `/my-container` in code, unless you provided a relative mount path when creating the pool.
+2. File not present in container. If you are referencing a file that should exist in your Docker container, confirm the path where it exists. Note that Docker sets a working directory, so any relative paths will start from the working directory specified in your Docker container.
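For reference, a task script along the lines of the walkthrough's `main.py` could look like the following. This is a hypothetical sketch (the actual contents of `main.py` are not shown in the patch): it parses the `--user` argument described in the walkthrough and reads the mounted Blob Container through its absolute `/input-test` path, which also illustrates both troubleshooting points above.

```python
import argparse
from pathlib import Path

def main(argv=None):
    # Parse the '--user' argument described in the walkthrough.
    parser = argparse.ArgumentParser()
    parser.add_argument("--user", default="world")
    args = parser.parse_args(argv)

    # Mounted Blob Containers appear at the root of the node's filesystem,
    # so a container mounted as 'input-test' is referenced with a leading /.
    mount = Path("/input-test")
    if mount.is_dir():
        print("blobs visible in mount:", sorted(p.name for p in mount.iterdir()))
    else:
        print("mount /input-test not found; check the pool's mount configuration")

    print(f"Welcome, {args.user}!")
    return args.user

if __name__ == "__main__":
    main()
```

Using an absolute path here means the script behaves the same regardless of the working directory set in the Dockerfile; a relative path like `input-test/` would silently resolve under `WORKDIR` instead.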