50 changes: 17 additions & 33 deletions default_python/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,39 +2,18 @@

The 'default_python' project was generated by using the default-python template.

For documentation on the Databricks Asset Bundles format used for this project,
and for CI/CD configuration, see https://docs.databricks.com/aws/en/dev-tools/bundles.

## Getting started

Choose how you want to work on this project:

(a) Directly in your Databricks workspace, see
https://docs.databricks.com/dev-tools/bundles/workspace.

(b) Locally with an IDE like Cursor or VS Code, see
https://docs.databricks.com/vscode-ext.

(c) With command line tools, see https://docs.databricks.com/dev-tools/cli/databricks-cli.html


Dependencies for this project should be installed using UV:
0. Install UV: https://docs.astral.sh/uv/getting-started/installation/

* Make sure you have the UV package manager installed.
It's an alternative to tools like pip: https://docs.astral.sh/uv/getting-started/installation/.
* Run `uv sync --dev` to install the project's dependencies.
1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

# Using this project using the CLI

The Databricks workspace and IDE extensions provide a graphical interface for working
with this project. It's also possible to interact with it directly using the CLI:

1. Authenticate to your Databricks workspace, if you have not done so already:
2. Authenticate to your Databricks workspace, if you have not done so already:
```
$ databricks configure
```

2. To deploy a development copy of this project, type:
3. To deploy a development copy of this project, type:
```
$ databricks bundle deploy --target dev
```
Expand All @@ -44,9 +23,9 @@ with this project. It's also possible to interact with it directly using the CLI
This deploys everything that's defined for this project.
For example, the default template would deploy a job called
`[dev yourname] default_python_job` to your workspace.
You can find that job by opening your workspace and clicking on **Jobs & Pipelines**.
You can find that job by opening your workspace and clicking on **Workflows**.

3. Similarly, to deploy a production copy, type:
4. Similarly, to deploy a production copy, type:
```
$ databricks bundle deploy --target prod
```
Expand All @@ -56,12 +35,17 @@ with this project. It's also possible to interact with it directly using the CLI
is paused when deploying in development mode (see
https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).

4. To run a job or pipeline, use the "run" command:
5. To run a job or pipeline, use the "run" command:
```
$ databricks bundle run
```

5. Finally, to run tests locally, use `pytest`:
```
$ uv run pytest
```
6. Optionally, install the Databricks extension for Visual Studio Code for local development from
https://docs.databricks.com/dev-tools/vscode-ext.html. It can configure your
virtual environment and set up Databricks Connect for running unit tests locally.
When not using these tools, consult your development environment's documentation
and/or the documentation for Databricks Connect for manually setting up your environment
(https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html).

7. For documentation on the Databricks asset bundles format used
for this project, and for CI/CD configuration, see
https://docs.databricks.com/dev-tools/bundles/index.html.
76 changes: 0 additions & 76 deletions default_python/conftest.py

This file was deleted.

16 changes: 11 additions & 5 deletions default_python/pyproject.toml
Expand Up @@ -4,18 +4,24 @@ version = "0.0.1"
authors = [{ name = "user@company.com" }]
requires-python = ">= 3.11"

[dependency-groups]
[project.optional-dependencies]
dev = [
"pytest",

# Code completion support for DLT, also install databricks-connect
"databricks-dlt",

# databricks-connect can be used to run parts of this project locally.
# Note that for local development, you should use a version that is not newer
# than the remote cluster or serverless compute you connect to.
# See also https://docs.databricks.com/dev-tools/databricks-connect.html.
"databricks-connect>=15.4,<15.5",
# See https://docs.databricks.com/dev-tools/databricks-connect.html.
#
# Note, databricks-connect is automatically installed if you're using Databricks
# extension for Visual Studio Code
# (https://docs.databricks.com/dev-tools/vscode-ext/dev-tasks/databricks-connect.html).
#
# To manually install databricks-connect, uncomment the line below to install a version
# of db-connect that corresponds to the Databricks Runtime version used for this project.
# See https://docs.databricks.com/dev-tools/databricks-connect.html
# "databricks-connect>=15.4,<15.5",
]

[tool.pytest.ini_options]
Expand Down
2 changes: 1 addition & 1 deletion default_python/scratch/exploration.ipynb
Expand Up @@ -32,7 +32,7 @@
"sys.path.append(\"../src\")\n",
"from default_python import main\n",
"\n",
"main.get_taxis().show(10)"
"main.get_taxis(spark).show(10)"
]
}
],
Expand Down
19 changes: 15 additions & 4 deletions default_python/src/default_python/main.py
@@ -1,13 +1,24 @@
from databricks.sdk.runtime import spark
from pyspark.sql import DataFrame
from pyspark.sql import SparkSession, DataFrame


def find_all_taxis() -> DataFrame:
def get_taxis(spark: SparkSession) -> DataFrame:
return spark.read.table("samples.nyctaxi.trips")


# Create a new Databricks Connect session. If this fails,
# check that you have configured Databricks Connect correctly.
# See https://docs.databricks.com/dev-tools/databricks-connect.html.
def get_spark() -> SparkSession:
try:
from databricks.connect import DatabricksSession

return DatabricksSession.builder.getOrCreate()
except ImportError:
return SparkSession.builder.getOrCreate()


def main():
find_all_taxis().show(5)
get_taxis(get_spark()).show(5)


if __name__ == "__main__":
Expand Down
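The new `get_spark` relies on an optional-import fallback: try `databricks.connect` first, and fall back to plain `SparkSession` if the import fails. A minimal sketch of that pattern, written against the standard library only so it runs without Spark installed, might look like the following. The helper `import_first_available` and the placeholder module name are hypothetical and not part of this PR.

```python
import importlib
from types import ModuleType


def import_first_available(*module_names: str) -> ModuleType:
    """Return the first module in module_names that can be imported.

    Hypothetical helper that mirrors the try/except ImportError
    structure of get_spark() in the diff above.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Optional dependency is missing, so try the next candidate.
            continue
    raise ImportError(f"none of these modules could be imported: {module_names}")


# The first name is a placeholder for an optional package that is not
# installed (the role databricks.connect plays in get_spark); "json" plays
# the role of the always-available fallback.
module = import_first_available("hypothetical_missing_pkg_for_demo", "json")
print(module.__name__)
```

Catching only `ImportError`, as `get_spark` does, means configuration errors raised while building the session still surface instead of silently triggering the fallback.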
2 changes: 1 addition & 1 deletion default_python/src/dlt_pipeline.ipynb
Expand Up @@ -56,7 +56,7 @@
"source": [
"@dlt.view\n",
"def taxi_raw():\n",
" return main.find_all_taxis()\n",
" return main.get_taxis(spark)\n",
"\n",
"\n",
"@dlt.table\n",
Expand Down
2 changes: 1 addition & 1 deletion default_python/src/notebook.ipynb
Expand Up @@ -46,7 +46,7 @@
"source": [
"from default_python import main\n",
"\n",
"main.find_all_taxis().show(10)"
"main.get_taxis(spark).show(10)"
]
}
],
Expand Down
6 changes: 3 additions & 3 deletions default_python/tests/main_test.py
@@ -1,6 +1,6 @@
from default_python import main
from default_python.main import get_taxis, get_spark


def test_find_all_taxis():
taxis = main.find_all_taxis()
def test_main():
taxis = get_taxis(get_spark())
assert taxis.count() > 5
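Because `get_taxis` now takes the session as a parameter, a test could in principle pass a stand-in object instead of a live session. The sketch below illustrates that idea. `FakeReader` and `FakeSpark` are hypothetical classes, not part of this PR, and `get_taxis` is re-declared without type hints so the example runs without pyspark installed.

```python
from dataclasses import dataclass, field


@dataclass
class FakeReader:
    """Hypothetical stand-in for spark.read; records requested table names."""

    tables: dict
    requested: list = field(default_factory=list)

    def table(self, name):
        self.requested.append(name)
        return self.tables[name]


@dataclass
class FakeSpark:
    """Hypothetical stand-in for a SparkSession, exposing only `.read`."""

    read: FakeReader


def get_taxis(spark):
    # Same body as the new main.get_taxis, minus the type hints.
    return spark.read.table("samples.nyctaxi.trips")


# Made-up rows used only for this illustration.
rows = [{"trip_distance": 1.2}, {"trip_distance": 3.4}]
fake = FakeSpark(read=FakeReader(tables={"samples.nyctaxi.trips": rows}))

result = get_taxis(fake)
print(len(result), fake.read.requested)
```

The test in this PR still exercises a real session through `get_spark()`; a fake like this only checks which table is requested, not that the query works against a workspace.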
8 changes: 4 additions & 4 deletions scripts/update_from_templates.sh
Expand Up @@ -50,12 +50,12 @@ if [ ! "$DATABRICKS_HOST" ]; then
exit 1
fi

if [ -n "$1" ]; then
# Prompt for CURRENT_USER_NAME if not passed as first arg
if [ -n "${1-}" ]; then
CURRENT_USER_NAME="$1"
else
read -p "Enter the current user name (e.g., 'lennart_kats'): " CURRENT_USER_NAME
read -p "Enter the current user name (e.g., 'lennart_kats'): " CURRENT_USER_NAME
if [ ! "$CURRENT_USER_NAME" ]; then
read -r -p "Enter the current user name of your 'DEFAULT' profile (e.g., 'lennart_kats'): " CURRENT_USER_NAME
if [ -z "${CURRENT_USER_NAME:-}" ]; then
echo "Error: current user name is required." >&2
exit 1
fi
Expand Down