Merged
50 changes: 33 additions & 17 deletions default_python/README.md
@@ -2,18 +2,39 @@

The 'default_python' project was generated by using the default-python template.

For documentation on the Databricks Asset Bundles format used for this project,
and for CI/CD configuration, see https://docs.databricks.com/aws/en/dev-tools/bundles.

## Getting started

0. Install UV: https://docs.astral.sh/uv/getting-started/installation/
Choose how you want to work on this project:

(a) Directly in your Databricks workspace, see
https://docs.databricks.com/dev-tools/bundles/workspace.

(b) Locally with an IDE like Cursor or VS Code, see
https://docs.databricks.com/vscode-ext.

(c) With command line tools, see https://docs.databricks.com/dev-tools/cli/databricks-cli.html


Dependencies for this project should be installed using uv:

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html
* Make sure you have the UV package manager installed.
It's an alternative to tools like pip: https://docs.astral.sh/uv/getting-started/installation/.
* Run `uv sync --dev` to install the project's dependencies.

2. Authenticate to your Databricks workspace, if you have not done so already:
# Using this project using the CLI

The Databricks workspace and IDE extensions provide a graphical interface for working
with this project. It's also possible to interact with it directly using the CLI:

1. Authenticate to your Databricks workspace, if you have not done so already:
```
$ databricks configure
```

3. To deploy a development copy of this project, type:
2. To deploy a development copy of this project, type:
```
$ databricks bundle deploy --target dev
```
@@ -23,9 +44,9 @@ The 'default_python' project was generated by using the default-python template.
This deploys everything that's defined for this project.
For example, the default template would deploy a job called
`[dev yourname] default_python_job` to your workspace.
You can find that job by opening your workspace and clicking on **Workflows**.
You can find that job by opening your workspace and clicking on **Jobs & Pipelines**.

4. Similarly, to deploy a production copy, type:
3. Similarly, to deploy a production copy, type:
```
$ databricks bundle deploy --target prod
```
@@ -35,17 +56,12 @@ The 'default_python' project was generated by using the default-python template.
is paused when deploying in development mode (see
https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).

5. To run a job or pipeline, use the "run" command:
4. To run a job or pipeline, use the "run" command:
```
$ databricks bundle run
```
6. Optionally, install the Databricks extension for Visual Studio code for local development from
https://docs.databricks.com/dev-tools/vscode-ext.html. It can configure your
virtual environment and setup Databricks Connect for running unit tests locally.
When not using these tools, consult your development environment's documentation
and/or the documentation for Databricks Connect for manually setting up your environment
(https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html).

7. For documentation on the Databricks asset bundles format used
for this project, and for CI/CD configuration, see
https://docs.databricks.com/dev-tools/bundles/index.html.

5. Finally, to run tests locally, use `pytest`:
```
$ uv run pytest
```
20 changes: 7 additions & 13 deletions default_python/pyproject.toml
@@ -2,26 +2,20 @@
name = "default_python"
version = "0.0.1"
authors = [{ name = "user@company.com" }]
requires-python = ">= 3.11"
requires-python = ">=3.10,<=3.13"

[project.optional-dependencies]
[dependency-groups]
dev = [
"pytest",

# Code completion support for DLT, also install databricks-connect
# Code completion support for Lakeflow Declarative Pipelines, also install databricks-connect
"databricks-dlt",

# databricks-connect can be used to run parts of this project locally.
# See https://docs.databricks.com/dev-tools/databricks-connect.html.
#
# Note, databricks-connect is automatically installed if you're using Databricks
# extension for Visual Studio Code
# (https://docs.databricks.com/dev-tools/vscode-ext/dev-tasks/databricks-connect.html).
#
# To manually install databricks-connect, uncomment the line below to install a version
# of db-connect that corresponds to the Databricks Runtime version used for this project.
# See https://docs.databricks.com/dev-tools/databricks-connect.html
# "databricks-connect>=15.4,<15.5",
# Note that for local development, you should use a version that is not newer
# than the remote cluster or serverless compute you connect to.
# See also https://docs.databricks.com/dev-tools/databricks-connect.html.
"databricks-connect>=15.4,<15.5",
]

[tool.pytest.ini_options]
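
A note on the new `requires-python = ">=3.10,<=3.13"` bound: under the standard version-specifier rules, `<=3.13` is compared against the full interpreter version, so a patch release such as 3.13.1 sorts above 3.13 and would not match. The sketch below is a simplified stand-in for that comparison (release segments only, no pre-releases or wildcards); the `parse`, `pad`, and `satisfies` helpers are hypothetical and not part of this project or of any packaging library.

```python
# Simplified illustration of how ">=3.10,<=3.13" treats patch releases.
# Assumption: only plain dotted release numbers are handled here; real
# tools implement the full version-specifier specification.


def parse(version: str) -> tuple[int, ...]:
    """Turn '3.13.1' into (3, 13, 1)."""
    return tuple(int(part) for part in version.split("."))


def pad(release: tuple[int, ...], length: int) -> tuple[int, ...]:
    """Pad with zeros so that 3.13 and 3.13.0 compare as equal."""
    return release + (0,) * (length - len(release))


def satisfies(version: str, lower: str = "3.10", upper: str = "3.13") -> bool:
    """Return True if lower <= version <= upper, comparing release segments."""
    parts = [parse(version), parse(lower), parse(upper)]
    width = max(len(p) for p in parts)
    v, lo, hi = (pad(p, width) for p in parts)
    return lo <= v <= hi


if __name__ == "__main__":
    for candidate in ["3.9.18", "3.10", "3.12.4", "3.13", "3.13.0", "3.13.1"]:
        print(candidate, satisfies(candidate))
```

If the intent is to allow every 3.13.x interpreter, an exclusive upper bound such as `<3.14` is the usual way to express that.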
2 changes: 1 addition & 1 deletion default_python/resources/default_python.job.yml
@@ -40,6 +40,6 @@ resources:
# Full documentation of this spec can be found at:
# https://docs.databricks.com/api/workspace/jobs/create#environments-spec
spec:
client: "2"
environment_version: "2"
dependencies:
- ../dist/*.whl
2 changes: 1 addition & 1 deletion default_python/resources/default_python.pipeline.yml
@@ -8,7 +8,7 @@ resources:
serverless: true
libraries:
- notebook:
path: ../src/dlt_pipeline.ipynb
path: ../src/pipeline.ipynb

configuration:
bundle.sourcePath: ${workspace.file_path}/src
2 changes: 1 addition & 1 deletion default_python/scratch/exploration.ipynb
@@ -32,7 +32,7 @@
"sys.path.append(\"../src\")\n",
"from default_python import main\n",
"\n",
"main.get_taxis(spark).show(10)"
"main.find_all_taxis().show(10)"
]
}
],
19 changes: 4 additions & 15 deletions default_python/src/default_python/main.py
@@ -1,24 +1,13 @@
from pyspark.sql import SparkSession, DataFrame
from databricks.sdk.runtime import spark
from pyspark.sql import DataFrame


def get_taxis(spark: SparkSession) -> DataFrame:
def find_all_taxis() -> DataFrame:
return spark.read.table("samples.nyctaxi.trips")


# Create a new Databricks Connect session. If this fails,
# check that you have configured Databricks Connect correctly.
# See https://docs.databricks.com/dev-tools/databricks-connect.html.
def get_spark() -> SparkSession:
try:
from databricks.connect import DatabricksSession

return DatabricksSession.builder.getOrCreate()
except ImportError:
return SparkSession.builder.getOrCreate()


def main():
get_taxis(get_spark()).show(5)
find_all_taxis().show(5)


if __name__ == "__main__":
90 changes: 0 additions & 90 deletions default_python/src/dlt_pipeline.ipynb

This file was deleted.

2 changes: 1 addition & 1 deletion default_python/src/notebook.ipynb
@@ -46,7 +46,7 @@
"source": [
"from default_python import main\n",
"\n",
"main.get_taxis(spark).show(10)"
"main.find_all_taxis().show(10)"
]
}
],
6 changes: 3 additions & 3 deletions default_python/tests/main_test.py
@@ -1,6 +1,6 @@
from default_python.main import get_taxis, get_spark
from default_python import main


def test_main():
taxis = get_taxis(get_spark())
def test_find_all_taxis():
taxis = main.find_all_taxis()
assert taxis.count() > 5
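
Since `find_all_taxis()` now reads the module-level `spark` instead of taking a session argument, the test above needs a live session to run. One possible way to exercise the function without a workspace is to swap that global for a mock, as in this self-contained sketch. It assumes nothing about the real project beyond the function body shown in the diff; `run_with_fake_session` is a hypothetical helper, and the `spark = None` placeholder stands in for `from databricks.sdk.runtime import spark`.

```python
from unittest import mock

# Placeholder for the module-level session that main.py imports.
# In the real module this name comes from databricks.sdk.runtime.
spark = None


def find_all_taxis():
    # Same body as in the diff: reads the global `spark`.
    return spark.read.table("samples.nyctaxi.trips")


def run_with_fake_session(row_count: int) -> int:
    """Call find_all_taxis() against a mocked session and return the row count."""
    fake_table = mock.Mock()
    fake_table.count.return_value = row_count

    fake_spark = mock.Mock()
    fake_spark.read.table.return_value = fake_table

    # Temporarily replace the module-level `spark`; it is restored on exit.
    with mock.patch.dict(globals(), {"spark": fake_spark}):
        taxis = find_all_taxis()

    # Confirm the function asked for the expected table.
    fake_spark.read.table.assert_called_once_with("samples.nyctaxi.trips")
    return taxis.count()


if __name__ == "__main__":
    print(run_with_fake_session(10))
```

In a pytest suite the same idea would typically be written with the `monkeypatch` fixture against `default_python.main`; the trade-off is that a mocked session only checks the call pattern, not the data.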
2 changes: 2 additions & 0 deletions lakeflow_pipelines_python/.gitignore
@@ -4,5 +4,7 @@ dist/
__pycache__/
*.egg-info
.venv/
scratch/**
!scratch/README.md
**/explorations/**
!**/explorations/README.md
4 changes: 2 additions & 2 deletions lakeflow_pipelines_python/.vscode/extensions.json
@@ -1,7 +1,7 @@
{
"recommendations": [
"databricks.databricks",
"ms-python.vscode-pylance",
"redhat.vscode-yaml"
"redhat.vscode-yaml",
"ms-python.black-formatter"
]
}
28 changes: 23 additions & 5 deletions lakeflow_pipelines_python/.vscode/settings.json
@@ -1,19 +1,37 @@
{
"python.analysis.stubPath": ".vscode",
"databricks.python.envFile": "${workspaceFolder}/.env",
"jupyter.interactiveWindow.cellMarker.codeRegex": "^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])",
"jupyter.interactiveWindow.cellMarker.default": "# COMMAND ----------",
"python.testing.pytestArgs": [
"."
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"python.analysis.extraPaths": ["resources/lakeflow_pipelines_python_pipeline"],
"files.exclude": {
"**/*.egg-info": true,
"**/__pycache__": true,
".pytest_cache": true,
"dist": true,
},
"files.associations": {
"**/.gitkeep": "markdown"
},

// Pylance settings (VS Code)
// Set typeCheckingMode to "basic" to enable type checking!
"python.analysis.typeCheckingMode": "off",
"python.analysis.extraPaths": ["src", "lib", "resources"],
"python.analysis.diagnosticMode": "workspace",
"python.analysis.stubPath": ".vscode",

// Pyright settings (Cursor)
// Set typeCheckingMode to "basic" to enable type checking!
"cursorpyright.analysis.typeCheckingMode": "off",
"cursorpyright.analysis.extraPaths": ["src", "lib", "resources"],
"cursorpyright.analysis.diagnosticMode": "workspace",
"cursorpyright.analysis.stubPath": ".vscode",

// General Python settings
"python.defaultInterpreterPath": "./.venv/bin/python",
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter",
"editor.formatOnSave": true,