I successfully cloned the repository, ran `docker compose up -d --build`, and the containers all show green in Docker Desktop. I can also connect to the Jupyter notebook locally on port 8888 and run the first script.
However, the second script fails immediately with a Java error:
```
/opt/spark/bin/spark-class: line 71: /usr/lib/jvm/java-17-openjdk-amd64/bin/java: No such file or directory
/opt/spark/bin/spark-class: line 97: CMD: bad array subscript
---------------------------------------------------------------------------
PySparkRuntimeError                       Traceback (most recent call last)
File /home/airflow/notebooks/run_ddl.py:14
     11 logger = logging.getLogger(__name__)
     13 # Create Spark session
---> 14 spark = SparkSession.builder.appName("Run DDLs for TPCH data").getOrCreate()
     16 spark.sql("CREATE SCHEMA IF NOT EXISTS prod_db")
     17 logger.info("Dropping any existing TPCH tables")
File /home/airflow/.venv/lib/python3.13/site-packages/pyspark/sql/session.py:556, in SparkSession.Builder.getOrCreate(self)
    554     sparkConf.set(key, value)
    555 # This SparkContext may be an existing one.
--> 556 sc = SparkContext.getOrCreate(sparkConf)
    557 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    558 # by all sessions.
    559 session = SparkSession(sc, options=self._options)
File /home/airflow/.venv/lib/python3.13/site-packages/pyspark/core/context.py:523, in SparkContext.getOrCreate(cls, conf)
    521 with SparkContext._lock:
    522     if SparkContext._active_spark_context is None:
--> 523         SparkContext(conf=conf or SparkConf())
    524     assert SparkContext._active_spark_context is not None
    525     return SparkContext._active_spark_context
File /home/airflow/.venv/lib/python3.13/site-packages/pyspark/core/context.py:205, in SparkContext.__init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls, udf_profiler_cls, memory_profiler_cls)
    199 if gateway is not None and gateway.gateway_parameters.auth_token is None:
    200     raise ValueError(
    201         "You are trying to pass an insecure Py4j gateway to Spark. This"
    202         " is not allowed as it is a security risk."
    203     )
--> 205 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    206 try:
    207     self._do_init(
    208         master,
    209         appName,
    (...)    219         memory_profiler_cls,
    220     )
File /home/airflow/.venv/lib/python3.13/site-packages/pyspark/core/context.py:444, in SparkContext._ensure_initialized(cls, instance, gateway, conf)
    442 with SparkContext._lock:
    443     if not SparkContext._gateway:
--> 444         SparkContext._gateway = gateway or launch_gateway(conf)
    445         SparkContext._jvm = SparkContext._gateway.jvm
    447 if instance:
File /home/airflow/.venv/lib/python3.13/site-packages/pyspark/java_gateway.py:111, in launch_gateway(conf, popen_kwargs)
    108     time.sleep(0.1)
    110 if not os.path.isfile(conn_info_file):
--> 111     raise PySparkRuntimeError(
    112         errorClass="JAVA_GATEWAY_EXITED",
    113         messageParameters={},
    114     )
    116 with open(conn_info_file, "rb") as info:
    117     gateway_port = read_int(info)
PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.
```
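The `spark-class` line shows the root cause: `$JAVA_HOME/bin/java` resolves to `/usr/lib/jvm/java-17-openjdk-amd64/bin/java`, which does not exist inside the container, so PySpark's `launch_gateway` never gets a JVM and raises `JAVA_GATEWAY_EXITED`. As a minimal sketch of a pre-flight check (the `find_java` helper is hypothetical, not part of PySpark), something like this can be run before `SparkSession.builder...getOrCreate()` to fail with a clearer message:

```python
import os
import shutil


def find_java():
    """Locate a usable java binary roughly the way Spark's launcher does:
    $JAVA_HOME/bin/java first, then whatever `java` is on PATH."""
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to a PATH lookup if JAVA_HOME is unset or broken.
    return shutil.which("java")


if __name__ == "__main__":
    java = find_java()
    if java is None:
        raise RuntimeError(
            "No java binary found: JAVA_HOME is unset or points at a missing "
            "JDK, and no `java` is on PATH. Install a JDK in the image or fix "
            "JAVA_HOME before creating a SparkSession."
        )
    print(f"Using java at {java}")
```

In a setup like this, the likely fix is that the image bakes in a `JAVA_HOME` pointing at `java-17-openjdk-amd64` while the installed JDK lives under a different directory; running `ls /usr/lib/jvm/` inside the container should confirm which path actually exists.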