Before starting a PySpark session with access to data stored in UltiHash, make sure the required packages are available and the S3A driver is configured:
```bash
pyspark \
  --packages org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262 \
  --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg.type=hadoop \
  --conf spark.sql.catalog.iceberg.warehouse=s3a://iceberg \
  --conf spark.hadoop.fs.s3a.endpoint=http://127.0.0.1:8080 \
  --conf spark.hadoop.fs.s3a.access.key=TEST-USER \
  --conf spark.hadoop.fs.s3a.secret.key=SECRET \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.connection.ssl.enabled=false \
  --conf spark.driver.bindAddress=127.0.0.1 \
  --conf spark.driver.host=127.0.0.1
```

The `--packages` flag pulls in the Hadoop AWS module and the AWS Java SDK bundle, while the `spark.hadoop.fs.s3a.*` settings point the S3A driver at the UltiHash endpoint and its credentials. Adjust the endpoint, access key, and secret key to match your deployment; SSL is disabled here because the example endpoint uses plain HTTP. Note that the Iceberg catalog settings assume an Iceberg Spark runtime matching your Spark and Scala versions is also on the classpath.
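The same configuration can also be supplied programmatically when a session is created from a script rather than the interactive shell. The sketch below mirrors the flags above one-to-one; the application name is an arbitrary placeholder.

```python
from pyspark.sql import SparkSession

# Build a session with the same packages and S3A settings as the shell command above.
spark = (
    SparkSession.builder
    .appName("ultihash-pyspark")  # placeholder name, not part of the integration
    .config(
        "spark.jars.packages",
        "org.apache.hadoop:hadoop-aws:3.3.4,"
        "com.amazonaws:aws-java-sdk-bundle:1.12.262",
    )
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "hadoop")
    .config("spark.sql.catalog.iceberg.warehouse", "s3a://iceberg")
    .config("spark.hadoop.fs.s3a.endpoint", "http://127.0.0.1:8080")
    .config("spark.hadoop.fs.s3a.access.key", "TEST-USER")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .config("spark.driver.bindAddress", "127.0.0.1")
    .config("spark.driver.host", "127.0.0.1")
    .getOrCreate()
)
```

Once the session is up, a quick round trip confirms that Spark can reach UltiHash through the S3A driver. The table identifier `iceberg.db.demo` below is a hypothetical example; the `iceberg` prefix refers to the catalog configured above.

```python
# Inside the pyspark shell, `spark` is already defined; in a script, use the
# session built above.
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])

# Write to the Iceberg catalog backed by the s3a://iceberg warehouse on UltiHash.
df.writeTo("iceberg.db.demo").createOrReplace()

# Read the table back to confirm the round trip.
spark.table("iceberg.db.demo").show()
```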
Full details and scripts for the integration are available on GitHub: https://github.com/UltiHash/scripts/tree/main/pyspark