> For the complete documentation index, see [llms.txt](https://docs.ultihash.io/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.ultihash.io/operations/prebuilt-connections/pyspark.md).

# PySpark

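Before starting a PySpark session that reads data stored in UltiHash, make sure the required packages (the Hadoop AWS connector and the AWS Java SDK bundle) are available and that the S3A driver is configured. The command below loads both packages and sets the S3A options: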

```shell
# Start PySpark with the Hadoop AWS connector and the AWS Java SDK bundle.
# --packages takes a single comma-separated list with no spaces.
# The spark.hadoop.fs.s3a.* options configure the S3A driver; the endpoint,
# access key, and secret key shown here are example values.
# The Iceberg catalog options assume the Iceberg Spark runtime is on the classpath.
pyspark \
  --packages org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262 \
  --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg.type=hadoop \
  --conf spark.sql.catalog.iceberg.warehouse=s3a://iceberg \
  --conf spark.hadoop.fs.s3a.endpoint=http://127.0.0.1:8080 \
  --conf spark.hadoop.fs.s3a.access.key=TEST-USER \
  --conf spark.hadoop.fs.s3a.secret.key=SECRET \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.connection.ssl.enabled=false \
  --conf spark.driver.bindAddress=127.0.0.1 \
  --conf spark.driver.host=127.0.0.1
```
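
The same settings can also be applied from a Python script instead of the `pyspark` shell. The following is a minimal sketch, not an official UltiHash API: the helper names `ultihash_spark_conf` and `build_session` and the application name are hypothetical, and the endpoint and credentials are the example values from the command above. It assumes that `--packages` corresponds to the `spark.jars.packages` property and that no Spark session is already running in the process, since these options must be set before the session is created.

```python
# Sketch only: helper names are hypothetical, values are the example ones above.

PACKAGES = ",".join([
    "org.apache.hadoop:hadoop-aws:3.3.4",          # Hadoop S3A connector
    "com.amazonaws:aws-java-sdk-bundle:1.12.262",  # AWS Java SDK bundle
])


def ultihash_spark_conf(endpoint="http://127.0.0.1:8080",
                        access_key="TEST-USER",
                        secret_key="SECRET"):
    """Return the Spark properties from the CLI example as a plain dict."""
    return {
        "spark.jars.packages": PACKAGES,
        "spark.sql.catalog.iceberg": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.iceberg.type": "hadoop",
        "spark.sql.catalog.iceberg.warehouse": "s3a://iceberg",
        # S3A driver configuration
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": "false",
        # Local driver networking
        "spark.driver.bindAddress": "127.0.0.1",
        "spark.driver.host": "127.0.0.1",
    }


def build_session(app_name="ultihash-example"):
    """Create a SparkSession with the properties above applied."""
    # Imported lazily so the configuration can be inspected without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in ultihash_spark_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `build_session()` returns a session whose `s3a://` paths resolve against the configured endpoint. Keeping the properties in a dict makes it easy to swap in a different endpoint or credentials without editing the builder code.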

> For more details about the integration, see the GitHub repository: <https://github.com/UltiHash/scripts/tree/main/pyspark>