Cheatsheet
Here you can find a collection of the most frequently used commands and scripts when working with UltiHash.
API calls
UltiHash supports a wide range of S3 API operations, enabling you to manage your data effectively. Here are the core operations you can perform:
CreateBucket: Create new buckets in your UltiHash storage.
aws s3api create-bucket --bucket your-bucket-name --endpoint-url https://your-ultihash-endpoint
PutObject: Upload files to your UltiHash buckets.
aws s3 cp local-file.txt s3://your-bucket-name/ --endpoint-url https://your-ultihash-endpoint
GetObject: Retrieve files from your UltiHash storage.
aws s3 cp s3://your-bucket-name/file.txt local-file.txt --endpoint-url https://your-ultihash-endpoint
ListObjectsV2: List the contents of your buckets.
aws s3api list-objects-v2 --bucket your-bucket-name --endpoint-url https://your-ultihash-endpoint
DeleteObject: Remove a single object from a bucket.
aws s3api delete-object --bucket your-bucket-name --key your-object-key --endpoint-url https://your-ultihash-endpoint
To remove all objects in a bucket at once:
aws s3 rm s3://your-bucket-name/ --recursive --endpoint-url https://your-ultihash-endpoint
DeleteBucket: Remove buckets you no longer need (a bucket must be empty before it can be deleted).
aws s3api delete-bucket --bucket your-bucket-name --endpoint-url https://your-ultihash-endpoint
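The same operations work from any S3-compatible SDK. Below is a minimal sketch using boto3, assuming a reachable UltiHash endpoint and placeholder credentials and names (replace them with your deployment's values):
import boto3

# Assumed endpoint, credentials, bucket and key names -- replace with your own
s3 = boto3.client(
    "s3",
    endpoint_url="https://your-ultihash-endpoint",
    aws_access_key_id="TEST-USER",
    aws_secret_access_key="SECRET",
)

s3.create_bucket(Bucket="your-bucket-name")                           # CreateBucket
s3.upload_file("local-file.txt", "your-bucket-name", "file.txt")      # PutObject
s3.download_file("your-bucket-name", "file.txt", "local-file.txt")    # GetObject
print(s3.list_objects_v2(Bucket="your-bucket-name").get("Contents"))  # ListObjectsV2
s3.delete_object(Bucket="your-bucket-name", Key="file.txt")           # DeleteObject
s3.delete_bucket(Bucket="your-bucket-name")                           # DeleteBucket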
You can access the full list of supported API calls via our documentation.
Integrations
Here are a few integration scripts to help you connect your stack to UltiHash: PySpark with Apache Iceberg, a Trino/Presto Hive connector catalog, and an AWS Glue job. For more integration scripts, see our Documentation.
PySpark with Apache Iceberg. The --packages option pulls in the Iceberg Spark runtime, the Hadoop AWS module, and the AWS Java SDK bundle; the spark.hadoop.fs.s3a.* options configure the S3A driver to point at the UltiHash endpoint:
pyspark \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262 \
--conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.iceberg.type=hadoop \
--conf spark.sql.catalog.iceberg.warehouse=s3a://iceberg \
--conf spark.hadoop.fs.s3a.endpoint=http://127.0.0.1:8080 \
--conf spark.hadoop.fs.s3a.access.key=TEST-USER \
--conf spark.hadoop.fs.s3a.secret.key=SECRET \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.path.style.access=true \
--conf spark.hadoop.fs.s3a.connection.ssl.enabled=false \
--conf spark.driver.bindAddress=127.0.0.1 \
--conf spark.driver.host=127.0.0.1
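Once the shell is up, a quick sanity check is to create and query an Iceberg table through the catalog configured above; the demo namespace and events table used below are example names, not part of the configuration:
# Run inside the pyspark shell started above; namespace and table names are examples
spark.sql("CREATE NAMESPACE IF NOT EXISTS iceberg.demo")
spark.sql("""
    CREATE TABLE IF NOT EXISTS iceberg.demo.events (id BIGINT, payload STRING)
    USING iceberg
""")
spark.sql("INSERT INTO iceberg.demo.events VALUES (1, 'hello ultihash')")
spark.sql("SELECT * FROM iceberg.demo.events").show()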
Trino / Presto Hive connector catalog properties:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore:9083
hive.non-managed-table-writes-enabled=true
# Point the S3 client at the UltiHash endpoint; path-style access is required
hive.s3.endpoint=https://ultihash
hive.s3.path-style-access=true
# Any values work while authentication is not enforced by UltiHash
hive.s3.aws-access-key=mocked
hive.s3.aws-secret-key=mocked
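If the cluster runs Trino, a quick way to verify the catalog from Python is the trino client. This is only a sketch: it assumes the properties above are saved as etc/catalog/ultihash.properties (so the catalog is named ultihash), and the host, port, and user are placeholders:
import trino  # pip install trino

conn = trino.dbapi.connect(
    host="localhost",    # placeholder coordinator host
    port=8080,           # placeholder coordinator port
    user="admin",        # placeholder user
    catalog="ultihash",  # assumes the properties file is named ultihash.properties
    schema="default",
)
cur = conn.cursor()
cur.execute("SHOW SCHEMAS")
print(cur.fetchall())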
AWS Glue job script:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
# Define the UltiHash endpoint URL
s3_endpoint = "https://ultihash.cluster.io"
sc = SparkContext()
# Replace with your UltiHash credentials; any values work while authentication is not yet supported by UltiHash
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "mocked")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "mocked")
# The S3 endpoint is a URL pointing to the deployed UltiHash cluster
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", s3_endpoint)
# S3 path style access has to be enabled
sc._jsc.hadoopConfiguration().set("fs.s3a.path.style.access", "true")
sc._jsc.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc._jsc.hadoopConfiguration().set("fs.s3a.connection.ssl.enabled", "false")
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
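From here the job can read from and write to UltiHash buckets through the S3A configuration set above. A minimal sketch; the bucket name, prefixes, and file formats below are placeholders, not part of the required setup:
# Placeholder bucket and prefixes -- replace with paths in your UltiHash cluster
df = spark.read.json("s3a://your-bucket-name/input/")
df.write.mode("overwrite").parquet("s3a://your-bucket-name/output/")
job.commit()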