Test UltiHash

This guide will show you how to set up a test environment for UltiHash.

You'll set up a local Kubernetes environment using Minikube, running in a Docker container.

The main steps are as follows:

1. Install prerequisite tools
2. Set up a local Kubernetes cluster
3. Deploy UltiHash with Helm
4. Integrate sample data
5. See space savings

This setup is intended for local testing - not production use.

For now, UltiHash is only supported on Linux. This guide provides commands to be run in your terminal, and assumes you're running Ubuntu LTS on an AMD64 (x86_64) architecture. Other distributions and ARM architectures should work fine, although some commands may need slight adjustment.

1. Install prerequisite tools

Before you start setting up the UltiHash cluster, you need some tools installed. If you already have any of these installed, you can simply skip that step.

1. Install Docker Engine

Docker provides a containerized virtual environment for Minikube to run on.

You can find general instructions for installing Docker Engine at docs.docker.com/engine/install.

To quickly install, run:

# Linux installation: Update package index, install prerequisites, and set up Docker’s GPG key
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the Docker repository to Apt sources and update package index
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

# Install Docker Engine, CLI, and related plugins
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

After installing Docker, you may need to add your user to the Docker group.

Run:

sudo usermod -aG docker $USER

Then log out and back in (or restart your computer) to apply the group change.

2. Install Minikube

Minikube lets you run a single-node Kubernetes cluster locally for development and testing. In this case we will use Docker as its container engine.

You can find general instructions for installing Minikube at minikube.sigs.k8s.io/docs/start.

To quickly install, run:

# Download latest Minikube for Linux AMD64
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64

# Install Minikube and clean up
sudo install minikube-linux-amd64 /usr/local/bin/minikube && rm minikube-linux-amd64

3. Install kubectl

Kubectl is a tool for interacting with your Kubernetes clusters, allowing you to manage and deploy applications, inspect resources, and troubleshoot issues.

You can find general instructions for installing Kubectl at kubernetes.io/docs/tasks/tools/install-kubectl-linux.

To quickly install, run:

# Download kubectl for Linux AMD64
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"

# Install kubectl to /usr/local/bin with root permissions
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

4. Install Helm

Helm is a package manager for Kubernetes. It makes it easy to manage, install, and update applications (like UltiHash) on your clusters.

You can find general instructions for installing Helm at helm.sh/docs/intro/install.

To quickly install, run:

# Add Helm's GPG key, set up repo, and update packages
curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
sudo apt-get install apt-transport-https --yes
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update

# Install Helm
sudo apt-get install helm

5. Install AWS CLI

The AWS CLI is a unified tool to manage AWS services from the command line.

You can find general instructions for installing the AWS CLI at docs.aws.amazon.com/cli/latest/userguide/getting-started-install.

To quickly install, run:

# Download and unzip AWS CLI installer
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip

# Install AWS CLI
sudo ./aws/install

6. Install boto3 (and tqdm)

The Amazon Web Services (AWS) SDK for Python (often referred to as boto3) allows you to interact with AWS services programmatically.

To install, run:

sudo apt install python3-boto3

tqdm is a Python package that provides a progress bar, which will be used in the upload scripts.

To install, run:

sudo apt install python3-tqdm
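The upload and download scripts you'll use later are built on boto3. As a rough sketch of the pattern such a script follows - walking a directory and uploading each file under an S3-style key - consider the snippet below. The function names are illustrative, not the actual uh-upload.py; boto3 is imported lazily so the pure helper works on its own:

```python
import os

def walk_files(root):
    """Yield (absolute path, object key) pairs for every file under root.
    Keys use '/' separators relative to root, matching S3-style naming."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            key = os.path.relpath(path, root).replace(os.sep, "/")
            yield path, key

def upload_dir(url, bucket, root):
    """Upload every file under root to the given bucket (requires boto3)."""
    import boto3  # imported lazily so walk_files works without it
    s3 = boto3.client("s3", endpoint_url=url)
    for path, key in walk_files(root):
        s3.upload_file(path, bucket, key)
```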

Done! You've successfully installed all the prerequisites for testing UltiHash.

Next, you'll set up your local Kubernetes cluster using Minikube.

2. Set up a local Kubernetes cluster

Now you’ll set up the local Kubernetes cluster for UltiHash. This involves setting up the Minikube environment, creating a dedicated namespace for UltiHash, and provisioning Kubernetes with necessary credentials using secrets.

1. Set up Minikube environment

Create a local Kubernetes cluster:

minikube start --cpus='no-limit' --memory='no-limit'

This command removes the limits on CPU and memory usage to ensure performance with larger uploads; remove these arguments if you prefer.


Next, ensure kubectl has access and the cluster node has been provisioned:

kubectl get nodes

You should see something like:

NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   32s   v1.28.3

The NGINX Ingress Controller routes incoming requests to services inside your Minikube cluster, giving you a single entry point for reaching UltiHash.

Install it with:

minikube addons enable ingress

2. Create namespace

Choose and create a namespace for the UltiHash installation:

kubectl create ns <namespace>

Make sure to replace <namespace> with your chosen namespace, e.g. uh-test-namespace.

3. Provision credentials using secrets

For this step, you'll need these credentials from your UltiHash Dashboard:

  • Registry login

  • Registry password

  • License key

  • Monitoring token

Provision a secret to store the UltiHash registry credentials:

kubectl create secret docker-registry registry-credentials -n <namespace> --docker-server='registry.ultihash.io' --docker-username='<registry-login>' --docker-password='<registry-password>'

Make sure to replace <namespace> with your chosen name. Also replace <registry-login> and <registry-password> with the credentials from your Dashboard.


Provision a secret to store the UltiHash license key and monitoring token:

kubectl create secret generic ultihash -n <namespace> --from-literal=license='<license-key>' --from-literal=token='<monitoring-token>'

Make sure to replace <namespace> with your chosen name. Also replace <license-key> and <monitoring-token> with the credentials from your Dashboard.

Done!

You’ve successfully set up your local Kubernetes cluster. Next, let's configure and deploy UltiHash with Helm.

3. Configure and deploy Helm chart

Now that your Kubernetes environment is ready, it’s time to configure and deploy the UltiHash Helm chart. This will set up the necessary resources and configurations to run UltiHash in your cluster.

1. Create the values.yaml configuration file

Create a file named values.yaml with any text editor.

This file will define the settings for your UltiHash deployment.

Copy and paste the following content:

etcd:
  replicaCount: 1
  persistence:
    storageClass: standard

database:
  primary:
    persistence:
      storageClass: standard
      size: 10Gi

entrypoint:
  replicas: 1
  ingress:
    host: 

storage:
  replicas: 1
  storageClass: standard
  storageSize: 10Gi

deduplicator:
  replicas: 1
  storageClass: standard
  storageSize: 10Gi
  

If you want, you can adjust the number of replicas and the storage sizes for your test. Storage sizes must use the Kubernetes quantity suffix Gi (gibibytes), e.g. 10Gi - not GB or similar.
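If you'd like a quick sanity check before deploying, a tiny (hypothetical, not part of UltiHash) validation like this captures the expected shape of those size values:

```python
import re

def is_valid_gi_size(value):
    """Return True for Kubernetes-style gibibyte quantities like '10Gi'."""
    return re.fullmatch(r"[1-9][0-9]*Gi", value) is not None
```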


Save values.yaml in an easy-to-access place.

2. Log in to the Helm chart registry

For this step, you'll need these credentials from your UltiHash Dashboard:

  • Registry login

  • Registry password

To log in to the registry, run:

helm registry login registry.ultihash.io -u <registry-login>

Make sure to replace <registry-login> with the credentials from your Dashboard.


Enter your registry password when prompted.

You won't be able to see it when you type it in.

3. Deploy Helm chart

Deploy the Helm chart to your cluster with a release name of your choosing:

helm install <release> oci://registry.ultihash.io/stable/ultihash-cluster -n <namespace> --values <values-yaml-path> --wait --timeout 10m

Make sure to replace <release> with your chosen release name, e.g. uh-test-release. Also make sure to replace <values-yaml-path> with the actual path to your values.yaml file, such as /home/user/values.yaml. On Ubuntu, copying the file in the file browser automatically copies the path to your clipboard as well.

This deployment may take up to 10 minutes.

4. Get access to the UltiHash cluster

Run the following command to retrieve the IP address of your Minikube cluster and set it as the ClusterURL:

export ClusterURL=http://$(minikube ip)

This command saves the URL of your UltiHash cluster in an environment variable called ClusterURL, which you’ll use to connect to the cluster locally.

5. Retrieve root user credentials

UltiHash requires AWS-style credentials for access. To obtain the root user’s access and secret keys, run:

export AWS_ACCESS_KEY_ID=$(kubectl get secret <release>-super-user-credentials -n <namespace> -o jsonpath="{.data.access-key-id}" | base64 --decode)
export AWS_SECRET_ACCESS_KEY=$(kubectl get secret <release>-super-user-credentials -n <namespace> -o jsonpath="{.data.secret-key}" | base64 --decode)

Make sure to replace <release> and <namespace> with your chosen names - both times.
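The commands above pipe through base64 --decode because Kubernetes stores secret values base64-encoded. As an isolated demo of that last step (the encoded string here is just an example, not a real credential):

```shell
# Kubernetes secrets hold base64-encoded values; 'base64 --decode' recovers the plaintext.
# 'c2VjcmV0LWtleQ==' is an example string, not an actual credential.
echo 'c2VjcmV0LWtleQ==' | base64 --decode
```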

Done! You’ve successfully deployed UltiHash to a local Kubernetes test environment. Next, continue to integrate some sample data.

4. Integrate sample data

Now that UltiHash is running on your local Kubernetes cluster, let's integrate some sample data.

1. Prepare dataset

If you have a dataset you want to test already, you can skip this step.

Alternatively, you can download one of these datasets from Kaggle:

Remember to unzip your test dataset if you download it from Kaggle.

UltiHash's deduplication can have significantly different results depending on the dataset integrated. For testing, try datasets likely to contain repeated content - like document libraries with shared templates, multimedia collections with common graphics, or code repositories.

2. Create a bucket

Object storage systems like UltiHash use a top-level container called a bucket. To facilitate scalability, buckets don’t have a traditional hierarchical folder structure: instead, each object in a bucket has a unique key (which can resemble a file path, simulating directories).
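To illustrate how keys simulate directories, the sketch below groups a flat set of keys the way an S3-style "folder" listing would, by first path segment. The keys are hypothetical, purely for illustration:

```python
# Object keys are flat strings, but '/' separators let clients simulate folders.
keys = [
    "photos/2024/beach.jpg",
    "photos/2024/city.jpg",
    "photos/readme.txt",
    "index.html",
]

def top_level_prefixes(keys):
    """Group keys the way an S3 'folder' listing would, by first path segment."""
    prefixes = set()
    for key in keys:
        head, sep, _ = key.partition("/")
        prefixes.add(head + sep)  # 'photos/' for nested keys, 'index.html' for flat ones
    return sorted(prefixes)

print(top_level_prefixes(keys))  # prints "['index.html', 'photos/']"
```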

To create a bucket, run:

aws s3api create-bucket --bucket <bucket-name> --endpoint-url $ClusterURL

Make sure to replace <bucket-name> with your chosen bucket name, e.g. test-bucket.


You can see your newly created bucket by running:

aws s3api list-buckets --endpoint-url $ClusterURL

3. Download scripts

We've prepared some scripts to make the testing process easier.

Download the following scripts for uploading and downloading:

Trouble downloading these scripts? Try right-clicking and selecting 'Save link as...' or similar.

4. Integrate sample data

Now that you have a bucket in which to put objects, let's use the upload script to integrate your sample data.

To integrate your dataset, run:

python3 <upload-script-path> --url $ClusterURL --bucket <bucket-name> <dataset-path>

Make sure to replace <upload-script-path> with the path to the upload script you downloaded, e.g. /home/user/Downloads/uh-upload.py.

Also replace <bucket-name> with your bucket name.

Finally, replace <dataset-path> with the path to the directory for the dataset you prepared or downloaded, e.g. /home/user/Downloads/test-dataset.

A bar should display the ongoing progress of your integration.


Once the integration is complete, you can run the following command to see your objects:

aws s3api list-objects --endpoint-url $ClusterURL --bucket <bucket-name> --output text | cat

Make sure to replace <bucket-name> with your bucket name.


You can also download an entire bucket by running:

python3 <download-script-path> --url $ClusterURL --path <destination-path> <bucket-name>

Make sure to replace <download-script-path> with the path to the download script you downloaded, e.g. /home/user/Downloads/uh-download.py.

Also replace <destination-path> with the path to the directory you want to download the bucket to, e.g. /home/user/Downloads.

Finally, replace <bucket-name> with the name of the bucket to download.

5. See space savings in your cluster

You can see the storage space UltiHash is saving across the entire cluster by running the uh-see-space-savings script:

python3 <see-space-savings-script-path> --url $ClusterURL

Make sure to replace <see-space-savings-script-path> with the path to the see-space-savings script you downloaded, e.g. /home/user/Downloads/uh-see-space-savings.py.
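The arithmetic behind a space-savings figure is simple: saved space is one minus the ratio of stored bytes to original bytes. The sketch below shows that calculation with hypothetical numbers; it is not the actual script's output format:

```python
def space_savings_percent(original_bytes, stored_bytes):
    """Percentage of space saved by deduplication: 1 - stored/original."""
    if original_bytes == 0:
        return 0.0
    return 100.0 * (1 - stored_bytes / original_bytes)

# Hypothetical numbers: 100 GB of raw data kept in 65 GB after deduplication.
print(f"{space_savings_percent(100, 65):.1f}% saved")  # prints "35.0% saved"
```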

Done! You’ve successfully integrated a dataset to a local test cluster, and can see the space saved by UltiHash's built-in deduplication.
