# Test with Docker

{% embed url="<https://youtu.be/Nb0zM5EKYf8>" %}

This guide will show you how to set up a test environment for UltiHash.

You’ll set up a local environment using Docker Compose for container orchestration.

{% hint style="info" %}
Please note that to test UltiHash, you need to [sign up for a free account](https://www.ultihash.io/signup-login).
{% endhint %}

{% hint style="info" %}
If you want to test UltiHash in a Kubernetes environment, you can do so with [Minikube](/installation/test-with-docker/test-ultihash.md).
{% endhint %}

**The main steps are as follows:**

{% stepper %}
{% step %}
Install prerequisite tools
{% endstep %}

{% step %}
Set up UltiHash with Docker Compose
{% endstep %}

{% step %}
Integrate sample data + see space savings
{% endstep %}
{% endstepper %}

{% hint style="danger" %}
This setup is intended for local testing - not production use.
{% endhint %}

{% hint style="info" %}
For now, UltiHash is only supported on Linux. This guide provides commands to be run in your terminal, and assumes you're running Ubuntu LTS on an AMD64 (x86\_64) architecture. Other distributions and ARM architectures should work fine, although some commands may need slight adjustment.
{% endhint %}

## 1. Install prerequisite tools

Before you start setting up the UltiHash cluster, you need some tools installed. If you already have any of these installed, you can simply skip that step.

{% stepper %}
{% step %}

#### Install Docker Engine

Docker provides a containerized virtual environment for UltiHash to run on.

You can find general instructions for installing Docker Engine at [docs.docker.com/engine/install](https://docs.docker.com/engine/install/).

**To quickly install, run:**

```bash
# Linux installation: Update package index, install prerequisites, and set up Docker’s GPG key
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the Docker repository to Apt sources and update package index
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

# Install Docker Engine, CLI, and related plugins
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

***

After installing Docker, you may need to add your user to the `docker` group so that you can run Docker commands without `sudo`.

Run:

```bash
sudo usermod -aG docker $USER
```

{% hint style="warning" %}
**Make sure to log out and back in, or restart your computer,** at this stage so that the group change takes effect.
{% endhint %}
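
To confirm that the group change is active in your current session, you can look for `docker` in your user's group list. The snippet below is a minimal sketch; the helper name `in_group` is illustrative and not part of Docker or UltiHash.

```bash
# Hypothetical helper: succeeds if the current user's active groups include the given group.
in_group() {
  id -nG | tr ' ' '\n' | grep -qxF "$1"
}

# Example usage:
#   in_group docker && echo "docker group is active" || echo "log out and back in first"
#   docker run --rm hello-world   # optional end-to-end check; pulls a small test image
```

If `in_group docker` fails right after running `usermod`, the change has most likely not been picked up by your login session yet.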

{% endstep %}

{% step %}

#### Install AWS CLI

The AWS CLI is a unified tool to manage AWS services from the command line.

You can find general instructions for installing the AWS CLI at [docs.aws.amazon.com/cli/latest/userguide/getting-started-install](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

**To quickly install, run:**

```bash
# Download and unzip AWS CLI installer (x86_64 build; on ARM, use awscli-exe-linux-aarch64.zip instead)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip

# Install AWS CLI
sudo ./aws/install
```

{% endstep %}

{% step %}

#### Install `boto3` (and `tqdm`)

The Amazon Web Services (AWS) SDK for Python, often referred to as `boto3`, allows you to interact with AWS services programmatically.

**To install, run:**

```bash
sudo apt install python3-boto3
```

`tqdm` is a Python package that provides a progress bar, which will be used in the upload scripts.

**To install, run:**

```bash
sudo apt install python3-tqdm
```

{% endstep %}
{% endstepper %}
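
Before moving on, you may want to confirm that every prerequisite is available. The following is a small sketch of such a check; the function name `check_tools` is hypothetical and not something UltiHash ships.

```bash
# Hypothetical helper: report whether each named command is on PATH.
# Returns non-zero if at least one command is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
#   check_tools docker aws python3
#   docker compose version            # confirms the Compose plugin is installed
#   python3 -c "import boto3, tqdm"   # confirms both Python packages are importable
```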

{% hint style="success" %}
**Done!**\
\
You've successfully installed all the prerequisites for testing UltiHash.

\
**Next, you'll set up your local cluster using Docker Compose.**
{% endhint %}

## 2. Set up UltiHash with Docker Compose

Now you’ll set up the local UltiHash environment using Docker Compose. This involves authenticating with the UltiHash registry, downloading the necessary configuration file, and running the UltiHash services locally.

{% stepper %}
{% step %}
**Set up authentication with the registry**\
\
Before you can download and run UltiHash, you need to authenticate with the UltiHash registry. The registry is where the container images (required for running UltiHash) are stored.

For this step, you'll need these credentials from your UltiHash [Dashboard](https://ultihash.io/dashboard):

* **Registry login**
* **Registry password**

***

Log in to the UltiHash registry with your credentials:

```bash
docker login registry.ultihash.io -u <registry-login>
```

{% hint style="warning" %}
Make sure to replace `<registry-login>` with the 'Registry login' from your [Dashboard](https://ultihash.io/dashboard).\
\
When prompted for a password, enter the 'Registry password' from your [Dashboard](https://ultihash.io/dashboard).
{% endhint %}
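
If you prefer not to type the password interactively (for example, in a setup script), `docker login` can also read it from standard input via `--password-stdin`. The sketch below assumes you have stored the credentials in two shell variables, `UH_REGISTRY_LOGIN` and `UH_REGISTRY_PASSWORD`; these names are chosen for illustration and are not defined by UltiHash.

```bash
# Assumed variable names (illustrative only): UH_REGISTRY_LOGIN, UH_REGISTRY_PASSWORD.
# Pipes the password to docker login so it never appears in the command line.
registry_login() {
  printf '%s' "$UH_REGISTRY_PASSWORD" |
    docker login registry.ultihash.io -u "$UH_REGISTRY_LOGIN" --password-stdin
}

# Example usage:
#   UH_REGISTRY_LOGIN="<registry-login>"
#   UH_REGISTRY_PASSWORD="<registry-password>"
#   registry_login
```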

{% endstep %}

{% step %}
**Download `compose.yaml`**

The `compose.yaml` file is a Docker Compose configuration file that defines all the services, volumes, and settings needed to run UltiHash.<br>

**Download:**

{% file src="/files/raVrRDlSB64ifn5dZSvU" %}

{% hint style="info" %}
Trouble downloading? Try right-clicking and selecting 'Save link as...' or similar.
{% endhint %}

{% endstep %}

{% step %}
**Set up credentials and license**

To enable access to UltiHash services, you need to export your credentials and license key. These environment variables will be used for authentication.

**Run the following commands:**

```bash
export AWS_ACCESS_KEY_ID="TEST-USER"
export AWS_SECRET_ACCESS_KEY="SECRET"
export UH_CUSTOMER_ID="<customer-id>"
export UH_ACCESS_TOKEN="<access-token>"
export UH_MONITORING_TOKEN="<monitoring-token>"
```

{% hint style="warning" %}
Make sure to replace `<customer-id>`, `<access-token>`, and `<monitoring-token>` with the 'Customer ID', 'Access token', and 'Monitoring token' from your [Dashboard](https://ultihash.io/dashboard).
{% endhint %}
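
Exported variables only exist in the terminal session where you ran `export`, so the cluster must be started from that same session. As a rough sanity check, the sketch below verifies that each variable is set and no longer contains a `<placeholder>` value; `require_env` is a hypothetical helper name.

```bash
# Hypothetical helper: fail if any named variable is unset, empty, or still a <placeholder>.
require_env() {
  local name value status=0
  for name in "$@"; do
    value="${!name:-}"            # indirect expansion: the value of the variable called $name
    case "$value" in
      "" | "<"*">")
        echo "not set: $name" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}

# Example usage:
#   require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
#     UH_CUSTOMER_ID UH_ACCESS_TOKEN UH_MONITORING_TOKEN
```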

{% endstep %}

{% step %}
**Start UltiHash services**\
\
**Change the working directory** to the folder where you saved `compose.yaml`. For example:

```bash
cd ~/Downloads
```

\
**Start the UltiHash cluster**:

```bash
docker compose up -d
```

\
If successful, Docker Compose will download the necessary images (if they’re not already cached) and start the UltiHash services.
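
The services can take a moment to become reachable after the containers start. One way to wait for them is sketched below: it polls the endpoint used later in this guide (`http://127.0.0.1:8080`) until it returns any HTTP response. The helper name `wait_for_endpoint`, the default of 30 attempts, and the 2-second pause are illustrative choices, not UltiHash defaults.

```bash
# Hypothetical helper: poll a URL until it returns any HTTP response or attempts run out.
wait_for_endpoint() {
  local url="$1" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    # Without -f, curl exits 0 for any HTTP response, which is enough to show the port is serving.
    if curl -s -o /dev/null "$url"; then
      echo "endpoint responded after $i attempt(s)"
      return 0
    fi
    sleep 2
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}

# Example usage:
#   docker compose ps                          # list the services and their state
#   wait_for_endpoint http://127.0.0.1:8080
#   docker compose logs --tail 50              # inspect recent logs if something fails
```

When you have finished testing, `docker compose down` (run from the same directory) stops and removes the containers.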
{% endstep %}
{% endstepper %}

{% hint style="success" %}
**Done!**

You’ve successfully set up your local UltiHash cluster.\
\
**Next, let's integrate sample data + see space savings.**
{% endhint %}

## 3. Integrate sample data + see space savings

Now that UltiHash is running on your local cluster, let's integrate some sample data.

<br>

{% stepper %}
{% step %}

### Prepare dataset

If you already have a dataset you want to test, you can skip this step.

**Alternatively, you can download one of these datasets from Kaggle:**

<table data-view="cards"><thead><tr><th></th><th></th><th data-hidden></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><strong>DICOM files of brain MRI scans</strong></td><td>1.51 GB</td><td></td><td><a href="https://www.kaggle.com/datasets/amritpal333/adni4dicomnano10514/">https://www.kaggle.com/datasets/amritpal333/adni4dicomnano10514/</a></td></tr><tr><td><strong>JPGs of driving scenarios</strong></td><td>2.41 GB</td><td></td><td><a href="https://www.kaggle.com/datasets/zaynena/selfdriving-car-simulator">https://www.kaggle.com/datasets/zaynena/selfdriving-car-simulator</a></td></tr><tr><td><strong>PNGs of synthetic textures with defects</strong></td><td>5.89 GB</td><td></td><td><a href="https://www.kaggle.com/datasets/mhskjelvareid/dagm-2007-competition-dataset-optical-inspection">https://www.kaggle.com/datasets/mhskjelvareid/dagm-2007-competition-dataset-optical-inspection</a></td></tr><tr><td><strong>WAVs of human speech for emotion recognition</strong></td><td>0.33 GB</td><td></td><td><a href="https://www.kaggle.com/datasets/barelydedicated/savee-database">https://www.kaggle.com/datasets/barelydedicated/savee-database</a></td></tr><tr><td><strong>TIFF images of climate data</strong></td><td>16.28 GB</td><td></td><td><a href="https://www.kaggle.com/datasets/abireltaief/highresolution-geotiff-images-of-climatic-data">https://www.kaggle.com/datasets/abireltaief/highresolution-geotiff-images-of-climatic-data</a></td></tr><tr><td><strong>CSV tables of symptoms</strong></td><td>140 KB</td><td></td><td><a href="https://www.kaggle.com/datasets/kaushil268/disease-prediction-using-machine-learning">https://www.kaggle.com/datasets/kaushil268/disease-prediction-using-machine-learning</a></td></tr></tbody></table>

{% hint style="warning" %}
Remember to unzip your test dataset if you download it from Kaggle.
{% endhint %}

{% hint style="info" %}
UltiHash's deduplication can have significantly different results depending on the dataset integrated. For testing, try datasets likely to contain repeated content - like document libraries with shared templates, multimedia collections with common graphics, or code repositories.
{% endhint %}

{% endstep %}

{% step %}

### Create a bucket

Object storage systems like UltiHash use a top-level container called a **bucket**. To facilitate scalability, buckets don’t have a traditional hierarchical folder structure: instead, each object in a bucket has a unique key (which can resemble a file path, simulating directories).

**To create a bucket, run:**

```bash
aws s3api create-bucket --bucket <bucket-name> --endpoint-url http://127.0.0.1:8080
```

{% hint style="warning" %}
Make sure to replace `<bucket-name>` with your chosen bucket name, e.g. `test-bucket`.
{% endhint %}
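
If you are unsure whether a name is acceptable, the sketch below checks it against the general Amazon S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit). This assumes UltiHash follows the same conventions as S3, which this guide does not state, so treat it as a conservative check rather than an authoritative one. The function name `valid_bucket_name` is hypothetical.

```bash
# Hypothetical helper: check a name against the general Amazon S3 bucket naming rules.
# Assumption: UltiHash accepts names that are valid for S3.
valid_bucket_name() {
  local LC_ALL=C   # make [a-z] match only ASCII lowercase letters
  [[ "$1" =~ ^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$ ]]
}

# Example usage:
#   valid_bucket_name test-bucket && echo "looks fine" || echo "pick another name"
```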

***

**You can see your newly created bucket by running:**

```bash
aws s3api list-buckets --endpoint-url http://127.0.0.1:8080
```
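
Every `aws` command in this guide repeats the same `--endpoint-url` value. If you find that tedious, a small wrapper such as the following can supply it for you. The variable name `UH_ENDPOINT` and the function name `uh_s3api` are illustrative and not defined by UltiHash or the AWS CLI.

```bash
# Assumed variable name (illustrative only): UH_ENDPOINT.
UH_ENDPOINT="http://127.0.0.1:8080"

# Hypothetical wrapper: run any `aws s3api` subcommand against the local cluster.
uh_s3api() {
  aws s3api "$@" --endpoint-url "$UH_ENDPOINT"
}

# Example usage:
#   uh_s3api create-bucket --bucket test-bucket
#   uh_s3api list-buckets --query 'Buckets[].Name' --output text
```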

<br>
{% endstep %}

{% step %}

### Download scripts

We've prepared some scripts to make the testing process easier.

Download the following scripts for uploading and downloading:

{% file src="/files/ZLfMUvRRAfxr9OYWZXMQ" %}

{% file src="/files/zcflMM05G31JKqdLbSpR" %}

{% file src="/files/sZqR8ejBcOD6QRb9FngD" %}

{% file src="/files/FXIUapwSGEPRbLvXIKG3" %}

{% hint style="info" %}
Trouble downloading these scripts? Try right-clicking and selecting 'Save link as...' or similar.
{% endhint %}

{% endstep %}

{% step %}

### Integrate sample data

Now that you have a bucket in which to put objects, let's use the upload script to integrate your sample data.

**To integrate your dataset, run:**

```bash
python3 <upload-script-path> --url http://127.0.0.1:8080 --bucket <bucket-name> <dataset-path>
```

{% hint style="warning" %}
Make sure to replace `<upload-script-path>` with the path to the upload script you downloaded, e.g. `/home/user/Downloads/uh-upload.py`.

Also replace `<bucket-name>` with your bucket name.

Finally, replace `<dataset-path>` with the path to the directory for the dataset you prepared or downloaded, e.g. `/home/user/Downloads/test-dataset`.
{% endhint %}

A bar should display the ongoing progress of your integration.
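
To have a baseline for the space savings reported later, it can help to note how many files the dataset contains and how much space it takes on disk. A minimal sketch, with `dataset_summary` as a hypothetical helper name:

```bash
# Hypothetical helper: print the file count and total on-disk size of a dataset directory.
dataset_summary() {
  local dir="$1" files
  files=$(find "$dir" -type f | wc -l)
  echo "files: $((files))"
  echo "size: $(du -sh "$dir" | cut -f1)"
}

# Example usage:
#   dataset_summary <dataset-path>
```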

***

Once the integration is complete, you can run the following command to see your objects:

```bash
aws s3api list-objects --endpoint-url http://127.0.0.1:8080 --bucket <bucket-name> --output text | cat
```

{% hint style="warning" %}
Make sure to replace `<bucket-name>` with your bucket name.
{% endhint %}

***

You can also download an entire bucket by running:

```bash
python3 <download-script-path> --url http://127.0.0.1:8080 --path <destination-path> <bucket-name>
```

{% hint style="warning" %}
Make sure to replace `<download-script-path>` with the path to the download script you downloaded, e.g. `/home/user/Downloads/uh-download.py`.

Also replace `<destination-path>` with the path to the directory you want to download the bucket to, e.g. `/home/user/Downloads`.

Finally, replace `<bucket-name>` with the name of the bucket to download.
{% endhint %}
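
To check that a downloaded bucket matches the data you uploaded, you can compare a digest of both directory trees. The sketch below hashes every file's relative path and content with `sha256sum`; `dir_digest` is a hypothetical helper, and it relies on GNU `sort -z` and `xargs -r` as found on Ubuntu. Where exactly the download script places the files under `<destination-path>` is not covered here, so adjust the second path accordingly.

```bash
# Hypothetical helper: print one digest summarizing every file's relative path and content.
dir_digest() {
  (
    cd "$1" &&
      find . -type f -print0 | sort -z | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1
  )
}

# Example usage:
#   [ "$(dir_digest <dataset-path>)" = "$(dir_digest <downloaded-data-path>)" ] \
#     && echo "contents match" || echo "contents differ"
```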

{% endstep %}

{% step %}

### See space savings in your cluster

You can see the storage space UltiHash is saving across the entire cluster by running the `uh-see-space-savings` script:

```bash
python3 <see-space-savings-script-path> --url http://127.0.0.1:8080
```

{% hint style="warning" %}
Make sure to replace `<see-space-savings-script-path>` with the path to the `uh-see-space-savings` script you downloaded, e.g. `/home/user/Downloads/uh-see-space-savings.py`.
{% endhint %}
{% endstep %}
{% endstepper %}

<br>

{% hint style="success" %}
**Done!**\
\
You’ve successfully integrated a dataset into a local test cluster, and can see the space saved by UltiHash's built-in deduplication.
{% endhint %}


