
AWS installation


This guide describes the full installation process of UltiHash in an AWS environment, including:

  • provisioning of an EKS cluster in a dedicated VPC

  • deployment of the essential Kubernetes controllers

  • installation of UltiHash on the EKS cluster

This guide outlines the recommended UltiHash setup for managing 10 TB of data. The UltiHash cluster is deployed on a single EC2 instance of type r8g.4xlarge, with a Network Load Balancer that routes traffic to it. The cluster uses gp3 volumes optimized for performance, ensuring efficient storage management. If you have other storage requirements, you may freely change the volume sizes in the configuration, and for production you are free to select any EC2 instance type and EBS volume configuration based on your specific needs. The diagram below depicts the resources deployed in an AWS account by the Terraform scripts.

Diagram of the deployed resources

Expected performance:

  • Write throughput: up to 200 MB/s

  • Read throughput: up to 1000 MB/s

Expected costs:

  • Hourly:

    • EC2 cost: 1.14 USD

    • EBS cost: 1.58 USD

  • Monthly:

    • EC2 cost: 829.98 USD

    • EBS cost: 1152.38 USD

List of billable AWS services:

  • mandatory:

    • EKS, EC2, S3, KMS

  • optional:

    • SQS, EventBridge

Estimated amount of time to complete a deployment: ~45 minutes.

Prerequisites

Skills

  • good knowledge of the following AWS services: IAM, VPC, EKS, EC2

  • high-level knowledge of Terraform

Remote Environment

  • access to an AWS account

    • Warning: do not use the AWS account root user to provision and manage the deployed resources! Instead, create an IAM user that has sufficient privileges to manage these AWS services: IAM, VPC, EKS, EC2.

    • The IAM permissions required to deploy and manage an UltiHash cluster are listed in IAM permissions required to deploy and manage an UH cluster.

    • Make sure your AWS account has sufficient service limits before deploying the UltiHash cluster: see Manage AWS service limits.

Local Environment

  • installed and configured AWS CLI

  • installed terraform

  • installed kubectl of version 1.30

  • personal credentials found on the UltiHash dashboard
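
A quick way to confirm the local tooling is in place before proceeding:

# Verify the local prerequisites are installed
aws --version
terraform version
kubectl version --client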

Setup S3 Bucket for Terraform States

Since the Terraform state for this setup has to be stored in S3, you need to provision a dedicated S3 bucket. Execute the following command, replacing the <bucket-name> and <aws-region> placeholders:

aws s3api create-bucket --bucket <bucket-name> --create-bucket-configuration LocationConstraint=<aws-region> --region <aws-region> 

The S3 bucket will be created with default encryption of type SSE-S3 (Amazon S3 managed keys) enabled.
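
Optionally (not created by the command above, but a common practice for Terraform state buckets), enable versioning so earlier state versions can be recovered:

aws s3api put-bucket-versioning --bucket <bucket-name> --versioning-configuration Status=Enabled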

Clone the scripts repository

Clone the repository by executing the command below:

git clone https://github.com/UltiHash/scripts.git

Its code will be required later to set up UltiHash in the AWS environment.

EKS Cluster Setup

Since UltiHash has to be deployed on a Kubernetes cluster, you need to provision an EKS cluster on AWS. For this purpose, use the Terraform project in scripts/terraform/aws/eks-cluster. The project deploys a dedicated VPC and provisions there an EKS cluster with a single c5.large machine to host the essential Kubernetes controllers.

Note: by default the EKS cluster is provisioned with a public endpoint that is reachable over the Internet. If the EKS cluster endpoint should be private, change the parameter cluster_endpoint_public_access from true to false.

Once the repository is cloned, perform the following actions to deploy the Terraform project:

  1. Update the bucket name and its region in main.tf with the ones used in Setup S3 Bucket for Terraform States.

  2. Update the configuration in config.tfvars. The only required change is the parameter cluster_admins - specify the list of ARNs of IAM users and/or IAM roles that need access to the provisioned EKS cluster. Other parameters can be left intact.

  3. Initialize and apply the Terraform project

    cd scripts/terraform/aws/eks-cluster
    terraform init
    terraform apply --var-file config.tfvars

    Wait until the installation is completed.

  4. Make sure access to the EKS cluster has been granted to the required IAM users and roles. To check that, download the kubeconfig for the EKS cluster by executing the command below, replacing <cluster-name> (by default ultihash-test) and <aws-region> (by default eu-central-1) with the corresponding values defined in config.tfvars:

    aws eks update-kubeconfig --name <cluster-name> --region <aws-region>

Execute the following kubectl command to check the available EKS cluster nodes:

kubectl get nodes

The command should output the name of a single provisioned EC2 instance.
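
If you want to block until the node is ready before proceeding, the following command (optional, not part of the scripts) waits for all nodes to report Ready:

kubectl wait --for=condition=Ready nodes --all --timeout=300s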

Install Controllers on EKS

The next step is to install the essential Kubernetes controllers on the provisioned EKS cluster. For this purpose, use the Terraform project in scripts/terraform/aws/eks-cluster-controllers. It deploys the following controllers on the EKS cluster:

  • EBS CSI Driver - automatically provisions persistent volumes for the UltiHash workloads. The volumes are based on a gp3 storage class optimized for performance. The default storage class provisions unencrypted EBS volumes; to provision encrypted EBS volumes, create a new storage class (see the sketch after this list).

  • Nginx Ingress - exposes UltiHash outside of the EKS cluster with a Network Load Balancer.

  • Load Balancer Controller - provisions a Network Load Balancer for the Nginx Ingress controller.

  • Karpenter - provisions EC2 instances on demand to host UltiHash workloads.
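
The repository may already contain a suitable manifest; as a minimal sketch (the storage class name is a placeholder), an encrypted gp3 storage class for the EBS CSI Driver could look like this:

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted   # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
EOF
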
Perform the following actions to deploy the Terraform project:

  1. Update the bucket name and its region in main.tf with the ones used in Setup S3 Bucket for Terraform States.

  2. Update the configuration in config.tfvars if required. The Helm values for the deployed controllers are found in the repository; it is not recommended to change any of them. The only parameter that should be selected in advance is the Network Load Balancer type (internal or internet-facing). If you need to change the instance type for the UltiHash services, update it in the Karpenter manifest.

  3. Initialize and apply the Terraform project

    cd scripts/terraform/aws/eks-cluster-controllers
    terraform init
    terraform apply --var-file config.tfvars

    Wait until the installation is completed. A Network Load Balancer should be provisioned in the same region as the EKS cluster.
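
To confirm the controllers are up before proceeding, list the pods across all namespaces (the exact namespaces depend on the Helm values shipped in the repository):

kubectl get pods --all-namespaces
# expect Running pods for the EBS CSI Driver, ingress-nginx, the AWS Load Balancer Controller and Karpenter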

UltiHash Installation

The last step is the installation of UltiHash itself. For this purpose, use the Terraform project in scripts/terraform/aws/ultihash. Perform the following actions to deploy it:

  1. Update the bucket name and its region in main.tf with the ones used in Setup S3 Bucket for Terraform States.

  2. Update the configuration in config.tfvars with the credentials obtained from your account on ultihash.io (the credentials in config.tfvars are mocked). The Helm values for UltiHash are found in the repository; adjust them to set your custom storage class if required.

  3. Initialize and apply the Terraform project

    cd scripts/terraform/aws/ultihash
    terraform init
    terraform apply --var-file config.tfvars

    Wait until the installation is completed.

The UltiHash cluster is installed in the default Kubernetes namespace; use kubectl to see the deployed workloads:

kubectl get all

To access the deployed UltiHash cluster, configure your AWS CLI/SDK with the UltiHash root credentials and use the domain name of the Network Load Balancer provisioned during Install Controllers on EKS (the endpoint URL in the example below is illustrative; replace it with your own):

# Obtain credentials for the UltiHash root user
aws_access_key_id=`kubectl get secret ultihash-super-user-credentials -o jsonpath="{.data.access-key-id}" | base64 --decode`
aws_secret_access_key=`kubectl get secret ultihash-super-user-credentials -o jsonpath="{.data.secret-key}" | base64 --decode`
      
# Set the credentials for the UltiHash root user
export AWS_ACCESS_KEY_ID=$aws_access_key_id
export AWS_SECRET_ACCESS_KEY=$aws_secret_access_key
aws s3api list-buckets --endpoint-url http://ultihash-test-6a925a272ca1f954.elb.eu-central-1.amazonaws.com/

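With the credentials exported, the endpoint can be exercised further; a brief example (the bucket name is arbitrary and the endpoint URL should be replaced with your own Network Load Balancer domain):

# Create a bucket and round-trip an object through the UltiHash endpoint
aws s3api create-bucket --bucket test-bucket --endpoint-url http://<nlb-domain>/
aws s3 cp ./example.txt s3://test-bucket/example.txt --endpoint-url http://<nlb-domain>/
aws s3 cp s3://test-bucket/example.txt ./example-downloaded.txt --endpoint-url http://<nlb-domain>/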

Uninstall the Environment

To uninstall all previously deployed AWS resources, follow the steps below:

Make sure you are in a new Terminal window when uninstalling.

1. Uninstall UltiHash

Run the following commands:

cd scripts/terraform/aws/ultihash
terraform destroy --var-file config.tfvars
kubectl delete pvc --all

2. Uninstall Kubernetes controllers

Run the following commands:

cd scripts/terraform/aws/eks-cluster-controllers
terraform destroy --var-file config.tfvars

3. Uninstall the EKS cluster

Run the following commands:

cd scripts/terraform/aws/eks-cluster
terraform destroy --var-file config.tfvars
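
Optionally (this step is not part of the Terraform scripts), if the S3 bucket created for the Terraform states is no longer needed, it can be emptied and deleted as well:

aws s3 rb s3://<bucket-name> --force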

Enforce Least Privilege Access

Whenever interacting with AWS cloud, we strongly encourage you to follow the principle of least privilege. This means permissions should be limited to the minimum actions and resources required for each role or service to function.

Why this matters:

  • Reduces the attack surface and limits the impact of compromised credentials or components.

  • Prevents unintentional changes or access to unauthorized resources.

  • Aligns with AWS security best practices and the Well-Architected Framework.

  • Enables better auditing, control, and compliance with security standards.

IAM permissions required to deploy and manage an UH cluster

The IAM user or role used to provision and manage the UH cluster in an AWS account should have the following IAM permissions. The permissions below apply to all resources; after a successful deployment they can be adjusted to match specific resource ARNs for improved security - for example, the S3 statement below can later be narrowed to the Terraform state bucket, as sketched after it.

S3 permissions (required to manage Terraform states in S3):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:CreateBucket",
                "s3:ListBucket"
            ],
            "Resource": "*"
        }
    ]
}
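
As an illustration of such tightening, the S3 statement above could be narrowed to the Terraform state bucket once it exists; a sketch using a hypothetical policy name and the bucket created earlier:

# Hypothetical example: scope the Terraform state permissions to one bucket
aws iam create-policy \
  --policy-name uh-terraform-state-access \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:CreateBucket", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<bucket-name>",
        "arn:aws:s3:::<bucket-name>/*"
      ]
    }]
  }'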

EventBridge permissions (required by Karpenter to manage EC2 interruption events):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "events:TagResource",
                "events:DeleteRule",
                "events:PutTargets",
                "events:DescribeRule",
                "events:PutRule",
                "events:ListTagsForResource",
                "events:RemoveTargets",
                "events:ListTargetsByRule"
            ],
            "Resource": "*"
        }
    ]
}

SQS permissions (required by Karpenter to manage EC2 interruption events):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:DeleteQueue",
                "sqs:GetQueueAttributes",
                "sqs:ListQueueTags",
                "sqs:CreateQueue",
                "sqs:SetQueueAttributes"
            ],
            "Resource": "*"
        }
    ]
}

KMS permissions (required by EKS cluster to manage Kubernetes secrets):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:TagResource",
                "kms:ListAliases",
                "kms:CreateAlias",
                "kms:CreateKey",
                "kms:DeleteAlias"
            ],
            "Resource": "*"
        }
    ]
}

EKS permissions (required to manage EKS cluster):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "eks:DeleteAccessEntry",
                "eks:ListNodegroups",
                "eks:DescribeAddonConfiguration",
                "eks:UpdateAddon",
                "eks:ListAddons",
                "eks:AssociateAccessPolicy",
                "eks:ListAccessEntries",
                "eks:CreateNodegroup",
                "eks:DescribeAccessEntry",
                "eks:DescribeAddon",
                "eks:DeleteCluster",
                "eks:ListAssociatedAccessPolicies",
                "eks:DescribeNodegroup",
                "eks:DeleteAddon",
                "eks:DeleteNodegroup",
                "eks:DisassociateAccessPolicy",
                "eks:TagResource",
                "eks:CreateAddon",
                "eks:CreateAccessEntry",
                "eks:UpdateNodegroupConfig",
                "eks:DescribeCluster",
                "eks:ListAccessPolicies",
                "eks:DescribeAddonVersions",
                "eks:CreateCluster"
            ],
            "Resource": "*"
        }
    ]
}

IAM permissions (required by EKS cluster and EC2 instances):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:GetPolicyVersion",
                "iam:GetPolicy",
                "iam:DeletePolicy",
                "iam:CreateRole",
                "iam:DeleteRole",
                "iam:AttachRolePolicy",
                "iam:CreateOpenIDConnectProvider",
                "iam:CreatePolicy",
                "iam:ListInstanceProfilesForRole",
                "iam:PassRole",
                "iam:DetachRolePolicy",
                "iam:ListPolicyVersions",
                "iam:ListAttachedRolePolicies",
                "iam:ListRolePolicies",
                "iam:GetOpenIDConnectProvider",
                "iam:DeleteOpenIDConnectProvider",
                "iam:TagOpenIDConnectProvider"
            ],
            "Resource": "*"
        }
    ]
}

EC2 permissions (required to manage EC2 instances):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:DeleteSubnet",
                "ec2:AttachInternetGateway",
                "ec2:DeleteRouteTable",
                "ec2:AssociateRouteTable",
                "ec2:DescribeInternetGateways",
                "ec2:CreateRoute",
                "ec2:CreateInternetGateway",
                "ec2:RevokeSecurityGroupEgress",
                "ec2:DeleteInternetGateway",
                "ec2:DescribeNetworkAcls",
                "ec2:DescribeRouteTables",
                "ec2:DescribeLaunchTemplates",
                "ec2:CreateTags",
                "ec2:CreateRouteTable",
                "ec2:RunInstances",
                "ec2:DetachInternetGateway",
                "ssm:GetParameters",
                "ec2:DisassociateRouteTable",
                "ec2:RevokeSecurityGroupIngress",
                "ec2:DescribeSecurityGroupRules",
                "ec2:DeleteNatGateway",
                "ec2:DeleteVpc",
                "ec2:CreateSubnet",
                "ec2:DescribeSubnets",
                "ec2:DeleteNetworkAclEntry",
                "ec2:DisassociateAddress",
                "ec2:DescribeAddresses",
                "ec2:CreateNatGateway",
                "ec2:CreateVpc",
                "ec2:DescribeAddressesAttribute",
                "ec2:DescribeVpcAttribute",
                "ec2:DescribeNetworkInterfaces",
                "ec2:CreateSecurityGroup",
                "ec2:ModifyVpcAttribute",
                "ec2:DeleteLaunchTemplateVersions",
                "ec2:ReleaseAddress",
                "ec2:AuthorizeSecurityGroupEgress",
                "ec2:DeleteLaunchTemplate",
                "ec2:DeleteRoute",
                "ec2:DescribeLaunchTemplateVersions",
                "ec2:DescribeNatGateways",
                "ec2:AllocateAddress",
                "ec2:DescribeSecurityGroups",
                "ec2:CreateLaunchTemplateVersion",
                "ec2:CreateLaunchTemplate",
                "ec2:DescribeVpcs",
                "ec2:DeleteSecurityGroup",
                "ec2:CreateNetworkAclEntry"
            ],
            "Resource": "*"
        }
    ]
}

Manage AWS service limits

When deploying an UltiHash cluster on Amazon EKS, it is important to ensure that your AWS account has sufficient EC2 vCPU-based instance limits in the selected region. Amazon EKS worker nodes are backed by EC2 instances, and if vCPU quotas are too low, the cluster may fail to scale or provision nodes, causing deployment failures.

The relevant quota is Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances, with a default limit of 5 vCPUs per region.

If the EKS cluster attempts to launch EC2 instances exceeding your vCPU quota, node provisioning will fail, and workloads may not start or scale properly. Check your current Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances quota in the Service Quotas console for EC2; if it is not sufficient, create a quota increase request by clicking the Request increase at account level button in the top right corner. If you need more vCPUs in your region than the quota provides, we recommend increasing the quota proactively before scaling out your UltiHash cluster.
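
The same check can also be done from the AWS CLI; a sketch, assuming the quota code L-1216C47A still maps to this quota (verify it in the Service Quotas console) and an example desired value:

# Check the current vCPU quota for standard on-demand instances
aws service-quotas get-service-quota --service-code ec2 --quota-code L-1216C47A

# Request an increase to e.g. 32 vCPUs
aws service-quotas request-service-quota-increase --service-code ec2 --quota-code L-1216C47A --desired-value 32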
