Install Self-Hosted on AWS

How to set up UltiHash Self-Hosted in the cloud with AWS + Kubernetes

For cloud deployments, UltiHash integrates seamlessly with AWS and Amazon Elastic Block Store (EBS). Unlike traditional object storage solutions that charge per request, which leads to opaque and unpredictable costs, UltiHash eliminates this uncertainty: users choose from different storage classes based on their performance needs, which is especially useful for I/O-intensive and mission-critical workloads.


This guide describes the full installation process of UltiHash in an AWS environment, including:

  • provisioning an EKS cluster in a dedicated VPC

  • deploying the essential Kubernetes controllers

  • installing UltiHash on the EKS cluster

This guide outlines the recommended UltiHash setup for managing 10 TB of data. The UltiHash cluster is deployed on a single EC2 instance of type r8g.4xlarge, with a Network Load Balancer routing traffic to it. The cluster uses gp3 volumes tuned for performance, ensuring efficient storage management. If you have other storage requirements, you can change the volume sizes in the configuration, and you are free to select any EC2 instance type and EBS volume configuration that fits your production needs. The diagram below depicts the resources that the Terraform scripts deploy in an AWS account.

By default the deployment is done in a single AZ (see the diagram below). However, this can be adjusted; see EKS Cluster Setup.

Diagram of the deployed resources

Expected performance:

  • Write throughput: up to 200 MB/s

  • Read throughput: up to 1000 MB/s

Expected costs:

  • Hourly:

    • EC2 cost: 1.14 USD

    • EBS cost: 1.58 USD

    • UltiHash Pay-as-you-go license cost: 0.14 USD

  • Monthly:

    • EC2 cost: 829.98 USD

    • EBS cost: 1152.38 USD

    • UltiHash 1 month subscription license cost: 92.16 USD

The UltiHash license is available either as a pay-as-you-go license, priced by the number of GiBs used per hour, or as a subscription license, priced by the number of GiBs used over the subscription period, with 1-month, 12-month, 24-month and 36-month contract variants.

List of billable AWS services:

  • mandatory: EKS, EC2, S3, KMS

  • optional: SQS, EventBridge

Estimated amount of time to complete a deployment: ~45 minutes.

System hardware requirements
  • Storage: NVMe SSDs are required for optimal disk performance.

  • Network: 10 Gbps interface minimum between nodes.

  • Kubernetes: Version 1.20+ with Nginx Ingress and a CSI Controller installed.

  • Containerization: Docker 19.03+ or Containerd 1.3+.

  • Helm: Version 3.x.

  • Cloud: for AWS, EC2 instances with Elastic Block Store (EBS) volumes. GCP/Azure support is in development.

Resource needs will vary depending on the amount of data being stored and managed. For best performance, especially with larger datasets, it’s essential to provision additional resources accordingly.


1. Prerequisites

Skills

  • good knowledge of the following AWS services: IAM, VPC, EKS, EC2

  • high-level knowledge of Terraform

Remote Environment

  • access to an AWS account

Local Environment

  • AWS CLI configured with credentials for the target AWS account

  • Terraform, kubectl, Helm 3.x and git installed
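
A quick way to confirm the local tooling is in place before starting (plain version checks; the exact versions are not prescribed by this guide beyond Helm 3.x):

# Verify the local tooling used throughout this guide
aws --version
terraform version
kubectl version --client
helm version --short
git --version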

2. Set up an S3 Bucket for Terraform States

Since the Terraform state for this setup has to be stored in S3, you need to provision a dedicated S3 bucket. Execute the following command, replacing the <bucket-name> and <aws-region> placeholders:

aws s3api create-bucket --bucket <bucket-name> --create-bucket-configuration LocationConstraint=<aws-region> --region <aws-region> 

The S3 bucket will be created with default encryption of type SSE-S3 (Amazon S3 managed keys) enabled.
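
Optionally, confirm the bucket encryption and enable versioning, a common safeguard for Terraform state files (both are standard AWS CLI calls; replace <bucket-name> as above):

# Confirm the default SSE-S3 encryption on the new bucket
aws s3api get-bucket-encryption --bucket <bucket-name>

# Optional: enable versioning to protect the Terraform state from accidental overwrites
aws s3api put-bucket-versioning --bucket <bucket-name> --versioning-configuration Status=Enabled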

3. Clone the scripts repository

Clone the repository by executing the command below:

git clone https://github.com/UltiHash/scripts.git

Its code will be used in the subsequent steps to set up UltiHash in your AWS environment.

4. EKS Cluster Setup

Since UltiHash has to be deployed on a Kubernetes cluster, you need to provision an EKS cluster on AWS. For this purpose, use this Terraform project. The project deploys a dedicated VPC and provisions in it an EKS cluster with a single c5.large node to host the essential Kubernetes controllers.

Note: by default the EKS cluster is provisioned with a public endpoint that is reachable over the Internet. If the EKS cluster endpoint should be private, change the parameter cluster_endpoint_public_access from true to false.
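
Once the cluster is provisioned, you can verify which endpoint access mode is in effect with a standard AWS CLI call (cluster name and region as defined in config.tfvars):

# Check whether the EKS API endpoint is public and/or private
aws eks describe-cluster --name <cluster-name> --region <aws-region> \
  --query 'cluster.resourcesVpcConfig.{publicAccess:endpointPublicAccess,privateAccess:endpointPrivateAccess}'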

Once the scripts repository is cloned, perform the following actions to deploy the Terraform project:

  1. Update the bucket name and its region in main.tf with the ones used for the Terraform state bucket created earlier.

  2. Update the configuration in config.tfvars. The only required change is the parameter cluster_admins: specify the list of ARNs of the IAM users and/or IAM roles that need access to the provisioned EKS cluster (see the ARN lookup command after this list). The other parameters can be left unchanged.

  3. Initialize and apply the Terraform project

    cd scripts/terraform/aws/eks-cluster
    terraform init
    terraform apply --var-file config.tfvars

    Wait until the installation is completed.
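
If you need to look up the ARN to put into cluster_admins, the identity behind your current AWS CLI credentials can be printed with a standard STS call:

# Print the ARN of the IAM user or assumed role used by your current AWS CLI credentials
# Note: for an assumed role this returns an STS assumed-role ARN; put the underlying IAM role ARN into cluster_admins
aws sts get-caller-identity --query Arn --output text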

Make sure access to the EKS cluster has been granted to the required IAM users and roles. To check that, download the kubeconfig for the EKS cluster by executing the command below. Replace <cluster-name> (by default ultihash-test) and <aws-region> (by default eu-central-1) with the corresponding values defined in config.tfvars.

aws eks update-kubeconfig --name <cluster-name> --region <aws-region>

Execute the following kubectl command to check the available EKS cluster nodes:

kubectl get nodes

The command should output the name of the single provisioned EC2 instance.

5. Install Controllers on EKS

The next step is to install the essential Kubernetes controllers on the provisioned EKS cluster. For this purpose, use this Terraform project. The project deploys the following Kubernetes controllers on the EKS cluster:

  • Nginx Ingress - exposes UltiHash outside of the EKS cluster with a Network Load Balancer.

  • Load Balancer Controller - provisions a Network Load Balancer for the Nginx Ingress controller.

  • Karpenter - provisions EC2 instances on-demand to host UltiHash workloads.

  • EBS CSI Driver - CSI controller that automatically provisions persistent volumes for the UltiHash workloads. The volumes are based on the gp3 storage class and optimised for performance. The default storage class provisions unencrypted EBS volumes; to provision encrypted EBS volumes, create a new storage class like this (a sketch follows below).
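
A minimal sketch of such an encrypted storage class, assuming the stock EBS CSI driver provisioner and gp3 volumes; the name, IOPS and throughput values are illustrative and should be aligned with the performance settings used by the deployment:

kubectl apply -f - <<'EOF'
# Hypothetical encrypted gp3 storage class for the EBS CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted              # illustrative name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: gp3
  iops: "16000"                    # adjust to your performance requirements
  throughput: "1000"               # MiB/s, adjust as needed
  encrypted: "true"                # optionally add kmsKeyId: <key-arn> for a customer managed key
EOF

Reference the new storage class in the UltiHash Helm values later on if the UltiHash workloads should use the encrypted volumes.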

Perform the following actions to deploy the Terraform project:

  1. Update the bucket name and its region in main.tf with the ones used for the Terraform state bucket created earlier.

  2. Update the configuration in config.tfvars if required. The Helm values for the deployed controllers are found here. It is not recommended to change these configurations; the only parameter that should be selected in advance is the Network Load Balancer type (internal or internet-facing) in this file.

  3. If you need to change the instance type for the UltiHash services, update it in the following Karpenter manifest.

  4. Initialize and apply the Terraform project

    cd scripts/terraform/aws/eks-cluster-controllers
    terraform init
    terraform apply --var-file config.tfvars

    Wait until the installation is completed. A Network Load Balancer should be provisioned in the same region as the EKS cluster.
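
To retrieve the DNS name of the provisioned Network Load Balancer, which is later needed as the endpoint for the UltiHash cluster, read it from the ingress controller Service; the namespace and Service name below assume the ingress-nginx chart defaults and may differ in this deployment:

# Print the DNS name of the Network Load Balancer fronting the Nginx Ingress controller
kubectl get svc -n ingress-nginx ingress-nginx-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'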

6. UltiHash installation

The last step is the installation of UltiHash itself. For this purpose, use this Terraform project. Perform the following actions to deploy it:

  1. Update the bucket name and its region in main.tf with the ones used for the Terraform state bucket created earlier.

  2. Update the configuration in config.tfvars with the credentials obtained from your account on ultihash.io (the credentials shipped in config.tfvars are placeholders). The Helm values for UltiHash are found here. Adjust the Helm values to set your custom storage class if required.

  3. Initialize and apply the Terraform project

    cd scripts/terraform/aws/ultihash
    terraform init
    terraform apply --var-file config.tfvars

    Wait until the installation is completed.

The UltiHash cluster is installed in the default Kubernetes namespace; use kubectl to see the deployed workloads:

kubectl get all

To access the deployed UltiHash cluster, configure your AWS CLI/SDK with the UltiHash root credentials:

# Obtain credentials for the UltiHash root user
aws_access_key_id=`kubectl get secret ultihash-super-user-credentials -o jsonpath="{.data.access-key-id}" | base64 --decode`
aws_secret_access_key=`kubectl get secret ultihash-super-user-credentials -o jsonpath="{.data.secret-key}" | base64 --decode`
      
# Set the credentials for the UltiHash root user
export AWS_ACCESS_KEY_ID=$aws_access_key_id
export AWS_SECRET_ACCESS_KEY=$aws_secret_access_key

Finally, access the UltiHash cluster with the AWS CLI/SDK, using the domain name of the Network Load Balancer provisioned in the previous step as the endpoint URL:

aws s3api list-buckets --endpoint-url http://ultihash-test-6a925a272ca1f954.elb.eu-central-1.amazonaws.com/
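
As a quick end-to-end check, you can create a bucket and upload an object through the same endpoint. The bucket name and file below are illustrative, and standard S3 CLI operations are assumed to work against the S3-compatible UltiHash API:

# Create a test bucket on the UltiHash cluster and upload a sample object
aws s3api create-bucket --bucket test-bucket --endpoint-url http://<network-load-balancer-dns-name>/
echo "hello ultihash" > sample.txt
aws s3 cp sample.txt s3://test-bucket/sample.txt --endpoint-url http://<network-load-balancer-dns-name>/
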
How to uninstall UltiHash on AWS

To uninstall all previously deployed AWS resources, follow the steps below:

Make sure you are in a new Terminal window when uninstalling.

First, uninstall UltiHash by running the following commands:

cd scripts/terraform/aws/ultihash
terraform destroy --var-file config.tfvars
kubectl delete pvc --all
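
After the PVCs are deleted, you may want to confirm that the underlying EBS volumes have actually been released so that no orphaned volumes keep accruing cost. The tag filter below relies on the tags commonly applied by the EBS CSI driver and is an assumption:

# List any leftover EBS volumes created for PVCs in the default namespace
aws ec2 describe-volumes \
  --filters "Name=tag:kubernetes.io/created-for/pvc/namespace,Values=default" \
  --query 'Volumes[].{Id:VolumeId,State:State,Size:Size}'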

Next, uninstall the Kubernetes controllers:

cd scripts/terraform/aws/eks-cluster-controllers
terraform destroy --var-file config.tfvars

Finally, uninstall the EKS cluster:

cd scripts/terraform/aws/eks-cluster
terraform destroy --var-file config.tfvars

More information

Manage AWS service limits

When deploying an UltiHash cluster on Amazon EKS, it is important to ensure that your AWS account has sufficient EC2 vCPU-based instance limits in the selected region. Amazon EKS worker nodes are backed by EC2 instances, and if vCPU quotas are too low, the cluster may fail to scale or provision nodes, causing deployment failures.

The relevant quota is Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances, with a default limit of 5 vCPUs per region.

If the EKS cluster attempts to launch EC2 instances exceeding your vCPU quota, node provisioning will fail, and workloads may not start or scale properly. In case you need more vCPUs in your region than the quota provides, we recommend increasing quota proactively before scaling out your UltiHash cluster.

Check your current Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances quota in the Service Quotas Console for EC2 and, if it is not sufficient, create a quota increase request by clicking the Request increase at account level button in the top right corner.
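
The same check and request can be performed with the AWS CLI. The quota code below is the one commonly associated with Running On-Demand Standard instances; verify it in the Service Quotas Console before relying on it:

# Check the current vCPU quota for Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances
aws service-quotas get-service-quota --service-code ec2 --quota-code L-1216C47A

# Request an increase (the desired value of 64 vCPUs is only an example)
aws service-quotas request-service-quota-increase --service-code ec2 --quota-code L-1216C47A --desired-value 64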

Enforce Least Privilege Access

Whenever interacting with AWS cloud, we strongly encourage you to follow the principle of least privilege. This means permissions should be limited to the minimum actions and resources required for each role or service to function.

Why this matters:

  • Reduces the attack surface and limits the impact of compromised credentials or components.

  • Prevents unintentional changes or access to unauthorized resources.

  • Aligns with AWS security best practices and the Well-Architected Framework.

  • Enables better auditing, control, and compliance with security standards.

More information on this topic can be found at this AWS link.

IAM permissions required to deploy and manage an UltiHash cluster

The IAM user or role used to provision and manage the UltiHash cluster in an AWS account should have the following IAM permissions. The permissions below apply to all resources; after a successful deployment they can be scoped down to specific resource ARNs for improved security.

S3 permissions (required to manage Terraform states in S3):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:CreateBucket",
                "s3:ListBucket"
            ],
            "Resource": "*"
        }
    ]
}
EventBridge permissions (required by Karpenter to manage EC2 interruption events):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "events:TagResource",
                "events:DeleteRule",
                "events:PutTargets",
                "events:DescribeRule",
                "events:PutRule",
                "events:ListTagsForResource",
                "events:RemoveTargets",
                "events:ListTargetsByRule"
            ],
            "Resource": "*"
        }
    ]
}
SQS permissions (required by Karpenter to manage EC2 interruption events):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:DeleteQueue",
                "sqs:GetQueueAttributes",
                "sqs:ListQueueTags",
                "sqs:CreateQueue",
                "sqs:SetQueueAttributes"
            ],
            "Resource": "*"
        }
    ]
}

KMS permissions (required by EKS cluster to manage Kubernetes secrets):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:TagResource",
                "kms:ListAliases",
                "kms:CreateAlias",
                "kms:CreateKey",
                "kms:DeleteAlias"
            ],
            "Resource": "*"
        }
    ]
}
EKS permissions (required to manage EKS cluster):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "eks:DeleteAccessEntry",
                "eks:ListNodegroups",
                "eks:DescribeAddonConfiguration",
                "eks:UpdateAddon",
                "eks:ListAddons",
                "eks:AssociateAccessPolicy",
                "eks:ListAccessEntries",
                "eks:CreateNodegroup",
                "eks:DescribeAccessEntry",
                "eks:DescribeAddon",
                "eks:DeleteCluster",
                "eks:ListAssociatedAccessPolicies",
                "eks:DescribeNodegroup",
                "eks:DeleteAddon",
                "eks:DeleteNodegroup",
                "eks:DisassociateAccessPolicy",
                "eks:TagResource",
                "eks:CreateAddon",
                "eks:CreateAccessEntry",
                "eks:UpdateNodegroupConfig",
                "eks:DescribeCluster",
                "eks:ListAccessPolicies",
                "eks:DescribeAddonVersions",
                "eks:CreateCluster"
            ],
            "Resource": "*"
        }
    ]
}

IAM permissions (required by EKS cluster and EC2 instances):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:GetPolicyVersion",
                "iam:GetPolicy",
                "iam:DeletePolicy",
                "iam:CreateRole",
                "iam:DeleteRole",
                "iam:AttachRolePolicy",
                "iam:CreateOpenIDConnectProvider",
                "iam:CreatePolicy",
                "iam:ListInstanceProfilesForRole",
                "iam:PassRole",
                "iam:DetachRolePolicy",
                "iam:ListPolicyVersions",
                "iam:ListAttachedRolePolicies",
                "iam:ListRolePolicies",
                "iam:GetOpenIDConnectProvider",
                "iam:DeleteOpenIDConnectProvider",
                "iam:TagOpenIDConnectProvider"
            ],
            "Resource": "*"
        }
    ]
}

EC2 permissions (required to manage EC2 instances):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:DeleteSubnet",
                "ec2:AttachInternetGateway",
                "ec2:DeleteRouteTable",
                "ec2:AssociateRouteTable",
                "ec2:DescribeInternetGateways",
                "ec2:CreateRoute",
                "ec2:CreateInternetGateway",
                "ec2:RevokeSecurityGroupEgress",
                "ec2:DeleteInternetGateway",
                "ec2:DescribeNetworkAcls",
                "ec2:DescribeRouteTables",
                "ec2:DescribeLaunchTemplates",
                "ec2:CreateTags",
                "ec2:CreateRouteTable",
                "ec2:RunInstances",
                "ec2:DetachInternetGateway",
                "ssm:GetParameters",
                "ec2:DisassociateRouteTable",
                "ec2:RevokeSecurityGroupIngress",
                "ec2:DescribeSecurityGroupRules",
                "ec2:DeleteNatGateway",
                "ec2:DeleteVpc",
                "ec2:CreateSubnet",
                "ec2:DescribeSubnets",
                "ec2:DeleteNetworkAclEntry",
                "ec2:DisassociateAddress",
                "ec2:DescribeAddresses",
                "ec2:CreateNatGateway",
                "ec2:CreateVpc",
                "ec2:DescribeAddressesAttribute",
                "ec2:DescribeVpcAttribute",
                "ec2:DescribeNetworkInterfaces",
                "ec2:CreateSecurityGroup",
                "ec2:ModifyVpcAttribute",
                "ec2:DeleteLaunchTemplateVersions",
                "ec2:ReleaseAddress",
                "ec2:AuthorizeSecurityGroupEgress",
                "ec2:DeleteLaunchTemplate",
                "ec2:DeleteRoute",
                "ec2:DescribeLaunchTemplateVersions",
                "ec2:DescribeNatGateways",
                "ec2:AllocateAddress",
                "ec2:DescribeSecurityGroups",
                "ec2:CreateLaunchTemplateVersion",
                "ec2:CreateLaunchTemplate",
                "ec2:DescribeVpcs",
                "ec2:DeleteSecurityGroup",
                "ec2:CreateNetworkAclEntry"
            ],
            "Resource": "*"
        }
    ]
}


Troubleshooting frequent issues

Helm chart install or upgrade failure

Symptoms:

  • helm install or helm upgrade hangs or returns an error

  • Application pods do not start

  • Helm status is stuck at pending-install or failed

Steps to resolve:

  • Inspect the Helm release status:

    helm status <release_name> -n <namespace>
  • Check for resource creation errors or pending pods:

    kubectl get pods -n <namespace>
  • Describe a failing pod to view events and errors:

    kubectl describe pod <pod_name> -n <namespace>
  • Debug with Helm’s dry run mode:

    helm upgrade <release_name> oci://registry.ultihash.io/stable/ultihash-cluster \
      -n <namespace> --dry-run --values values.yaml --debug
  • After the issue has been found and resolved, proceed with the install or upgrade.

Recommendation: Always use --dry-run and --debug to validate changes before applying them in production.

Missing or incorrect values in values.yaml

Symptoms:

  • Helm fails with a rendering error

  • Application fails at runtime due to missing config (e.g., secrets, ports, env vars)

Steps to resolve:

  • Compare your values file with the chart defaults:

    helm show values oci://registry.ultihash.io/stable/ultihash-cluster
  • Test the rendered templates locally:

    helm template <your_release_name> oci://registry.ultihash.io/stable/ultihash-cluster --values <your_values.yaml>
  • Reapply the corrected configuration:

    helm upgrade <release_name> oci://registry.ultihash.io/stable/ultihash-cluster \
      -n <namespace> --values <your_values.yaml>

Recommendation: Use a version-controlled values file and validate changes in a staging environment before rolling out to production.

Application pods stuck in CrashLoopBackOff or ImagePullBackOff

Purpose: Diagnose runtime pod failures due to misconfiguration or image issues.

Symptoms:

  • Pods keep restarting or cannot pull the container image

Steps to resolve:

  • Inspect the pod state:

    kubectl get pods -n <namespace>
  • Check the logs of the failing pod:

    kubectl logs <pod_name> -n <namespace>
  • Correct the config causing failure, then upgrade:

    helm upgrade <release_name> oci://registry.ultihash.io/stable/ultihash-cluster \
      -n <namespace> --values <your_values.yaml>

Recommendation: Ensure that image repositories are accessible and secrets for private registries are correctly configured in the cluster.
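
If the UltiHash images are pulled from a private registry, a pull secret can be created along these lines. This is a sketch only: the secret name and namespace are hypothetical, the registry host matches the chart location used above, and it assumes the chart consumes the secret via an imagePullSecrets value:

# Create a Docker registry pull secret for the UltiHash registry (names are illustrative)
kubectl create secret docker-registry ultihash-registry-credentials \
  -n <namespace> \
  --docker-server=registry.ultihash.io \
  --docker-username=<your-ultihash-username> \
  --docker-password=<your-ultihash-token>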
