Troubleshooting

This section focuses on how to investigate the common issues encountered during or after Helm chart deployments of the UltiHash cluster. It includes practical examples to ensure smooth operation and faster issue recovery from failures.

1. Helm chart install or upgrade failure

Purpose: Resolve failures when installing or upgrading the UltiHash cluster using Helm.

Symptoms:

helm install or helm upgrade hangs or returns an error
Application pods do not start
Helm status is stuck at pending-install or failed

Steps to resolve:

Inspect the Helm release status:

helm status <release_name> -n <namespace>

Check for resource creation errors or pending pods:
```
kubectl get pods -n <namespace>
```
Describe a failing pod to view events and errors:
```
kubectl describe pod <pod_name> -n <namespace>
```

Debug with Helm’s dry run mode:

helm upgrade <release_name> oci://registry.ultihash.io/stable/ultihash-cluster \
  -n <namespace> --dry-run --values values.yaml --debug

After the issue has been found and eliminated, process with install or upgrade further.

Recommendation: Always use --dry-run and --debug to validate changes before applying them in production.

2. Missing or incorrect values in `values.yaml`

Purpose: Identify and correct configuration errors that prevent proper deployment.

Symptoms:

Helm fails with a rendering error
Application fails at runtime due to missing config (e.g., secrets, ports, env vars)

Steps to resolve:

Compare your values file with the chart defaults:

helm show values oci://registry.ultihash.io/stable/ultihash-cluster

Test the rendered templates locally:

helm template <your_release_name> oci://registry.ultihash.io/stable/ultihash-cluster --values <your_values.yaml>

Reapply the corrected configuration:

helm upgrade <release_name> oci://registry.ultihash.io/stable/ultihash-cluster \
  -n <namespace> --values <your_values.yaml>

Recommendation: Use a version-controlled values file and validate changes in a staging environment before rolling out to production.

3. Application pods stuck in `CrashLoopBackOff` or `ImagePullBackOff`

Purpose: Diagnose runtime pod failures due to misconfiguration or image issues.

Symptoms:

Pods keep restarting or cannot pull the container image

Steps to resolve:

Inspect the pod state:
```
kubectl get pods -n <namespace>
```
Check the logs of the failing pod:
```
kubectl logs <pod_name> -n <namespace>
```

Correct the config causing failure, then upgrade:

helm upgrade <release_name> oci://registry.ultihash.io/stable/ultihash-cluster \
  -n <namespace> --values <your_values.yaml>

Recommendation: Ensure that image repositories are accessible and secrets for private registries are correctly configured in the cluster.

Last updated 1 month ago

Was this helpful?

1. Helm chart install or upgrade failure

2. Missing or incorrect values in values.yaml

3. Application pods stuck in CrashLoopBackOff or ImagePullBackOff

2. Missing or incorrect values in `values.yaml`

3. Application pods stuck in `CrashLoopBackOff` or `ImagePullBackOff`