Advanced configuration

This section outlines how to configure the UltiHash cluster after the initial installation. It covers advanced customization options, including Helm chart adjustments, Kubernetes configurations, and integration settings.

Helm chart customization

The Helm chart used during installation exposes a range of values for fine-tuning the UltiHash setup to your needs. Below are the key areas where you might want to make changes:

1. Ingress configuration

  • Purpose: Configure how the UltiHash cluster is accessed externally.

  • Example: Set up Ingress with specific annotations and TLS configuration.

    entrypoint:
      ingress:
        host: <your_domain_name>
        annotations:
          kubernetes.io/ingress.class: nginx
          nginx.ingress.kubernetes.io/proxy-body-size: "0"
        tls:
          - hosts:
              - <your_domain_name>
            secretName: <tls_secret>
    
  • Recommendation: Ensure the ingress controller is configured for your environment (e.g., Nginx) and that TLS is used for secure communication.
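
If you manage TLS certificates with cert-manager, the secret referenced above can be issued automatically. A minimal sketch, assuming cert-manager is installed in your cluster and a ClusterIssuer named letsencrypt-prod exists (both are placeholders for your own setup):

    entrypoint:
      ingress:
        host: <your_domain_name>
        annotations:
          kubernetes.io/ingress.class: nginx
          nginx.ingress.kubernetes.io/proxy-body-size: "0"
          # Assumed issuer name - replace with your own cert-manager ClusterIssuer
          cert-manager.io/cluster-issuer: letsencrypt-prod
        tls:
          - hosts:
              - <your_domain_name>
            # cert-manager writes the issued certificate into this secret
            secretName: <tls_secret>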

2. Resource allocation

  • Purpose: Adjust the resource allocations for service replicas.

  • Example: Customize resource requests and limits for critical services.

    etcd:
      resources:
        limits:
          memory: "2Gi"
          cpu: "500m"
    
    entrypoint:
      resources:
        limits:
          memory: "16Gi"
          cpu: "8"
    
    database:
      primary:
        resources:
          limits:
            memory: "16Gi"
            cpu: "8"
    
    deduplicator:
      resources:
        limits:
          memory: "64Gi"
          cpu: "16"
    
    storage:
      resources:
        limits:
          memory: "32Gi"
          cpu: "16"
    
  • Recommendation: Adjust resources to balance performance with cost.
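
The snippets above set only limits. Kubernetes also accepts requests, which reserve capacity at scheduling time; setting both gives the scheduler accurate sizing while still capping bursts. A minimal sketch for a single service, assuming the chart passes the resources block through to the pod spec unchanged:

    storage:
      resources:
        # Reserved at scheduling time
        requests:
          memory: "16Gi"
          cpu: "8"
        # Hard ceiling enforced at runtime
        limits:
          memory: "32Gi"
          cpu: "16"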

Kubernetes configuration

Beyond the Helm chart, you might need to adjust Kubernetes-specific settings to optimize the UltiHash deployment:

1. Node affinity and tolerations

  • Purpose: Control where pods are scheduled within your Kubernetes cluster.

  • Example: Use pod anti-affinity to spread storage pods across different nodes.

    storage:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: role
                operator: In
                values:
                - storage
            topologyKey: kubernetes.io/hostname
    
  • Recommendation: Use affinity rules to optimize performance and ensure critical services run on appropriate nodes.
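
Tolerations complement affinity: if you taint dedicated nodes so that only UltiHash workloads land on them, the pods need a matching toleration. A minimal sketch, assuming a hypothetical taint role=storage:NoSchedule on your storage nodes and that the chart forwards tolerations to the pod spec:

    storage:
      tolerations:
        # Matches the assumed taint role=storage:NoSchedule
        - key: "role"
          operator: "Equal"
          value: "storage"
          effect: "NoSchedule"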

Monitoring configuration

UltiHash can be integrated with various monitoring, logging, and analytics tools. Below are key integrations you may want to configure:

1. Telemetry and monitoring

  • Purpose: Export metrics and logs to external systems like Prometheus and Loki.

  • Example: Configure the OpenTelemetry Collector to export metrics to Prometheus. For the available exporter options, please refer to the OpenTelemetry documentation.

    collector:
      config:
        exporters:
          prometheus/mycompany:
            endpoint: "1.2.3.4:1234"
        service:
          pipelines:
            metrics:
              receivers:
                - otlp
              exporters:
                - prometheus/mycompany
    
  • Recommendation: Set up monitoring early to ensure you can track system performance and diagnose issues as they arise.
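
Logs can be routed the same way as metrics. A minimal sketch for shipping logs to Loki, assuming your collector build includes the loki exporter (part of the collector-contrib distribution) and substituting your own push URL for the hypothetical one below:

    collector:
      config:
        exporters:
          loki/mycompany:
            # Hypothetical Loki push endpoint - replace with your instance
            endpoint: "http://loki.monitoring:3100/loki/api/v1/push"
        service:
          pipelines:
            logs:
              receivers:
                - otlp
              exporters:
                - loki/mycompany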

The metrics exported by the UltiHash cluster are listed below, categorized into multiple groups.

Storage service requests

  • storage_read_fragment_req: number of requests received for reading a fragment

  • storage_read_address_req: number of requests received for reading an address

  • storage_write_req: number of requests received for writing data

  • storage_sync_req: number of requests received to sync data to persistent storage

  • storage_remove_fragment_req: number of requests received to remove a fragment from storage

  • storage_used_req: number of requests received to get the used space

Deduplicator service requests

  • deduplicator_req: number of requests received to deduplicate uploaded data

Entrypoint service requests

  • entrypoint_abort_multipart_req: number of requests received for AbortMultipartUpload

  • entrypoint_complete_multipart_req: number of requests received for CompleteMultipartUpload

  • entrypoint_create_bucket_req: number of requests received for CreateBucket

  • entrypoint_delete_bucket_req: number of requests received for DeleteBucket

  • entrypoint_delete_object_req: number of requests received for DeleteObject

  • entrypoint_delete_objects_req: number of requests received for DeleteObjects

  • entrypoint_get_bucket_req: number of requests received for GetBucket

  • entrypoint_get_object_req: number of requests received for GetObject

  • entrypoint_head_object_req: number of requests received for HeadObject

  • entrypoint_init_multipart_req: number of requests received for CreateMultipartUpload

  • entrypoint_list_buckets_req: number of requests received for ListBuckets

  • entrypoint_list_multipart_req: number of requests received for ListMultipartUploads

  • entrypoint_list_objects_req: number of requests received for ListObjects

  • entrypoint_list_objects_v2_req: number of requests received for ListObjectsV2

  • entrypoint_multipart_req: number of requests received for UploadPart

  • entrypoint_put_object_req: number of requests received for PutObject

Utilization metrics

  • gdv_l1_cache_hit_counter: Hit count of the L1 cache in the global_data_view

  • gdv_l1_cache_miss_counter: Miss count of the L1 cache in the global_data_view

  • gdv_l2_cache_hit_counter: Hit count of the L2 cache in the global_data_view

  • gdv_l2_cache_miss_counter: Miss count of the L2 cache in the global_data_view

  • deduplicator_set_fragment_counter: The number of fragments referenced in the deduplicator set maintained by the deduplicator service

  • deduplicator_set_fragment_size_counter: The aggregated size of the fragments referenced in the deduplicator set maintained by the deduplicator service

  • entrypoint_ingested_data_counter: The total data volume ingested by an entrypoint service

  • entrypoint_egressed_data_counter: The total data volume egressed by an entrypoint service

  • entrypoint_original_data_volume_gauge: The original/raw data volume in the storage cluster, maintained by the entrypoint service

  • active_connections: Number of currently handled connections

  • storage_available_space_gauge: Storage space available to a storage service instance

  • storage_used_space_gauge: Storage space used by a storage service instance
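
Since these metrics follow Prometheus conventions, they can drive alerting directly. A minimal Prometheus alerting-rule sketch using the storage gauges above; the rule names and the 10% threshold are illustrative, not recommendations:

    groups:
      - name: ultihash-capacity
        rules:
          - alert: UltiHashStorageSpaceLow
            # Fires when a storage instance has less than 10% of its space left
            expr: |
              storage_available_space_gauge
                / (storage_available_space_gauge + storage_used_space_gauge) < 0.10
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "UltiHash storage instance is low on available space"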
