UltiHash documentation

Benchmarks

UltiHash is a modern object storage solution built for AI and advanced analytics applications, offering high throughput and up to 60% space savings across your entire data lake. It handles large data volumes efficiently with a deduplication algorithm that eliminates redundancies at the byte level, regardless of data type or format. This combination of speed and efficiency makes UltiHash well suited to businesses that want to optimize their data infrastructure without sacrificing performance.

Traditional object storage vs modern object storage

Traditional object storage, like Amazon S3, is designed for cost-effectiveness, which makes it a good fit for data archiving and backups. It typically lacks the performance needed for AI training and is available only in the cloud. Modern object storage solutions prioritize high performance and efficiency, both essential for AI and analytics workloads. They include solutions such as UltiHash that are Kubernetes-native and S3-compatible, so they integrate both in the cloud and on-premises. Modern object storage is tailored for fast AI training and comes with advanced features such as efficient data deletion, which improves storage management.

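| Dataset | Dataset size | Space savings |
| --- | --- | --- |
| DICOM files of brain MRI scans | 1.51 GB | 67.13 % |
| JPGs of driving scenarios | 2.41 GB | 52.32 % |
| PNGs of synthetic textures with defects | 5.89 GB | 50 % |
| WAVs of human speech for emotion recognition | 0.33 GB | 50 % |
| TIFF images of climate data | 16.28 GB | 45.84 % |
| TIFFs of fossil segmentations | 8.26 GB | 44.64 % |
| CSV tables of symptoms | 0.0014 GB | 42 % |
| Models of dinosaur teeth | 1.87 GB | 33.09 % |
| Parquet files with temperature, humidity, wind and land uses | 1.81 GB | 21.59 % |
| NetCDF climatic and atmospheric data | 0.68 GB | 18.65 % |
| LIDAR data of driving scenarios | 33.26 GB | 18.31 % |
| PDFs of Indian supreme court judgements | 5.57 GB | 11.95 % |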
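As a rough illustration of how the per-dataset figures combine, the sketch below computes the size-weighted savings across all twelve rows. The aggregate is not a figure published on this page: it is derived from the table and assumes each dataset is stored on its own, so it ignores any deduplication across datasets. The function and variable names are illustrative.

```python
# (dataset size in GB, space savings in percent), copied from the table rows
ROWS = [
    (1.51, 67.13), (2.41, 52.32), (5.89, 50.0), (0.33, 50.0),
    (16.28, 45.84), (8.26, 44.64), (0.0014, 42.0), (1.87, 33.09),
    (1.81, 21.59), (0.68, 18.65), (33.26, 18.31), (5.57, 11.95),
]


def weighted_savings_percent(rows):
    """Return total GB saved divided by total GB ingested, as a percentage."""
    total_gb = sum(size for size, _ in rows)
    saved_gb = sum(size * pct / 100 for size, pct in rows)
    return 100 * saved_gb / total_gb


overall = weighted_savings_percent(ROWS)
print(f"Size-weighted savings across all datasets: {overall:.1f} %")
```

Because the largest dataset (33.26 GB) has one of the lowest savings rates, the size-weighted figure comes out at about 31 %, well below the best per-dataset results.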

The throughput benchmark compares AWS S3 Standard, MinIO and UltiHash. Benchmarks were run on an UltiHash cluster with 4 storage nodes of type m5dn.24xlarge in an AWS Virtual Private Cloud. The network bandwidth in the cluster is configured at 100 Gb/s. The storage disks are physically attached to the instances (instance store) to provide optimal performance. The test writes to and reads from a single m5dn instance.

The performance test used a large-scale dataset with an average object size of 64 MB: UltiHash version 0.5.4 achieves an average throughput of 500.27 MB/s for PUT and 1578.59 MB/s for GET operations. To compare UltiHash with other solutions, the benchmark includes PUT and GET measurements with the same dataset against S3 Standard and MinIO.
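A throughput measurement of this kind can be sketched against any S3-compatible endpoint. The example below is a minimal, single-object sketch and not the methodology used for the published numbers: the endpoint URL, bucket name and key are hypothetical, it assumes 1 MB = 1,000,000 bytes, and it relies on boto3 being installed with credentials available in the environment.

```python
import os
import time


def throughput_mb_s(num_bytes, seconds):
    """Convert bytes moved in `seconds` to MB/s (assumes 1 MB = 1,000,000 bytes)."""
    return num_bytes / 1_000_000 / max(seconds, 1e-9)


def timed(operation):
    """Run a zero-argument callable and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = operation()
    return result, time.perf_counter() - start


def benchmark_object(client, bucket, key, payload):
    """PUT then GET one object and return (put_mb_s, get_mb_s)."""
    _, put_s = timed(lambda: client.put_object(Bucket=bucket, Key=key, Body=payload))
    body, get_s = timed(lambda: client.get_object(Bucket=bucket, Key=key)["Body"].read())
    if len(body) != len(payload):
        raise RuntimeError("downloaded object size does not match upload")
    return throughput_mb_s(len(payload), put_s), throughput_mb_s(len(payload), get_s)


def run_against_endpoint(endpoint_url, bucket):
    """Benchmark one 64 MB object; endpoint_url and bucket are placeholders."""
    import boto3  # imported here so the helpers above work without boto3

    client = boto3.client("s3", endpoint_url=endpoint_url)
    payload = os.urandom(64_000_000)  # random bytes, so nothing deduplicates
    return benchmark_object(client, bucket, "benchmark/object-0", payload)
```

Calling `run_against_endpoint("http://localhost:8080", "benchmark-bucket")` (both values are placeholders) returns one PUT and one GET figure. A representative benchmark would repeat this over many objects and average the results.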

The throughput table shows the difference between traditional and modern object storage. In this test, the modern object storage solutions (UltiHash and MinIO) read about 2.7 times as fast as S3 Standard (for example, 1578.59 MB/s vs 587.64 MB/s for GET with UltiHash). The main difference lies in their intended use: traditional storage is best for passive (cold) data storage, while modern object storage is tailored for active data management and processing.

What makes UltiHash special?

UltiHash reduces data volume out of the box through a built-in deduplication algorithm that eliminates redundancies at the byte level, regardless of data type or format. This results in space savings of up to 60% on the entire data volume, depending on various factors including:

  • compressed vs uncompressed data format: UltiHash generates up to 75% space savings on uncompressed formats (e.g. RAW, TIFF) and up to 51% on compressed formats (e.g. JPG, PNG)

  • similarity between the objects: the higher the similarity, the more space saved
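The link between similarity and savings can be illustrated with a toy model. The sketch below is not UltiHash's algorithm, which is not described on this page: it swaps in simple fixed-size chunking with SHA-256 fingerprints and stores each distinct chunk once. All names and the 4096-byte chunk size are assumptions chosen for illustration.

```python
import hashlib
import os

CHUNK = 4096  # illustrative chunk size in bytes


def space_savings(objects, chunk_size=CHUNK):
    """Fraction of bytes saved when identical fixed-size chunks are stored once."""
    seen = set()   # fingerprints of chunks already stored
    total = 0      # bytes ingested
    stored = 0     # bytes actually kept
    for data in objects:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            total += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return 0.0 if total == 0 else 1 - stored / total


base = os.urandom(64 * CHUNK)
near_copy = base[:60 * CHUNK] + os.urandom(4 * CHUNK)  # shares 60 of 64 chunks
unrelated = os.urandom(64 * CHUNK)                     # shares nothing

print(f"similar objects:   {space_savings([base, near_copy]):.1%}")
print(f"unrelated objects: {space_savings([base, unrelated]):.1%}")
```

In this toy setup the near copy shares 60 of its 64 chunks with the original, so 60 of the 128 ingested chunks are never stored again, while the unrelated pair saves nothing. Fixed-size chunking only detects duplicates that fall on the same chunk boundaries, so it understates what a byte-level approach could find.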

The space savings table on this page documents the savings UltiHash achieved on different datasets, giving a fair demonstration of UltiHash’s capabilities. The results can be reproduced on any UltiHash cluster.

| Benchmark | PUT | GET |
| --- | --- | --- |
| S3 Standard | 496.02 MB/s | 587.64 MB/s |
| UltiHash | 500.27 MB/s | 1578.59 MB/s |
| MinIO | 587.64 MB/s | 1587.05 MB/s |
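To make the comparison concrete, the short sketch below expresses each measured throughput as a multiple of the S3 Standard figure for the same operation. The values are copied from the table; the dictionary layout and function name are illustrative.

```python
# Throughput in MB/s, copied from the table
THROUGHPUT = {
    "S3 Standard": {"PUT": 496.02, "GET": 587.64},
    "UltiHash": {"PUT": 500.27, "GET": 1578.59},
    "MinIO": {"PUT": 587.64, "GET": 1587.05},
}


def ratio_vs_baseline(system, operation, baseline="S3 Standard"):
    """Throughput of `system` divided by throughput of `baseline` for one operation."""
    return THROUGHPUT[system][operation] / THROUGHPUT[baseline][operation]


for system in ("UltiHash", "MinIO"):
    for operation in ("PUT", "GET"):
        print(f"{system} {operation}: {ratio_vs_baseline(system, operation):.2f}x S3 Standard")
```

For GET, both UltiHash and MinIO come out at roughly 2.7 times the S3 Standard throughput, while UltiHash PUT throughput is within about 1 % of S3 Standard.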

Our benchmark is frequently updated with new data. You can submit a request with a desired data source to hello@ultihash.io.

Last updated 4 months ago

Datasets in the space savings table, in row order:

DICOM files of brain MRI scans
JPGs of driving scenarios
PNGs of synthetic textures with defects
WAVs of human speech for emotion recognition
TIFF images of climate data
TIFFs of fossil segmentations
CSV tables of symptoms
Models of dinosaur teeth
Parquet files with temperature, humidity, wind and land uses
NetCDF climatic and atmospheric data
LIDAR data of driving scenarios
PDFs of Indian supreme court judgements