Benchmarks
UltiHash is high-performance storage built for AI and advanced analytics applications. It offers optimized read throughput and up to 60% space savings across your entire data lake. UltiHash handles large data volumes efficiently with a deduplication algorithm that removes redundancies down to the byte level, regardless of data type or format. This combination of speed and efficiency suits businesses that want to optimize their data infrastructure without sacrificing performance.
Read Throughput Benchmarks
Read throughput plays a central role in high-performance data workloads such as model training, fine-tuning, and batch inference. In these scenarios, delays in accessing data can leave compute resources (e.g. GPUs) underutilized and increase costs. UltiHash Serverless is optimized to provide high and consistent read performance, particularly for teams working with large volumes of data stored in the EU. The following benchmarks compare UltiHash to AWS S3 (Standard and Express One Zone) across different configurations.
Please note that if you're using UltiHash Self-Hosted, the measured read throughput may differ depending on your hardware. For reference, the following benchmarks were run on the AWS EC2 instance type c7g.4xlarge.
UltiHash Serverless (hosted on AWS) vs. S3 Standard (Region eu-central-1, Frankfurt)
| Storage | Read throughput |
| UltiHash Serverless (AWS) | 1 GB/s |
| S3 Standard | 200–250 MB/s |
UltiHash Serverless delivers 4–5× higher read throughput than S3 Standard, enabling faster data access and reducing idle time for compute resources, such as GPUs waiting for data loading.
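To make the 4–5× figure concrete, the sketch below recomputes it from the throughput numbers above and estimates how long a full pass over a dataset would take at each rate. The 500 GB dataset size and the function names are hypothetical, chosen only for illustration, and 1 GB/s is taken as 1000 MB/s.

```python
# Throughput figures from the comparison above, in MB/s (1 GB/s taken as 1000 MB/s).
ULTIHASH_MBPS = 1000.0
S3_STANDARD_MBPS = (200.0, 250.0)  # lower and upper end of the reported range


def speedup(fast_mbps: float, slow_mbps: float) -> float:
    """Ratio between two sustained read throughputs."""
    return fast_mbps / slow_mbps


def seconds_to_read(size_gb: float, throughput_mbps: float) -> float:
    """Time for one sequential pass over size_gb of data at the given rate."""
    return size_gb * 1000.0 / throughput_mbps


DATASET_GB = 500.0  # hypothetical dataset size, not part of the benchmark

print(f"UltiHash Serverless: {seconds_to_read(DATASET_GB, ULTIHASH_MBPS):.0f} s per pass")
for s3_mbps in S3_STANDARD_MBPS:
    print(
        f"S3 Standard at {s3_mbps:.0f} MB/s: "
        f"{seconds_to_read(DATASET_GB, s3_mbps):.0f} s per pass, "
        f"speedup {speedup(ULTIHASH_MBPS, s3_mbps):.1f}x"
    )
```

These estimates assume storage reads are the only bottleneck; real pipelines may also be limited by decoding, preprocessing, or network conditions.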
To address the need for higher read throughput, AWS introduced a new tier, S3 Express One Zone, designed for high-throughput data access. This tier is particularly targeted at workloads requiring rapid access to large volumes of small files or high request rates. It is currently only available in the us-east-1 (Virginia) region, which limits its applicability for EU-based workloads or users with data residency requirements in Europe.
The table below compares S3 Express One Zone performance under two different conditions:
| Storage | Accessed from | Read throughput |
| S3 Express One Zone (Virginia) | Virginia | 1.1 GB/s |
| S3 Express One Zone (Virginia) | Frankfurt | 290 MB/s |
| UltiHash Serverless (Frankfurt) | Frankfurt | 1 GB/s |
While S3 Express One Zone offers strong read performance when used locally within Virginia, its throughput drops significantly when accessed from Europe. UltiHash Serverless is designed to deliver high read throughput directly from within the EU. It provides fast, local access to object data, making it well-suited for compute-heavy workloads like model training, inference pipelines, and parallel data processing, while ensuring that all data remains compliant with regional residency requirements.
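Because throughput depends on where the client runs relative to the storage, it can be useful to measure it from your own environment. The following is a minimal sketch of one way to do that for any file-like object, such as the streaming body returned by an S3-compatible client. It is not the methodology used for the benchmarks above; the function name and the 8 MB chunk size are assumptions made for illustration.

```python
import io
import time
from typing import BinaryIO


def measure_read_throughput(
    stream: BinaryIO, chunk_size: int = 8 * 1024 * 1024
) -> tuple[int, float]:
    """Read the stream to the end and return (bytes_read, MB per second)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small inputs.
    mb_per_s = (total / 1_000_000) / elapsed if elapsed > 0 else float("inf")
    return total, mb_per_s


# Demo on an in-memory buffer; swap in a real object stream to measure storage.
demo_bytes, demo_rate = measure_read_throughput(io.BytesIO(b"\0" * 4_000_000))
print(f"read {demo_bytes} bytes at {demo_rate:.1f} MB/s")
```

For a meaningful number, read objects that are large enough to reach steady state, repeat the measurement several times, and run the client in the same region as the compute that will consume the data.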
What makes UltiHash deduplication special?
UltiHash optimizes data volume out of the box through a built-in deduplication algorithm that eliminates redundancies at a byte level, regardless of data type or format. This results in significant space savings of up to 60% on the entire data volume, depending on various factors including:
compressed vs uncompressed data format: UltiHash generates up to 75% space savings on uncompressed formats (e.g. RAW, TIFF) and up to 51% on compressed formats (e.g. JPG, PNG)
similarity between the objects: the higher the similarity, the more space saved
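The effect of object similarity on space savings can be illustrated with a toy example. The sketch below uses fixed-size chunks and SHA-256 fingerprints, a common textbook approach to deduplication. It is an assumption made for illustration and not UltiHash's actual algorithm, which the text describes only as operating at byte level; the function name and chunk size are hypothetical.

```python
import hashlib


def space_savings(objects: list[bytes], chunk_size: int = 4096) -> float:
    """Fraction of bytes saved when identical fixed-size chunks are stored once."""
    seen: set[bytes] = set()  # fingerprints of chunks already stored
    raw = 0                   # bytes before deduplication
    stored = 0                # bytes actually kept
    for data in objects:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            raw += len(chunk)
            fingerprint = hashlib.sha256(chunk).digest()
            if fingerprint not in seen:
                seen.add(fingerprint)
                stored += len(chunk)
    return 0.0 if raw == 0 else 1.0 - stored / raw


# Three distinct 4096-byte chunks used to build objects of varying similarity.
A, B, C = b"a" * 4096, b"b" * 4096, b"c" * 4096

print(space_savings([A + B, A + B]))  # identical objects
print(space_savings([A + B, A + C]))  # objects sharing one chunk
print(space_savings([A, B, C]))       # nothing in common
```

In this toy model, identical objects save half the bytes, objects that share one of two chunks save a quarter, and unrelated objects save nothing, which mirrors the point that higher similarity yields more savings.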
This section documents the space savings generated by UltiHash on different datasets, giving a fair demonstration of UltiHash’s capabilities. The results can be reproduced on any UltiHash cluster.
Due to its integrated design, UltiHash deduplication does not add latency to read operations and is recommended for all read-intensive workloads. For write-intensive workloads, UltiHash deduplication can be disabled for the entire storage cluster.
Our benchmark is frequently updated with new data. You can submit a request with a desired data source to [email protected].
Traditional object storage vs modern object storage
Traditional object storage, like Amazon S3, is designed for low-cost storage at scale, which suits infrequently accessed data such as archives or backups. It typically lacks the performance needed for AI training and is available only in the cloud. Modern object storage solutions prioritize high performance and efficiency, both essential for AI and analytics workloads. UltiHash is Kubernetes-native and S3-compatible, so it integrates both in the cloud and on-premises. Modern object storage is tailored for efficient scaling of performance-intensive workloads such as AI model training and inference, and enables modern data lakehouse architectures that power advanced analytics and business intelligence.