HopsFS: 100x Times Faster than AWS S3 - Logical Clocks


SOURCE: https://www.logicalclocks.com/blog/hopsfs-100x-times-faster-than-aws-s3
Summary

We have built a cloud-native release of HopsFS that is highly available across availability zones, has the same cost as S3, but has 100X the performance of S3 for file move/rename operations, and 3.4X the read throughput of S3 (EMRFS) for the DFSIO Benchmark (peer reviewed at ACM Middleware 2020).

S3 has become the de facto platform for storage in AWS due to its scalability, high availability, and low cost. NoSQL, however, has proven too hard for developers, and databases are returning to strongly consistent (but now scalable) NewSQL systems, with databases such as Spanner, CockroachDB, SingleSQL, and MySQL Cluster. In this blog, we show that distributed hierarchical file systems are completing a similar journey: from strongly consistent POSIX-compliant file systems, to object stores (with weaker consistency models but high availability across data centers), and back to distributed hierarchical file systems that are HA across data centers. This comes without any loss in performance and, crucially, without any increase in cost, because we use S3 as block storage for our file system.

HopsFS is a distributed hierarchical file system that provides an HDFS API (a POSIX-like API) but stores its data in a bucket in S3. The Hopsworks Feature Store is built on Hops Hive and customized metadata extensions to HopsFS, ensuring strong consistency between the offline Feature Store, the online Feature Store (NDB Cluster), and data files in HopsFS.

We compared HopsFS with EMRFS rather than with plain S3, because EMRFS provides stronger guarantees than S3 for consistent listing of files and consistent read-after-update for objects.
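To see why move/rename is where a hierarchical file system pulls furthest ahead of an object store, consider the toy model below. It is a minimal sketch, not HopsFS or S3 code: `ToyObjectStore`, `ToyHierarchicalFS`, and `compare` are hypothetical names invented for illustration, and the model only counts logical operations. It assumes the well-known behavior that an S3-style store has no native rename, so moving a "directory" means copying and deleting every object under the prefix, while a file system that keeps the namespace as metadata can re-link the directory once.

```python
class ToyObjectStore:
    """Flat key -> bytes map standing in for an S3-like bucket (illustrative only)."""

    def __init__(self):
        self.objects = {}
        self.ops = 0  # per-object operations issued

    def put(self, key, data):
        self.objects[key] = data
        self.ops += 1

    def rename_prefix(self, src, dst):
        # No native rename: every object under the prefix is copied, then deleted.
        for key in [k for k in self.objects if k.startswith(src)]:
            self.objects[dst + key[len(src):]] = self.objects[key]  # copy
            del self.objects[key]                                    # delete
            self.ops += 2


class ToyHierarchicalFS:
    """Directory tree kept as metadata; files are just references to block ids."""

    def __init__(self):
        self.root = {}
        self.ops = 0  # metadata operations issued

    def _walk(self, parts):
        node = self.root
        for part in parts:
            node = node.setdefault(part, {})
        return node

    def create(self, path, block_id):
        *dirs, name = path.strip("/").split("/")
        self._walk(dirs)[name] = block_id
        self.ops += 1

    def rename_dir(self, src, dst):
        # One metadata update: unlink the subtree and re-link it under the new name.
        *src_dirs, src_name = src.strip("/").split("/")
        *dst_dirs, dst_name = dst.strip("/").split("/")
        subtree = self._walk(src_dirs).pop(src_name)
        self._walk(dst_dirs)[dst_name] = subtree
        self.ops += 1


def compare(n_files):
    """Return (object-store ops, file-system ops) for renaming a directory of n_files."""
    store, fs = ToyObjectStore(), ToyHierarchicalFS()
    for i in range(n_files):
        store.put(f"staging/part-{i}", b"x")
        fs.create(f"/staging/part-{i}", block_id=i)
    store.ops = fs.ops = 0  # count only the rename itself
    store.rename_prefix("staging/", "final/")
    fs.rename_dir("/staging", "/final")
    return store.ops, fs.ops


if __name__ == "__main__":
    print(compare(1000))
```

In this model the object-store cost grows with the number of files in the directory, while the file-system cost stays constant. The operation counts are a property of the toy model only; they are not the benchmark results quoted above.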

As said here by