This looks super interesting for single-AZ systems (which are useful, and have their place).
But I can't find anything to support the use case for highly available (multi-AZ), scalable, production infrastructure. Specifically, a unified and consistent cache across geos (AZs in the AWS case, since this seems to be targeted at S3).
Without that, you're shifting costs somewhere else in your organization: cross-AZ networking charges, larger caches duplicated in each AZ for availability, or extra compute and coherency traffic across AZs to keep the caches in sync.
Any insight from the authors on how they handle these issues on their production systems at scale?
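For a rough sense of the tradeoff, a back-of-envelope (assuming us-east-1 list prices of about $0.01/GB each direction for cross-AZ transfer and $0.08/GB-month for gp3 storage):

    cross-AZ re-read of 1 TB:  1,000 GB x $0.02/GB        ~= $20 per pass
    duplicate 1 TB gp3 cache:  1,000 GB x $0.08/GB-month  ~= $80 per month

So past roughly four full re-reads of the working set per month, the per-AZ duplicate cache wins on transfer alone, before counting the compute that serves it.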
Not the author, but: it's a user-side read-through cache, so there's no need for pre-emptive cache coherence as such. There will, however, be a performance penalty when fetching data under write contention, regardless of whether you run a single AZ or multiple AZs. The only way to mitigate that penalty is accurate predictive fetching tuned to your usage patterns.
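For anyone unfamiliar with the pattern, a minimal read-through cache for immutable blobs can be this small. This is a sketch with hypothetical names, not the project's actual API, and it omits eviction and the disk tier a real cache would have. Immutability is what makes coherence unnecessary: a key is never overwritten, so a cached entry can never be stale.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.function.Function;

    // Read-through cache for immutable blobs: every read goes through
    // the cache, and a miss falls through to the origin (e.g. an S3
    // GET). Because a key's value never changes, entries never need
    // invalidation, so no cross-AZ coherence protocol is required.
    final class ReadThroughCache<K, V> {
        private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();
        private final Function<K, V> origin;

        ReadThroughCache(Function<K, V> origin) {
            this.origin = origin;
        }

        V get(K key) {
            // Hit: return the cached blob. Miss: fetch from the origin
            // (at most once per key under contention) and cache it.
            return entries.computeIfAbsent(key, origin);
        }
    }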
Given that it's "Designed for caching immutable blobs", I'd guess the approach is indeed to either increase the cache size in each AZ or eat the cross-AZ networking costs.
Can someone explain when this would be a good solution? We currently store loads of files in S3 and ingest them directly on demand in our Java app API pods. This seems interesting if it could speed up those retrievals.
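If retrieval latency from S3 is the pain point, the read-through pattern slots in between the API pods and S3, so only cache misses issue a GET. A sketch against the AWS SDK v2, reusing the ReadThroughCache sketch above (the bucket name is hypothetical, and a real deployment would bound the cache in memory or on local disk):

    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;

    // Sketch: API pods read blobs through a local cache instead of
    // issuing an S3 GET on every request. Only misses touch S3.
    final class BlobStore {
        private final S3Client s3 = S3Client.create();
        private final ReadThroughCache<String, byte[]> cache =
                new ReadThroughCache<>(this::fetchFromS3);

        private byte[] fetchFromS3(String key) {
            GetObjectRequest req = GetObjectRequest.builder()
                    .bucket("my-bucket") // hypothetical bucket name
                    .key(key)
                    .build();
            return s3.getObjectAsBytes(req).asByteArray();
        }

        byte[] get(String key) {
            return cache.get(key); // S3 is hit only on a miss
        }
    }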
https://old.reddit.com/r/databasedevelopment/comments/1nh1go...