Content Deduplication

S4 automatically deduplicates stored data using content-addressable storage (CAS). When two or more objects have identical content, only one copy is stored on disk.

How It Works

1. When an object is uploaded, S4 computes its SHA-256 hash.
2. The hash is checked against the deduplication index.
3. If the hash already exists, the object data is not written again; only a new metadata reference is created.
4. A reference counter tracks how many objects point to each unique blob.
5. When all references to a blob are deleted, the space becomes reclaimable by the compactor.
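
The flow above can be modelled in a few lines of Python. This is a minimal in-memory sketch, not S4's implementation: the class `DedupStore` and its methods `put`, `delete`, and `compact` are hypothetical names chosen for illustration, and plain dictionaries stand in for the on-disk blobs and the deduplication index.

```python
import hashlib


class DedupStore:
    """Toy in-memory model of whole-object, content-addressed deduplication.

    All names here are illustrative assumptions, not S4's real API.
    """

    def __init__(self):
        self.blobs = {}     # hash -> object bytes (stands in for data on disk)
        self.refcount = {}  # hash -> number of objects pointing at the blob
        self.objects = {}   # object key -> hash (the metadata reference)

    def put(self, key, data):
        digest = hashlib.sha256(data).hexdigest()
        if key in self.objects:
            # Overwriting a key releases its previous reference first.
            self.delete(key)
        if digest not in self.blobs:
            # First time this content is seen: write the data once.
            self.blobs[digest] = data
            self.refcount[digest] = 0
        # In every case, record one more reference to the blob.
        self.refcount[digest] += 1
        self.objects[key] = digest
        return digest

    def delete(self, key):
        # Deleting an object only drops its reference; the blob stays put.
        digest = self.objects.pop(key)
        self.refcount[digest] -= 1

    def compact(self):
        """Remove blobs with no remaining references; return how many were removed."""
        dead = [h for h, n in self.refcount.items() if n == 0]
        for h in dead:
            del self.blobs[h]
            del self.refcount[h]
        return len(dead)


store = DedupStore()
first = store.put("a.txt", b"hello")
second = store.put("b.txt", b"hello")  # identical content: no second copy
```

After the two uploads, `store.blobs` holds a single entry with a reference count of 2. Deleting `a.txt` leaves the blob in place, and only once `b.txt` is also deleted does `compact()` reclaim it.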

Storage Savings

Deduplication typically saves 30-50% of storage space. The actual figure depends on how much duplicate content exists in your data.

Examples of high-dedup workloads:

- Backup storage (incremental backups share most data)
- Container image registries (layers are often shared)
- Log storage (repeated patterns)
- File sync services (multiple users uploading the same files)

Dedup Statistics

Check deduplication effectiveness via the stats API:

curl http://localhost:9000/api/stats

Response includes:

{
  "dedup_unique_blobs": 1500,
  "dedup_total_references": 3200,
  "dedup_ratio": 2.13
}

Field	Description
`dedup_unique_blobs`	Number of unique data blobs on disk
`dedup_total_references`	Total number of object references
`dedup_ratio`	Ratio of references to unique blobs (higher = more savings)

A ratio of 2.13 means that, on average, each unique blob is referenced by 2.13 objects. If blobs are of similar size, that corresponds to roughly 53% storage savings (1 − 1/2.13 ≈ 0.53).
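
The arithmetic behind these figures can be reproduced from the stats response. The sketch below parses the sample response shown earlier rather than calling a live server. The helper name `dedup_savings` is hypothetical, and the savings figure is an estimate based on blob counts, which assumes blobs are of roughly equal size.

```python
import json

# Sample response body, copied from the stats example in this section.
body = '{"dedup_unique_blobs": 1500, "dedup_total_references": 3200, "dedup_ratio": 2.13}'


def dedup_savings(stats):
    """Return (ratio, estimated fraction of storage saved) from a stats dict.

    Hypothetical helper: the estimate counts blobs, not bytes, so it is
    only accurate when blobs are of similar size.
    """
    unique = stats["dedup_unique_blobs"]
    refs = stats["dedup_total_references"]
    if unique == 0 or refs == 0:
        return 0.0, 0.0
    ratio = refs / unique        # references per unique blob
    saved = 1 - unique / refs    # share of writes that were avoided
    return ratio, saved


ratio, saved = dedup_savings(json.loads(body))
print(f"ratio={ratio:.2f} saved={saved:.0%}")  # prints: ratio=2.13 saved=53%
```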

Interaction with Other Features

Versioning: Each version goes through deduplication on its own. If version 1 and version 3 have identical content, only one copy is stored.
Object Lock: Deduplication is transparent to Object Lock. Locked objects are protected regardless of dedup status.
Lifecycle Policies: When lifecycle rules delete objects, dedup reference counts are decremented. The actual data is removed only when no references remain.
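
To illustrate how versioning and lifecycle deletion interact with reference counts, here is a small model under stated assumptions. The functions `put_version` and `expire_version` and the two dictionaries are hypothetical stand-ins for S4's metadata; they are not part of its API.

```python
import hashlib
from collections import Counter

refs = Counter()  # blob hash -> reference count
versions = {}     # (key, version id) -> blob hash


def put_version(key, version, data):
    """Store one object version, sharing the blob if the content already exists."""
    digest = hashlib.sha256(data).hexdigest()
    versions[(key, version)] = digest
    refs[digest] += 1


def expire_version(key, version):
    """Model a lifecycle rule deleting one version.

    Returns True if the underlying blob has no references left and is
    therefore reclaimable.
    """
    digest = versions.pop((key, version))
    refs[digest] -= 1
    return refs[digest] == 0


put_version("report.csv", 1, b"q1 data")
put_version("report.csv", 2, b"q2 data")
put_version("report.csv", 3, b"q1 data")  # same content as version 1

unique_blobs = len(set(versions.values()))  # 2 blobs back 3 versions
v1_reclaimable = expire_version("report.csv", 1)  # version 3 still needs the blob
v3_reclaimable = expire_version("report.csv", 3)  # last reference is gone
```

Expiring version 1 does not free any data because version 3 still references the same blob. The blob becomes reclaimable only when version 3 is expired as well.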

Design Details

Hashing algorithm: SHA-256 (cryptographically strong, no known collisions)
Dedup granularity: whole-object (not block-level)
All objects are deduplicated regardless of size
Dedup index is stored in a dedicated fjall keyspace alongside other metadata