Data Flow
This page describes how data moves through S4 during common operations.
Write (PUT Object)
Client --PUT--> S4 API
                   |
                   v
         Compute SHA-256 hash
                   |
                   v
          Check Deduplicator
               /       \
    (duplicate)         (new data)
         |                   |
         |                   v
         |         Write to volume file
         |                   |
         v                   v
     Increment       Store IndexRecord
     ref count    in fjall (atomic batch)
         |                   |
         +---------+---------+
                   |
                   v
             fsync to disk
                   |
                   v
            Return HTTP 200
Key points:
- SHA-256 is computed on the full object body
- If the hash already exists in the deduplicator, the data is not written again (saving disk space)
- The reference count is incremented for the existing blob
- fsync is called before returning success, guaranteeing durability
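The write path above can be sketched in a few lines of Python. This is a minimal, hypothetical model, not S4's implementation. A plain dict stands in for fjall, a second dict plays the deduplicator, and the atomic fjall batch is reduced to ordinary dict updates. `WriteSketch`, `put` and the `IndexRecord` fields are illustrative names. The sketch also assumes that a duplicate body still gets an index entry for its key, so the key can be read back. Overwriting an existing key is not handled.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass
class IndexRecord:
    # Hypothetical fields: where the blob lives in the volume file.
    offset: int
    length: int
    sha256: str


class WriteSketch:
    """Toy PUT path: hash, dedup check, append, index, fsync (not S4's code)."""

    def __init__(self, volume_path: str):
        self.volume = open(volume_path, "ab+")  # append-only volume file
        self.dedup = {}  # sha256 hex -> [IndexRecord, ref_count]
        self.index = {}  # object key -> IndexRecord (stand-in for fjall)

    def put(self, key: str, body: bytes) -> int:
        digest = hashlib.sha256(body).hexdigest()  # hash of the full body
        entry = self.dedup.get(digest)
        if entry is not None:
            entry[1] += 1  # duplicate: bump the ref count, skip the data write
            record = entry[0]
        else:
            self.volume.seek(0, os.SEEK_END)
            offset = self.volume.tell()
            self.volume.write(body)  # new data: append to the volume file
            record = IndexRecord(offset, len(body), digest)
            self.dedup[digest] = [record, 1]
        self.index[key] = record  # assumption: duplicates get an index entry too
        self.volume.flush()
        os.fsync(self.volume.fileno())  # durable before we acknowledge
        return 200  # HTTP 200


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "volume.dat")
        store = WriteSketch(path)
        store.put("a.txt", b"hello")
        store.put("b.txt", b"hello")  # deduplicated: nothing new is appended
        print(os.path.getsize(path))  # 5
        store.volume.close()
```

Storing the same body under two keys leaves the volume at five bytes, with a ref count of 2 on the shared blob.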
Read (GET Object)
Client --GET--> S4 API
                   |
                   v
      Lookup IndexRecord in fjall
                   |
                   v
      Read blob from volume file
                   |
                   v
         Verify CRC32 checksum
                   |
                   v
         Return data to client
Key points:
- Metadata lookup in fjall is fast (LSM-tree with MVCC lock-free reads)
- CRC32 is verified on every read to detect bit-rot
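Below is a minimal sketch of the read path with per-read CRC32 verification, using Python's `zlib.crc32`. The volume is modeled as an in-memory `bytes` object and a dict stands in for fjall. `get_object`, the `IndexRecord` fields and `ChecksumError` are hypothetical names. This page does not say where S4 keeps the checksum, so the sketch assumes it lives in the record.

```python
import zlib
from dataclasses import dataclass


@dataclass
class IndexRecord:
    # Hypothetical fields; this sketch assumes the CRC32 is kept in the record.
    offset: int
    length: int
    crc32: int


class ChecksumError(Exception):
    """Raised when stored bytes no longer match their CRC32 (bit-rot)."""


def get_object(volume: bytes, index: dict, key: str) -> bytes:
    record = index.get(key)  # 1. look up the IndexRecord (fjall stand-in)
    if record is None:
        raise KeyError(key)  # would surface as HTTP 404
    # 2. read the blob's bytes out of the volume
    data = volume[record.offset:record.offset + record.length]
    # 3. verify the CRC32 on every read
    if zlib.crc32(data) != record.crc32:
        raise ChecksumError(f"CRC32 mismatch for {key!r}")
    return data  # 4. return the data to the client
```

Flipping a single byte in the stored blob is enough to raise `ChecksumError` on the next read.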
Delete (DELETE Object)
Without Versioning
Client --DELETE--> S4 API
                      |
                      v
              Mark as tombstone
                      |
                      v
        Remove IndexRecord from fjall
                      |
                      v
          Decrement dedup ref count
                      |
                      v
               Return HTTP 204
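The three delete steps can be modeled with plain Python containers. This is a hypothetical sketch. `delete_object`, the tombstone set and the hash-keyed ref-count table are illustrative stand-ins, not S4's data structures. The sketch shows that a shared blob stays referenced until the last key pointing at it is deleted. How S4 reclaims space for an unreferenced blob is not covered on this page, so the sketch only drops the count.

```python
from __future__ import annotations


def delete_object(
    index: dict[str, str],
    ref_counts: dict[str, int],
    tombstones: set[str],
    key: str,
) -> int:
    """Toy unversioned DELETE; index maps key -> blob hash (all hypothetical)."""
    digest = index.get(key)
    if digest is not None:
        tombstones.add(key)  # 1. mark as tombstone
        del index[key]  # 2. remove the IndexRecord
        ref_counts[digest] -= 1  # 3. decrement the dedup ref count
        if ref_counts[digest] == 0:
            # Blob is now unreferenced; reclamation is out of scope here.
            del ref_counts[digest]
    return 204  # S3-style DELETE answers 204 even for a missing key
```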
With Versioning Enabled
Client --DELETE--> S4 API
                      |
                      v
            Create Delete Marker
             (new version entry)
                      |
                      v
               Return HTTP 204
         + x-amz-delete-marker: true
The object data itself stays in the volume file. A GET on the key without a version ID now returns 404 because the latest version is a delete marker, but earlier versions remain accessible by their version IDs.
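The delete-marker behavior can be sketched as a per-key list of versions. `Version`, `VersionedKey` and the in-memory list are hypothetical, and version IDs come from `uuid4` only for illustration. The 405 response for a GET that names a delete marker follows Amazon S3's documented behavior. It is an assumption, not something this page confirms for S4.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class Version:
    version_id: str
    data: bytes | None  # None marks a delete marker

    @property
    def is_delete_marker(self) -> bool:
        return self.data is None


@dataclass
class VersionedKey:
    """Hypothetical per-key version list, oldest first."""

    versions: list[Version] = field(default_factory=list)

    def put(self, data: bytes) -> str:
        v = Version(uuid.uuid4().hex, data)
        self.versions.append(v)
        return v.version_id

    def delete(self) -> tuple[int, dict[str, str]]:
        # A delete marker is just a new version; older data is left in place.
        self.versions.append(Version(uuid.uuid4().hex, None))
        return 204, {"x-amz-delete-marker": "true"}

    def get(self, version_id: str | None = None) -> tuple[int, bytes | None]:
        if version_id is None:
            # A plain GET only sees the latest version.
            if not self.versions or self.versions[-1].is_delete_marker:
                return 404, None
            return 200, self.versions[-1].data
        for v in self.versions:
            if v.version_id == version_id:
                # S3 answers 405 for a GET that names a delete marker.
                return (405, None) if v.is_delete_marker else (200, v.data)
        return 404, None
```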
Multipart Upload
Client --CreateMultipartUpload---> S4 API --> Return Upload ID
Client --UploadPart (1..N)-------> S4 API --> Store each part in volume
Client --CompleteMultipartUpload-> S4 API --> Combine parts virtually
                                              Store final IndexRecord
                                              Return HTTP 200
Each part is stored as an individual entry in the volume. On completion, the parts are combined logically: the final IndexRecord refers to the existing part data, so no bytes are copied.
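Here is a hypothetical sketch of combining parts virtually. Each part is appended to an in-memory volume and remembered as an (offset, length) extent. Completion writes a record that is simply the ordered list of extents, and reads follow those extents. `MultipartSketch`, `Extent` and the method names are illustrative only. Real S3 `CompleteMultipartUpload` requests also carry part ETags, which the sketch omits.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class Extent:
    offset: int  # where the part's bytes start in the volume
    length: int


@dataclass
class MultipartSketch:
    """Hypothetical model: parts become extents, completion only records them."""

    volume: bytearray = field(default_factory=bytearray)
    uploads: dict[str, dict[int, Extent]] = field(default_factory=dict)
    index: dict[str, list[Extent]] = field(default_factory=dict)

    def create_multipart_upload(self) -> str:
        upload_id = uuid.uuid4().hex
        self.uploads[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes) -> None:
        extent = Extent(len(self.volume), len(data))
        self.volume.extend(data)  # each part is stored as its own entry
        self.uploads[upload_id][part_number] = extent

    def complete_multipart_upload(self, upload_id: str, key: str) -> int:
        parts = self.uploads.pop(upload_id)
        # The final record is an ordered list of extents; no bytes are copied.
        self.index[key] = [parts[n] for n in sorted(parts)]
        return 200

    def get(self, key: str) -> bytes:
        # Reassemble on read by following the extents in part-number order.
        return b"".join(
            self.volume[e.offset:e.offset + e.length] for e in self.index[key]
        )
```

Parts may arrive in any order. The volume does not grow on completion, because only the extent list is written.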