DFS is a distributed file system designed for storing large artifacts across multiple storage nodes. It handles data chunking, replication, erasure coding, and integrity verification. All other Mistrive services build on DFS for persistent storage.

System components

DFD

Storage nodes that persist data chunks to local disk. Each node exposes a gRPC API for chunk operations and reports health status to FoundationDB.

MDS

Metadata Service managing the filesystem namespace. Handles inode allocation, directory structure, and stripe placement through atomic FoundationDB transactions.

Custodian

Maintenance daemon responsible for data durability and storage efficiency. Handles failure detection, data repair, garbage collection, and rebalancing.

Client library

Internal orchestration layer used by Artifact Store and Platform. Handles node selection, data encoding, and integrity verification.
The client library is an internal component. External services interact with DFS through Artifact Store or Platform APIs.

Data model

DFS organizes data using a POSIX-like filesystem model with inodes, directories, and files. Each file consists of one or more stripes, where each stripe contains up to 1 MiB of data distributed across storage nodes.

Write operations

  1. The client determines the appropriate durability policy based on file size
  2. The data is split into 1 MiB stripes
  3. Each stripe is written to multiple DFD nodes according to the durability policy
  4. MDS records stripe metadata and updates the inode atomically
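
The striping in step 2 can be pictured with a short Python sketch; the function name and in-memory representation here are illustrative, not the client library's actual API:

STRIPE_SIZE = 1 * 1024 * 1024  # each stripe holds up to 1 MiB of file data

def split_into_stripes(data: bytes) -> list[bytes]:
    """Split a file payload into consecutive stripes of at most 1 MiB each."""
    return [data[i:i + STRIPE_SIZE] for i in range(0, len(data), STRIPE_SIZE)]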

Read operations

  1. The client resolves the file path through MDS to obtain stripe locations
  2. Chunks are fetched from DFD nodes in parallel
  3. For erasure-coded files, missing shards are reconstructed from parity data
  4. The client verifies checksums and streams data to the caller
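
As a rough illustration of steps 2 and 4, the sketch below fetches a stripe's chunks in parallel and verifies a checksum over the assembled data; fetch_chunk is a hypothetical stand-in for a GetChunk RPC, and plain CRC32 is used here instead of the CRC32C that DFS actually uses.

import zlib
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(location: str) -> bytes:
    """Hypothetical placeholder for a GetChunk call against one DFD node."""
    raise NotImplementedError

def read_stripe(chunk_locations: list[str], expected_crc: int) -> bytes:
    """Fetch a stripe's chunks in parallel and verify the assembled data."""
    with ThreadPoolExecutor() as pool:
        chunks = list(pool.map(fetch_chunk, chunk_locations))
    data = b"".join(chunks)
    if zlib.crc32(data) != expected_crc:
        raise IOError("stripe failed checksum verification")
    return data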

Durability policies

DFS provides two durability strategies that balance storage efficiency against fault tolerance.

Replication

Files of 3 MiB or smaller use replication by default. Each chunk is written to three separate storage nodes. A read succeeds when any single replica returns valid data.
  • Minimum factor: 3
  • Storage overhead: 3× (three full copies)
  • Fault tolerance: survives loss of N-1 nodes

Reed-Solomon erasure coding

Files larger than 3 MiB use Reed-Solomon encoding. Data is split into data shards, with additional parity shards used for reconstruction. This approach reduces storage overhead while maintaining fault tolerance.
  • Default configuration: 6 data + 3 parity shards
  • Storage overhead: ~1.5×
  • Fault tolerance: survives loss of any 3 nodes
The client uses SIMD-accelerated encoding and decoding for Reed-Solomon operations.
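
The overhead figure follows directly from the shard counts: stored bytes per logical byte are (data + parity) / data. A quick sanity check of the numbers above, plus the size-based policy selection, in illustrative Python:

REPLICATION_MAX_SIZE = 3 * 1024 * 1024  # 3 MiB threshold described above

def rs_overhead(data_shards: int, parity_shards: int) -> float:
    """Bytes stored per logical byte under Reed-Solomon."""
    return (data_shards + parity_shards) / data_shards

def default_policy(file_size: int) -> str:
    """Mirror the documented defaults: replication up to 3 MiB, RS 6+3 above."""
    return "replication (factor 3)" if file_size <= REPLICATION_MAX_SIZE else "Reed-Solomon (6+3)"

assert rs_overhead(6, 3) == 1.5  # matches the ~1.5x overhead listed above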

DFD

DFD (D for Disk) nodes form the storage layer. Each node manages a local data directory and serves chunk operations over gRPC.

Storage layout

Chunks are organized in a two-level directory structure using the first four hex characters of the chunk UUID:
<data_path>/<hex[0:2]>/<hex[2:4]>/<uuid>.chunk
<data_path>/<hex[0:2]>/<hex[2:4]>/<uuid>.meta
The .chunk file contains the raw data. The .meta file stores the CRC32C checksum computed by the client at write time.
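
A small sketch of this layout; whether the on-disk filename uses the hyphenated or plain-hex form of the UUID is an assumption here:

from pathlib import Path
from uuid import UUID

def chunk_paths(data_path: str, chunk_id: UUID) -> tuple[Path, Path]:
    """Return the .chunk and .meta paths for a chunk UUID."""
    hexid = chunk_id.hex  # 32 lowercase hex characters
    subdir = Path(data_path) / hexid[0:2] / hexid[2:4]
    return subdir / f"{chunk_id}.chunk", subdir / f"{chunk_id}.meta"

# chunk_paths("/var/lib/dfd", UUID("8f14e45f-ceea-467f-9f4d-1f2a3b4c5d6e"))
# -> .../8f/14/8f14e45f-....chunk and .../8f/14/8f14e45f-....meta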

API operations

  • PutChunk: Writes a chunk with UUID, data (max 1 MiB), and checksum. Verifies the checksum before persisting. Idempotent.
  • GetChunk: Retrieves a chunk by UUID. Verifies the stored checksum before returning. Returns an error on corruption.
  • DeleteChunk: Removes a chunk by UUID. Idempotent; deleting a non-existent chunk succeeds.
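
A minimal sketch of PutChunk's verify-then-persist behaviour, assuming the third-party crc32c Python package for the checksum; the .meta file format shown is an assumption, and the gRPC plumbing is omitted:

import crc32c  # third-party package providing CRC32C

MAX_CHUNK_SIZE = 1 * 1024 * 1024  # chunks carry at most 1 MiB of data

def put_chunk(chunk_path: str, meta_path: str, data: bytes, checksum: int) -> None:
    """Verify the client-supplied checksum, then persist chunk and metadata."""
    if len(data) > MAX_CHUNK_SIZE:
        raise ValueError("chunk exceeds 1 MiB")
    if crc32c.crc32c(data) != checksum:
        raise ValueError("checksum mismatch, refusing to persist")
    # Overwriting an existing chunk with identical content keeps the call idempotent.
    with open(chunk_path, "wb") as f:
        f.write(data)
    with open(meta_path, "w") as f:
        f.write(f"{checksum}\n")  # assumed .meta format: decimal CRC32C plus newline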

Health reporting

Each DFD node writes a health record to FoundationDB every 15 seconds containing:
  • Current node state (healthy or dead)
  • Last-seen timestamp in nanoseconds
  • Network address for client connections
MDS and the client library use these records to route traffic to available nodes.
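
A sketch of the heartbeat write using the FoundationDB Python bindings; the key layout and field names here are assumptions, not the actual schema:

import json
import time

import fdb

fdb.api_version(710)
db = fdb.open()

@fdb.transactional
def report_health(tr, node_id: str, address: str) -> None:
    """Write this node's health record; called on a 15-second loop."""
    record = {
        "state": "healthy",
        "last_seen_ns": time.time_ns(),
        "address": address,
    }
    tr[b"dfd/health/" + node_id.encode()] = json.dumps(record).encode()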

Configuration

  • ADDRESS: Bind address for gRPC server. Optional (default: 0.0.0.0).
  • PORT: gRPC port. Optional (default: 9182).
  • FDB_DIRECTORY: FoundationDB directory path. Required.
  • DATA_PATH: Local filesystem path for chunk storage. Required.
  • NODE_ID: Unique identifier for this node. Required.
  • ACCESSIBLE_ADDRESS: Address clients use to reach this node (e.g., host:port). Required.
  • DEBUG: Enable debug logging. Optional (default: false).

MDS

MDS (Metadata Service) provides the transactional metadata layer for the filesystem. It manages namespace operations through atomic commits against FoundationDB.

Data organization

MDS stores metadata in dedicated FoundationDB subspaces:
  • inodes: Inode records (type, permissions, durability policy, size, timestamps, version)
  • dirents: Directory entries mapping filenames to child inode IDs
  • stripes: Stripe metadata (logical offsets, sizes, shard locations)
  • system: System counters, including last_inode_id

API operations

  • Lookup: Resolves a path to its inode. Traverses directory entries from root.
  • ListDirectory: Returns paginated directory contents for a given inode.
  • GetChunks: Returns stripe metadata for a file. Supports offset and limit filtering.
  • Commit: Applies a batch of mutations atomically. Supports preconditions for optimistic concurrency.
  • ListWriteServers: Returns healthy DFD nodes available for writes.
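
Lookup's traversal can be sketched as follows; get_dirent and the fixed root inode ID are hypothetical stand-ins for reads against the dirents subspace:

ROOT_INODE_ID = 1  # assumption: the root directory has a well-known inode ID

def get_dirent(parent_inode: int, name: str) -> int | None:
    """Hypothetical accessor: child inode ID for (parent, name), or None."""
    raise NotImplementedError

def lookup(path: str) -> int:
    """Walk directory entries from the root to resolve a path to an inode ID."""
    inode = ROOT_INODE_ID
    for name in filter(None, path.split("/")):
        child = get_dirent(inode, name)
        if child is None:
            raise FileNotFoundError(path)
        inode = child
    return inode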

Commit operations

The Commit API accepts multiple mutations that execute atomically:
  • CreateInode: Allocates a new inode with the specified type and durability policy
  • Link: Creates a directory entry pointing to an existing inode
  • Unlink: Removes a directory entry
  • SetAttributes: Updates inode metadata (size, permissions, timestamps)
  • AddChunk: Records stripe metadata for a file
MDS validates durability policies during inode creation:
  • Replication factor must be at least 3
  • Reed-Solomon requires minimum 5 data shards and 3 parity shards
  • Directories cannot specify durability policies
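
The rules above amount to a small validation step; a sketch with illustrative field names:

def validate_policy(inode_type: str, policy: dict | None) -> None:
    """Reject durability policies that violate the documented constraints."""
    if inode_type == "directory":
        if policy is not None:
            raise ValueError("directories cannot specify durability policies")
        return
    if policy is None:
        return  # assumption: files without a policy fall back to a default
    if policy["kind"] == "replication" and policy["factor"] < 3:
        raise ValueError("replication factor must be at least 3")
    if policy["kind"] == "reed_solomon" and (
        policy["data_shards"] < 5 or policy["parity_shards"] < 3
    ):
        raise ValueError("Reed-Solomon needs >= 5 data and >= 3 parity shards")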

Configuration

  • ADDRESS: Bind address for gRPC server. Optional (default: 0.0.0.0).
  • PORT: gRPC port. Optional (default: 9182).
  • FDB_DIRECTORY: FoundationDB directory path. Required.
  • DEBUG: Enable debug logging. Optional (default: false).

Custodian

Custodian is the maintenance daemon responsible for cluster health, data durability, and storage efficiency. It runs continuous background processes that detect failures, repair under-replicated data, reclaim unused storage, and balance data distribution across nodes.

Responsibilities

  • Node failure detection: Monitors heartbeats and marks unresponsive nodes as dead
  • Data repair: Restores durability by re-replicating or reconstructing under-replicated stripes
  • Garbage collection: Reclaims storage from deleted files and orphaned chunks
  • Rebalancing: Redistributes data from full nodes to nodes with available capacity

Node failure detection

Custodian scans DFD health records at a configurable interval (default: 5 seconds). A node is marked dead when its last heartbeat exceeds the timeout threshold (default: 45 seconds). When a node failure is detected:
  1. Custodian updates the node state to Dead in FoundationDB
  2. The node is removed from available read and write indexes
  3. Repair tasks are queued for all stripes that resided on the failed node
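
The core of the check is a timestamp comparison; a sketch using the documented defaults:

import time

HEARTBEAT_TIMEOUT_NS = 45_000 * 1_000_000  # 45 seconds, in nanoseconds

def is_dead(last_seen_ns: int, now_ns: int | None = None) -> bool:
    """A node is considered dead once its last heartbeat is older than the timeout."""
    if now_ns is None:
        now_ns = time.time_ns()
    return now_ns - last_seen_ns > HEARTBEAT_TIMEOUT_NS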

Data repair

The repair scanner periodically examines stripe metadata to identify durability violations:
  • Replication: Stripes with fewer than the required number of healthy replicas
  • Reed-Solomon: Stripes missing data or parity shards beyond the reconstruction threshold
For each under-replicated stripe, Custodian reads the surviving chunks, reconstructs any missing data using Reed-Solomon decoding where applicable, and writes new replicas to healthy nodes. Repairs triggered by node failures receive elevated priority so durability is restored quickly.

Garbage collection

DFS uses a two-phase garbage collection process:
  • Metadata GC scans the inode subspace for entries with zero reference count. After a configurable grace period, these inodes and their associated stripe metadata are deleted.
  • Chunk GC identifies orphaned chunks on DFD nodes: chunks that exist on disk but are not referenced by any stripe metadata. The process builds a set of valid chunk IDs from MDS, then each DFD node compares its local inventory against this set and deletes orphans. A grace period protects chunks that are newly written but not yet committed.
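
The chunk GC comparison step reduces to a set difference guarded by the grace period; a sketch with an assumed inventory shape (chunk ID mapped to creation time) and an assumed grace period value:

import time

GRACE_PERIOD_S = 3600  # assumption: the actual grace period is configurable

def find_orphans(local_inventory: dict[str, float], valid_ids: set[str]) -> list[str]:
    """Chunk IDs on disk that MDS no longer references and that are past the grace period."""
    now = time.time()
    return [
        chunk_id
        for chunk_id, created_at in local_inventory.items()
        if chunk_id not in valid_ids and now - created_at > GRACE_PERIOD_S
    ]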

Rebalancing

When storage utilization becomes uneven across nodes, Custodian migrates chunks from nodes approaching capacity limits to nodes with available space. This prevents hotspots and ensures write operations can proceed without capacity-related failures.

Task queue

Custodian coordinates all maintenance work through a distributed task queue backed by FoundationDB. The queue provides:
  • Configurable concurrency (default: 16 parallel tasks)
  • Automatic retries with exponential backoff (default: 5 attempts)
  • Visibility timeouts to prevent duplicate processing
  • Priority levels for failure-triggered repairs
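
Retry scheduling under exponential backoff might look like the sketch below; only the five-attempt default comes from this page, while the base delay, cap, and jitter are assumptions:

import random

MAX_ATTEMPTS = 5      # documented default
BASE_DELAY_S = 1.0    # assumption
MAX_DELAY_S = 60.0    # assumption

def backoff_delay(attempt: int) -> float:
    """Delay before retry number `attempt` (1-based), with jitter to spread load."""
    delay = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))
    return delay * random.uniform(0.5, 1.0)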

Configuration

  • FDB_DIRECTORY: FoundationDB directory path. Required.
  • ARES_CUSTODIAN_SCAN_INTERVAL_MS: Interval between health scans, in milliseconds. Optional (default: 5000).
  • ARES_CUSTODIAN_HEARTBEAT_TIMEOUT_MS: Time before declaring a node dead, in milliseconds. Optional (default: 45000).
  • ARES_CUSTODIAN_REPAIR_QUEUE: Queue name for repair tasks. Optional (default: ares-node-repair).
  • DEBUG: Enable debug logging. Optional (default: false).

Client library

The client library orchestrates distributed reads and writes across DFS components. It is used internally by Artifact Store and Platform.

Responsibilities

  • Maintains a cache of healthy DFD nodes from ListWriteServers
  • Selects target nodes for chunk replication
  • Computes Reed-Solomon parity shards using SIMD acceleration
  • Reconstructs missing shards during reads
  • Verifies CRC32C checksums end-to-end
  • Manages directory creation and file metadata updates

Durability path options

The client supports inline durability specification in file paths:
/data/file/rs=6,3        # Reed-Solomon: 6 data, 3 parity shards
/data/file/replication=5 # Replication factor 5
Options are parsed and stripped from the final path during file creation.
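
Parsing these options can be sketched as below; the returned policy dictionary is illustrative rather than the client library's actual types:

import re

def parse_durability_path(path: str) -> tuple[str, dict | None]:
    """Split a path into (clean_path, policy) based on its last component."""
    head, _, last = path.rpartition("/")
    if m := re.fullmatch(r"rs=(\d+),(\d+)", last):
        return head, {"kind": "reed_solomon",
                      "data_shards": int(m.group(1)),
                      "parity_shards": int(m.group(2))}
    if m := re.fullmatch(r"replication=(\d+)", last):
        return head, {"kind": "replication", "factor": int(m.group(1))}
    return path, None

# parse_durability_path("/data/file/rs=6,3") -> ("/data/file", Reed-Solomon with 6 data + 3 parity)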

Deployment

Minimum requirements

A functional DFS deployment requires:
  • One MDS instance
  • Three DFD nodes (minimum for replication factor 3)
  • Access to a FoundationDB cluster
  • One Custodian instance for health monitoring

Scaling considerations

  • Storage capacity: Add DFD nodes to increase total storage. Nodes register automatically through health reporting.
  • Metadata throughput: MDS is stateless. Deploy multiple instances behind a load balancer for higher throughput.
  • Fault tolerance: Increase the number of DFD nodes to improve data durability. Reed-Solomon configurations with more parity shards tolerate additional failures.
  • Monitoring: Run Custodian with appropriate scan intervals based on your availability requirements. Shorter intervals detect failures faster but increase FoundationDB load.