DFS is a distributed file system designed for storing large artifacts across multiple storage nodes. It handles data chunking, replication, erasure coding, and integrity verification. All other Mistrive services build on DFS for persistent storage.

System components

DFD

Storage nodes that persist data chunks to local disk. Each node exposes a gRPC API for chunk operations and reports health status to FoundationDB.

MDS

Metadata Service managing the filesystem namespace. Handles inode allocation, directory structure, and stripe placement through atomic FoundationDB transactions.

Custodian

Maintenance daemon responsible for data durability and storage efficiency. Handles failure detection, data repair, garbage collection, and rebalancing.

Client library

Internal orchestration layer used by Artifact Store and Platform. Handles node selection, data encoding, and integrity verification.
The client library is an internal component. External services interact with DFS through Artifact Store or Platform APIs.

Data model

DFS organizes data using a POSIX-like filesystem model with inodes, directories, and files. Each file consists of one or more stripes, where each stripe contains up to 1 MiB of data distributed across storage nodes.

Write operations

  1. The client determines the appropriate durability policy based on file size
  2. The data is split into 1 MiB stripes
  3. Each stripe is written to multiple DFD nodes according to the durability policy
  4. MDS records stripe metadata and updates the inode atomically
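
The striping in step 2 can be pictured with a short Python sketch; the function name and in-memory representation here are illustrative, not the client library's actual API:

STRIPE_SIZE = 1 * 1024 * 1024  # each stripe holds up to 1 MiB of file data

def split_into_stripes(data: bytes) -> list[bytes]:
    """Split a file payload into consecutive stripes of at most 1 MiB each."""
    return [data[i:i + STRIPE_SIZE] for i in range(0, len(data), STRIPE_SIZE)]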

Read operations

  1. The client resolves the file path through MDS to obtain stripe locations
  2. Chunks are fetched from DFD nodes in parallel
  3. For erasure-coded files, missing shards are reconstructed from parity data
  4. The client verifies checksums and streams data to the caller
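
As a rough illustration of steps 2 and 4, the sketch below fetches a stripe's chunks in parallel and verifies a checksum over the assembled data; fetch_chunk is a hypothetical stand-in for a GetChunk RPC, and plain CRC32 is used here instead of the CRC32C that DFS actually uses.

import zlib
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(location: str) -> bytes:
    """Hypothetical placeholder for a GetChunk call against one DFD node."""
    raise NotImplementedError

def read_stripe(chunk_locations: list[str], expected_crc: int) -> bytes:
    """Fetch a stripe's chunks in parallel and verify the assembled data."""
    with ThreadPoolExecutor() as pool:
        chunks = list(pool.map(fetch_chunk, chunk_locations))
    data = b"".join(chunks)
    if zlib.crc32(data) != expected_crc:
        raise IOError("stripe failed checksum verification")
    return data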

Durability policies

DFS provides two durability strategies that balance storage efficiency against fault tolerance.

Replication

Files of 3 MiB or smaller use replication by default. Each chunk is written to three separate storage nodes. A read succeeds when any single replica returns valid data.
  • Minimum factor: 3
  • Storage overhead: 3× (three full copies)
  • Fault tolerance: survives loss of N-1 nodes

Reed-Solomon erasure coding

Files larger than 3 MiB use Reed-Solomon encoding. Data is split into data shards, with additional parity shards used for reconstruction. This approach reduces storage overhead while maintaining fault tolerance.
  • Default configuration: 6 data + 3 parity shards
  • Storage overhead: ~1.5×
  • Fault tolerance: survives loss of any 3 nodes
The client uses SIMD-accelerated encoding and decoding for Reed-Solomon operations.
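
The overhead figure follows directly from the shard counts: stored bytes per logical byte are (data + parity) / data. A quick sanity check of the numbers above, plus the size-based policy selection, in illustrative Python:

REPLICATION_MAX_SIZE = 3 * 1024 * 1024  # 3 MiB threshold described above

def rs_overhead(data_shards: int, parity_shards: int) -> float:
    """Bytes stored per logical byte under Reed-Solomon."""
    return (data_shards + parity_shards) / data_shards

def default_policy(file_size: int) -> str:
    """Mirror the documented defaults: replication up to 3 MiB, RS 6+3 above."""
    return "replication (factor 3)" if file_size <= REPLICATION_MAX_SIZE else "Reed-Solomon (6+3)"

assert rs_overhead(6, 3) == 1.5  # matches the ~1.5x overhead listed above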

DFD

DFD (D for Disk) nodes form the storage layer. Each node manages a local data directory and serves chunk operations over gRPC.

Storage layout

Chunks are organized in a two-level directory structure using the first four hex characters of the chunk UUID:
<data_path>/<hex[0:2]>/<hex[2:4]>/<uuid>.chunk
<data_path>/<hex[0:2]>/<hex[2:4]>/<uuid>.meta
The .chunk file contains the raw data. The .meta file stores the CRC32C checksum computed by the client at write time.
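
A small sketch of this layout; whether the on-disk filename uses the hyphenated or plain-hex form of the UUID is an assumption here:

from pathlib import Path
from uuid import UUID

def chunk_paths(data_path: str, chunk_id: UUID) -> tuple[Path, Path]:
    """Return the .chunk and .meta paths for a chunk UUID."""
    hexid = chunk_id.hex  # 32 lowercase hex characters
    subdir = Path(data_path) / hexid[0:2] / hexid[2:4]
    return subdir / f"{chunk_id}.chunk", subdir / f"{chunk_id}.meta"

# chunk_paths("/var/lib/dfd", UUID("8f14e45f-ceea-467f-9f4d-1f2a3b4c5d6e"))
# -> .../8f/14/8f14e45f-....chunk and .../8f/14/8f14e45f-....meta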

API operations

  • PutChunk: Writes a chunk with UUID, data (max 1 MiB), and checksum. Verifies the checksum before persisting. Idempotent.
  • GetChunk: Retrieves a chunk by UUID. Verifies the stored checksum before returning. Returns an error on corruption.
  • DeleteChunk: Removes a chunk by UUID. Idempotent; deleting a non-existent chunk succeeds.
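
A minimal sketch of PutChunk's verify-then-persist behaviour, assuming the third-party crc32c Python package for the checksum; the .meta file format shown is an assumption, and the gRPC plumbing is omitted:

import crc32c  # third-party package providing CRC32C

MAX_CHUNK_SIZE = 1 * 1024 * 1024  # chunks carry at most 1 MiB of data

def put_chunk(chunk_path: str, meta_path: str, data: bytes, checksum: int) -> None:
    """Verify the client-supplied checksum, then persist chunk and metadata."""
    if len(data) > MAX_CHUNK_SIZE:
        raise ValueError("chunk exceeds 1 MiB")
    if crc32c.crc32c(data) != checksum:
        raise ValueError("checksum mismatch, refusing to persist")
    # Overwriting an existing chunk with identical content keeps the call idempotent.
    with open(chunk_path, "wb") as f:
        f.write(data)
    with open(meta_path, "w") as f:
        f.write(f"{checksum}\n")  # assumed .meta format: decimal CRC32C plus newline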

Health reporting

Each DFD node writes a health record to FoundationDB every 15 seconds containing:
  • Current node state (healthy or dead)
  • Last-seen timestamp in nanoseconds
  • Network address for client connections
MDS and the client library use these records to route traffic to available nodes.
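
A sketch of the heartbeat write using the FoundationDB Python bindings; the key layout and field names here are assumptions, not the actual schema:

import json
import time

import fdb

fdb.api_version(710)
db = fdb.open()

@fdb.transactional
def report_health(tr, node_id: str, address: str) -> None:
    """Write this node's health record; called on a 15-second loop."""
    record = {
        "state": "healthy",
        "last_seen_ns": time.time_ns(),
        "address": address,
    }
    tr[b"dfd/health/" + node_id.encode()] = json.dumps(record).encode()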

Configuration

  • ADDRESS: Bind address for gRPC server. Optional (default: 0.0.0.0).
  • PORT: gRPC port. Optional (default: 9182).
  • FDB_DIRECTORY: FoundationDB directory path. Required.
  • DATA_PATH: Local filesystem path for chunk storage. Required.
  • NODE_ID: Unique identifier for this node. Required.
  • ACCESSIBLE_ADDRESS: Address clients use to reach this node (e.g., host:port). Required.
  • DEBUG: Enable debug logging. Optional (default: false).

MDS

MDS (Metadata Service) provides the transactional metadata layer for the filesystem. It manages namespace operations through atomic commits against FoundationDB.

Data organization

MDS stores metadata in dedicated FoundationDB subspaces:
  • inodes: Inode records (type, permissions, durability policy, size, timestamps, version)
  • dirents: Directory entries mapping filenames to child inode IDs
  • stripes: Stripe metadata (logical offsets, sizes, shard locations)
  • system: System counters, including last_inode_id

API operations

  • Lookup: Resolves a path to its inode. Traverses directory entries from root.
  • ListDirectory: Returns paginated directory contents for a given inode.
  • GetChunks: Returns stripe metadata for a file. Supports offset and limit filtering.
  • Commit: Applies a batch of mutations atomically. Supports preconditions for optimistic concurrency.
  • ListWriteServers: Returns healthy DFD nodes available for writes.
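
Lookup's traversal can be sketched as follows; get_dirent and the fixed root inode ID are hypothetical stand-ins for reads against the dirents subspace:

ROOT_INODE_ID = 1  # assumption: the root directory has a well-known inode ID

def get_dirent(parent_inode: int, name: str) -> int | None:
    """Hypothetical accessor: child inode ID for (parent, name), or None."""
    raise NotImplementedError

def lookup(path: str) -> int:
    """Walk directory entries from the root to resolve a path to an inode ID."""
    inode = ROOT_INODE_ID
    for name in filter(None, path.split("/")):
        child = get_dirent(inode, name)
        if child is None:
            raise FileNotFoundError(path)
        inode = child
    return inode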

Commit operations

The Commit API accepts multiple mutations that execute atomically:
  • CreateInode: Allocates a new inode with the specified type and durability policy
  • Link: Creates a directory entry pointing to an existing inode
  • Unlink: Removes a directory entry
  • SetAttributes: Updates inode metadata (size, permissions, timestamps)
  • AddChunk: Records stripe metadata for a file
MDS validates durability policies during inode creation:
  • Replication factor must be at least 3
  • Reed-Solomon requires minimum 5 data shards and 3 parity shards
  • Directories cannot specify durability policies
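
The rules above amount to a small validation step; a sketch with illustrative field names:

def validate_policy(inode_type: str, policy: dict | None) -> None:
    """Reject durability policies that violate the documented constraints."""
    if inode_type == "directory":
        if policy is not None:
            raise ValueError("directories cannot specify durability policies")
        return
    if policy is None:
        return  # assumption: files without a policy fall back to a default
    if policy["kind"] == "replication" and policy["factor"] < 3:
        raise ValueError("replication factor must be at least 3")
    if policy["kind"] == "reed_solomon" and (
        policy["data_shards"] < 5 or policy["parity_shards"] < 3
    ):
        raise ValueError("Reed-Solomon needs >= 5 data and >= 3 parity shards")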

Configuration

  • ADDRESS: Bind address for gRPC server. Optional (default: 0.0.0.0).
  • PORT: gRPC port. Optional (default: 9182).
  • FDB_DIRECTORY: FoundationDB directory path. Required.
  • DEBUG: Enable debug logging. Optional (default: false).

Custodian

Custodian is the maintenance daemon responsible for cluster health, data durability, and storage efficiency. It runs continuous background processes that detect failures, repair under-replicated data, reclaim unused storage, and balance data distribution across nodes.

Responsibilities

  • Node failure detection: Monitors heartbeats and marks unresponsive nodes as dead
  • Data repair: Restores durability by re-replicating or reconstructing under-replicated stripes
  • Garbage collection: Reclaims storage from deleted files and orphaned chunks
  • Rebalancing: Redistributes data from full nodes to nodes with available capacity

Node failure detection

Custodian scans DFD health records at a configurable interval (default: 5 seconds). A node is marked dead when its last heartbeat exceeds the timeout threshold (default: 45 seconds). When a node failure is detected:
  1. Custodian updates the node state to Dead in FoundationDB
  2. The node is removed from available read and write indexes
  3. Repair tasks are queued for all stripes that resided on the failed node
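
The core of the check is a timestamp comparison; a sketch using the documented defaults:

import time

HEARTBEAT_TIMEOUT_NS = 45_000 * 1_000_000  # 45 seconds, in nanoseconds

def is_dead(last_seen_ns: int, now_ns: int | None = None) -> bool:
    """A node is considered dead once its last heartbeat is older than the timeout."""
    if now_ns is None:
        now_ns = time.time_ns()
    return now_ns - last_seen_ns > HEARTBEAT_TIMEOUT_NS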

Data repair

The repair scanner periodically examines stripe metadata to identify durability violations:
  • Replication: Stripes with fewer than the required number of healthy replicas
  • Reed-Solomon: Stripes missing data or parity shards beyond the reconstruction threshold
For each under-replicated stripe, Custodian reads the surviving chunks, reconstructs any missing data using Reed-Solomon decoding where applicable, and writes new replicas to healthy nodes. Repairs triggered by node failures receive elevated priority so durability is restored quickly.

Garbage collection

DFS uses a two-phase garbage collection process:
  • Metadata GC scans the inode subspace for entries with zero reference count. After a configurable grace period, these inodes and their associated stripe metadata are deleted.
  • Chunk GC identifies orphaned chunks on DFD nodes: chunks that exist on disk but are not referenced by any stripe metadata. The process builds a set of valid chunk IDs from MDS, then each DFD node compares its local inventory against this set and deletes orphans. A grace period protects chunks that are newly written but not yet committed.
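
The chunk GC comparison step reduces to a set difference guarded by the grace period; a sketch with an assumed inventory shape (chunk ID mapped to creation time) and an assumed grace period value:

import time

GRACE_PERIOD_S = 3600  # assumption: the actual grace period is configurable

def find_orphans(local_inventory: dict[str, float], valid_ids: set[str]) -> list[str]:
    """Chunk IDs on disk that MDS no longer references and that are past the grace period."""
    now = time.time()
    return [
        chunk_id
        for chunk_id, created_at in local_inventory.items()
        if chunk_id not in valid_ids and now - created_at > GRACE_PERIOD_S
    ]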

Rebalancing

When storage utilization becomes uneven across nodes, Custodian migrates chunks from nodes approaching capacity limits to nodes with available space. This prevents hotspots and ensures write operations can proceed without capacity-related failures.

Task queue

Custodian coordinates all maintenance work through a distributed task queue backed by FoundationDB. The queue provides:
  • Configurable concurrency (default: 16 parallel tasks)
  • Automatic retries with exponential backoff (default: 5 attempts)
  • Visibility timeouts to prevent duplicate processing
  • Priority levels for failure-triggered repairs
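
Retry scheduling under exponential backoff might look like the sketch below; only the five-attempt default comes from this page, while the base delay, cap, and jitter are assumptions:

import random

MAX_ATTEMPTS = 5      # documented default
BASE_DELAY_S = 1.0    # assumption
MAX_DELAY_S = 60.0    # assumption

def backoff_delay(attempt: int) -> float:
    """Delay before retry number `attempt` (1-based), with jitter to spread load."""
    delay = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))
    return delay * random.uniform(0.5, 1.0)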

Configuration

  • FDB_DIRECTORY: FoundationDB directory path. Required.
  • ARES_CUSTODIAN_SCAN_INTERVAL_MS: Interval between health scans, in milliseconds. Optional (default: 5000).
  • ARES_CUSTODIAN_HEARTBEAT_TIMEOUT_MS: Time before declaring a node dead, in milliseconds. Optional (default: 45000).
  • ARES_CUSTODIAN_REPAIR_QUEUE: Queue name for repair tasks. Optional (default: ares-node-repair).
  • DEBUG: Enable debug logging. Optional (default: false).

Client library

The client library orchestrates distributed reads and writes across DFS components. It is used internally by Artifact Store and Platform.

Responsibilities

  • Maintains a cache of healthy DFD nodes from ListWriteServers
  • Selects target nodes for chunk replication
  • Computes Reed-Solomon parity shards using SIMD acceleration
  • Reconstructs missing shards during reads
  • Verifies CRC32C checksums end-to-end
  • Manages directory creation and file metadata updates

Durability path options

The client supports inline durability specification in file paths:
/data/file/rs=6,3        # Reed-Solomon: 6 data, 3 parity shards
/data/file/replication=5 # Replication factor 5
Options are parsed and stripped from the final path during file creation.
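
Parsing these options can be sketched as below; the returned policy dictionary is illustrative rather than the client library's actual types:

import re

def parse_durability_path(path: str) -> tuple[str, dict | None]:
    """Split a path into (clean_path, policy) based on its last component."""
    head, _, last = path.rpartition("/")
    if m := re.fullmatch(r"rs=(\d+),(\d+)", last):
        return head, {"kind": "reed_solomon",
                      "data_shards": int(m.group(1)),
                      "parity_shards": int(m.group(2))}
    if m := re.fullmatch(r"replication=(\d+)", last):
        return head, {"kind": "replication", "factor": int(m.group(1))}
    return path, None

# parse_durability_path("/data/file/rs=6,3") -> ("/data/file", Reed-Solomon with 6 data + 3 parity)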

Deployment

Minimum requirements

A functional DFS deployment requires:
  • One MDS instance
  • Three DFD nodes (minimum for replication factor 3)
  • Access to a FoundationDB cluster
  • One Custodian instance for health monitoring

Scaling considerations

  • Storage capacity: Add DFD nodes to increase total storage. Nodes register automatically through health reporting.
  • Metadata throughput: MDS is stateless. Deploy multiple instances behind a load balancer for higher throughput.
  • Fault tolerance: Increase the number of DFD nodes to improve data durability. Reed-Solomon configurations with more parity shards tolerate additional failures.
  • Monitoring: Run Custodian with appropriate scan intervals based on your availability requirements. Shorter intervals detect failures faster but increase FoundationDB load.