
STORAGE FACILITY PROJECT GOAL

(In-Development) Provide a general-purpose file storage facility with point-in-time recovery and daily metadata differentials.

STORAGE FACILITY PROJECT DETAILS

Designed as a dedicated file store, the storage facility is built primarily as a systems backup target (via SMB and rsync), but it also allows remote mounts and access over SMB (CIFS), SSHFS, and SFTP (and possibly NFS - still under evaluation). A guiding purpose is to deliver a competent and reliable backup store at lower performance (and lower cost) for data that doesn't need the ultra-low latency and much higher cost of NVMe SSDs.
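For illustration only, here is a minimal client-side sketch of what a backup push to such a facility might look like, assuming an rsync-over-SSH workflow. The hostname, port, and paths (storage.example.net, /srv/data, /backups/host01) are hypothetical placeholders, not the facility's actual endpoints; the real values would come from your provisioned allocation.

#!/usr/bin/env python3
# Sketch: push a local directory to a remote storage allocation with rsync over SSH.
# All hostnames and paths below are hypothetical placeholders.
import subprocess

SOURCE = "/srv/data/"                                   # local directory to back up (hypothetical)
DEST = "backup@storage.example.net:/backups/host01/"    # remote allocation (hypothetical)

def push_backup() -> None:
    # -a: preserve permissions and timestamps, -z: compress in transit,
    # --delete: mirror deletions so the remote copy matches the source.
    cmd = ["rsync", "-az", "--delete", "-e", "ssh -p 22", SOURCE, DEST]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    push_backup()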

Point-in-time mounts, storage costs measured only on data consumed plus delta changes, and point-in-time metadata differentials set this product apart from the common file store options offered by cloud providers.

All storage facilities are provisioned as both Active and PiT storage allocations in 4TB increments. A 4TB storage allocation is cheaper than the equivalent from cloud providers while beating performance at comparable tiers. Additionally, each active 4TB allocation includes an additional 4TB PiT storage allocation for historical point-in-time snapshots.
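As a quick worked example of how the increments add up (illustrative arithmetic only; no pricing is implied):

# Illustrative sizing arithmetic: each active 4TB increment carries a
# matching 4TB PiT (point-in-time) allocation.
ACTIVE_INCREMENT_TB = 4
PIT_INCREMENT_TB = 4

def provisioned_capacity(increments: int) -> dict:
    return {
        "active_tb": increments * ACTIVE_INCREMENT_TB,
        "pit_tb": increments * PIT_INCREMENT_TB,
        "total_tb": increments * (ACTIVE_INCREMENT_TB + PIT_INCREMENT_TB),
    }

# Example: 3 increments -> 12TB active + 12TB PiT = 24TB provisioned.
print(provisioned_capacity(3))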

STORAGE NODES

Storage nodes contain large storage pools currently ranging from 90TB to 400TB. While all nodes in KYNGIN have some level of redundancy (typically RAID 10 for performance needs), SG nodes operate such that lowest-latency performance is less important than the large data allocations needed for backups, network mounts, and dedicated SFTP servers.

With multiple redundant drives per pool, we nearly eliminate the possibility of a multi-disk failure causing data loss. High-core-count CPUs are necessary for the aggressive compression applied to all data during writes, as well as fast decompression during reads, maximizing storage utilization while reducing latency.
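To illustrate the write-path trade-off (this is not the facility's actual compressor, which isn't named here), a small sketch comparing compression effort levels using Python's standard zlib module:

import os
import time
import zlib

# Compare compression ratio and CPU time at two effort levels on a sample
# buffer; in the storage pools this happens transparently at the block layer.
data = os.urandom(1 << 20) + b"A" * (1 << 20)   # 1 MiB random + 1 MiB repetitive

for level in (1, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"level {level}: ratio {ratio:.2f}x in {elapsed * 1000:.1f} ms")

Higher levels squeeze out more space at the cost of CPU time, which is why large core counts matter when every write is compressed.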

These systems operate with encrypted enterprise HDDs. SMART testing is performed more aggressively (weekly) on HDDs due to their higher failure rate compared to SSDs, and drives showing any errors are proactively replaced. All storage nodes operate with multiple levels of caching and data allocation for maximum performance, detailed below:

1) All file metadata is stored on dedicated SSDs for fast file lookups and directory browsing.

2) All hot file data blocks are cached in RAM (filesystem cache).

3) When no longer hot, warm data blocks are evicted from RAM and cached on high-speed, read-optimized SSDs.

4) All stable data blocks are stored on HDDs organized in RAID groups that balance reliability, capacity, and performance. By keeping many levels of cache, we can hold large parts of the non-archive filesystem in memory for fast lookups and requests, leaving plenty of I/O available for HDD writes, which are much slower and easily congested. A conceptual sketch of this read path follows below.
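The sketch below is purely conceptual (the real behavior is handled by the filesystem, not application code); it shows how a read would resolve through the tiers listed above, with each miss falling through to the next, slower tier.

# Conceptual read path through the cache tiers described above.
RAM_CACHE: dict[str, bytes] = {}                        # tier 2: hot blocks in memory
SSD_CACHE: dict[str, bytes] = {}                        # tier 3: warm blocks on read-optimized SSD
HDD_STORE: dict[str, bytes] = {"blk0": b"stable data"}  # tier 4: stable blocks on RAID HDDs

def read_block(block_id: str) -> bytes:
    if block_id in RAM_CACHE:                  # hot: served straight from memory
        return RAM_CACHE[block_id]
    if block_id in SSD_CACHE:                  # warm: promoted back to RAM on access
        RAM_CACHE[block_id] = SSD_CACHE[block_id]
        return RAM_CACHE[block_id]
    data = HDD_STORE[block_id]                 # stable: slow HDD read
    RAM_CACHE[block_id] = data                 # cache for future hot access
    return data

print(read_block("blk0"))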