DIY NAS Part 3: The Filesystem
This is part 3 of the project. See here for the other parts.
Which filesystem to use was fundamental. It affects power, noise and resilience. I planned to use RAID to get resilience, but typical RAID setups prevent disks from spinning down, which means more noise and power (more than you might think).
I did a bunch of research and found that a combination of SnapRAID and MergerFS was going to be best for my goals. They each solved a single problem and didn't lock my data away in some obscure way.
Being able to pull out a drive and stick it in another machine is invaluable when (not if) stuff goes wrong.
SnapRAID
SnapRAID protects data from disk failure. It does this by storing parity data on a dedicated drive. It uses drives formatted to good old boring EXT4.
I will use a single parity drive, which protects all the data against a single drive failure. If two drives were to fail simultaneously, I would lose only the files stored on the failed drives. This is unlike regular RAID, where you lose everything if too many drives fail.
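To make the single-parity idea concrete, here is a toy sketch in shell arithmetic. It only illustrates the general XOR-parity principle on three made-up one-byte "disks"; SnapRAID's real implementation works on whole blocks and files and is considerably more involved. The variable names and values are invented for the example.

```shell
# Three pretend data disks, each holding a single byte (made-up values).
d1=202
d2=87
d3=19

# The parity disk stores the XOR of all data disks.
parity=$(( d1 ^ d2 ^ d3 ))

# Pretend d2 has died: XOR the parity with the surviving disks to rebuild it.
rebuilt=$(( parity ^ d1 ^ d3 ))

echo "parity=$parity rebuilt_d2=$rebuilt original_d2=$d2"
```

If d1 and d2 were both lost, the parity and d3 would only give back the XOR of the two missing values, not each one separately. That is why one parity drive covers exactly one failure.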
MergerFS
MergerFS allows creating a single virtual drive which is actually many physical drives. This is a convenient abstraction allowing me to spread data between disks without applications noticing. It does all this while still allowing normal access to each disk, no proprietary nonsense.
Allowing Disks To Sleep AKA Caching
SnapRAID and MergerFS cover resilience and convenience, but don't allow the disks to spin down.
I hoped to use Bcache, which transparently routes requests for frequently accessed data to a fast, silent disk. However, it requires a special format on all the disks, and that's a no-go for me. Instead, I can achieve the same thing with just MergerFS and 'tiered caching'.
There are lots of details in the docs about how exactly tiered caching works. The short version is that you use an NVME drive to store 'hot' data, and only wake the HDDs when retrieving 'cold' data. This requires a script to be run periodically to move files off the NVME once they have gone cold, or simply to free up space.
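As a rough idea of what such a mover script might look like, here is a minimal sketch. It is not the script from the MergerFS docs (those are more careful and worth reading first). The function name move_cold_files and the access-time threshold are assumptions made for illustration. It relies on access times being recorded, so it would not work on a filesystem mounted with noatime.

```shell
# Hypothetical helper (not from the MergerFS docs): move files that have not
# been read for more than DAYS days from the cache branch to the cold pool.
# Usage: move_cold_files CACHE_DIR COLD_DIR DAYS
move_cold_files() {
  cache=${1%/}   # strip a trailing slash so the prefix removal below works
  cold=${2%/}
  days=$3
  # -atime +N matches files last accessed more than N (whole) days ago.
  find "$cache" -type f -atime +"$days" -exec sh -c '
    cache=$1; cold=$2; shift 2
    for src in "$@"; do
      rel=${src#"$cache"/}                  # path relative to the cache root
      mkdir -p "$cold/$(dirname "$rel")"    # recreate the directory layout
      mv -- "$src" "$cold/$rel"             # becomes copy + delete across disks
    done
  ' sh "$cache" "$cold" {} +
}
```

With the mount points used later in this post, a cron job could call something like `move_cold_files /media/solaris /media/glacius 30`, where 30 days is an arbitrary threshold. The source is the NVME branch itself rather than the pool that contains it, so files really leave the cache disk.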
This is more work than using Bcache, but it also makes a lot more sense in my setup. The NVME drive I will be using for the cache is 4TB (it's what I've got on hand). With Bcache all that space would be dedicated to ephemeral cache. With the MergerFS approach, that space is added to the total storage space. Win, win.
The only downside is that I can't protect the cache disk with SnapRAID, because the parity disk has to be at least as large as the largest disk in the array.
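The size rule is easy to check by hand, but a few lines of shell make it explicit. This is only a sketch: parity_fits is a made-up helper, and the sizes are rounded figures in GB taken from the df output at the end of this post (916G for each HDD, roughly 3686G for the 3.6T NVME).

```shell
# Hypothetical helper: succeeds only if the parity disk is at least as large
# as every data disk. Usage: parity_fits PARITY_SIZE DATA_SIZE...
parity_fits() {
  parity_size=$1
  shift
  for data_size in "$@"; do
    [ "$data_size" -le "$parity_size" ] || return 1
  done
  return 0
}

# Five 916G data disks behind a 916G parity disk: fine.
parity_fits 916 916 916 916 916 916 && echo "HDDs only: parity is big enough"

# Adding the ~3686G NVME as a data disk breaks the rule.
parity_fits 916 916 916 916 916 916 3686 || echo "with NVME: parity is too small"
```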
Putting It All Together
Format all the drives as ext4. -m 0 reserves no space for root, and -L sets the label.
mkfs.ext4 -m 0 -L solaris /dev/nvme0n1
mkfs.ext4 -m 0 -L aquila /dev/usb-aquila
mkfs.ext4 -m 0 -L carina /dev/usb-carina
mkfs.ext4 -m 0 -L dorado /dev/usb-dorado
mkfs.ext4 -m 0 -L fornax /dev/usb-fornax
mkfs.ext4 -m 0 -L hydrus /dev/usb-hydrus
mkfs.ext4 -m 0 -L tucana /dev/usb-tucana
Next fstab. I considered using the /dev/usb-x or /dev/disk/by-label/ paths, but it felt like the UUID was safest:
/dev/disk/by-uuid/082063a5-faaf-45b4-9545-a727d8ebceaf /media/solaris ext4 defaults,nofail,x-systemd.device-timeout=5 0 2
/dev/disk/by-uuid/492fcaac-a610-4824-b1e4-cbfdbdd7b17c /media/aquila ext4 defaults,nofail,x-systemd.device-timeout=5 0 2
/dev/disk/by-uuid/fb5dad8d-eb6a-43e6-9fe8-45edc8b0efad /media/carina ext4 defaults,nofail,x-systemd.device-timeout=5 0 2
/dev/disk/by-uuid/659b3ef9-ec03-4ecd-a1cc-7cd7ad795bca /media/dorado ext4 defaults,nofail,x-systemd.device-timeout=5 0 2
/dev/disk/by-uuid/2f700da3-b3ca-4170-9663-5682778a4cda /media/fornax ext4 defaults,nofail,x-systemd.device-timeout=5 0 2
/dev/disk/by-uuid/bc17cff4-1b5b-45d0-8b24-1c9e7f00a90e /media/hydrus ext4 defaults,nofail,x-systemd.device-timeout=5 0 2
/dev/disk/by-uuid/761a3128-a3d6-4b34-aa6e-16c600944689 /media/tucana ext4 defaults,nofail,x-systemd.device-timeout=5 0 2
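Typing seven UUIDs by hand is error-prone, so here is a small sketch of how lines in this shape could be generated. fstab_line is a made-up helper, not part of any tool. The options it prints are the ones used above: nofail lets the boot continue if a disk is missing, and x-systemd.device-timeout=5 tells systemd to wait at most 5 seconds for the device to appear.

```shell
# Hypothetical helper: print one fstab entry in the format used in this post.
# $1 = filesystem UUID, $2 = label, reused as the directory name under /media
fstab_line() {
  printf '/dev/disk/by-uuid/%s /media/%s ext4 defaults,nofail,x-systemd.device-timeout=5 0 2\n' "$1" "$2"
}

# Example with a real device (needs root, so it is left commented out):
# fstab_line "$(blkid -s UUID -o value /dev/usb-aquila)" aquila
```

Running `lsblk -o NAME,LABEL,UUID` is a quick way to check the labels and UUIDs before pasting anything into /etc/fstab.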
Create mount directories:
sudo mkdir /media/{aquila,carina,dorado,fornax,hydrus,tucana,solaris}
Mount the drives:
sudo mount -a
Next was setting up SnapRAID. The Ubuntu global repos, of course, only had v12.1, so I compiled v13.0 from source:
wget https://github.com/amadvance/snapraid/releases/download/v13.0/snapraid-13.0.tar.gz
tar -zxvf snapraid-13.0.tar.gz
cd snapraid-13.0
./configure
make
sudo make install
Set up the SnapRAID config:
parity /media/aquila/snapraid.parity
content /var/snapraid/snapraid.content
content /media/carina/snapraid.content
content /media/dorado/snapraid.content
content /media/fornax/snapraid.content
content /media/hydrus/snapraid.content
content /media/tucana/snapraid.content
data d1 /media/carina/
data d2 /media/dorado/
data d3 /media/fornax/
data d4 /media/hydrus/
data d5 /media/tucana/
smartctl d1 -d sat %s
smartctl d2 -d sat %s
smartctl d3 -d sat %s
smartctl d4 -d sat %s
smartctl d5 -d sat %s
smartctl parity -d sat %s
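SnapRAID is not real-time: the parity only reflects the data as of the last sync, so it needs to be run regularly. Below is a minimal sketch of a maintenance routine, assuming the config above was saved to the default /etc/snapraid.conf and that the function runs as root. The name snapraid_maintenance and the scrub percentages are my own choices for illustration, not recommendations from the SnapRAID manual.

```shell
# Hypothetical maintenance routine, intended to be run as root from cron or
# a systemd timer.
snapraid_maintenance() {
  # Bring the parity up to date with the current contents of the data disks.
  snapraid sync || return 1
  # Verify part of the array: 5% of it, limited to blocks that have not been
  # scrubbed in the last 30 days.
  snapraid scrub -p 5 -o 30
}
```

The very first `snapraid sync` has to read every data disk in full, so it takes a while. After a drive failure, `snapraid fix -d d1` (with the name of the failed disk from the config) rebuilds that disk's contents onto a replacement.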
Install latest MergerFS
wget https://github.com/trapexit/mergerfs/releases/download/2.41.1/mergerfs_2.41.1.ubuntu-jammy_amd64.deb
sudo dpkg -i mergerfs_2.41.1.ubuntu-jammy_amd64.deb
Configure fstab to set up two pools. corona is the pool with the NVME 'cache', glacius is the cold storage:
/media/solaris:/media/carina:/media/dorado:/media/fornax:/media/hydrus:/media/tucana /media/corona fuse.mergerfs defaults,config=/etc/mergerfs/corona.conf 0 2
/media/carina:/media/dorado:/media/fornax:/media/hydrus:/media/tucana /media/glacius fuse.mergerfs defaults,config=/etc/mergerfs/glacius.conf 0 2
The corona pool is configured to prefer the first (NVME) drive: new files are created there as long as it has at least minfreespace free, and lookups check it first:
fsname=corona
cache.files=auto-full
category.create=ff
category.search=ff
minfreespace=50G
moveonenospc=true
func.getattr=newest
dropcacheonclose=false
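The interaction between category.create=ff and minfreespace is what makes the NVME behave like a cache that overflows onto the HDDs. Here is a simplified model of that selection in shell. It is my own approximation for illustration and ignores details that real MergerFS handles (read-only branches, path preservation and so on). pick_first_found and the free-space numbers are made up.

```shell
# Simplified model of a "first found" create policy.
# Usage: pick_first_found MIN_FREE NAME:FREE...
# Prints the first branch, in the order given, with at least MIN_FREE free.
pick_first_found() {
  min_free=$1
  shift
  for branch in "$@"; do
    name=${branch%%:*}   # text before the colon
    free=${branch##*:}   # text after the colon
    if [ "$free" -ge "$min_free" ]; then
      echo "$name"
      return 0
    fi
  done
  return 1   # no branch has enough free space
}

# NVME has plenty of room (free space in GB): new files land on solaris.
pick_first_found 50 solaris:2700 carina:916 dorado:916

# NVME nearly full: creation falls through to the first HDD.
pick_first_found 50 solaris:20 carina:916 dorado:916
```

Branch order matters here, which is why solaris is listed first in the corona line in fstab.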
The glacius pool spreads data around based on free space:
fsname=glacius
cache.files=auto-full
category.create=pfrd
func.getattr=newest
dropcacheonclose=false
df -h now shows:
/dev/sdb        916G   28K  916G   1% /media/aquila
/dev/sdc        916G   32K  916G   1% /media/dorado
/dev/sda        916G   32K  916G   1% /media/carina
/dev/sde        916G   32K  916G   1% /media/hydrus
/dev/sdd        916G   32K  916G   1% /media/fornax
/dev/sdf        916G   32K  916G   1% /media/tucana
/dev/nvme0n1p1  3.6T  795G  2.7T  23% /media/solaris
corona          8.1T  795G  7.1T  10% /media/corona
glacius         4.5T  160K  4.5T   1% /media/glacius
8,100 GB of glorious storage. Now to find stuff to fill it with 😅