This post is also available in Japanese.
I’ve been playing a little bit with ZFS, Oracle’s (previously Sun’s) next-generation file system. It was originally developed for Solaris, but since it’s open source it has also been ported to Linux (considered stable for production use as of version 0.6.1) and Mac. Although it’s called a file system, ZFS is also a volume manager, so it takes over the job of partitioning your disks as well. Why is ZFS cool? It includes protection against data corruption, built-in support for RAID, snapshots and copy-on-write clones, and flexible and efficient ways of transferring data, e.g. for backups. To show what’s possible and push the limits somewhat, I’ll show how we can implement various features of Git, the version control system (or any version control system, for that matter), using ZFS. Of course, I’m not seriously suggesting you ditch a “proper” version control system, but it gives a good sense of what’s possible at the file system level.
Installing ZFS is not hard: on Mac go to the OpenZFS On OS X site and install the package. On Ubuntu Linux:
$ sudo apt-add-repository ppa:zfs-native/stable
$ sudo apt-get update
$ sudo apt-get install ubuntu-zfs
Pools and file systems
Now you’re able to create new ZFS storage pools and file systems. If you have a drive available you can use that, or, if you don’t and just want to play around a little bit, you can create one or more files to represent the disks. For instance, to create a 10G file you can use dd:
$ dd if=/dev/zero of=/tmp/disk1.img bs=1024 count=10485760
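The dd command above writes the full 10 GiB of zeros (bs=1024 × count=10485760 = 10,737,418,240 bytes), which takes a while and uses real disk space. For throwaway test pools, a sparse file of the same size is usually enough and is created instantly. The sketch below uses a hypothetical helper name, `make_disk_image`; it relies on dd with `count=0` plus `seek`, which GNU and BSD dd both treat as "extend the file to this length without writing anything". Keep in mind that the underlying file system still needs room for whatever you actually store in the pool.

```shell
# make_disk_image PATH SIZE_GIB   (hypothetical helper, not part of ZFS)
# Creates a sparse file of SIZE_GIB GiB to use as a throwaway ZFS "disk".
make_disk_image() {
    path=$1
    size_gib=$2
    # bs=1024 with count=0 copies nothing; seeking SIZE_GIB * 1024 * 1024
    # one-KiB blocks forward sets the file length without allocating blocks.
    dd if=/dev/zero of="$path" bs=1024 count=0 \
        seek=$((size_gib * 1024 * 1024)) 2>/dev/null
}

# Byte count produced by the original dd invocation (bs=1024 count=10485760):
echo $((1024 * 10485760))   # 10737418240, i.e. 10 GiB
```

For the RAID examples you would call it twice, e.g. `make_disk_image /tmp/disk1.img 10` and `make_disk_image /tmp/disk2.img 10`.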
If you want to test out a RAID setup, create a second file with a different name, e.g. disk2.img. The next step is to create a storage pool, for which we’ll use zpool create. If you have one or more real disks available, you can refer to them by device name (e.g. /dev/sda or /dev/sdb) or, better yet, by id (/dev/disk/by-id/…), since device names like /dev/sda can change between boots. In our case we’ll use absolute paths to our regular files.
We can create various types of pools. For instance, to create a mirrored pool (RAID 1):
$ sudo zpool create mypool mirror /tmp/disk1.img /tmp/disk2.img
This will create a pool named “mypool” that mirrors across the two “devices” and mount it under /mypool (on Linux, or /Volumes/mypool on Mac). To see how much space we have available use zfs list:
$ sudo zfs list
NAME     USED  AVAIL   REFER  MOUNTPOINT
mypool   433Ki 9,78Gi  370Ki  /Volumes/mypool
Alternatively, we can pool the space from all devices and treat it as one big drive (a striped pool). Note that this gives up redundancy: if either device fails, the whole pool is lost. If you created mypool already, destroy it first:
$ sudo zpool destroy mypool
Then, to create the non-mirrored pool:
$ sudo zpool create mypool /tmp/disk1.img /tmp/disk2.img
$ sudo zfs list
NAME     USED  AVAIL   REFER  MOUNTPOINT
mypool   439Ki 19,6Gi  370Ki  /Volumes/mypool
Now we have a total of about 20G available.
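The difference between the two zfs list outputs is simple arithmetic: a mirror keeps a full copy on every device, so its usable space is roughly the size of one device, while a striped pool adds all devices together. The hypothetical helper below does that back-of-the-envelope calculation for equally sized devices. It deliberately ignores ZFS metadata and the space ZFS holds in reserve, which is why zfs list reports 9,78Gi and 19,6Gi rather than exactly 10 and 20.

```shell
# usable_gib LAYOUT DISKS SIZE_GIB   (hypothetical helper)
# Rough usable capacity of a pool built from DISKS equally sized devices,
# ignoring ZFS metadata and reserved space.
usable_gib() {
    layout=$1 disks=$2 size_gib=$3
    case $layout in
        mirror) echo "$size_gib" ;;              # every disk holds a full copy
        stripe) echo $((disks * size_gib)) ;;    # space of all disks is pooled
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

usable_gib mirror 2 10   # prints 10
usable_gib stripe 2 10   # prints 20
```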
There’s much more you can do with storage pools, such as adding or replacing disks on the fly, but let’s stick to this simple setup for now.
While we can now start writing files to the /Volumes/mypool or /mypool mount, this is not the recommended way of using ZFS. Instead, we will create separate file systems in the pool. For each of these file systems we can then set various properties, such as whether to enable encryption, compression or quotas. We can also take snapshots of each file system individually, or share the file systems via Samba or NFS, or transfer file system snapshots to other pools, possibly on other servers.
So… file systems are kind of the shit.
ZFS filesystems are managed using the zfs command line tool (as opposed to zpool used for pools).
$ sudo zfs create mypool/test
This will create and mount a new filesystem under /mypool/test (or /Volumes/mypool/test on Mac). Incidentally, we can mount file systems (and pools) anywhere we like, either when creating them (-m for zpool create, -o mountpoint=/some/path for zfs create) or, even more fun, by changing the mountpoint on the fly:
$ sudo zfs set mountpoint=/test mypool/test
which remounts the filesystem under /test. To see all properties of the filesystem, use zfs get all:
$ sudo zfs get all mypool/test
NAME         PROPERTY              VALUE                 SOURCE
mypool/test  type                  filesystem            -
mypool/test  creation              di aug 20 14:47 2013  -
mypool/test  used                  442Ki                 -
mypool/test  available             9,78Gi                -
mypool/test  referenced            442Ki                 -
mypool/test  compressratio         1.00x                 -
mypool/test  mounted               yes                   -
mypool/test  quota                 none                  default
mypool/test  reservation           none                  default
mypool/test  recordsize            128Ki                 default
mypool/test  mountpoint            /test                 local
mypool/test  checksum              on                    default
mypool/test  compression           off                   default
mypool/test  atime                 on                    default
mypool/test  devices               on                    default
mypool/test  exec                  on                    default
mypool/test  setuid                on                    default
mypool/test  readonly              off                   default
mypool/test  snapdir               hidden                default
mypool/test  canmount              on                    default
mypool/test  copies                1                     default
mypool/test  version               5                     -
mypool/test  utf8only              on                    -
mypool/test  normalization         formD                 -
mypool/test  casesensitivity       sensitive             -
mypool/test  refquota              none                  default
mypool/test  refreservation        none                  default
mypool/test  primarycache          all                   default
mypool/test  secondarycache        all                   default
mypool/test  usedbysnapshots       0                     -
mypool/test  usedbydataset         442Ki                 -
mypool/test  usedbychildren        0                     -
mypool/test  usedbyrefreservation  0                     -
mypool/test  logbias               latency               default
mypool/test  sync                  standard              default
There’s a bunch of useful stuff here. For instance, let’s enable compression:
$ sudo zfs set compression=on mypool/test
Anything we write to this filesystem from this point onwards will be compressed; data that was already there is not rewritten.
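To see compression at work, write something highly compressible and then read the read-only compressratio property. In `zfs get`, `-H` drops the header and `-o value` prints only the value column, which makes the result easy to capture in a script. This is a sketch with a hypothetical helper name; it assumes the file system is mounted at /test (as set with zfs set mountpoint earlier). ZFS writes data out in batches, so the ratio may lag a few seconds behind the write; the `sync` is there to nudge it along.

```shell
# compression_demo DATASET MOUNTPOINT   (hypothetical helper)
# Writes 10 MiB of repetitive text into the file system and prints the
# dataset's compression ratio afterwards.
compression_demo() {
    dataset=$1
    mountpoint=$2
    # 10485760 bytes of "hello\nhello\n..." compresses extremely well.
    # tee runs under sudo because the mountpoint is still owned by root.
    yes hello | head -c 10485760 | sudo tee "$mountpoint/compressible.txt" > /dev/null
    sync    # flush pending writes so the ratio has a chance to update
    sudo zfs get -H -o value compressratio "$dataset"
}

# Usage: compression_demo mypool/test /test
```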
Who needs Git?
Using ZFS as a replacement for Git is probably not a good idea, but to give you a sense of what ZFS supports at the file system level, let me go through a few typical Git-like operations:
- Creating a repository
- Committing or tagging a version
- Pushing and pulling changes from other storage pools, possibly on other machines
Notably missing is support for merging, which ZFS does not have direct support for as far as I’m aware.
Creating a repository
First, let’s create a filesystem for our projects, with a nested filesystem for our specific project, which we’ll call “zfsgit”. Yes, you can nest filesystems as deeply as you like. Then we’ll chown the root of the project filesystem to our current user so that we don’t need sudo to create, edit and remove files.
$ sudo zfs create mypool/projects
$ sudo zfs create mypool/projects/zfsgit
$ sudo chown $(whoami) /Volumes/mypool/projects/zfsgit
$ cd /Volumes/mypool/projects/zfsgit
Alright, we now have the equivalent of a repository, or checkout thereof.
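The chown and cd steps above hard-code the Mac mount location (/Volumes/...); on Linux the same filesystem lives under /mypool/projects/zfsgit. A sketch of a hypothetical `zfsgit_init` helper that asks ZFS for the mountpoint instead, so the same steps work on either platform. It uses `zfs create -p`, which also creates missing parent filesystems such as mypool/projects.

```shell
# zfsgit_init NAME   (hypothetical helper)
# Creates mypool/projects/NAME, hands its mountpoint to the current user
# and prints the mountpoint so the caller can cd into it.
zfsgit_init() {
    dataset="mypool/projects/$1"
    # -p also creates mypool/projects if it does not exist yet.
    sudo zfs create -p "$dataset" || return 1
    # Ask ZFS where the new filesystem was mounted rather than guessing
    # between /mypool/... (Linux) and /Volumes/mypool/... (Mac).
    mp=$(sudo zfs get -H -o value mountpoint "$dataset") || return 1
    sudo chown "$(whoami)" "$mp" || return 1
    printf '%s\n' "$mp"
}

# Usage: cd "$(zfsgit_init zfsgit)"
```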
Let’s create a file and put some content in it:
$ echo "Hello" > file.txt
“Committing” and “Tagging”
To create a “commit” or “tag”, i.e. a point in the project’s history that you can revert to, you can use a ZFS snapshot. ZFS snapshots have to be explicitly named. Let’s create our first one, “firstcommit”, by appending @ and the snapshot name to the filesystem name:
$ sudo zfs snapshot mypool/projects/zfsgit@firstcommit
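Since snapshot names are free-form, a small wrapper can play the role of `git commit` by falling back to a timestamp when no name is given. A minimal sketch; the `zfsgit_commit` name and the timestamp format are our own choices, not ZFS conventions.

```shell
# zfsgit_commit DATASET [NAME]   (hypothetical helper)
# Takes a snapshot DATASET@NAME; if NAME is omitted, a UTC timestamp such
# as 20130820-144700 is used instead. Prints the full snapshot name.
zfsgit_commit() {
    dataset=$1
    name=${2:-$(date -u +%Y%m%d-%H%M%S)}
    sudo zfs snapshot "$dataset@$name" || return 1
    printf '%s\n' "$dataset@$name"
}

# Usage:
#   zfsgit_commit mypool/projects/zfsgit firstcommit
#   zfsgit_commit mypool/projects/zfsgit          # timestamped "commit"
```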
Now, let’s change our file slightly:
$ echo "world" >> file.txt
Let’s see what changed:
$ sudo zfs diff mypool/projects/zfsgit@firstcommit
Sadly, we don’t get a textual diff, but at least it shows which files changed. We can now create a new commit:
$ sudo zfs snapshot mypool/projects/zfsgit@secondcommit
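zfs diff only names the files that changed. For an actual textual diff, we can use another ZFS feature: every snapshot is also reachable as a read-only directory under .zfs/snapshot/<name> in the root of the filesystem. It is hidden from ls by default (that is the snapdir property, set to hidden in the zfs get all listing) but you can still access it by path, so plain diff -u works. This works on Linux; other ports may behave differently. The helper name is hypothetical.

```shell
# zfsgit_textdiff MOUNTPOINT SNAPSHOT FILE   (hypothetical helper)
# Unified diff of FILE as it was in SNAPSHOT versus the current working copy.
# Relies on ZFS exposing snapshots under MOUNTPOINT/.zfs/snapshot/.
# Like diff itself, it returns 1 when the files differ.
zfsgit_textdiff() {
    mp=$1
    snap=$2
    file=$3
    diff -u "$mp/.zfs/snapshot/$snap/$file" "$mp/$file"
}

# Usage (from inside the repository):
#   zfsgit_textdiff "$PWD" firstcommit file.txt
```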
To list our current snapshots:
$ sudo zfs list -t snapshot
NAME                                 USED   AVAIL  REFER  MOUNTPOINT
mypool/projects/zfsgit@firstcommit   146Ki  -      370Ki  -
mypool/projects/zfsgit@secondcommit  0      -      386Ki  -
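zfs list -t snapshot shows every snapshot in every pool. Something closer to `git log` restricts the listing to one filesystem and sorts it by creation time: `-d 1` limits it to the filesystem's own snapshots, `-o` picks the columns, `-s creation` sorts oldest first and `-H` drops the header. As before, the wrapper name is hypothetical.

```shell
# zfsgit_log DATASET   (hypothetical helper)
# Lists the snapshots ("commits") of DATASET itself, oldest first,
# with their creation time and without a header line.
zfsgit_log() {
    sudo zfs list -H -t snapshot -d 1 -o name,creation -s creation "$1"
}

# Usage: zfsgit_log mypool/projects/zfsgit
```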
Now, let’s make another change:
$ echo "ladies..." >> file.txt
That was a bad idea. Let’s roll back to our previous snapshot:
$ sudo zfs rollback mypool/projects/zfsgit@secondcommit
$ cat file.txt
Hello
world
And now we have our previous version back.
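Note that zfs rollback only goes back to the most recent snapshot by default. Rolling back further (say, to firstcommit) requires -r, which destroys the snapshots taken after it, so it is closer to `git reset --hard` than to `git checkout`. To recover a single file without throwing anything away, you can copy it out of the snapshot instead: ZFS exposes every snapshot as a read-only directory under .zfs/snapshot/<name> in the filesystem's root (hidden from ls by default, but reachable by path). A sketch with a hypothetical helper name:

```shell
# zfsgit_checkout_file MOUNTPOINT SNAPSHOT FILE   (hypothetical helper)
# Restores FILE from SNAPSHOT into the working copy, roughly like
# `git checkout SNAPSHOT -- FILE`. Other files and snapshots are untouched.
zfsgit_checkout_file() {
    mp=$1
    snap=$2
    file=$3
    # -p keeps the snapshot copy's permissions and timestamps.
    cp -p "$mp/.zfs/snapshot/$snap/$file" "$mp/$file"
}

# Usage (from inside the repository):
#   zfsgit_checkout_file "$PWD" firstcommit file.txt
```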
Functionality similar to branching can be achieved using zfs clone, which allows you to clone a filesystem based on a particular snapshot:
$ sudo zfs clone mypool/projects/zfsgit@firstcommit mypool/projects/zfsgit_branch
This creates a new copy-on-write filesystem, mounted under /mypool/projects/zfsgit_branch (or /Volumes/mypool/projects/zfsgit_branch on Mac). This is a very lightweight operation: no data is copied, and initially the clone consumes barely any extra disk space.
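Two related commands round this out. `zfs get origin` shows which snapshot a clone was created from, and `zfs promote` reverses the dependency between a clone and the filesystem it came from, so that the original could afterwards be destroyed while the branch carries on. That is not a merge, but it is the nearest ZFS equivalent of making a branch the new main line. The helper names below are hypothetical.

```shell
# zfsgit_branch DATASET SNAPSHOT BRANCH   (hypothetical helper)
# Creates a writable copy-on-write clone of DATASET@SNAPSHOT named BRANCH.
zfsgit_branch() {
    sudo zfs clone "$1@$2" "$3"
}

# zfsgit_branch_origin BRANCH   (hypothetical helper)
# Prints the snapshot the clone was created from.
zfsgit_branch_origin() {
    sudo zfs get -H -o value origin "$1"
}

# zfsgit_promote BRANCH   (hypothetical helper)
# Reverses the clone/origin relationship so BRANCH no longer depends on
# the filesystem it was cloned from.
zfsgit_promote() {
    sudo zfs promote "$1"
}

# Usage:
#   zfsgit_branch mypool/projects/zfsgit firstcommit mypool/projects/zfsgit_branch
#   zfsgit_branch_origin mypool/projects/zfsgit_branch
#   zfsgit_promote mypool/projects/zfsgit_branch
```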
Pushing and pulling repositories
You can send filesystems, even incrementally, to other storage pools, both local and remote. To demonstrate, let’s say we created another storage pool called “mypool2” locally. We can now “push” any snapshot to the other storage pool as follows (as root):
$ zfs send mypool/projects/zfsgit@firstcommit | zfs receive mypool2/zfsgit
As you can imagine, this works just as well over SSH, for instance:
$ zfs send mypool/projects/zfsgit@firstcommit | ssh root@myserver zfs receive mypool/zfsgit
This pushes the entire filesystem as it looked at the time of the snapshot. Alternatively, if we already pushed an earlier snapshot, we can push just the difference between that snapshot and the current one using the -i option:
$ zfs send -i mypool/projects/zfsgit@firstcommit mypool/projects/zfsgit@secondcommit | zfs receive mypool2/zfsgit
This is useful for incrementally backing up large file systems. Of course, this is just using Unix pipes, so we can also write the result of zfs send to a file and upload it to S3, for instance:
$ zfs send mypool/projects/zfsgit@firstcommit > backup.dump
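Going the other way, a stream saved to a file is restored by feeding it to zfs receive on standard input. Since the stream is just bytes, it can also be compressed on the way out, for example `zfs send mypool/projects/zfsgit@firstcommit | gzip > backup.dump.gz`. A sketch of a hypothetical restore helper that handles both plain and gzip-compressed dumps (run as root, like the commands above):

```shell
# zfsgit_restore DUMPFILE TARGET   (hypothetical helper, run as root)
# Recreates the filesystem TARGET from a stream written by `zfs send`.
# Files ending in .gz are decompressed on the fly.
zfsgit_restore() {
    dump=$1
    target=$2
    case $dump in
        *.gz) gunzip -c "$dump" | zfs receive "$target" ;;
        *)    zfs receive "$target" < "$dump" ;;
    esac
}

# Usage: zfsgit_restore backup.dump mypool2/zfsgit_restored
```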
To pull a filesystem instead of pushing it, you do the reverse. Over SSH, that could look something like this:
$ ssh root@myserver zfs send mypool/zfsgit@secondcommit | zfs receive mypool/zfsgit
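The push and pull commands can be wrapped in two hypothetical helpers that also cover the incremental case (-i) when a base snapshot is given. Both assume they run as root, as in the examples above, and `zfsgit_push` only pushes to a local pool. One caveat: an incremental receive fails if the target filesystem was changed after it received the base snapshot; `zfs receive -F` rolls the target back first, which is roughly the ZFS flavour of a force push.

```shell
# zfsgit_push SNAPSHOT TARGET [BASE]   (hypothetical helper, run as root)
# Sends SNAPSHOT into the local filesystem TARGET. With BASE (an older
# snapshot that TARGET already has), only the changes since BASE are sent.
zfsgit_push() {
    snapshot=$1 target=$2 base=${3:-}
    if [ -n "$base" ]; then
        zfs send -i "$base" "$snapshot" | zfs receive "$target"
    else
        zfs send "$snapshot" | zfs receive "$target"
    fi
}

# zfsgit_pull HOST SNAPSHOT TARGET [BASE]   (hypothetical helper, run as root)
# Runs `zfs send` on HOST over SSH and receives the stream into the local
# filesystem TARGET, incrementally if BASE is given.
zfsgit_pull() {
    host=$1 snapshot=$2 target=$3 base=${4:-}
    if [ -n "$base" ]; then
        ssh "$host" zfs send -i "$base" "$snapshot" | zfs receive "$target"
    else
        ssh "$host" zfs send "$snapshot" | zfs receive "$target"
    fi
}

# Usage:
#   zfsgit_push mypool/projects/zfsgit@secondcommit mypool2/zfsgit \
#       mypool/projects/zfsgit@firstcommit
#   zfsgit_pull root@myserver mypool/zfsgit@secondcommit mypool/zfsgit
```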
Should you use ZFS?
ZFS is pretty cool and pretty stable, at least on Solaris and Linux. I’m not sure of the stability on Mac at this time. Using ZFS as a root file system on Linux is still slightly problematic at this moment, but those issues will likely be resolved soon. I don’t have extensive experience with its reliability and performance myself, but the Internets has good things to say.
However, ZFS is not the only game in town. Linux also has Btrfs, which offers many similar features, but Btrfs is newer and less mature, so it may not be as stable yet. Either way, these file systems are a lot of fun to play with. To learn more about ZFS, I’d recommend reading through Oracle’s ZFS Administration Guide, which is pretty readable and much of it applies to Linux and Mac as well.