I’ve been playing around with Docker a load recently, and I’ve developed a nice little pattern for sharing sockets that I’m gonna pass on. (By sockets, I specifically mean Unix domain sockets for the purposes of this post.)
A lot of applications can use either a TCP socket or a Unix domain socket to communicate. For example, FastCGI and mysql (http://www.mysql.com/) both let you use either mode, and mysql can even listen on both at once.
I have a small home server, and I want to run a bunch of applications inside containers (for security and management reasons). These applications need to speak to a mysql server via a Unix domain socket, which just appears to be a file on the filesystem.
I also want to run the mysql server inside a container – so the mechanics of getting a socket shared between them are a little non-trivial.
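Because the socket really is just a file, sharing it between containers comes down to both sides seeing the same path. Here’s a minimal Python sketch of that, using only the standard library. The socket name `demo.sock` and the echo exchange are made up for illustration. A server binds a Unix domain socket in a temporary directory, the path shows up on disk as a socket-type file, and a client connects to it purely by that path.

```python
import os
import socket
import stat
import tempfile
import threading


def serve_once(path: str, ready: threading.Event) -> None:
    """Accept one connection on a Unix domain socket and echo one message back."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)  # this is what creates the socket file at `path`
        srv.listen(1)
        ready.set()
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))


def demo() -> tuple[bool, bytes]:
    """Return (is the path a socket file?, reply received by a client)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.sock")  # hypothetical socket name
        ready = threading.Event()
        server = threading.Thread(target=serve_once, args=(path, ready))
        server.start()
        ready.wait(timeout=5)
        is_socket = stat.S_ISSOCK(os.stat(path).st_mode)
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
            cli.connect(path)  # the client only needs to know the path
            cli.sendall(b"ping")
            reply = cli.recv(1024)
        server.join()
        return is_socket, reply
```

Any process that can see that same path, whether in this container or another one with the directory mounted, can connect in exactly the same way.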
Let’s go through a worked example of how I’ve solved this for the dovecot IMAP and POP3 server talking to mysql. This is an especially fun example, as dovecot runs as root (within its container), so if someone hacked into the server through an exploit in dovecot, they could rm -rf anything I share with that container…
First off, I’ve created some LVM volumes:
mysql vg0 -wi-ao 4.00g
This is the mysql data directory, which will be mounted in the mysql container as /var/lib/mysql.
mysql_socket vg0 -wi-ao 4.00m
This is going to get mounted at /socket/mysql inside containers, and will hold the mysql Unix domain socket.
vmail vg0 -wi-ao 20.00g
And this is my mail spool for dovecot with all the emails in it.
The immediate problem is that the mysql socket volume will be shared between multiple containers (dovecot won’t be the only app using this mysql instance), and dovecot runs as ‘root’. If dovecot gets hacked, the attacker can delete the mysql socket, and every other program trying to connect to mysql through that socket will be affected.
We can’t allow that – that would defeat the entire point of using containers for application isolation!
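To make that failure mode concrete, here’s a small Python sketch. The socket name is a stand-in, not mysql’s real one. Deleting the socket file doesn’t stop the listening server, but any new client that looks it up by path can no longer reach it.

```python
import os
import socket
import tempfile


def connect_ok(path: str) -> bool:
    """Return True if a new client can connect to the Unix socket at `path`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        try:
            cli.connect(path)
            return True
        except (FileNotFoundError, ConnectionRefusedError):
            return False


def demo() -> tuple[bool, bool]:
    """Return (can connect before the unlink, can connect after the unlink)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "mysql.sock")  # stand-in for the shared mysql socket
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
            srv.bind(path)
            srv.listen(5)  # pending connections queue here without an accept()
            before = connect_ok(path)
            os.unlink(path)  # what a compromised client with write access could do
            after = connect_ok(path)  # server is still listening, but unreachable by path
        return before, after
```

Connections that were already open keep working; it’s every new connection from every other app that breaks.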
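The trick is that you don’t need a writable filesystem to use a Unix domain socket that some other program has created: connecting to it works fine through a read-only mount, which only stops you creating, deleting or renaming files there.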
Therefore, we can use bind mounts to fix this: re-bind a read-only (-o ro) copy of the filesystem and give that to dovecot. Easy…
So, our partitions get mounted like this:
/dev/mapper/vg0-vmail on /mnt/volumes/vmail type ext3 (rw,noatime)
/dev/mapper/vg0-mysql on /mnt/volumes/mysql type ext3 (rw,noatime)
/dev/mapper/vg0-mysql_socket on /mnt/volumes/mysql_socket type ext3 (rw,noatime)
/mnt/volumes/mysql_socket on /mnt/volumes_ro/mysql_socket type none (ro,bind)
The last line involves a little trick. When I tried this, to my dismay, it didn’t work.
The /etc/fstab entry associated with it is:
/mnt/volumes/mysql_socket /mnt/volumes_ro/mysql_socket none bind,ro 0 0
But as you can see from the Googles, read-only bind mounts are a bit tricky: the ro option isn’t applied on the initial bind, so you have to remount them a second time to make them read-only.
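Since the ro in that fstab line can be silently ignored, it’s worth checking what the kernel actually applied rather than trusting the options you asked for. Here’s a minimal Python sketch, assuming a Linux host where /proc/mounts lists each mount as `device mountpoint fstype options dump pass`. The function names are made up for illustration.

```python
import re
from typing import Optional


def _unescape(field: str) -> str:
    # /proc/mounts writes spaces, tabs, newlines and backslashes as 3-digit octal escapes
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mount_options(mount_point: str, mounts_text: str) -> Optional[set[str]]:
    """Return the option set of the last entry for mount_point, or None if it isn't mounted."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        if _unescape(fields[1]) == mount_point:
            found = set(fields[3].split(","))  # a later mount on the same path shadows earlier ones
    return found


def is_read_only(mount_point: str, mounts_text: str) -> bool:
    """True only if mount_point is mounted and its options include a bare `ro`."""
    opts = mount_options(mount_point, mounts_text)
    return opts is not None and "ro" in opts
```

On the host, something like `is_read_only("/mnt/volumes_ro/mysql_socket", open("/proc/mounts").read())` should come back False after the first bind and True only once the remount has happened.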
The trick I use here is twofold. First, I only create them with puppet (mostly using the excellent mounts module, which I only had to patch a little bit); the associated code looks like this:
This could be notify rather than just ordering, but I like to be paranoid and check these are OK on every puppet run.
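In the same check-it-every-run spirit, here’s a hedged Python sketch of an idempotent check. It isn’t the puppet code used here, just an illustration of the logic. It asks whether the target is a mount point and whether that mount is read-only (via `os.statvfs` and `os.ST_RDONLY`), then returns only the mount(8) commands still needed. `RO_BINDS`, `plan` and `reconcile` are hypothetical names. It defaults to a dry run, and really running the commands needs root.

```python
import os
import subprocess

# Hypothetical table of (source, read-only target) pairs, mirroring the fstab entry above.
RO_BINDS = [("/mnt/volumes/mysql_socket", "/mnt/volumes_ro/mysql_socket")]


def is_read_only(path: str) -> bool:
    """True if the mount holding `path` has the read-only flag set."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def plan(source: str, target: str) -> list[list[str]]:
    """Return the commands still needed to make `target` a read-only bind of `source`."""
    remount_ro = ["mount", "-o", "bind,remount,ro", target]
    if not os.path.ismount(target):
        # Nothing mounted yet (e.g. straight after boot): bind first, then remount
        # read-only, because ro may not be applied on the initial bind.
        return [["mount", "--bind", source, target], remount_ro]
    # Note: this only checks that *something* is mounted at target, not that it's `source`.
    if is_read_only(target):
        return []  # already read-only, nothing to do
    return [remount_ro]


def reconcile(binds=RO_BINDS, dry_run: bool = True) -> list[list[str]]:
    """Collect the needed commands for every bind and, unless dry_run, run them."""
    commands = []
    for source, target in binds:
        for cmd in plan(source, target):
            if not dry_run:
                subprocess.run(cmd, check=True)  # needs root
            commands.append(cmd)
    return commands
```

Printing `reconcile()` shows what would change. Run with `dry_run=False` from a boot-time job, the same check would also cover a freshly rebooted machine.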
However, if the machine has just been rebooted, the read-only filesystems won’t have been remounted yet, so the second piece of cunning is an upstart script to make sure things are kosher after a reboot:
And there we go – all done(-ish). I just rsync the vmail spool and mysql data from another machine, and start it all up!
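To show which host directory ends up where, here’s a hedged Python sketch that just builds the two `docker run` command lines. It isn’t the upstart setup used here. The image names and the vmail path inside the dovecot container are hypothetical; the host paths and /var/lib/mysql and /socket/mysql come from the volumes above. The key point is that mysql gets the writable socket directory, while dovecot only ever sees the read-only bind.

```python
import shlex


def docker_run_argv(name: str, image: str, volumes: dict[str, str]) -> list[str]:
    """Build a `docker run -d` command line with one -v host:container flag per volume."""
    argv = ["docker", "run", "-d", "--name", name]
    for host_path, container_path in volumes.items():
        argv += ["-v", f"{host_path}:{container_path}"]
    argv.append(image)
    return argv


MYSQL = docker_run_argv("mysql", "example/mysql", {        # hypothetical image name
    "/mnt/volumes/mysql": "/var/lib/mysql",
    "/mnt/volumes/mysql_socket": "/socket/mysql",          # writable: mysqld creates its socket here
})

DOVECOT = docker_run_argv("dovecot", "example/dovecot", {  # hypothetical image name
    "/mnt/volumes_ro/mysql_socket": "/socket/mysql",       # read-only bind: can connect, can't delete
    "/mnt/volumes/vmail": "/var/vmail",                    # hypothetical path inside the container
})

if __name__ == "__main__":
    for argv in (MYSQL, DOVECOT):
        print(shlex.join(argv))
```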
The containers look like this:
And they get run with the upstart script from Garth’s excellent docker puppet module:
And last but not least, here’s it working:
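If you want a quick check of your own that the socket is reachable from inside a container without installing a mysql client, a sketch like this should do. It connects to the socket and reads the greeting that mysql sends first. That greeting is a HandshakeV10 packet: a 3-byte little-endian length, a sequence id, protocol version 10, then a NUL-terminated server version string. The socket path in the usage below is an assumption; match it to the `socket` setting in your my.cnf.

```python
import socket


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-packet")
        buf += chunk
    return buf


def mysql_greeting(path: str, timeout: float = 5.0) -> tuple[int, str]:
    """Connect to a mysql Unix socket and return (protocol version, server version)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        header = _recv_exact(sock, 4)  # 3-byte little-endian payload length + 1-byte sequence id
        length = int.from_bytes(header[:3], "little")
        payload = _recv_exact(sock, length)
    if payload[0] == 0xFF:
        raise ConnectionError("server sent an error packet instead of a handshake")
    protocol = payload[0]  # 10 for a HandshakeV10 greeting
    end = payload.index(b"\x00", 1)  # the server version string is NUL-terminated
    return protocol, payload[1:end].decode("ascii", "replace")
```

For example, `mysql_greeting("/socket/mysql/mysqld.sock")` (hypothetical path) run inside the dovecot container should return something like `(10, "<server version>")` if the read-only bind is doing its job.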
Next up, doing the same trickery for postfix, and then I’ll have a working mail infrastructure that’s entirely containerised.