After spending about four months testing, benchmarking, setting up and breaking down various Ceph clusters, I thought I would spend some time documenting some of the things I have learned while setting up cephfs, rbd and radosgw along the way.
First let me talk a little bit about the details of the cluster that we will be putting into production over the next several weeks.
Cluster specs:
- 6 x Dell R720xd, 64 GB of RAM each, for OSD nodes
- 72 x 4TB SAS drives as OSDs
- 3 x Dell R420, 32 GB of RAM each, for MON/RADOSGW/MDS nodes
- 2 x Force10 S4810 switches
- 4 x 10 GigE LACP-bonded Intel cards
- Ubuntu 12.04 (AMD64)
- Ceph 0.72.1 (emperor)
- 2400 placement groups (see the note after this list)
- 261TB of usable space
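A quick note on the placement group count: 2400 lines up with the common guideline from the Ceph documentation of roughly 100 PGs per OSD, divided by the replica count. A back-of-the-envelope check, assuming the default replicated pool size of 3 (the replica count is an assumption, not something stated above):

OSDS=72
REPLICAS=3                          # assumption: default replicated pool size of 3
echo $(( OSDS * 100 / REPLICAS ))   # prints 2400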
The process I used to set up and tear down our cluster during testing was quite simple. After installing ‘ceph-deploy’ on the admin node (a scripted version of the per-disk OSD creation step is sketched after these commands):
- ceph-deploy new mon1 mon2 mon3
- ceph-deploy install mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6
- ceph-deploy mon create mon1 mon2 mon3
- ceph-deploy gatherkeys mon1
- ceph-deploy osd create osd1:sdb
- ceph-deploy osd create osd1:sdc
……….
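Rather than repeating ‘ceph-deploy osd create’ by hand for every drive, the per-disk step can be scripted. A minimal bash sketch, assuming the six OSD hosts are named osd1 through osd6 and each holds twelve data disks at sdb through sdm (the host and device names are assumptions; adjust them to match your hardware):

for host in osd1 osd2 osd3 osd4 osd5 osd6; do
    for disk in sd{b..m}; do                  # assumption: 12 data disks per node, sdb..sdm
        ceph-deploy osd create ${host}:${disk}
    done
done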
The uninstall process went something like this:
- ceph-deploy disk zap osd1:sdb
……….
- ceph-deploy purge mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6
- ceph-deploy purgedata mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6
Additions to ceph.conf:
Since we wanted to configure an appropriate journal size for our 10GigE network, mount xfs with appropriate options and configure radosgw, we added the following to our ceph.conf (after ‘ceph-deploy new’ but before ‘ceph-deploy install’):
[global]
osd_journal_size = 10240
osd_mount_options_xfs = "rw,noatime,nodiratime,logbsize=256k,logbufs=8,inode64"
osd_mkfs_options_xfs = "-f -i size=2048"
[client.radosgw.gateway]
host = mon1
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_socket_path = /tmp/radosgw.sock
log_file = /var/log/ceph/radosgw.log
admin_socket = /var/run/ceph/radosgw.asok
rgw_dns_name = yourdomain.com
debug rgw = 20
rgw print continue = true
rgw should log = true
rgw enable usage log = true
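As an aside, the 10240 MB journal roughly matches the rule of thumb in the Ceph documentation: journal size = 2 x expected throughput x filestore max sync interval, where expected throughput is the smaller of the disk and network bandwidth. A sketch of that arithmetic; the throughput figure and the 5-second sync interval below are assumptions, not measured values:

THROUGHPUT_MB_S=1000                               # assumption: ~1 GB/s of sustained writes per node
SYNC_INTERVAL_S=5                                  # assumption: filestore max sync interval at its default
echo $(( 2 * THROUGHPUT_MB_S * SYNC_INTERVAL_S ))  # prints 10000 (MB), close to the 10240 used above

The [client.radosgw.gateway] section also references /etc/ceph/keyring.radosgw.gateway, which has to exist before radosgw will start. A minimal sketch of creating it, following the standard radosgw setup steps of this era (the capability grants shown are the usual ones, not something taken from the configuration above):

ceph-authtool --create-keyring /etc/ceph/keyring.radosgw.gateway
chmod +r /etc/ceph/keyring.radosgw.gateway
ceph-authtool /etc/ceph/keyring.radosgw.gateway -n client.radosgw.gateway --gen-key
ceph-authtool -n client.radosgw.gateway --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/keyring.radosgw.gateway
ceph auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway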
Benchmarking:
I used the following commands to benchmark rados, rbd, cephfs, etc. (a sketch of running the dd tests against an rbd image follows the list):
- rados -p rbd bench 20 write --no-cleanup
- rados -p rbd bench 20 seq
- dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
- dd bs=4M count=512 if=/dev/zero of=test conv=fdatasync
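One way to point the dd tests above at rbd rather than a local disk is to create, map and mount an image first. A minimal sketch; the image name 'bench', its 10 GB size and the /mnt/rbd-test mount point are arbitrary choices for illustration:

rbd create bench --size 10240             # 10 GB image in the default 'rbd' pool
rbd map bench                             # exposes the image, typically as /dev/rbd0
mkfs.xfs /dev/rbd0
mkdir -p /mnt/rbd-test && mount /dev/rbd0 /mnt/rbd-test
dd bs=4M count=512 if=/dev/zero of=/mnt/rbd-test/test conv=fdatasync

When finished, unmount the filesystem and run 'rbd unmap /dev/rbd0' to release the device.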
 Ceph blogs worth reading:
http://ceph.com/community/blog/
http://www.sebastien-han.fr/blog/
http://dachary.org/