Ceph braindump part1

After spending about four months testing, benchmarking, setting up and tearing down various Ceph clusters, I thought I would spend some time documenting the things I learned along the way while setting up cephfs, rbd and radosgw.

First let me talk a little bit about the details of the cluster that we will be putting into production over the next several weeks.

Cluster specs:

  • 6 x Dell R720xd, 64 GB of RAM each, for OSD nodes
  • 72 x 4 TB SAS drives as OSDs
  • 3 x Dell R420, 32 GB of RAM each, for MON/RADOSGW/MDS nodes
  • 2 x Force10 S4810 switches
  • 4 x 10 GigE LACP-bonded Intel cards
  • Ubuntu 12.04 (AMD64)
  • Ceph 0.72.1 (emperor)
  • 2400 placement groups (see the sizing note after this list)
  • 261 TB of usable space
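
For the record, the placement-group count falls out of the usual sizing heuristic from the Ceph docs: total PGs ≈ (number of OSDs * 100) / replica count. A quick check, assuming the default replica count of 3 (an assumption on my part, the replica count isn't listed above):

  # (OSDs * 100) / replicas, assuming 3 replicas
  echo $(( 72 * 100 / 3 ))    # prints 2400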

The process I used to set up and tear down our cluster during testing was quite simple; after installing ‘ceph-deploy’ on the admin node, the setup went like this (a loop for the per-disk OSD step is sketched after the list):

  1. ceph-deploy new mon1 mon2 mon3
  2. ceph-deploy install mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6
  3. ceph-deploy mon create mon1 mon2 mon3
  4. ceph-deploy gatherkeys mon1
  5. ceph-deploy osd create osd1:sdb
  6. ceph-deploy osd create osd1:sdc
    ……….
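
With twelve data disks in each of the six OSD nodes, the per-disk create step is easiest to run in a loop. A minimal sketch, assuming the data disks are sdb through sdm and sda holds the OS (the device names are my assumption):

  # Loop 'ceph-deploy osd create' over every OSD host and data disk
  for host in osd{1..6}; do
      for disk in sd{b..m}; do
          ceph-deploy osd create ${host}:${disk}
      done
  done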

The uninstall process went something like this (a loop for the zap step follows the list):

  1. ceph-deploy disk zap osd1:sdb
    ……….
  2. ceph-deploy purge mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6
  3. ceph-deploy purgedata mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6
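
The zap step takes the same host:disk arguments, so it can be looped the same way before the purge (same assumed device names as above):

  for host in osd{1..6}; do
      for disk in sd{b..m}; do
          ceph-deploy disk zap ${host}:${disk}
      done
  done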

Additions to ceph.conf:

Since we wanted to configure an appropriate journal size for our 10 GigE network, mount xfs with appropriate options and configure radosgw, we added the following to our ceph.conf (after ‘ceph-deploy new’ but before ‘ceph-deploy install’); a quick journal-size check follows the config block:

[global]
osd_journal_size = 10240
osd_mount_options_xfs = "rw,noatime,nodiratime,logbsize=256k,logbufs=8,inode64"
osd_mkfs_options_xfs = "-f -i size=2048"

[client.radosgw.gateway]
host = mon1
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_socket_path = /tmp/radosgw.sock
log_file = /var/log/ceph/radosgw.log
admin_socket = /var/run/ceph/radosgw.asok
rgw_dns_name = yourdomain.com
debug rgw = 20
rgw print continue = true
rgw should log = true
rgw enable usage log = true
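
As an aside, the 10240 MB journal size matches the sizing rule of thumb from the Ceph docs, osd journal size = 2 * (expected throughput * filestore max sync interval). A rough check, assuming about 1 GB/s of sustained throughput per node and the default 5-second sync interval (both figures are my assumptions):

  # 2 * throughput (MB/s) * sync interval (s)
  echo $(( 2 * 1024 * 5 ))    # prints 10240

Once the gateway is running, it can be sanity-checked through its admin socket and a first S3 user created for testing (the uid and display name below are just examples):

  # Confirm radosgw is answering on the admin socket configured above
  ceph --admin-daemon /var/run/ceph/radosgw.asok version
  # Create a test user for the S3 API
  radosgw-admin user create --uid=benchuser --display-name="Bench User"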

Benchmarking:

I used the following commands to benchmark rados, rbd, cephfs, etc. (an example RBD setup for the dd runs follows the list):

  1. rados -p rbd bench 20 write --no-cleanup
  2. rados -p rbd bench 20 seq
  3. dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
  4. dd bs=4M count=512 if=/dev/zero of=test conv=fdatasync
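
The dd runs need a mounted filesystem to write into; one way to get one is to map and mount an RBD image. A minimal sketch, with illustrative image and mount-point names:

  # Create, map, format and mount a test image (name and size are illustrative)
  rbd create bench-img --pool rbd --size 10240     # size in MB
  rbd map bench-img --pool rbd                     # appears as /dev/rbd0 on the first map
  mkfs.xfs /dev/rbd0
  mkdir -p /mnt/rbd-bench && mount /dev/rbd0 /mnt/rbd-bench

  # Run the dd tests from the list above inside the mount
  cd /mnt/rbd-bench
  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
  dd bs=4M count=512 if=/dev/zero of=test conv=fdatasync

  # 'rados bench ... --no-cleanup' leaves its benchmark objects in the pool;
  # they can be removed afterwards with the cleanup subcommand
  rados -p rbd cleanup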

Ceph blogs worth reading:

http://ceph.com/community/blog/
http://www.sebastien-han.fr/blog/
http://dachary.org/

1 thought on “Ceph braindump part1”

  1. shashank

    Your setup consists of “4 x 10 GigE LACP bonded Intel cards”. Is the bond in mode 4? If so, are you able to get throughput on the order of 20 Gb/s or more?
