Monthly Archives: August 2014

Ceph cheatsheet

Here is a list of Ceph commands that we tend to use on a regular basis:

a) Display cluster status:
‘ceph -s’

b) Watch cluster status and events as they happen:
‘ceph -w’

c) Display cluster and per-pool usage stats:
‘ceph df’

d) List pools:
‘ceph osd lspools’

e) Display per-pool placement group counts and replication levels:
‘ceph osd dump | grep 'replicated size'’

f) Set the number of placement groups for a pool:
‘ceph osd pool set pool_name pg_num 512’
‘ceph osd pool set pool_name pgp_num 512’

g) List rbd images in a pool:
‘rbd -p pool_name list’

h) Create an rbd snapshot:
‘rbd snap create pool_name/image_name@snap_name’

i) List the snapshots of an rbd image:
‘rbd snap ls pool_name/image_name’

j) Display which images are mapped via the kernel client:
‘rbd showmapped’

k) Get rados usage statistics:
‘rados df’

l) List the objects in a pool using rados:
‘rados -p pool_name ls’
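The two commands in item f) are normally run as a pair, since data does not start rebalancing onto the new placement groups until pgp_num is raised to match pg_num. Here is a minimal sketch of a wrapper that keeps them together. The function name ‘set_pgs’ and the ‘CEPH’ override variable are hypothetical names introduced for illustration; they are not part of Ceph.

```shell
#!/bin/sh
# set_pgs POOL COUNT
# Apply the same value to pg_num and then pgp_num for one pool.
# CEPH is a hypothetical override (not a Ceph feature): set CEPH=echo
# to print the arguments instead of touching a real cluster.
set_pgs() {
    pool=$1
    count=$2
    ${CEPH:-ceph} osd pool set "$pool" pg_num "$count" &&
    ${CEPH:-ceph} osd pool set "$pool" pgp_num "$count"
}
```

Running ‘CEPH=echo; set_pgs pool_name 512’ shows what would be sent to the cluster without changing anything.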

Benchmarking Ceph

This is a post that I have had in draft mode for quite some time. At this point some of this information is out of date, so I am planning on writing a ‘part II’ post shortly, which will include some updated information.

Benchmarking Ceph:

Ever since we got our Ceph cluster up and running, I’ve been running various benchmarking applications against different cluster configurations. Just to review, the cluster that we recently built has the following specs:

Cluster specs:

  • 3 x Dell R420 (32 GB of RAM) for MON/RADOSGW/MDS nodes
  • 6 x Dell R720xd (64 GB of RAM) for OSD nodes
  • 72 x 4TB SAS drives as OSDs
  • 2 x Force10 S4810 switches
  • 4 x 10 GigE LACP-bonded Intel cards
  • Ubuntu 12.04 (AMD64)
  • Ceph 0.72.1 (emperor)
  • 2400 placement groups
  • 261TB of usable space
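Two of the numbers in this list can be reproduced with simple arithmetic. The post does not say how they were chosen, so the following is only a sketch under stated assumptions: that the placement group count follows the common rule of thumb of about 100 PGs per OSD divided by the replica count (assumed here to be 3), and that the 261TB figure is the 72 x 4TB of raw disk expressed in binary units, before replication. The function names are made up for illustration.

```shell
#!/bin/sh
# pg_target OSDS REPLICAS
# Rule-of-thumb total PG count: about 100 PGs per OSD, divided by replicas.
pg_target() {
    echo $(( $1 * 100 / $2 ))
}

# next_pow2 N
# Smallest power of two >= N (the rule of thumb usually rounds up to one).
next_pow2() {
    p=1
    while [ "$p" -lt "$1" ]; do
        p=$(( p * 2 ))
    done
    echo "$p"
}

# tb_to_tib TB
# Convert decimal terabytes (10^12 bytes) to tebibytes (2^40 bytes), truncated.
tb_to_tib() {
    echo $(( $1 * 1000000000000 / 1099511627776 ))
}

pg_target 72 3        # prints 2400
next_pow2 2400        # prints 4096
tb_to_tib $(( 72 * 4 ))   # prints 261
```

With 72 OSDs and 3 replicas the rule of thumb gives exactly 2400, and 288 decimal terabytes come to about 261 TiB. If the second assumption holds, the space actually available for data at a replication level of 3 would be roughly a third of that.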

The main role for this cluster will be archiving audio and video assets. With that in mind, we decided to maximize total cluster capacity (4TB drives, no SSDs, etc.) while still achieving and maintaining reasonable cluster throughput (10 GigE, 12 drives per OSD node, etc.).

Most of my benchmarking focused on rbd and radosgw, because one of these is most likely what we will introduce into production when we are ready. We are eagerly awaiting a stable and supported CephFS release (which will hopefully be available sometime in mid-to-late 2014), which will allow us to swap our rbd + Samba setup for one based on CephFS.

Rados Benchmarks: 

I set up a pool called ‘test’ with 1600 PGs in order to run some benchmarks using the ‘rados bench’ tool that comes with Ceph. I started with a replication level of 1 and worked my way up to a replication level of 3.

root@hqceph1:/# rados -p test bench 20 write (rep size=1)
Total time run: 20.241290
Total writes made: 5646
Write size: 4194304
Bandwidth (MB/sec): 1115.739
Stddev Bandwidth: 246.027
Max bandwidth (MB/sec): 1136
Min bandwidth (MB/sec): 0
Average Latency: 0.0571572
Stddev Latency: 0.0262513
Max latency: 0.336378
Min latency: 0.02248
root@hqceph1:/# rados -p test bench 20 write (rep size=2)
Total time run: 20.547026
Total writes made: 2910
Write size: 4194304
Bandwidth (MB/sec): 566.505
Stddev Bandwidth: 154.643
Max bandwidth (MB/sec): 764
Min bandwidth (MB/sec): 0
Average Latency: 0.112384
Stddev Latency: 0.198579
Max latency: 2.5105
Min latency: 0.025391
root@hqceph1:/# rados -p test bench 20 write (rep size=3)
Total time run: 20.755272
Total writes made: 2481
Write size: 4194304
Bandwidth (MB/sec): 478.144
Stddev Bandwidth: 147.064
Max bandwidth (MB/sec): 728
Min bandwidth (MB/sec): 0
Average Latency: 0.133827
Stddev Latency: 0.229962
Max latency: 3.32957
Min latency: 0.029481
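As a sanity check, the reported bandwidth can be recomputed from the other fields: each write is 4194304 bytes (4 MB), so bandwidth is writes x 4 / total time. The sketch below does that for the three runs above and then expresses the replicated runs as a percentage of the unreplicated one. The helper names are hypothetical, and awk is used only because plain shell arithmetic is integer-only.

```shell
#!/bin/sh
# bench_mbs WRITES SECONDS
# Bandwidth in MB/sec for a run that wrote WRITES objects of 4 MB each.
bench_mbs() {
    awk -v w="$1" -v t="$2" 'BEGIN { printf "%.3f\n", w * 4 / t }'
}

# pct A B
# A as a whole-number percentage of B.
pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a * 100 / b }'
}

bench_mbs 5646 20.241290    # rep size=1, prints 1115.739
bench_mbs 2910 20.547026    # rep size=2, prints 566.505
bench_mbs 2481 20.755272    # rep size=3, prints 478.144

pct 566.505 1115.739        # rep size=2 vs 1, prints 51
pct 478.144 1115.739        # rep size=3 vs 1, prints 43
```

In these runs, going from one copy to two roughly halved write bandwidth, while the third copy cost noticeably less than the second.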

RBD Benchmarks:

Next I set up a 10GB block device using rbd and ran dd and hdparm against it:

root@ceph1:/blockdev# dd bs=1M count=256 if=/dev/zero of=test1 conv=fdatasync (rep size=1)
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 0.440333 s, 610 MB/s
root@ceph1:/blockdev# dd bs=4M count=256 if=/dev/zero of=test1 conv=fdatasync (rep size=1)
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 1.07413 s, 1000 MB/s
root@ceph1:/mnt/blockdev# hdparm -Tt /dev/rbd1 (rep size=1)
/dev/rbd1:
Timing cached reads: 16296 MB in 2.00 seconds = 8155.69 MB/sec
Timing buffered disk reads: 246 MB in 3.10 seconds = 79.48 MB/sec
root@ceph1:/mnt/blockdev# dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync (rep size=2)
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 1.29985 s, 207 MB/s
root@ceph1:/mnt/blockdev# dd bs=4M count=256 if=/dev/zero of=test2 conv=fdatasync (rep size=2)
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 4.02375 s, 267 MB/s
root@cephmount1:/mnt/ceph-block-device/test# hdparm -Tt /dev/rbd1 (rep size=2)
/dev/rbd1:
Timing cached reads: 16434 MB in 2.00 seconds = 8225.55 MB/sec
Timing buffered disk reads: 152 MB in 3.01 seconds = 50.55 MB/sec
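The post does not show how the block device itself was created, so the following is only a sketch of one way a 10GB rbd image might be prepared for tests like these. The pool name ‘test’, the image name ‘blockdev’, and the choice of ext4 are assumptions made for illustration; only /dev/rbd1 and /mnt/blockdev appear in the output above. The function prints the commands rather than running them, so the plan can be reviewed first.

```shell
#!/bin/sh
# rbd_setup_plan POOL IMAGE SIZE_MB DEVICE MOUNTPOINT
# Print (do not run) the commands to create, map, format and mount an rbd image.
# The filesystem (ext4) is an assumption; the post does not say which was used.
rbd_setup_plan() {
    pool=$1
    image=$2
    size_mb=$3
    dev=$4
    mnt=$5
    cat <<EOF
rbd create --size $size_mb $pool/$image
rbd map $pool/$image
mkfs.ext4 $dev
mkdir -p $mnt
mount $dev $mnt
EOF
}

# 10240 MB = 10GB; pool and image names here are hypothetical.
rbd_setup_plan test blockdev 10240 /dev/rbd1 /mnt/blockdev
```

The device name that ‘rbd map’ assigns can differ from run to run, so it is worth confirming it with ‘rbd showmapped’ before formatting anything.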

Radosgw Benchmarks:

Using s3cmd (s3tools), I was able to achieve about 70MB/s when pushing files to Ceph via the S3 RESTful API.