3 Mar, 2015  |  Written by  |  under KVM, Linux, Proxmox, Zfs

Proxmox 3.4 was recently released with additional integrated support for ZFS. More details are provided in the ZFS section of the Proxmox wiki. I also decided to gather another, more current round of links related to performance, best practices, benchmarking, etc.

If you are looking for up-to-date links to help you understand some of the more advanced aspects and use cases surrounding ZFS, or if you are just getting started and want some relevant reading material on the subject, you should find these links useful:

1. The state of ZFS on Linux
2. Arch Linux ZFS wiki page
3. Gentoo Linux ZFS wiki page
4. ZFS Raidz Performance, Capacity and Integrity
5. ZFS administration
6. KVM benchmarking using various filesystems
7. How to improve ZFS performance

10 Feb, 2015  |  Written by  |  under Debian

We have run into an issue on several of our Debian servers after upgrading to Wheezy. The issue tends to go unnoticed for a while, until you look through the files in '/var/log' and notice that none of them have new entries since the date of the upgrade.

The solution to the issue is to install the following package:

apt-get install inetutils-syslogd

After installing this missing package, syslogd should once again be running, and you should start to see new entries show up in messages, syslog, and the other log files.
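One way to spot this condition early is to look for log files that have stopped changing. The following is a minimal sketch, not part of the original fix: the function name and the default threshold of 7 days are assumptions chosen for illustration.

```shell
# Sketch: list regular files directly under a log directory that have not
# been modified in more than N days. The function name and the defaults
# are hypothetical; adjust them to your own setup.
stale_logs() {
    local dir="${1:-/var/log}"
    local days="${2:-7}"
    find "$dir" -maxdepth 1 -type f -mtime +"$days" -print
}

# Example: stale_logs /var/log 7
```

Rotated files such as syslog.1 will naturally show up as stale too, so the interesting case is when the live files (messages, syslog) appear in the output.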

6 Feb, 2015  |  Written by  |  under Dell, Force10

Here are a few of the more helpful commands that I have been using recently to troubleshoot some performance issues we have been having with our stacked S4820s.

1) Show interface statistics:
show interface tengigabitethernet 0/40

2) Monitor interface statistics in real time:
monitor interface tengigabitethernet 0/40

3) Show port channel statistics:
show interface port-channel 7

4) Show overview of port channel groupings:
show interface port-channel brief

5) Monitor port channel statistics in real time:
monitor interface port-channel 12

6) Display VLAN interface statistics:
show interface vlan 20

7) Gather data for tech support:
show tech-support

8) Display serial numbers, service tags, etc:
show inventory

6 Feb, 2015  |  Written by  |  under Ceph

I recently wanted to clean up a few of the pools I have been using for rados benchmarking. I did not want to delete the pools, just the objects inside them.

If you are trying to clear up ‘rados bench’ objects you can use something like this:

rados -p temp_pool cleanup --prefix benchmark

If you are trying to remove all the objects from a pool whose object names do not start with the benchmark prefix, you can loop over them instead:

for i in $(rados -p temp_pool ls); do echo "$i"; rados -p temp_pool rm "$i"; done
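The loop above splits the listing on whitespace, so an object name containing a space would be broken apart. A slightly more defensive sketch reads one name per line instead. The function name is hypothetical; the only rados calls used are the 'ls' and 'rm' subcommands already shown in this post.

```shell
# Sketch: delete every object in a pool, reading one object name per line.
# `while IFS= read -r` keeps names with spaces or backslashes intact.
# The function name purge_pool_objects is made up for this example.
purge_pool_objects() {
    local pool="$1"
    rados -p "$pool" ls | while IFS= read -r obj; do
        echo "removing $obj"
        # Redirect stdin so the rm call cannot consume the object listing.
        rados -p "$pool" rm "$obj" < /dev/null
    done
}

# Example (destructive, removes everything in the pool):
#   purge_pool_objects temp_pool
```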

23 Aug, 2014  |  Written by  |  under Ceph

Here is a list of Ceph commands that we tend to use on a regular basis:

a) Display cluster status:
ceph -s

b) Watch cluster status and events as they happen:
ceph -w

c) Display pool usage stats:
ceph df

d) List pools:
ceph osd lspools

e) Display per-pool placement group counts and replication levels:
ceph osd dump | grep 'replicated size'

f) Set the placement group counts for a pool:
ceph osd pool set pool_name pg_num 512
ceph osd pool set pool_name pgp_num 512

g) Display rbd images in a pool:
rbd -p pool_name list

h) Create rbd snapshot:
rbd snap create pool_name/image_name@snap_name

i) Display rbd snapshots:
rbd snap ls pool_name/image_name

j) Display which images are mapped via the kernel:
rbd showmapped

k) Get rados statistics:
rados df

l) List the objects in a pool using rados:
rados -p pool_name ls
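When snapshots are taken regularly, it helps to generate the snapshot name rather than type it. The sketch below only builds the pool/image@snapshot string used by the snapshot commands in the list above; the function name, the 'daily' default prefix and the date-based naming scheme are assumptions for illustration.

```shell
# Sketch: build an rbd snapshot spec of the form pool/image@prefix-YYYYMMDD.
# snap_spec and the "daily" default prefix are hypothetical names.
snap_spec() {
    local pool="$1" image="$2" prefix="${3:-daily}"
    printf '%s/%s@%s-%s\n' "$pool" "$image" "$prefix" "$(date +%Y%m%d)"
}

# Example:
#   rbd snap create "$(snap_spec pool_name image_name)"
#   rbd snap ls pool_name/image_name
```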

23 Aug, 2014  |  Written by  |  under Ceph, Linux, Ubuntu, Xfs

This is a post that I have had in draft mode for quite some time. At this point some of this information is out of date, so I am planning on writing a ‘part II’ post shortly, which will include some updated information.

Benchmarking Ceph:

Ever since we got our ceph cluster up and running, I’ve been running various benchmarking applications against different cluster configurations. Just to review, the cluster that we recently built has the following specs:

Cluster specs:

  • 3 x Dell R-420, 32 GB of RAM, for MON/RADOSGW/MDS nodes
  • 6 x Dell R-720xd, 64 GB of RAM, for OSD nodes
  • 72 x 4TB SAS drives as OSDs
  • 2 x Force10 S4810 switches
  • 4 x 10 GigE LACP-bonded Intel cards
  • Ubuntu 12.04 (AMD64)
  • Ceph 0.72.1 (emperor)
  • 2400 placement groups
  • 261TB of usable space

The main role of this cluster will be archiving audio and video assets. This being the case, we decided to maximize total cluster capacity (4TB drives, no SSDs, etc), while still achieving and maintaining reasonable cluster throughput (10 GigE, 12 drives per OSD node, etc).

Most of my benchmarking focused on rbd and radosgw, because either of these is most likely to be what we introduce into production when we are ready. We are very much awaiting a stable and supported cephfs release (which will hopefully be available sometime in mid to late 2014), which will allow us to switch out our rbd + samba setup for one based on cephfs.

Rados Benchmarks: 

I set up a pool called 'test' with 1600 PGs in order to run some benchmarks using the 'rados bench' tool that comes with Ceph. I started with a replication level of '1' and worked my way up to a replication level of '3'.

root@hqceph1:/# rados -p test bench 20 write (rep size=1)
Total time run: 20.241290
Total writes made: 5646
Write size: 4194304
Bandwidth (MB/sec): 1115.739
Stddev Bandwidth: 246.027
Max bandwidth (MB/sec): 1136
Min bandwidth (MB/sec): 0
Average Latency: 0.0571572
Stddev Latency: 0.0262513
Max latency: 0.336378
Min latency: 0.02248
root@hqceph1:/# rados -p test bench 20 write (rep size=2)
Total time run: 20.547026
Total writes made: 2910
Write size: 4194304
Bandwidth (MB/sec): 566.505
Stddev Bandwidth: 154.643
Max bandwidth (MB/sec): 764
Min bandwidth (MB/sec): 0
Average Latency: 0.112384
Stddev Latency: 0.198579
Max latency: 2.5105
Min latency: 0.025391
root@hqceph1:/# rados -p test bench 20 write (rep size=3)
Total time run: 20.755272
Total writes made: 2481
Write size: 4194304
Bandwidth (MB/sec): 478.144
Stddev Bandwidth: 147.064
Max bandwidth (MB/sec): 728
Min bandwidth (MB/sec): 0
Average Latency: 0.133827
Stddev Latency: 0.229962
Max latency: 3.32957
Min latency: 0.029481
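The bandwidth lines can be cross-checked from the other fields: bandwidth is the number of writes times the write size, divided by the run time. Here is a small sketch of that arithmetic; the function name is made up, and it assumes that 'MB' in this output means 2^20 bytes, which is what makes a 4194304-byte write exactly 4 MB.

```shell
# Sketch: recompute rados bench bandwidth from its own summary fields.
#   bandwidth = writes * write_size / run_time
# Assumes "MB" here means 2^20 bytes (so 4194304 bytes = 4 MB).
bench_bandwidth() {
    local writes="$1" seconds="$2" write_bytes="${3:-4194304}"
    awk -v w="$writes" -v t="$seconds" -v b="$write_bytes" \
        'BEGIN { printf "%.3f\n", (w * b / 1048576) / t }'
}

bench_bandwidth 5646 20.241290   # rep size=1 -> 1115.739
bench_bandwidth 2910 20.547026   # rep size=2 -> 566.505
bench_bandwidth 2481 20.755272   # rep size=3 -> 478.144
```

The recomputed values match the reported bandwidth for all three runs, which confirms that the drop from roughly 1116 MB/sec to 478 MB/sec comes from fewer writes completing in the same 20 seconds as the replication level goes up.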

RBD Benchmarks:

Next I set up a 10GB block device using rbd and ran dd and hdparm against it:

root@ceph1:/blockdev# dd bs=1M count=256 if=/dev/zero of=test1 conv=fdatasync (rep size=1)
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 0.440333 s, 610 MB/s
root@ceph1:/blockdev# dd bs=4M count=256 if=/dev/zero of=test1 conv=fdatasync (rep size=1)
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 1.07413 s, 1000 MB/s
root@ceph1:/mnt/blockdev# hdparm -Tt /dev/rbd1 (rep size=1)
Timing cached reads: 16296 MB in 2.00 seconds = 8155.69 MB/sec
Timing buffered disk reads: 246 MB in 3.10 seconds = 79.48 MB/sec
root@ceph1:/mnt/blockdev# dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync (rep size=2)
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 1.29985 s, 207 MB/s
root@ceph1:/mnt/blockdev# dd bs=4M count=256 if=/dev/zero of=test2 conv=fdatasync (rep size=2)
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 4.02375 s, 267 MB/s
root@cephmount1:/mnt/ceph-block-device/test# hdparm -Tt /dev/rbd1 (rep size=2)
Timing cached reads: 16434 MB in 2.00 seconds = 8225.55 MB/sec
Timing buffered disk reads: 152 MB in 3.01 seconds = 50.55 MB/sec
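The commands used to create and mount the block device are not shown in this post. As a rough sketch of what such a setup typically looks like, the function below prints the steps without running them. The image name, mount point and the choice of XFS are assumptions, and the /dev/rbd/pool/image path relies on the udev rules shipped with Ceph (the transcript above used /dev/rbd1 directly).

```shell
# Sketch: print (do not run) the steps for creating, mapping, formatting
# and mounting an rbd image. All names and defaults are hypothetical.
# --size is given in megabytes, so 10240 corresponds to a 10GB image.
rbd_setup_cmds() {
    local pool="${1:-rbd}" image="${2:-bench-image}" size_mb="${3:-10240}"
    local mnt="${4:-/mnt/blockdev}"
    echo "rbd create $image --size $size_mb --pool $pool"
    echo "rbd map $image --pool $pool"
    echo "mkfs.xfs /dev/rbd/$pool/$image"
    echo "mkdir -p $mnt"
    echo "mount /dev/rbd/$pool/$image $mnt"
}

rbd_setup_cmds rbd bench-image 10240 /mnt/blockdev
```

Printing the commands first makes it easy to review them before pasting them into a root shell on the client node.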

Radosgw Benchmarks:

Using s3cmd (s3tools) I was able to achieve about 70MB/s when pushing files to Ceph via the S3 RESTful API.


13 Nov, 2013  |  Written by  |  under Ceph, Linux, Ubuntu, Xfs

After spending about 4 months testing, benchmarking, setting up and breaking down various Ceph clusters, I thought I would spend time documenting some of the things I have learned while setting up cephfs, rbd and radosgw along the way.

First let me talk a little bit about the details of the cluster that we will be putting into production over the next several weeks.

Cluster specs:

  • 6 x Dell R-720xd, 64 GB of RAM, for OSD nodes
  • 72 x 4TB SAS drives as OSDs
  • 3 x Dell R-420, 32 GB of RAM, for MON/RADOSGW/MDS nodes
  • 2 x Force10 S4810 switches
  • 4 x 10 GigE LACP-bonded Intel cards
  • Ubuntu 12.04 (AMD64)
  • Ceph 0.72.1 (emperor)
  • 2400 placement groups
  • 261TB of usable space
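Two of the numbers in this list can be related to the drive count with simple arithmetic. The post does not say how they were derived, so treat the following as an observation rather than the method actually used. The Ceph documentation of the time suggested (OSDs x 100) / replicas as a starting point for the total placement group count (usually rounded up to a power of two), and 72 drives of 4TB (decimal) come to about 261.9 when expressed in binary terabytes. The function names are made up.

```shell
# Sketch: rule-of-thumb placement group count, (OSDs * 100) / replicas.
pg_estimate() {
    local osds="$1" replicas="$2"
    echo $(( osds * 100 / replicas ))
}

# Sketch: raw capacity of N drives of a given decimal-TB size, in TiB.
raw_tib() {
    local drives="$1" tb_each="$2"
    awk -v n="$drives" -v tb="$tb_each" \
        'BEGIN { printf "%.1f\n", n * tb * 1e12 / (2 ^ 40) }'
}

pg_estimate 72 3   # -> 2400
raw_tib 72 4       # -> 261.9
```

If the capacity figure is indeed raw space, then the space actually available for data shrinks in proportion to the replication level chosen for each pool.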

The process I used to set up and tear down our cluster during testing was quite simple. After installing 'ceph-deploy' on the admin node:

  1. ceph-deploy new mon1 mon2 mon3
  2. ceph-deploy install  mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6
  3. ceph-deploy mon create mon1 mon2 mon3
  4. ceph-deploy gatherkeys mon1
  5. ceph-deploy osd create osd1:sdb
  6. ceph-deploy osd create osd1:sdc

The uninstall process went something like this:

  1. ceph-deploy disk zap osd1:sdb
  2. ceph-deploy purge mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6
  3. ceph-deploy purgedata mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6

Additions to ceph.conf:

Since we wanted to configure an appropriate journal size for our 10GigE network, mount xfs with appropriate options and configure radosgw, we added the following to our ceph.conf (after 'ceph-deploy new' but before 'ceph-deploy install'):

osd_journal_size = 10240
osd_mount_options_xfs = "rw,noatime,nodiratime,logbsize=256k,logbufs=8,inode64"
osd_mkfs_options_xfs = "-f -i size=2048"

host = mon1
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_socket_path = /tmp/radosgw.sock
log_file = /var/log/ceph/radosgw.log
admin_socket = /var/run/ceph/radosgw.asok
rgw_dns_name = yourdomain.com
debug rgw = 20
rgw print continue = true
rgw should log = true
rgw enable usage log = true


I used the following commands to benchmark rados, rbd, cephfs, etc:

  1. rados -p rbd  bench 20 write --no-cleanup
  2. rados -p rbd  bench 20 seq
  3. dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
  4. dd bs=4M count=512 if=/dev/zero of=test conv=fdatasync

 Ceph blogs worth reading:


4 Nov, 2013  |  Written by  |  under Linux

The information below is based heavily off of a post that can be found here:


I am providing the information on my blog in the event that the original blog post becomes unavailable at some point in the future, as we use this information quite regularly.

1-Gather Info: 

Controller information

megacli -AdpAllInfo -aALL
megacli -CfgDsply -aALL
megacli -adpeventlog -getevents -f controller-events.log -a0 -nolog

Enclosure information

megacli -EncInfo -aALL

Virtual drive information

megacli -LDInfo -Lall -aALL

Physical drive information

megacli -PDList -aALL
megacli -PDInfo -PhysDrv [E:S] -aALL
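The [E:S] placeholder used throughout stands for enclosure ID and slot number, both of which appear in the -PDList output. The sketch below condenses that output into one line per drive. It assumes the field labels 'Enclosure Device ID', 'Slot Number' and 'Firmware state' as MegaCLI commonly prints them; verify them against your own controller's output. The function name is hypothetical.

```shell
# Sketch: reduce `megacli -PDList -aALL` output to "[E:S] state" lines.
# Field labels are assumed; check them against real output before relying
# on this. pd_states is a made-up name.
pd_states() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/      { print "[" enc ":" slot "] " $2 }
    '
}

# Example:
#   megacli -PDList -aALL | pd_states
```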

Battery backup information

megacli -AdpBbuCmd -aALL

Check Battery backup warning on boot

megacli -AdpGetProp BatWarnDsbl -a0

Controller management:

Silence active alarm

megacli -AdpSetProp AlarmSilence -aALL

Disable alarm

megacli -AdpSetProp AlarmDsbl -aALL

Enable alarm

megacli -AdpSetProp AlarmEnbl -aALL

Disable battery backup warning on system boot

megacli -AdpSetProp BatWarnDsbl -a0

Change the adapter rebuild rate to 60%:

megacli -AdpSetProp {RebuildRate -60} -aALL

2-Virtual drive management:

Create RAID 0, 1, 5 drive

megacli -CfgLdAdd -r(0|1|5) [E:S, E:S, …] -aN

Create RAID 10 drive

megacli -CfgSpanAdd -r10 -Array0[E:S,E:S] -Array1[E:S,E:S] -aN

Remove drive

megacli -CfgLdDel -Lx -aN

Physical drive management

Set state to offline

megacli -PDOffline -PhysDrv [E:S] -aN

Set state to online

megacli -PDOnline -PhysDrv [E:S] -aN

Mark as missing

megacli -PDMarkMissing -PhysDrv [E:S] -aN

Prepare for removal

megacli -PdPrpRmv -PhysDrv [E:S] -aN

Replace missing drive

megacli -PdReplaceMissing -PhysDrv [E:S] -ArrayN -rowN -aN

The number N of the array parameter is the Span Reference you get using megacli -CfgDsply -aALL and the number N of the row parameter is the Physical Disk in that span or array starting with zero (it’s not the physical disk’s slot!).

Rebuild drive -- Drive status should be “Firmware state: Rebuild”

megacli -PDRbld -Start -PhysDrv [E:S] -aN
megacli -PDRbld -Stop -PhysDrv [E:S] -aN
megacli -PDRbld -ShowProg -PhysDrv [E:S] -aN
megacli -PDRbld -ProgDsply -physdrv [E:S] -aN

Clear drive

megacli -PDClear -Start -PhysDrv [E:S] -aN
megacli -PDClear -Stop -PhysDrv [E:S] -aN
megacli -PDClear -ShowProg -PhysDrv [E:S] -aN

Bad to good

megacli -PDMakeGood -PhysDrv [E:S] -aN

Changes drive in state Unconfigured-Bad to Unconfigured-Good.

Hot spare management

Set global hot spare

megacli -PDHSP -Set -PhysDrv [E:S] -aN

Remove hot spare

megacli -PDHSP -Rmv -PhysDrv [E:S] -aN

Set dedicated hot spare

megacli -PDHSP -Set -Dedicated -ArrayN,M,… -PhysDrv [E:S] -aN

Walkthrough: Rebuild a Drive that is marked ‘Foreign’ when Inserted:

Bad to good

megacli -PDMakeGood -PhysDrv [E:S] -aALL

Clear the foreign setting

megacli -CfgForeign -Clear -aALL

Set global hot spare

megacli -PDHSP -Set -PhysDrv [E:S] -aN

Walkthrough: Change/replace a drive

a. Set the drive offline, if it is not already offline due to an error

megacli -PDOffline -PhysDrv [E:S] -aN

b. Mark the drive as missing

megacli -PDMarkMissing -PhysDrv [E:S] -aN

c. Prepare drive for removal

megacli -PDPrpRmv -PhysDrv [E:S] -aN

d. Change/replace the drive

e. If you’re using hot spares then the replaced drive should become your new hot spare drive

megacli -PDHSP -Set -PhysDrv [E:S] -aN

f. In case you’re not working with hot spares, you must re-add the new drive to your RAID virtual drive and start the rebuilding

megacli -PdReplaceMissing -PhysDrv [E:S] -ArrayN -rowN -aN
megacli -PDRbld -Start -PhysDrv [E:S] -aN
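For the case without hot spares, the walkthrough can be sketched as a dry run that fills in the placeholders and prints the commands in order. The function name and the example values (enclosure 32, slot 4, array 0, row 4, adapter 0) are hypothetical; take the real values from -PDList and -CfgDsply.

```shell
# Sketch: print the drive replacement sequence (steps a-d and f) for one
# drive. Nothing is executed; all example values are hypothetical.
replace_drive_cmds() {
    local pd="$1" array="$2" row="$3" adapter="$4"
    echo "megacli -PDOffline -PhysDrv [$pd] -a$adapter"
    echo "megacli -PDMarkMissing -PhysDrv [$pd] -a$adapter"
    echo "megacli -PDPrpRmv -PhysDrv [$pd] -a$adapter"
    echo "# swap the physical drive now"
    echo "megacli -PdReplaceMissing -PhysDrv [$pd] -Array$array -row$row -a$adapter"
    echo "megacli -PDRbld -Start -PhysDrv [$pd] -a$adapter"
}

replace_drive_cmds 32:4 0 4 0
```

When typing these into a shell by hand, it is safer to quote the bracketed argument (for example '[32:4]'), since square brackets are glob characters.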

3-Gathering Standard logs

#rm -f MegaSAS.log
#megacli -adpallinfo -a0
#megacli -encinfo -a0
#megacli -ldinfo -lall -a0
#megacli -pdlist -a0
#megacli -adpeventlog -getevents -f controller-events.log -a0 -nolog
#megacli -fwtermlog -dsply -a0 -nolog > controller-fwterm.log

9 Aug, 2013  |  Written by  |  under Debian, Linux, Proxmox

I ran into a few issues while trying to install Dell OpenManage on the latest version of Proxmox (3.0).

In order to get things working correctly on Proxmox 3.x, here are the steps that are required:

#echo "deb http://linux.dell.com/repo/community/ubuntu wheezy openmanage" > /etc/apt/sources.list.d/linux.dell.com.sources.list
#gpg --keyserver pool.sks-keyservers.net --recv-key 1285491434D8786F
#gpg -a --export 1285491434D8786F | sudo apt-key add -
#sudo apt-get update
#apt-get install libcurl3
#sudo apt-get install srvadmin-all
#sudo service dataeng start
#sudo service dsm_om_connsvc start

Once you get everything installed correctly you will be able to log in to the OpenManage web interface here:

https://<hostname or ip address>:1311

The first time you log in you should use the ‘root’ username and associated password.

5 Jun, 2013  |  Written by  |  under Ceph, Gluster, Linux

I recently started looking into Ceph as a possible replacement for our 2-node Gluster cluster. After reading numerous blog posts and watching several videos on the topic, I believe that the following three videos provide the best overview and necessary insight into how Ceph works, what it offers, how to manage a cluster, and how it differs from Gluster.

The first video is from the 2013 linux.conf.au conference. It is a Ceph-only talk given by Florian Haas and Tim Serong. It provides a complete overview of the Ceph software and its feature set, and goes into detail about the current status of each of Ceph's components (block storage, object storage, filesystem storage, etc).

The second video is a bit more lighthearted. It is a talk that involves some back and forth between Ceph's Sage Weil and Gluster's John Mark Walker, moderated by Florian Haas. It covers some of the basic use cases for each filesystem, offers some technical insight into the overall design, and discusses the ways in which the filesystems are similar, the items on each of their roadmaps, and the ways the projects differ from one another.

The final video is also from the 2013 linux.conf.au conference. It covers some of the same topics that were discussed in the prior two videos, but it is geared more toward the operations side of managing a Ceph cluster. This talk is given by Sage Weil as well.