Category Archives: Ubuntu

Ceph upgrade to Luminous + Ubuntu 16.04

Our Ubuntu 14.04 cluster has been running Ceph Jewel without issue for about the last 9 months.  During that time the Ceph team released the latest LTS version, called Luminous.  Given that Jewel is slated for EOL at some point in the next 2 or 3 months, I thought it was a good time to once again upgrade the cluster.

Before I started working on upgrading the version of Ceph on the cluster, I decided to go ahead and upgrade Ubuntu from 14.04 to 16.04.  The process was relatively simple; I ran the following set of commands on each node:

apt-get update; apt-get upgrade; apt-get dist-upgrade; apt-get install update-manager-core; do-release-upgrade

After the upgrade, the process for restarting Ceph daemons changes slightly because Ubuntu 16.04 uses systemd instead of upstart.  For example, you will have to use the following commands to restart each of the services going forward:

systemctl restart ceph-mon.target
systemctl restart ceph-osd.target
systemctl restart ceph-mgr.target
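
Between restarts it is worth confirming that everything came back up cleanly. Something along these lines works (the OSD id below is just an example):

systemctl status ceph-osd@12 (locally on the node, for any OSD id hosted there)
ceph -s (from any node with an admin keyring; wait for HEALTH_OK before moving on to the next node)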

After the Ubuntu upgrade I decided to start upgrading the version of Ceph on the cluster.
Using ‘ceph-deploy’ I issued the following commands (some from the admin node, some locally on each server):

Monitor nodes first:
ceph-deploy install --release luminous hqceph1 hqceph2 hqceph3 (from the admin node)
systemctl restart ceph-mon.target (locally on each server)

OSD nodes second (for example):

Set ‘noout’ so your data does not try to rebalance during the OSD restarts:
ceph osd set noout

ceph-deploy install --release luminous hqosd1 hqosd2 hqosd3  (from admin node)
systemctl restart ceph-osd.target (locally on each server)
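
Before moving on it is worth confirming that all of the OSDs rejoined the cluster and are reporting the new version. These are standard Ceph commands rather than anything specific to this cluster:

ceph osd tree (from the admin node; every OSD should show as 'up')
ceph versions (from the admin node; available once the mons are on Luminous, and should list every daemon at 12.2.x)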

Next you should do the same for any RGW nodes that you might have.
Finally repeat the process for any other client nodes (librbd, KRBD, etc).

Don’t forget to unset noout before you finish:
ceph osd unset noout (from the admin node)

This release also introduced a new daemon called the manager daemon or ‘mgr’.  You can learn more about this new process here:  http://docs.ceph.com/docs/mimic/mgr/

I was able to install the mgr daemon on two of my existing nodes (one as the active and the second as the standby) using the following commands:

ceph-deploy mgr create hqceph2 (from the admin node)
ceph-deploy mgr create hqceph3 (from the admin node)
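
You can confirm that the managers registered correctly from the admin node; the services section of the output should show something along the lines of ‘mgr: hqceph2(active), standbys: hqceph3’:

ceph -s (from the admin node)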

Upgrading Ceph from Hammer to Jewel

We recently upgraded our Ceph cluster from the latest version of Hammer to 10.2.7 (Jewel). Here are the steps that we used in order to complete the upgrade. Because Jewel changes the user the Ceph daemons run as, this specific upgrade required an additional step of using chown to change file ownership for each daemon directory.

Set the cluster to the ‘noout’ state so that we can perform the upgrade without any data movement:
ceph osd set noout

From the ceph-deploy control node, upgrade the monitor nodes first:
ceph-deploy install --release jewel ceph-mon1 ceph-mon2 ceph-mon3

On each monitor node:
stop ceph-mon-all
cd /var/lib/ceph
chown -R ceph:ceph /var/lib/ceph/
start ceph-mon-all

Next move on to the OSD nodes:
ceph-deploy install --release jewel ceph-osd1 ceph-osd2 ceph-osd3 ceph-osd4

Add the following line to /etc/ceph/ceph.conf on each OSD node (this allows the Ceph daemons to start up under the old ownership scheme):

setuser match path = /var/lib/ceph/$type/$cluster-$id

Stop OSD’s and restart them on each node:
stop ceph-osd-all
start ceph-osd-all

Don’t forget to unset noout from the admin node:
ceph osd unset noout

Once the cluster is healthy again and you have some time to make the necessary ownership changes for the OSD daemons, you can do the following:

Set noout:
ceph osd set noout

Log on to each OSD node one at a time and run the following commands:
find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1 chown -R root:root

stop ceph-osd-all

find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1 chown -R ceph:ceph

chown -R ceph:ceph /var/lib/ceph/

Comment out the setuser line in ceph.conf and restart OSD’s:
#setuser match path = /var/lib/ceph/$type/$cluster-$id
start ceph-osd-all

Don’t forget to unset noout from the admin node:
ceph osd unset noout

Upgrading Ceph from Firefly to Hammer

Last week we decided to upgrade our 13 node Ceph cluster from version 0.80.11 (firefly) to version 0.94.6 (hammer).

Although we have not had any known issues with the cluster running firefly, official support for firefly ended in January 2016. In addition, the jewel release will be out soon, and it will be easier to upgrade to jewel from either hammer or infernalis.

The overall upgrade process was relatively painless. I used the ceph-deploy script to create the cluster initially, and I chose to use it again to upgrade the cluster to hammer.

1) First I pull in the current config file and keys:
root@admin:/ceph-deploy# ceph-deploy config pull mon1
root@admin:/ceph-deploy# ceph-deploy gatherkeys mon1

2) Next we upgrade each of the mon daemons:
root@admin:/ceph-deploy# ceph-deploy install --release hammer mon1 mon2 mon3

3) Now we can restart the daemons on each mon server
root@mon1:~# stop ceph-mon-all
root@mon1:~# start ceph-mon-all

4) Next it’s time to upgrade the osd server daemons:
root@admin:/ceph-deploy# ceph-deploy install --release hammer osd1 osd2 osd3 osd4

5) Now we can restart the daemons on each of the osd servers:
root@osd4:~# stop ceph-osd-all
root@osd4:~# start ceph-osd-all

6) Finally you can upgrade any client server daemons that you have:
root@admin:/ceph-deploy# ceph-deploy install --release hammer clientserver1
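
To double-check that the daemons are actually running the hammer code after the restarts, a couple of generic Ceph commands will do (nothing here is specific to this cluster):
root@admin:/ceph-deploy# ceph --version
root@admin:/ceph-deploy# ceph tell osd.* version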

Benchmarking Ceph

This is a post that I have had in draft mode for quite some time. At this point some of this information is out of date, so I am planning on writing a ‘part II’ post shortly, which will include some updated information.

Benchmarking Ceph:

Ever since we got our ceph cluster up and running, I’ve been running various benchmarking applications against different cluster configurations. Just to review, the cluster that we recently built has the following specs:

Cluster specs:

  • 3 x Dell R-420;32 GB of RAM; for MON/RADOSGW/MDS nodes
  • 6 x Dell R-720xd;64 GB of RAM; for OSD nodes
  • 72 x 4TB SAS drives as OSD’s
  • 2 x Force10 S4810 switches
  • 4 x 10 GigE LACP bonded Intel cards
  • Ubuntu 12.04 (AMD64)
  • Ceph 0.72.1 (emperor)
  • 2400 placement groups
  • 261TB of usable space

The main role for this cluster will be one primarily tied to archiving audio and video assets. This being the case, we decided to try and maximize total cluster capacity (4TB drives, no ssd’s, etc), while at the same time being able to achieve and maintain reasonable cluster throughput (10 GigE, 12 drives per osd node, etc).

Most of my benchmarking focused on rbd and radosgw, because one of those two is most likely to be what we introduce into production first.  We are very much awaiting a stable and supported cephfs release (which will hopefully be available sometime in mid-to-late 2014), which will allow us to swap out our rbd + samba setup for one based on cephfs.

Rados Benchmarks: 

I set up a pool called ‘test’ with 1600 pg’s in order to run some benchmarks using the ‘rados bench’ tool that came with Ceph.  I started with a replication level of ‘1’ and worked my way up to a replication level of ‘3’.
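
For reference, the pool itself was created with the standard commands, roughly as follows (the pg count matches the 1600 mentioned above, and the replication size was adjusted between runs):

root@hqceph1:/# ceph osd pool create test 1600 1600
root@hqceph1:/# ceph osd pool set test size 1 (then 2 and 3 for the later runs)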

root@hqceph1:/# rados -p test bench 20 write (rep size=1)
Total time run: 20.241290
Total writes made: 5646
Write size: 4194304
Bandwidth (MB/sec): 1115.739
Stddev Bandwidth: 246.027
Max bandwidth (MB/sec): 1136
Min bandwidth (MB/sec): 0
Average Latency: 0.0571572
Stddev Latency: 0.0262513
Max latency: 0.336378
Min latency: 0.02248
root@hqceph1:/# rados -p test bench 20 write (rep size=2)
Total time run: 20.547026
Total writes made: 2910
Write size: 4194304
Bandwidth (MB/sec): 566.505
Stddev Bandwidth: 154.643
Max bandwidth (MB/sec): 764
Min bandwidth (MB/sec): 0
Average Latency: 0.112384
Stddev Latency: 0.198579
Max latency: 2.5105
Min latency: 0.025391
root@hqceph1:/# rados -p test bench 20 write (rep size=3)
Total time run: 20.755272
Total writes made: 2481
Write size: 4194304
Bandwidth (MB/sec): 478.144
Stddev Bandwidth: 147.064
Max bandwidth (MB/sec): 728
Min bandwidth (MB/sec): 0
Average Latency: 0.133827
Stddev Latency: 0.229962
Max latency: 3.32957
Min latency: 0.029481

RBD Benchmarks:

Next I set up a 10GB block device using rbd:
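
Creating, mapping and mounting the device would have looked something like this (image name, size and mount point here are just examples):

root@ceph1:~# rbd create test1 --size 10240 (a 10GB image in the default 'rbd' pool)
root@ceph1:~# rbd map test1 (shows up as /dev/rbd1 in this case)
root@ceph1:~# mkfs.xfs /dev/rbd1
root@ceph1:~# mount /dev/rbd1 /blockdev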

root@ceph1:/blockdev# dd bs=1M count=256 if=/dev/zero of=test1 conv=fdatasync (rep size=1)
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 0.440333 s, 610 MB/s
root@ceph1:/blockdev# dd bs=4M count=256 if=/dev/zero of=test1 conv=fdatasync (rep size=1)
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 1.07413 s, 1000 MB/s
root@ceph1:/mnt/blockdev# hdparm -Tt /dev/rbd1 (rep size=1)
/dev/rbd1:
Timing cached reads: 16296 MB in 2.00 seconds = 8155.69 MB/sec
Timing buffered disk reads: 246 MB in 3.10 seconds = 79.48 MB/sec
root@ceph1:/mnt/blockdev# dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync (rep size=2)
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 1.29985 s, 207 MB/s
root@ceph1:/mnt/blockdev# dd bs=4M count=256 if=/dev/zero of=test2 conv=fdatasync (rep size=2)
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 4.02375 s, 267 MB/s
root@cephmount1:/mnt/ceph-block-device/test# hdparm -Tt /dev/rbd1 (rep size=2)
/dev/rbd1:
Timing cached reads: 16434 MB in 2.00 seconds = 8225.55 MB/sec
Timing buffered disk reads: 152 MB in 3.01 seconds = 50.55 MB/sec

Radosgw Benchmarks:

Using s3cmd (s3tools) I was able to achieve about 70MB/s when pushing files to ceph via the s3 restful API.
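
The test itself was nothing fancy; with s3cmd pointed at the radosgw endpoint (via ‘s3cmd --configure’), it amounts to creating a bucket and pushing files into it (bucket and file names here are only examples):

s3cmd mb s3://benchmark
s3cmd put largefile.bin s3://benchmark/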

 

Ceph braindump part1

After spending about 4 months testing, benchmarking, setting up and tearing down various Ceph clusters, I thought I would spend some time documenting some of the things I have learned while setting up cephfs, rbd and radosgw along the way.

First let me talk a little bit about the details of the cluster that we will be putting into production over the next several weeks.

Cluster specs:

  • 6 x Dell R-720xd;64 GB of RAM; for OSD nodes
  • 72 x 4TB SAS drives as OSD’s
  • 3 x Dell R-420;32 GB of RAM; for MON/RADOSGW/MDS nodes
  • 2 x Force10 S4810 switches
  • 4 x 10 GigE LACP bonded Intel cards
  • Ubuntu 12.04 (AMD64)
  • Ceph 0.72.1 (emperor)
  • 2400 placement groups
  • 261TB of usable space

The process I used to set up and tear down our cluster during testing was quite simple. After installing ‘ceph-deploy’ on the admin node:

  1. ceph-deploy new mon1 mon2 mon3
  2. ceph-deploy install  mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6
  3. ceph-deploy mon create mon1 mon2 mon3
  4. ceph-deploy gatherkeys mon1
  5. ceph-deploy osd create osd1:sdb
  6. ceph-deploy osd create osd1:sdc
    ……….

The uninstall process went something like this:

  1. ceph-deploy disk zap osd1:sdb
    ……….
  2. ceph-deploy purge mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6
  3. ceph-deploy purgedata mon1 mon2 mon3 osd1 osd2 osd3 osd4 osd5 osd6

Additions to ceph.conf:

Since we wanted to configure an appropriate journal size for our 10GigE network, mount xfs with appropriate options, and configure radosgw, we added the following to our ceph.conf (after ‘ceph-deploy new’ but before ‘ceph-deploy install’):

[global]
osd_journal_size = 10240
osd_mount_options_xfs = "rw,noatime,nodiratime,logbsize=256k,logbufs=8,inode64"
osd_mkfs_options_xfs = "-f -i size=2048"

[client.radosgw.gateway]
host = mon1
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_socket_path = /tmp/radosgw.sock
log_file = /var/log/ceph/radosgw.log
admin_socket = /var/run/ceph/radosgw.asok
rgw_dns_name = yourdomain.com
debug rgw = 20
rgw print continue = true
rgw should log = true
rgw enable usage log = true
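
The keyring referenced in that section has to exist and be registered with the cluster before radosgw will start. The usual sequence looks something like this (the paths match the config above, and the capabilities follow the standard radosgw setup documentation):

ceph-authtool --create-keyring /etc/ceph/keyring.radosgw.gateway --gen-key -n client.radosgw.gateway
ceph-authtool -n client.radosgw.gateway --cap osd 'allow rwx' --cap mon 'allow rw' /etc/ceph/keyring.radosgw.gateway
ceph auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway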

Benchmarking:

I used the following commands to benchmark rados, rbd, cephfs, etc.:

  1. rados -p rbd bench 20 write --no-cleanup
  2. rados -p rbd bench 20 seq
  3. dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
  4. dd bs=4M count=512 if=/dev/zero of=test conv=fdatasync

 Ceph blogs worth reading:

http://ceph.com/community/blog/
http://www.sebastien-han.fr/blog/
http://dachary.org/

Ubuntu 13.04 + Gnome 3.8 + ATI HD5450

I recently switched from OpenSuse 12 to Ubuntu 13.04 on my desktop machine at work. In order to get everything working correctly I had to use the latest ATI beta drivers (13.3 Beta).

After you unzip the downloaded file, simply run:

# ./amd-driver-installer-catalyst-13.3-beta3-linux-x86.x86_64.run

Initially the install script errored out after it failed to locate ‘version.h’, so before I could successfully complete the driver install I had to run:

# ln -s /lib/modules/3.8.0-19-generic/build/include/generated/uapi/linux/version.h /lib/modules/3.8.0-19-generic/build/include/linux/version.h

Since these are beta drivers, they have a watermark in the lower right hand corner of each monitor.  The solution to removing that watermark can be found here.

If you have dual monitors as I do, once you boot into Ubuntu you will want to configure them with the ATI Control Center:

# amdcccle

In order to get the latest version of Gnome you can add the Gnome3 team ppa’s:

# sudo add-apt-repository ppa:gnome3-team/gnome3
# sudo apt-get update
# sudo apt-get upgrade

If you want access to the latest bleeding-edge applications and utilities, you can add the Ricotz testing ppa’s as well:

# sudo add-apt-repository ppa:ricotz/testing
# sudo apt-get update
# sudo apt-get upgrade

The Ricotz staging ppa’s can be used as well; however, there is a chance that if you upgrade with these, your system may end up becoming unstable:

# sudo add-apt-repository ppa:ricotz/staging
# sudo apt-get update
# sudo apt-get upgrade

Finally, I ran into a bug in Gnome where my entire desktop background was gray/white. In order to get that issue fixed I followed the advice given in the comments section of this Youtube video.

zfsonlinux and gluster so far….

Recently I started to revisit the idea of using zfs on linux (zfsonlinux) as the basis for a server that will eventually be the foundation of our gluster storage infrastructure.  At this point we are using the OpenSolaris version of zfs and an older (but stable) version of gluster (3.0.5).

The problem with staying on OpenSolaris (besides the fact that it is no longer actively supported itself) is that we would be unable to upgrade gluster, and thus unable to take advantage of some of the new and upcoming features that exist in the later versions (such as geo-replication, snapshots, active-active geo-replication, and various other bugfixes and performance enhancements).

Hardware:

Here are the specs for the current hardware I am using to test:

  • 2 x Intel Xeon E5410 @ 2.33GHz:CPU
  • 32 GB DDR2 DIMMS:RAM
  • 48 X 2TB Western Digital SATA II:HARD DRIVES
  • 2 x 3WARE 9650SE-24M8 PCIE:RAID CONTROLLER
  • Ubuntu 11.10
  • Glusterfs version 3.2.5
  • 1 Gbps interconnects (LAN)

ZFS installation:

I decided to use Ubuntu 11.10 for this round of testing. Currently the daily ppa has a lot of bugfixes and performance improvements that do not exist in the latest stable release (0.6.0-rc6), so the daily ppa is the version that should be used until either v0.6.0-rc7 or v0.6.0 final is released.

Here is what you will need to get zfs installed and running:

# apt-add-repository ppa:zfs-native/daily
# apt-get update
# apt-get install debootstrap ubuntu-zfs

At this point we can create our first zpool. Here is the syntax used to create a 6 disk raidz2 vdev:

# zpool create -f tank raidz2 sdc sdd sde sdf sdg sdh

Now let’s check the status of the zpool:

# zpool status tank
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0

errors: No known data errors
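
To actually use the pool as a Gluster brick you would carve out a dataset and give it a predictable mountpoint, for example (the dataset name and mountpoint are just placeholders):

# zfs create tank/gluster
# zfs set mountpoint=/export/gluster tank/gluster
# zfs list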

ZFS Benchmarks:

I ran a few tests to see what kind of performance I could expect out of zfs first, before I added gluster on top, so that I would have a better idea about where any bottleneck existed.
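
The tests themselves were nothing more elaborate than a timed kernel untar and a couple of dd runs against the mounted filesystem; roughly along these lines (file names and counts are approximations):

# time tar xjf linux-3.3-rc5.tar.bz2 -C /tank
# dd bs=4096 count=262144 if=/dev/zero of=/tank/ddtest conv=fdatasync
# dd bs=1M count=1024 if=/dev/zero of=/tank/ddtest conv=fdatasync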

linux 3.3-rc5 kernel untar:

single ext4 disk: 3.277s
zfs 2 disk mirror: 19.338s
zfs 6 disk raidz2: 8.256s

dd using block size of 4096:

single ext4 disk: 204 MB/s
zfs 2 disk mirror: 7.5 MB/s
zfs 6 disk raidz2: 174 MB/s

dd using block size of 1M:

single ext4 disk: 153.0 MB/s
zfs 2 disk mirror: 99.7 MB/s
zfs 6 disk raidz2: 381.2 MB/s

Gluster + ZFS Benchmarks

Next I added gluster (version 3.2.5) to the mix to see how they performed together:
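
For the replication test the volume was a plain two-brick replica; the setup would have looked something like this (hostnames, volume name and brick paths are placeholders):

# gluster peer probe server2
# gluster volume create glustervol replica 2 transport tcp server1:/tank/gluster server2:/tank/gluster
# gluster volume start glustervol
# mount -t glusterfs server1:/glustervol /mnt/gluster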

linux 3.3-rc5 kernel untar:

zfs 6 disk raidz2 + gluster (replication): 4m10.093s
zfs 6 disk raidz2 + gluster (geo replication): 1m12.054s

dd using block size of 4096:

zfs 6 disk raidz2 + gluster (replication): 53.6 MB/s
zfs 6 disk raidz2 + gluster (geo replication): 53.7 MB/s

dd using block size of 1M:

zfs 6 disk raidz2 + gluster (replication): 45.7 MB/s
zfs 6 disk raidz2 + gluster (geo replication): 155 MB/s

Conclusion

Well, so far so good: I have been running the zfsonlinux port for two weeks now without any real issues. From what I understand there is still a decent amount of work left to do around dedup and compression (neither of which I necessarily require for this particular setup).

The good news is that the zfsonlinux developers have not even really started looking into improving performance at this point, since their main focus thus far has been overall stability.

A good deal of development is also taking place in order to allow linux to boot using a zfs ‘/boot’ partition.  This is currently an option on several distros, including Ubuntu and Gentoo; however, the setup requires a fair amount of effort to get going, so it will be nice when this style of setup is supported out of the box.

In terms of Gluster specifically, it performs quite well using geo-replication with larger file sizes. I am really looking forward to the active-active geo-replication feature currently planned for v3.4 to become fully implemented and available. Our current production setup (currently using two node replication) has a T3 (WAN) interconnect, so having the option to use geo-replication in the future should really speed up our write throughput, which is currently hampered by the throughput of the T3 itself.

Ubuntu 11.10 + Gnome Shell + ATI drivers + multiple monitors

**UPDATE**

Dane (see comments) pointed out that ATI has in fact released the 11.10 version of their drivers. I went ahead and gave them a try, and using them broke most things for me.

Once I booted back into Gnome, I had some of the Gnome3 look and feel, but everything else (menus, icons, etc) was clearly from Gnome2.  I reinstalled version 11.9 and everything was back to normal.  This update might work for some other setups, but for now I’ll just stick with the version that is working 95% of the time.

——————————————————————————————————————————————————————

I was finally able to get a working desktop using Ubuntu 11.10, Gnome Shell and Gnome 3.2, along with my Radeon HD 2400 XT video card.  The adventure started a few weeks ago when I tried to set up my existing Ubuntu 11.04 desktop using some PPA repositories I found online.

I was able to successfully upgrade from Ubuntu 11.04 to 11.10 beta, and  since the 11.10 final release was right around the corner I figured it was safe to go ahead and give it a try.  The upgrade went well, but I spent the next day fighting to try and get gnome-shell to play nicely with my Radeon card using the existing ATI drivers.

I ended up starting from scratch a few days later, by backing up some important files in my home directory and doing a clean install of 11.10 once the final version was released.

After doing an update and installing some other packages such as ubuntu-restricted-extras, vlc and pidgin, installing gnome-shell was painless:

# apt-get install gnome-shell

After rebooting, I logged in to find some of the same problems as before with this desktop install (screen tearing, blurry icons, multicolored menus, etc). I found some posts around the net that alluded to the fact that I might be able to solve some of my problems if I used the latest drivers (version 11.9) off the ATI website.

On the other hand, I found other posts by people claiming that even using the latest drivers had not completely solved all their problems and that ATI would be releasing version 11.10 sometime within the next 2 to 3 weeks, and that this new version would be specifically tested against Gnome 3.x (and fix the remaining bugs).

Anyway, I decided that I had nothing to lose at this point and decided to grab the latest version from the web:

# mkdir ati-11.9; cd ati-11.9
# wget http://www2.ati.com/drivers/linux/ati-driver-installer-11-9-x86.x86_64.run
# sh ati-driver-installer-11-9-x86.x86_64.run --buildpkg Ubuntu/oneiric
# dpkg -i fglrx*.deb
# aticonfig --initial -f

After rebooting my machine again, I was pleasantly surprised to see that everything was looking good, no more problems with screen tearing and all my icons and menus were seemingly in order.

The only thing I needed to do now was to set up my multiple monitors correctly, since at that point I was staring at two cloned spaces instead of one large desktop spread across my two 24″ monitors.

First I launched the Catalyst control panel:

# gksu amdcccle

Under the ‘Display Manager’ page I had to select ‘Multi-display desktop with display’ for each of my two monitors.

After a reboot I went into the Gnome ‘System Settings’ and chose ‘Displays’. I was finally able to uncheck ‘Mirror displays’ and hit ‘Apply’ without error.

The final two steps required to get everything working 100% correctly were to install the gnome-tweak-tool:

# apt-get install gnome-tweak-tool

and disable the ‘Have file manager handle the desktop’ option in the ‘Desktop’ section (that did away with the extra menu I was seeing).

The final step in the process involved installing a new theme. I really liked the Elementary theme found here, so that is the one I chose. Now everything is working as it should be!

Finding and installing pre-created images on OpenStack

Once you have your OpenStack cluster up and running, you will need to either find some pre-created image templates or decide to roll your own.  I’ll leave the details of creating images from scratch for a different post; this post will focus on providing links to both image files and instructions for installing pre-created Linux templates on OpenStack infrastructure.

First, if you are looking to install any version of Ubuntu, you should visit

http://uec-images.ubuntu.com/releases/

and download the file that corresponds to your desired version and architecture.

Once you have that file, you can follow the instructions here.

If you are looking to install a version of Debian, CentOS or Fedora, you should visit

http://open.eucalyptus.com/wiki/EucalyptusUserImageCreatorGuide_v1.6,

and download one of the pre-created images that the folks over at Eucalyptus have provided.

Once you are ready to install one of those files, you can follow the instructions here.
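
If you end up registering an image by hand instead, the classic euca2ools workflow for these Eucalyptus-style bundles looks roughly like this (image and bucket names are placeholders, your cloud credentials need to be sourced first, and any kernel/ramdisk images follow the same pattern):

# euca-bundle-image -i centos.5-3.x86-64.img
# euca-upload-bundle -b centos-images -m /tmp/centos.5-3.x86-64.img.manifest.xml
# euca-register centos-images/centos.5-3.x86-64.img.manifest.xml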

MP3 and H.264 playback with chromium

Recently I noticed that I was unable to play certain types of audio and video files directly from within Chromium (I am using Chromium version 10.0.648.133 (77742) on Ubuntu 10.10).  It seems that due to the various licensing issues surrounding the codecs required to play back some of these media types, they are not supported without installing some extra packages.

In order to get MP3 playback support up and running you will need to install the necessary software package using the following command:

 apt-get install chromium-codecs-ffmpeg-extra

After a quick browser restart, you should be able to enjoy MP3 playback on Chromium.

There is a similar process required in order to get MP4 and H.264 playback enabled, however this time you will need to install the following instead:

 apt-get install chromium-codecs-ffmpeg-nonfree

Once again after a quick browser restart, you should be able to enjoy MP4 and H.264 playback on Chromium.