Sunday, 10 December 2017

Want to Install Ceph, but afraid of Ansible?

There is no doubt that Ansible is a pretty cool automation engine for provisioning and configuration management. ceph-ansible builds on this versatility to deliver what is probably the most flexible Ceph deployment tool out there. However, some of you may not want to get to grips with Ansible before you install Ceph...weird right?

No, not really.

If you're short on time, or just want a cluster to try ceph for the first time, a more guided installation approach may help. So I started a project called ceph-ansible-copilot

The idea is simple enough; wrap the ceph-ansible playbook with a text GUI. Very 1990's, I know, but now instead of copying and editing various files you simply start the copilot tool, enter the details and click 'deploy'. The playbook runs in the background within the GUI and any errors are shown there and more drowning in an ocean of scary ansible output :)

The features and workflows of the UI are described in the project page's README file.

Enough rambling, lets look at how you test this stuff out. The process is fairly straight forward;
  1. configure some hosts for Ceph
  2. create the Ansible environment
  3. run copilot
The process below describes each of these steps using CentOS7 as the deployment target for Ansible and the Ceph cluster nodes.
1. Configure Some Hosts for Ceph
Call me lazy, but I'm not going to tell you how to build vm's or physical servers. To follow along, the bare minimum you need are a few virtual machines - as long as they have some disks on them for Ceph, you're all set!

2. Create the Ansible environment
Typically for a Ceph cluster you'll want to designate a host as the deployment or admin host. The admin host is just a deployment manager, so it can be a virtual machine, a container or even a real (gasp!) server. All that really matters is that your admin host has network connectivity to the hosts you'll be deploying ceph to.

On the admin host, perform these tasks (copilot needs ansible 2.4 or above)
> yum install git ansible python-urwid -y
Install ceph-ansible (full installation steps can be found here)
> cd /usr/share
> git clone
> cd ceph-ansible
> git checkout master
Setup passwordless ssh between the admin host and for candidate ceph hosts
> ssh-keygen
> ssh-copy-id root@<ceph_node>
On the admin host install copilot
> cd ~
> git clone
> cd ceph-ansible-copilot
> python install 
3. Run copilot
The main playbook for ceph-ansible is in /usr/share/ceph-ansible - this is where you need to run copilot from (it will complain if you try to run it in some other place!)
> cd /usr/share/ceph-ansible
> copilot
Then follow the UI..

Example Run
Here's a screen capture showing the whole process, so you can see what you get before you hit the command line.

The video shows the deployment of a small 3 node ceph cluster, 6 OSDs, a radosgw (for S3), and an MDS for cephfs testing. It covers the configuration of the admin host, the copilot UI and finally a quick look at the resulting ceph cluster. The video is 9mins in length, but for those of us with short attention spans, here's the timeline so you can jump to the areas that interest you.

00:00 Pre-requisite rpm installs on the admin host
01:12 Installing ceph-ansible from github
01:52 Installing copilot
02:58 Setting up passwordless ssh from the admin host to the candidate ceph hosts
04:04 Ceph hosts before deployment
05:04 Starting copilot
08:10 Copilot complete, review the Ceph hosts

What's next?
More testing...on more and varied hardware...

So far I've only tested 'simple' deployments using the packages from (community deployments) against a CentOS target. So like I said, more testing is needed, a lot more...but for now there's enough of the core code there for me to claim a victory and write a blog post!

Aside from the testing, these are the kinds of things that I'd like to see copilot handle
  • collocation rules (which daemons can safely run together)
  • resource warnings (if you have 10 HDD's but not enough RAM, or CPU...issue a warning)
  • handle the passwordless ssh setup. copilot already checks for passwordless ssh, so instead of leaving it to the admin to resolve any issues, just add another page to the UI.
That's my wishlist - what would you like copilot to do? Leave a comment, or drop by the project on github.

Demo'd Versions
  • copilot 0.9.1
  • ceph-ansible MASTER as at December 11th 2017
  • ansible 2.4.1 on CentOS

Tuesday, 5 July 2016

De-mystifying gluster shards

Recently I've been working on converging glusterfs with oVirt - hyperconverged, open source style. oVirt has supported glusterfs storage domains for a while, but in the past a virtual disk was stored as a single file on a gluster volume. This helps some workloads, but file distribution and functions like self heal and rebalance have more work to do. The larger the virtual disk, the more work gluster has to do in one go.

Enter sharding

The shard translator was introduced with version 3.7, and enables large files to be split into smaller chunks(shards) of a user defined size. This addresses a number of legacy issues when using glusterfs for virtual machine storage - but does introduce an additional level complexity. For example, how do you now relate a file to it's shard, or vice-versa?

The great thing is that even though a file is split into shards, the implementation still allows you to relate files to shards with a few simple commands.
Firstly, let's look at how to relate a file to it's shards;

And now, let's go the other way. We start with a shard, and end with the parent file.

Hopefully this helps others getting to grips with glusterfs sharding (and maybe even oVirt!)

Sunday, 29 May 2016

Making gluster play nicely with others

These days hyperconverged strategies are everywhere. But when you think about it, sharing the finite resources within a physical host requires an effective means of prioritisation and enforcement. Luckily, the Linux kernel already provides an infrastructure for this in the shape of cgroups, and the interface to these controls is now simplified with systemd integration.

So lets look at how you could use these capabilities to make Gluster a better neighbour in a collocated or hyperconverged  model. 

First some common systemd terms, we should to be familiar with;
slice : a slice is a concept that systemd uses to group together resources into a hierarchy. Resource constraints can then be applied to the slice, which defines 
  • how different slices may compete with each other for resources (e.g. weighting)
  • how resources within a slice are controlled (e.g. cpu capping)
unit : a systemd unit is a resource definition for controlling a specific system service
NB. More information about control groups with systemd can be found here

In this article, I'm keeping things simple by implementing a cpu cap on glusterfs processes. Hopefully, the two terms above are big clues, but conceptually it breaks down into two main steps;
  1. define a slice which implements a CPU limit
  2. ensure gluster's systemd unit(s) start within the correct slice.
So let's look at how this is done.

Defining a slice

Slice definitions can be found under /lib/systemd/system, but systemd provides a neat feature where /etc/systemd/system can be used provide local "tweaks". This override directory is where we'll place a slice definition. Create a file called glusterfs.slice, containing;


CPUQuota is our means of applying a cpu limit on all resources running within the slice. A value of 200% defines a 2 cores/execution threads limit.

Updating glusterd

Next step is to give gluster a nudge so that it shows up in the right slice. If you're using RHEL7 or Centos7, cpu accounting may be off by default (you can check in /etc/systemd/system.conf). This is OK, it just means we have an extra parameter to define. Follow these steps to change the way glusterd is managed by systemd

# cd /etc/systemd/system
# mkdir glusterd.service.d
# echo -e "[Service]\nCPUAccounting=true\nSlice=glusterfs.slice" > glusterd.service.d/override.conf

glusterd is responsible for starting the brick and self heal processes, so by ensuring glusterd starts in our cpu limited slice, we capture all of glusterd's child processes too. Now the potentially bad news...this 'nudge' requires a stop/start of gluster services. If your doing this on a live system you'll need to consider quorum, self heal etc etc. However, with the settings above in place, you can get gluster into the right slice by;

# systemctl daemon-reload
# systemctl stop glusterd
# killall glusterfsd && killall glusterfs
# systemctl daemon-reload
# systemctl start glusterd

You can see where gluster is within the control group hierarchy by looking at it's runtime settings

# systemctl show glusterd | grep slice
After=rpcbind.service glusterfs.slice systemd-journald.socket

or use the systemd-cgls command to see the whole control group hierarchy

├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 19
│ └─glusterd.service
│   ├─ 867 /usr/sbin/glusterd -p /var/run/ --log-level INFO
│   ├─1231 /usr/sbin/glusterfsd -s server-1 --volfile-id repl.server-1.bricks-brick-repl -p /var/lib/glusterd/vols/repl/run/ 

 │   └─1305 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/ -l /var/log/glusterfs/glustershd.log
│ └─user-0.slice
│   └─session-1.scope
│     ├─2075 sshd: root@pts/0  
│     ├─2078 -bash
│     ├─2146 systemd-cgls
│     └─2147 less

At this point gluster is exactly where we want it! 

Time for some more systemd coolness ;) The resource constraints that are applied by the slice are dynamic, so if you need more cpu, you're one command away from getting it;

# systemctl set-property glusterfs.slice CPUQuota=350%

Try the 'systemd-cgtop' command to show the cpu usage across the complete control group hierarchy.

Now if jumping straight into applying resource constraints to gluster is a little daunting, why not test this approach with a tool like 'stress'. Stress is designed to simply consume components of the system - cpu, memory, disk. Here's an example .service file which uses stress to consume 4 cores

Description=CPU soak task

ExecStart=/usr/bin/stress -c 4


Now you can tweak the service, and the slice with different thresholds before you move on to bigger things! Use stress to avoid stress :)

And now the obligatory warning. Introducing any form of resource constraint may resort in unexpected outcomes especially in hyperconverged/collocated systems - so adequate testing is key.

With that said...happy hacking :)

Tuesday, 26 April 2016

Using LIO with Gluster

In the past, gluster users of have been able to open up their gluster volumes to iSCSI using the tgt daemon. This has been covered in the past on other blogs and also documented on

But, tgt has been superseded in more recent distro's by LIO. LIO provides a number of different local storage options to be utilised as SCSI targets, including; FILEIO, BLOCK, PSCSI and RAMDISK. These SCSI targets are implemented as modules in kernel space, but what isn't immediately obvious is that LIO also provides a userspace framework called TCMU. TCMU enables userspace files to become iSCSI targets. 

With LIO, the easiest way to exploit gluster as an iSCSI target was through the FILEIO 'storage engine' over FUSE. However, the high number of context switches incurred within FUSE is likely to reduce the performance potential to your 'client' -  especially for random I/O access patterns.

Until now, FUSE was your only option. But Andy Grover at Red Hat has just changed things. Andy has developed tcmu-runner which utilises the TCMU framework, allowing a glusterfs target to be used over gluster's libgfapi interface. Typically, with libgfapi you can expect less context switching, and improved performance.

For those like me, with short attention spans, here's what the improvement looked like when I compared LIO/FUSE with LIO/gfapi using a couple of fio  based workloads.

Read Improvement
Mixed Workload Improvement

In both charts, IOPS and latency significantly improves using LIO/GFAPI, and further still by adopting the arbiter volume.

As you can see, for a young project, these results are really encouraging. The bad news is that to try tcmu-runner you'll need to either build systems based on Fedora F24/rawhide or compile it yourself from the github repo. Let's face it, there's always a price to pay for new shiny stuff :)

For the remainder of this article, I'll walk through the configuration of LIO and the iSCSI client that I used during my comparisons.

Preparing Your Environment

In the interests of brevity, I'm assuming that you know how to build servers,  create a gluster trusted pool and define volumes. Here's a checklist of the tasks you should do in order to prepare a test environment;
  1. build 3 Fedora24 nodes and install gluster (3.7.11) on each peer/node
  2. on each node, ensure /etc/gluster/glusterd.vol contains the following setting - option rpc-auth-allow-insecure on. This is needed for gfapi access. Once added, you'll need to restart glusterd.
  3. install targetcli (targetcli-2.1.fb43-1) and tcmu-runner (tcmu-runner-1.0.4-1) on each of your gluster nodes
  4. form a gluster trusted pool, and create a replica 3 volume or replica with arbiter volume (or both!) 
  5. issue "gluster vol set <vol_name> server.allow-insecure on" to enable libgfapi access to the volume
There are several ways to configure the iSCSI environment, but for my tests I adopted the following approach;
  • two of my three gluster nodes will be iSCSI gateways (LIO targets)
  • each gateway will have it's own iqn (iSCSI Qualified Name)
  • each gateway will only access the gluster volume from itself, so if gluster is down on this node so is the path for any attached client (makes things simple)
  • high availability for the LUN is provided by client side multipathing
Before moving on, you can confirm that targetcli/tcmu-runner are providing the gluster integration by simply running 'ls' from the targetcli.

# targetcli ls
o- / ...............
  o- backstores ....
  | o- block .......
  | o- fileio ......
  | o- pscsi .......
  | o- ramdisk .....
  | o- user:glfs ...    <--- gluster gfapi available through tcmu
  | o- user:qcow ...
  o- iscsi .........
  o- loopback ......
  o- vhost ......

With the preparation complete, let's configure the LIO gateways.

Configuring LIO - Node 1

The following steps provide an example configuration You'll need to make changes to naming etc specific to your test environment.

  1. Mount the volume (called iscsi-pool), and allocate the file that will become the LUN image
  2. # fallocate -l 100G mytest.img
  1. Enter the targetcli shell. The remaining steps all take place within this shell.
  1. Create the backing store connection to the glusterfs file
  2. /backstores/user:glfs create myLUN 100G iscsi-pool@iscsi-3/mytest.img
  1. Create the node's target portal (this is the name the client will connect to). In this example 'iscsi-3' is the node name
  2. /iscsi/ create
    NB. this will create the target IQN and the iscsi portal will be enabled and listening on port 3260
  1. On the client, 'grab' it's iqn from /etc/iscsi/initiatorname.iscsi, then add it to the gateway
  2. /iscsi/ create
  1. Add the LUN, "myLUN", to the target and automatically map it to the client(s) 
  2. /iscsi/ create /backstores/user:glfs/myLUN 0
  1. Issue saveconfig to commit the configuration (config is stored in /etc/target/saveconfig.json)

Configuring LIO - Node 2 

When a LUN is defined by targetcli, a wwn is automatically generated for it. This is neat, but to ensure multipathing works we need the LUN exported by the gateways to share the same wwn - if they don't match, the client will see two devices, not two paths to the same device.

So for subsequent nodes, the steps are slightly different.
  1. On the first node, look at /etc/target/saveconfig.json. You'll see a storage object item for the gluster file you've just created, together with the wwn that was assigned (highlighted).
  2.   "storage_objects": [
          "config": "glfs/iscsi-pool@iscsi-3/mytest.img",
          "name": "myLUN",
          "plugin": "user",
          "size": 107374182400,
          "wwn": "653e4072-8aad-4e9d-900e-4059f0e19e7e"
  1. Open the targetcli shell on node 2, and define a LUN pointing to the same backing file as node 1, but this time explicitly specifying the wwn (from step 1)
  2. /backstores/user:glfs create myLUN 100G iscsi-pool@iscsi-1/mytest.img 653e4072-8aad-4e9d-900e-4059f0e19e7e
    (if you cd to /backstores/user:glfs and use help create you'll see a summary of the options available when creating the LUN)
  1. With the LUN in place, you can follow steps 4-7 above to create the iqn, portal and LUN masking for this node.

At this point you have;
  • 3 gluster nodes
  • a gluster volume with a file defined, serving as an iscsi target
  • 2 gluster nodes defined as iscsi gateways
  • each gateway exports the same LUN to a client (supporting multipathing)

Next up...configuring the client.

Client Configuration

To get the client to connect to your 'exported' LUN(s), you first need to ensure that the following rpms are installed on the client; device-mapper-multipath, iscsi-initiator-utils and preferably sg3_utils. With these packages in place you can move on to configure multipathing and connect to you LUN(s).
  • Multipathing : the example below shows a devices section from /etc/multipath.conf that I used to ensure my exported LUNs are seen as multipath devices. With this in place, you can take a node down for maintenance and your LUN remains accessible (as long as your volume has quorum!)
devices {
    device {
        vendor "LIO-ORG"
        path_grouping_policy "multibus"
# I tested with a path_selector of "round-robin" and "queue-length"
        path_selector "queue-length 0"
        path_checker "directio"
        prio "const"
        rr_weight "uniform"

  • iscsi discovery/login : to login to the gluster iscsi gateway's just use the iscsiadm command (from iscsi-initiator-utils rpm)

# iscsiadm -m discovery -t st -p <your_gluster_node_1> -l
# iscsiadm -m discovery -t st -p <your_gluster_node_2> -l

# #check your paths are working as expected with multipath command
# multipath -ll
mpathd (36001405891b9858f4b0440285cacbcca) dm-2 LIO-ORG ,TCMU device   
size=8.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 33:0:0:1 sdc 8:32 active ready running
  `- 34:0:0:1 sde 8:64 active ready running
mpathb (3600140596a3a65692104740a88516aba) dm-3 LIO-ORG ,TCMU device   
size=8.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 33:0:0:0 sdb 8:16 active ready running
  `- 34:0:0:0 sdd 8:48 active ready running
mpathf (36001405653e40728aad4e9d900e4059f) dm-6 LIO-ORG ,TCMU device   
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 35:0:0:0 sdf 8:80 active ready running
  `- 33:0:0:2 sdg 8:96 active ready running

You can see in this example, I have three LUN's exported, and each one has two active paths (one to each gluster node). By default, the iscsi node definition in (/var/lib/iscsi/nodes) uses a setting of node.startup=automatic, which means LUN(s) will automagically reappear on the client following a reboot.

But from the client's perspective, how do you know which LUN is from which glusterfs volume/file? For this, sg_inq is your friend...

# sg_inq -i /dev/dm-6
VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 49
    designator_type: T10 vendor identification,  code_set: ASCII
    associated with the addressed logical unit
      vendor id: LIO-ORG
      vendor specific: 653e4072-8aad-4e9d-900e-4059f0e19e7e
  Designation descriptor number 2, descriptor length: 20
    designator_type: NAA,  code_set: Binary
    associated with the addressed logical unit
      NAA 6, IEEE Company_id: 0x1405
      Vendor Specific Identifier: 0x653e40728
      Vendor Specific Identifier Extension: 0xaad4e9d900e4059f
  Designation descriptor number 3, descriptor length: 39
    designator_type: vendor specific [0x0],  code_set: ASCII
    associated with the addressed logical unit
      vendor specific: glfs/iscsi-pool@iscsi-3/mytest.img

The highlighted text shows the configuration string you specified when you created the LUN in targetcli. If you run the same command against the devices themselves (/dev/sdf or /dev/sdg) you'd see the connection string from each of respective gateways. Nice and easy!

And Finally...

Remember, this is all shiny and new - so if you try it, expect some rough edges! However, I have to say that it looks promising, and during my tests I didn't lose any data...but YMMV :)

Happy testing!

Tuesday, 31 March 2015

Using SSL with Glusterfs

Wow - it's been a while since my last post!

Recently, I needed to configure glusterfs with SSL and found that the documention that describes how to do it is actually pretty thin.  What's annoying is that this feature has been around since 2013!

First the caveat - I'm not an expert with SSL, but I arrived at this working process after digging through mail lists and a great article from Zbyszek Żółkiewski

There are 8 steps to follow, so nothing too taxing :)
  1. Create the keys and certificates
  • On each node, perform the following;
  • # cd /etc/ssl
    # openssl genrsa -out glusterfs.key 1024
    # openssl req -new -x509 -days 3650 -key glusterfs.key -subj /CN=<hostname> -out glusterfs.pem
  • This step creates a private key(.key) and associated certificate(.pem) on each node. The common name (CN), I've used is the hostname, so each certificate is unique to each gluster node and/or client. You may opt for a different scheme - but the important thing is the CN chosen here is reflected in step 6.
  1. Combine the pem files to a single file
  • Use scp to copy the .pem file from each node to a single node in the cluster (I'm calling it the primary host for the purpose of this article)
  • # scp glusterfs.pem root@<primary-host>:/etc/ssl/<this-hostname>.pem
    On the primary host concatenate the files
    # cat glusterfs.pem host2.pem host3.pem >
  1. Distribute the common 'ca' file to all nodes
  • On the primary host distribute the common CA containing the certs from all nodes/clients
  • # scp /etc/ssl/ root@<hostX>:/etc/ssl/.
  1. Stop the volume you want to enable SSL on

  2. # gluster vol stop <volume-name>
  1. Restart glusterd

  2. # systemctl restart glusterd
  1. Update the volume to enable SSL

  2. # gluster vol set <volume-name> client.ssl on
    # gluster vol set <volume-name> server.ssl on
    # gluster vol set <volume-name> auth.ssl-allow host-1,host-2,host-3
  • The comma separated list should consist of the CN's used when generating the .pem files on each host, from step '1'.
  1. Start the volume

  2. # gluster vol start <volume-name>
  1. Check SSL is enabled on the I/O Path
  • Although you can use vol info to check the SSL setting is in place, the best way to confirm that SSL is actually being used is to look at one of the log files;
  • # grep SSL /var/log/glusterfs/glustershd.log
    [2015-03-31 06:58:34.674091] I [socket.c:3799:socket_init] 0-vol-client-2: SSL support on the I/O path is ENABLED
    [2015-03-31 06:58:34.679316] I [socket.c:3799:socket_init] 0-vol-client-1: SSL support on the I/O path is ENABLED
    [2015-03-31 06:58:34.680784] I [socket.c:3799:socket_init] 0-vol-client-0: SSL support on the I/O path is ENABLED
That's it - enjoy a more secure glusterfs!

Monday, 30 June 2014

"gluster-deploy" Updated to Support Multiple Volume Configurations

When I started the gluster-deploy project, the main requirement I had for a gluster setup wizard was to support a single volume configuration. However, over time that's changed - in no small part, due to OpenStack!

Once glusterfs provided support for OpenStack, multi-volume setups were needed to support different volumes for glance and cinder, each optimised for their respective workloads.

So this requirement slowly rose to the top of my to-do list, and now version 0.8 is available on the gluster forge. The 0.7 and 0.8 releases have extended the tool to provide:

  • support for thin provisioned gluster bricks based on LVM
  • support for raid alignment of the bricks (through globals set in the deploy.cfg file)
  • more flexibility on the node discovery page (you can now change you mind when defining which peers to use in your config)
  • introduced a volume queue concept to allow multiple volume to be defined in a single run
  • provide the ability to change characteristics of the volumes in the queue
  • added the ability to mount volumes defined for hadoop across all nodes (converged storage + compute use case)
  • provide the ability to set a tuned profile across all nodes during the brick creation phase

Here's a video showing the tool running against a beta version of the new Red Hat Storage 3.0 product, which also provides a quick taste of volume level snapshots that are coming in version 3.6 of glusterfs.

Onwards, towards 1.0!

Sunday, 2 March 2014

What's your Storage Up to?

Proprietary and Software Define Storage platforms are getting increasing complex as more and more features are added. Let's face it, complexity is a fact of life! The thing that really matters is whether this increasing complexity makes the day-to-day life of an admin easier or harder.

Understanding what the state of a platform is at any given time is a critical 'feature' for any platform - be it compute, network or storage. Most of the proprietary storage vendors spend a lot of time, money and resources in trying to get this right, but what about the open source storage. Is that as mature as proprietary offerings?

I'd have to so 'no'. But open source, never stands still - someone always has an itch to scratch which moves things forward.
I took a look at the ceph project and compared it to the current state of glusterfs.

The architecture of Ceph is a lot more complex than gluster. To my mind this has put more emphasis on the ceph team to provide the tools that give insight in to the state of the platform. Looking at the ceph cli, you can see that the ceph guys have thought this through and provide great "at-a-glance" of what's going on within a ceph cluster.

Here's an example from a test "0.67.4" environment I have;

[root@ceph2-admin ceph]# ceph -s
  cluster 5a155678-4158-4c1d-96f4-9b9e9ad2e363
   health HEALTH_OK
   monmap e1: 3 mons at {1=,2=,3=}, election epoch 34, quorum 0,1,2 1,2,3
   osdmap e95: 4 osds: 4 up, 4 in
    pgmap v5356: 1160 pgs: 1160 active+clean; 16 bytes data, 4155 MB used, 44940 MB / 49096 MB avail
   mdsmap e1: 0/0/1 up

This is what I'm talking about -  a single command, and you stand a fighting chance of understanding what's going on - albeit in geek speak! 

By contrast, gluster's architecture is much less complex, so the necessity for easy to consume status information has not been as great. In fact as it stands today with gluster 3.4 and the new 3.5 release, you have to piece together information from several commands to get a consolidated view of what's going on.

Not exactly 'optimal' !

However, the gluster project has the community forge where projects can be submitted to add functionality to a gluster deployment, even if the changes aren't inside the core of gluster.

So I came up with a few ideas about what I'd like to see and started a project on the forge called gstatus. I've based the gstatus tool on the xml functionality that was introduced in gluster 3.4, so earlier versions of gluster are not compatible at this point (sorry, if that's you!)

It's also still early days for the tool, but I find useful now - maybe you will too. As a taster here's an example of the output the tool generates against a test gluster 3.5 environment.

[root@glfs35-1 gstatus]# ./ -s

      Status: HEALTHY           Capacity: 80.00 GiB(raw bricks)
   Glusterfs: 3.5.0beta3                  60.00 GiB(usable)

   Nodes    :  4/ 4        Volumes:  2 Up
   Self Heal:  4/ 4                  0 Up(Degraded)
   Bricks   :  8/ 8                  0 Up(Partial)
                                     0 Down
Status Messages
  - Cluster is healthy, all checks successful

As I mentioned, gluster does provide all the status information - just not in a single command. 'gstatus', grabs the output from several commands and brings the information together into a logical model that represents a cluster. The object model it creates centres on a cluster object as the parent, with child objects of volume(s), bricks and nodes. The objects 'link' together allowing simple checks to be performed and state information to be propagated.

You may have noticed that there are some new volume states I've introduced, which aren't currently part of gluster-core.
Up Degraded : a volume in this state is a replicated volume that has at least one brick down/unavailable within a replica set. Data is still available from other members of the replica set, so the degraded state just makes you aware that performance may be affected, and further brick loss to the same replica set will result in data being inaccessible.
Up Partial : this state is an escalation of the "up degraded" state. In this case all members of the same replica set are unavailable - so data stored within this replica set is no longer accessible. Access to data residing on other replica sets is continues, so I still regard the volume as a whole as up, but flag it as partial.

The script has a number of 'switches' that allow you to get a high level view of the cluster, or drill down into the brick layouts themselves (much the same as the lsgvt project on the forge)

To further illustrate the kind of information the tool can provide, here's a run where things are not quite as happy (I killed a couple of brick daemon's!)

[root@glfs35-1 gstatus]# ./ -s

      Status: UNHEALTHY         Capacity: 80.00 GiB(raw bricks)
   Glusterfs: 3.5.0beta3                  60.00 GiB(usable)

   Nodes    :  4/ 4        Volumes:  0 Up
   Self Heal:  4/ 4                  1 Up(Degraded)
   Bricks   :  6/ 8                  1 Up(Partial)
                                          0 Down
Status Messages
  - Cluster is UNHEALTHY
  - Volume 'dist' is in a PARTIAL state, some data is inaccessible data, due to missing bricks
  - WARNING -> Write requests may fail against volume 'dist'
  - Brick glfs35-3:/gluster/brick1 in volume 'myvol' is down/unavailable
  - Brick glfs35-2:/gluster/brick2 in volume 'dist' is down/unavailable

I've highlighted some of the areas in the output, to make it easier to talk about
  • The summary section shows 6 of 8 bricks are in an up state - so this is the first indication that things are not going well.
  • We also have a volume in partial state, and as the warning indicates this will mean the potential of failed writes, and inaccessible data for that volume
Now to drill down, you can use '-v' and get a more detailed volume level view of the problems (this doesn't have to be two steps, you could use gstatus -a, to get state info and volume info combined)

[root@glfs35-1 gstatus]# ./ -v

      Status: UNHEALTHY         Capacity: 80.00 GiB(raw bricks)
   Glusterfs: 3.5.0beta3                  60.00 GiB(usable)

Volume Information
    myvol            UP(DEGRADED) - 3/4 bricks up - Distributed-Replicate
                     Capacity: (0% used) 79.00 MiB/20.00 GiB (used/total)
                     Self Heal:  4/ 4   Heal backlog:54 files
                     Protocols: glusterfs:on  NFS:off  SMB:off

    dist             UP(PARTIAL)  - 3/4 bricks up - Distribute
                     Capacity: (0% used) 129.00 MiB/40.00 GiB (used/total)
                     Self Heal: N/A   Heal backlog:0 files
                     Protocols: glusterfs:on  NFS:on  SMB:on
  • For both volumes a brick is down, so each volume shows only 3/4 bricks up. The state of the volume however depends upon the volume type. 'myvol' is a volume that uses replication, so it's volume state is just degraded - whereas the 'dist' volume is in a partial state
  • The heal backlog count against the 'myvol' volume, shows 54 files. This indicates that this volume is tracking updates being made to a replica set, where a brick is currently unavailable. Once the brick comes back online, the backlog will be cleared by the self heal daemon (all self heal daemons are present and 'up' as shown in the summary section.
You can drill down further into the volumes too, and look at the brick sizes and relationships using the -l flag.

 [root@glfs35-1 gstatus]# ./ -v myvol -l

      Status: UNHEALTHY         Capacity: 80.00 GiB(raw bricks)
   Glusterfs: 3.5.0beta3                  60.00 GiB(usable)

Volume Information
    myvol            UP(DEGRADED) - 3/4 bricks up - Distributed-Replicate
                     Capacity: (0% used) 77.00 MiB/20.00 GiB (used/total)
                     Self Heal:  4/ 4   Heal backlog:54 files
                     Protocols: glusterfs:on  NFS:off  SMB:off

    myvol----------- +
                Distribute (dht)
                         +-- Repl Set 0 (afr)
                         |     |
                         |     +--glfs35-1:/gluster/brick1(UP) 38.00 MiB/10.00 GiB
                         |     |
                         |     +--glfs35-2:/gluster/brick1(UP) 38.00 MiB/10.00 GiB
                         +-- Repl Set 1 (afr)
                               +--glfs35-3:/gluster/brick1(DOWN) 36.00 MiB/10.00 GiB
                               +--glfs35-4:/gluster/brick1(UP) 39.00 MiB/10.00 GiB S/H Backlog 54

You can see in the example above, that the layout display also shows which bricks are recording the self-heal backlog information.

How you run the tool depends on the question you want to ask, but the program options provide a way to show state, state + volumes, brick layouts and capacities etc. Here's the 'help' information;

[root@glfs35-1 gstatus]# ./ -h
Usage: [options]

  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -s, --state           show highlevel health of the cluster
  -v, --volume          volume info (default is ALL, or supply a volume name)
  -a, --all             show all cluster information
  -u UNITS, --units=UNITS
                        display capacity units in DECimal or BINary format (GB
                        vs GiB)
  -l, --layout          show brick layout when used with -v, or -a

Lots of ways to get into trouble :)

Now the bad news...

As I mentioned earlier, the tool relies upon the xml output from gluster, which is a relatively new thing. The downside is that I've uncovered a couple of bugs that result in malformed xml coming from the gluster commands. I'll raise a bug for this, but until this gets fixed in gluster I've added some workarounds in the code to mitigate. It's not perfect, so if you do try the tool and see errors like this -

 Traceback (most recent call last):
  File "./", line 153, in <module>
  File "./", line 42, in main
  File "/root/bin/gstatus/classes/", line 398, in updateState
  File "/root/bin/gstatus/classes/", line 601, in update
    brick_path = node_info['hostname'] + ":" + node_info['path']
KeyError: 'hostname''re hitting the issue.

If you're running gluster 3.4 (or above!) and you'd like to give gstatus a go -  head on over to the forge and give it a go. Installing is as simple as grabbing the tar archive from the forge, extracting it and running it..that's it.

Update 4th March. Thanks to Niels at Red Hat, the code has been restructured to make it easier for packing and installation.  Now to install, you need to download the tar file, extract it and run 'python install' (you'll need to ensure your system has python-setuptools installed first!)

Anyway, it helps me - maybe it'll help you too!