Nov 12 2013

Hi all,

Not often I have a whinge about something, but this problem has been bugging me of late more than somewhat.  I'm in the process of setting up an OpenStack cluster at work.  As the underlying OS we've chosen Ubuntu Linux, which is fine: Ubuntu is a stable, reliable and well-supported platform.

One of my pet peeves, though, is when a package manager decides to get lazy.  Now, those of us who have been around the Linux scene have probably discovered RPM dependency hell… and the smug Debian users who tell us that Debian doesn't do this.

Ho ho, errm… no, when APT wants to go into dummy mode, it does so with style:

Nov 12 05:32:27 in-target: Setting up python3-update-manager (1:0.186.2) ...
Nov 12 05:32:27 in-target: Setting up python3-distupgrade (1:0.192.13) ...
Nov 12 05:32:27 in-target: Setting up ubuntu-release-upgrader-core 
(1:0.192.13) ...
Nov 12 05:32:27 in-target: Setting up update-manager-core (1:0.186.2) ...
Nov 12 05:32:27 in-target: Processing triggers for libc-bin ...
Nov 12 05:32:27 in-target: ldconfig deferred processing now taking place
Nov 12 05:32:27 in-target: Processing triggers for initramfs-tools ...
Nov 12 05:32:27 in-target: Processing triggers for ca-certificates ...
Nov 12 05:32:27 in-target: Updating certificates in /etc/ssl/certs... 
Nov 12 05:32:29 in-target: 158 added, 0 removed; done.
Nov 12 05:32:29 in-target: Running hooks in /etc/ca-certificates/update.d....
Nov 12 05:32:29 in-target: done.
Nov 12 05:32:29 in-target: Processing triggers for sgml-base ...
Nov 12 05:32:29 pkgsel: installing additional packages
Nov 12 05:32:29 in-target: Reading package lists...
Nov 12 05:32:29 in-target: 
Nov 12 05:32:29 in-target: Building dependency tree...
Nov 12 05:32:30 in-target: 
Nov 12 05:32:30 in-target: Reading state information...
Nov 12 05:32:30 in-target: 
Nov 12 05:32:30 in-target: openssh-server is already the newest version.
Nov 12 05:32:30 in-target: Some packages could not be installed. This may 
mean that you have
Nov 12 05:32:30 in-target: requested an impossible situation or if you are 
using the unstable
Nov 12 05:32:30 in-target: distribution that some required packages have not 
yet been created
Nov 12 05:32:30 in-target: or been moved out of Incoming.
Nov 12 05:32:30 in-target: The following information may help to resolve the 
situation:
Nov 12 05:32:30 in-target: 
Nov 12 05:32:30 in-target: The following packages have unmet dependencies:
Nov 12 05:32:30 in-target:  mariadb-galera-server : Depends: 
mariadb-galera-server-5.5 (= 5.5.33a+maria-1~raring) but it is not going to 
be installed

Mmmm, great, not going to be installed. May I ask why not? No, I’ll just drop to a shell and do it myself then.

Nov 12 05:32:30 in-target: E: Unable to correct problems, you have held 
broken packages.

Now, this is probably one of my most hated things about computing: when a software package accuses YOU of doing something that you haven't done. Excuse me… I have held broken packages? I simply performed a fresh install and then told you to do an install!
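
As an aside, there is at least a quick sanity check for genuinely held packages.  These are standard dpkg/apt commands (nothing the installer suggested), and on a freshly installed system both should come back empty:

# list any packages dpkg has marked as "hold"
dpkg --get-selections | grep -w hold
# newer apt versions can report the same thing directly
apt-mark showhold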

So let’s have a closer look.

Nov 12 05:32:30 main-menu[20801]: WARNING **: Configuring 'pkgsel' failed 
with error code 100
Nov 12 05:32:30 main-menu[20801]: WARNING **: Menu item 'pkgsel' failed.
Nov 12 05:37:38 main-menu[20801]: INFO: Modifying debconf priority limit from 
'high' to 'medium'
Nov 12 05:37:38 debconf: Setting debconf/priority to medium
Nov 12 05:37:38 main-menu[20801]: DEBUG: resolver (ext2-modules): package 
doesn't exist (ignored)
Nov 12 05:37:40 main-menu[20801]: INFO: Menu item 'di-utils-shell' selected
~ # chroot /target
chroot: can't execute '/bin/network-console': No such file or directory
~ # chroot /target bin/bash

We give it a shot ourselves to see the error more clearly.

root@test-mgmt0:/# apt-get install mariadb-galera-server
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 mariadb-galera-server : Depends: mariadb-galera-server-5.5 (= 
5.5.33a+maria-1~raring) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Fine, so we’ll try installing that instead then.

root@test-mgmt0:/# apt-get install mariadb-galera-server-5.5
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 mariadb-galera-server-5.5 : Depends: mariadb-client-5.5 (>= 
5.5.33a+maria-1~raring) but it is not going to be installed
                             Depends: libmariadbclient18 (>= 
5.5.33a+maria-1~raring) but it is not going to be installed
                             PreDepends: mariadb-common but it is not going 
to be installed
E: Unable to correct problems, you have held broken packages.

Okay, closer, so we need to install those too. But hang on, isn't that apt's responsibility to know this stuff? (which it clearly does).

Also note we don’t get told why it isn’t going to be installed. It refuses to install the packages, “just because”. No reason given.
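
In fairness, apt can be coaxed into showing its working: the problem resolver has a debug switch that dumps its reasoning as it discards candidates.  I didn't think to use it at the time, but something along these lines would have been the smarter first move:

# make apt show its working while it resolves (very verbose)
apt-get -o Debug::pkgProblemResolver=yes install mariadb-galera-server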

We try adding in the deps to our list.

root@test-mgmt0:/# apt-get install mariadb-galera-server-5.5 
mariadb-client-5.5
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 mariadb-client-5.5 : Depends: libdbd-mysql-perl (>= 1.2202) but it is not 
going to be installed
                      Depends: mariadb-common but it is not going to be 
installed
                      Depends: libmariadbclient18 (>= 5.5.33a+maria-1~raring) 
but it is not going to be installed
                      Depends: mariadb-client-core-5.5 (>= 
5.5.33a+maria-1~raring) but it is not going to be installed
 mariadb-galera-server-5.5 : Depends: libmariadbclient18 (>= 
5.5.33a+maria-1~raring) but it is not going to be installed
                             PreDepends: mariadb-common but it is not going 
to be installed
E: Unable to correct problems, you have held broken packages.

Okay, some more deps, we’ll add those…

root@test-mgmt0:/# apt-get install mariadb-galera-server-5.5 
mariadb-client-5.5 libmariadbclient18
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libmariadbclient18 : Depends: mariadb-common but it is not going to be 
installed
                      Depends: libmysqlclient18 (= 5.5.33a+maria-1~raring) 
but it is not going to be installed
 mariadb-client-5.5 : Depends: libdbd-mysql-perl (>= 1.2202) but it is not 
going to be installed
                      Depends: mariadb-common but it is not going to be 
installed
                      Depends: mariadb-client-core-5.5 (>= 
5.5.33a+maria-1~raring) but it is not going to be installed
 mariadb-galera-server-5.5 : PreDepends: mariadb-common but it is not going 
to be installed
E: Unable to correct problems, you have held broken packages.

Wash-rinse-repeat!

root@test-mgmt0:/# apt-get install mariadb-galera-server-5.5 
mariadb-client-5.5 libmariadbclient18 mariadb-common
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libmariadbclient18 : Depends: libmysqlclient18 (= 5.5.33a+maria-1~raring) 
but it is not going to be installed
 mariadb-client-5.5 : Depends: libdbd-mysql-perl (>= 1.2202) but it is not 
going to be installed
 mariadb-common : Depends: mysql-common but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
root@test-mgmt0:/# apt-get install mariadb-galera-server-5.5 
mariadb-client-5.5 libmariadbclient18 mariadb-common libdbd-mysql-perl 
mysql-common
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libmariadbclient18 : Depends: libmysqlclient18 (= 5.5.33a+maria-1~raring) 
but 5.5.34-0ubuntu0.13.04.1 is to be installed
 mariadb-client-5.5 : Depends: mariadb-client-core-5.5 (>= 
5.5.33a+maria-1~raring) but it is not going to be installed
 mysql-common : Breaks: mysql-client-5.1
                Breaks: mysql-server-core-5.1
E: Unable to correct problems, you have held broken packages.

Aha, so there’s a newer version in the Ubuntu repository that’s overriding ours. Brilliant. Ohh, and there’s a mysql-client binary too, but it won’t tell me what version it’s trying for.
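
The quickest way to confirm that suspicion is apt-cache policy, which lists every candidate version of a package and which repository it comes from.  And if you'd rather the MariaDB repository won outright than pin versions by hand, an apt preferences entry should do it; treat the snippet below as a sketch rather than what I actually ran, as the priority and the match-by-mirror-hostname are my own assumptions:

# see which versions apt knows about, and which one wins
apt-cache policy libmysqlclient18 mysql-common

# optionally, prefer anything served from the MariaDB mirror
cat > /etc/apt/preferences.d/mariadb <<'EOF'
Package: *
Pin: origin mirror.aarnet.edu.au
Pin-Priority: 1000
EOF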

Looking in the repository myself I spot a package named mysql-common_5.5.33a+maria-1~raring_all.deb. That is likely our culprit, so I try version 5.5.33a+maria-1~raring.
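
In hindsight, apt-cache madison would also have saved me the trip through the repository listing; it prints each available version of a package alongside the archive it comes from:

apt-cache madison mysql-common libmysqlclient18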

root@test-mgmt0:/# apt-get install mariadb-galera-server-5.5 
mariadb-client-5.5 libmariadbclient18 mariadb-common libdbd-mysql-perl 
mysql-common=5.5.33a+maria-1~raring libmysqlclient18=5.5.33a+maria-1~raring 
mariadb-client-core-5.5
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following extra packages will be installed:
  galera libaio1 libdbi-perl libhtml-template-perl libnet-daemon-perl 
libplrpc-perl

Bingo!

So, for those wanting to pre-seed MariaDB Cluster 5.5, I used the following in my preseed file:

# MariaDB 5.5 repository list - created 2013-11-12 05:20 UTC
# http://mariadb.org/mariadb/repositories/
d-i apt-setup/local3/repository string \
        deb http://mirror.aarnet.edu.au/pub/MariaDB/repo/5.5/ubuntu raring main
d-i apt-setup/local3/comment string \
        "MariaDB repository"
d-i pkgsel/include string mariadb-galera-server-5.5 \
        mariadb-client-5.5 libmariadbclient18 mariadb-common \
        libdbd-mysql-perl mysql-common=5.5.33a+maria-1~raring \
        libmysqlclient18=5.5.33a+maria-1~raring mariadb-client-core-5.5 \
        galera

# For unattended installation, we set the password here
mysql-server mysql-server/root_password select DatabaseRootPassword
mysql-server mysql-server/root_password_again select DatabaseRootPassword
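
A handy way to double-check the exact debconf question names and types (they can differ between MySQL and MariaDB packaging, so don't take my lines above as gospel for other versions) is to install the package on a scratch box and dump its selections with debconf-utils:

apt-get install debconf-utils
debconf-get-selections | grep -i mysql-server/root_password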

So yeah, next time someone mentions this:

Gentoo: Increasing blood pressure since 1999.

it doesn’t just apply to Gentoo!

May 12 2013

I've been working with VRT Systems for a few years now. I was originally brought in as a software engineer, but my role has since shifted to include network administration duties.

This of course does not faze me; I've done network administration work before for charities. There are some small differences though: back then it was a single do-everything box running Gentoo, hosting a Samba-based NT domain for about 5 Windows XP workstations; now it's about 20 Windows 7 workstations, a Samba-based NT domain backed by LDAP, and a number of servers.

Part of this has been to move our ageing infrastructure to a more modern "private cloud" setup. In the following series, I plan to detail my notes on what I've learned through this process, so that others may benefit from my insight.  At this stage I don't have all the answers, and there are some things I may have wrong below.

Planning

The first stage with any such network development (this goes for “cloud”-like and traditional structures) is to consider how we want the network to operate, how it is going to be managed, and what skills we need.

Both my manager and I are Unix-oriented people. In my case, I'll be honest: I have a definite bias towards open source, and I'll try to assess a solution on technical merit rather than via glossy brochures.

After looking at some commercial solutions, my manager more or less came to the conclusion that a lot of these highly expensive servers are not so magical; fundamentally, they are just standard desktop hardware in a small form factor. While we could buy a whole heap of 1U high rack servers, we might be better served by using more standard hardware.

The plan is to build a cluster of standard boxes, in as small a form factor as practical, which would be managed at a higher level for load balancing and redundancy.

Hardware: first attempt

One key thing we wanted to reduce was power consumption. Our existing rack of hardware chews about 1.5kW of power. Since we want to run a lot of virtual machines, we want to make them as efficient as possible. We wanted a small building block that could handle a handful of VMs and store its data across multiple nodes for redundancy.

After some research, we wound up with our first attempt at a compute node:

Motherboard: Intel DQ77KB Mini ITX
CPU: Intel Core i3-3220T 2.8GHz Dual-Core
RAM: 8GB SODIMM
Storage: Intel 520S 240GB SSD
Networking: Onboard dual gigabit for cluster, PCIe Realtek RTL8168 adaptor for client-facing network

The plan is that we'd have many of these, pooling their storage in a redundant fashion.  The two on-board NICs would be bonded together using LACP and would form a back-end storage network for the nodes to share data.  The one PCIe card would be the "public" face of the cluster and would connect it to the outside world using VLANs.
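
For the curious, the bonding and VLAN setup on Ubuntu is plain ifupdown plus the ifenslave and vlan packages.  The snippet below is a rough sketch of the sort of stanzas involved; the interface names and addresses are invented for illustration rather than copied from our nodes:

# the bonding package may be named ifenslave-2.6 on 12.04
apt-get install ifenslave vlan

cat >> /etc/network/interfaces <<'EOF'

# eth0 + eth1: LACP bond for the back-end storage network
auto eth0
iface eth0 inet manual
    bond-master bond0

auto eth1
iface eth1 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    address 10.0.0.11
    netmask 255.255.255.0
    bond-mode 802.3ad
    bond-miimon 100
    bond-slaves none

# eth2 (the PCIe card): client-facing, carrying tagged VLANs such as VLAN 10
auto eth2.10
iface eth2.10 inet manual
    vlan-raw-device eth2
EOF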

For the OS, we threw on Ubuntu 12.04 LTS AMD64 and ran the KVM hypervisor. We then put the box on one of our power meters to see how much it drew. At first my manager asked if the thing was even turned on … it was idling at 10W.

We loaded it up with a few virtual machines; eventually I had 6 VMs going on the thing, ranging across Linux, Windows 2000, Windows XP and a Windows 2008R2 P2V image for one of our customer projects.

The CPU load sat at about 6.0, and the power consumption did not budge above 30W. Our existing boxes drew 300W, so theoretically we could run 10 of these for just one of our old servers.

Management software

Running QEMU VMs from bash scripts is all very well, but in this case we need to be able to give non-technical users access to a subset of the cluster for their projects.  I hardly expect them to write bash scripts to fire up KVM over SSH.
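
"Bash scripts firing up KVM" is not much of an exaggeration either; at this level a guest is just one long command line.  Something like the following is roughly the manual approach (the disk path, sizes and VNC display are placeholders, and user-mode networking stands in for our bridged setup):

#!/bin/bash
# One-off: create a disk image for the guest (path is an example).
[ -f /var/lib/vm/testvm.img ] || qemu-img create -f qcow2 /var/lib/vm/testvm.img 20G

# Fire up the guest: 2 vCPUs, 2GB RAM, virtio disk and NIC, VNC on display :1.
kvm -name testvm \
    -m 2048 -smp 2 \
    -drive file=/var/lib/vm/testvm.img,if=virtio \
    -net nic,model=virtio -net user \
    -vnc :1 \
    -daemonize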

We considered a few options: Ganeti, OpenNebula and OpenStack.

Ganeti looked good, but the lack of a template system and media library let it down for us, and OpenNebula proved a bit fiddly as well.  OpenStack is a behemoth, however, and will take quite a bit of research.

Storage

One factor stood out like a sore thumb: our initial plan was to have just compute nodes, with shared storage between them.  There were a couple of options for doing this, such as having the nodes in pairs with DRBD, or using Ceph or Sheepdog, etc… but by far the most common approach was to have a storage backend on a SAN.

SANs get very expensive very quickly.  Nice hardware, but overkill and over budget.  We figured we could plan for that eventuality should the need arise, but it'd be a later addition.  We don't need blistering speed; if we can sustain 160Mbps of throughput, that'd probably be fine for most things.

Reading the literature, Ceph looked far and away the best choice, but it had a catch: you can't run Ceph server daemons and Ceph in-kernel clients on the same host.  Doing so runs the risk of a deadlock, in much the same manner as NFS does when you mount from localhost.

OpenStack actually has 3 types of storage:

  • Ephemeral storage
  • Block storage
  • Image storage

Ephemeral storage is specific to a given virtual machine.  It often lives on the compute node with the VM, or on a back-end storage system, and stores data temporarily for the life of a virtual machine instance.  When a VM instance is created, new copies of ephemeral block devices are created from images stored in image storage.  Once the virtual machine is terminated, these ephemeral block devices are deleted.

Block storage is the persistent storage for a given VM.  Say you were running a mail server … your OS and configuration might exist on an ephemeral device, but your mail would sit on a block device.

Image storage is simply a collection of raw images of block devices.  It cannot be mounted as a block device directly; rather, the storage area is used as a repository that is read from when creating the other two types of storage.
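
In OpenStack terms the image store is fronted by the glance service: you upload an image once, and instances are then cloned from it.  Roughly like this; the image name and file are placeholders, the flags are from my reading of the current (Grizzly-era) client, and the usual OS_* credentials are assumed to be set:

glance image-create --name 'ubuntu-12.04-server' \
    --disk-format qcow2 --container-format bare \
    --is-public True < precise-server-cloudimg-amd64-disk1.img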

Ephemeral storage in OpenStack is managed by the compute node itself, often using LVM on a local block device.  There is no redundancy as it’s considered to be temporary data only.

For block storage, OpenStack provides a service called cinder.  This, at its heart, seems to use LVM as well, and exports the block devices over iSCSI.
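
The LVM side at least is familiar territory: the reference cinder driver carves logical volumes out of a volume group on the node and exports each one as an iSCSI target.  Preparing that volume group is about as simple as it gets (the device name is an example; cinder-volumes is the conventional default name):

# give cinder a volume group to carve volumes out of
pvcreate /dev/sdb
vgcreate cinder-volumes /dev/sdb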

For image storage, OpenStack has a redundant storage system called swift.  The basis for this seems to be rsync, with a service called swift-proxy providing a REST interface over HTTP.  swift-proxy is very network intensive, and benefits from high-speed networking (e.g. 10Gbps Ethernet).

Hardware: second attempt

After researching how storage works in OpenStack, it became clear that a single building block would not do.  There would in fact be two other types of node: storage nodes and management nodes.

The storage nodes would contain largish spinning disks, with software maintaining copies and load balancing between all nodes.

The management nodes would contain the high-speed networking, and would provide services such as Ceph monitors (if we use Ceph), swift-proxy and other core functions.  RabbitMQ and the core database would run here for example.

Without the need for big storage, the compute nodes could be downsized in disk, and expanded in RAM.  So we now had a network that looked like this:

Compute nodes:
  Motherboard: Intel DQ77KB Mini ITX
  CPU: Intel Core i3-3220T 2.8GHz Dual-Core
  RAM: 2*8GB SODIMM
  Storage: Intel 520S 60GB SSD
  Networking: Onboard dual gigabit for cluster, PCIe Realtek RTL8168 adaptor for client-facing network

Management nodes:
  Motherboard: Intel DQ77MH Micro ATX
  CPU: Intel Core i3-3220T 2.8GHz Dual-Core
  RAM: 2*4GB DIMM
  Storage: Intel 520S 60GB SSD
  Networking: Onboard dual gigabit for management, PCIe 10GbE for cluster communications

Storage nodes:
  Motherboard: Intel DQ77MH Micro ATX
  CPU: Intel Core i3-3220T 2.8GHz Dual-Core
  RAM: 2*4GB DIMM
  Storage: Intel 520S 60GB SSD for OS, 2*Seagate ST3000VX000-1CU1 3TB HDDs for data
  Networking: Onboard dual gigabit for cluster, PCIe Realtek RTL8168 adaptor for management

The management and storage nodes are slightly tweaked versions of what we use for compute nodes. The motherboard is basically the same chipset, but capable of taking larger PCIe cards and using a standard ATX power supply.

Since we’re not storing much on the compute nodes, we’ve gone for 60GB SSDs rather than 240GB SSDs to cut the cost down a little. We might have to look at 120GB SSDs in newer nodes, or maybe look at other options, as Intel seem to have discontinued the 60GB 520S … bless them! The Intel 520S SSDs were chosen due to the 5-year warranty offered.

The management and storage nodes, rather than going into small Mini-ITX media-centre style cases, are put in larger 2U rackmount cases. These cases have room for 4 HDDs, in theory.

Deployment

For testing purposes, we got two of each node. This lets us test things like what happens if a node goes belly up (by yanking its power), as well as load balancing when everything is working properly.

We haven’t bought the 10GbE cards at this stage, as we’re not sure exactly which ones to get (we have a Cisco SG500X switch to plug them into) and they’re expensive.

The final cluster will have at least 3 storage nodes, 3 management nodes and maybe as many as 16 compute nodes. I say at least 3 storage nodes — in buying the test hardware, I accidentally ordered 7 cases, and so we might decide to build an extra storage node.

Each of those gives us 6TB of storage, and the production plan is to load balance with a replica on at least 3 nodes… so we can survive any two going belly up. The disks also push close to 800Mbps of throughput, so with 3 nodes serving up data, that should be enough to saturate the dual-gigabit link on a compute node. With 3-way replication, usable capacity is roughly the raw 6TB per node multiplied by the node count and divided by 3, so 4 nodes would give us 8TB of effective storage.

With so many nodes though, one problem remains: deploying the configuration and managing it all. We're using Ubuntu as our base platform, and so it makes sense to tap into their technologies for deployment.

We’ll be looking to use Ubuntu Cloud and Juju to manage the deployment.

Ubuntu Cloud itself is a packaged version of OpenStack.  The components of OpenStack are deployed with Juju.  Juju itself can deploy services either to “public clouds” like Amazon AWS, or to one’s own private cluster using Ubuntu MAAS (Metal As A Service).

Metal As A Service itself is basically a deployment system: it network-boots client machines and automatically installs and configures Ubuntu on them.

The underlying technology is based on a few components: the dnsmasq DHCP/DNS server, the tftp-hpa TFTP server, and a web service API which serves the configuration up to the installer.  There's a web interface for managing it all.  Once nodes are installed, you then deploy services using Juju (the word juju apparently translates to "magic").
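
From what I've read so far, the basic Juju workflow is only a handful of commands.  The charm names below are the canonical demo pair from the documentation rather than anything we'll actually be deploying, so take this as a sketch of the shape of it:

juju bootstrap                     # claims a node from MAAS and sets up the environment
juju deploy mysql                  # each "charm" becomes a service on its own machine
juju deploy wordpress
juju add-relation wordpress mysql  # wire the two services together
juju expose wordpress              # make it reachable from outside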

Further research

So, having worked out what hardware will likely be needed, I still need to research a few things.

Firstly, the storage mechanism: we can either go with the pure OpenStack approach, with cinder managing LVM-based storage and exporting it over iSCSI, or we can have cinder manage a Ceph back-end storage cluster.  This decision has not yet been made.  My two biggest concerns with cinder are:

  • Does cinder manage multiple replicas of block storage?
  • Does cinder try to load-balance between replicas?

With image storage, if we use Ceph, we have two choices.  We can either:

  • Install Swift on the storage nodes, partition the drives and use some of the storage for Swift, and the rest for Ceph… with Swift-proxy on the management nodes.
  • Install RADOS Gateway on the management nodes in place of Swift.

But which is the better approach?  My understanding is that Ceph doesn’t fully integrate into the OpenStack identity service (called keystone).  I need to find out if this matters much, or whether splitting storage between Swift and Ceph might be better.

Metal As A Service seems great in concept.  I've been researching OpenStack and Ceph for a few months now (with numerous interruptions), and I'm starting to get a picture of how it all fits together.  The next step is to understand MAAS and Juju.  I don't mind magic in entertainment, but I do not like it in my systems, so my first step will be to understand MAAS and Juju at a low level.

Crucially, I want to figure out how one customises the image provided by MAAS… in particular, making sure it deploys to the 60GB SSD on each node, and not just the first block device it sees.

The storage nodes have their two 6Gbps SATA ports connected to the 3TB HDDs for performance, making them visible as /dev/sda and /dev/sdb; MAAS needs to understand that the disk it should deploy to is called /dev/sdc in this case.  I'd also prefer it to use XFS rather than EXT4, and a user called something other than "ubuntu".  These are things I'd like to work out how to configure.

As for Juju, I need to work out exactly what it does when it “bootstraps” itself.  When I tried it last, it randomly picked a compute node.  I’d be happier if it deployed itself to the management node I ran it from.  I also need to figure out how it picks out nodes and deploys the application.  My quick testing with it had me asking it to deploy all the OpenStack components, only to have it sit there doing nothing… so clearly I missed something in the docs.  How is it supposed to work?  I’ll need to find out.  It certainly isn’t this simple.