Making Apache Cassandra on IBM Cloud Kubernetes Production Ready— Part I

The following is a continuation of a previous article. The first part can be found here. This article is fairly technical, so if you’re interested in just the end result, skip to the bottom.
Apache Cassandra is a fast and reliable document storage database and really satisfies the need for globally distributed data with horizontal scalability. In the past, one had to provision bare metal or virtual servers and (hopefully) leverage some automation like Chef or SaltStack to make scaling out easier. Technologies such as containers and Kubernetes bring an interesting opportunity to the table when it comes to database applications like Cassandra. In theory, Cassandra could be deployed to Kubernetes in such a way that scaling out could be as easy as clicking a button; not to mention you can deploy it right along side your other containerized applications on the same infrastructure. In practice, this can be quite a challenge.
There are several examples published online that describe how to get Cassandra up and running in Kubernetes. However, very few actually detail how to take it beyond just getting a small cluster up. There are networking challenges as well as operational challenges (i.e. monitoring, backup, etc.) that arise when globally scaling a database application such as Apache Cassandra in Kubernetes in Production. The requirements we’re trying to solve for are:

  • Needs to be accessible inside and outside of the Kubernetes cluster. We have systems and applications that need to access the database, some of which are not running in Kubernetes.
  • Needs to support running in multiple Kubernetes clusters in different data centers and/or regions. We don’t have the option (yet) in IBM Cloud Kubernetes to extend a single Kubernetes cluster across regions. This also comes into the picture if you’re thinking of a hybrid cloud approach.
  • Needs automated maintenance and backup procedures to run.
  • Needs to support scaling out.

We’ll start with a simple 3 node setup and then introduce the above requirements and challenges until we get to the final solution.

A Simple Setup

Getting a simple setup running is pretty easy and straightforward. I’ve described the details here. What this gives you is a 3 node cluster that is only accessible within the Kubernetes cluster it was created in and doesn’t meet any of the goals mentioned above. It does however, give you a repeatable solution to deploying new clusters.

Access From the Outside

Kubernetes already provides a solution for applications that require clients outside the cluster to communicate with containers running inside the cluster. Various networking options exist, but the most widely used is a Kubernetes service of type LoadBalancer. This tells Kubernetes to leverage a 3rd party, cloud provider plugin to associate an external IP to a set of internal pods, and load balance amongst them. You can read more about LoadBalancer type services here. Cassandra isn’t like most applications though, each node in the cluster can, and should be communicated with by the client. It’s quite possible to use a simple load balanced service in Kubernetes, which selects a single node for a client to communicate with. However, this isn’t optimal because some client drivers will leverage the cluster topology and token rings to communicate with the individual nodes directly. So the question becomes, how do we associate an external IP to each and every Cassandra node in the cluster?
I’m sorry to disappoint, but the answer isn’t ground breaking or anything. You just create a separate LoadBalancer service for each node, where that node is the only node in the service.



I recognize that repeating the same configuration for each node isn’t ideal from a maintenance perspective. However, if you are using something like Helm (and you should!), this is not a problem. The ports exposed are the CQL port (for native clients) and the inter-node communication port. There are a few things of interest here though.

  • In order to target a single pod in our StatefulSet, we have to tell the service the pod name to target. This is done with the statefulset.kubernetes.io/pod-name: cassandra-{x} selector.
  • The publishNotReadyAddresses: true and service.alpha.kubernetes.io/tolerate-unready-endpoints: “true” are both the same setting, with the former being the latest way to specify the setting, and the latter being the deprecated way. I’ll talk more about why this is needed later when we scale to multiple clusters, but this essentially says to ignore the readiness of the pod, and just make the pod available in the load balancer no matter what.
  • In IBM Cloud, you can define load balancer IPs as either public or private. In our case, we’re keeping them private so the cluster is only available from within our network boundary, and not to the outside world. Each cloud provider will have its own way to specify this, but in IBM Cloud, it’s done with service.kubernetes.io/ibm-load-balancer-cloud-provider-ip-type: private .

Just making the Cassandra nodes available outside the cluster isn’t enough here. Cassandra nodes have to be aware that they have an external IP address that they need to advertise (vs the internal Pod network IP), this is known as the broadcast address. So the problem is how do we pass the broadcast address information to each node so that it knows what broadcast address to set? For this, we had to modify the original Docker image bu editing thedocker-entrypoint.sh file to read a passed in environment variable CASSANDRA_BROADCAST_ADDRESS_LIST. The script uses the index in the hostname to obtain an IP in the list. For example, if the current node’s hostname is cassandra-2, and CASSANDRA_BROADCAST_ADDRESS_LIST is

10.0.0.10
10.0.0.11
10.0.0.12
10.0.0.13
10.0.0.14
then that node would select 10.0.0.12 . StatefulSets will assign a zero-based index to each pod, and the name of the pod and index become the pod’s hostname. Below is the updated docker-entrypoint.sh script and Dockerfile.



Lines 42–54 in the docker-entrypoint.sh file above, are the additions to the original script. Unfortunately you’ll have to build a new Docker image with the updated script, I’ve provided that above as well.

Deploying Across Kubernetes Clusters

Cassandra inter-node communication requires that all nodes across all data centers, to be able to talk freely with each other. This isn’t a big deal if your nodes are all on the same Kubernetes Pod Network, but can be a real head scratcher if you have different Kubernetes deployments that don’t know about each other. There are some interesting things that could be explored here, such as Federation, or even adding routes on the Kubernetes worker nodes to route traffic to nodes in other clusters. However, the federation approach is still considered in alpha, and the routing approach requires manually configuring the pod network ranges on each cluster such that there isn’t any overlapping IP space. This level of configuration may or may not be available in your cloud provider, and certainly isn’t in IBM Cloud Kubernetes Service.

The solution for this is the configuration described in the previous section; exposing each node to the world outside the Kubernetes cluster. If each node is available outside the cluster, and the external IP space is routable between the Kubernetes cluster, then each node can talk to every other node. In the case of IBM Cloud, you can turn on a setting known as VLAN Spanning that allows networks from different data centers to route between each other. If your cloud provider doesn’t support this, or you’re taking a Hybrid Cloud model, you can expose the nodes on the public address space, and use TLS and IP white listing as security controls.

Considerations for Rolling Updates and Scaling Out

When using a StatefulSet and scaling out, Kubernetes will bring up pods in a predetermined order, and when scaling in, it will shut them down in reverse order. Kubernetes will also update each pod in reverse order when performing a rolling update. By default, Kubernetes will assume that once the container has started, it can proceed to the next pod in the StatefulSet. This doesn’t work well for Cassandra because we don’t want to go on to the next pod until Cassandra says the current pod is ready and has successfully bootstrapped. Kubernetes provides a mechanism for this situation called Readiness Probes. To take advantage of this feature, we will include a script in the container that obtains the current node’s ID, and then uses that ID to get the status of the node and verify that it’s UP and NORMAL (UN). The script returns 0 if things are good, -1 if things are bad. Kubernetes will now wait until the node is UN before proceeding to the next node.

Now that Kubernetes can understand when the Cassandra pod is ready, we have one other thing to consider. Kubernetes Services by default use the “readiness” status of a pod to determine if they should route traffic to that pod. As mentioned earlier, we’re using a Kubernetes service for each Cassandra pod to allow it to communicate with nodes outside the cluster. If this service is the primary mechanism by which nodes communicate, and nodes require this communication in order to become ready, how can they become ready if they can’t communicate? It’s a chicken and egg problem. Nodes can’t communicate with each other until they’re considered ready, and they can’t become ready until they communicate with each other.

We break this cycle by including a few parameters in the service, the publishNotReadyAddresses: true and service.alpha.kubernetes.io/tolerate-unready-endpoints: “true” parameters. This tells Kubernetes to go ahead and route traffic to the pods even though the pods aren’t ready. This allows the Cassandra nodes to bootstrap before being considered “ready”.

Putting it All Together

I’ve put everything together into a Helm Chart that can be used to demonstrate the ideas presented here. The updated Docker image can be found here, where the files used to create the image are in the github repository referenced below.

No comments:

Post a Comment