Sunday, July 5, 2015

Cassandra DataStax Community AMI on Amazon EC2 - Explained

Hi,
After managing our Cassandra cluster for a while, I'd like to share some information about where the configuration lives and about basic management of the cluster.
I'm mainly doing this because it took me a while to find all of the components, and I think it will save others a substantial amount of time in the future.



The DataStax Community AMI on AWS comes with several components:
  1. OpsCenter - A monitoring tool that shows the metrics of your Cassandra clusters. You can manage multiple clusters from a single OpsCenter installation, and you choose whether OpsCenter is installed when you launch the AMI instance.
    It gives you a convenient view into the internals of your Cassandra cluster's performance.
    Great tool by DataStax, by the way, and free for the basic actions.
    Let’s say we’ve created a 3-node cluster: cassandra-1, cassandra-2, cassandra-3.
    On cassandra-1 we have OpsCenter installed.
    The OpsCenter link will be: http://cassandra-1:8888/opscenter/index.html
  2. Cassandra - The actual Cassandra node process.
  3. DataStax Agent - A service running on each node that reports Cassandra's metrics to the OpsCenter instance.

You can restart the cluster via "Restart" in the upper right corner of OpsCenter (it does this gracefully, node by node).
(Note: Do not change the Cassandra node configuration from OpsCenter; it will override your changes and rewrite cassandra.yaml in a bad format.)
This is where not to make changes:
When you click a specific node, you will see the "Actions" menu -> "Configure".
You can view the configuration there, but don't save any changes, for your own good!
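If you ever need the same node-by-node restart from a shell, the idea can be sketched as below. This is a minimal sketch, not how OpsCenter implements it: it assumes passwordless ssh and passwordless sudo on each node, and that running `nodetool drain` before each restart is acceptable. `rolling_restart` and the `RUN` override are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical rolling-restart helper: one node at a time, like OpsCenter's "Restart".
# Assumes passwordless ssh/sudo on each node; RUN can be overridden for a dry run.
RUN="${RUN:-ssh}"

rolling_restart() {
  local node tries
  for node in "$@"; do
    # Flush memtables and stop accepting writes, then restart the service.
    $RUN "$node" "nodetool drain && sudo service cassandra restart" || return 1
    # Poll every 10 s (at most 30 times) until gossip reports "running" again.
    tries=0
    until $RUN "$node" "nodetool statusgossip" 2>/dev/null | grep -qx running; do
      tries=$((tries + 1))
      if [ "$tries" -ge 30 ]; then
        echo "timed out waiting for $node" >&2
        return 1
      fi
      sleep 10
    done
    echo "$node is back" >&2
  done
}
```

For example: rolling_restart cassandra-1 cassandra-2 cassandra-3. Waiting for gossip on each node before moving to the next is what keeps only one node down at a time.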
There are 3 relevant Linux services running on your instance that you should know about:
1) “cassandra” - The Cassandra node's process (version 2.1.7 at the time of writing, the latest release)
2) “datastax-agent” - Collects metrics and reports them to OpsCenter; it is what makes the administration actions possible. (Version 5.1.3 at the time of writing)
3) “opscenterd” - Runs only on “cassandra-1” in our case and serves the OpsCenter web UI. (Version 5.1.3 at the time of writing, the latest release)
Administration options on each node:
(Service names, written as $SERVICE below: cassandra, datastax-agent, opscenterd [relevant only to cassandra-1])
Stop: sudo service $SERVICE stop
Start: sudo service $SERVICE start
Status: sudo service $SERVICE status
Restart: sudo service $SERVICE restart
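To run one of these actions on all of a node's services at once, a small wrapper can help. This is a hypothetical helper (the function name and the `SVC` override are illustrative only); by default it acts on cassandra and datastax-agent, and you pass opscenterd explicitly on cassandra-1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply one service action to the DataStax services on this node.
# SVC can be overridden (e.g. SVC=echo) for a dry run.
SVC="${SVC:-sudo service}"

ds_services() {
  local action="${1:-}" s
  case "$action" in
    start|stop|status|restart) ;;
    *) echo "usage: ds_services start|stop|status|restart [service...]" >&2; return 2 ;;
  esac
  shift
  # With no service names, act on the services that exist on every node;
  # pass opscenterd explicitly on the OpsCenter node (cassandra-1 here).
  local services=("$@")
  if [ "${#services[@]}" -eq 0 ]; then
    services=(cassandra datastax-agent)
  fi
  for s in "${services[@]}"; do
    # $SVC is left unquoted on purpose so "sudo service" splits into two words.
    $SVC "$s" "$action" || echo "warning: '$SVC $s $action' failed" >&2
  done
}
```

For example, on cassandra-1: ds_services status cassandra datastax-agent opscenterd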

OpsCenter View: (Popup menu on the right side)

Nodes: Status of the Cassandra nodes, and access to each node separately.
Activities: Ongoing actions on the nodes of the cluster.
Data: Keyspaces and their column families.
Locations of the relevant files (the same on each node):
Data Directories: /var/lib/cassandra
Log Directories:
Cassandra: /var/log/cassandra (File: system.log)
DataStax Agent: /var/log/datastax-agent (File: agent.log)
Runtime Files: /var/run/cassandra
Cassandra Jars:
/usr/share/cassandra
/usr/share/cassandra/lib
Bin Files:
/usr/bin
/usr/sbin
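To verify these locations on a given node, a quick existence check can be sketched in shell. The function name and the `ROOT` prefix (handy for testing against a copied tree) are assumptions for illustration; the paths are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the per-node locations above exist.
# ROOT is an optional prefix (e.g. a copied tree for testing); empty means the real "/".
ROOT="${ROOT:-}"

check_locations() {
  local p
  for p in /var/lib/cassandra \
           /var/log/cassandra/system.log \
           /var/log/datastax-agent/agent.log \
           /var/run/cassandra \
           /usr/share/cassandra \
           /usr/share/cassandra/lib; do
    if [ -e "$ROOT$p" ]; then
      echo "ok      $p"
    else
      echo "missing $p"
    fi
  done
}
```

Keep in mind that the log files and /var/run/cassandra may legitimately be missing on a node whose services have never been started.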

Configuration Files:

/etc/cassandra - (The important file is: cassandra.yaml)

Important entries in the cassandra.yaml:

"cluster_name: 'your-cluster-name'" - Defines which cluster the node belongs to. It must be the same on all of the nodes of the same cluster; this is a logical name.
seed_provider:
   - class_name: org.apache.cassandra.locator.SimpleSeedProvider
     parameters:
         - seeds: "cassandra-1,cassandra-2,cassandra-3"
The seeds are the nodes that the other nodes contact to discover the cluster and synchronize the ring; list all of the access points you want to serve that role.
"listen_address: cassandra-1" - Should be set on each node to its own private IP (shown here for the first node).
"broadcast_rpc_address: cassandra-1" - Should be set on each node to its own private IP.
"endpoint_snitch: Ec2Snitch" - Uses the Amazon EC2 snitch, which derives the network topology from EC2: the region is treated as the data center and the availability zone as the rack.

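Since these entries have to be consistent across the cluster (cluster_name and seeds everywhere, listen_address and broadcast_rpc_address per node), it helps to print them on every node and compare. A minimal sketch, assuming the file is at /etc/cassandra/cassandra.yaml and that the entries are simple one-line `key: value` pairs; it is not a YAML parser, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the key cassandra.yaml entries for cross-node comparison.
# Handles only simple one-line "key: value" entries, not full YAML.
CASSANDRA_YAML="${CASSANDRA_YAML:-/etc/cassandra/cassandra.yaml}"

yaml_value() {
  # $1 = top-level key; commented-out lines do not match the ^key: anchor.
  # The second sed strips surrounding single or double quotes from the value.
  sed -n "s/^$1:[[:space:]]*//p" "$CASSANDRA_YAML" \
    | sed -e "s/^[\"']//" -e "s/[\"'][[:space:]]*\$//"
}

yaml_seeds() {
  # The seeds line is nested under seed_provider -> parameters.
  sed -n 's/^[[:space:]]*- seeds:[[:space:]]*//p' "$CASSANDRA_YAML" | tr -d "\"'"
}

show_key_settings() {
  local key
  for key in cluster_name listen_address broadcast_rpc_address endpoint_snitch; do
    printf '%s=%s\n' "$key" "$(yaml_value "$key")"
  done
  printf 'seeds=%s\n' "$(yaml_seeds)"
}
```

Run show_key_settings on each node: cluster_name, endpoint_snitch and seeds should match everywhere, while listen_address and broadcast_rpc_address should differ per node.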
/etc/datastax-agent/datastax-agent-env.sh - DataStax agent configuration
Service startup script - /etc/init.d/cassandra
Cassandra user limits - /etc/security/limits.d
Cassandra defaults - /etc/default/cassandra
"nodetool" - Cassandra's control and information tool (exists on each node).
"nodetool help" - displays all of the possible commands.

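A common use is "nodetool status", whose first column is a two-letter code: U or D (Up/Down) followed by N, L, J or M (Normal/Leaving/Joining/Moving). A small filter can turn that into a one-line health summary. This is a sketch with a hypothetical function name; it reads the status output on stdin and relies only on the first two columns (status code and address).

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize "nodetool status" output read from stdin.
# Only rows whose first field is a two-letter state code are counted.
summarize_status() {
  awk '$1 ~ /^[UD][NLJM]$/ { total++; if ($1 == "UN") up++; else other = other " " $2 "(" $1 ")" }
       END { printf "%d/%d nodes UN%s\n", up, total, (other == "" ? "" : ", not UN:" other) }'
}
```

For example: nodetool status | summarize_status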
Upgrading versions (minor) -
(Assuming we are running on an Ubuntu machine)
After you’ve launched an instance of the AMI, you might need to upgrade one of the services.
To do so, run the steps below, where:
$SERVICE => the package you want to upgrade (the packages are named like the services, except OpsCenter, whose package is "opscenter" while its service is "opscenterd").
1) "sudo apt-get update" - Updates the repository listing of available versions.
2) "sudo apt-cache policy $SERVICE" - Shows the currently installed version and all of the available versions.
3) "sudo apt-get upgrade $SERVICE" - Upgrades the specific package and all of its dependencies.
(Note: OpsCenter depends on datastax-agent, which in turn depends on cassandra, so upgrading OpsCenter upgrades all of them together.)
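The three steps can be wrapped in a small function. This is a sketch under assumptions: it takes the package name as an argument, the `APT_GET`/`APT_CACHE` overrides exist only for dry runs, and for step 3 it swaps in `apt-get install --only-upgrade`, which upgrades the named package only if it is already installed, instead of the `apt-get upgrade $SERVICE` form above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the minor-upgrade steps above (Ubuntu / apt).
# APT_GET and APT_CACHE can be overridden (e.g. with "echo") for a dry run.
APT_GET="${APT_GET:-sudo apt-get}"
APT_CACHE="${APT_CACHE:-sudo apt-cache}"

upgrade_package() {
  local pkg="${1:-}"
  if [ -z "$pkg" ]; then
    echo "usage: upgrade_package <package>" >&2
    return 2
  fi
  # 1) Refresh the repository listing.
  $APT_GET update || return 1
  # 2) Show the installed and candidate versions before touching anything.
  $APT_CACHE policy "$pkg" || return 1
  # 3) Upgrade only this (already installed) package; apt stays interactive,
  #    so you still get to review the configuration-file prompts yourself.
  $APT_GET install --only-upgrade "$pkg"
}
```

For example: upgrade_package datastax-agent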

Another important thing - Configuration changes

When you upgrade the services, once everything is installed, the installer will try to merge the configuration files and override the settings you are currently running with.
Be careful, and don't accept all of the changes.
Go over them with the "D" option, which shows the differences ("N" is the default and declines the change, keeping your current file), note what the installation wants to change, and then apply those changes by hand if needed.
Check the Cassandra changelog of the version you are installing to know what's needed.
Otherwise, all of your server configuration will be overridden!
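A simple safety net is to copy the configuration aside before upgrading and diff it afterwards, so you can see exactly what the installer changed. A minimal sketch, assuming GNU cp (for `--parents`), absolute paths, and root privileges when run against /etc; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: back up configuration before an upgrade, diff it afterwards.
# Default paths are the ones listed above; pass other absolute paths as extra arguments.
backup_configs() {
  local dest="${1:-}" p
  if [ -z "$dest" ]; then
    echo "usage: backup_configs <dest-dir> [absolute-path...]" >&2
    return 2
  fi
  shift
  local paths=("$@")
  if [ "${#paths[@]}" -eq 0 ]; then
    paths=(/etc/cassandra /etc/datastax-agent /etc/default/cassandra)
  fi
  mkdir -p "$dest" || return 1
  for p in "${paths[@]}"; do
    # --parents (GNU cp) keeps the full path, e.g. $dest/etc/cassandra/cassandra.yaml
    if [ -e "$p" ]; then
      cp -a --parents "$p" "$dest" || return 1
    fi
  done
}

diff_configs() {
  # Show what changed since the backup; diff exits with 1 when differences exist.
  local dest="$1" p="$2"
  diff -ru "$dest$p" "$p"
}
```

For example: backup_configs /root/conf-before-upgrade before the upgrade, then diff_configs /root/conf-before-upgrade /etc/cassandra afterwards.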

This post gave a short overview of the file locations and some basic Cassandra handling in the AWS environment.
If you have any further questions, feel free to comment, and I hope I will be able to help.
Have fun with the info :)