Sunday, December 30, 2012

IPMP on Solaris 11



With the introduction of Solaris 11, network configuration is managed by network configuration profiles (NCPs). There are two types of profile that can be implemented: fixed, which is configured statically, and reactive, which is configured dynamically. The network configuration commands are ipadm and dladm, which offer many new features such as link aggregation, VLAN tagging, IP tunneling, bridging, and IPMP; the steps involved in configuring them are also fewer, which reduces the burden on system administrators.
I just wanted to touch on configuring a few of these features.

******
IPMP
******

IPMP (IP multipathing) groups multiple network interfaces into a single logical interface. IPMP supports two types of failure detection: link-based (layer 2) and probe-based (layer 3).
The IPMP feature lets us spread data addresses across the interfaces (active-active) and provides transparent failover of access (active-passive). This gives us high availability on our network interfaces when there is a failure, and lets us fail an interface over when we are doing maintenance.


A. Link-Based IPMP

===========
Active/Active
===========

1. Check that the Automatic network configuration profile is disabled and the DefaultFixed profile is enabled (if it is not, see the note after the output below).

root@suntest:~# netadm list
TYPE        PROFILE        STATE
ncp         Automatic      disabled
ncp         DefaultFixed   online
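If the Automatic profile is still the active one on your system, you can switch over to the fixed profile before configuring IPMP; a minimal sketch, assuming the DefaultFixed NCP exists as shown above:

root@suntest:~# netadm enable -p ncp DefaultFixed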

2. List all the available physical interfaces.

root@suntest:~# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net1              Ethernet             up         1000   full      e1000g1
net0              Ethernet             up         1000   full      e1000g0

3. Add the IP to /etc/hosts so that the hostname mapping remains persistent across reboots.

root@suntest:~# echo "192.168.75.25 testipmp0" >> /etc/hosts

4. Create the IPMP group and add the interfaces to it.

root@suntest:~# ipadm create-ipmp ipmp0
root@suntest:~# ipadm create-ip net0
root@suntest:~# ipadm create-ip net1
root@suntest:~# ipadm add-ipmp -i net0 -i net1 ipmp0

5. Assign the IP address to the IPMP interface that was just configured.

root@suntest:~# ipadm create-addr -T static -a 192.168.75.21/24 ipmp0/v4

6. Review the configuration.

root@suntest:~# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       ipmp0       ok        --        net1 net0

root@suntest:~# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
suntest                   up     ipmp0       net1        net1 net0

root@suntest:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
net0        yes     ipmp0       -------   up        disabled  ok
net1        yes     ipmp0       --mbM--   up        disabled  ok

As you can see, both network interfaces are active here and are part of the IPMP group.
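To verify that failover actually works, you can temporarily disable one of the underlying interfaces and watch the group recover; a quick sketch of one way to test it:

root@suntest:~# ipadm disable-if -t net0
root@suntest:~# ipmpstat -i
root@suntest:~# ipadm enable-if -t net0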

===============
Active/Passive
===============

To configure an Active/Passive link-based IPMP group, perform the same steps 1 to 4 and then:

5. Mark one of the interfaces in the IPMP group as standby and then assign the IP.

root@suntest:~# ipadm set-ifprop -p standby=on -m ip net1
root@suntest:~# ipadm create-addr -T static -a 192.168.75.21/24 ipmp0/v4

6. Review the configuration.

root@suntest:~# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       ipmp0       ok        --        net0 (net1)

root@suntest:~# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
suntest                   up     ipmp0       net0        net0

root@suntest:~# ipmpstat -in
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
net1        no      ipmp0       is-----   up        disabled  ok
net0        yes     ipmp0       --mbM--   up        disabled  ok
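If you later need to take an interface out of the group for maintenance, one way to do it (a sketch, nothing more) is to remove it from the IPMP group and add it back afterwards:

root@suntest:~# ipadm remove-ipmp -i net1 ipmp0
root@suntest:~# ipadm add-ipmp -i net1 ipmp0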


B. Probe-Based IPMP

Probe-based IPMP in Solaris 11 comes in two flavours: probing with test IP addresses on the interfaces, and transitive probing.

*** Configuring Probe-based with Test addresses (Active/Active) ***

1. Confirm that transitive probing is not enabled; the default is to use test addresses.

root@suntest:~# svccfg -s svc:/network/ipmp listprop config/transitive-probing
config/transitive-probing boolean     false

2. Create the IPMP group and add the interfaces.

root@suntest:~# ipadm create-ipmp ipmp0
root@suntest:~# ipadm create-ip net0
root@suntest:~# ipadm create-ip net1
root@suntest:~# ipadm add-ipmp -i net0 -i net1 ipmp0

3. Assign the IP to the IPMP interface and test addresses to the physical interfaces.

root@suntest:~# ipadm create-addr -T static -a 192.168.75.21/24 ipmp0/v4
root@suntest:~# ipadm create-addr -T static -a 192.168.75.2/24 net0/test1
root@suntest:~# ipadm create-addr -T static -a 192.168.75.3/24 net1/test2

4. Add a target that the interfaces will probe (a note on specifying explicit probe targets follows the command).

root@suntest:~# route add default 192.168.75.1
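If you would rather not rely on the default route as the probe target, Solaris also lets you define explicit probe targets as static host routes; a sketch, assuming 192.168.75.1 is a reachable system on the same subnet:

root@suntest:~# route -p add -host 192.168.75.1 192.168.75.1 -static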

5. Review whether everything is working as expected.

root@suntest:~# ipmpstat -an
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
192.168.75.21             up     ipmp0       net0        net1 net0

root@suntest:~# ipmpstat -tn
INTERFACE   MODE       TESTADDR            TARGETS
net1        routes     192.168.75.3        192.168.75.1
net0        routes     192.168.75.2        192.168.75.1

root@suntest:~# ipmpstat -in
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
net1        yes     ipmp0       -------   up        ok        ok
net0        yes     ipmp0       --mbM--   up        ok        ok
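The probe failure detection time (the FDT column shown by ipmpstat -g) defaults to 10 seconds. On Solaris 11 it is usually tuned through in.mpathd's configuration file and then restarting the IPMP service so that in.mpathd re-reads it; a rough sketch, assuming the file is in its default location of /etc/default/mpathd:

root@suntest:~# vi /etc/default/mpathd        (set FAILURE_DETECTION_TIME=5000 for 5 seconds)
root@suntest:~# svcadm restart svc:/network/ipmp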


To configure probe-based IPMP with test addresses (Active/Passive), just add one extra step after step 3:

root@suntest:~# ipadm set-ifprop -p standby=on -m ip net1

and the difference you will see is as below:

root@suntest:~# ipmpstat -in
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
net1        no      ipmp0       is-----   up        ok        ok
net0        yes     ipmp0       --mbM--   up        ok        ok

root@suntest:~# ipmpstat -an
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
192.168.75.21             up     ipmp0       net0        net0


*** Configuring Transitive Probing (Active/Active) ***

1. Enable transitive probing.

root@suntest:~# svccfg -s svc:/network/ipmp setprop config/transitive-probing=true
root@suntest:~# svcadm refresh ipmp

2. Create the IPMP group and add the interfaces.

root@suntest:~# ipadm create-ipmp ipmp0
root@suntest:~# ipadm create-ip net0
root@suntest:~# ipadm create-ip net1
root@suntest:~# ipadm add-ipmp -i net0 -i net1 ipmp0

3. Assign the IP to the IPMP interface.

root@suntest:~# ipadm create-addr -T static -a 192.168.75.21/24 ipmp0/v4

4. Review the configuration

root@sunclu1:~# ipmpstat -tn
INTERFACE   MODE       TESTADDR            TARGETS
net1        transitive <net1>              <net0>
net0        routes     192.168.75.21       192.168.75.1

root@sunclu1:~# ipmpstat -pn
TIME      INTERFACE   PROBE  NETRTT    RTT       RTTAVG    TARGET
0.77s     net1        t247   1.92ms    1.93ms    1.70ms    <net0>
0.77s     net0        i244   0.86ms    1.15ms    1.09ms    192.168.75.1
1.88s     net1        t248   1.68ms    1.69ms    1.70ms    <net0>
1.88s     net0        i245   0.73ms    1.21ms    1.11ms    192.168.75.1


Hope this document was helpful...


Thursday, December 13, 2012

Types of disks used in storage





The disks in today's storage environment are changing rapidly, with new types of disks available that deliver more throughput and come in smaller form factors, which lets us put more disks into a disk array; the disk capacities are also higher than those of their predecessors.
Let's dig into the disks that are available in storage platforms and understand the technology they are built on. Before that, let's touch on the basics that are important to consider when thinking about a disk.

A disk device has physical components and logical components. The physical components include disk platters and read/write heads. The logical components include disk slices, cylinders, tracks, and sectors.

All hard disk drives are composed of the same physical components; however, the quality of the parts inside the drive affects its performance. There are three important components that work together to give us the performance we want:
  1. Disk Platter
The disk platters are made of an aluminum or glass substrate which is then coated with a magnetic surface, and this is what actually lets us store data as magnetic bits. The platters in a drive are separated by disk spacers and are clamped to a rotating spindle that turns all the platters in unison; a motor mounted right below the spindle spins the platters at a constant rate, which is the RPM of the disk.
  2. Drive Heads
The disk drive heads read and write the data in these magnetic bits on the platter; there are usually two heads per platter, one on either side of the disk.
  3. Actuator Arm
All the heads are attached to a single head actuator, or actuator arm, that moves the heads across the platters.
As you can see, hard disks are built from electro-mechanical components, which have inherent performance limitations; this is what led to disks built on flash-based technology.
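To put that limitation in rough numbers (a back-of-the-envelope estimate, not a benchmark): a 15k RPM drive completes one rotation in 60/15000 s = 4 ms, so the average rotational latency is about 2 ms; add an average seek time of roughly 3.5 ms and a single drive tops out at around 1 / (0.002 + 0.0035) ≈ 180 random IOPS, no matter how fast the interface in front of it is.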

                                                                *******
                                                                   SSD
                                                                *******

The solid state disk, or SSD, is the fastest of the disks available. It is built using NAND-based flash memory and has none of the spinning parts of electromechanical disks, which lets it deliver fast performance and low latency. Some newer solid-state systems are designed to use solid state as primary storage while using spinning disks as less expensive storage for less active data.
Enterprise flash drives are solid-state drives (SSDs) that have been modified to meet the reliability required in an enterprise storage array, and they are widely used as the top tier of the automated storage tiering features on the latest storage arrays.

                                                                  ******
                                                                     FC
                                                                  ******

Fibre Channel is a hard disk drive interface technology designed primarily for high-speed, high volume data transfers in storage. Fibre Channel standards specify the physical characteristics of the cables, connectors, and signals that link devices.
Fibre Channel provides three topology options for connecting devices: point-to-point, arbitrated loop, and fabric (sometimes called “switched” or “switched fabric”).
Using a Fibre Channel arbitrated loop (FC-AL) we get speeds of 2 or 4 Gbps that are scalable (we can start with a basic array and extend the loop as needed) and reliable (superior data encoding and error checking), built for mission-critical environments.

                                                                   ******
                                                                      SAS
                                                                   ******
SAS is the logical evolution of SCSI. It keeps SCSI's long-established software advantages and adds a multi-channel, dual-port connection interface for enterprise storage, providing new levels of speed and connectivity while retaining the functionality and reliability that make SAS disks a good alternative to FC disks in enterprise storage platforms.
SAS disks are available in 10k and 15k RPM speeds, and the SCSI error-reporting and error-recovery commands on SAS are more functional than those on SATA drives.

                                                                      ******
                                                                       SATA
                                                                      ******

SATA technology was developed to replace the legacy desktop parallel ATA (PATA) interface. The SATA interface is designed to meet the requirements of entry-level to enterprise-level storage deployments. SATA I provides a point-to-point data transfer rate of 1.5 Gbps, while the newer SATA II disks give a data transfer rate of 3 Gbps; these disks rotate at 5400, 7200, and 10k RPM.
SATA disks offer better capacity than the other disk types. They may not give you the high-speed performance of SAS/FC/SSD disks, but they are of great use in environments with shared filesystems like NFS/CIFS/SMB, or in low-cost environments.
SATA drives use native command queuing, while SAS drives use tagged command queuing.


Wednesday, December 5, 2012

Installing PowerPath and changing the EMC pseudo device name


EMC PowerPath is one of the best multipathing software packages used on servers; not only is it stable, it is also very easy to use. Today I would like to show how to install it and how to change a pseudo device name. Sometimes, when you are working on servers where the LUNs are shared, you want the device names to be the same so that you can find the LUNs easily, which makes troubleshooting simpler.

The first thing you need is a powerlink.emc.com account to download the EMC PowerPath version supported for your operating system.

[root@linux01 ~]# rpm -ivh EMC/EMCPower.LINUX-5.6.0.00.00-143.RHEL5.x86_64.rpm
   Preparing...                         ########################################### [100%]
  1:EMCpower.LINUX         ########################################### [100%]
All trademarks used herein are the property of their respective owners.
NOTE:License registration is not required to manage the CLARiiON AX series array.

After installing the PowerPath package, the next thing you need to do is license the software to use its features.

[root@linux01 ~]# emcpreg -list
There are no license keys now registered.

[root@linux01 ~]# emcpreg -add XXXX-XXXX-XXXX-XXXX-XXXX-XXXX
1 key(s) successfully added.

[root@linux01 ~]# emcpreg -list
Key XXXX-XXXX-XXXX-XXXX-XXXX-XXXX
  Product: PowerPath
  Capabilities: All

Start the service, and if you have done the zoning, check for any new LUNs that are visible on the host HBAs.
[root@linux01 ~]# /etc/init.d/PowerPath start
Starting PowerPath:  done

[root@linux01 ~]# powermt check

[root@linux01 ~]# powermt display dev=all
Pseudo name=emcpowerb
CLARiiON ID=CKM00103100530 [Test-SG]
Logical device ID=600601604D3827004E67486CA534E211 [LUN 1]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0;
Owner: default=SP A, current=SP A       Array failover mode: 1
==============================================================================
--------------- Host ---------------   - Stor -   -- I/O Path --  -- Stats ---
###  HW Path               I/O Paths    Interf.   Mode    State   Q-IOs Errors
==============================================================================
   2 qla2xxx                  sdc       SP A1     active  alive       0      0
   2 qla2xxx                  sdf       SP B0     active  alive       0      0
   3 qla2xxx                  sdh       SP A0     active  alive       0      0
   3 qla2xxx                  sdj       SP B1     active  alive       0      0
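If LUNs presented after the zoning do not show up, you may need to rescan the HBAs and let PowerPath configure the new paths; a sketch, assuming the qla2xxx HBAs are host2 and host3 as suggested by the HW Path column above:

[root@linux01 ~]# echo "- - -" > /sys/class/scsi_host/host2/scan
[root@linux01 ~]# echo "- - -" > /sys/class/scsi_host/host3/scan
[root@linux01 ~]# powermt config
[root@linux01 ~]# powermt display dev=all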


At times you want the LUNs that are shared across servers to have the same pseudo name, so that you can recognize them when you want to increase their size or when you are troubleshooting.

[root@linux01 ~]# emcpadm getusedpseudos
PowerPath pseudo device names in use:
Pseudo Device Name      Major# Minor#
        emcpowera         120      0
        emcpowerb         120     16

After finding all the pseudo devices that are in use, find the next free pseudo device name that you can use.

[root@linux01 ~]# emcpadm getfreepseudos
Next free pseudo device name(s) from emcpowera are:
Pseudo Device Name      Major# Minor#
        emcpowerc         120     32

Now let's rename the device and check that the change is reflected.
[root@linux01 ~]# emcpadm renamepseudo -s emcpowera -t emcpowerc

[root@linux01 ~]# emcpadm getusedpseudos
PowerPath pseudo device names in use:

Pseudo Device Name      Major# Minor#
        emcpowerc         120      0
        emcpowerb         120     16
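It may also be worth saving the PowerPath configuration after the rename, so the mapping is kept in PowerPath's saved configuration file; a sketch using the default file location (/etc/powermt.custom):

[root@linux01 ~]# powermt save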

Sunday, December 2, 2012

Redhat Linux Cluster on RHEL 6.3

Today I am posting a two-node Red Hat Active-Passive failover cluster on RHEL 6.3 that I got to set up, and it works like a charm. I was really impressed with the way the Red Hat Cluster Suite works and how easy it is to install.


There are two critical parts to setting up a cluster, and both are equally important.

====================================
Part 1 - Steps to do before starting the installation:
====================================

Make sure that all the Red Hat Cluster Suite packages are installed (an example follows).
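If they are not already there, something like the following should pull in the pieces used in this post; package names assume RHEL 6 with the High Availability add-on channel available:

[root@node01 ~]# yum install -y ricci cman rgmanager ccs omping httpd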

Add the host entries to /etc/hosts on both nodes and set up passwordless SSH, which helps with copying files between them (see the sketch after the host entries).
[root@node01 ~]#cat /etc/hosts
192.168.10.2    node01.test.com        node01
192.168.10.3    node02.test.com        node02
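One simple way to set up the passwordless SSH mentioned above (run on node01, then repeat in the other direction from node02):

[root@node01 ~]# ssh-keygen -t rsa
[root@node01 ~]# ssh-copy-id root@node02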

You will have to configure the interconnect in such a way that the nodes can exchange multicast frames with each other; please run the commands below on both nodes simultaneously.

[root@node01 ~]#omping 192.168.10.3 192.168.10.2
192.168.10.3 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.10.3 :   unicast, seq=1, size=69 bytes, dist=0, time=0.131ms
192.168.10.3 : multicast, seq=1, size=69 bytes, dist=0, time=0.174ms

[root@node02 ~]# omping 192.168.10.2 192.168.10.3
192.168.10.2 : waiting for response msg
192.168.10.2 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.10.2 :   unicast, seq=1, size=69 bytes, dist=0, time=0.224ms
192.168.10.2 : multicast, seq=1, size=69 bytes, dist=0, time=0.269ms

Make sure the services below are switched off on both nodes, so that you can avoid troubleshooting later to find out where a problem came from.

[root@node01 ~]#chkconfig ip6tables off
[root@node01 ~]#chkconfig iptables off
[root@node01 ~]#chkconfig acpid off

and SELinux should be disabled on both nodes
[root@node01 ~]# cat /etc/selinux/config |grep -i ^SELINUX=
SELINUX=disabled

Add the following entry in /etc/httpd/conf/httpd.conf on both nodes (node01, node02):
Listen 192.168.10.10:80

Add the cluster services to the runlevels so they start during boot
[root@node01 ~]#chkconfig ricci on
[root@node01 ~]#chkconfig cman on
[root@node01 ~]#chkconfig rgmanager on

Set a password for the ricci user
[root@node01 ~]#passwd ricci

Reboot both nodes so that the changes take effect.

========================
Part 2 - Starting with Installation:
========================
I have connected both nodes to an EMC CLARiiON and mapped two LUNs, one of which I will be using for fencing.

[root@node01 ~]# powermt display dev=all |grep emcpower
Pseudo name=emcpowerb
Pseudo name=emcpowera
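The LUN that will hold the web content (emcpowerb here) needs a partition and a filesystem before the cluster can mount it; a sketch of one way to prepare it, assuming the first partition ends up as /dev/emcpowerb1 exactly as referenced later in the fs resource:

[root@node01 ~]# fdisk /dev/emcpowerb        (create a single primary partition)
[root@node01 ~]# mkfs.ext3 /dev/emcpowerb1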

Start the ricci service before you start to configure the cluster.
[root@node01 ~]#service ricci start
Starting ricci: [ OK ]

* Create a basic cluster config file and give your cluster a name
[root@node01 ~]#ccs -h node01 --createcluster webcluster

* Add both nodes to the cluster
[root@node01 ~]#ccs -h node01 --addnode node01 --votes=1 --nodeid=1
[root@node01 ~]#ccs -h node01 --addnode node02 --votes=1 --nodeid=2

* Here we set the fence daemon properties: the cluster waits 0 seconds before fencing a failed node (post_fail_delay) and waits 30 seconds after a node joins the fence domain before fencing it (post_join_delay)
[root@node01 ~]#ccs -h node01 --setfencedaemon post_fail_delay=0 post_join_delay=30

* We set the cman properties for a two-node cluster with one expected vote, so that the services keep running on one node if the other fails
[root@node01 ~]#ccs -h node01 --setcman two_node=1 expected_votes=1

* Add a fence method which will be used by the nodes
[root@node01 ~]#ccs -h node01 --addmethod scsi node01
[root@node01 ~]#ccs -h node01 --addmethod scsi node02

* Add the fence device with its fence agent and a log file, which helps with troubleshooting
[root@node01 ~]#ccs -h node01 --addfencedev scsi_dev agent=fence_scsi devices=/dev/emcpowera logfile=/var/log/cluster/fence_scsi.log aptpl=1

* Add a fence instance for each node, which the cluster uses to fence (remove that node's registration key from the device) when the node fails
[root@node01 ~]#ccs -h node01 --addfenceinst scsi_dev  node01 scsi key=1
[root@node01 ~]#ccs -h node01 --addfenceinst scsi_dev  node02 scsi key=2

* Add an unfence instance for each node, which the cluster uses at startup to register the node's key with the device
[root@node01 ~]#ccs -h node01 --addunfenceinst scsi_dev  node01  key=1 action=on
[root@node01 ~]#ccs -h node01 --addunfenceinst scsi_dev  node02  key=2 action=on

* Create a failover domain and add the nodes to it, setting it to fail over in an ordered way by giving each node a priority, and setting nofailback so that the service does not fail back if the node with the higher priority comes back online after a failover
[root@node01 ~]#ccs -h node01 --addfailoverdomain web-failover ordered=1 nofailback=1
[root@node01 ~]#ccs -h node01 --addfailoverdomainnode web-failover node01 1
[root@node01 ~]#ccs -h node01 --addfailoverdomainnode web-failover node02 2

* Create a service under the cluster
[root@node01 ~]#ccs -h node01 --addservice web domain=web-failover recovery=relocate autostart=1

* Now we will add all the resources to the global cluster config, which the service will use when it starts. I am going to add three resources: a SAN filesystem that will get mounted when the service starts, a virtual IP that will be brought up, and the Apache service, all on the active node.

[root@node01 ~]#ccs -h node01 --addresource fs name=web_fs device=/dev/emcpowerb1 mountpoint=/var/www fstype=ext3
[root@node01 ~]#ccs -h node01 --addresource ip address=192.168.10.10 monitor_link=yes
[root@node01 ~]#ccs -h node01 --addresource apache name=apache_server config_file=conf/httpd.conf server_root=/etc/httpd shutdown_wait=10

* Then add the subservices to the service in order, so that the filesystem gets mounted, then the IP is configured, and then the Apache service is started on the active cluster node.
[root@node01 ~]#ccs -h node01 --addsubservice web fs ref=web_fs
[root@node01 ~]#ccs -h node01 --addsubservice web ip ref=192.168.10.10
[root@node01 ~]#ccs -h node01 --addsubservice web apache ref=apache_server

* Copy the cluster config to both nodes
[root@node01 ~]#cman_tool version -r
[root@node01 ~]#ccs -h node01 --sync --activate

* Start the cluster service
[root@node01 ~]#ccs -h node01 --startall

[root@node01 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    112   2012-11-30 23:44:28  node01
   2   M    116   2012-11-30 23:44:28  node02

[root@node01 ~]# ccs -h node01 --lsnodes
node01: votes=1, nodeid=1
node02: votes=1, nodeid=2

[root@node01 ~]# ccs -h node01 --lsfencedev
scsi_dev: logfile=/var/log/cluster/fence_scsi.log, aptpl=1, devices=/dev/emcpowera, agent=fence_scsi

[root@node01 ~]#ccs -h node01 --lsfailoverdomain
web-failover: restricted=0, ordered=1, nofailback=0
  node01: priority=1
  node02: priority=2

[root@node01 ~]# ccs -h node01 --lsservices
service: name=web, exclusive=0, domain=web-failover, autostart=1, recovery=relocate
  fs: ref=web_fs
  ip: ref=192.168.10.10
  apache: ref=apache_server
resources:
  fs: name=web_fs, device=/dev/emcpowerb1, mountpoint=/var/www, fstype=ext3
  ip: monitor_link=yes, address=192.168.10.10
  apache: name=apache_server, shutdown_wait=10, config_file=conf/httpd.conf, server_root=/etc/httpd

[root@node01 ~]# clustat
Cluster Status for webcluster @ Sun Dec  2 21:58:31 2012
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node01                                                              1 Online, Local, rgmanager
 node02                                                              2 Online, rgmanager

 Service Name                                                     Owner (Last)                                                     State
 ------- ----                                                     ----- ------                                                     -----
 service:web                                                      node01                                                           started
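To confirm that failover works end to end, you can relocate the service to the other node and watch clustat; a quick check using the service and node names from this setup:

[root@node01 ~]# clusvcadm -r web -m node02
[root@node01 ~]# clustat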

The /etc/cluster/cluster.conf file can be edited manually, but we just have to make sure that the config version number is incremented, and then run ccs_config_validate to check the validity of the config file.

[root@node01 ~]# ccs_config_validate
Configuration validates
[root@node01 ~]#cman_tool version -r
[root@node01 ~]#ccs -h node01 --stopall
[root@node01 ~]#ccs -h node01 --startall


Hope this document was helpful!!