Sunday, March 10, 2019

NFS service group in VCS

AIM: Build an NFS-type service group that provides redundancy for an NFS share.

Description: We need an NFS share point on one of the nodes of a cluster. Whenever that node goes down, the share point fails over to another node so the NFS share remains continuously available to the client. That is the whole aim of VCS, right? If anything goes wrong on one node, it should not affect the client.

So here we go :

The 2 most important things to reach our objective :

1. NFS configuration at the operating system level on all nodes.
2. Hierarchy of resources, i.e. the dependencies between resources.

1. NFS needs a particular set of SMF services NOT to run at the OS level in order to work properly under VCS. We disable them not through SMF (svcadm) but in the service configuration, so they do not get re-enabled when the system reboots. Execute the following commands on all nodes:


svccfg -s nfs/server setprop "application/auto_enable=false"
svccfg -s nfs/mapid setprop "application/auto_enable=false" 
svccfg -s nfs/nlockmgr setprop "application/auto_enable=false"
svccfg -s nfs/status setprop "application/auto_enable=false"
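
To confirm the change, you can read the property back with svcprop (a quick check; repeat for each of the four services):

#svcprop -p application/auto_enable nfs/server
false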

Now we are all set to play with VCS. It's time to decide which resources are required for this service group; in technical language, the "hierarchy of resources".

Keep NFSRestart at the top; all other resources should be children of it.

It needs the following 6 resources:

1. NFSRestart
2. Share
3. DiskGroup
4. Mount
5. IP
6. NIC

Don't be scared of handling this many resources; we have already divided them into 3 sections.

We will take a bottom-up approach; it's not rocket science, trust me. My approach is: if anything feels too tough or lengthy, just break it into pieces, then assemble them at the end.

The first thing we can do is create the service group, with a familiar name that is easy to recall: "nfssg".


#hagrp -add nfssg
#hagrp -modify nfssg SystemList sys1 0 sys2 1
#hagrp -modify nfssg AutoStartList sys1
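
Note: VCS accepts configuration changes only while the configuration is read-write. If any command below complains with VCS WARNING V-16-1-11309, open the configuration first and dump it back to read-only once you are done (exactly what we do at the end of this exercise):

#haconf -makerw
... make the changes ...
#haconf -dump -makero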


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Later we will create a Mount resource that mounts one of the volumes from this disk group on the mount point /test.
Then, once the mount point is available, our task is to create a Share resource that NFS-shares this mount point, /test.
And with that, we are done with the resources and the service group. Nothing mechanical is needed after this; only a bit of logic (you will use only a 1000th part of your brain) to decide how exactly to link these resources.



Now the resources (bottom-up):


Start with the easiest ones, i.e. the NIC and IP resources:

Why this IP? Because it is used to access the NFS share from the client. (Remember what we do on the client: mount ip:/nfsshare /mnt. This IP address is what the client will use to reach our share point.)


#hares -add mnicb MultiNICB nfssg
#hares -modify mnicb Critical 0
#hares -modify mnicb Device e1000g0
#hares -modify mnicb Enabled 1

#hares -add ipmnicb IPMultiNICB nfssg
#hares -modify ipmnicb Critical 0
#hares -modify ipmnicb Address 192.168.1.100
#hares -modify ipmnicb BaseResName mnicb
#hares -modify ipmnicb NetMask 255.255.255.0
#hares -modify ipmnicb Enabled 1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now we need a mount point to be shared. This mount point will come from a disk group, as we are using VxVM. So create a DiskGroup resource and name it nfsdg (don't confuse it with the service group name, nfssg).

#hares -add nfsdg DiskGroup nfssg
#hares -modify nfsdg Critical 0
#hares -modify nfsdg DiskGroup dg1
#hares -modify nfsdg Enabled 1



#hares -add nfsmount Mount nfssg
#hares -modify nfsmount Critical 0
#hares -modify nfsmount BlockDevice /dev/vx/dsk/dg1/vol1
#hares -modify nfsmount MountPoint /test
#hares -modify nfsmount FSType vxfs
#hares -modify nfsmount Enabled 1

(Note: the VxVM block device path follows the /dev/vx/dsk/<dg>/<vol> convention. FSType is set to vxfs here on the assumption that the volume carries a VxFS file system; use ufs if you ran newfs on it.)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


#hares -add nfsshare Share nfssg
#hares -modify nfsshare Critical 0
#hares -modify nfsshare PathName /test
#hares -modify nfsshare Options rw
#hares -modify nfsshare Enabled 1



The most important resource is the NFSRestart resource; it restarts the NFS services whenever it is called by VCS, usually when the service group is brought online or offline. As it is the most important, we give it the highest priority and it will be the top resource in the dependency hierarchy. First add it:

#hares -add nfsrestart NFSRestart nfssg
#hares -modify nfsrestart Critical 0
#hares -modify nfsrestart Enabled 1



As we know, NFSRestart is the most important, so make it the grandfather, I mean keep it at the top of the dependency tree: NFSRestart -> Share -> Mount -> DiskGroup, and the other one is IP -> NIC. That's it. DONE. We build 2 dependency trees instead of 1 because a single tree would violate the rule of at most 5 levels of dependency in a tree. (You can verify the links afterwards with hares -dep, shown after the link commands.)

#hares -link nfsrestart nfsshare
#hares -link nfsshare nfsmount
#hares -link nfsmount nfsdg

#hares -link ipmnicb mnicb
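
To double-check the links, hares -dep lists every parent/child pair; with the links above in place, the output should look something like this:

#hares -dep
#Group   Parent       Child
nfssg    ipmnicb      mnicb
nfssg    nfsmount     nfsdg
nfssg    nfsrestart   nfsshare
nfssg    nfsshare     nfsmount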

~~~~~~~~~~~~~DONE ...!!!~~~~~~~~~~~

#haconf -dump -makero

BRING THE SERVICE GROUP ONLINE:

#hagrp -online nfssg -sys sys1
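
Once the group is online, two quick sanity checks (output is indicative; it will vary with your setup): hagrp -state shows where the group runs, and the Solaris share command on the active node should list /test as exported:

#hagrp -state nfssg
#share
-               /test   rw   ""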

Clarity of facts:
1. Here we are working on the NFS server, not on the client. We are providing high availability to the NFS share.
2. On the client, simply mount it with the "mount" command. If you want to provide HA for this client-side mount point as well, a simple Mount-type resource will work, with the block device set to "192.168.1.100:/test" (see the sketch below).
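
For illustration, a minimal sketch of such a client-side resource, assuming a client service group named clientsg (a hypothetical name) and the virtual IP and share from above; the Mount agent mounts NFS file systems when FSType is set to nfs:

#hares -add nfsclientmnt Mount clientsg
#hares -modify nfsclientmnt BlockDevice 192.168.1.100:/test
#hares -modify nfsclientmnt MountPoint /mnt
#hares -modify nfsclientmnt FSType nfs
#hares -modify nfsclientmnt Enabled 1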




NFS_VCS

HA NFS solution using Veritas Cluster server on Solaris

1. Architecture (diagram not reproduced here)


2. Create disk group and volume

bash-3.00# vxdg init datadg cds=off c1t4d0s2 c1t5d0s2 c1t6d0s2 c1t8d0s2 c1t9d0s2 c1t10d0s2 c1t11d0s2 c1t12d0s2 c1t13d0s2
VxVM vxdg ERROR V-5-1-2349 Device c1t4d0s2 appears to be owned by disk group datadg.
VxVM vxdg ERROR V-5-1-2349 Device c1t5d0s2 appears to be owned by disk group datadg.
VxVM vxdg ERROR V-5-1-2349 Device c1t6d0s2 appears to be owned by disk group datadg.
VxVM vxdg ERROR V-5-1-2349 Device c1t8d0s2 appears to be owned by disk group datadg.
VxVM vxdg ERROR V-5-1-2349 Device c1t9d0s2 appears to be owned by disk group datadg.
VxVM vxdg ERROR V-5-1-2349 Device c1t10d0s2 appears to be owned by disk group datadg.
VxVM vxdg ERROR V-5-1-2349 Device c1t11d0s2 appears to be owned by disk group datadg.
VxVM vxdg ERROR V-5-1-2349 Device c1t12d0s2 appears to be owned by disk group datadg.
VxVM vxdg ERROR V-5-1-2349 Device c1t13d0s2 appears to be owned by disk group datadg.
bash-3.00# vxdg import datadg
bash-3.00# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c1t0d0s2 auto:none – – online invalid
c1t1d0s2 auto:cdsdisk c1t1d0 vxfencoorddg online
c1t4d0s2 auto:sliced datadg01 datadg online
c1t5d0s2 auto:sliced datadg02 datadg online
c1t6d0s2 auto:sliced datadg03 datadg online
c1t8d0s2 auto:sliced datadg04 datadg online
c1t9d0s2 auto:sliced datadg05 datadg online
c1t10d0s2 auto:sliced datadg06 datadg online
c1t11d0s2 auto:sliced datadg07 datadg online
c1t12d0s2 auto:sliced datadg08 datadg online
c1t13d0s2 auto:sliced datadg09 datadg online spare
c2t0d0s2 auto:cdsdisk c2t0d0 vxfencoorddg online
c3t0d0s2 auto:cdsdisk c3t0d0 vxfencoorddg online
bash-3.00# vxassist -g datadg make vol1 100m layout=nolog
On both nodes:
mkdir /nfsshare
bash-3.00# newfs /dev/vx/dsk/datadg/vol1
/dev/vx/rdsk/datadg/vol1: Unable to find Media type. Proceeding with system determined parameters.
newfs: construct a new file system /dev/vx/rdsk/datadg/vol1: (y/n)? y
/dev/vx/rdsk/datadg/vol1: 204800 sectors in 100 cylinders of 32 tracks, 64 sectors
100.0MB in 7 cyl groups (16 c/g, 16.00MB/g, 7680 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 32864, 65696, 98528, 131360, 164192, 197024,

3. On both nodes, disable system NFS

bash-3.00# svccfg delete -f svc:/network/nfs/server:default
svccfg: Pattern 'svc:/network/nfs/server:default' doesn't match any instances or services
bash-3.00# svccfg delete -f svc:/network/nfs/mapid:default

4. Check cluster status

bash-3.00# lltstat -l
LLT link information:
link 0 e1000g2 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 29754 txbytes 2079067
rxpkts 77210 rxbytes 5029877
latehb 13 badcksum 0 errors 0
link 1 e1000g3 on etherfp hipri
mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
txpkts 29758 txbytes 2065160
rxpkts 82794 rxbytes 6109424
latehb 16 badcksum 0 errors 0
bash-3.00# modinfo|grep gab
226 fffffffff719d000 48850 229 1 gab (GAB device 5.0)
bash-3.00# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 813c01 membership 01
Port h gen 813c03 membership 01
bash-3.00# hacf -verify /etc/VRTSvcs/conf/config
bash-3.00# hauser -display
UserName : Privilege
——————–
admin : ClusterAdministrator
bash-3.00# haclus -value EngineVersion
5.0.00.0

5. Create NFS resources

bash-3.00# hares -modify nfsIP Address 192.168.47.239
VCS WARNING V-16-1-11309 Configuration must be ReadWrite
bash-3.00# haconf -makerw
bash-3.00# hares -modify nfsIP Address 192.168.47.239
bash-3.00# haconf -dump -makero
hagrp -modify hanfs SystemList solarisA 1 solarisB 2
hagrp -autoenable hanfs -sys solarisA
hares -add nfsNIC NIC hanfs
hares -modify nfsNIC Enabled 1
hares -modify nfsNIC Device e1000g0
hares -modify nfsIP Enabled 1
hares -modify nfsIP Device e1000g0
hares -modify nfsIP Address 192.168.47.133
hares -modify nfsIP IfconfigTwice 1
hares -add nfsDG DiskGroup hanfs
hares -modify nfsDG Enabled 1
hares -modify nfsDG DiskGroup datadg
hares -modify nfsDG StartVolumes 0
hares -add nfsVOL Volume hanfs
hares -modify nfsVOL Enabled 1
hares -modify nfsVOL Volume vol1
hares -modify nfsVOL DiskGroup datadg
hares -add nfsMOUNT Mount hanfs
hares -modify nfsMOUNT Enabled 1
hares -modify nfsMOUNT MountPoint /nfsshare
hares -modify nfsMOUNT BlockDevice /dev/vx/dsk/datadg/vol1
hares -modify nfsMOUNT FSType ufs
hares -modify nfsMOUNT FsckOpt %-n
hares -add nfsNFS NFS hanfs
hares -modify nfsNFS Enabled 1
hares -modify nfsNFS Nservers 24
hares -add nfsSHARE Share hanfs
hares -modify nfsSHARE Enabled 1
hares -modify nfsSHARE PathName /nfsshare
hares -modify nfsSHARE Options rw

6. Link the resources

hares -link nfsIP nfsNIC
hares -link nfsVOL nfsDG
hares -link nfsMOUNT nfsVOL
hares -link nfsSHARE nfsIP
hares -link nfsSHARE nfsMOUNT
hares -link nfsSHARE nfsNFS
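
All of the above ends up in /etc/VRTSvcs/conf/config/main.cf once dumped. A trimmed sketch of what the resulting group stanza typically looks like (values taken from the commands above; formatting approximate, and AutoStartList reflects the change made later in this walkthrough):

group hanfs (
    SystemList = { solarisA = 1, solarisB = 2 }
    AutoStartList = { solarisA }
    )

    DiskGroup nfsDG (
        DiskGroup = datadg
        StartVolumes = 0
        )

    Volume nfsVOL (
        Volume = vol1
        DiskGroup = datadg
        )

    Mount nfsMOUNT (
        MountPoint = "/nfsshare"
        BlockDevice = "/dev/vx/dsk/datadg/vol1"
        FSType = ufs
        FsckOpt = "-n"
        )

    NFS nfsNFS (
        Nservers = 24
        )

    NIC nfsNIC (
        Device = e1000g0
        )

    IP nfsIP (
        Device = e1000g0
        Address = "192.168.47.239"
        IfconfigTwice = 1
        )

    Share nfsSHARE (
        PathName = "/nfsshare"
        Options = rw
        )

    nfsIP requires nfsNIC
    nfsVOL requires nfsDG
    nfsMOUNT requires nfsVOL
    nfsSHARE requires nfsIP
    nfsSHARE requires nfsMOUNT
    nfsSHARE requires nfsNFS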

7. Bring the resource group online

bash-3.00# hagrp -online hanfs -sys solarisA
bash-3.00# hastatus
attempting to connect...connected
group resource system message
————— ——————– ——————– ——————–
solarisA RUNNING
solarisB RUNNING
hafile solarisA OFFLINE
hafile solarisB OFFLINE
————————————————————————-
hanfs solarisA ONLINE

hanfs solarisB OFFLINE
fileon solarisA ONLINE
fileon solarisB ONLINE
nfsDG solarisA ONLINE
————————————————————————-
nfsDG solarisB OFFLINE
nfsIP solarisA ONLINE
nfsIP solarisB OFFLINE
nfsMOUNT solarisA ONLINE
nfsMOUNT solarisB OFFLINE
————————————————————————-
nfsNFS solarisA ONLINE
nfsNFS solarisB OFFLINE
nfsNIC solarisA ONLINE
nfsNIC solarisB ONLINE
nfsSHARE solarisA ONLINE
————————————————————————-
nfsSHARE solarisB OFFLINE
nfsVOL solarisA ONLINE
nfsVOL solarisB OFFLINE
^C

A new IP appears after the hanfs group comes online:

bash-3.00# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.47.131 netmask ffffff00 broadcast 192.168.47.255
ether 0:c:29:59:47:8f
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2

inet 192.168.47.239 netmask ffffff00 broadcast 192.168.47.255

e1000g4: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 192.168.47.211 netmask ffffff00 broadcast 192.168.47.255
ether 0:c:29:59:47:b7
On the Red Hat VM:
mount 192.168.47.239:/nfsshare /hanfs
Copy some files into /nfsshare on the server:
bash-3.00# cp -r /etc/inet/hosts .
Now we see the files under /hanfs on the Red Hat client. Then switch the group to the other node:
hagrp -switch hanfs -to solarisB
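
After the switch, the virtual IP and export should follow the group to solarisB. From the client you can confirm the share is still being served (indicative output; the exact format may differ):

showmount -e 192.168.47.239
Export list for 192.168.47.239:
/nfsshare (everyone)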
bash-3.00# hares -display nfsVOL
#Resource Attribute System Value
nfsVOL Group global hanfs
nfsVOL Type global Volume
nfsVOL AutoStart global 1

Add an AutoStartList to the service group

bash-3.00# hagrp -display hanfs
#Group Attribute System Value
hanfs AdministratorGroups global
hanfs Administrators global
hanfs Authority global 0
hanfs AutoFailOver global 1
hanfs AutoRestart global 1
hanfs AutoStart global 1
hanfs AutoStartIfPartial global 1
hanfs AutoStartList global
bash-3.00# hagrp -modify hanfs AutoStartList -add solarisA
VCS WARNING V-16-1-11309 Configuration must be ReadWrite
bash-3.00# haconf -makerw
bash-3.00# hagrp -modify hanfs AutoStartList -add solarisA
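Then dump the configuration back to read-only and verify the attribute took (hagrp -display accepts -attribute to show a single attribute; output sketched from the display format above):

bash-3.00# haconf -dump -makero
bash-3.00# hagrp -display hanfs -attribute AutoStartList
#Group Attribute System Value
hanfs AutoStartList global solarisA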

Update the cluster admin password

bash-3.00# hauser -display
UserName : Privilege
——————–
admin : ClusterAdministrator
bash-3.00# hauser -list
admin
bash-3.00# haconf -makerw
bash-3.00# hauser -update admin
Enter New Password:
Enter Again:

Issue fixing

RG is partial/stopping

# hagrp -state hanfs
#Group Attribute System Value
hanfs State solarisA |OFFLINE|
hanfs State solarisB |PARTIAL|STOPPING|
# bash
bash-3.00# hagrp -flush hanfs -sys solarisB

A resource is faulted

bash-3.00# hares -clear nfsVOL
bash-3.00# hagrp -online hanfs -sys solarisB
hares -online mysqlMOUNT -sys solarisB
./bin/mysqld_safe &

NFS service group in VCS conflicts with SMF


 

Posted on December 8, 2009 by jeanwan
If there is an NFS service group in VCS, it will not come online because of a conflict with Solaris 10 SMF.
Logs:
2009/12/8 12:35:07 VCS ERROR V-16-1-7012 (tys12app1) NFS:/opt/VRTSvcs/bin/NFS/monitor:???:Service Management Facility monitoring is not disabled for NFS nfsmapid daemon!! Returning Unknown
Solution:
Disable SMF for nfsd and mountd:
# svccfg delete -f svc:/network/nfs/server:default
Disable SMF for nfsmapid:
# svccfg delete -f svc:/network/nfs/mapid:default
Then the NFS service group can be brought online.
When you want to import the two services back, follow these steps:
(1) cd /var/svc/manifest/network/nfs
(2) svccfg import mapid.xml
(3) svccfg import server.xml
(4) reboot server

bash-3.00# svccfg list | grep nfs
network/nfs/cbd
network/nfs/client
network/nfs/mapid
network/nfs/nlockmgr
network/nfs/status
network/nfs/rquota
network/nfs/server

References:

https://www4.symantec.com/Vrt/offer?a_id=89446

agents: https://sort.symantec.com/agents

Tuesday, April 5, 2016

Recovering root password on Solaris SPARC server:

1. Bring the server to the OK prompt.
If the server is up and running, log in to the server console; you can initiate a reset or send a break signal to bring the server to the OK prompt.

2. Boot the OS in failsafe mode from the OK prompt.
ok boot -F failsafe
3. Once the server has booted in failsafe mode, mount the root disk on /mnt.
If you don't know which disk holds root, run the format command and check the disks one by one.
#mount /dev/dsk/c1t1d0s0 /mnt

4. Take a backup of /mnt/etc/passwd and /mnt/etc/shadow before removing the root password from the latter.
# cp -p /mnt/etc/passwd /mnt/etc/passwd.13092013
# cp -p /mnt/etc/shadow /mnt/etc/shadow.13092013

5. Now remove the encrypted password entry for root from /mnt/etc/shadow using the vi editor. You may need to set the terminal type to edit the file (for bash: #export TERM=vt100).
Before modification:
#grep root /mnt/etc/shadow
root:XD9erIqDGXYM.:12192::::::

After modification:
#grep root /mnt/etc/shadow
root::12192::::::

6. Update the boot archive to ensure it is up to date.
# bootadm update-archive -R /mnt
Creating boot_archive for /mnt
updating /mnt/platform/sun4u/boot_archive
7. Reboot the system using the init command.
# init 6
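
After the reboot, root has an empty password; log in on the console (just press Enter at the password prompt) and set a new password right away. The prompts sketched below are the usual Solaris passwd dialogue:

# passwd root
New Password:
Re-enter new Password:
passwd: password successfully changed for root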

Source: http://www.unixarena.com/2013/09/how-to-breakrecover-solaris-root.html

Monday, April 4, 2016

AUTOFS:

On server side

1. Export an NFS share for the client
# cat /etc/exports
/opt/share                        *(rw,sync)
/opt/archive kuldeep(rw,sync)
/opt/archive ramsing(rw,sync)
[root@mohan share]# service nfs restart
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down NFS services:                                [  OK  ]
Shutting down RPC idmapd:                                  [  OK  ]
Starting NFS services:  exportfs: Failed to resolve kuldeep
exportfs: Failed to resolve kuldeep
                                                           [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting RPC idmapd:        
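
The "Failed to resolve kuldeep" errors above mean the hostnames used in /etc/exports are not resolvable from the server. One way to fix this (the addresses below are placeholders for the clients' real IPs) is to add /etc/hosts entries and re-export:

# echo "192.168.5.34 kuldeep" >> /etc/hosts
# echo "192.168.5.35 ramsing" >> /etc/hosts
# exportfs -ra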

On client side

1. Install autofs package
# yum search autofs
# yum install autofs

2. Add an entry to /etc/auto.master
# cat /etc/auto.master
/export/home    /etc/auto.users

3. Add/specify the NFS share info
# cat /etc/auto.users
ram -rw 192.168.5.33:/opt/share/ram
# * -rw 192.168.5.33:/opt/share/&

4. Restart the autofs service
# service autofs restart

5. Verify that autofs is working
# cd /export/home/ram

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              18G  7.1G  9.5G  43% /
tmpfs                 495M  228K  495M   1% /dev/shm
/dev/sda1             283M   28M  240M  11% /boot
/dev/sr1              182M  182M     0 100% /media/CentOS
/dev/sr0              4.4G  4.4G     0 100% /media/CentOS_6.6_Final
192.168.5.33:/opt/share/ram
                       18G   13G  3.7G  79% /export/home/ram    ==>> this output confirms that autofs is working
How to freeze/unfreeze the service group in VCS


WHY WE USE FREEZE

If you want to add an application to a service group, or perform any maintenance activity, you can freeze the service group or the cluster node. Once frozen, VCS does not take any action (such as failover) on it.
FREEZE types:
• Persistent: if the cluster node reboots during or after the activity, it remains frozen.
Ref: (Frozen=1)
• Temporary: if the cluster node reboots during or after the activity, it is unfrozen automatically.
Ref: (TFrozen=1)
Note: when not frozen, the value is 0.

Freeze a service group:
Persistent:
# hagrp -freeze <service group name> -persistent

Temporary:
# hagrp -freeze <service group name>

Freeze a cluster node:
# hasys -freeze [-persistent] node1

Unfreeze a service group:
Persistent:
# hagrp -unfreeze <service group name> -persistent

Temporary:
# hagrp -unfreeze <service group name>

Unfreeze a cluster node:
# hasys -unfreeze [-persistent] node1

To check freeze status:
# hagrp -display | egrep -i 'Frozen|TFrozen'
SMON
SMON (System MONitor) is an Oracle background process created when you start a database instance. The SMON process performs instance recovery, cleans up after dirty shutdowns and coalesces adjacent free extents into larger free extents.
SMON wakes up every 5 minutes to perform housekeeping activities. SMON must always be running for an instance. If not, the instance will terminate.
Check process

The following Unix/Linux command is used to check if the SMON process is running:
$ ps -ef | grep smon
oracle   31144     1  0 11:10 ?        00:00:00 ora_smon_orcl


PMON
PMON (Process MONitor) is an Oracle background process created when you start a database instance. The PMON process frees up resources if a user process fails (e.g. it releases database locks).
PMON normally wakes up every 3 seconds to perform its housekeeping activities. PMON must always be running for an instance; if not, the instance will terminate.

How to check RAC database status in UNIX
Check that pmon is running on both nodes, and check using the srvctl status command:
[oracle@rac2 ~]$ ps -eaf | grep pmon
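
For example (assuming a database named orcl; substitute your DB unique name):

[oracle@rac2 ~]$ srvctl status database -d orcl
Instance orcl1 is running on node rac1
Instance orcl2 is running on node rac2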



Configuring the network interface

/etc/udev/rules.d/70-persistent-net.rules
/etc/sysconfig/network-scripts/eth---


Huge Pages and Transparent Huge Pages

Memory is managed in blocks known as pages. A page is 4096 bytes, so 1 MB of memory is equal to 256 pages and 1 GB of memory is equal to 262,144 pages. CPUs have a built-in memory management unit that contains a list of these pages, with each page referenced through a page table entry. There are two ways to enable the system to manage large amounts of memory:
Increase the number of page table entries in the hardware memory management unit
Increase the page size.
Simply put, huge pages are blocks of memory that come in 2 MB and 1 GB sizes. The page tables used by 2 MB pages are suitable for managing multiple gigabytes of memory, whereas the page tables of 1 GB pages are best for scaling to terabytes of memory.
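
A quick sizing check with the default 2 MB huge page size: 8 GB is 8 x 1024 = 8192 MB, and 8192 / 2 = 4096 pages, so the vm.nr_hugepages = 4378 used below actually reserves 4378 x 2 MB = 8756 MB (about 8.55 GB), i.e. 8 GB plus a little headroom.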

Set huge pages to 8 GB

1. cp /etc/sysctl.conf /etc/BACKUP.sysctl.conf    (create a backup)
2. Edit /etc/sysctl.conf and modify the entries below:
kernel.shmmax = 9181523968
kernel.shmall = 4483166
vm.nr_hugepages = 4378
3. Back up /etc/security/limits.conf, then put the value 9181523968 on the lines containing "memlock".
4. sysctl -p
Notify the requester and reboot the host upon confirmation of the request.
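
After the reboot you can confirm the pages were actually reserved and that the memlock limit applies; the values sketched below are what the settings above should produce:

grep -i huge /proc/meminfo
HugePages_Total:    4378
HugePages_Free:     4378
Hugepagesize:       2048 kB
ulimit -l
9181523968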

Shakeout:
EDB will perform a database shakeout.
cat /proc/sys/vm/nr_hugepages   (should be 4378)