Nutanix is becoming increasingly popular as a hyperconverged solution, and in this blog I cover the steps to run a health check on a Nutanix cluster.
For a basic health check, “ncc health_checks run_all” is sufficient to provide all the required information. However, you can gather specific information for the components listed below with the commands given under each one.
+ Cluster Info
– Cluster name
– Uptime
– NOS Version
– Cluster ID
– Block serial number
– HW model
Commands to Run
ncli cluster info
ncli host ls
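To pull a single value out of this output for a report, a small filter can help. The sketch below assumes ncli prints one "Label : Value" pair per line. The function name ncli_field is a hypothetical helper (not part of ncli), and the label in the usage line is an assumption, so check the labels your NOS version actually prints.

```shell
# ncli_field LABEL: read "Label : Value" lines on stdin and print the value
# whose label matches LABEL. Hypothetical helper, not an ncli feature.
ncli_field() {
  awk -v key="$1" '
    {
      pos = index($0, ":")                 # split on the first colon only
      if (pos == 0) next                   # skip lines without a colon
      label = substr($0, 1, pos - 1)
      value = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim surrounding whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (label == key) print value
    }'
}

# Example on a CVM (the label "Cluster Name" is assumed):
#   ncli cluster info | ncli_field "Cluster Name"
```

Splitting on the first colon only keeps values that contain colons themselves (timestamps, for example) intact.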
+ Storage pool list
– Name
– Capacity (logical used and total)
– IOPS and latency
+ Container Info
– Name
– Capacity (logical used and total)
– IOPS and latency
– Replication factor
Commands to Run
ncli sp ls
ncli ctr ls
IOPS and performance metrics can be viewed on the Prism page.
+ CVM
– Status of service for each CVM
– CVM memory and vCPU usage
– Uptime
– Network stats
– IP addresses of CVMs
– NIC errors
Commands to Run
cluster status
allssh uptime
svmips
ifconfig
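For the NIC errors item, the ifconfig output can be filtered so that only interfaces with a non-zero error counter are shown. This is a minimal sketch: nic_errors is a hypothetical helper name, and it assumes ifconfig prints its counters either as "RX errors 0" or as "RX packets:10 errors:0", the two common net-tools layouts.

```shell
# nic_errors: read ifconfig output on stdin and print one line per
# interface and direction whose "errors" counter is greater than zero.
# Hypothetical helper; the counter layout is an assumption (see above).
nic_errors() {
  awk '
    { gsub(/:/, " ") }                   # turn "errors:0" into "errors 0"
    /^[^ \t]/ { iface = $1 }             # unindented line starts an interface block
    /^[ \t]+(RX|TX) / {
      for (i = 2; i < NF; i++)
        if ($i == "errors" && $(i + 1) > 0)
          print iface, $1, "errors=" $(i + 1)
    }'
}

# Example on a CVM:
#   ifconfig | nic_errors
```

No output means no error counters above zero were found in the text that was piped in.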
+ Disk status
– Perf stats and usage
All performance and I/O metrics are available in Prism.
+ Currently set gflags
Commands to Run
ncc health_checks system_checks gflags_diff_check
+ Hypervisor
– Hypervisor software and version
– Uptime
– Installed VMs
– Memory Usage
– Attached Datastore
This depends on the hypervisor in use.
+ Datastore Info
– Usage
– Capacity
– Name
This information is available in Prism.
+ Disk list
+ Domain fault tolerance states
Commands to Run
ncli cluster get-domain-fault-tolerance-status type=node
+ Default gateway
+ SSH key list
+ SMTP config
+ NTP config
This information is available in Prism.
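The commands listed in this post can also be collected in one pass and saved for later review. The following is only a sketch: run_and_save and collect_health_data are hypothetical names, the output file names are arbitrary, and it assumes each command resolves as an executable or shell function in the shell where these functions are defined (for example, when the snippet is sourced from a CVM prompt).

```shell
# run_and_save OUTDIR NAME COMMAND [ARGS...]
# Run COMMAND and write its stdout and stderr to OUTDIR/NAME.txt.
# If COMMAND cannot be found, write a "skipped" note instead.
run_and_save() {
  outdir=$1
  name=$2
  shift 2
  mkdir -p "$outdir" || return 1
  if command -v "$1" >/dev/null 2>&1; then
    "$@" >"$outdir/$name.txt" 2>&1
  else
    echo "skipped: $1 not found" >"$outdir/$name.txt"
  fi
}

# collect_health_data DIR: run the commands from this post, one file each.
collect_health_data() {
  dir=$1
  run_and_save "$dir" ncc_run_all ncc health_checks run_all
  run_and_save "$dir" cluster_info ncli cluster info
  run_and_save "$dir" host_list ncli host ls
  run_and_save "$dir" storage_pools ncli sp ls
  run_and_save "$dir" containers ncli ctr ls
  run_and_save "$dir" cluster_status cluster status
  run_and_save "$dir" cvm_ips svmips
  run_and_save "$dir" gflags ncc health_checks system_checks gflags_diff_check
  run_and_save "$dir" fault_tolerance ncli cluster get-domain-fault-tolerance-status type=node
}

# Example on a CVM:
#   collect_health_data "$HOME/healthcheck_$(date +%Y%m%d)"
```

Writing one file per command keeps the outputs easy to compare between two health checks of the same cluster.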