A small set of scripts for monitoring EMC VNX storage (usable with Zabbix)
Project: https://github.com/zhangrj/EMC-VNX-Storage-Zabbix-Monitor
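Background
The EMC VNX5500 is the company's most critical storage device; if it fails, the whole platform goes down. Before I joined, inspection of the EMC storage relied entirely on manual remote checks and on-site contractor maintenance. In May of this year I started working on this problem.
The first monitoring approach that comes to mind is SNMP or SNMP traps. Unfortunately, after most of a day of searching I found neither a place to configure SNMP or SNMP traps nor a MIB reference for the device. While reading related material, I came across Navisphere, a command-line management tool for configuring the storage. It can also query the storage status, so a little scripting combined with Zabbix is enough to implement monitoring.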
Installing the Navisphere command-line tool
Install it with rpm as usual:
[root@localhost ~]# rpm -ivh NaviCLI-Linux-64-x86-en_US-7.33.9.2.36-1.x86_64.rpm
Preparing... ########################################### [100%]
1:NaviCLI-Linux-64-x86-en########################################### [100%]
Run the script /opt/Navisphere/bin/setlevel_cli.sh to set the security level before you proceed.
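Follow the prompt to set the security level. In the session below, entering 2 results in the medium level (the prompt itself lists low|medium|l|m, and the output labels medium as the default):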
[root@localhost ~]# /opt/Navisphere/bin/setlevel_cli.sh
Please enter the verifying level(low|medium|l|m) to set?
2
Setting (default) medium verifying level.....
Verification level medium has been set SUCCESSFULLY!!!
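Create a security file so that you no longer have to enter the user name and password on each call. The security file is encrypted and tied to this machine. The user parameter is the EMC management user name, password is its password, and scope takes one of the values 0 = global, 1 = local, 2 = LDAP: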
[root@localhost ~]# cd /opt/Navisphere/bin/
[root@localhost bin]# ls
admsnap naviseccli setlevel_cli.sh setlevel.log
[root@localhost bin]# ./naviseccli -AddUserSecurity -user emc_username -password emc_passwd -scope 0
[root@localhost bin]# cd /root
[root@localhost ~]# ls
SecuredCLISecurityFile.xml
SecuredCLIXMLEncrypted.key
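The first query prompts you to save the array's certificate. Choose 2 to accept and store it; subsequent commands then print their output directly: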
[root@localhost ~]# cd /opt/Navisphere/bin/
[root@localhost bin]# ls
admsnap naviseccli setlevel_cli.sh setlevel.log
[root@localhost bin]# ./naviseccli -h 192.168.130.75 getcrus
Unable to validate the identity of the server. There are issues with the certificate presented.
Only import this certificate if you have reason to believe it was sent by a trusted source.
Certificate details:
Subject: CN=192.168.130.75,CN=A-IMAGE,C=US,ST=Massachusetts,L=Southboro,O=EMC Corporation,OU=CLARiiON
Issuer: CN=192.168.130.75,CN=A-IMAGE,C=US,ST=Massachusetts,L=Southboro,O=EMC Corporation,OU=CLARiiON
Serial#: fe91a4ec
Valid From: 20121126045806Z
Valid To: 20271123045806Z
Would you like to [1]Accept the certificate for this session, [2] Accept and store, [3] Reject the certificate?
Please input your selection(The default selection is [1]):
2
DPE7 Bus 0 Enclosure 0
SP A State: Present
SP B State: Present
......
[root@localhost bin]# ./naviseccli -h 192.168.130.75 getcrus
DPE7 Bus 0 Enclosure 0
SP A State: Present
SP B State: Present
......
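The getcrus output above is plain text, so a monitoring script first has to turn it into name/state pairs. The following is a minimal sketch of that step, not the project's actual code. It assumes Python 3.7+ and that every status line has the form "<component> State: <state>", as in the two sample lines shown; the helper names parse_getcrus and run_getcrus are hypothetical.

```python
import re
import subprocess

NAVISECCLI = "/opt/Navisphere/bin/naviseccli"  # install path from the transcript above

# Matches lines such as "SP A State: Present". The layout is assumed from the
# sample output above and may differ on other arrays.
STATE_LINE = re.compile(r"^\s*(?P<name>.+?)\s+State:\s*(?P<state>.+?)\s*$")


def parse_getcrus(text):
    """Return {component name: state} for every '<name> State: <state>' line."""
    states = {}
    for line in text.splitlines():
        match = STATE_LINE.match(line)
        if match:
            states[match.group("name")] = match.group("state")
    return states


def run_getcrus(ip):
    """Run 'naviseccli -h <ip> getcrus' and return its stdout as text."""
    result = subprocess.run(
        [NAVISECCLI, "-h", ip, "getcrus"],
        capture_output=True, text=True, check=True, timeout=60,
    )
    return result.stdout


# Offline demonstration using the lines from the transcript above.
sample = """DPE7 Bus 0 Enclosure 0
SP A State: Present
SP B State: Present
"""
print(parse_getcrus(sample))
```

On a host with the security file and certificate already stored, parse_getcrus(run_getcrus("192.168.130.75")) would apply the same parsing to live output.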
List the stored certificates:
[root@localhost ~]# /opt/Navisphere/bin/naviseccli security -certificate -list
--------------------------------------------
Subject: CN=192.168.130.75,CN=A-IMAGE,C=US,ST=Massachusetts,L=Southboro,O=EMC Corporation,OU=CLARiiON
Issuer: CN=192.168.130.75,CN=A-IMAGE,C=US,ST=Massachusetts,L=Southboro,O=EMC Corporation,OU=CLARiiON
Serial#: fe91a4ec
Valid From: 20121126045806Z
Valid To: 20271123045806Z
--------------------------------------------
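Because a stored certificate eventually expires, it can be useful to watch the "Valid To" field as well. The sketch below is an illustration only: it assumes the timestamp format shown in the listing above (YYYYMMDDhhmmss followed by Z, read as UTC), and the function names parse_valid_to and days_left are hypothetical.

```python
from datetime import datetime, timezone


def parse_valid_to(text):
    """Return every 'Valid To:' timestamp in the text as a UTC datetime."""
    expiries = []
    for line in text.splitlines():
        label, _, value = line.partition(":")
        if label.strip() == "Valid To":
            parsed = datetime.strptime(value.strip(), "%Y%m%d%H%M%SZ")
            expiries.append(parsed.replace(tzinfo=timezone.utc))
    return expiries


def days_left(expiry, now=None):
    """Whole days between now (UTC by default) and the expiry time."""
    now = now or datetime.now(timezone.utc)
    return (expiry - now).days


# Lines copied from the certificate listing above.
sample = """Serial#: fe91a4ec
Valid From: 20121126045806Z
Valid To: 20271123045806Z
"""
for expiry in parse_valid_to(sample):
    print(expiry.isoformat(), days_left(expiry))
```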
Common NaviSecCLI commands
Show the state of every component in the system:
naviseccli -h <ip> getcrus
Show which SP is the default owner and which is the current owner of each LUN:
naviseccli -h <ip> getlun -default -owner
Show a given number of lines of the SP log (for example, 200 lines):
naviseccli -h <ip> getlog -200
Or save the output to a local file:
naviseccli -h <ip> getlog -200 > getlog_spa.txt
Check the SP Agent status:
naviseccli -h <ip> getagent
Show host LUN and array LUN information:
naviseccli -h <ip> storagegroup -list
Show basic information about a RAID Group:
naviseccli -h <ip> getrg 0
Show disk information (for all disks, or for a single disk such as 0_0_5):
naviseccli -h <ip> getdisk
naviseccli -h <ip> getdisk 0_0_5
Find the LUNs that have dirty cache:
naviseccli -h <ip> getlun -luncache
Show rebuild progress:
naviseccli -h <ip> getlun [lun] -prb
Collect SPCollect logs and retrieve the resulting files:
naviseccli -h <ip> spcollect
naviseccli -h <ip> managefiles -retrieve
List the HBAs that are logged in to the system:
naviseccli -h <ip> port -list
List the part numbers of the components:
naviseccli -h <ip> getresume
Show whether cache is enabled, along with its configuration:
naviseccli -h <ip> getcache
List the enabled system feature packages:
naviseccli -h <ip> ndu -list
Trespass a LUN:
naviseccli -h <ip> trespass <lun>
Start a background sniffer check:
naviseccli -h <ip> setsniffer <lun> -bv -bvtime high -cr
Get the sniffer report:
naviseccli -h <ip> getsniffer <lun>
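Monitoring scripts and how to use them
emc_discovery.py builds the JSON data that drives auto-discovery in Zabbix. It discovers the following components: CPU, DIMM, Disk, I/O, LCC, Power, SP, SPS, and SPS Cable.
emc_state.py collects the data for the monitored items.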
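As an illustration of the kind of JSON a discovery script produces, here is a minimal sketch of a Zabbix low-level discovery payload. It is not taken from emc_discovery.py: the macro name {#EMC_COMPONENT} and the function name build_discovery are hypothetical, and the {"data": [...]} wrapper is the classic discovery format (newer Zabbix versions also accept a bare JSON array).

```python
import json


def build_discovery(names, macro="{#EMC_COMPONENT}"):
    """Return a Zabbix low-level discovery JSON string, one entry per name.

    The macro name is a placeholder; it must match the macro used in the
    item prototypes of the Zabbix discovery rule.
    """
    return json.dumps({"data": [{macro: name} for name in names]})


# Component names as they appear in the getcrus sample output.
payload = build_discovery(["SP A", "SP B"])
print(payload)
```

Each discovered entry lets Zabbix create one item per component from an item prototype whose key references the same macro.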
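Note the following:
- All data is passed to zabbix_server through zabbix_sender;
- You need to change the EMC storage address and the zabbix_server address in the scripts;
- The two scripts may not work as-is for EMC arrays with a different configuration, but the basic approach and the data processing are the same, so you can adapt them to your own storage configuration;
- I have been too busy with other work to optimize the scripts (making discovery more generic, factoring the processing into functions, and so on), so they will have to do for now.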
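For readers adapting the scripts, the hand-off to zabbix_sender can be sketched as follows. This is an assumption-laden illustration rather than the project's code: the server address 192.0.2.10, the host name EMC-VNX5500, and the item key emc.state[SP A] are placeholders, and the function names are hypothetical. The zabbix_sender options used are -z (Zabbix server address), -s (host name as configured in Zabbix), -k (item key), and -o (value).

```python
import subprocess


def sender_command(server, host, key, value, binary="zabbix_sender"):
    """Build the argument list for sending one value with zabbix_sender."""
    return [binary, "-z", server, "-s", host, "-k", key, "-o", str(value)]


def send(server, host, key, value):
    """Run zabbix_sender and report whether it exited successfully."""
    result = subprocess.run(
        sender_command(server, host, key, value),
        capture_output=True, text=True, timeout=30,
    )
    return result.returncode == 0


# Placeholder values; only the command line is printed, nothing is sent.
command = sender_command("192.0.2.10", "EMC-VNX5500", "emc.state[SP A]", "Present")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with keys that contain spaces or brackets.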
Two crontab entries are enough, for example:
0 23 * * 6 /usr/bin/python /root/EMC/emc_discovery.py > /tmp/emc_discovery.log
5 * * * * /usr/bin/python /root/EMC/emc_state.py > /tmp/emc_state.log
Auto-discovery runs once every Saturday at 23:00, and item data is collected once an hour (at minute 5).
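Zabbix web configuration
Create a new host. Its host name field must match the host name that the scripts pass to zabbix_sender with the -s option (the -z option is the Zabbix server address).
Run the scripts manually once and check that the monitoring data is refreshed.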