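If the XCC keeps resetting, remote management becomes difficult.
The cause has been identified as a firmware bug. See the Lenovo support article linked below for details.
When you look at the XCC reset history, messages like the following are logged.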
1. Reset due to exhausted memory
2. Userspace hang - wdt2 pre-trigger fired
3. Empty reboot reasons
4. ssp reset the psp
5. bootblock reset after flashing primary bootblock
6. kernel hang - wdt1 pretrigger fired
datacentersupport.lenovo.com/bo/en/products/servers/thinksystem/sr860/7x70/solutions/ht509282
Unexpected XCC reset is triggered on ThinkSystem platforms - Lenovo ThinkSystem
Symptom
The system may experience unexpected XCC reboots after XCC is upgraded to v2.50 (CDI334W) or later.
When unexpected XCC reboots occur, the BMC Reset Summary Log will contain the following messages:
1. Reset due to exhausted memory
2. Userspace hang - wdt2 pre-trigger fired
3. Empty reboot reasons
4. ssp reset the psp
5. bootblock reset after flashing primary bootblock
6. kernel hang - wdt1 pretrigger fired
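The six messages above can be matched mechanically against an exported log. The following is a minimal sketch, assuming the BMC Reset Summary Log has been saved as plain text with one entry per line. The function name `find_reset_reasons`, the short labels, and the line layout are illustrative assumptions, not part of any Lenovo tool.

```python
import re

# Reset reasons listed in the Lenovo tip, matched case-insensitively.
# Both "pre-trigger" and "pretrigger" appear in the tip, so the hyphen is optional.
KNOWN_REASONS = {
    "exhausted memory": re.compile(r"reset due to exhausted memory", re.I),
    "userspace hang (wdt2)": re.compile(r"userspace hang - wdt2 pre-?trigger fired", re.I),
    "empty reboot reasons": re.compile(r"empty reboot reasons", re.I),
    "ssp reset the psp": re.compile(r"ssp reset the psp", re.I),
    "bootblock reset": re.compile(r"bootblock reset after flashing primary bootblock", re.I),
    "kernel hang (wdt1)": re.compile(r"kernel hang - wdt1 pre-?trigger fired", re.I),
}


def find_reset_reasons(lines):
    """Return (line_number, label) pairs for lines that contain a known reset reason."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in KNOWN_REASONS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines; a real export may carry timestamps or other fields.
    sample = [
        "Userspace hang - wdt2 pre-trigger fired",
        "unrelated entry",
        "kernel hang - wdt1 pretrigger fired",
    ]
    for number, label in find_reset_reasons(sample):
        print(number, label)
```

Because the patterns use `search` rather than a full-line match, extra fields such as timestamps around the message should not prevent a hit.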
Affected Configurations
The system may be any of the following Lenovo servers:
- Lenovo ThinkAgile VX Series, Type 7Y91, any model
- ThinkAgile HX Series, Type 7Y81, any model
- ThinkAgile HX Series, Type 7Y82, any model
- ThinkAgile HX1320 Nutanix Appliance, Type 7X83, any model
- ThinkAgile HX1321, Type 7Z04, any model
- ThinkAgile HX1520-R Nutanix Appliance, Type 7X84, any model
- ThinkAgile HX1521-R, Type 7Y90, any model
- ThinkAgile HX1521-R, Type 7Z05, any model
- ThinkAgile HX2320-E Nutanix Appliance, Type 7X83, any model
- ThinkAgile HX2720-E Nutanix Appliance, Type 7X82, any model
- ThinkAgile HX3320 Nutanix Appliance, Type 7X83, any model
- ThinkAgile HX3321, Type 7Y89, any model
- ThinkAgile HX3520-G Nutanix Appliance, Type 7X84, any model
- ThinkAgile HX3521-G, Type 7Z05, any model
- ThinkAgile HX3720 Nutanix Appliance, Type 7X82, any model
- ThinkAgile HX3721 Enclosure, Type 7Y87, any model
- ThinkAgile HX3721 Enclosure, Type 7Z02, any model
- ThinkAgile HX3721 Nutanix Appliance, Type 7Y88, any model
- ThinkAgile HX3721 Nutanix Appliance, Type 7Z03, any model
- ThinkAgile HX3721, Type 7Y88, any model
- ThinkAgile HX5520 Nutanix Appliance, Type 7X84, any model
- ThinkAgile HX5520-C Nutanix Appliance, Type 7X84, any model
- ThinkAgile HX7320-N Nutanix Appliance, Type 7X83, any model
- ThinkAgile HX7520 Nutanix Appliance, Type 7X84, any model
- ThinkAgile HX7520-N Nutanix Appliance, Type 7X84, any model
- ThinkAgile HX7720 Nutanix Appliance, Type 7X82, any model
- ThinkAgile HX7820 Nutanix Appliance, Type 7Y95, any model
- ThinkAgile HX7820 Nutanix Appliance, Type 7Z08, any model
- ThinkAgile HX7821 Nutanix Appliance, Type 7Y96, any model
- ThinkAgile HX7821 Nutanix Appliance, Type 7Z09, any model
- ThinkAgile MX Certified Node, Type 7Z20, any model
- ThinkAgile VX2320, Type 7Y13, any model
- ThinkAgile VX3320, Type 7Y93, any model
- ThinkAgile VX3520-G, Type 7Y14, any model
- ThinkAgile VX3720, Type 7Y12, any model
- ThinkAgile VX3720, Type 7Y92, any model
- ThinkSystem SR850, Type 7X18, any model
- ThinkSystem SR850, Type 7X19, any model
- ThinkSystem SR860, Type 7X69, any model
- ThinkSystem SR860, Type 7X70, any model
- ThinkSystem SR950, Type 7X11, any model
- ThinkSystem SR950, Type 7X12, any model
- ThinkSystem SD530, Type 7X21, any model
- ThinkSystem SR530, Type 7X07, any model
- ThinkSystem SR530, Type 7X08, any model
- ThinkSystem SR550, Type 7X03, any model
- ThinkSystem SR550, Type 7X04, any model
- ThinkSystem SR630, Type 7X01, any model
- ThinkSystem SR630, Type 7X02, any model
- ThinkSystem SR650, Type 7X05, any model
- ThinkSystem SR650, Type 7X06, any model
- ThinkSystem SR670, Type 7Y36, any model
- ThinkSystem SR670, Type 7Y37, any model
- ThinkSystem ST550, Type 7X09, any model
- ThinkSystem ST550, Type 7X10, any model
This tip is not software specific.
This tip is not option specific.
The system has the symptom described above.
Solution
The following XCC versions (and future releases) address the known issues that cause XCC to reboot.
v4.8 (CDI358P)
v4.0 (TEI3A4L)
v1.9 (PSI332T)
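To get a quick hint on whether an installed build predates the fixed release, the build IDs listed above can be compared. This is a rough sketch that assumes the three-letter prefix identifies the firmware family and that the three-character sequence after it sorts in ASCII order as releases progress. Neither assumption is stated in the Lenovo tip, so treat the result as a hint and confirm against the release notes. The helper name `is_fixed_build` is hypothetical.

```python
import re

# Fixed builds named in the Solution section, keyed by their three-letter prefix.
FIXED_BUILDS = {"CDI": "CDI358P", "TEI": "TEI3A4L", "PSI": "PSI332T"}

# ASSUMPTION: a build ID is a 3-letter prefix, a 3-character sequence, and a suffix letter.
BUILD_ID = re.compile(r"^([A-Z]{3})([0-9A-Z]{3})([A-Z])$")


def is_fixed_build(build_id):
    """Return True or False if the build can be compared, or None if it cannot."""
    match = BUILD_ID.match(build_id.strip().upper())
    if match is None:
        return None  # does not look like a build ID
    prefix, sequence, _suffix = match.groups()
    fixed = FIXED_BUILDS.get(prefix)
    if fixed is None:
        return None  # firmware family not listed in the tip
    # ASSUMPTION: sequences sort in ASCII order (digits before letters), e.g. "399" < "3A4".
    return sequence >= fixed[3:6]
```

For example, `is_fixed_build("CDI334W")` returns `False` under these assumptions, since CDI334W is the v2.50 build named in the Symptom section.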
The file is, or will be, available on the Lenovo Support web page. Select the appropriate Product Group, type of System, Product name, Product machine type, and Operating system at the following URL:
http://datacentersupport.lenovo.com/
Workaround
No workaround.
Additional Information
There may be multiple causes of an unexpected XCC reset.
1. IPMI over KCS
A sudden surge in IPMI packets can cause a spinlock on the primary service processor.
2. Stingray/Nginx
Stingray Nginx has known issues. When the total number of event log entries exceeds 2048, a memory leak may occur.
3. /tmp is full
When redis-server is terminated, the Redfish Nginx rapidly prints logs indicating connection failure, and these logs fill /tmp within a couple of minutes.
The Redfish changes introduced in the Second Quarter 2019 release also have issues that generate a large volume of logs, filling /tmp within a couple of minutes. The memory leak has been observed to occur every 24.8 days.
4. It is normal behavior for XCC to reset its own file system in order to prevent a file system hang or an inability to function normally.
(where SLP = Service Location Protocol, IPMI = Intelligent Platform Management Interface, KCS = keyboard controller style)
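As a side note on the 24.8-day interval mentioned under the /tmp cause: the figure is numerically close to 2^31 milliseconds. The Lenovo tip does not say that a millisecond counter is involved, so the arithmetic below only illustrates the coincidence and should not be read as a confirmed root cause. The helper name `ms_to_days` is illustrative.

```python
# Convert 2**31 milliseconds to days to compare with the 24.8-day observation.
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def ms_to_days(milliseconds):
    """Convert a duration in milliseconds to days."""
    return milliseconds / MS_PER_DAY


days = ms_to_days(2 ** 31)
print(f"{days:.3f} days")
```

The result is roughly 24.855 days, which matches the observed interval if that interval was truncated to one decimal place.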