vSphere HA can restart a virtual machine for several reasons.
But have you ever seen HA restart a VM when:
- There was no host failure and no HA heartbeat failure.
- VM monitoring was not configured in the HA virtual machine options.
- Only one VM on a host (in an HA cluster with 5 hosts) was restarted, while the rest of the VMs stayed alive and healthy.
- No host isolation occurred.
Event message in vCenter:
“vSphere HA restarted this virtual machine”. No related events were found.
Analyzing fdm.log on the host did not provide enough information about why this specific VM was restarted.
Analyzing vmkernel.log turned up the following:
2016-12-23T13:53:35.921Z cpu25:46625)UserDump: 1907: Dumping cartel 46625(from world 46625) to file /vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/vmname/vmx-zdump.000 ...
2016-12-23T13:53:45.615Z cpu1:46625)UserDump: 2031: Userworld coredump complete.
2016-12-23T13:53:45.626Z cpu6:46625)WARNING: World: vm 46625: 3973: VMMWorld group leader = 46626, members = 4
From the logs above, the VMX process of the VM wrote a core dump (vmx-zdump.000) into the VM's directory, apparently because of high resource usage.
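To spot this pattern quickly on other hosts, here is a minimal Python sketch that pairs the "Dumping cartel" and "Userworld coredump complete" lines and reports how long each dump took. The regular expressions are derived only from the excerpt above, so treat the line format as an assumption; the function name find_userworld_dumps is made up for this example.

```python
import re
from datetime import datetime

# Line shapes are taken from the vmkernel.log excerpt in this post (assumption:
# other ESXi builds may format these messages differently).
START = re.compile(
    r"^(\S+) cpu\d+:(\d+)\)UserDump: \d+: Dumping cartel (\d+)\s*"
    r"\(from world \d+\) to file (\S+)"
)
DONE = re.compile(r"^(\S+) cpu\d+:(\d+)\)UserDump: \d+: Userworld coredump complete")


def parse_ts(ts):
    """Parse a timestamp such as 2016-12-23T13:53:35.921Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def find_userworld_dumps(lines):
    """Return one record per completed userworld core dump (hypothetical helper)."""
    pending = {}  # world id -> (start time, core file path)
    dumps = []
    for line in lines:
        m = START.match(line)
        if m:
            pending[m.group(2)] = (parse_ts(m.group(1)), m.group(4))
            continue
        m = DONE.match(line)
        if m and m.group(2) in pending:
            started, path = pending.pop(m.group(2))
            dumps.append({
                "world": int(m.group(2)),
                "file": path,
                "seconds": (parse_ts(m.group(1)) - started).total_seconds(),
            })
    return dumps


SAMPLE = [
    "2016-12-23T13:53:35.921Z cpu25:46625)UserDump: 1907: Dumping cartel 46625(from world 46625) to file /vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/vmname/vmx-zdump.000 ...",
    "2016-12-23T13:53:45.615Z cpu1:46625)UserDump: 2031: Userworld coredump complete.",
    "2016-12-23T13:53:45.626Z cpu6:46625)WARNING: World: vm 46625: 3973: VMMWorld group leader = 46626, members = 4",
]

if __name__ == "__main__":
    for dump in find_userworld_dumps(SAMPLE):
        print(dump)
```

To run it against a real log, copy vmkernel.log off the host and pass the open file object instead of SAMPLE, since the function only needs an iterable of lines.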
Digging further into the VM's vmware.log revealed more errors:
2016-12-23T13:53:35.814Z| vmx| I120: VERIFY bora/lib/misc/strutil.c:1079
2016-12-23T13:53:45.615Z| vmx| W110: A core file is available in "/vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/VMNAME/vmx-zdump.000"
2016-12-23T13:53:45.615Z| vmx| W110: Writing monitor corefile "/vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/VMNAME/vmmcores.gz"
2016-12-23T13:53:45.620Z| vmx| W110: CoreDump error line 2160, error Cannot allocate memory
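A rough way to pull these entries out of a long vmware.log is sketched below in Python. It splits each line on the "timestamp| thread| level: message" layout seen in the excerpt and keeps messages containing one of a few keywords. Both the layout and the keyword list are assumptions drawn from this one excerpt, and find_core_dump_entries is an illustrative name, not a VMware tool.

```python
import re

# Layout inferred from the vmware.log excerpt in this post: "<ts>| <thread>| <level>: <msg>".
LINE = re.compile(
    r"^(?P<ts>[^|]+)\| (?P<thread>[^|]+)\| (?P<level>[A-Z]\d+): (?P<msg>.*)$"
)

# Case-sensitive substrings taken from the excerpt (assumption: not an exhaustive list).
CORE_HINTS = ("VERIFY ", "core file is available", "corefile", "CoreDump error")


def find_core_dump_entries(lines):
    """Return (timestamp, level, message) for lines that look core-dump related."""
    hits = []
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m and any(hint in m.group("msg") for hint in CORE_HINTS):
            hits.append((m.group("ts"), m.group("level"), m.group("msg")))
    return hits


SAMPLE = [
    '2016-12-23T13:53:35.814Z| vmx| I120: VERIFY bora/lib/misc/strutil.c:1079',
    '2016-12-23T13:53:45.615Z| vmx| W110: A core file is available in "/vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/VMNAME/vmx-zdump.000"',
    '2016-12-23T13:53:45.615Z| vmx| W110: Writing monitor corefile "/vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/VMNAME/vmmcores.gz"',
    '2016-12-23T13:53:45.620Z| vmx| W110: CoreDump error line 2160, error Cannot allocate memory',
]

if __name__ == "__main__":
    for ts, level, msg in find_core_dump_entries(SAMPLE):
        print(ts, level, msg)
```

The last hit is the interesting one here: the "Cannot allocate memory" error ties the crash to memory exhaustion.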
There are two specific reasons this can happen.
- Reason 1:
The virtual machine is running the SAP host agent, which disrupts memory handling in the VMX process and causes either a guest OS hang or a VM failure with a core dump.
This happens because the SAP host agent tries to retrieve metrics from the ESXi host through the virtual machine, which triggers a memory leak that ends in total memory exhaustion.
Affected products: ESXi 5.5 U3b, ESXi 6.0, ESXi 6.0 U1, and ESXi 6.0 U1a
This issue is fixed in ESXi 6.0 U1b.
Resolution:
As a temporary workaround, vMotion the virtual machine to another host. For a permanent fix, upgrade ESXi to 6.0 U1b.
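If you have a list of host versions to triage, a small Python sketch like the one below can flag the ones named above. It only compares version strings against the four releases listed in this post; it knows nothing about build numbers, and any release not in that list (older or newer) is simply reported as not affected without being verified. The names normalize and is_affected are invented for this example.

```python
import re

# Matches strings such as "ESXi 5.5 U3 b", "6.0u1a", "ESXi 6.0 Update 1b", "ESXi 6.0".
VERSION = re.compile(
    r"(\d+\.\d+)\s*(?:u(?:pdate)?\s*(\d+)\s*([a-z])?(?![a-z]))?",
    re.IGNORECASE,
)

# The affected releases exactly as listed in this post, as (release, update, letter).
AFFECTED = {
    ("5.5", 3, "b"),
    ("6.0", 0, ""),
    ("6.0", 1, ""),
    ("6.0", 1, "a"),
}


def normalize(version):
    """Turn a free-form ESXi version string into (release, update, letter)."""
    m = VERSION.search(version)
    if not m:
        raise ValueError(f"no version number found in {version!r}")
    release, update, letter = m.groups()
    return (release, int(update or 0), (letter or "").lower())


def is_affected(version):
    """True only if the version is one of the releases listed in this post."""
    return normalize(version) in AFFECTED


if __name__ == "__main__":
    for v in ["ESXi 5.5 U3 b", "ESXi 6.0", "ESXi 6.0u1a", "ESXi 6.0 U1b"]:
        print(v, "->", "affected" if is_affected(v) else "not in the affected list")
```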
Reference:
- Reason 2:
This reason applies when you also see the following errors in the ESXi vobd.log file.
2016-12-23T13:53:45.615Z: [UserWorldCorrelator] 2379432939456us: [vob.uw.core.dumped] /bin/vmx(46625) /vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/vmname/vmx-zdump.000
2016-12-23T13:53:45.615Z: [UserWorldCorrelator] 2379441994799us: [esx.problem.application.core.dumped] An application (/bin/vmx) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/vmname/vmx-zdump.000.
Together with the symptoms above, this error points to a faulty CPU in the ESXi host.
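To check whether a host has logged this event, and how many times, the Python sketch below extracts the application path, crash count, and core file path from esx.problem.application.core.dumped lines. The pattern is built from the single vobd.log line quoted above, so the wording of the message is an assumption, and find_app_core_dumps is a name made up for this example. It only detects the event; it does not tell you which CPU, if any, is at fault.

```python
import re

# Pattern inferred from the vobd.log line quoted in this post (assumption:
# the message text is the same on other ESXi builds).
EVENT = re.compile(
    r"^(?P<ts>\S+?): \[(?P<source>\w+)\] (?P<uptime_us>\d+)us: "
    r"\[esx\.problem\.application\.core\.dumped\] "
    r"An application \((?P<app>[^)]+)\) running on ESXi host has crashed "
    r"\((?P<count>\d+) time\(s\) so far\)\. "
    r"A core file may have been created at (?P<core>\S+?)\.?$"
)


def find_app_core_dumps(lines):
    """Return one dict per application core dump event found in the given lines."""
    events = []
    for line in lines:
        m = EVENT.match(line.strip())
        if m:
            events.append({
                "timestamp": m.group("ts"),
                "app": m.group("app"),
                "count": int(m.group("count")),
                "core": m.group("core"),
            })
    return events


SAMPLE = [
    "2016-12-23T13:53:45.615Z: [UserWorldCorrelator] 2379432939456us: [vob.uw.core.dumped] /bin/vmx(46625) /vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/vmname/vmx-zdump.000",
    "2016-12-23T13:53:45.615Z: [UserWorldCorrelator] 2379441994799us: [esx.problem.application.core.dumped] An application (/bin/vmx) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/vmname/vmx-zdump.000.",
]

if __name__ == "__main__":
    for event in find_app_core_dumps(SAMPLE):
        print(event)
```

A crash count that keeps climbing across different VMs on the same host is a stronger hint of a host-side problem than a single event.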
You can follow the KB below to check which CPU core reports the error while the thread is being cloned.
Resolution:
Check and replace the faulty CPU if needed.
