ESXi Host

Virtual Machine Restart or VMM Panic "CoreDump error line 2160, error Cannot allocate memory"

vSphere HA can restart virtual machines for several reasons.

But have you ever seen HA restart a VM when:

  • There was no host failure and no HA heartbeat failure.
  • VM monitoring was not configured under the HA virtual machine options.
  • Only one VM on a host (in an HA cluster with 5 hosts) was restarted, while all other VMs stayed alive and healthy.
  • No host isolation had occurred.

Event message in vCenter:

"vSphere HA restarted this virtual machine". No related events were found.

Analyzing FDM.log on the host did not provide enough information about why this specific VM was restarted.

Analyzing vmkernel.log, however, revealed the following entries:

2016-12-23T13:53:35.921Z cpu25:46625)UserDump: 1907: Dumping cartel 46625(from world 46625) to file /vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/vmname/vmx-zdump.000 ...
2016-12-23T13:53:45.615Z cpu1:46625)UserDump: 2031: Userworld coredump complete.
2016-12-23T13:53:45.626Z cpu6:46625)WARNING: World: vm 46625: 3973: VMMWorld group leader = 46626, members = 4 
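
On a busy host, it can help to pull these UserDump entries out of vmkernel.log automatically. The following is a minimal Python sketch, not a VMware tool: the function name find_userdumps is hypothetical, and the regular expression assumes the exact line format shown above, which may differ on other ESXi releases.

```python
import re
import sys
from typing import Dict, Iterable, List

# Assumption: UserDump lines look like the excerpt above, e.g.
# "<ts> cpu25:46625)UserDump: 1907: Dumping cartel 46625(from world 46625) to file <path> ..."
USERDUMP_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)UserDump: \d+: "
    r"Dumping cartel (?P<cartel>\d+)\(from world (?P<world>\d+)\) "
    r"to file (?P<path>\S+)"
)


def find_userdumps(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return timestamp, cartel ID, world ID and dump path for each 'Dumping cartel' line."""
    dumps = []
    for line in lines:
        m = USERDUMP_RE.match(line)
        if m:
            dumps.append(m.groupdict())
    return dumps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_userdumps.py /var/log/vmkernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for d in find_userdumps(f):
            print(f"{d['ts']} cartel={d['cartel']} world={d['world']} -> {d['path']}")
```

The cartel ID it reports can then be matched against the world IDs of running VMs to identify which VM dumped core.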

The logs above show that the VM <> generated a zdump file in its directory because higher resource usage triggered a core dump operation.

Digging further into the VM's vmware.log revealed more errors:

2016-12-23T13:53:35.814Z| vmx| I120: VERIFY bora/lib/misc/strutil.c:1079
2016-12-23T13:53:45.615Z| vmx| W110: A core file is available in "/vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/VMNAME/vmx-zdump.000"
2016-12-23T13:53:45.615Z| vmx| W110: Writing monitor corefile "/vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/VMNAME/vmmcores.gz"
2016-12-23T13:53:45.620Z| vmx| W110: CoreDump error line 2160, error Cannot allocate memory
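
To check many VMs for the same symptoms, a small script could summarize the VERIFY failure, the core file location and the CoreDump error from each vmware.log. This is a hedged Python sketch: the function name summarize_vmx_crash is hypothetical, and the patterns only match the message texts shown in the excerpt above.

```python
import re
from typing import Dict, Iterable, List

# Assumption: message texts match the vmware.log excerpt above.
VERIFY_RE = re.compile(r"VERIFY (?P<location>\S+)")
CORE_FILE_RE = re.compile(r'A core file is available in "(?P<path>[^"]+)"')
COREDUMP_ERR_RE = re.compile(r"CoreDump error line (?P<line>\d+), error (?P<error>.+)$")


def summarize_vmx_crash(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect VERIFY failures, core file paths and CoreDump errors from vmware.log lines."""
    summary: Dict[str, List[str]] = {"verify": [], "core_files": [], "coredump_errors": []}
    for line in lines:
        line = line.rstrip("\r\n")
        if m := VERIFY_RE.search(line):
            summary["verify"].append(m.group("location"))
        if m := CORE_FILE_RE.search(line):
            summary["core_files"].append(m.group("path"))
        if m := COREDUMP_ERR_RE.search(line):
            summary["coredump_errors"].append(f"line {m.group('line')}: {m.group('error')}")
    return summary
```

Pass it an open vmware.log file (or any iterable of its lines); a non-empty "coredump_errors" list containing "Cannot allocate memory" matches the symptom described here.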

This happens for one of two specific reasons.

  • Reason 1:

The virtual machine is running the SAP host agent, which disrupts the VMX memory handling process and causes the guest OS to hang or the VM to fail with a core dump.

This is because the SAP host agent retrieves metrics from the ESXi host through the virtual machine, which causes a memory leak on the ESXi host that eventually exhausts its memory.

Affected products: ESXi 5.5 U3b, ESXi 6.0, ESXi 6.0 U1, and ESXi 6.0 U1a

This issue is fixed in ESXi 6.0 U1b.
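
When auditing a larger inventory, a quick first-pass check against the list above could look like the following Python sketch. It compares only the release/update label (for example "ESXi 6.0 U1a"), not the build number, and the set AFFECTED simply copies the versions listed in this post; treat a match as a hint to read the KB, not as a definitive verdict.

```python
# Assumption: AFFECTED mirrors the "Affected products" list above; it is not exhaustive.
AFFECTED = {"5.5u3b", "6.0", "6.0u1", "6.0u1a"}


def normalize(label: str) -> str:
    """Turn labels such as 'ESXi 6.0 U1a' or 'ESXi 5.5 U3 b' into '6.0u1a' / '5.5u3b'."""
    label = label.lower().replace("esxi", "")
    return "".join(label.split())


def is_affected(label: str) -> bool:
    """Return True if the update label appears in the affected list above."""
    return normalize(label) in AFFECTED
```

For example, is_affected("ESXi 6.0 U1a") is True, while is_affected("ESXi 6.0 U1b") is False because U1b contains the fix.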

Resolution:

As a temporary workaround, migrate the virtual machine to another host with vMotion. For a permanent fix, upgrade the ESXi host to 6.0 U1b.

Reference:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2137310

  • Reason 2:

This reason applies if you additionally see the following errors in the ESXi host's vobd.log file:

"2016-12-23T13:53:45.615Z: [UserWorldCorrelator] 2379432939456us: [vob.uw.core.dumped] /bin/vmx(46625) /vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/vmname/vmx-zdump.000
2016-12-23T13:53:45.615Z: [UserWorldCorrelator] 2379441994799us: [esx.problem.application.core.dumped] An application (/bin/vmx) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /vmfs/volumes/56692f88-ae779548-ad65-0025b501a007/vmname/vmx-zdump.000."
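
If you need to confirm how often the VMX process has crashed on a host, you could scan vobd.log for the esx.problem.application.core.dumped event. The following Python sketch is an illustration only: the function name find_app_crashes is hypothetical, and the regular expression assumes the message wording shown above.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: the event text matches the vobd.log excerpt above.
APP_CRASH_RE = re.compile(
    r"\[esx\.problem\.application\.core\.dumped\] An application \((?P<app>[^)]+)\) "
    r"running on ESXi host has crashed \((?P<count>\d+) time\(s\) so far\)\. "
    r"A core file may have been created at (?P<path>\S+?)\.?\s*$"
)


def find_app_crashes(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (application, crash count so far, core file path) for each crash event."""
    crashes = []
    for line in lines:
        m = APP_CRASH_RE.search(line)
        if m:
            crashes.append((m.group("app"), int(m.group("count")), m.group("path")))
    return crashes
```

Only the esx.problem.application.core.dumped line is returned; the vob.uw.core.dumped line is ignored, so each crash is counted once.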

In this case, the errors above indicate a faulty CPU in the ESXi host.

To check which CPU core reports an error while a thread is being cloned, follow this KB:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002771

Resolution:

Check the CPU and replace it if it is faulty.
