
Kernel tracebacks after vSphere upgrade to 6.0


Hi all,

 

We have a vApp in vCloud whose VMs had been working flawlessly since day one. After we upgraded our vSphere environment to version 6.0, the VMs started rebooting themselves out of nowhere and wouldn't stay up for more than about 5 days. We then disabled VM monitoring for those VMs in the cluster configuration, so HA no longer reboots them, but we are still seeing all kinds of kernel call traces, for example:

 

Jul  5 06:19:35 compu kernel: hrtimer: interrupt took 42859741512 ns

Jul  5 06:19:36 compu kernel: INFO: rcu_sched self-detected stall on CPU

Jul  5 06:19:36 compu kernel: INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 0, t=69334 jiffies, g=9237069, c=9237068, q=5)

Jul  5 06:19:36 compu kernel: Task dump for CPU 3:

Jul  5 06:19:36 compu kernel: kworker/3:2     R  running task        0 22078      2 0x00000088

Jul  5 06:19:36 compu kernel: sched: RT throttling activated

Jul  5 06:19:36 compu kernel: Workqueue: events_freezable vmballoon_work [vmw_balloon]

Jul  5 06:19:36 compu kernel: { 3}  (t=73140 jiffies g=9237069 c=9237068 q=5)

Jul  5 06:19:36 compu kernel: 0000000000000000 0000000000000200 000000003e25cd65 ffffffff81f0a540

Jul  5 06:19:36 compu kernel: 0000000000000202 0000000000000000 0000000000000000 ffff880716cd0000

Jul  5 06:19:36 compu kernel: ffff88041e9d3dc0 ffffffff811cf94a ffff880716cd0000 0000000000000000

Jul  5 06:19:36 compu kernel: Call Trace:

Jul  5 06:19:36 compu kernel: [<ffffffff811cf94a>] ? alloc_pages_current+0xaa/0x170

Jul  5 06:19:36 compu kernel: [<ffffffffa03b5ff8>] ? vmballoon_work+0x3b8/0x618 [vmw_balloon]

Jul  5 06:19:36 compu kernel: [<ffffffff810a845b>] ? process_one_work+0x17b/0x470

Jul  5 06:19:36 compu kernel: [<ffffffff810a9296>] ? worker_thread+0x126/0x410

Jul  5 06:19:36 compu kernel: [<ffffffff810a9170>] ? rescuer_thread+0x460/0x460

Jul  5 06:19:36 compu kernel: [<ffffffff810b0a4f>] ? kthread+0xcf/0xe0

Jul  5 06:19:36 compu kernel: [<ffffffff810bfde3>] ? finish_task_switch+0x53/0x180

Jul  5 06:19:36 compu kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

Jul  5 06:19:36 compu kernel: [<ffffffff81697698>] ? ret_from_fork+0x58/0x90

Jul  5 06:19:36 compu kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

Jul  5 06:19:36 compu kernel: Task dump for CPU 3:

Jul  5 06:19:36 compu kernel: kworker/3:2     R  running task        0 22078      2 0x00000088

Jul  5 06:19:36 compu kernel: Workqueue: events_freezable vmballoon_work [vmw_balloon]

Jul  5 06:19:36 compu kernel: ffff880716cd0000 000000003e25cd65 ffff880c3fd83c48 ffffffff810c47e8

Jul  5 06:19:36 compu kernel: 0000000000000003 ffffffff81a1e6c0 ffff880c3fd83c60 ffffffff810c80c9

Jul  5 06:19:36 compu kernel: 0000000000000004 ffff880c3fd83c90 ffffffff81137960 ffff880c3fd901c0

Jul  5 06:19:36 compu kernel: Call Trace:

Jul  5 06:19:36 compu kernel: <IRQ>  [<ffffffff810c47e8>] sched_show_task+0xa8/0x110

Jul  5 06:19:36 compu kernel: [<ffffffff810c80c9>] dump_cpu_task+0x39/0x70

Jul  5 06:19:36 compu kernel: [<ffffffff81137960>] rcu_dump_cpu_stacks+0x90/0xd0

Jul  5 06:19:36 compu kernel: [<ffffffff8113b0b2>] rcu_check_callbacks+0x442/0x720

Jul  5 06:19:36 compu kernel: [<ffffffff810f3610>] ? tick_sched_handle.isra.13+0x60/0x60

Jul  5 06:19:36 compu kernel: [<ffffffff81099697>] update_process_times+0x47/0x80

Jul  5 06:19:36 compu kernel: [<ffffffff810f35d5>] tick_sched_handle.isra.13+0x25/0x60

Jul  5 06:19:36 compu kernel: [<ffffffff810f3651>] tick_sched_timer+0x41/0x70

Jul  5 06:19:36 compu kernel: [<ffffffff810b4d72>] __hrtimer_run_queues+0xd2/0x260

Jul  5 06:19:36 compu kernel: [<ffffffff810b5310>] hrtimer_interrupt+0xb0/0x1e0

Jul  5 06:19:36 compu kernel: [<ffffffff81050ff7>] local_apic_timer_interrupt+0x37/0x60

Jul  5 06:19:36 compu kernel: [<ffffffff81699e4f>] smp_apic_timer_interrupt+0x3f/0x60

Jul  5 06:19:36 compu kernel: [<ffffffff8169839d>] apic_timer_interrupt+0x6d/0x80

Jul  5 06:19:36 compu kernel: [<ffffffff8108f5e8>] ? __do_softirq+0x98/0x280

Jul  5 06:19:36 compu kernel: [<ffffffff810f12fb>] ? clockevents_program_event+0x6b/0xf0

Jul  5 06:19:36 compu kernel: [<ffffffff816991dc>] call_softirq+0x1c/0x30

Jul  5 06:19:36 compu kernel: [<ffffffff8102d365>] do_softirq+0x65/0xa0

Jul  5 06:19:36 compu kernel: [<ffffffff8108f9d5>] irq_exit+0x115/0x120

Jul  5 06:19:36 compu kernel: [<ffffffff81699e55>] smp_apic_timer_interrupt+0x45/0x60

Jul  5 06:19:36 compu kernel: [<ffffffff8169839d>] apic_timer_interrupt+0x6d/0x80

Jul  5 06:19:36 compu kernel: <EOI>  [<ffffffff8118ac84>] ? get_page_from_freelist+0x434/0x9f0

Jul  5 06:19:36 compu kernel: [<ffffffff8118b3b6>] __alloc_pages_nodemask+0x176/0x420

Jul  5 06:19:36 compu kernel: [<ffffffff811cf94a>] alloc_pages_current+0xaa/0x170

Jul  5 06:19:36 compu kernel: [<ffffffffa03b5ff8>] vmballoon_work+0x3b8/0x618 [vmw_balloon]

Jul  5 06:19:36 compu kernel: [<ffffffff810a845b>] process_one_work+0x17b/0x470

Jul  5 06:19:36 compu kernel: [<ffffffff810a9296>] worker_thread+0x126/0x410

Jul  5 06:19:36 compu kernel: [<ffffffff810a9170>] ? rescuer_thread+0x460/0x460

Jul  5 06:19:36 compu kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0

Jul  5 06:19:37 compu kernel: [<ffffffff810bfde3>] ? finish_task_switch+0x53/0x180

Jul  5 06:19:37 compu kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

Jul  5 06:19:37 compu kernel: [<ffffffff81697698>] ret_from_fork+0x58/0x90

Jul  5 06:19:37 compu kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

Jul  6 02:19:37 compu kernel: INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 1, t=75951 jiffies, g=11716833, c=11716832, q=7)

Jul  6 02:19:37 compu kernel: Task dump for CPU 3:

Jul  6 02:19:37 compu kernel: kworker/3:0     R  running task        0  3126      2 0x00000088

Jul  6 02:19:37 compu kernel: Workqueue: events_freezable vmballoon_work [vmw_balloon]

Jul  6 02:19:37 compu kernel: ffff880291e33d98 0000000000000046 ffff880700474e70 ffff880291e33fd8

Jul  6 02:19:37 compu kernel: ffff880291e33fd8 ffff880291e33fd8 ffff880700474e70 ffff880291e30000

Jul  6 02:19:37 compu kernel: 00000000000000ab 00000000000000ab 0000000000000000 ffffffffa03b8558

Jul  6 02:19:37 compu kernel: Call Trace:

Jul  6 02:19:37 compu kernel: [<ffffffff810c2076>] __cond_resched+0x26/0x30

Jul  6 02:19:37 compu kernel: [<ffffffff8168c9ea>] _cond_resched+0x3a/0x50

Jul  6 02:19:37 compu kernel: [<ffffffffa03b6029>] vmballoon_work+0x3e9/0x618 [vmw_balloon]

Jul  6 02:19:37 compu kernel: [<ffffffff810a845b>] process_one_work+0x17b/0x470

Jul  6 02:19:37 compu kernel: [<ffffffff810a9296>] worker_thread+0x126/0x410

Jul  6 02:19:37 compu kernel: [<ffffffff810a9170>] ? rescuer_thread+0x460/0x460

Jul  6 02:19:37 compu kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0

Jul  6 02:19:37 compu kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

Jul  6 02:19:37 compu kernel: [<ffffffff81697698>] ret_from_fork+0x58/0x90

Jul  6 02:19:37 compu kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

 

 

 

I opened a case with VMware support about this issue, but all I got was a "go check the OS or the applications" response. The trouble is, the issues started the day after we updated to vSphere 6.0. At first we had updated only half of the nodes in the cluster, so the other half was still running 5.5; we split the VMs 50-50 between 6.0 and 5.5 hosts, and the VMs hosted on the 5.5 nodes never showed this behavior. We eventually had to finish updating the remaining nodes, so everything is now on 6.0, and the issue is far from gone.

 

The issue only affects VMs within this vApp. The vCloud account has more than enough resources to cover the reservation of the pool those VMs run in. The problem occurs only on VMs running CentOS 7 with kernel 3.10 or 3.11.
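
Since every trace points at vmballoon_work in the vmw_balloon module, what I've been checking when the stalls appear is whether the balloon is actually inflating at that moment. This is just a quick in-guest sketch, assuming open-vm-tools or VMware Tools is installed in the guest:

# Run inside an affected CentOS 7 guest while the stalls are being logged
uname -r                                         # confirm the running kernel (3.10.x / 3.11.x here)
lsmod | grep vmw_balloon                         # confirm the balloon module is loaded
vmware-toolbox-cmd stat balloon                  # memory currently claimed by the balloon driver (MB)
grep -c "rcu_sched.*stall" /var/log/messages     # how often the stalls have been logged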

 

Now, I read in a post elsewhere that this might be an incompatibility between the hypervisor and that kernel version, since the Linux flavor running on those VMs should be the least of our concerns. Before trying to install a 4.x kernel from unofficial repositories, I wanted to ask whether anyone has run into the same situation and can share how they troubleshot it, or is still hitting the same problem, so we can share information and hopefully nail this down. In the meantime, the diagnostic test I'm planning is sketched below.
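
A cheaper test than swapping kernels is to keep vmw_balloon from loading on one affected VM and see whether the stalls stop. This is only a sketch and strictly a diagnostic step, not a fix: the file name below is my own, and with the balloon driver disabled the host can no longer reclaim memory from that VM.

# Diagnostic only, on one affected CentOS 7 VM
cat <<'EOF' > /etc/modprobe.d/vmw_balloon-test.conf
blacklist vmw_balloon
install vmw_balloon /bin/false
EOF
rmmod vmw_balloon          # unload it for the current boot (or simply reboot the VM)
lsmod | grep vmw_balloon   # should return nothing afterwards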

 

Thanks in advance.

