From: Ian Jackson <Ian.Jackson@eu.citrix.com>
To: konrad.wilk@oracle.com
Cc: xen-devel@lists.xen.org
Subject: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration
Date: Tue, 7 Jan 2014 18:55:56 +0000
Message-ID: <21196.19900.136146.867552@mariner.uk.xensource.com>
I did the following test:

  mv /etc/xen/scripts/block /etc/xen/scripts/block.aside
  xl migrate debian.guest.osstest localhost

xl did what appears to be the right thing: it did most of the
migration, failed to run the block scripts at the end of the
migration, and destroyed the destination domain and instead resumed
the source guest.

However, the source guest immediately went mad spewing WARNINGs and
was after that no longer contactable via the network and not
apparently responsive on the console.  See below.

This is with:

[ 0.000000] Linux version 3.4.70+ (osstest@rice-weevil) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Wed Dec 4 03:14:51 GMT 2013

For reasons I don't understand it doesn't seem to print the actual
kernel git hash in dmesg, but I think it was that from flight 22264,
i.e. 234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
64-bit Xen.

Thanks,
Ian.

debian login: [ 124.595658] PM: freeze of devices complete after 2.980 msecs
[ 124.595991] PM: late freeze of devices complete after 0.013 msecs
[ 124.600919] PM: noirq freeze of devices complete after 4.884 msecs
[ 124.601105] Grant tables using version 2 layout.
[ 124.601105] ------------[ cut here ]------------
[ 124.601105] kernel BUG at drivers/xen/events.c:1582!
[ 124.601105] invalid opcode: 0000 [#1] SMP [ 124.601105] Modules linked in: [last unloaded: scsi_wait_scan] [ 124.601105] [ 124.601105] Pid: 6, comm: migration/0 Not tainted 3.4.70+ #1 [ 124.601105] EIP: 0061:[<c12f5d25>] EFLAGS: 00010082 CPU: 0 [ 124.601105] EIP is at xen_irq_resume+0x215/0x370 [ 124.601105] EAX: ffffffef EBX: deadbeef ECX: deadbeef EDX: 00000000 [ 124.601105] ESI: c190b020 EDI: df461f24 EBP: df451eb8 ESP: df451e10 [ 124.601105] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069 [ 124.601105] CR0: 8005003b CR2: 08b7c8a8 CR3: 038f0000 CR4: 00002660 [ 124.601105] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 124.601105] DR6: ffff0ff0 DR7: 00000400 [ 124.601105] Process migration/0 (pid: 6, ti=df450000 task=df43d860 task.ti=df450000) [ 124.601105] Stack: [ 124.601105] c104ea40 df451e18 c398b80c deadbeef df461f10 df451e58 c12f350a c19b165c [ 124.601105] df451e94 00000003 df451e78 c190b080 c190b020 00000000 00000010 00000000 [ 124.601105] 00000000 00000000 9420f17e 0008a6c2 fc798ba3 0008a6df 00000004 00000413 [ 124.601105] Call Trace: [ 124.601105] [<c104ea40>] ? xen_iret_crit_fixup+0x3c/0x3c [ 124.601105] [<c12f350a>] ? gnttab_map_frames_v2+0xda/0x120 [ 124.601105] [<c1055b90>] ? xen_spin_lock+0xa0/0x100 [ 124.601105] [<c104d155>] ? xen_mm_unpin_all+0x65/0x80 [ 124.601105] [<c12f6cad>] xen_suspend+0x8d/0xc0 [ 124.601105] [<c10e750b>] stop_machine_cpu_stop+0x9b/0x110 [ 124.601105] [<c10e71f7>] cpu_stopper_thread+0xc7/0x1a0 [ 124.601105] [<c10b3f6f>] ? finish_task_switch+0x5f/0xe0 [ 124.601105] [<c10e7470>] ? stop_one_cpu_nowait+0x40/0x40 [ 124.601105] [<c10b682b>] ? default_wake_function+0xb/0x10 [ 124.601105] [<c10af990>] ? __wake_up_common+0x40/0x70 [ 124.601105] [<c16441ad>] ? _raw_spin_unlock_irqrestore+0x2d/0x50 [ 124.601105] [<c10b2479>] ? complete+0x49/0x60 [ 124.601105] [<c10e7130>] ? res_counter_charge+0x180/0x180 [ 124.601105] [<c10a7474>] kthread+0x74/0x80 [ 124.601105] [<c10a7400>] ? 
kthread_freezable_should_stop+0x60/0x60 [ 124.601105] [<c164b276>] kernel_thread_helper+0x6/0x10 [ 124.601105] Code: 22 e8 ff ff 8b 55 8c 89 d8 e8 88 e6 ff ff 83 45 94 01 83 7d 94 04 0f 84 80 fe ff ff 8b 55 8c 8b 04 95 e0 11 88 c1 e9 64 ff ff ff <0f> 0b eb fe 0f 0b eb fe 8b 1d 00 60 85 c1 81 fb 00 60 85 c1 74 [ 124.601105] EIP: [<c12f5d25>] xen_irq_resume+0x215/0x370 SS:ESP 0069:df451e10 [ 124.601105] ---[ end trace 69a5c8cd56e77bce ]--- [ 124.601105] ------------[ cut here ]------------ [ 124.601105] WARNING: at kernel/time/tick-sched.c:464 tick_nohz_idle_enter+0x7a/0x90() [ 124.601105] Modules linked in: [last unloaded: scsi_wait_scan] [ 124.601105] Pid: 0, comm: swapper/0 Tainted: G D 3.4.70+ #1 [ 124.601105] Call Trace: [ 124.601105] [<c10888ed>] warn_slowpath_common+0x6d/0xa0 [ 124.601105] [<c10d1b1a>] ? tick_nohz_idle_enter+0x7a/0x90 [ 124.601105] [<c10d1b1a>] ? tick_nohz_idle_enter+0x7a/0x90 [ 124.601105] [<c108893d>] warn_slowpath_null+0x1d/0x20 [ 124.601105] [<c10d1b1a>] tick_nohz_idle_enter+0x7a/0x90 [ 124.601105] [<c105e22a>] cpu_idle+0x1a/0xa0 [ 124.601105] [<c16242f8>] rest_init+0x58/0x60 [ 124.601105] [<c1887919>] start_kernel+0x355/0x35b [ 124.601105] [<c1887435>] ? kernel_init+0x1cf/0x1cf [ 124.601105] [<c18870ba>] i386_start_kernel+0xa9/0xb0 [ 124.601105] [<c188b733>] xen_start_kernel+0x5c4/0x5cc [ 124.601105] ---[ end trace 69a5c8cd56e77bcf ]--- [ 124.601105] ------------[ cut here ]------------ [ 124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100() [ 124.601105] Modules linked in: [last unloaded: scsi_wait_scan] [ 124.601105] Pid: 0, comm: swapper/0 Tainted: G D W 3.4.70+ #1 [ 124.601105] Call Trace: [ 124.601105] [<c10888ed>] warn_slowpath_common+0x6d/0xa0 [ 124.601105] [<c10c9fc9>] ? ktime_get+0xe9/0x100 [ 124.601105] [<c10c9fc9>] ? 
ktime_get+0xe9/0x100 [ 124.601105] [<c108893d>] warn_slowpath_null+0x1d/0x20 [ 124.601105] [<c10c9fc9>] ktime_get+0xe9/0x100 [ 124.601105] [<c10d1664>] tick_nohz_stop_sched_tick+0x34/0x3f0 [ 124.601105] [<c10886ea>] ? print_oops_end_marker+0x2a/0x30 [ 124.601105] [<c10888fd>] ? warn_slowpath_common+0x7d/0xa0 [ 124.601105] [<c10d1b1a>] ? tick_nohz_idle_enter+0x7a/0x90 [ 124.601105] [<c10d1ae5>] tick_nohz_idle_enter+0x45/0x90 [ 124.601105] [<c105e22a>] cpu_idle+0x1a/0xa0 [ 124.601105] [<c16242f8>] rest_init+0x58/0x60 [ 124.601105] [<c1887919>] start_kernel+0x355/0x35b [ 124.601105] [<c1887435>] ? kernel_init+0x1cf/0x1cf [ 124.601105] [<c18870ba>] i386_start_kernel+0xa9/0xb0 [ 124.601105] [<c188b733>] xen_start_kernel+0x5c4/0x5cc [ 124.601105] ---[ end trace 69a5c8cd56e77bd0 ]--- [ 124.601105] ------------[ cut here ]------------ [ 124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100() [ 124.601105] Modules linked in: [last unloaded: scsi_wait_scan] [ 124.601105] Pid: 0, comm: swapper/0 Tainted: G D W 3.4.70+ #1 [ 124.601105] Call Trace: [ 124.601105] [<c10888ed>] warn_slowpath_common+0x6d/0xa0 [ 124.601105] [<c10c9fc9>] ? ktime_get+0xe9/0x100 [ 124.601105] [<c10c9fc9>] ? ktime_get+0xe9/0x100 [ 124.601105] [<c108893d>] warn_slowpath_null+0x1d/0x20 [ 124.601105] [<c10c9fc9>] ktime_get+0xe9/0x100 [ 124.601105] [<c10d1379>] tick_check_idle+0x39/0xf0 [ 124.601105] [<c108f06c>] irq_enter+0x4c/0x70 [ 124.601105] [<c12f4e36>] xen_evtchn_do_upcall+0x16/0x30 [ 124.601105] [<c164b2c7>] xen_do_upcall+0x7/0xc [ 124.601105] [<c1002227>] ? hypercall_page+0x227/0x1000 [ 124.601105] [<c104e16a>] ? xen_force_evtchn_callback+0x1a/0x30 [ 124.601105] [<c104e994>] check_events+0x8/0xc [ 124.601105] [<c104e93c>] ? xen_clocksource_get_cycles+0xc/0xc [ 124.601105] [<c104e953>] ? xen_irq_enable_direct_reloc+0x4/0x4 [ 124.601105] [<c10d1af4>] ? 
tick_nohz_idle_enter+0x54/0x90 [ 124.601105] [<c105e22a>] cpu_idle+0x1a/0xa0 [ 124.601105] [<c16242f8>] rest_init+0x58/0x60 [ 124.601105] [<c1887919>] start_kernel+0x355/0x35b [ 124.601105] [<c1887435>] ? kernel_init+0x1cf/0x1cf [ 124.601105] [<c18870ba>] i386_start_kernel+0xa9/0xb0 [ 124.601105] [<c188b733>] xen_start_kernel+0x5c4/0x5cc [ 124.601105] ---[ end trace 69a5c8cd56e77bd1 ]--- [ 124.601105] ------------[ cut here ]------------ [ 124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100() [ 124.601105] Modules linked in: [last unloaded: scsi_wait_scan] [ 124.601105] Pid: 0, comm: swapper/0 Tainted: G D W 3.4.70+ #1 [ 124.601105] Call Trace: [ 124.601105] [<c10888ed>] warn_slowpath_common+0x6d/0xa0 [ 124.601105] [<c10c9fc9>] ? ktime_get+0xe9/0x100 [ 124.601105] [<c10c9fc9>] ? ktime_get+0xe9/0x100 [ 124.601105] [<c108893d>] warn_slowpath_null+0x1d/0x20 [ 124.601105] [<c10c9fc9>] ktime_get+0xe9/0x100 [ 124.601105] [<c10d1664>] tick_nohz_stop_sched_tick+0x34/0x3f0 [ 124.601105] [<c12f4288>] ? info_for_irq+0x8/0x20 [ 124.601105] [<c12f47c3>] ? evtchn_from_irq+0x13/0x40 [ 124.601105] [<c104e789>] ? xen_clocksource_read+0x19/0x20 [ 124.601105] [<c12f4b68>] ? __xen_evtchn_do_upcall+0x258/0x2b0 [ 124.601105] [<c10d1a5f>] tick_nohz_irq_exit+0x3f/0x80 [ 124.601105] [<c108ef1f>] irq_exit+0x4f/0xb0 [ 124.601105] [<c12f4e40>] xen_evtchn_do_upcall+0x20/0x30 [ 124.601105] [<c164b2c7>] xen_do_upcall+0x7/0xc [ 124.601105] [<c1002227>] ? hypercall_page+0x227/0x1000 [ 124.601105] [<c104e16a>] ? xen_force_evtchn_callback+0x1a/0x30 [ 124.601105] [<c104e994>] check_events+0x8/0xc [ 124.601105] [<c104e93c>] ? xen_clocksource_get_cycles+0xc/0xc [ 124.601105] [<c104e953>] ? xen_irq_enable_direct_reloc+0x4/0x4 [ 124.601105] [<c10d1af4>] ? tick_nohz_idle_enter+0x54/0x90 [ 124.601105] [<c105e22a>] cpu_idle+0x1a/0xa0 [ 124.601105] [<c16242f8>] rest_init+0x58/0x60 [ 124.601105] [<c1887919>] start_kernel+0x355/0x35b [ 124.601105] [<c1887435>] ? 
kernel_init+0x1cf/0x1cf [ 124.601105] [<c18870ba>] i386_start_kernel+0xa9/0xb0 [ 124.601105] [<c188b733>] xen_start_kernel+0x5c4/0x5cc [ 124.601105] ---[ end trace 69a5c8cd56e77bd2 ]--- [ 124.601105] ------------[ cut here ]------------ [ 124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100() [ 124.601105] Modules linked in: [last unloaded: scsi_wait_scan] [ 124.601105] Pid: 0, comm: swapper/0 Tainted: G D W 3.4.70+ #1 [ 124.601105] Call Trace: [ 124.601105] [<c10888ed>] warn_slowpath_common+0x6d/0xa0 [ 124.601105] [<c10c9fc9>] ? ktime_get+0xe9/0x100 [ 124.601105] [<c10c9fc9>] ? ktime_get+0xe9/0x100 [ 124.601105] [<c108893d>] warn_slowpath_null+0x1d/0x20 [ 124.601105] [<c10c9fc9>] ktime_get+0xe9/0x100 [ 124.601105] [<c10d1379>] tick_check_idle+0x39/0xf0 [ 124.601105] [<c108f06c>] irq_enter+0x4c/0x70 [ 124.601105] [<c12f4e36>] xen_evtchn_do_upcall+0x16/0x30 [ 124.601105] [<c164b2c7>] xen_do_upcall+0x7/0xc [ 124.601105] [<c10023a7>] ? hypercall_page+0x3a7/0x1000 [ 124.601105] [<c104e1c2>] ? xen_safe_halt+0x12/0x20 [ 124.601105] [<c104e1b0>] ? xen_irq_disable+0x10/0x10 [ 124.601105] [<c105ed2b>] default_idle+0x5b/0x190 [ 124.601105] [<c1040054>] ? svm_set_tsc_khz+0x74/0x140 [ 124.601105] [<c105e27f>] cpu_idle+0x6f/0xa0 [ 124.601105] [<c16242f8>] rest_init+0x58/0x60 [ 124.601105] [<c1887919>] start_kernel+0x355/0x35b [ 124.601105] [<c1887435>] ? kernel_init+0x1cf/0x1cf [ 124.601105] [<c18870ba>] i386_start_kernel+0xa9/0xb0 [ 124.601105] [<c188b733>] xen_start_kernel+0x5c4/0x5cc [ 124.601105] ---[ end trace 69a5c8cd56e77bd3 ]--- [ 124.601105] ------------[ cut here ]------------ [ 124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100() [ 124.601105] Modules linked in: [last unloaded: scsi_wait_scan] [ 124.601105] Pid: 0, comm: swapper/0 Tainted: G D W 3.4.70+ #1 [ 124.601105] Call Trace: [ 124.601105] [<c10888ed>] warn_slowpath_common+0x6d/0xa0 [ 124.601105] [<c10c9fc9>] ? ktime_get+0xe9/0x100 [ 124.601105] [<c10c9fc9>] ? 
ktime_get+0xe9/0x100 [ 124.601105] [<c108893d>] warn_slowpath_null+0x1d/0x20 [ 124.601105] [<c10c9fc9>] ktime_get+0xe9/0x100 [ 124.601105] [<c10d1664>] tick_nohz_stop_sched_tick+0x34/0x3f0 [ 124.601105] [<c12f4288>] ? info_for_irq+0x8/0x20 [ 124.601105] [<c12f47c3>] ? evtchn_from_irq+0x13/0x40 [ 124.601105] [<c104e789>] ? xen_clocksource_read+0x19/0x20 [ 124.601105] [<c12f4b68>] ? __xen_evtchn_do_upcall+0x258/0x2b0 [ 124.601105] [<c10d1a5f>] tick_nohz_irq_exit+0x3f/0x80 [ 124.601105] [<c108ef1f>] irq_exit+0x4f/0xb0 [ 124.601105] [<c12f4e40>] xen_evtchn_do_upcall+0x20/0x30 [ 124.601105] [<c164b2c7>] xen_do_upcall+0x7/0xc [ 124.601105] [<c10023a7>] ? hypercall_page+0x3a7/0x1000 [ 124.601105] [<c104e1c2>] ? xen_safe_halt+0x12/0x20 [ 124.601105] [<c104e1b0>] ? xen_irq_disable+0x10/0x10 [ 124.601105] [<c105ed2b>] default_idle+0x5b/0x190 [ 124.601105] [<c1040054>] ? svm_set_tsc_khz+0x74/0x140 [ 124.601105] [<c105e27f>] cpu_idle+0x6f/0xa0 [ 124.601105] [<c16242f8>] rest_init+0x58/0x60 [ 124.601105] [<c1887919>] start_kernel+0x355/0x35b [ 124.601105] [<c1887435>] ? kernel_init+0x1cf/0x1cf [ 124.601105] [<c18870ba>] i386_start_kernel+0xa9/0xb0 [ 124.601105] [<c188b733>] xen_start_kernel+0x5c4/0x5cc [ 124.601105] ---[ end trace 69a5c8cd56e77bd4 ]--- [ 124.601105] ------------[ cut here ]------------ [ 124.601105] WARNING: at kernel/time/timekeeping.c:266 ktime_get+0xe9/0x100() [ 124.601105] Modules linked in: [last unloaded: scsi_wait_scan] [ 124.601105] Pid: 0, comm: swapper/0 Tainted: G D W 3.4.70+ #1 [ 124.601105] Call Trace: [ 124.601105] [<c10888ed>] warn_slowpath_common+0x6d/0xa0 [ 124.601105] [<c10c9fc9>] ? ktime_get+0xe9/0x100 [ 124.601105] [<c10c9fc9>] ? 
ktime_get+0xe9/0x100 [ 124.601105] [<c108893d>] warn_slowpath_null+0x1d/0x20 [ 124.601105] [<c10c9fc9>] ktime_get+0xe9/0x100 [ 124.601105] [<c10d1379>] tick_check_idle+0x39/0xf0 [ 124.601105] [<c108f06c>] irq_enter+0x4c/0x70 [ 124.601105] [<c12f4e36>] xen_evtchn_do_upcall+0x16/0x30 [ 124.601105] [<c164b2c7>] xen_do_upcall+0x7/0xc [ 124.601105] [<c10023a7>] ? hypercall_page+0x3a7/0x1000 [ 124.601105] [<c104e1c2>] ? xen_safe_halt+0x12/0x20 [ 124.601105] [<c104e1b0>] ? xen_irq_disable+0x10/0x10 [ 124.601105] [<c105ed2b>] default_idle+0x5b/0x190 [ 124.601105] [<c1040054>] ? svm_set_tsc_khz+0x74/0x140 [ 124.601105] [<c105e27f>] cpu_idle+0x6f/0xa0 [ 124.601105] [<c16242f8>] rest_init+0x58/0x60 [ 124.601105] [<c1887919>] start_kernel+0x355/0x35b [ 124.601105] [<c1887435>] ? kernel_init+0x1cf/0x1cf [ 124.601105] [<c18870ba>] i386_start_kernel+0xa9/0xb0 [ 124.601105] [<c188b733>] xen_start_kernel+0x5c4/0x5cc [ 124.601105] ---[ end trace 69a5c8cd56e77bd5 ]--- [ 124.601105] ------------[ cut here ]------------ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>, boris.ostrovsky@oracle.com, david.vrabel@citrix.com
Cc: xen-devel@lists.xen.org
Subject: Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration
Date: Tue, 7 Jan 2014 14:11:56 -0500
Message-ID: <20140107191156.GA10370@phenom.dumpdata.com>
On Tue, Jan 07, 2014 at 06:55:56PM +0000, Ian Jackson wrote:
> I did the following test:
>
>   mv /etc/xen/scripts/block /etc/xen/scripts/block.aside
>   xl migrate debian.guest.osstest localhost
>
> xl did what appears to be the right thing: it did most of the
> migration, failed to run the block scripts at the end of the
> migration, and destroyed the destination domain and instead resumed
> the source guest.
>
> However, the source guest immediately went mad spewing WARNINGs and
> was after that no longer contactable via the network and not
> apparently responsive on the console.  See below.
>
> This is with:
>
> [ 0.000000] Linux version 3.4.70+ (osstest@rice-weevil) (gcc
> version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Wed Dec 4 03:14:51 GMT 2013
>
> For reasons I don't understand it doesn't seem to print the actual
> kernel git hash in dmesg, but I think it was that from flight 22264,
> i.e. 234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
> 64-bit Xen.

This is a bit of an ancient kernel. Does it show up with 3.12?

CC-ing the other maintainers.

>
> Thanks,
> Ian.
>
> debian login: [ 124.595658] PM: freeze of devices complete after 2.980 msecs
> [ 124.595991] PM: late freeze of devices complete after 0.013 msecs
> [ 124.600919] PM: noirq freeze of devices complete after 4.884 msecs
> [ 124.601105] Grant tables using version 2 layout.
> [ 124.601105] ------------[ cut here ]------------
> [ 124.601105] kernel BUG at drivers/xen/events.c:1582!
From: Ian Jackson <Ian.Jackson@eu.citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xen.org, david.vrabel@citrix.com, boris.ostrovsky@oracle.com
Subject: Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration
Date: Tue, 7 Jan 2014 19:23:32 +0000
Message-ID: <21196.21556.181273.225889@mariner.uk.xensource.com>
Konrad Rzeszutek Wilk writes ("Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration"):
> On Tue, Jan 07, 2014 at 06:55:56PM +0000, Ian Jackson wrote:
> > For reasons I don't understand it doesn't seem to print the actual
> > kernel git hash in dmesg, but I think it was that from flight 22264,
> > i.e. 234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
> > 64-bit Xen.
>
> This is a bit of an ancient kernel. Does it show up with 3.12?

3.4.70 is what the osstest push gate is using.  (ISTR trying to switch
to 3.11 but encountering some problem.)

I haven't tried 3.12 but can do so.

Ian.
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: xen-devel@lists.xen.org, david.vrabel@citrix.com
Subject: Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration
Date: Tue, 07 Jan 2014 14:36:44 -0500
Message-ID: <52CC574C.8080405@oracle.com>
On 01/07/2014 02:23 PM, Ian Jackson wrote:
> Konrad Rzeszutek Wilk writes ("Re: 3.4.70+ kernel WARNING spew dysfunction on failed migration"):
>> On Tue, Jan 07, 2014 at 06:55:56PM +0000, Ian Jackson wrote:
>>> For reasons I don't understand it doesn't seem to print the actual
>>> kernel git hash in dmesg, but I think it was that from flight 22264,
>>> i.e. 234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
>>> 64-bit Xen.
>> This is a bit of an ancient kernel. Does it show up with 3.12?
> 3.4.70 is what the osstest push gate is using.  (ISTR trying to switch
> to 3.11 but encountering some problem.)
>
> I haven't tried 3.12 but can do so.
>
> Ian.

This is the hypercall failing, btw:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/xen/events.c?id=refs/tags/v3.4.75#n1582

-boris
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: david.vrabel@citrix.com, xen-devel@lists.xen.org
Subject: Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration
Date: Tue, 07 Jan 2014 15:05:48 -0500
Message-ID: <52CC5E1C.90704@oracle.com>
On 01/07/2014 02:36 PM, Boris Ostrovsky wrote:
> On 01/07/2014 02:23 PM, Ian Jackson wrote:
>> Konrad Rzeszutek Wilk writes ("Re: 3.4.70+ kernel WARNING spew
>> dysfunction on failed migration"):
>>> On Tue, Jan 07, 2014 at 06:55:56PM +0000, Ian Jackson wrote:
>>>> For reasons I don't understand it doesn't seem to print the actual
>>>> kernel git hash in dmesg, but I think it was that from flight 22264,
>>>> i.e. 234d96ee0f3b8e49501d068a2a3165aa4db60903.  It's i386, on a
>>>> 64-bit Xen.
>>> This is a bit of an ancient kernel. Does it show up with 3.12?
>> 3.4.70 is what the osstest push gate is using.  (ISTR trying to switch
>> to 3.11 but encountering some problem.)
>>
>> I haven't tried 3.12 but can do so.
>>
>> Ian.
>
> This is the hypercall failing, btw:
>
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/xen/events.c?id=refs/tags/v3.4.75#n1582
>

More specifically, it fails at

    if ( v->virq_to_evtchn[virq] != 0 )
        ERROR_EXIT(-EEXIST);

in Xen's evtchn_bind_virq(). It would be interesting to see if this is
still a problem in new kernels.

-boris
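The check quoted from evtchn_bind_virq() can be illustrated with a small toy model. This is only a sketch in Python, not Xen or Linux code: the class ToyVcpu, its bind_virq() method, the table size of 8 and the helper as_signed32() are all hypothetical names invented for the illustration. It shows a second bind of the same VIRQ being refused with -EEXIST, and that the EAX value ffffffef from the register dump in the original report, read as a signed 32-bit integer, is -17, which would be consistent with -EEXIST on Linux.

```python
import errno


class ToyVcpu:
    """Toy stand-in for a vCPU's VIRQ-to-event-channel table (not Xen's real structure)."""

    def __init__(self, nr_virqs=8):  # table size is an arbitrary assumption
        # 0 means "this VIRQ is not bound to any event channel".
        self.virq_to_evtchn = [0] * nr_virqs
        self._next_port = 1

    def bind_virq(self, virq):
        """Return a new port number, or -EEXIST if the VIRQ is already bound."""
        if self.virq_to_evtchn[virq] != 0:
            return -errno.EEXIST
        port = self._next_port
        self._next_port += 1
        self.virq_to_evtchn[virq] = port
        return port


def as_signed32(value):
    """Interpret an unsigned 32-bit register value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    vcpu = ToyVcpu()
    print(vcpu.bind_virq(0))        # first bind succeeds and returns a port
    print(vcpu.bind_virq(0))        # rebinding without a reset is refused
    print(as_signed32(0xFFFFFFEF))  # EAX from the oops, as a signed value
```

In this model the second bind fails only because the table was never cleared between the two calls, which is the situation a guest would be in if it tried to rebind its VIRQs while the old bindings were still in place.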
From: Ian Campbell <Ian.Campbell@citrix.com> To: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: xen-devel@lists.xen.org Subject: Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration Date: Tue, 7 Jan 2014 22:43:01 +0000 Message-ID: <1389134581.6917.19.camel@dagon.hellion.org.uk>
On Tue, 2014-01-07 at 18:55 +0000, Ian Jackson wrote:
> I did the following test:
>
> mv /etc/xen/scripts/block /etc/xen/scripts/block.aside
> xl migrate debian.guest.osstest localhost
>
> xl did what appears to be the right thing: it did most of the
> migration, failed to run the block scripts at the end of the
> migration, and destroyed the destination domain and instead resumed
> the source guest.
>
> However, the source guest immediately went mad spewing WARNINGs and
> was after that no longer contactable via the network and not
> apparently responsive on the console. See below.

Might this be the libxl resume thing described at the end of:
http://lists.xen.org/archives/html/xen-devel/2013-02/msg00130.html ?

I thought we'd switched to using fast resume by default to work around
this, but looking at the code it seems not. It'd be lovely if the slow
path finally got implemented instead of falling through the cracks
again.

Ian.
From: Ian Campbell <Ian.Campbell@citrix.com> To: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: xen-devel@lists.xen.org Subject: Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration Date: Wed, 8 Jan 2014 13:02:24 +0000 Message-ID: <1389186144.4883.60.camel@kazak.uk.xensource.com>
create ^
title it libxl should implement non-suspend-cancel based resume path
owner Ian Jackson <Ian.Jackson@eu.citrix.com>
thanks

To summarise what I just said to Ian J in the corridor (and let's have a
bug to record it):

There are two mechanisms by which a suspend can be aborted and the
original domain resumed.

The older method is that the toolstack resets a bunch of state (see
tools/python/xen/xend/XendDomainInfo.py resumeDomain) and then restarts
the domain. The domain will see HYPERVISOR_suspend return 0 and will
continue without any realisation that it is actually running in the
original domain and not in a new one. This method is supposed to be
implemented by libxl_domain_resume(suspend_cancel=0) but it is not.

The other method is newer and in this case the toolstack arranges that
HYPERVISOR_suspend returns 1 and restarts it (I believe). The domain
will observe this and realise that it has been restarted in the same
domain and will behave accordingly. This method is implemented,
correctly AFAIK, by libxl_domain_resume(suspend_cancel=1).

However the newer method is not available in all kernels, although it
does date from the Linux 2.6.18 days and is implemented in all Linux
pvops kernels. I can't speak for others (e.g. BSD).

The toolstack is supposed to check for the XEN_ELFNOTE_SUSPEND_CANCEL
ELF note when building the domain. The presence/absence of this flag
needs to be remembered so that it can be consulted on resume (this also
implies preserving that knowledge over migration).

xl currently uses libxl_domain_resume(suspend_cancel=0) on migration
failure which as it stands won't work for *any* domain. Arguably
switching to suspend_cancel=1 for now will mean that some subset of
kernels will work, and those which don't will not have regressed, until
we can correctly implement the suspend_cancel=0 path and the necessary
tracking of XEN_ELFNOTE_SUSPEND_CANCEL.

I've also just noticed that on failure to save (as opposed to migrate)
xl does use suspend_cancel=1.

Ian.
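The tracking described above can be sketched as follows. This is a rough illustration under stated assumptions, not libxl code: `toy_domain_record`, `toy_resume_mode` and `toy_pick_resume_mode` are hypothetical names, and the only things taken from the message are the XEN_ELFNOTE_SUSPEND_CANCEL note, the need to remember it, and the two `suspend_cancel` values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-domain record kept by the toolstack; not a libxl type. */
struct toy_domain_record {
    /* Set at domain build time if the kernel image carried
     * XEN_ELFNOTE_SUSPEND_CANCEL; must survive save and migration. */
    bool has_suspend_cancel_note;
};

/* The two resume protocols; values mirror the suspend_cancel argument. */
enum toy_resume_mode {
    TOY_RESUME_OLD = 0,     /* toolstack resets state, guest sees suspend return 0 */
    TOY_RESUME_CANCEL = 1   /* guest sees suspend return 1 */
};

/* Choose the protocol from what was recorded when the domain was built. */
static enum toy_resume_mode
toy_pick_resume_mode(const struct toy_domain_record *d)
{
    assert(d != NULL);
    return d->has_suspend_cancel_note ? TOY_RESUME_CANCEL : TOY_RESUME_OLD;
}
```

In this model the result would be what a caller passes as `suspend_cancel`; a guest without the note gets the old protocol, which is the path the message says is not yet implemented.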
From: David Vrabel <david.vrabel@citrix.com> To: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com>, xen-devel@lists.xen.org, Boris Ostrovsky <boris.ostrovsky@oracle.com> Subject: Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration Date: Wed, 8 Jan 2014 14:19:37 +0000 Message-ID: <52CD5E79.9000008@citrix.com>
On 07/01/14 18:55, Ian Jackson wrote:
> I did the following test:
>
> mv /etc/xen/scripts/block /etc/xen/scripts/block.aside
> xl migrate debian.guest.osstest localhost
>
> xl did what appears to be the right thing: it did most of the
> migration, failed to run the block scripts at the end of the
> migration, and destroyed the destination domain and instead resumed
> the source guest.
>
> However, the source guest immediately went mad spewing WARNINGs and
> was after that no longer contactable via the network and not
> apparently responsive on the console. See below.
>
> This is with:
>
> [ 0.000000] Linux version 3.4.70+ (osstest@rice-weevil) (gcc
> version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Wed Dec 4 03:14:51 GMT 2013
>
> For reasons I don't understand it doesn't seem to print the actual
> kernel git hash in dmesg, but I think it was that from flight 22264,
> i.e. 234d96ee0f3b8e49501d068a2a3165aa4db60903. It's i386, on a
> 64-bit Xen.
>
> Thanks,
> Ian.
>
> debian login: [ 124.595658] PM: freeze of devices complete after 2.980 msecs
> [ 124.595991] PM: late freeze of devices complete after 0.013 msecs
> [ 124.600919] PM: noirq freeze of devices complete after 4.884 msecs
> [ 124.601105] Grant tables using version 2 layout.
> [ 124.601105] ------------[ cut here ]------------
> [ 124.601105] kernel BUG at drivers/xen/events.c:1582!
> [ 124.601105] invalid opcode: 0000 [#1] SMP
> [ 124.601105] Modules linked in: [last unloaded: scsi_wait_scan]
> [ 124.601105]
> [ 124.601105] Pid: 6, comm: migration/0 Not tainted 3.4.70+ #1
> [ 124.601105] EIP: 0061:[<c12f5d25>] EFLAGS: 00010082 CPU: 0
> [ 124.601105] EIP is at xen_irq_resume+0x215/0x370

We shouldn't be calling xen_irq_resume() when resuming the source VM.
The EVTCHNOP_bind_virq is failing because the VIRQ is still bound.

This would suggest that the suspend hypercall has not correctly returned
the cancelled state.

Could this be because of the tools issue mentioned by Ian C?
David
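The failure mode described in this message can be put into a small model. This is a sketch under assumptions, not kernel or Xen code: `toy_hyp`, `toy_hyp_bind_virq` and `toy_guest_resume` are invented names. It assumes, as the thread describes, that a guest which sees the suspend hypercall return 1 keeps its existing bindings, while a guest which sees 0 believes it is in a new domain and rebinds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy hypervisor-side state for a single VIRQ; not Xen code. */
struct toy_hyp {
    bool virq_bound;
};

/* Bind the VIRQ; refuse if a binding already exists. */
static int toy_hyp_bind_virq(struct toy_hyp *h)
{
    assert(h != NULL);
    if (h->virq_bound)
        return -EEXIST;
    h->virq_bound = true;
    return 0;
}

/*
 * Guest-side resume in this model. suspend_ret is the value the guest
 * saw returned from the suspend hypercall.
 */
static int toy_guest_resume(struct toy_hyp *h, int suspend_ret)
{
    if (suspend_ret == 1)
        return 0;                /* cancelled: old bindings are still valid */
    return toy_hyp_bind_virq(h); /* looks like a new domain: rebind */
}
```

With a hypervisor state where the VIRQ is still bound (the source domain), a return value of 0 leads to -EEXIST on rebind, while a return value of 1 skips the rebind entirely. In a freshly built destination domain nothing is bound yet, so the rebind after a return of 0 succeeds.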
From: Ian Campbell <Ian.Campbell@citrix.com> To: David Vrabel <david.vrabel@citrix.com> Cc: xen-devel@lists.xen.org, Ian Jackson <Ian.Jackson@eu.citrix.com>, Boris Ostrovsky <boris.ostrovsky@oracle.com> Subject: Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration Date: Wed, 8 Jan 2014 14:24:28 +0000 Message-ID: <1389191068.4883.86.camel@kazak.uk.xensource.com>
On Wed, 2014-01-08 at 14:19 +0000, David Vrabel wrote: > On 07/01/14 18:55, Ian Jackson wrote: > > I did the following test: > > > > mv /etc/xen/scripts/block /etc/xen/scripts/block.aside > > xl migrate debian.guest.osstest localhost > > > > xl did what appears to be the right thing: it did most of the > > migration, failed to run the block scripts at the end of the > > migration, and destroyed the destination domain and instead resumed > > the source guest. > > > > However, the source guest immediately went mad spewing WARNINGs and > > was after that no longer contactable via the network and not > > apparently responsive on the console. See below. > > > > This is with: > > > > [ 0.000000] Linux version 3.4.70+ (osstest@rice-weevil) (gcc > > version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Wed Dec 4 03:14:51 GMT 2013 > > > > For reasons I don't understand it doesn't seem to print the actual > > kernel git hash in dmesg, but I think it was that from flight 22264, > > i.e. 234d96ee0f3b8e49501d068a2a3165aa4db60903. It's i386, on a > > 64-bit Xen. > > > > Thanks, > > Ian. > > > > debian login: [ 124.595658] PM: freeze of devices complete after 2.980 msecs > > [ 124.595991] PM: late freeze of devices complete after 0.013 msecs > > [ 124.600919] PM: noirq freeze of devices complete after 4.884 msecs > > [ 124.601105] Grant tables using version 2 layout. > > [ 124.601105] ------------[ cut here ]------------ > > [ 124.601105] kernel BUG at drivers/xen/events.c:1582! > > [ 124.601105] invalid opcode: 0000 [#1] SMP > > [ 124.601105] Modules linked in: [last unloaded: scsi_wait_scan] > > [ 124.601105] > > [ 124.601105] Pid: 6, comm: migration/0 Not tainted 3.4.70+ #1 > > [ 124.601105] EIP: 0061:[<c12f5d25>] EFLAGS: 00010082 CPU: 0 > > [ 124.601105] EIP is at xen_irq_resume+0x215/0x370 > > We shouldn't be calling xen_irq_resume() when resuming the source VM. > The EVTCHNOP_bind_irq is failing because the VIRQ is still bound. 
>
> This would suggest that the suspend hypercall has not correctly returned
> the cancelled state.
>
> Could this be because of the tools issue mentioned by Ian C?

I'm fairly confident that it is, yes.

(Well, "this" is actually that the toolstack failed to implement the
old-style resume but told the guest it had, by not returning cancel...)

Ian.
From: Ian Jackson <Ian.Jackson@eu.citrix.com> To: Ian Campbell <Ian.Campbell@citrix.com> Cc: xen-devel@lists.xen.org Subject: Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration Date: Thu, 9 Jan 2014 19:08:41 +0000 Message-ID: <21198.62393.819485.361532@mariner.uk.xensource.com>
Ian Campbell writes ("Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration"):
> The older method is that the toolstack resets a bunch of state (see
> tools/python/xen/xend/XendDomainInfo.py resumeDomain) and then restarts
> the domain. The domain will see HYPERVISOR_suspend return 0 and will
> continue without any realisation that it is actually running in the
> original domain and not in a new one. This method is supposed to be
> implemented by libxl_domain_resume(suspend_cancel=0) but it is not.

I have looked into this and I think I can fairly simply implement the
old protocol in libxl. This is necessary, I think, to preserve our
back-to-3.0 ABI compatibility guarantee.

Looking at a modern pvops Linux kernel, it does seem to try to cope with
older hypervisors which don't do the "new" protocol. So that's a
reasonable thing to start with, but looking at the code in Linux I
suspect it may not actually work very well. So if anyone has an
ancient test case of some kind that would be helpful...

Ian.
From: Ian Campbell <Ian.Campbell@citrix.com> To: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: xen-devel@lists.xen.org Subject: Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration Date: Fri, 10 Jan 2014 10:26:31 +0000 Message-ID: <1389349591.19142.25.camel@kazak.uk.xensource.com>
On Thu, 2014-01-09 at 19:08 +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration"):
> > The older method is that the toolstack resets a bunch of state (see
> > tools/python/xen/xend/XendDomainInfo.py resumeDomain) and then restarts
> > the domain. The domain will see HYPERVISOR_suspend return 0 and will
> > continue without any realisation that it is actually running in the
> > original domain and not in a new one. This method is supposed to be
> > implemented by libxl_domain_resume(suspend_cancel=0) but it is not.
>
> I have looked into this and I think I can fairly simply implement the
> old protocol in libxl. This is necessary, I think, to preserve our
> back-to-3.0 ABI compatibility guarantee.
>
> Looking at a modern pvops Linux kernel, it does seem to try to cope with
> older hypervisors which don't do the "new" protocol. So that's a
> reasonable thing to start with, but looking at the code in Linux I
> suspect it may not actually work very well. So if anyone has an
> ancient test case of some kind that would be helpful...

The linux-2.6.18-xen.hg kernel ought to work in the old mode, I think.
Or any of the SLES forward ports?

Looks like RHEL4 (linux-2.6.9-89.0.16.EL kernel) doesn't have the
support for the new mode at all.

It would probably be wise to validate this under xend before chasing
red herrings with xl.

Ian.