From: George Dunlap
To: Zhu Yanhai
Cc: Zhu Yanhai, "xen-devel@lists.xensource.com", Ian Campbell,
    Andrew Cooper, Charles Wang, Shen Yiben, Wan Jia
Subject: Re: [Xen-devel] [PATCH] x86/fpu: CR0.TS should be set before trap
    into PV guest's #NM exception handler
Date: Thu, 27 Feb 2014 12:21:18 +0000
In-Reply-To: <1383720072-6242-1-git-send-email-gaoyang.zyh@taobao.com>
References: <1383720072-6242-1-git-send-email-gaoyang.zyh@taobao.com>

create ^
title it linux pvops: fpu corruption due to incorrect assumptions about TS bit after exception under Xen
thanks

On Wed, Nov 6, 2013 at 6:41 AM, Zhu Yanhai wrote:
> As we know, Intel x86's CR0.TS is a sticky bit: once set, it remains set
> until cleared by software. In other words, the exception handler expects
> the bit to be set when it starts to execute.
>
> However, Xen doesn't emulate this behavior well for PV guests:
> vcpu_restore_fpu_lazy() clears CR0.TS unconditionally at the very beginning,
> so the guest kernel's #NM handler runs with CR0.TS cleared.
> Generally speaking, that's fine, since the Linux kernel executes the
> exception handler with interrupts disabled and a sane #NM handler will
> clear the bit anyway before it exits. But there's a catch: if it's the
> first FPU trap for the process, the Linux kernel must allocate a piece of
> slab memory in which to save the FPU registers. That opens a scheduling
> window, as the memory allocation might sleep -- and CR0.TS is still clear!
>
> [see the code below in the Linux kernel,
>
> void math_state_restore(void)
> {
>     struct task_struct *tsk = current;
>
>     if (!tsk_used_math(tsk)) {
>         local_irq_enable();
>         /*
>          * does a slab alloc which can sleep
>          */
>         if (init_fpu(tsk)) {      <<<< Here it might open a schedule window
>             /*
>              * ran out of memory!
>              */
>             do_group_exit(SIGKILL);
>             return;
>         }
>         local_irq_disable();
>     }
>
>     __thread_fpu_begin(tsk);      <<<< Here the process gets marked as a 'fpu user'
>                                        after the schedule window
>
>     /*
>      * Paranoid restore. send a SIGSEGV if we fail to restore the state.
>      */
>     if (unlikely(restore_fpu_checking(tsk))) {
>         drop_init_fpu(tsk);
>         force_sig(SIGSEGV, tsk);
>         return;
>     }
>
>     tsk->fpu_counter++;
> }
> ]
>
> The check in the Linux kernel's switch_fpu_prepare() doesn't call stts()
> either, because the current process only gets marked as an FPU user after
> the scheduling window (the story is a bit different if eagerfpu is enabled,
> but a sane hypervisor cannot depend on such unpredictable details). If the
> newly scheduled-in process then touches the FPU, no #NM trap is raised, so
> nobody does fxrstor/frstor for it any more and it runs on stale register
> contents, leading to serious data corruption.
>
> The point is that everything is fine on Linux + bare metal, since CR0.TS
> stays set until the interrupted #NM handler has got the memory it needs and
> exits, so the incoming FPU user gets trapped as it is supposed to.
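
A toy model may make the window described above easier to see. This is only a sketch under simplifying assumptions: one CPU, a single integer standing in for the FPU registers, and hypothetical names (struct cpu, struct task, nm_restore, switch_out, fpu_read, value_seen_by_b) that appear in neither Xen nor Linux. It models the sequence in the quoted text, not the real code paths.

```c
#include <assert.h>
#include <stdbool.h>

#define STALE_VALUE   7   /* whatever a previous owner left in the FPU */
#define B_SAVED_VALUE 42  /* task B's state, sitting in its save area */

struct task { int saved_fpu; bool has_fpu; };
struct cpu  { int fpu_reg;   bool cr0_ts;  };

/* Lazy restore done by the #NM handler: clear TS and reload the state. */
static void nm_restore(struct cpu *c, struct task *t)
{
    assert(c->cr0_ts);            /* only reached via a trap, so TS was set */
    c->cr0_ts = false;
    c->fpu_reg = t->saved_fpu;
    t->has_fpu = true;
}

/* Lazy context switch: TS is only touched if the outgoing task owned the
 * FPU; otherwise TS is assumed to be set already. */
static void switch_out(struct cpu *c, struct task *old)
{
    if (old->has_fpu) {
        old->saved_fpu = c->fpu_reg;
        old->has_fpu = false;
        c->cr0_ts = true;
    }
}

/* A task reads the FPU; the access traps first if TS is set. */
static int fpu_read(struct cpu *c, struct task *t)
{
    if (c->cr0_ts)
        nm_restore(c, t);
    return c->fpu_reg;
}

/* Task A takes its first #NM and sleeps in the allocation before being
 * marked as an FPU user; task B is scheduled in and reads the FPU.
 * ts_set_in_handler: true models bare metal (TS stays set on handler
 * entry), false models the PV behavior described above. */
int value_seen_by_b(bool ts_set_in_handler)
{
    struct cpu  c = { .fpu_reg = STALE_VALUE, .cr0_ts = true };
    struct task a = { .saved_fpu = 0, .has_fpu = false };
    struct task b = { .saved_fpu = B_SAVED_VALUE, .has_fpu = false };

    c.cr0_ts = ts_set_in_handler; /* state of TS when A's handler runs */
    switch_out(&c, &a);           /* A sleeps; it does not own the FPU yet */
    return fpu_read(&c, &b);      /* B touches the FPU */
}
```

In this model, with TS left set B traps and reloads its own state, whereas with TS cleared on handler entry B silently reads whatever was left in the register.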
>
> The test case is as below,
>
>     buf = malloc(BUF_SIZE);
>     if (!buf) {
>         fprintf(stderr, "error %s during %s\n",
>             strerror(-err),
>             "malloc");
>         return 1;
>     }
>     memset(buf, IO_PATTERN, BUF_SIZE);
>     memset(cmp_buf, IO_PATTERN, BUF_SIZE);
>
>     if (memcmp(buf, cmp_buf, BUF_SIZE)) {
>         unsigned long long *ubuf = (unsigned long long *)buf;
>         int i;
>
>         for (i = 0; i < BUF_SIZE / sizeof(unsigned long long); i++)
>             printf("%d: 0x%llx\n", i, ubuf[i]);
>
>         return 2;
>     }
>
> Two shell scripts on each box's dom0 run the above program repeatedly until
> the compare fails (so every time the C program is a new FPU user and triggers
> the memory allocation). We see the data corruption at least once with
> Xen 4.3 + Linux 2.6.32 on ~200 physical machines within two hours.
> With Xen 4.3 + Linux 3.11.6 stable it becomes harder to reproduce
> (I guess because of the eagerfpu feature introduced in Linux kernel 3.7),
> but it still shows up within about four hours.
>
> The fix here tries to make Xen behave as close to the hardware as possible.
>
> This bug only affects PV guests (including the dom0 kernel, of course).
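
For anyone wanting to try this, the quoted fragment could be fleshed out roughly as follows. This is a sketch, not the original test program: the values of BUF_SIZE and IO_PATTERN are not given in the post and are assumptions here, the function name fpu_pattern_check is hypothetical, errno replaces the undefined err, and only mismatching bytes are printed rather than the whole buffer. It presumably relies on libc's memset/memcmp using vector registers, which is what would make the process an FPU user.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values; the original post does not state them. */
#define BUF_SIZE   (1024 * 1024)
#define IO_PATTERN 0x5a

/* Returns 0 if the two buffers match, 1 if allocation fails,
 * 2 if the buffers differ after being filled with the same pattern. */
int fpu_pattern_check(void)
{
    unsigned char *buf = malloc(BUF_SIZE);
    unsigned char *cmp_buf = malloc(BUF_SIZE);
    int rc = 0;

    if (!buf || !cmp_buf) {
        fprintf(stderr, "error %s during %s\n", strerror(errno), "malloc");
        free(buf);
        free(cmp_buf);
        return 1;
    }

    /* Fill both buffers with the same byte pattern. */
    memset(buf, IO_PATTERN, BUF_SIZE);
    memset(cmp_buf, IO_PATTERN, BUF_SIZE);

    /* Any difference means one of the fills went wrong. */
    if (memcmp(buf, cmp_buf, BUF_SIZE)) {
        size_t i;

        for (i = 0; i < BUF_SIZE; i++)
            if (buf[i] != cmp_buf[i])
                printf("%zu: 0x%02x != 0x%02x\n", i, buf[i], cmp_buf[i]);
        rc = 2;
    }

    free(buf);
    free(cmp_buf);
    return rc;
}
```

Wrapping this in a main() that returns fpu_pattern_check(), and re-running the binary from a shell loop until it exits non-zero, would match the setup described above, since each fresh process is a new FPU user.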
>
> Cc: Wan Jia
> Cc: Shen Yiben
> Cc: Charles Wang
> Cc: George Dunlap
> Cc: Andrew Cooper
> Cc: Ian Campbell
> Signed-off-by: Zhu Yanhai
> ---
>  xen/arch/x86/traps.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 77c200b..b0321a6 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -3267,8 +3267,8 @@ void do_device_not_available(struct cpu_user_regs *regs)
>
>      if ( curr->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
>      {
> +        stts();
>          do_guest_trap(TRAP_no_device, regs, 0);
> -        curr->arch.pv_vcpu.ctrlreg[0] &= ~X86_CR0_TS;
>      }
>      else
>          TRACE_0D(TRC_PV_MATH_STATE_RESTORE);
> --
> 1.7.4.4
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel