From: Dario Faggioli
To: Justin Weaver
Cc: George Dunlap, xen-devel
Subject: [Xen-devel] multiple runqueues in credit2
Date: Sat, 14 Dec 2013 02:33:05 +0100
Message-ID: <1386984785.3980.96.camel@Solace>

Hi George,

Both Justin and I were able to reproduce a situation where, on a
2-socket system (see below), credit2 was activating only 1 runqueue.
That seemed in line with some comments in the sched_credit2.c source
file, such as this one:

/*
 * Design:
 *
 * VMs "burn" credits based on their weight; higher weight means
 * credits burn more slowly.  The highest weight vcpu burns credits at
 * a rate of 1 credit per nanosecond.  Others burn proportionally
 * more.
 *
 * vcpus are inserted into the runqueue by credit order.
 *
 * Credits are "reset" when the next vcpu in the runqueue is less than
 * or equal to zero.  At that point, everyone's credits are "clipped"
 * to a small value, and a fixed credit is added to everyone.
 *
 * The plan is for all cores that share an L2 will share the same
 * runqueue.  At the moment, there is one global runqueue for all
 * cores.
 */

However, I remembered it differently and, looking at init_pcpu(), I
spotted this:

    /* Figure out which runqueue to put it in */
    /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */
    if ( cpu == 0 )
        rqi = 0;
    else
        rqi = cpu_to_socket(cpu);

which looks to me like the code for having one runqueue per socket
_is_ there already!  That means two things: (1) that comment above is
wrong :-) but, at the same time, (2) this code right here is not
working!

Justin also noticed that init_pcpu() was actually being called twice
for all pcpus except #0, triggering the following warning:

    printk("%s: Strange, cpu %d already initialized!\n", __func__, cpu);

I did some investigation on the following system:

cpu_topology           :
cpu:    core    socket  node
  0:       0         0     0
  1:       1         0     0
  2:       2         0     0
  3:       3         0     0
  4:       0         1     1
  5:       1         1     1
  6:       2         1     1
  7:       3         1     1

So, what I expect is, for instance, cpu 1 to be on runqueue 0 and
cpu 5 on runqueue 1.  The problem is here:

static void *
csched_alloc_pdata(const struct scheduler *ops, int cpu)
{
    /* Check to see if the cpu is online yet */
    /* Note: cpu 0 doesn't get a STARTING callback */
    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
        init_pcpu(ops, cpu);
    else
        printk("%s: cpu %d not online yet, deferring initializatgion\n",
               __func__, cpu);

    return (void *)1;
}

In fact, this is meant to call init_pcpu() *only* on pcpu 0 (which
doesn't get the STARTING notification) and on those pcpus that are
already online.  Unfortunately, "cpu_to_socket(cpu) >= 0" is not (any
longer?) a valid way to check the latter, so init_pcpu() ends up being
called for every pcpu, even the ones that have not been identified and
initialized yet.  That, with cpu_to_socket() constantly returning 0,
means all the pcpus end up on the one and only runqueue 0.

I verified that removing the right side of the || makes things work
(I enabled some debug output and added some more myself):

(XEN) csched_alloc_pdata for cpu 0 on socket 0
(XEN) Adding cpu 0 to runqueue 0
(XEN) First cpu on runqueue, activating
...
(XEN) CPU 1 APIC 1 -> Node 0
(XEN) csched_vcpu_insert: Inserting d32767v1
(XEN) csched_alloc_pdata for cpu 1 on socket 0
(XEN) csched_alloc_pdata: cpu 1 not online yet, deferring initializatgion
(XEN) Booting processor 1/1 eip 8e000
(XEN) Initializing CPU#1
(XEN) CPU: L1 I cache 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 512K (64 bytes/line)
(XEN) CPU 1(4) -> Processor 0, Core 1
(XEN) CPU1: AMD Quad-Core AMD Opteron(tm) Processor 2376 stepping 02
(XEN) csched_cpu_starting on cpu 1
(XEN) Adding cpu 1 to runqueue 0
...
(XEN) CPU 5 APIC 5 -> Node 1
(XEN) microcode: CPU4 collect_cpu_info: patch_id=0x1000086
(XEN) csched_vcpu_insert: Inserting d32767v5
(XEN) csched_alloc_pdata for cpu 5 on socket 0
(XEN) csched_alloc_pdata: cpu 5 not online yet, deferring initializatgion
(XEN) Booting processor 5/5 eip 8e000
(XEN) Initializing CPU#5
(XEN) CPU: L1 I cache 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 512K (64 bytes/line)
(XEN) CPU 5(4) -> Processor 1, Core 1
(XEN) CPU5: AMD Quad-Core AMD Opteron(tm) Processor 2376 stepping 02
(XEN) csched_cpu_starting on cpu 5
(XEN) Adding cpu 5 to runqueue 1
...

Now the question is: for fixing this, would it be preferable to do
something along these lines (i.e., removing the right side of the ||
and, in general, making csched_alloc_pdata() a pcpu 0 only thing)?
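Just to make that first option concrete, what I tested is basically
this (take it as a rough, untested sketch rather than a proper patch,
and modulo the extra debug output); every pcpu other than 0 would then
only get its init_pcpu() from the CPU_STARTING path, i.e. from
csched_cpu_starting(), by which time its topology info is in place:

static void *
csched_alloc_pdata(const struct scheduler *ops, int cpu)
{
    /* Note: cpu 0 doesn't get a STARTING callback, so it is the only
     * pcpu we initialize here; all the others are dealt with by
     * csched_cpu_starting(), once they have been identified. */
    if ( cpu == 0 )
        init_pcpu(ops, cpu);
    else
        printk("%s: cpu %d not online yet, deferring initializatgion\n",
               __func__, cpu);

    return (void *)1;
}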
Or, perhaps, should I look into a way to properly initialize the
cpu_data array, so that cpu_to_socket() actually returns something
'< 0' for pcpus not yet onlined and identified?

The former is surely quicker, but I think I like the latter better
(provided it's doable).

What do you think?

Thanks and Regards,
Dario

-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)