[ Retrieve as mbox ]
From: Ian Campbell <ian.campbell@citrix.com> To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Martin Osterloh <osterlohm@ainfosec.com>, Andrew Cooper <andrew.cooper3@citrix.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 19 Nov 2015 11:23:15 +0000 Message-ID: <1447932195.5647.46.camel@citrix.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
create ! title it libxl exit() on ENOMEM incompatible with gc'd languages thanks On Thu, 2015-11-19 at 10:55 +0000, Andrew Cooper wrote: > On 19/11/15 09:20, Ian Campbell wrote: > > On Wed, 2015-11-18 at 18:32 +0000, Martin Osterloh wrote: > > > > > I wanted to inquire about the current state of LibXL and in > > > particular > > > if there are any issues with using it in long-running processes. > > It is currently being used by libvirtd which I think has shaken out > > most of > > the issues with that environment. > > > > There are certain to be other bugs, but nothing show-stopping. > > There really is a show-stopper, which I have stated before. Ah yes, nobody ever made a proposal to fix this so it slipped off my radar. I've recorded it in the BTS this time. > Languages such as OCaml use -ENOMEM as a hint to run the garbage > collector some more.  I expect Haskell is the same. > > It is not appropriate for libxl (or any library for that matter) to use > exit() as its method of resolving out-of-memory conditions. Note that the decision to take this approach was widely consulted at the time, including with Ocaml folks, it wasn't just done on a whim. That's not to say we cannot reconsider and find a different, better, approach which works for such languages. I think retrofitting all the necessary error paths to libxl to report memory allocation failures back up to the caller is going to be untenable and probably a project which would never actually be complete. But I notice that there are only ~10 calls to libxl__alloc_failed in libxl. I think it would be possible to turn each of those into a retry loop which calls an application provided hook function on each iteration. If no hook is provided by the application then the current behaviour would remain. Language bindings for Ocaml/haskel/etc would set the hook and use it to call into their gc. You'd probably want to limit the number of retry attempts and provide a way for the app to say "no, we really are out of memory" if it wants. Would such an approach work for Ocaml and haskell? In particular I'd be concerned about things like the ocaml interpreter lock (caml_enter_blocking_section etc) and calls to libxl not consistently holding/dropping it potentially leading to deadlocks. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Control reply; (Full Text)
From: Andrew Cooper <andrew.cooper3@citrix.com> To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Ian Campbell <ian.campbell@citrix.com>, Martin Osterloh <osterlohm@ainfosec.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 19 Nov 2015 11:33:39 +0000 Message-ID: <564DB393.3070805@citrix.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On 19/11/15 11:23, Ian Campbell wrote: > create ! > title it libxl exit() on ENOMEM incompatible with gc'd languages > thanks Can this be extended to "should not use exit() in general" ? andrewcoop@andrewcoop:/local/xen.git/xen$ git grep exit\( -- :/tools/libxl/libxl* ../tools/libxl/libxl.c:1707: _exit(0); ../tools/libxl/libxl.c:1711: _exit(errno); ../tools/libxl/libxl.c:1716: _exit(-1); ../tools/libxl/libxl_aoutils.c:478: if (r) { LOGE(ERROR,"openpty failed"); _exit(-1); } ../tools/libxl/libxl_aoutils.c:482: if (rc) { LOGE(ERROR,"sendmsg to parent failed"); _exit(-1); } ../tools/libxl/libxl_aoutils.c:483: _exit(0); ../tools/libxl/libxl_bootloader.c:557: if (r) { LOGE(ERROR, "login_tty failed"); exit(-1); } ../tools/libxl/libxl_bootloader.c:559: exit(-1); ../tools/libxl/libxl_event.c:1387: exit(-1); ../tools/libxl/libxl_event.h:104: * and call exit(-1). ../tools/libxl/libxl_exec.c:106: _exit(-1); ../tools/libxl/libxl_exec.c:316: exit(255); ../tools/libxl/libxl_exec.c:324: _exit(127); ../tools/libxl/libxl_exec.c:340: _exit(r); ../tools/libxl/libxl_internal.c:28: _exit(-1); ../tools/libxl/libxl_remus_disk_drbd.c:229: _exit(ackwait); ../tools/libxl/libxl_save_callout.c:184: exit(-1); ../tools/libxl/libxl_save_helper.c:64: if (r < 0) { perror("memory allocation failed during logging"); exit(-1); } ../tools/libxl/libxl_save_helper.c:102: exit(-1); ../tools/libxl/libxl_save_helper.c:122: if (!r) { perror("memory allocation failed"); exit(-1); } ../tools/libxl/libxl_save_helper.c:189: if (r<0) { perror("write"); exit(-1); } ../tools/libxl/libxl_save_helper.c:210: if (r<=0) exit(-2); ../tools/libxl/libxl_save_helper.c:230: exit(0); ../tools/libxl/libxlu_cfg_l.c:1763: exit( YY_EXIT_FAILURE ); ../tools/libxl/libxlu_disk_l.c:2248: exit( YY_EXIT_FAILURE ); The majority of those are cases are not appropriate uses of exit(). AFAIIR, the *only* valid use of exit() in a library is to clean up in a child process from a library-initiated fork(). ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Ian Campbell <ian.campbell@citrix.com> To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Martin Osterloh <osterlohm@ainfosec.com>, Andrew Cooper <andrew.cooper3@citrix.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 19 Nov 2015 11:48:47 +0000 Message-ID: <1447933727.5647.51.camel@citrix.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote: > > The majority of those are cases are not appropriate uses of exit(). > AFAIIR, the *only* valid use of exit() in a library is to clean up in a > child process from a library-initiated fork(). ... or (in this case) in the libxl-save-helper (separate process). The only one I can find which isn't one of this is in libxl__event_disaster, and that is only if the applications (or language bindings) haven't provided a suitable disaster callback. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Ian Campbell <ian.campbell@citrix.com> To: Ian Jackson <Ian.Jackson@eu.citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Andrew Cooper <andrew.cooper3@citrix.com>, Martin Osterloh <osterlohm@ainfosec.com> Cc: Yang Hongyang <yanghy@cn.fujitsu.com>, Shriram Rajagopalan <rshriram@cs.ubc.ca> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 19 Nov 2015 11:55:20 +0000 Message-ID: <1447934120.5647.54.camel@citrix.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, 2015-11-19 at 11:48 +0000, Ian Campbell wrote: > On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote: > >  > > The majority of those are cases are not appropriate uses of exit(). > > AFAIIR, the *only* valid use of exit() in a library is to clean up in a > > child process from a library-initiated fork(). > > ... or (in this case) in the libxl-save-helper (separate process). > > The only one I can find which isn't one of this is > in libxl__event_disaster, and that is only if the applications (or language > bindings) haven't provided a suitable disaster callback. Was looking at 4.4, in staging I also see a very odd one in drbd_preresume_async, which isn't obviously in a child process AFAICT. Hongyang, what prevents that exit from killing the whole toolstack process? Ian. > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Ian Campbell <ian.campbell@citrix.com> To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Andrew Cooper <andrew.cooper3@citrix.com>, Martin Osterloh <osterlohm@ainfosec.com>, Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Yang Hongyang <yanghy@cn.fujitsu.com>, Shriram Rajagopalan <rshriram@cs.ubc.ca> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 19 Nov 2015 12:16:51 +0000 Message-ID: <1447935411.5647.55.camel@citrix.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, 2015-11-19 at 11:55 +0000, Ian Campbell wrote: > On Thu, 2015-11-19 at 11:48 +0000, Ian Campbell wrote: > > On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote: > > >  > > > The majority of those are cases are not appropriate uses of exit(). > > > AFAIIR, the *only* valid use of exit() in a library is to clean up in > > > a > > > child process from a library-initiated fork(). > > > > ... or (in this case) in the libxl-save-helper (separate process). > > > > The only one I can find which isn't one of this is > > in libxl__event_disaster, and that is only if the applications (or > > language > > bindings) haven't provided a suitable disaster callback. > > Was looking at 4.4, in staging I also see a very odd one in > drbd_preresume_async, which isn't obviously in a child process AFAICT. > > Hongyang, what prevents that exit from killing the whole toolstack > process? I had missed an _async suffix on that function versus the one which was the actual callback, it is invoked via drbd_async_call which involves a fork(). Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: George Dunlap <George.Dunlap@eu.citrix.com> To: Ian Campbell <ian.campbell@citrix.com> Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Martin Osterloh <osterlohm@ainfosec.com>, Andrew Cooper <andrew.cooper3@citrix.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 19 Nov 2015 15:34:24 +0000 Message-ID: <CAFLBxZZQ7cdK=mmwVWp+ax0bKE3zOAR2cGcOc=STuVQAX26etg@mail.gmail.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, Nov 19, 2015 at 11:23 AM, Ian Campbell <ian.campbell@citrix.com> wrote: > create ! > title it libxl exit() on ENOMEM incompatible with gc'd languages > thanks Actually, can I suggest that someone send a new thread with an appropriate title, so that people who aren't directly following the thread know what kinds of things are being discussed? Also so Martin can decide whether he wants to follow it or not. :-) -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Yang Hongyang <hongyang.yang@easystack.cn> To: Ian Jackson <Ian.Jackson@eu.citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Martin Osterloh <osterlohm@ainfosec.com>, Ian Campbell <ian.campbell@citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com> Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>, Yang Hongyang <yanghy@cn.fujitsu.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Fri, 20 Nov 2015 08:30:46 +0800 Message-ID: <564E69B6.3020509@easystack.cn>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On 2015年11月19日 20:16, Ian Campbell wrote: > On Thu, 2015-11-19 at 11:55 +0000, Ian Campbell wrote: >> On Thu, 2015-11-19 at 11:48 +0000, Ian Campbell wrote: >>> On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote: >>>> >>>> The majority of those are cases are not appropriate uses of exit(). >>>> AFAIIR, the *only* valid use of exit() in a library is to clean up in >>>> a >>>> child process from a library-initiated fork(). >>> >>> ... or (in this case) in the libxl-save-helper (separate process). >>> >>> The only one I can find which isn't one of this is >>> in libxl__event_disaster, and that is only if the applications (or >>> language >>> bindings) haven't provided a suitable disaster callback. >> >> Was looking at 4.4, in staging I also see a very odd one in >> drbd_preresume_async, which isn't obviously in a child process AFAICT. >> >> Hongyang, what prevents that exit from killing the whole toolstack >> process? > > I had missed an _async suffix on that function versus the one which was the > actual callback, it is invoked via drbd_async_call which involves a fork(). Yeah, it is in a child process. > > Ian. > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel > -- Thanks, Yang _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: George Dunlap <dunlapg@umich.edu> To: Ian Campbell <ian.campbell@citrix.com> Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>, Ian Jackson <Ian.Jackson@eu.citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Martin Osterloh <osterlohm@ainfosec.com>, Yang Hongyang <yanghy@cn.fujitsu.com>, Andrew Cooper <andrew.cooper3@citrix.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 18 Feb 2016 17:09:24 +0000 Message-ID: <CAFLBxZaUU0z3W9GU6imwooA+pZVqheLHDFzPq-8vc-0RsTXNRA@mail.gmail.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, Nov 19, 2015 at 12:16 PM, Ian Campbell <ian.campbell@citrix.com> wrote: > On Thu, 2015-11-19 at 11:55 +0000, Ian Campbell wrote: >> On Thu, 2015-11-19 at 11:48 +0000, Ian Campbell wrote: >> > On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote: >> > > >> > > The majority of those are cases are not appropriate uses of exit(). >> > > AFAIIR, the *only* valid use of exit() in a library is to clean up in >> > > a >> > > child process from a library-initiated fork(). >> > >> > ... or (in this case) in the libxl-save-helper (separate process). >> > >> > The only one I can find which isn't one of this is >> > in libxl__event_disaster, and that is only if the applications (or >> > language >> > bindings) haven't provided a suitable disaster callback. >> >> Was looking at 4.4, in staging I also see a very odd one in >> drbd_preresume_async, which isn't obviously in a child process AFAICT. >> >> Hongyang, what prevents that exit from killing the whole toolstack >> process? > > I had missed an _async suffix on that function versus the one which was the > actual callback, it is invoked via drbd_async_call which involves a fork(). So what was the conclusion here? It looks like we've confirmed that exit() is only called: 1. In the case of a malloc() failure 2. in libxl-save-helper (a separate process forked by the library) 3. In libxl__event_disaster(), if no callback is provided Which just leaves #1 as something to be discussed? (Also, I had suggested making a new thread, but I can't find any threads continuing this conversation.) -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Ian Jackson <Ian.Jackson@eu.citrix.com> To: George Dunlap <dunlapg@umich.edu> Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>, Ian Campbell <ian.campbell@citrix.com>, Martin Osterloh <osterlohm@ainfosec.com>, Yang Hongyang <yanghy@cn.fujitsu.com>, Andrew Cooper <andrew.cooper3@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 18 Feb 2016 17:19:56 +0000 Message-ID: <22213.64828.978431.135803@mariner.uk.xensource.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
George Dunlap writes ("Re: [Xen-devel] Current LibXL Status"): > So what was the conclusion here? It looks like we've confirmed that > exit() is only called: > > 1. In the case of a malloc() failure > 2. in libxl-save-helper (a separate process forked by the library) > 3. In libxl__event_disaster(), if no callback is provided 4. In other processes forked by the library But, yes,l basically. > Which just leaves #1 as something to be discussed? Is this crashing on malloc failure a problem ? From the point of view of libxl's innards, making malloc failures fatal means that nothing that allocates memory needs to care about malloc failures. That massively reduces the number of error paths to be considered and eliminates an entire class of (largely theoretical) bugs. And often there is no good recovery possible (and logging the error is hard too). I'm not sure whether I'd want to change this policy even if someone wanted to commit to auditing libxl and submitting the necessary patches to cope with every malloc failure. Having to cope with malloc failure would be a continual burden on every proposed change or new feature. If in practice this is a problem it would probaby be better to run libxl in a process which can be restarted if it dies due to malloc failure. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Ian Campbell <ian.campbell@citrix.com> To: George Dunlap <dunlapg@umich.edu> Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Yang Hongyang <yanghy@cn.fujitsu.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Martin Osterloh <osterlohm@ainfosec.com>, Shriram Rajagopalan <rshriram@cs.ubc.ca>, Ian Jackson <Ian.Jackson@eu.citrix.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 18 Feb 2016 17:24:21 +0000 Message-ID: <1455816261.6225.57.camel@citrix.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, 2016-02-18 at 17:09 +0000, George Dunlap wrote: > On Thu, Nov 19, 2015 at 12:16 PM, Ian Campbell <ian.campbell@citrix.com> > wrote: > > On Thu, 2015-11-19 at 11:55 +0000, Ian Campbell wrote: > > > On Thu, 2015-11-19 at 11:48 +0000, Ian Campbell wrote: > > > > On Thu, 2015-11-19 at 11:33 +0000, Andrew Cooper wrote: > > > > > > > > > > The majority of those are cases are not appropriate uses of > > > > > exit(). > > > > > AFAIIR, the *only* valid use of exit() in a library is to clean > > > > > up in > > > > > a > > > > > child process from a library-initiated fork(). > > > > > > > > ... or (in this case) in the libxl-save-helper (separate process). > > > > > > > > The only one I can find which isn't one of this is > > > > in libxl__event_disaster, and that is only if the applications (or > > > > language > > > > bindings) haven't provided a suitable disaster callback. > > > > > > Was looking at 4.4, in staging I also see a very odd one in > > > drbd_preresume_async, which isn't obviously in a child process > > > AFAICT. > > > > > > Hongyang, what prevents that exit from killing the whole toolstack > > > process? > > > > I had missed an _async suffix on that function versus the one which was > > the > > actual callback, it is invoked via drbd_async_call which involves a > > fork(). > > So what was the conclusion here?  It looks like we've confirmed that > exit() is only called: > > 1. In the case of a malloc() failure > 2. in libxl-save-helper (a separate process forked by the library) > 3. In libxl__event_disaster(), if no callback is provided > > Which just leaves #1 as something to be discussed? Somewhere/when I proposed handling #1 by having the small number of places which call libxl__alloc_failed() call an application (or language binding) provided "please free some memory" hook (i.e. "please run your garbage collector") and then retry some appropriate number of times. Perhaps with an error code available for the hook to say "this is never going to help". There's less than a dozen such call sites so this is quite doable, vastly so compared with adding OOM error handling and reporting back up the callchain to all libxl functions. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Ian Campbell <ian.campbell@citrix.com> To: George Dunlap <dunlapg@umich.edu>, Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com>, Yang Hongyang <yanghy@cn.fujitsu.com>, Martin Osterloh <osterlohm@ainfosec.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Shriram Rajagopalan <rshriram@cs.ubc.ca> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 18 Feb 2016 17:26:33 +0000 Message-ID: <1455816393.6225.59.camel@citrix.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, 2016-02-18 at 17:19 +0000, Ian Jackson wrote: > George Dunlap writes ("Re: [Xen-devel] Current LibXL Status"): > > So what was the conclusion here?  It looks like we've confirmed that > > exit() is only called: > > > > 1. In the case of a malloc() failure > > 2. in libxl-save-helper (a separate process forked by the library) > > 3. In libxl__event_disaster(), if no callback is provided >  4. In other processes forked by the library > > But, yes,l basically. > > > Which just leaves #1 as something to be discussed? > > Is this crashing on malloc failure a problem ? It is for non-C language bindings which might be using garbage collection, since they might be OOM from a malloc perspective but actually have loads of spare memory waiting to be collected (which they might plausibly be doing quite lazily). I just reminded people of my proposal to provide a callback to allow the app/bindings to run their gc here in my reply to George before I saw your reply. > From the point of view of libxl's innards, making malloc failures > fatal means that nothing that allocates memory needs to care about > malloc failures.  That massively reduces the number of error paths to > be considered and eliminates an entire class of (largely theoretical) > bugs. > > And often there is no good recovery possible (and logging the error is > hard too). > > I'm not sure whether I'd want to change this policy even if someone > wanted to commit to auditing libxl and submitting the necessary > patches to cope with every malloc failure.  Having to cope with malloc > failure would be a continual burden on every proposed change or new > feature. I agree that we don't want to change this policy, but I think an OOM hook is sufficient to solve the actual problem. > If in practice this is a problem it would probaby be better to run > libxl in a process which can be restarted if it dies due to malloc > failure. > > Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: George Dunlap <George.Dunlap@eu.citrix.com> To: Ian Campbell <ian.campbell@citrix.com> Cc: Jonathan Ludlam <jonathan.ludlam@eu.citrix.com>, Rob Hoes <Rob.Hoes@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Andrew Cooper <andrew.cooper3@citrix.com>, Martin Osterloh <osterlohm@ainfosec.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 18 Feb 2016 17:26:52 +0000 Message-ID: <CAFLBxZbFOOJCx2bQ=kGfSDAso4D-MNuB72wMVLEuLXSnh8+xqg@mail.gmail.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, Nov 19, 2015 at 11:23 AM, Ian Campbell <ian.campbell@citrix.com> wrote: > create ! > title it libxl exit() on ENOMEM incompatible with gc'd languages > thanks > > On Thu, 2015-11-19 at 10:55 +0000, Andrew Cooper wrote: >> On 19/11/15 09:20, Ian Campbell wrote: >> > On Wed, 2015-11-18 at 18:32 +0000, Martin Osterloh wrote: >> > >> > > I wanted to inquire about the current state of LibXL and in >> > > particular >> > > if there are any issues with using it in long-running processes. >> > It is currently being used by libvirtd which I think has shaken out >> > most of >> > the issues with that environment. >> > >> > There are certain to be other bugs, but nothing show-stopping. [snip] >> Languages such as OCaml use -ENOMEM as a hint to run the garbage >> collector some more. I expect Haskell is the same. So the implication you've made here, is that if libxl were to return -ENOMEM to ocaml, that it would run the garbage collector and free up some heap space, so that malloc() could succeed and the libxl operation could proceed. In preparation for writing up a descripion for an "outreach project" (Outreachy/OPW/GSoC), I went and had a chat with some of the guys on the xapi team (cc'd) about what such a thing might look like. According to them, the ocaml garbage collector never frees memory. It grows its own internal heap as necessary, but it never reduces the size of its heap. So no -ENOMEM call or OOM callback can make a failed malloc succeed. One might then ask if libxl could simply allocate memory *from the ocaml heap* itself. It turns out that is also not tenable: Data in the ocaml heap is stored in a heavily coded format. (For example integers are stored as (n*2+1), so that pointers can all be even and non-pointers can all be odd.) The only thing they said might be improved is: 1. To know that libxl would never call exit() for any other reason (which it seems is true) 2. To have a callback in OOM conditions. It's unlikely the process as a whole could do anything but exit, but the ocaml runtime itself might still have internal heap available, which would allow it to exit more gracefully. I think if Haskell or any library *is* capable of integrating with the garbage collector, we'll have to wait until someone who understands the language actually writes bindings and can ask for something they know to be useful for that language. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Ian Campbell <ian.campbell@citrix.com> To: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Jonathan Ludlam <jonathan.ludlam@eu.citrix.com>, Martin Osterloh <osterlohm@ainfosec.com>, Andrew Cooper <andrew.cooper3@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Rob Hoes <Rob.Hoes@citrix.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 18 Feb 2016 17:39:27 +0000 Message-ID: <1455817167.6225.65.camel@citrix.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, 2016-02-18 at 17:26 +0000, George Dunlap wrote: > According to them, the ocaml garbage collector never frees memory.  It > grows its own internal heap as necessary, but it never reduces the > size of its heap. Assume that's true: Wow! Although, I presume this is a factor of the current/particular implementation, rather than a fundamental property of an ocaml runtime, so in theory it could change (and we have here a good real world argument why it perhaps should do!) > So no -ENOMEM call or OOM callback can make a failed malloc succeed. > > One might then ask if libxl could simply allocate memory *from the > ocaml heap* itself.  It turns out that is also not tenable: Data in > the ocaml heap is stored in a heavily coded format.  (For example > integers are stored as (n*2+1), so that pointers can all be even and > non-pointers can all be odd.) > > The only thing they said might be improved is: > > 1.  To know that libxl would never call exit() for any other reason > (which it seems is true) > > 2. To  have a callback in OOM conditions.  It's unlikely the process > as a whole could do anything but exit, but the ocaml runtime itself > might still have internal heap available, which would allow it to exit > more gracefully. > > I think if Haskell or any library *is* capable of integrating with the > garbage collector, we'll have to wait until someone who understands > the language actually writes bindings and can ask for something they > know to be useful for that language. Agreed. Hopefully it is possible to make the callback without needing to malloc anything! If I were adding an API for #2 to libxl (which must therefore be stable) I think I might be inclined to allow for the possibility of the callback returning "please retry" since it would be inconvenient in the usual API stability ways to retrofit it, I would guess, but really until a language even exposes that possibility we don't really know if it is worthwhile doing so. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: George Dunlap <dunlapg@umich.edu> To: Ian Campbell <ian.campbell@citrix.com> Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>, Ian Jackson <Ian.Jackson@eu.citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Martin Osterloh <osterlohm@ainfosec.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Yang Hongyang <yanghy@cn.fujitsu.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 18 Feb 2016 17:40:42 +0000 Message-ID: <CAFLBxZaiC4RHDhoPc72J3AaCC2nCpJT1cRBWNS=xLm99cqJxCA@mail.gmail.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, Feb 18, 2016 at 5:26 PM, Ian Campbell <ian.campbell@citrix.com> wrote: > On Thu, 2016-02-18 at 17:19 +0000, Ian Jackson wrote: >> George Dunlap writes ("Re: [Xen-devel] Current LibXL Status"): >> > So what was the conclusion here? It looks like we've confirmed that >> > exit() is only called: >> > >> > 1. In the case of a malloc() failure >> > 2. in libxl-save-helper (a separate process forked by the library) >> > 3. In libxl__event_disaster(), if no callback is provided >> 4. In other processes forked by the library >> >> But, yes,l basically. >> >> > Which just leaves #1 as something to be discussed? >> >> Is this crashing on malloc failure a problem ? > > It is for non-C language bindings which might be using garbage collection, > since they might be OOM from a malloc perspective but actually have loads > of spare memory waiting to be collected (which they might plausibly be > doing quite lazily). > > I just reminded people of my proposal to provide a callback to allow the > app/bindings to run their gc here in my reply to George before I saw your > reply. > >> From the point of view of libxl's innards, making malloc failures >> fatal means that nothing that allocates memory needs to care about >> malloc failures. That massively reduces the number of error paths to >> be considered and eliminates an entire class of (largely theoretical) >> bugs. >> >> And often there is no good recovery possible (and logging the error is >> hard too). >> >> I'm not sure whether I'd want to change this policy even if someone >> wanted to commit to auditing libxl and submitting the necessary >> patches to cope with every malloc failure. Having to cope with malloc >> failure would be a continual burden on every proposed change or new >> feature. > > I agree that we don't want to change this policy, but I think an OOM hook > is sufficient to solve the actual problem. And in the situation like ocaml seems to be at the moment, it also gives the process an opportunity to attempt to shut down as gracefully as it can even if it knows it can't free up any more memory to libxl. (And who knows, some future version of ocaml may even allow you to ask to shrink the heap.) -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: George Dunlap <George.Dunlap@eu.citrix.com> To: Ian Campbell <ian.campbell@citrix.com> Cc: Rob Hoes <Rob.Hoes@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Andrew Cooper <andrew.cooper3@citrix.com>, Martin Osterloh <osterlohm@ainfosec.com>, Jonathan Ludlam <jonathan.ludlam@eu.citrix.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 18 Feb 2016 17:47:53 +0000 Message-ID: <CAFLBxZZPt8bwrmiJGK6aAkCasDMzMUxE9RapgHowV88tP9ek-A@mail.gmail.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, Feb 18, 2016 at 5:39 PM, Ian Campbell <ian.campbell@citrix.com> wrote: > On Thu, 2016-02-18 at 17:26 +0000, George Dunlap wrote: > >> According to them, the ocaml garbage collector never frees memory. It >> grows its own internal heap as necessary, but it never reduces the >> size of its heap. > > Assume that's true: Wow! > > Although, I presume this is a factor of the current/particular > implementation, rather than a fundamental property of an ocaml runtime, so > in theory it could change (and we have here a good real world argument why > it perhaps should do!) > >> So no -ENOMEM call or OOM callback can make a failed malloc succeed. >> >> One might then ask if libxl could simply allocate memory *from the >> ocaml heap* itself. It turns out that is also not tenable: Data in >> the ocaml heap is stored in a heavily coded format. (For example >> integers are stored as (n*2+1), so that pointers can all be even and >> non-pointers can all be odd.) >> >> The only thing they said might be improved is: >> >> 1. To know that libxl would never call exit() for any other reason >> (which it seems is true) >> >> 2. To have a callback in OOM conditions. It's unlikely the process >> as a whole could do anything but exit, but the ocaml runtime itself >> might still have internal heap available, which would allow it to exit >> more gracefully. >> >> I think if Haskell or any library *is* capable of integrating with the >> garbage collector, we'll have to wait until someone who understands >> the language actually writes bindings and can ask for something they >> know to be useful for that language. > > Agreed. > > Hopefully it is possible to make the callback without needing to malloc > anything! > > If I were adding an API for #2 to libxl (which must therefore be stable) I > think I might be inclined to allow for the possibility of the callback > returning "please retry" since it would be inconvenient in the usual API > stability ways to retrofit it, I would guess, but really until a language > even exposes that possibility we don't really know if it is worthwhile > doing so. Well we could just make the semantics such that if the callback returns at all, that libxl should retry. That shouldn't be particularly harder than just making a callback we don't expect to ever return (which is what it sounds like the xapi folks would be happy with). Then when we come across a language runtime / application that *can* do something useful, it might possibly Just Work; but if it doesn't we can implement what they actually need. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Ian Campbell <ian.campbell@citrix.com> To: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Jonathan Ludlam <jonathan.ludlam@eu.citrix.com>, Rob Hoes <Rob.Hoes@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Andrew Cooper <andrew.cooper3@citrix.com>, Martin Osterloh <osterlohm@ainfosec.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 18 Feb 2016 17:50:41 +0000 Message-ID: <1455817841.6225.70.camel@citrix.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
On Thu, 2016-02-18 at 17:39 +0000, Ian Campbell wrote: > On Thu, 2016-02-18 at 17:26 +0000, George Dunlap wrote: > > > According to them, the ocaml garbage collector never frees memory.  It > > grows its own internal heap as necessary, but it never reduces the > > size of its heap. > > Assume that's true: Wow! https://github.com/ocaml/ocaml/blob/trunk/byterun/compact.c#L374 is a call by do_compaction to caml_shrink_heap at https://github.com/ocaml/ocaml/blob/trunk/byterun/memory.c#L408 which calls caml_free_for_heap at https://github.com/ocaml/ocaml/blob/trunk/byterun/memory.c#L290 which certainly looks like it is returning memory to the malloc pool. Maybe this never happens in practice for some reason, or perhaps the "byterun" in those paths is relevant? I couldn't find another heap implementation which would be associated with native binaries. http://caml.inria.fr/mantis/view.php?id=5389 also seems to suggest that it is expected for the heap to shrink (i.e. it was considered a bug that it did not in this case). Ian. > > Although, I presume this is a factor of the current/particular > implementation, rather than a fundamental property of an ocaml runtime, > so > in theory it could change (and we have here a good real world argument > why > it perhaps should do!) > > > So no -ENOMEM call or OOM callback can make a failed malloc succeed. > > > > One might then ask if libxl could simply allocate memory *from the > > ocaml heap* itself.  It turns out that is also not tenable: Data in > > the ocaml heap is stored in a heavily coded format.  (For example > > integers are stored as (n*2+1), so that pointers can all be even and > > non-pointers can all be odd.) > > > > The only thing they said might be improved is: > > > > 1.  To know that libxl would never call exit() for any other reason > > (which it seems is true) > > > > 2. To  have a callback in OOM conditions.  It's unlikely the process > > as a whole could do anything but exit, but the ocaml runtime itself > > might still have internal heap available, which would allow it to exit > > more gracefully. > > > > I think if Haskell or any library *is* capable of integrating with the > > garbage collector, we'll have to wait until someone who understands > > the language actually writes bindings and can ask for something they > > know to be useful for that language. > > Agreed. > > Hopefully it is possible to make the callback without needing to malloc > anything! > > If I were adding an API for #2 to libxl (which must therefore be stable) > I > think I might be inclined to allow for the possibility of the callback > returning "please retry" since it would be inconvenient in the usual API > stability ways to retrofit it, I would guess, but really until a language > even exposes that possibility we don't really know if it is worthwhile > doing so. > > Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Ian Jackson <Ian.Jackson@eu.citrix.com> To: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com>, Martin Osterloh <osterlohm@ainfosec.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org> Subject: [Xen-devel] libxl and malloc failure (Re: Current LibXL Status) Date: Thu, 18 Feb 2016 18:15:23 +0000 Message-ID: <22214.2619.86697.724156@mariner.uk.xensource.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
Andrew Cooper writes ("Re: [Xen-devel] Current LibXL Status"): > There really is a show-stopper, which I have stated before. > > Languages such as OCaml use -ENOMEM as a hint to run the garbage > collector some more. I expect Haskell is the same. This cannot possibly be true (if what you mean is that they use ENOMEM[1] from malloc as such an indication). It would make it impossible to write a correct binding to a normal C library. Typically C library which calls malloc will do so in the middle of its execution. Even if the library returns the resulting error up as a distinguishable error code, you can't just make the same library call again - it may have done half its work but not the other half. So I think you must be wrong. Ian. [1] NB that when malloc fails there is no -ENOMEM, only ENOMEM. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Ian Jackson <Ian.Jackson@eu.citrix.com> To: Ian Campbell <ian.campbell@citrix.com> Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Martin Osterloh <osterlohm@ainfosec.com>, Andrew Cooper <andrew.cooper3@citrix.com> Subject: Re: [Xen-devel] Current LibXL Status Date: Thu, 18 Feb 2016 18:30:57 +0000 Message-ID: <22214.3553.260609.617265@mariner.uk.xensource.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
Ian Campbell writes ("Re: [Xen-devel] Current LibXL Status"): > The only one I can find which isn't one of this is > in libxl__event_disaster, and that is only if the applications (or language > bindings) haven't provided a suitable disaster callback. libxl__event_disaster can currently happen in the following cases. Disasters which are specific to a specific requested event (type!=0): xc_domain_getinfolist failure (breaks reporting domain death, only xw_write failure for acknowledging disk eject (breaks reporting ejects, only) General disasters (type==0) due to xenstore problems: POLLERR or POLLHUP on xenstore fd xs_check_watch fails (represents trouble on xenstore fd or with xenstore) xs_read for domain death check General disasters (type==0) due to hypercall trouble: xenevtchn_pending fails other than with EAGAIN (unrequested) event other than POLLIN on evtchn fd General disasters (type==0) due to syscall failure: poll syscall fails (unrequested) event other than POLLIN on self-pipe read() or write() fails on a self-pipe (other than EAGAIN/EWOULDBLOCK) waitpid() syscall fails General disasters (type==0) induced by the application: waitpid reports ECHILD when we know we have a child (probably, something else in the process reaped it; arguably this should abort) application's childproc_hooks->reaped_callback failure Disasters are not reported unless it is impossible to proceed. So in general it is not possible for a process to recover from a disaster reported by libxl. But disasters are never expected. They should occur only if everything is totally doomed anyway. Having a daemon crash is a perfectly reasonable response. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
From: Ian Campbell <ian.campbell@citrix.com> To: Andrew Cooper <andrew.cooper3@citrix.com>, Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Martin Osterloh <osterlohm@ainfosec.com> Subject: Re: [Xen-devel] libxl and malloc failure (Re: Current LibXL Status) Date: Fri, 19 Feb 2016 10:52:11 +0000 Message-ID: <1455879131.6225.89.camel@citrix.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
graft 51 ^ thanks On Thu, 2016-02-18 at 18:15 +0000, Ian Jackson wrote: > Andrew Cooper writes ("Re: [Xen-devel] Current LibXL Status"): > > There really is a show-stopper, which I have stated before. > > > > Languages such as OCaml use -ENOMEM as a hint to run the garbage > > collector some more.  I expect Haskell is the same. > > This cannot possibly be true (if what you mean is that they use > ENOMEM[1] from malloc as such an indication).  It would make it > impossible to write a correct binding to a normal C library. > > Typically C library which calls malloc will do so in the middle of its > execution.  Even if the library returns the resulting error up as > a distinguishable error code, you can't just make the same library > call again - it may have done half its work but not the other half. FWIW there is some code in the Ocaml runtime which raises an Out_of_memory exception in response to ENOMEM in C land (and from ocaml code in some circumstances too). The Unix module (which wraps standard Unix libc type APIs) looks to (when appropriate) reasonably consistently return errors as Unix_error(errno), that is handled generically so will include raising Unix_error(ENOMEM) as necessary. In principal it ought to be possible to write an ocaml program which correctly dealt with those (and similar ones from other lower level bindings), for example the interactive command line interpreter catches Out_of_memory (but not Unix_error) and calls the gc before iterating. (I did see one reference in a book which suggested Out_of_memory wasn't catcheable, but that appears to be out of date from what I can tell). So I suppose it ought to be possible to write an ocaml daemon which caught errors like these at the top level and aborted only the currently attempted operation without taking down the whole daemon process. How feasible that is in practice I can't say. This of course relies on all the C bindings correctly turning errors into exceptions and the semantics of your particular daemon allowing for abandoning things half way through (which I suppose is a relative of some kind to crash-only s/w). Ian. > > So I think you must be wrong. > > Ian. > > [1] NB that when malloc fails there is no -ENOMEM, only ENOMEM. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Control reply; (Full Text)
From: Ian Jackson <Ian.Jackson@eu.citrix.com> To: Ian Campbell <ian.campbell@citrix.com> Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Martin Osterloh <osterlohm@ainfosec.com>, Andrew Cooper <andrew.cooper3@citrix.com> Subject: Re: [Xen-devel] libxl and malloc failure (Re: Current LibXL Status) Date: Mon, 22 Feb 2016 16:48:50 +0000 Message-ID: <22219.15346.138200.277441@mariner.uk.xensource.com>
[ Reply to this message; Retrieve Raw Message; Archives: marc.info, gmane ]
Ian Campbell writes ("Re: libxl and malloc failure (Re: [Xen-devel] Current LibXL Status)"): > On Thu, 2016-02-18 at 18:15 +0000, Ian Jackson wrote: > > Andrew Cooper writes ("Re: [Xen-devel] Current LibXL Status"): > > > Languages such as OCaml use -ENOMEM as a hint to run the garbage > > > collector some more. I expect Haskell is the same. > > > > This cannot possibly be true (if what you mean is that they use > > ENOMEM[1] from malloc as such an indication). It would make it > > impossible to write a correct binding to a normal C library. > > > > Typically C library which calls malloc will do so in the middle of its > > execution. Even if the library returns the resulting error up as > > a distinguishable error code, you can't just make the same library > > call again - it may have done half its work but not the other half. > > FWIW there is some code in the Ocaml runtime which raises an Out_of_memory > exception in response to ENOMEM in C land (and from ocaml code in some > circumstances too). ISTM that either there is something else which is turning ocaml out of heap errors into ENOMEM, or that this code you describe is fundamentally misconceived. > In principal it ought to be possible to write an ocaml program which > correctly dealt with those (and similar ones from other lower level > bindings), for example the interactive command line interpreter catches > Out_of_memory (but not Unix_error) and calls the gc before iterating. (I > did see one reference in a book which suggested Out_of_memory wasn't > catcheable, but that appears to be out of date from what I can tell). But iterating in the cli would not recover the operation which had failed. In the general case, there is no recovery possible when a C language library function returns ENOMEM. Sometimes the internal state of the library will now be unsuitable. Certainly, a policy of always retrying the operation after a gc would be very wrong indeed. Ie if ocaml has the behaviour claimed by Andrew (ie, that the need for a gc can be detected during malloc and reported as ENOMEM), all ocaml programs would already suffer from occasional random failures of operations being performed by code written in C. Note that "need for a gc" is not an abnormal part of operation in a consing garbage collected language like ocaml. It is entirely routine. Ie these random failures would occur in normal, otherwise entirely successful, operation. They would occur even if the program's actual memory use was modest. > This of course relies on all the C bindings correctly turning errors into > exceptions and the semantics of your particular daemon allowing for > abandoning things half way through (which I suppose is a relative of some > kind to crash-only s/w). Even if the operations are properly abandonded, they have still been erroneously made to fail. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel