From 4126b1f5c6983b7c2dd4f92d635ab762d861c2d6 Mon Sep 17 00:00:00 2001 From: Denis Vlasenko Date: Tue, 31 Oct 2006 18:41:29 +0000 Subject: [PATCH] add usefun info on SIGINT handling peculiarities --- docs/sigint.htm | 627 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 627 insertions(+) create mode 100644 docs/sigint.htm diff --git a/docs/sigint.htm b/docs/sigint.htm new file mode 100644 index 000000000..6fe76bbef --- /dev/null +++ b/docs/sigint.htm @@ -0,0 +1,627 @@ + + + +Proper handling of SIGINT/SIGQUIT [http://www.cons.org/cracauer/sigint.html] + + +

Proper handling of SIGINT/SIGQUIT

+ +

+ + + + + +
Abstract: +In UNIX terminal sessions, you usually have a key like +C-c (Control-C) to immediately end whatever program you +have running in the foreground. This should work even when the program +you called has called other programs in turn. Everything should be +aborted, giving you your command prompt back, no matter how deep the +call stack is. + +

Basically, it's trivial. But the existence of interactive +applications that use SIGINT and/or SIGQUIT for purposes other than a +complete immediate abort makes matters complicated and, as was to +be expected, has left us with several ways to solve the problem. Of course, +existing shells and applications follow different ways. + +

This Web page outlines different ways to solve the problem and +argues that only one of them can do everything right, although it +means that we have to fix some existing software. + + + +

Intended audience: Programmers who implement programs that catch SIGINT/SIGQUIT. +
Programmers who implement shells or shell-like programs that +execute batches of programs. + +

Users who have problems getting rid of runaway shell +scripts using Control-C, or who have interactive applications +that don't behave right when SIGINT is sent. Examples are emacs sessions +that die on Control-g or shellscript statements that are sometimes +executed and sometimes not, apparently without regard to the user's +intention. + + +

Required knowledge: You have to know what it means to catch SIGINT or SIGQUIT and how +processes wait for other processes (children) they spawned. + + +
+ + + +

Basic concepts

+ +What technically happens when you press Control-C is that all programs +running in the foreground in your current terminal (or virtual +terminal) are sent the signal SIGINT. + +

You may change the key that triggers the signal using +stty and running programs may remap the SIGINT-sending +key at any time they like, without your intervention and without +asking you first. + +

The usual reaction of a running program to SIGINT is to exit. +However, not all programs exit on SIGINT; programs are free to +use the signal for other actions or to ignore it altogether. + +

All programs running in the foreground receive the signal. This may +be a nested "stack" of programs: You started a program that started +another and the outer is waiting for the inner to exit. This nesting +may be arbitrarily deep. + +

The innermost program is the one that decides what to do on SIGINT. +It may exit, do something else or do nothing. Still, when the user sends +SIGINT, all the outer programs are woken up, get the signal and may +react to it. + +
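To make the "do something else" case concrete, here is a minimal sketch in C of a program that treats SIGINT as an ordinary event instead of exiting on it. The names install_sigint_flag and take_sigint are made up for this illustration; a real application such as an editor is of course far more involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the program's main loop. */
static volatile sig_atomic_t sigint_pending = 0;

static void on_sigint(int sig)
{
	(void)sig;
	sigint_pending = 1;	/* only record the event, do not exit */
}

/* Install the handler; returns 0 on success, -1 on error. */
int install_sigint_flag(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = on_sigint;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGINT, &sa, NULL);
}

/* Returns 1 if a SIGINT arrived since the last call (and clears the
 * flag), 0 otherwise. */
int take_sigint(void)
{
	if (sigint_pending) {
		sigint_pending = 0;
		return 1;
	}
	return 0;
}
```

A main loop would call take_sigint() between keystrokes and cancel its current operation when it returns 1. Note that every outer program waiting for this one still receives the same SIGINT.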

What we try to achieve

+ +The problem is with shell scripts (or similar programs that call +several subprograms one after another). + +

Let us consider the most basic script: +

+#! /bin/sh
+program1
+program2
+
+and the usual run looks like this: +
+$ sh myscript
+[output of program1]
+[output of program2]
+$
+
+ +

Let us assume that both programs do nothing special on SIGINT, they +just exit. + +

Now imagine the user hits C-c while a shellscript is executing its +first program. The following programs receive SIGINT: program1 and +also the shell executing the script. program1 exits. + +

But what should the shell do? If we say that it is only the +innermost program's business to react to SIGINT, the shell will do +nothing special (not exit) and it will continue the execution of the +script and run program2. But this is wrong: The user's intention in +hitting C-c is to abort the whole script, to get his prompt back. If +he hits C-c while the first program is running, he does not want +program2 to be even started. + +

Here is what would happen if the shell doesn't do anything: +

+$ sh myscript
+[first half of program1's output]
+C-c   [user presses C-c]
+[second half of program1's output will not be displayed]
+[output of program2 will appear]
+
+ + +

Consider a more annoying example: +

+#! /bin/sh
+# let's assume there are 300 *.dat files
+for file in *.dat ; do
+	dat2ascii "$file"
+done
+
+ +If your shell didn't exit when the user hits C-c, +C-c would just end one dat2ascii run and +the script would continue. Thus, you would have to hit C-c up to +300 times to end this script. + +

Alternatives to do so

+ +

There are several ways to handle abortion of shell scripts when +SIGINT is received while a foreground child runs: + +

+ +
  • As just outlined, the shellscript may just continue, ignoring the +fact that the user hit C-c. That way, your shellscript - +including any loops - would continue and you would have no chance of aborting +it except by using the kill command after finding out the outermost +shell's PID. This "solution" will not be discussed further, as it is +obviously not desirable. + +

  • The shell itself exits immediately when it receives SIGINT. Not +only the program called will exit, but also the calling (the +script-executing) shell. In this first variant the shell exits (and +therefore discontinues execution of the script) immediately, while +the called program may still be executing (remember that although +the shell is just waiting for the called program to exit, it is woken +up and may act). I will call this way of doing things "IUE" (for +"immediate unconditional exit") for the rest of this document. + +

  • As a variant of the former, when the shell receives SIGINT +while it is waiting for a child to exit, the shell does not exit +immediately, but it remembers the fact that a SIGINT happened. After +the called program exits and the shell's wait ends, the shell will +exit itself and hence discontinue the script. I will call this way of +doing things "WUE" (for "wait and unconditional exit") for the +rest of this document. + +

  • There is also a way that the calling shell can tell whether the +called program exited on SIGINT or whether it ignored SIGINT (or used it +for other purposes). As in the WUE way, the shell waits for +the child to complete. It figures out whether the program was ended by +SIGINT and if so, it discontinues the script. If the program exited in any +other way, the script will be continued. I will call this way of doing +things "WCE" (for "wait and cooperative exit") for the rest of +this document. + +
  • + +
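The three ways that remain under discussion can be summed up as one small decision function. This is only a model written for this page, not code from any real shell; the names sigint_policy and script_continues are made up, and the two flags stand for "the shell itself saw SIGINT while waiting" and "the child's exit status says it was ended by SIGINT".

```c
#include <stdbool.h>

enum sigint_policy { IUE, WUE, WCE };

/*
 * Decide whether the shell goes on to the next command of the script.
 *   shell_saw_sigint:     the shell received SIGINT while the child ran
 *   child_died_of_sigint: the child's status says "ended by SIGINT"
 */
bool script_continues(enum sigint_policy policy,
                      bool shell_saw_sigint,
                      bool child_died_of_sigint)
{
	switch (policy) {
	case IUE:	/* exits at once, without waiting for the child */
	case WUE:	/* exits as well, but only after the child has exited */
		return !shell_saw_sigint;
	case WCE:	/* relies on the child's exit status only */
		return !child_died_of_sigint;
	}
	return true;	/* not reached for valid policies */
}
```

IUE and WUE give the same answer here; they differ only in when the shell exits. The interesting input is a child that receives SIGINT but keeps running: script_continues(WCE, true, false) is true, while the same call with IUE or WUE is false.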

    The problem

    + +At first sight, the three solutions (IUE, WUE and WCE) all seem to do +what we want: If C-c is hit while the first program of the shell +script runs, the script is discontinued. The user gets his prompt back +immediately. So what is the difference between these ways of handling +SIGINT? + +

    There are programs that use the signal SIGINT for other purposes +than exiting. They use it as a normal keystroke. The user is expected +to use the key that sends SIGINT during a perfectly normal program +run. As a result, the user sends SIGINT in situations where he/she +does not want the program or the script to end. + +

    The primary example is the emacs editor: C-g does what ESC does in +other applications: It cancels a partially executed or prepared +operation. Technically, emacs remaps the key that sends SIGINT from +C-c to C-g and catches SIGINT. + +

    Remember that the SIGINT is sent to all programs running in the +foreground. If emacs is executing from a shell script, both emacs and +the shell get SIGINT. emacs is the program that decides what to do: +Exit on SIGINT or not. emacs decides not to exit. The problem arises +when the shell draws its own conclusions from receiving SIGINT without +consulting emacs for its opinion. + +

    Consider this script: +

    +#! /bin/sh
    +emacs /tmp/foo
    +cp /tmp/foo /home/user/mail/sent
    +
    + +

    If C-g is used in emacs, both the shell and emacs will receive +SIGINT. Emacs will not exit: the user used C-g as a normal editing +keystroke and does not want the script to be aborted on C-g. + +

    The central problem is that the second command (cp) may +unintentionally be skipped when the shell draws its own conclusion +about the user's intention. The innermost program is the only one to +judge. + +

    One more example

    + +

    Imagine a mail session using a curses mailer in a tty. You called +your mailer and started to compose a message. Your mailer calls emacs. +C-g is a normal editing key in emacs. Technically it +sends SIGINT (it was C-c, but emacs remapped the key) to +

    +
  • emacs +
  • the shell between your mailer and emacs, the one from your mailer's + system("emacs /tmp/bla.44") command +
  • the mailer itself +
  • possibly another shell if your mailer was called by a shell script +or from another application using system(3) +
  • your interactive shell (which ignores it since it is interactive +and hence is not relevant to this discussion) +
  • + +

    If everyone just exits on SIGINT, you will be left with nothing but +your login shell, without asking. + +

    But for sure you don't want to be dropped out of your editor and +out of your mailer back to the commandline, having your edited data +and mailer status deleted. + +

    Understand the difference: While C-g is used as a kind +of abort key in emacs, it isn't the major "abort everything" key. When +you use C-g in emacs, you want to end some internal emacs +command. You don't want your whole emacs and mailer session to end. + +

    So, if the shell exits immediately when the user sends SIGINT (the +second of the four ways shown above), the parent of emacs would die, +leaving emacs without the controlling tty. The user will lose his +editing session immediately and unrecoverably. If the "main" shell of +the operating system defaults to this behavior, every editor session +that is spawned from a mailer or such will break (because it is +usually executed by system(3), which calls /bin/sh). This was the case +in FreeBSD before Bruce Evans and I changed it in 1998. + +

    If the shell recognizes that SIGINT was sent and exits after the +current foreground process has exited (the third way of the four), the +editor session will not be disturbed, but things will still not work +right. + +

    A further look at the alternatives

    + +

    We are still considering this script to examine the shell's actions in the +IUE, WUE and WCE ways of handling SIGINT: +

    +#! /bin/sh
    +emacs /tmp/foo
    +cp /tmp/foo /home/user/mail/sent
    +
    + +

    The IUE ("immediate unconditional exit") way does not work at all: +emacs wants to survive the SIGINT (it's a normal editing key for +emacs), but its parent shell unconditionally thinks "We received +SIGINT. Abort everything. Now.". The shell will exit even before emacs +exits. But this will leave emacs in an unusable state, since the death +of its calling shell will leave it without required resources (file +descriptors). This way does not work at all for shellscripts that call +programs that use SIGINT for other purposes than immediate exit. Even +programs that exit on SIGINT, but want to do some cleanup between +the signal and the exit, may fail before they complete their cleanup. + +

    It should be noted that this way has one advantage: If a child +blocks SIGINT and does not exit at all, this way will get control back +to the user's terminal. Since such programs should be banned from your +system anyway, I don't think that weighs against the disadvantages. + +

    WUE ("wait and unconditional exit") is a little more clever: If C-g +was used in emacs, the shell will get SIGINT. It will not immediately +exit, but remember the fact that a SIGINT happened. When emacs ends +(maybe a long time after the SIGINT), it will say "Ok, a SIGINT +happened sometime while the child was executing, the user wants the +script to be discontinued". It will then exit. The cp will not be +executed. But that's bad. The "cp" will be executed when the emacs +session ended without the C-g key ever being used, but it will not be +executed when the user used C-g at least once. That is clearly not +desired. Since C-g is a normal editing key in emacs, the user expects +the rest of the script to behave identically no matter what keys he +used. + +

    As a result, the "WUE" way is better than the "IUE" way in that it +does not break SIGINT-using programs completely. The emacs session +will end undisturbed. But it still does not support scripts where +other actions should be performed after a program that uses SIGINT for +non-exit purposes. Since the behavior is basically unpredictable for +the user, this can lead to nasty surprises. + +

    The "WCE" way fixes this by "asking" the called program whether it +exited on SIGINT or not. While emacs receives SIGINT, it does not exit +on it and a calling shell waiting for its exit will not be told that +it exited on SIGINT. (Although it receives SIGINT at some point in +time, the system does not enforce that emacs will exit with +"I-exited-on-SIGINT" status. This is under emacs' control, see below). + +

    This still works for the normal script without SIGINT-using +programs:

    +
    +#! /bin/sh
    +program1
    +program2
    +
    + +Unless program1 and program2 mess around with signal handling, the +system will tell the calling shell whether the programs exited +normally or as a result of SIGINT. + +

    The "WCE" way then has an easy way to do things right: When a called +program exited with "I-exited-on-SIGINT" status, the shell will discontinue +the script after this program. If the program ends without this +status, the next command in the script is started. + +
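On POSIX systems the "I-exited-on-SIGINT" status is read from the value that waitpid() fills in, using the WIFSIGNALED and WTERMSIG macros. The following is a minimal sketch of that check; the function name child_exited_on_sigint is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Wait for child `pid`. Returns 1 if it was terminated by SIGINT,
 * 0 if it ended in any other way, -1 if waitpid() failed.
 */
int child_exited_on_sigint(pid_t pid)
{
	int status;

	/* Retry if our own wait was interrupted by a signal. */
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return WIFSIGNALED(status) && WTERMSIG(status) == SIGINT;
}
```

A WCE shell would discontinue the script when this returns 1 and start the next command when it returns 0.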

    It is important to understand that a shell in "WCE" mode does not +need to listen to the SIGINT signal at all. Both in the +"emacs-then-cp" script and in the "several-normal-programs" script, it +will be woken up and receive SIGINT when the user hits the +corresponding key. But the shell does not need to react to this event +and it doesn't need to remember the event of any SIGINT, either. +Telling whether the user wants to end a script is done by asking the +program that has to decide, the program that interprets keystrokes +from the user: the innermost program. + +

    So everything is well with WCE?

    + +Well, almost. + +

    The problem with the "WCE" mode is that there are broken programs +that do not properly communicate the required information up to the +calling program. + +

    Unless a program messes with signal handling, the system does this +automatically. + +

    There are programs that want to exit on SIGINT, but they don't let +the system do the automatic exit, because they want to do some +cleanup. To do so, they catch SIGINT, do the cleanup and then exit by +themselves. + +

    And here is where the problem arises: Once they catch the signal, +the system will no longer communicate the "I-exited-on-SIGINT" status +to the calling program automatically, even if the program exits +immediately in the signal handler of SIGINT. Once it catches the +signal, it has to take care of communicating the signal status +itself. + +

    Some programs don't do this. On SIGINT, they do cleanup and exit +immediately, but the calling shell isn't told about the non-normal exit +and it will call the next program in the script. + +

    As a result, the user sends SIGINT and while one program exits, the +shellscript continues. To him/her it looks like the shell fails to +obey his/her abort command. + +

    Neither an IUE nor a WUE shell would have this problem, since they +discontinue the script on their own. But as I said, they don't support +programs using SIGINT for non-exiting purposes, no matter whether +these programs properly communicate their signal status to the calling +shell or not. + +

    Since some shells in wide use implement the WUE way (and some even +IUE), there is a considerable number of broken programs out there that +break WCE shells. The programmers just don't notice it if their +shell isn't WCE. + +

    How to be a proper program

    + +

    (Short note in advance: What you need to achieve is that +WIFSIGNALED(status) is true in the calling program and that +WTERMSIG(status) returns SIGINT.) + +

    If you don't catch SIGINT, the system automatically does the right +thing for you: Your program exits and the calling program gets the +right "I-exited-on-SIGINT" status after waiting for your exit. + +

    But once you catch SIGINT, you have to act. + +

    Decide whether the SIGINT is used for exit/abort purposes and hence +a shellscript calling this program should discontinue. This is +hopefully obvious. If you just need to do some cleanup on SIGINT, but +then exit immediately, the answer is "yes". + +

    If so, you have to tell the calling program about it by exiting +with the "I-exited-on-SIGINT" status. + +

    There is no other way of doing this than to kill yourself with a +SIGINT signal. Do it by resetting the SIGINT handler to SIG_DFL, then +send yourself the signal. + +

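    +#include <signal.h>
    +#include <unistd.h>
    +
    +void sigint_handler(int sig)
    +{
    +	/* do some cleanup here */
    +	signal(SIGINT, SIG_DFL);	/* restore the default action */
    +	kill(getpid(), SIGINT);	/* die from SIGINT, not from exit() */
    +}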
    +
    + +Notes: + + + +
  • You cannot "fake" the proper exit status by an exit(3) with a +special numeric value. People often assume this since the manuals for +shells often list some return value for exactly this. But this is just +a convention for your shell script. It does not work from one UNIX API +program to another. + +

    All that happens is that the shell sets the "$?" variable to a +special numeric value for the convenience of your script, because your +script does not have access to the lower-level UNIX status evaluation +functions. This is just an agreement between your script and the +executing shell, it does not have any meaning in other contexts. + +

  • Do not use kill(0, SIGINT) without consulting the manual for +your OS implementation. E.g. on BSD, this would send the signal not only to +the current process, but to all processes in its process group. + +

  • POSIX 1003.1 allows all these calls to appear in signal +handlers, so it is portable. + +
  • + +
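The first note can be checked with a small experiment. The sketch below compares a child that "fakes" the status with a plain exit against one that kills itself with SIGINT. All function names are made up for this illustration, and 130 (128 plus 2, the usual number of SIGINT) is only an assumption about the value many shells show in "$?".

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* "Fake": a plain exit with the number shells commonly show in $?. */
void fake_sigint_exit(void)
{
	_exit(130);
}

/* Proper: restore the default action, then send ourselves SIGINT. */
void real_sigint_exit(void)
{
	signal(SIGINT, SIG_DFL);
	kill(getpid(), SIGINT);
	_exit(1);	/* only reached if the signal was not delivered */
}

/* Run child_fn in a child process and return its wait status,
 * or -1 if fork() or waitpid() failed. */
int wait_status_of(void (*child_fn)(void))
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		child_fn();
		_exit(0);
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return status;
}
```

For the status returned by wait_status_of(fake_sigint_exit), WIFEXITED is true and WIFSIGNALED is false, so a WCE shell would carry on with the script. Only the status of real_sigint_exit satisfies WIFSIGNALED with WTERMSIG equal to SIGINT.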

    In a Bourne shell script, you can catch signals using the +trap command. Here, the same rules as for C programs apply. If +the intention of SIGINT is to end your program, you have to exit in a +way that the calling program "sees" that you have been killed. If +you don't catch SIGINT, this happens automatically, but if you catch +SIGINT, e.g. to do cleanup work, you have to end the program by +killing yourself, not by calling exit. + +

    Consider this example from FreeBSD's mkdep, which is a +Bourne shell script. + +

    +TMP=_mkdep$$
    +trap 'rm -f $TMP ; trap 2 ; kill -2 $$' 1 2 3 13 15
    +
    + +Yes, you have to do it the hard way. It's even more annoying in shell +scripts than in C programs since you can't "pre-delete" temporary +files (which isn't really portable in C, though). + +

    All this applies to programs in all languages, not only C and +Bourne shell. Every language implementation that lets you catch SIGINT +should also give you the option to reset the signal and kill yourself. + +

    It is always desirable to exit the right way. Even if you don't +expect your usual callers to depend on it, some unusual one will come +along. This proper exit status will be needed for WCE and will not +hurt when the calling shell uses IUE or WUE. + +

    How to be a proper shell

    + +All this applies only for the script-executing case. Most shells will +also have interactive modes where things are different. + + + +
  • Do nothing special when SIGINT appears while you wait for a child. +You don't even have to remember that one happened. + +

  • Wait for the child to exit and get the exit status. Do not truncate it +to type char. + +

  • Look at WIFSIGNALED(status) and WTERMSIG(status) to tell +whether the child says "I exited on SIGINT: in my opinion the user +wants the shellscript to be discontinued". + +

  • If the latter applies, discontinue the script. + +

  • Exit. But since a shellscript may in turn be called by a +shellscript, you need to make sure that you properly communicate the +discontinue intention to the calling program. As in any other program +(see above), do + +
    +	signal(SIGINT, SIG_DFL);
    +	kill(getpid(), SIGINT);
    +
    + +
  • + +
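The first four steps of this list can be sketched in C as follows. This is an illustration only and not code from any real shell: run_script and its helpers are made-up names, and the "script" is simply a NULL-terminated array of argv vectors.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Step 1: SIGINT wakes us up while we wait, but we do nothing about it. */
static void sigint_noop(int sig)
{
	(void)sig;
}

/* Step 2: wait for the child and keep the complete int status. */
static int wait_for_child(pid_t pid, int *status)
{
	while (waitpid(pid, status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return 0;
}

/*
 * Run each command of `script` in turn. Returns 0 if all commands were
 * run, SIGINT if one of them was ended by SIGINT (the rest is skipped),
 * -1 on error. *started is set to the number of commands started.
 */
int run_script(char **script[], int *started)
{
	struct sigaction noop, saved;
	int result = 0;

	memset(&noop, 0, sizeof noop);
	noop.sa_handler = sigint_noop;
	sigemptyset(&noop.sa_mask);
	if (sigaction(SIGINT, &noop, &saved) == -1)
		return -1;

	*started = 0;
	for (int i = 0; script[i] != NULL && result == 0; i++) {
		int status;
		pid_t pid = fork();

		if (pid == -1) {
			result = -1;
			break;
		}
		if (pid == 0) {
			/* The child starts out with the default SIGINT action. */
			signal(SIGINT, SIG_DFL);
			execvp(script[i][0], script[i]);
			_exit(127);	/* only reached if exec failed */
		}
		++*started;
		if (wait_for_child(pid, &status) == -1)
			result = -1;
		else if (WIFSIGNALED(status) && WTERMSIG(status) == SIGINT)
			result = SIGINT;	/* steps 3 and 4: discontinue */
	}

	sigaction(SIGINT, &saved, NULL);	/* put the caller's handler back */
	return result;
}
```

A caller that gets SIGINT back from run_script() would then perform the last step of the list and kill itself with SIGINT. A child that merely exits with a number such as 130 does not stop the loop.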

    Other remarks

    + +Although this web page talks about SIGINT only, almost the same issues +apply to SIGQUIT, including proper exiting by killing yourself after +catching the signal and proper reaction to the WIFSIGNALED(status) +value. One notable difference for SIGQUIT is that you have to make +sure that the whole call tree does not dump core. + +

    What to fight

    + +Make sure all programs really kill themselves if they react +to SIGINT or SIGQUIT and intend to abort their operation as a result +of this signal. Programs that don't use SIGINT/SIGQUIT as a +termination trigger - but as part of normal operation - don't kill +themselves, but do a normal exit instead. + +

    Make sure people understand why you can't fake an exit-on-signal by +doing exit(...) using any numerical status. + +

    Make sure you use a shell that behaves right. Especially if you +develop programs, since it will help seeing problems. + +

    Concrete examples how to fix programs:

    + + +

    Testsuite for shells

    + +I have a collection of shellscripts that test shells for this +behavior. See my download dir to get the newest +"sh-interrupt" files, either as a tarfile or as individual files for +online browsing. This isn't really documented, apart from the +comments the scripts echo. + +

    Appendix 1 - table of implementation choices

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Method sign: IUE
  • Does what? The shell executing a script exits immediately if it receives SIGINT.
  • Example shells that implement it: 4.4BSD ash (ash), NetBSD, FreeBSD prior to 3.0/2.2.8
  • What happens when a shellscript called emacs, the user used C-g and the script has additional commands in it? The editor session is lost and subsequent commands are not executed.
  • What happens when a shellscript called emacs, the user did not use C-c and the script has additional commands in it? The editor continues as normal and the subsequent commands are executed.
  • What happens if a non-interactive child catches SIGINT? The script ends immediately, returning to the caller even before the current foreground child of the shell exits.
  • To behave properly, children must do what? It doesn't matter what the child does or how it exits; even if the child continues to operate, the shell returns.

    Method sign: WUE
  • Does what? If the shell executing a script received SIGINT while a foreground process was running, it will exit after that child's exit.
  • Example shells that implement it: pdksh (OpenBSD /bin/sh)
  • What happens when a shellscript called emacs, the user used C-g and the script has additional commands in it? The editor continues as normal, but subsequent commands from the script are not executed.
  • What happens when a shellscript called emacs, the user did not use C-c and the script has additional commands in it? The editor continues as normal and subsequent commands are executed.
  • What happens if a non-interactive child catches SIGINT? The script returns to its caller after the current foreground child exits, no matter how the child exited.
  • To behave properly, children must do what? It doesn't matter how the child exits (signal status or not), but if it doesn't return at all, the shell will not return. In no case will further commands from the script be executed.

    Method sign: WCE
  • Does what? The shell exits if a child signaled that it was killed on a signal (either it had the default handler for SIGINT or it killed itself).
  • Example shells that implement it: bash (Linux /bin/sh), most commercial /bin/sh, FreeBSD /bin/sh from 3.0/2.2.8.
  • What happens when a shellscript called emacs, the user used C-g and the script has additional commands in it? The editor continues as normal and subsequent commands are executed.
  • What happens when a shellscript called emacs, the user did not use C-c and the script has additional commands in it? The editor continues as normal and subsequent commands are executed.
  • What happens if a non-interactive child catches SIGINT? The script returns to its caller after the current foreground child exits, but only if the child exited with signal status. If the child did a normal exit (even if it received SIGINT, but catches it), the script will continue.
  • To behave properly, children must do what? The child must be implemented right, or the user will not be able to break shell scripts reliably.
    + +

     +
    ©2005 Martin Cracauer <cracauer @ cons.org> +http://www.cons.org/cracauer/ +
    Last changed: $Date: 2005/02/11 21:44:43 $ +