Monday, March 14, 2011

Stopping Runaway Processes: Magic SysRq Keys

Pre-sequitur:
Consider the following Wikipedia entry, which I found only after doing all this work:
 http://en.wikipedia.org/wiki/Magic_SysRq_key
 It doesn't invalidate my findings, though:

     w f e v b (in that order) are the more useful.
And many of the claimed functions simply do not work, as discussed below.
Always willing to be corrected...
======================================================================== 
I found out how to get the kernel's attention when the machine is stuck in some kind of panic or on a runaway process. Before this, I just pulled the plug... :-(
So this is much more elegant: 
        Alt+Fn+SysRq+<command key>
where <command key> is one of a long list of characters, discussed below. It seems that
        w f e v b 
(in that order) are the more useful. 
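
For keyboards where the key combination is awkward, sysrq.txt also describes a keyboard-free route: as root, write the command character to /proc/sysrq-trigger. A minimal sketch of that, where the function name send_sysrq is made up for illustration and the trigger path is a parameter so the function can be tried against an ordinary file first:

```python
def send_sysrq(command, trigger_path="/proc/sysrq-trigger"):
    """Write one SysRq command character to the trigger file (needs root).

    trigger_path defaults to the procfs file described in sysrq.txt; pass
    another path to rehearse without touching the kernel.
    """
    if len(command) != 1 or not command.isalnum():
        raise ValueError("expected a single letter or digit, got %r" % command)
    with open(trigger_path, "w") as trigger:
        trigger.write(command)
```

Calling send_sysrq("w") as root should have the same effect as Alt+Fn+SysRq+w. Treat send_sysrq("b") with the same respect as the key combination: it reboots at once.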
----------------------------------------------------------------------- 
When I leave the machine idle for a while (going to bed, going to lunch) I often come back to find it frozen with mad disk thrashing going on. Sometimes I get its attention back with a mouse move or hitting the Esc key, but more often I just have to pull the plug.
What I find from /var/log/messages is that hald is doing a lot of stuff and won't let go of the system.
Today I noticed that the log ends with entries like:
Mar 12 14:07:42 P1630 kernel: [11391.748962] Out of memory: Kill process 1378 (hald) score 598 or sacrifice child
Mar 12 14:07:42 P1630 kernel: [11391.748968] Killed process 1504 (hald-runner) total-vm:56372kB, anon-rss:26496kB, file-rss:632kB
Googling tells me this is the response of the out-of-memory killer (oom_killer), so I've read a lot about how it calculates the score it uses to decide which process to kill to free up memory, and the winner (or actually, loser) in each case is hald.
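
To see which process keeps losing, the log can be tallied instead of read by eye. This is only a sketch: the pattern is modelled on the "Out of memory: Kill process ..." lines quoted above (other kernel versions may word the message differently), and the names oom_victims and count_victims are made up for illustration.

```python
import re
from collections import Counter

# Matches the victim-selection line as it appears in the log excerpt, e.g.
# "Out of memory: Kill process 1378 (hald) score 598 or sacrifice child"
OOM_LINE = re.compile(
    r"Out of memory: Kill process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"score (?P<score>\d+)"
)


def oom_victims(lines):
    """Yield (pid, name, score) for each OOM-killer selection line."""
    for line in lines:
        match = OOM_LINE.search(line)
        if match:
            yield (
                int(match.group("pid")),
                match.group("name"),
                int(match.group("score")),
            )


def count_victims(lines):
    """Count how often each process name was selected by the OOM killer."""
    return Counter(name for _pid, name, _score in oom_victims(lines))
```

Feeding it an open /var/log/messages (for example count_victims(open("/var/log/messages")).most_common(5)) gives a quick ranking of the usual victims.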
Pulling the plug is actually a PITA. Searching around (thanks, DenverD) turned up a much more elegant solution, described in:
        /usr/src/linux/Documentation/sysrq.txt
This bears careful reading and even more careful experimentation, which I here report:
The Fujitsu P1630 keyboard has a Function (Fn) key which is used to invoke functions on other keys that are indicated by a box around the function name, such as SysRq. In this case the following are the results of various key combinations, where a comma implies a wait, a + implies pressing simultaneously, and <command key> implies any of the letters of the alphabet cited in the sysrq.txt source above:

Alt, Fn+SysRq+<command key> just brings up the screen shot module 
 
Alt+Fn+SysRq+<command key> does nothing for <command key> = d g h.
 Instead, dmesg shows a help screen:
 [13321.912458] SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) thaw-filesystems(J) saK
  show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync
 show-task-states(T) Unmount force-fb(V) show-blocked-tasks(W) dump-ftrace-buffer(Z)  
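
The help line itself says which command keys this kernel knows about. The sketch below reads it under an assumed convention, inferred from the line above rather than taken from any specification: a key in parentheses names the command key, as in force-fb(V), and otherwise the single capital letter in a word does, as in reBoot. The function name sysrq_help_keys is made up for illustration.

```python
import re


def sysrq_help_keys(help_text):
    """Map each command key found in a SysRq HELP line to its help token."""
    keys = {}
    for token in help_text.split():
        paren = re.search(r"\(([^)]+)\)$", token)
        if paren:
            label = paren.group(1)
            if re.fullmatch(r"\d-\d", label):
                # A digit range such as "0-9" covers every digit in it.
                for digit in range(int(label[0]), int(label[2]) + 1):
                    keys[str(digit)] = token
            else:
                keys[label.lower()] = token
            continue
        # No parentheses: accept the token only if exactly one letter is
        # capitalised, which skips words like "SysRq" and "HELP".
        capitals = [ch for ch in token if ch.isupper()]
        if len(capitals) == 1:
            keys[capitals[0].lower()] = token
    return keys
```

Run on the help text above, the result has no entry for d or g (and h is only the help request itself), which fits those keys doing nothing but print the help.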
 
The following are the results for the remaining <command key> combinations:
 
Alt+Fn+SysRq+b  Will immediately reboot the system without syncing 
                or unmounting your disks.  
                This is clearly the most dangerous and exciting!
 
Alt+Fn+SysRq+e     Send a SIGTERM to all processes, except for init. 
                   Dumps you to the terminal prompt. 
                   This is almost as much fun as <b>...
 
Alt+Fn+SysRq+f     Calls oom_kill to kill a memory hog process. This is
                   probably the most commonly useful one, e.g.:
 [13385.402701] [11278]     0 11278     8026     2004   0       0             0 packagekitd
 [13385.402706] Out of memory: Kill process 3421 (firefox) score 122 or sacrifice child
 [13385.402716] Killed process 3631 (plugin-containe) total-vm:137648kB, anon-rss:840kB, file-rss:2436kB

Alt+Fn+SysRq+<command key> for i j k l m o all change the log level:
        Alt+Fn+SysRq+i
         [13859.996645] SysRq : Changing Loglevel
         [13859.996673] Loglevel set to 5
        Alt+Fn+SysRq+j
         [14057.594808] SysRq : Changing Loglevel
         [14057.594837] Loglevel set to 1
        Alt+Fn+SysRq+k
         [14158.143866] SysRq : Changing Loglevel
         [14158.143894] Loglevel set to 2
        Alt+Fn+SysRq+l
         [14158.143866] SysRq : Changing Loglevel
         [14158.143894] Loglevel set to 3
        Alt+Fn+SysRq+m
         [14240.847738] SysRq : Changing Loglevel
         [14240.847767] Loglevel set to 0
        Alt+Fn+SysRq+o
         [14399.663636] SysRq : Changing Loglevel
         [14399.663660] Loglevel set to 6
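
A possible explanation, offered as an unconfirmed assumption: on many compact laptop keyboards, holding Fn turns part of the letter block into an embedded numeric keypad (m j k l u i o standing in for 0 1 2 3 4 5 6), and sysrq.txt says the digits 0-9 set the console loglevel. The sketch below only checks that assumed layout against the loglevels logged above; FN_KEYPAD, OBSERVED_LOGLEVEL and predicted_loglevel are names made up for illustration, and the layout has not been verified on the P1630.

```python
# Assumed embedded-keypad layout (typical of compact laptops, NOT confirmed
# for the Fujitsu P1630): with Fn held, these letters act as digits.
FN_KEYPAD = {"m": "0", "j": "1", "k": "2", "l": "3", "u": "4", "i": "5", "o": "6"}

# Loglevels that dmesg reported for each letter in the experiments above.
OBSERVED_LOGLEVEL = {"i": 5, "j": 1, "k": 2, "l": 3, "m": 0, "o": 6}


def predicted_loglevel(letter, keypad=FN_KEYPAD):
    """Return the loglevel SysRq would set if Fn turns `letter` into a digit."""
    digit = keypad.get(letter)
    return None if digit is None else int(digit)


# Collect every letter where the hypothesis disagrees with the log.
mismatches = {
    letter: (predicted_loglevel(letter), seen)
    for letter, seen in OBSERVED_LOGLEVEL.items()
    if predicted_loglevel(letter) != seen
}
print(mismatches or "keypad hypothesis matches every observed loglevel")
```

If the assumption holds, u would be expected to set loglevel 4 on this keyboard rather than remount the filesystems, which would be worth checking in dmesg.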
 
Alt+Fn+SysRq+n  Makes all real-time (RT) tasks nice-able.
         [14327.222095] SysRq : Nice All RT Tasks
 
Alt+Fn+SysRq+p  Will dump the current registers and flags 
                to your console (to appear in dmesg).
 
Alt+Fn+SysRq+q  Will dump to dmesg per CPU lists of all 
                armed hrtimers (but NOT regular timer_list 
                timers) and detailed information about all
                clockevent devices.
 
Alt+Fn+SysRq+r  Turns off keyboard raw mode and sets it to XLATE.
        [14773.473596] SysRq : Keyboard mode set to system default
 
Alt+Fn+SysRq+s  Will attempt to sync all mounted filesystems.
        [14816.857775] SysRq : Emergency Sync
        [14816.860901] Emergency Sync complete
 
Alt+Fn+SysRq+t  Will dump a list of current tasks and their information to dmesg
 
Alt+Fn+SysRq+u  Will attempt to remount all mounted filesystems read-only
 
Alt+Fn+SysRq+v  Forcefully restores framebuffer console (same as b)
 
Alt+Fn+SysRq+w  Dumps tasks that are in the uninterruptible (blocked) state.

So more experimentation later, but it seems like w f e v b (in that order) are the more useful.
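
One thing that may be worth ruling out in that later experimentation is whether the kernel is restricting some SysRq functions. sysrq.txt describes /proc/sys/kernel/sysrq as 0 (everything off), 1 (everything on), or a bitmask of function groups. A minimal sketch for decoding it, with the bit values taken from sysrq.txt and the function names decode_sysrq_mask and read_sysrq_mask made up for illustration:

```python
# Bit values as listed in sysrq.txt for /proc/sys/kernel/sysrq.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq_mask(value):
    """Return the function groups enabled by a given sysrq setting."""
    if value == 0:
        return []  # SysRq disabled completely
    if value == 1:
        return list(SYSRQ_BITS.values())  # all functions enabled
    return [name for bit, name in sorted(SYSRQ_BITS.items()) if value & bit]


def read_sysrq_mask(path="/proc/sys/kernel/sysrq"):
    """Read the current setting from procfs as an integer."""
    with open(path) as handle:
        return int(handle.read().strip())
```

For example, decode_sysrq_mask(read_sysrq_mask()) lists what the keyboard combination is allowed to do on the running system; a setting of 176 would permit only sync, remount read-only and reboot/poweroff.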
