[BUG] grml-zsh-config makes zsh hang #93

Open
opened 2024-12-04 08:04:37 +00:00 by kth5 · 1 comment
kth5 commented 2024-12-04 08:04:37 +00:00 (Migrated from github.com)

We've noticed this while working on the new installer. zsh just hangs and we never reach an interactive shell, this seems to be a problem with zsh itself or grml-zsh-config.

strace w/ grml-zsh-config installed
message.txt

When uninstalling grml-zsh-config, zsh seems to act normally.

We've noticed this while working on the new installer. `zsh` just hangs and we never reach an interactive shell, this seems to be a problem with `zsh` itself or `grml-zsh-config`. **strace w/ grml-zsh-config installed** [message.txt](https://github.com/user-attachments/files/18004539/message.txt) When uninstalling `grml-zsh-config`, `zsh` seems to act normally.
flaviojs commented 2024-12-11 18:58:59 +00:00 (Migrated from github.com)

Typing GRML_OSTYPE=$(uname -s) in a zsh shell (without grml-zsh-config) also freezes, so it's either a bug in zsh or one of it's dependencies.

It gets stuck in pause().

With:

  • a powerpc qemu "G4" vm with ArchPOWER
  • a "root.zsh.bad" pacstrap with base zsh grml-zsh-config
  • "set -x" added to root.zsh.bad/etc/zsh/zshrc (comes from grml-zsh-config)
  • a log produced with sudo arch-chroot root.zsh.bad strace --absolute-timestamps=ms --stack-trace --no-abbrev zsh --verbose --xtrace 2>zsh.bad.txt
  • stack trace functions produced with DEBUGINFOD_URLS="https://debuginfod.archlinuxpower.org" gdb zsh and info symbol <address>
00:37:58.704 write(2, "GRML_OSTYPE=$(uname -s)\n", 24) = 24
 > /usr/bin/zsh() [0x7783c] # ingetc + 1276 in section .text of /usr/bin/zsh

GRML_OSTYPE=$(uname -s)

00:37:58.727 fcntl64(3, F_DUPFD, 10)    = 12
 > /usr/bin/zsh() [0xdc0a4] # movefd + 156 in section .text of /usr/bin/zsh

00:37:58.737 close(3)                   = 0
 > /usr/bin/zsh() [0xdc0b4] # movefd + 172 in section .text of /usr/bin/zsh

00:37:58.748 fcntl64(4, F_DUPFD, 10)    = 13
 > /usr/bin/zsh() [0xdc0a4] # movefd + 156 in section .text of /usr/bin/zsh

00:37:58.760 close(4)                   = 0
 > /usr/bin/zsh() [0xdc0b4] # movefd + 172 in section .text of /usr/bin/zsh

00:37:58.771 rt_sigprocmask(SIG_BLOCK, [CHLD], [WINCH], 8) = 0
 > /usr/bin/zsh() [0x52d98] # getoutput + 484 in section .text of /usr/bin/zsh

00:37:58.782 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xa60fc088) = 5
 > /usr/bin/zsh() [0x4640c] # zfork.lto_priv + 140 in section .text of /usr/bin/zsh

00:37:58.815 close(13)                  = 0
 > /usr/bin/zsh() [0x52dcc] # getoutput + 536 in section .text of /usr/bin/zsh

+/etc/zsh/zshrc:250> GRML_OSTYPE=
+/etc/zsh/zshrc:250> uname -s

00:37:58.839 rt_sigprocmask(SIG_UNBLOCK, [CHLD], [CHLD WINCH], 8) = 0
 > /usr/bin/zsh() [0x526e0] # readoutput + 428 in section .text of /usr/bin/zsh

00:37:58.865 read(12, "Linux\n", 64)    = 6
 > /usr/bin/zsh() [0x5270c] # readoutput + 472 in section .text of /usr/bin/zsh

00:37:58.895 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=5, si_uid=0, si_status=0, si_utime=4 /* 0.04 s */, si_stime=2 /* 0.02 s */} ---
 > /usr/bin/zsh() [0x5270c] # readoutput + 472 in section .text of /usr/bin/zsh

00:37:58.906 rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [CHLD WINCH], 8) = 0
 > /usr/bin/zsh() [0xbe2dc] # zhandler + 128 in section .text of /usr/bin/zsh

00:37:58.918 rt_sigprocmask(SIG_SETMASK, [CHLD WINCH], ~[KILL STOP RTMIN RT_1], 8) = 0
 > /usr/bin/zsh() [0xbe3bc] # zhandler + 352 in section .text of /usr/bin/zsh

00:37:58.930 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, {ru_utime={tv_sec=0, tv_usec=39745}, ru_stime={tv_sec=0, tv_usec=18066}, ru_maxrss=2352, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=134, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=17}) = 5
 > /usr/bin/zsh() [0xbe8f4] # wait_for_processes + 184 in section .text of /usr/bin/zsh

00:37:58.945 getrusage(RUSAGE_CHILDREN, {ru_utime={tv_sec=0, tv_usec=39745}, ru_stime={tv_sec=0, tv_usec=18066}, ru_maxrss=2352, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=134, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=17}) = 0
 > /usr/lib/libc.so.6() [0x233928]
 > /usr/bin/zsh() [0xbec1c] # wait_for_processes + 992 in section .text of /usr/bin/zsh

00:37:58.960 wait4(-1, 0xafc10048, WNOHANG|WSTOPPED|WCONTINUED, 0xafc0ff44) = -1 ECHILD (No child processes)
 > /usr/bin/zsh() [0xbe8f4] # wait_for_processes + 184 in section .text of /usr/bin/zsh

00:37:58.976 sigreturn({mask=[WINCH]})  = 6
 > /usr/lib/libc.so.6() [0x233928]
 > /usr/bin/zsh() [0x5270c] # readoutput + 472 in section .text of /usr/bin/zsh

00:37:58.985 read(12, "", 64)           = 0
 > /usr/bin/zsh() [0x5270c] # readoutput + 472 in section .text of /usr/bin/zsh

00:37:58.995 rt_sigprocmask(SIG_BLOCK, [CHLD], [WINCH], 8) = 0
 > /usr/bin/zsh() [0x52b8c] # readoutput + 1624 in section .text of /usr/bin/zsh

00:37:59.007 close(12)                  = 0
 > /usr/bin/zsh() [0x5276c] # readoutput + 568 in section .text of /usr/bin/zsh

00:37:59.016 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD WINCH], 8) = 0
 > /usr/bin/zsh() [0x79d7c] # waitforpid + 360 in section .text of /usr/bin/zsh

00:37:59.028 kill(5, 0)                 = -1 ESRCH (No such process)
 > /usr/bin/zsh() [0x79db0] # waitforpid + 412 in section .text of /usr/bin/zsh

00:37:59.037 rt_sigprocmask(SIG_SETMASK, [INT], [CHLD WINCH], 8) = 0
 > /usr/bin/zsh() [0xbb8c8] # signal_suspend + 132 in section .text of /usr/bin/zsh

00:37:59.049 pause()                    = ?
00:43:28.943 +++ killed by SIGKILL +++

If I understood correctly:

  1. it spawns a child process to resolve uname -s (pid 5)
  2. reads "Linux\n" from the child stdout (fd 12)
  3. receives a SIGCHLD signal and "resolves" it with wait4? (pid 5 stops existing?)
  4. reads EOF from the child stdout (fd 12)
  5. tries to kill the child (pid 5 no longer exists)
  6. starts waiting for a SIGCHLD signal

The sigprocmask+pause indicates that it is running the POSIX_SIGNALS + BROKEN_POSIX_SIGSUSPEND code.

The build log of the previous zsh package is not available so it is unknown which of these possibilities is correct:

  1. something changed in one of it's dependencies, causing it to start using the current code path
  2. something changed in the zsh code and this particular code path stopped working properly
Typing `GRML_OSTYPE=$(uname -s)` in a zsh shell (without grml-zsh-config) also freezes, so it's either a bug in zsh or one of it's dependencies. It gets stuck in pause(). With: * a powerpc qemu "G4" vm with ArchPOWER * a "root.zsh.bad" pacstrap with base zsh grml-zsh-config * "set -x" added to `root.zsh.bad/etc/zsh/zshrc` (comes from grml-zsh-config) * a log produced with `sudo arch-chroot root.zsh.bad strace --absolute-timestamps=ms --stack-trace --no-abbrev zsh --verbose --xtrace 2>zsh.bad.txt` * stack trace functions produced with `DEBUGINFOD_URLS="https://debuginfod.archlinuxpower.org" gdb zsh` and `info symbol <address>` ``` 00:37:58.704 write(2, "GRML_OSTYPE=$(uname -s)\n", 24) = 24 > /usr/bin/zsh() [0x7783c] # ingetc + 1276 in section .text of /usr/bin/zsh GRML_OSTYPE=$(uname -s) 00:37:58.727 fcntl64(3, F_DUPFD, 10) = 12 > /usr/bin/zsh() [0xdc0a4] # movefd + 156 in section .text of /usr/bin/zsh 00:37:58.737 close(3) = 0 > /usr/bin/zsh() [0xdc0b4] # movefd + 172 in section .text of /usr/bin/zsh 00:37:58.748 fcntl64(4, F_DUPFD, 10) = 13 > /usr/bin/zsh() [0xdc0a4] # movefd + 156 in section .text of /usr/bin/zsh 00:37:58.760 close(4) = 0 > /usr/bin/zsh() [0xdc0b4] # movefd + 172 in section .text of /usr/bin/zsh 00:37:58.771 rt_sigprocmask(SIG_BLOCK, [CHLD], [WINCH], 8) = 0 > /usr/bin/zsh() [0x52d98] # getoutput + 484 in section .text of /usr/bin/zsh 00:37:58.782 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xa60fc088) = 5 > /usr/bin/zsh() [0x4640c] # zfork.lto_priv + 140 in section .text of /usr/bin/zsh 00:37:58.815 close(13) = 0 > /usr/bin/zsh() [0x52dcc] # getoutput + 536 in section .text of /usr/bin/zsh +/etc/zsh/zshrc:250> GRML_OSTYPE= +/etc/zsh/zshrc:250> uname -s 00:37:58.839 rt_sigprocmask(SIG_UNBLOCK, [CHLD], [CHLD WINCH], 8) = 0 > /usr/bin/zsh() [0x526e0] # readoutput + 428 in section .text of /usr/bin/zsh 00:37:58.865 read(12, "Linux\n", 64) = 6 > /usr/bin/zsh() [0x5270c] # readoutput + 472 in section .text of /usr/bin/zsh 00:37:58.895 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=5, si_uid=0, si_status=0, si_utime=4 /* 0.04 s */, si_stime=2 /* 0.02 s */} --- > /usr/bin/zsh() [0x5270c] # readoutput + 472 in section .text of /usr/bin/zsh 00:37:58.906 rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [CHLD WINCH], 8) = 0 > /usr/bin/zsh() [0xbe2dc] # zhandler + 128 in section .text of /usr/bin/zsh 00:37:58.918 rt_sigprocmask(SIG_SETMASK, [CHLD WINCH], ~[KILL STOP RTMIN RT_1], 8) = 0 > /usr/bin/zsh() [0xbe3bc] # zhandler + 352 in section .text of /usr/bin/zsh 00:37:58.930 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, {ru_utime={tv_sec=0, tv_usec=39745}, ru_stime={tv_sec=0, tv_usec=18066}, ru_maxrss=2352, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=134, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=17}) = 5 > /usr/bin/zsh() [0xbe8f4] # wait_for_processes + 184 in section .text of /usr/bin/zsh 00:37:58.945 getrusage(RUSAGE_CHILDREN, {ru_utime={tv_sec=0, tv_usec=39745}, ru_stime={tv_sec=0, tv_usec=18066}, ru_maxrss=2352, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=134, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=1, ru_nivcsw=17}) = 0 > /usr/lib/libc.so.6() [0x233928] > /usr/bin/zsh() [0xbec1c] # wait_for_processes + 992 in section .text of /usr/bin/zsh 00:37:58.960 wait4(-1, 0xafc10048, WNOHANG|WSTOPPED|WCONTINUED, 0xafc0ff44) = -1 ECHILD (No child processes) > /usr/bin/zsh() [0xbe8f4] # wait_for_processes + 184 in section .text of /usr/bin/zsh 00:37:58.976 sigreturn({mask=[WINCH]}) = 6 > /usr/lib/libc.so.6() [0x233928] > /usr/bin/zsh() [0x5270c] # readoutput + 472 in section .text of /usr/bin/zsh 00:37:58.985 read(12, "", 64) = 0 > /usr/bin/zsh() [0x5270c] # readoutput + 472 in section .text of /usr/bin/zsh 00:37:58.995 rt_sigprocmask(SIG_BLOCK, [CHLD], [WINCH], 8) = 0 > /usr/bin/zsh() [0x52b8c] # readoutput + 1624 in section .text of /usr/bin/zsh 00:37:59.007 close(12) = 0 > /usr/bin/zsh() [0x5276c] # readoutput + 568 in section .text of /usr/bin/zsh 00:37:59.016 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD WINCH], 8) = 0 > /usr/bin/zsh() [0x79d7c] # waitforpid + 360 in section .text of /usr/bin/zsh 00:37:59.028 kill(5, 0) = -1 ESRCH (No such process) > /usr/bin/zsh() [0x79db0] # waitforpid + 412 in section .text of /usr/bin/zsh 00:37:59.037 rt_sigprocmask(SIG_SETMASK, [INT], [CHLD WINCH], 8) = 0 > /usr/bin/zsh() [0xbb8c8] # signal_suspend + 132 in section .text of /usr/bin/zsh 00:37:59.049 pause() = ? 00:43:28.943 +++ killed by SIGKILL +++ ``` If I understood correctly: 1) it spawns a child process to resolve `uname -s` (pid 5) 2) reads "Linux\n" from the child stdout (fd 12) 3) receives a SIGCHLD signal and "resolves" it with wait4? (pid 5 stops existing?) 4) reads EOF from the child stdout (fd 12) 5) tries to kill the child (pid 5 no longer exists) 6) starts waiting for a SIGCHLD signal The sigprocmask+pause indicates that it is running the POSIX_SIGNALS + BROKEN_POSIX_SIGSUSPEND code. The build log of the previous zsh package is not available so it is unknown which of these possibilities is correct: 1) something changed in one of it's dependencies, causing it to start using the current code path 2) something changed in the zsh code and this particular code path stopped working properly
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: ArchPOWER/packages#93
No description provided.