Skip to content

segfault together with "import cv2" #43834

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ppwwyyxx opened this issue Aug 29, 2020 · 4 comments
Closed

segfault together with "import cv2" #43834

ppwwyyxx opened this issue Aug 29, 2020 · 4 comments
Labels
high priority module: build Build system issues module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: dependency bug Problem is not caused by us, but caused by an upstream library we use triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@ppwwyyxx
Copy link
Collaborator

ppwwyyxx commented Aug 29, 2020

🐛 Bug

To reproduce:

╰─$cat Dockerfile
FROM centos/python-36-centos7
USER root
WORKDIR /root
RUN yum update -y
RUN pip install --upgrade pip
RUN pip install opencv-python
RUN pip install -U matplotlib
RUN pip install torch==1.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
COPY a.py /root/a.py
RUN python -X faulthandler a.py

╰─$cat a.py
import torch
try:
    import cv2
except:
    pass
import matplotlib.pyplot as plt

╰─$docker build   -t test:v0 .      

..................
Step 10/10 : RUN python -X faulthandler a.py
 ---> Running in 1b7bc36c9727
Fatal Python error: Segmentation fault

Thread 0x00007f185a430740 (most recent call first):
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/encoder.py", line 356 in _iterencode_dict
The command '/bin/sh -c python -X faulthandler a.py' returned a non-zero code: 139

A lower-level repro (to replace a.py above):

from ctypes import cdll

def load(x):
    print("Loading", x)
    try:
        cdll.LoadLibrary(x)
    except Exception as e:
        print("\tFailed", e)
        pass
    else:
        print("\tSucc")

load("/opt/app-root/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so")
#load("/opt/app-root/lib/python3.6/site-packages/cv2/../opencv_python.libs/libz-d8a329de.so.1.2.7")
load("/opt/app-root/lib/python3.6/site-packages/cv2/../opencv_python.libs/libcrypto-354cbd1a.so.1.1")
load("libgcc_s.so.1")

logs for this one:

Loading /opt/app-root/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so                                                                                                         
        Succ                                                                                                                                                                                        
Loading /opt/app-root/lib/python3.6/site-packages/cv2/../opencv_python.libs/libcrypto-354cbd1a.so.1.1                                                                                   
        Failed libz-d8a329de.so.1.2.7: cannot open shared object file: No such file or directory                                                                                                    
Loading libgcc_s.so.1                                                                                                                                                                              
                                                                                                                                                                                     
Program received signal SIGSEGV, Segmentation fault.                                                                                                                                     
0x00007f7340b4bbd6 in elf_machine_rela (reloc=0x7f732c79c5a8, reloc=0x7f732c79c5a8, skip_ifunc=0, reloc_addr_arg=0x7f732ca03ec8, version=0x150, sym=0x7f732c7309f8, map=0x1fcfb30)                 
    at ../sysdeps/x86_64/dl-machine.h:299                                                                                                                                                           
299           struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type);                                                                                                                       
(gdb) bt                                                                                                                                                                                            
#0  0x00007f7340b4bbd6 in elf_machine_rela (reloc=0x7f732c79c5a8, reloc=0x7f732c79c5a8, skip_ifunc=0, reloc_addr_arg=0x7f732ca03ec8, version=0x150, sym=0x7f732c7309f8, map=0x1fcfb30)             
    at ../sysdeps/x86_64/dl-machine.h:299                                                                                                                                                           
#1  elf_dynamic_do_Rela (skip_ifunc=0, lazy=<optimized out>, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, map=0x1fcfb30) at do-rel.h:137                           
#2  _dl_relocate_object (scope=<optimized out>, reloc_mode=reloc_mode@entry=0, consider_profiling=<optimized out>, consider_profiling@entry=0) at dl-reloc.c:259   
#3  0x00007f7340b5465c in dl_open_worker (a=a@entry=0x7fff80e6ce18) at dl-open.c:423                                                                                                               
#4  0x00007f7340b4f7c4 in _dl_catch_error (objname=objname@entry=0x7fff80e6ce08, errstring=errstring@entry=0x7fff80e6ce10, mallocedp=mallocedp@entry=0x7fff80e6ce00,
    operate=operate@entry=0x7f7340b54150 <dl_open_worker>, args=args@entry=0x7fff80e6ce18) at dl-error.c:177                                                                                       
#5  0x00007f7340b53b7b in _dl_open (file=0x7f733f7d5d70 "libgcc_s.so.1", mode=-2147483646, caller_dlopen=<optimized out>, nsid=-2, argc=2, argv=0x7fff80e6e0e8, env=0x7fff80e6e100)                 
    at dl-open.c:649                                                                                                 
#6  0x00007f73401f8fab in dlopen_doit (a=a@entry=0x7fff80e6d020) at dlopen.c:66                                                                                                                     
#7  0x00007f7340b4f7c4 in _dl_catch_error (objname=0x102beb0, errstring=0x102beb8, mallocedp=0x102bea8, operate=0x7f73401f8f50 <dlopen_doit>, args=0x7fff80e6d020) at dl-error.c:177               
#8  0x00007f73401f95ad in _dlerror_run (operate=operate@entry=0x7f73401f8f50 <dlopen_doit>, args=args@entry=0x7fff80e6d020) at dlerror.c:163                                         
#9  0x00007f73401f9041 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87                                                            
#10 0x00007f733f354401 in py_dl_open (self=<optimized out>, args=<optimized out>) at /usr/src/debug/Python-3.6.9/Modules/_ctypes/callproc.c:1377                                            

cc @ezyang @gchanan @zou3519 @malfet @seemethere @walterddr

@malfet malfet added high priority module: build Build system issues module: crash Problem manifests as a hard crash, as opposed to a RuntimeError triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Aug 30, 2020
@ezyang
Copy link
Contributor

ezyang commented Sep 14, 2020

Segfaulting in the dynamic linker, that's a new one!

@ppwwyyxx
Copy link
Collaborator Author

It seems more like a bug in glibc of centos7, which incorrectly handles a failed second dlopen. It doesn't happen when I tried the same in centos8 with the same python/opencv/pytorch version.

@ezyang ezyang added the module: dependency bug Problem is not caused by us, but caused by an upstream library we use label Sep 15, 2020
@ezyang
Copy link
Contributor

ezyang commented Sep 15, 2020

OK, I'll go ahead and close for now, unless you think we should be trying to work around this issue.

@ppwwyyxx
Copy link
Collaborator Author

This turns out to be a glibc bug https://sourceware.org/bugzilla/show_bug.cgi?id=20839 triggered by some libs in opencv. The latest opencv-python has added a workaround for it. (details at opencv/opencv-python#381)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high priority module: build Build system issues module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: dependency bug Problem is not caused by us, but caused by an upstream library we use triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests

4 participants