
Salt-Call Hangs when IPv6 is disabled on System #32719

Closed
azweb76 opened this issue Apr 20, 2016 · 15 comments
Labels: Bug (broken, incorrect, or confusing behavior), cannot-reproduce (cannot be replicated with info/context provided), Core (relates to code central or existential to Salt), fixed-pls-verify (fix is linked, bug author to confirm fix), Grains, severity-medium (3rd level, incorrect or bad functionality, confusing and lacks a workaround)

Comments

azweb76 commented Apr 20, 2016

Description of Issue/Question

When calling salt-call --local state.apply or salt-call --local grains.items, there is a 30-60 second hang while fqdn_ip6 in grains/core.py attempts to resolve IPv6 information.

Setup

default installation (on CentOS 7)

Steps to Reproduce Issue

  1. Disable IPv6.
  2. Run salt-call --local grains.items --log-level=trace.
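The symptom can be isolated outside of Salt with a short Python snippet that times the same kind of IPv6 lookup fqdn_ip6 performs (the helper name time_aaaa_lookup is mine, not Salt's):

```python
import socket
import time

def time_aaaa_lookup(host):
    """Time an IPv6 address lookup for *host*.

    Salt's fqdn_ip6 grain ultimately calls socket.getaddrinfo() with
    family=AF_INET6; when the resolver is slow to answer (or slow to
    say NXDOMAIN) for an AAAA query, this is the call that hangs.
    """
    start = time.monotonic()
    try:
        info = socket.getaddrinfo(host, None, socket.AF_INET6)
        addrs = sorted({entry[4][0] for entry in info})
    except socket.gaierror:
        addrs = []
    return addrs, time.monotonic() - start

# A numeric literal is parsed locally with no DNS round-trip, so this
# returns instantly; replace "::1" with the minion's actual hostname
# on an affected box to measure the real lookup.
addrs, elapsed = time_aaaa_lookup("::1")
```

On a machine showing this bug, passing the real hostname should reproduce the 30-60 second delay described above.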

Versions Report

v2015.8.8.2

@terminalmage @beardedeagle

@terminalmage terminalmage self-assigned this Apr 20, 2016
@terminalmage terminalmage added the Bug, Core, Grains, and TEAM Core labels Apr 20, 2016
@terminalmage (Contributor)

Thanks, I'll be hacking away at this at the sprint tonight. 😄

@terminalmage (Contributor)

Hmm, I can't reproduce:

# sysctl net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
# ip addr | grep inet6
# time salt-call --local grains.item ipv6
local:
    ----------
    ipv6:
salt-call --local grains.item ipv6  0.36s user 0.14s system 66% cpu 0.750 total
# time salt-call --local grains.item ip6_interfaces
local:
    ----------
    ip6_interfaces:
        ----------
        lo:
        lxcbr0:
        wlp1s0:
salt-call --local grains.item ip6_interfaces  0.36s user 0.13s system 65% cpu 0.757 total

@terminalmage (Contributor)

I did test on Arch Linux, though; I'll try on a CentOS 7 VM.

@terminalmage (Contributor) commented Apr 21, 2016

Sitting here with @beardedeagle at the sprint, and he's not able to reproduce either. This looks to be a problem with DNS resolution, since that is what fqdn_ip6 does. After a bit of RTFM'ing I found that DNS resolution times out by default after 30 seconds, which might explain the lag.

I'd have a look at your /etc/resolv.conf to see whether the nameservers being used are the problem. If other nameservers can be substituted, try editing /etc/resolv.conf and see if the problem persists.
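If the nameservers themselves can't be changed, the resolver's 30-second default can also be shortened with options in /etc/resolv.conf; a sketch (the nameserver addresses below are placeholders, not a recommendation):

```conf
# /etc/resolv.conf -- example only; substitute your own nameservers.
nameserver 192.0.2.10
nameserver 192.0.2.11
# Fail faster when a server is unresponsive: 2-second per-query
# timeout, two attempts (see resolv.conf(5) for these options).
options timeout:2 attempts:2
```

This doesn't fix the underlying grain behavior, but it caps how long any single DNS query can stall salt-call.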

@beardedeagle

As @terminalmage said, unfortunately I am unable to reproduce this issue. I spun up a new CentOS 7 server in the same environment, installed masterless Salt, and ran salt-call --local grains.items --log-level=trace:

# time salt-call --local grains.items --log-level=trace
real    0m1.299s
user    0m0.530s
sys 0m0.182s

@jfindlay jfindlay added the cannot-reproduce label Apr 21, 2016
@jfindlay jfindlay added this to the Approved milestone Apr 21, 2016
@jfindlay (Contributor)

As @terminalmage and @beardedeagle have both verified that this is working, I'm going to close this until a more specific, concrete case can be provided.

@cetra3 commented Aug 5, 2016

@jfindlay,

I've managed to run into this issue also.

Steps to reproduce:

  • Disable IPv6:
sysctl net.ipv6.conf.all.disable_ipv6=1
sysctl net.ipv6.conf.lo.disable_ipv6=1
sysctl net.ipv6.conf.default.disable_ipv6=1
  • Make sure there is no IPv6 (AAAA-style) entry for the server's hostname in /etc/hosts

To fix this, you can either re-enable IPv6:

sysctl net.ipv6.conf.all.disable_ipv6=0
sysctl net.ipv6.conf.lo.disable_ipv6=0
sysctl net.ipv6.conf.default.disable_ipv6=0

Or add an entry to /etc/hosts that maps the hostname to the IPv6 loopback address.

For example, if your server's hostname is server.example.com, add the following line to /etc/hosts:

::1 server server.example.com

@terminalmage (Contributor)

I still can't reproduce this.

% ip addr | grep inet6 | wc -l
0
% sudo sysctl -a 2>/dev/null | grep disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv6.conf.lxcbr0.disable_ipv6 = 1
net.ipv6.conf.wlp1s0.disable_ipv6 = 1
% grep `hostname` /etc/hosts
::1             localhost tardis

And when I run salt-call:

# time salt-call --local grains.items | tail
    systemd:
        ----------
        features:
            +PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN
        version:
            231
    virtual:
        physical
    zmqversion:
        4.1.2
salt-call --local grains.items  0.48s user 0.14s system 96% cpu 0.645 total
tail  0.00s user 0.00s system 0% cpu 0.644 total

@terminalmage (Contributor)

I removed the /etc/hosts line and it still doesn't time out. What is in your resolv.conf? I don't happen to have any IPv6 DNS servers there.

@cetra3 commented Aug 5, 2016

When I tcpdump the traffic, the DNS server set in resolv.conf returns NXDOMAIN for the AAAA record of the server's hostname. Not sure if this has any bearing on the problem.

@terminalmage terminalmage reopened this Aug 5, 2016
@terminalmage terminalmage modified the milestones: C 6, Approved Aug 5, 2016
@terminalmage terminalmage added the fixed-pls-verify label Aug 5, 2016
terminalmage added a commit to terminalmage/salt that referenced this issue Aug 5, 2016
This prevents DNS resolution issues from causing ``salt-call --local``
to hang.

Resolves saltstack#32719.
@terminalmage (Contributor)

@cetra3 OK, I still can't reproduce this, but I came up with a solution that I think should fix the problem for you (and others who have run into this corner case). I've opened pull request #35233, which simply skips compilation of the fqdn_ip6 (or fqdn_ip4) grain when the associated list of IP addresses is empty. So, when IPv6 is disabled and there are no IPv6 addresses found on the minion, we won't even attempt a socket.getaddrinfo().
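The guard described here can be sketched in a few lines of Python; compile_fqdn_ip_grain and its parameters are illustrative stand-ins, not the actual code from #35233:

```python
def compile_fqdn_ip_grain(addrs, lookup_fqdn_ips):
    """Return a fqdn_ip4/fqdn_ip6-style grain value.

    addrs: IP addresses of one family found on the minion's interfaces.
    lookup_fqdn_ips: a callable performing the (potentially slow) DNS
    lookup, e.g. via socket.getaddrinfo().

    If the minion has no addresses of this family (e.g. IPv6 is
    disabled), skip the DNS lookup entirely so it can never hang.
    """
    if not addrs:
        return []
    return lookup_fqdn_ips()

# With IPv6 disabled the lookup callable is never invoked:
no_v6 = compile_fqdn_ip_grain([], lambda: ["::1"])
# With at least one IPv6 address present, the lookup runs as before:
with_v6 = compile_fqdn_ip_grain(["fe80::1"], lambda: ["::1"])
```

The design point is that an empty interface-address list is checked locally and cheaply, so the slow network call is only ever made when it can plausibly succeed.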

@cetra3 commented Aug 5, 2016

OK, cheers for that. I will give it a shot in my environment and report back.

@terminalmage (Contributor)

Thanks!

@cetra3 commented Aug 7, 2016

Ok, I've tested #35233 and it appears to work a lot better:

Without the change:

time salt-call test.ping
local:
    True

real    0m27.815s
user    0m0.558s
sys 0m0.099s

With the change:

time salt-call test.ping
local:
    True

real    0m1.852s
user    0m0.665s
sys 0m0.154s

@cachedout (Contributor)

Cool. I'll go ahead and close this then. Thanks!

@meggiebot meggiebot added the severity-medium label Aug 9, 2016
akanto added a commit to hortonworks/salt-bootstrap that referenced this issue Oct 21, 2016
keyki pushed a commit to hortonworks/salt-bootstrap that referenced this issue Oct 21, 2016
akanto added a commit to hortonworks/cloudbreak that referenced this issue Oct 24, 2016
keyki pushed a commit to hortonworks/cloudbreak that referenced this issue Oct 24, 2016
akanto added a commit to hortonworks/cloudbreak that referenced this issue Oct 24, 2016
keyki pushed a commit to hortonworks/cloudbreak that referenced this issue Oct 24, 2016