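I recently brought up a salt minion on a machine on the public network, but a colleague noticed that /etc/salt/minion_id was wrong. Until now the auto-generated minion_id had always matched the machine's /etc/hostname; this time it was a strange domain name, cncXXXX.XXX.ln.cn, identical to the output of hostname --all-fqdns. First, let's check how minion_id is generated.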
GitHub: saltstack/salt/doc/topics/tutorials/walkthrough.rst:
When the minion is started, it will generate an id value, unless it has been generated on a previous run and cached (in /etc/salt/minion_id by default). This is the name by which the minion will attempt to authenticate to the master. The following steps are attempted, in order to try to find a value that is not localhost:
- The Python function socket.getfqdn() is run
- /etc/hostname is checked (non-Windows only)
- /etc/hosts (%WINDIR%\system32\drivers\etc\hosts on Windows hosts) is checked for hostnames that map to anything within 127.0.0.0/8.
If none of the above are able to produce an id which is not localhost, then a sorted list of IP addresses on the minion (excluding any within 127.0.0.0/8) is inspected. The first publicly-routable IP address is used, if there is one. Otherwise, the first privately-routable IP address is used.
If all else fails, then localhost is used as a fallback.
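The order described above can be sketched in Python. This is a minimal illustration and not Salt's actual code: the function names are hypothetical, the inputs are passed in as plain values so the logic is easy to follow, and only IPv4 addresses are handled for brevity.

```python
import ipaddress


def loopback_names(hosts_text):
    """Return hostnames from hosts-file text that map into 127.0.0.0/8."""
    loopback = ipaddress.ip_network("127.0.0.0/8")
    names = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) < 2:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue
        if addr.version == 4 and addr in loopback:
            names.extend(fields[1:])
    return names


def guess_minion_id(fqdn, etc_hostname, hosts_text, ipv4_addrs):
    """Hypothetical helper: pick an id following the documented order."""
    # Steps 1-3: the first candidate that is neither empty nor "localhost".
    candidates = [fqdn, etc_hostname.strip()] + loopback_names(hosts_text)
    for name in candidates:
        if name and name != "localhost":
            return name
    # Step 4: sorted non-loopback addresses, public ones before private ones.
    addrs = sorted(ipaddress.IPv4Address(a) for a in ipv4_addrs)
    addrs = [a for a in addrs if not a.is_loopback]
    for wanted in (lambda a: a.is_global, lambda a: a.is_private):
        for addr in addrs:
            if wanted(addr):
                return str(addr)
    # Step 5: nothing else worked.
    return "localhost"
```

On a real machine the first three arguments would come from socket.getfqdn(), the contents of /etc/hostname, and the contents of /etc/hosts. Because the FQDN is tried first, any non-localhost value it returns wins over /etc/hostname.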
This must be the first step: the value comes from socket.getfqdn, and I confirmed that it matches the output of hostname --all-fqdns mentioned above.
What is an FQDN?
FQDN stands for "Fully qualified domain name". For a more detailed introduction, see VBird's (鸟哥) explanation, 完整主机名称: Fully Qualified Domain Name (FQDN).
socket.getfqdn()
Here is the implementation of this function, from socket.py:
def getfqdn(name=''):
    """Get fully qualified domain name from name.

    An empty argument is interpreted as meaning the local host.

    First the hostname returned by gethostbyaddr() is checked, then
    possibly existing aliases. In case no FQDN is available, hostname
    from gethostname() is returned.
    """
    name = name.strip()
    if not name or name == '0.0.0.0':
        name = gethostname()
    try:
        hostname, aliases, ipaddrs = gethostbyaddr(name)
    except error:
        pass
    else:
        aliases.insert(0, hostname)
        for name in aliases:
            if '.' in name:
                break
        else:
            name = hostname
    return name
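The for/else at the end of getfqdn() is easy to misread, so here is the selection step on its own. This is a sketch with a hypothetical function name that works on names which have already been resolved; it performs no lookups.

```python
def pick_fqdn(hostname, aliases):
    """Mirror the selection loop in getfqdn() on already-resolved names."""
    for name in [hostname] + list(aliases):
        if "." in name:
            return name  # the first dotted name wins
    return hostname  # no dotted name at all: fall back to the primary name
```

The primary name reported by gethostbyaddr() is checked before any alias, so a dotted primary name is returned as-is.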
The result should be whatever gethostbyaddr() returns. That function is described as:
gethostbyaddr() -- map an IP number or hostname to DNS info.
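So the result should be the PTR record obtained by a reverse lookup of the IP. Let's confirm:
It looks like someone used this IP before and added a PTR record for it, which I have no way of deleting. Let's keep digging into how gethostbyaddr is implemented and how it works, to see what can be done.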
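One way to run such a check from Python is sketched below. The helper names are hypothetical, and any address you pass in should be the machine's own public IP. Note that gethostbyaddr() goes through the system resolver, so it may also answer from /etc/hosts rather than from a DNS PTR record.

```python
import ipaddress
import socket


def ptr_name(ip):
    """Return the DNS name that a PTR query for `ip` asks about."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return the primary name gethostbyaddr() reports for `ip`, or None."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return None
    return hostname
```

Calling reverse_lookup() with the machine's public IP should return the same name that socket.getfqdn() produced if that name really comes from a PTR record.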
A more detailed description of this function, from python.org/2/library/socket:
socket.gethostbyaddr(ip_address)
Return a triple (hostname, aliaslist, ipaddrlist) where hostname is the primary host name responding to the given ip_address, aliaslist is a (possibly empty) list of alternative host names for the same address, and ipaddrlist is a list of IPv4/v6 addresses for the same interface on the same host (most likely containing only a single address). To find the fully qualified domain name, use the function getfqdn(). gethostbyaddr() supports both IPv4 and IPv6.
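The implementation lives in cpython/socketmodule.c and most likely calls the system's gethostbyaddr_r (CPython also ships its own implementation in cpython/getaddrinfo.c). The gethostbyaddr provided by Linux is documented in GETHOSTBYNAME(3), and the implementation is presumably similar. That function depends on the following three configuration files: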
/etc/host.conf
resolver configuration file
/etc/hosts
host database file
/etc/nsswitch.conf
name service switch configuration
Of these, nsswitch.conf defines which sources the C library consults for each kind of lookup, and in what order.
The Name Service Switch (NSS) is a facility in Unix-like operating systems that provides a variety of sources for common configuration databases and name resolution mechanisms. These sources include local operating system files (such as /etc/passwd, /etc/group, and /etc/hosts), the Domain Name System (DNS), the Network Information Service (NIS), and LDAP.
nsswitch.conf specifies the lookup order for hosts:
hosts Host names and numbers, used by gethostbyname(3) and related functions.
The current configuration is:
# dns Use DNS (Domain Name Service)
# files Use the local files
#hosts: db files nisplus nis dns
hosts: files dns myhostname
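Per the manual, this configuration means that hosts lookups consult files first (for hosts, files means /etc/hosts), then dns, and finally myhostname, which answers with the machine's own hostname (the value in /etc/hostname).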
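To check the effective order on a machine without reading the file by eye, the hosts line can be parsed with a few lines of Python. This is a minimal sketch with a hypothetical function name; it ignores comments and drops bracketed action items such as [NOTFOUND=return].

```python
import re


def hosts_sources(conf_text):
    """Return the ordered sources on the `hosts:` line of nsswitch.conf text."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if not line.startswith("hosts:"):
            continue
        rest = line[len("hosts:"):]
        rest = re.sub(r"\[[^\]]*\]", " ", rest)  # drop [STATUS=action] items
        return rest.split()
    return []
```

Passing it the contents of /etc/nsswitch.conf with the configuration shown above gives files, dns, myhostname in that order.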
As an experiment, change the configuration to:
hosts: files
With this setting, socket.getfqdn() returns an empty result.
Then change the configuration to:
hosts: myhostname files dns
Now socket.getfqdn() returns the contents of /etc/hostname.
That gives us the fix: on the machine from the beginning of this post, configure nsswitch.conf as above, with myhostname placed first.
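If the same change has to be made on several machines, the edit can be expressed as a small text transformation. This is only a sketch with a hypothetical function name: it rewrites the text it is given and leaves reading and writing /etc/nsswitch.conf (and backing it up first) to the caller. It keeps the remaining entries in their original relative order.

```python
def promote_myhostname(conf_text):
    """Return nsswitch.conf text with myhostname first on the hosts line."""
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("hosts:"):
            sources = stripped[len("hosts:"):].split()
            rest = [s for s in sources if s != "myhostname"]
            line = "hosts: " + " ".join(["myhostname"] + rest)
        out.append(line)  # every other line is passed through unchanged
    return "\n".join(out) + "\n"
```

Commented-out lines such as #hosts: are left alone because they do not start with hosts: once stripped.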
hostname --all-fqdns
Now back to hostname --all-fqdns. Its output matched socket.getfqdn() from the start, and the two stayed consistent while I was changing nsswitch.conf above.
A look at the hostname source code turns up getnameinfo:
getnameinfo - address-to-name translation in protocol-independent manner
getnameinfo() does much the same job as gethostbyaddr() (see GETNAMEINFO(3) for details), and its lookup order for hosts can likewise be configured through nsswitch.conf.
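Python exposes the same call as socket.getnameinfo(), so a per-address reverse lookup in the spirit of hostname --all-fqdns can be sketched as follows. The function name is hypothetical, and the list of addresses is supplied by the caller rather than enumerated from the network interfaces.

```python
import socket


def names_for(addresses):
    """Reverse-resolve each address with getnameinfo(), skipping failures."""
    names = []
    for addr in addresses:
        try:
            # NI_NAMEREQD makes the call fail instead of echoing the address back.
            host, _service = socket.getnameinfo((addr, 0), socket.NI_NAMEREQD)
        except OSError:  # socket.gaierror is a subclass
            continue
        names.append(host)
    return names
```

Since this also goes through the name service switch, its results should follow the hosts order in nsswitch.conf in the same way as socket.getfqdn().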