Notes on problems hit while upgrading an HA cluster from v1.9.6 to v1.13 with kubeadm
In this project kubeadm was used to upgrade Kubernetes from v1.9.6 to v1.13.4. Since kubeadm cannot skip minor versions, the rough path was as follows (each hop repeats the same kubeadm upgrade sequence; see the sketch after the list):
- v1.9.6 -> v1.10.8
- v1.10.8 -> v1.11.5
- v1.11.5 -> v1.12.4
- v1.12.4 -> v1.13.4
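Each hop is essentially the same set of steps on the control-plane nodes. As a rough sketch only (assuming an RPM-based install; the package names and the v1.10.8 target below are purely illustrative):
# upgrade kubeadm itself first, then check and apply the control-plane upgrade
[root@cloud-cn-master-1 ~]# yum install -y kubeadm-1.10.8
[root@cloud-cn-master-1 ~]# kubeadm upgrade plan
[root@cloud-cn-master-1 ~]# kubeadm upgrade apply v1.10.8 -y
# afterwards upgrade kubelet/kubectl on every node and restart the kubelet
[root@cloud-cn-master-1 ~]# yum install -y kubelet-1.10.8 kubectl-1.10.8
[root@cloud-cn-master-1 ~]# systemctl daemon-reload && systemctl restart kubelet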
While upgrading cluster A from v1.9.6 to v1.10.8, running the upgrade produced the following problem:
[root@cloud-cn-master-1 ~]# kubeadm upgrade apply v1.10.8 -y
...
[upgrade/prepull] Successfully prepulled the images for all the control plane components
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.10.8"...
Static pod: kube-apiserver-rz-dev-master01 hash: a82830fd687fdabd030b65ee6c4b4fd4
Static pod: kube-controller-manager-rz-dev-master01 hash: 1a23c184fadb64c889a41831476c56e8
Static pod: kube-scheduler-rz-dev-master01 hash: a0adc2bf23e7d5336ecd4677ce95938c
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests500670946"
[controlplane] Adding extra host path mount "k8s" to "kube-controller-manager"
[upgrade/staticpods] current and new manifests of kube-apiserver are equal, skipping upgrade
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-04-22-13-04-58/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-controller-manager-rz-dev-master01 hash: 1a23c184fadb64c889a41831476c56e8
Static pod: kube-controller-manager-rz-dev-master01 hash: 1a23c184fadb64c889a41831476c56e8
Static pod: kube-controller-manager-rz-dev-master01 hash: 1a23c184fadb64c889a41831476c56e8
Static pod: kube-controller-manager-rz-dev-master01 hash: 1a23c184fadb64c889a41831476c56e8
Static pod: kube-controller-manager-rz-dev-master01 hash: 1a23c184fadb64c889a41831476c56e8
...
The upgrade stayed stuck on this hash-checking step. Nothing like this had shown up in earlier tests, and the only difference between this environment and the test environment was that its certificates had been renewed: the default validity is 1 year, but here it had been extended to 30 years. So I restored the original certificates and ran the upgrade again; this time it succeeded. The exact cause was not investigated further.
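For reference, a quick way to inspect the validity period of the kubeadm-managed certificates (assuming the standard /etc/kubernetes/pki paths) is:
# show notBefore/notAfter of the control-plane certificates
[root@cloud-cn-master-1 ~]# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
[root@cloud-cn-master-1 ~]# openssl x509 -in /etc/kubernetes/pki/apiserver-kubelet-client.crt -noout -dates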
In another environment, upgrading from v1.12 to v1.13.4 produced the following:
[root@cloud-cn-master-1 ~]# kubeadm upgrade apply v1.13.4 -y
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
FATAL: failed to get node registration: node doesn't have kubeadm.alpha.kubernetes.io/cri-socket annotation
The error says the node is missing an annotation. The master node on which kubeadm upgrade was being run did have the cri-socket annotation, but the other two masters did not.
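A quick way to check each master is to print the annotation directly (using the same <nodename> placeholder as below); empty output means the annotation is missing:
# prints the kubeadm cri-socket annotation of a node
[root@cloud-cn-master-1 ~]# kubectl get node <nodename> -o jsonpath='{.metadata.annotations.kubeadm\.alpha\.kubernetes\.io/cri-socket}'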
So I tried adding the annotation manually and re-running the upgrade:
[root@cloud-cn-master-1 ~]# kubectl annotate node <nodename> kubeadm.alpha.kubernetes.io/cri-socket=/var/run/dockershim.sock
[root@cloud-cn-master-1 ~]# kubeadm upgrade apply v1.13.4 -y
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade/config] FATAL: failed to getAPIEndpoint: failed to get APIEndpoint information for this node
This time it fails because the APIEndpoint cannot be found, so let's look at the kubeadm-config data:
[root@cloud-cn-master-1 ~]# kubectl -n kube-system get cm kubeadm-config -oyaml|grep -A5 cloud-cn-master-1
  ClusterStatus: |
    apiEndpoints:
      cloud-cn-master-1:
        advertiseAddress: 10.0.128.251
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterStatus
The apiEndpoints entry does exist, but only for this single node. So I tried editing the cm to add the apiEndpoint entries of the other two nodes as well, and ran the upgrade once more.
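The edit looked roughly like the following; cloud-cn-master-3 and the <master-*-ip> addresses are placeholders for the other masters' actual node names and advertise addresses:
[root@cloud-cn-master-1 ~]# kubectl -n kube-system edit cm kubeadm-config
# add the missing masters under ClusterStatus.apiEndpoints, e.g.:
#   apiEndpoints:
#     cloud-cn-master-1:
#       advertiseAddress: 10.0.128.251
#       bindPort: 6443
#     cloud-cn-master-2:
#       advertiseAddress: <master-2-ip>
#       bindPort: 6443
#     cloud-cn-master-3:
#       advertiseAddress: <master-3-ip>
#       bindPort: 6443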
This time the hash check failed again in the same way as before, so I went to read the kubeadm source to work out how this config stage behaves. The relevant code is below:
// if this isn't a new controlplane instance (e.g. in case of kubeadm upgrades)
// get nodes specific information as well
if !newControlPlane {
    // gets the nodeRegistration for the current from the node object
    if err := getNodeRegistration(kubeconfigDir, client, &initcfg.NodeRegistration); err != nil {
        return nil, errors.Wrap(err, "failed to get node registration")
    }
    // gets the APIEndpoint for the current node from then ClusterStatus in the kubeadm-config ConfigMap
    if err := getAPIEndpoint(configMap.Data, initcfg.NodeRegistration.Name, &initcfg.LocalAPIEndpoint); err != nil {
        return nil, errors.Wrap(err, "failed to getAPIEndpoint")
    }
}
This is exactly the logic that runs during upgrade: it first calls getNodeRegistration and then getAPIEndpoint. Here is the getAPIEndpoint logic:
// getAPIEndpoint returns the APIEndpoint for the current node
func getAPIEndpoint(data map[string]string, nodeName string, apiEndpoint *kubeadmapi.APIEndpoint) error {
    // gets the ClusterStatus from kubeadm-config
    clusterStatus, err := UnmarshalClusterStatus(data)
    if err != nil {
        return err
    }
    // gets the APIEndpoint for the current machine from the ClusterStatus
    e, ok := clusterStatus.APIEndpoints[nodeName]
    if !ok {
        return errors.New("failed to get APIEndpoint information for this node")
    }
    apiEndpoint.AdvertiseAddress = e.AdvertiseAddress
    apiEndpoint.BindPort = e.BindPort
    return nil
}
It uses the nodeName as a key into the data of the kubeadm-config configmap to fetch the APIEndpoint's AdvertiseAddress and BindPort. But I had already confirmed manually that the APIEndpoint configuration exists, so the next thing to check was whether the nodeName being passed in was correct. Since the nodeName comes from NodeRegistration, let's look at how NodeRegistration is obtained:
// getNodeRegistration returns the nodeRegistration for the current node
func getNodeRegistration(kubeconfigDir string, client clientset.Interface, nodeRegistration *kubeadmapi.NodeRegistrationOptions) error {
    // gets the name of the current node
    nodeName, err := getNodeNameFromKubeletConfig(kubeconfigDir)
    if err != nil {
        return errors.Wrap(err, "failed to get node name from kubelet config")
    }
    // gets the corresponding node and retrieves attributes stored there.
    node, err := client.CoreV1().Nodes().Get(nodeName, metav1.GetOptions{})
    if err != nil {
        return errors.Wrap(err, "failed to get corresponding node")
    }
    criSocket, ok := node.ObjectMeta.Annotations[constants.AnnotationKubeadmCRISocket]
    if !ok {
        return errors.Errorf("node %s doesn't have %s annotation", nodeName, constants.AnnotationKubeadmCRISocket)
    }
    // returns the nodeRegistration attributes
    nodeRegistration.Name = nodeName
    nodeRegistration.CRISocket = criSocket
    nodeRegistration.Taints = node.Spec.Taints
    // NB. currently nodeRegistration.KubeletExtraArgs isn't stored at node level but only in the kubeadm-flags.env
    // that isn't modified during upgrades
    // in future we might reconsider this thus enabling changes to the kubeadm-flags.env during upgrades as well
    return nil
}
So the nodeName is obtained through getNodeNameFromKubeletConfig, i.e. it is read from the kubelet.conf file. Here is the getNodeNameFromKubeletConfig logic:
// getNodeNameFromConfig gets the node name from a kubelet config file
// TODO: in future we want to switch to a more canonical way for doing this e.g. by having this
// information in the local kubelet config.yaml
func getNodeNameFromKubeletConfig(kubeconfigDir string) (string, error) {
    // loads the kubelet.conf file
    fileName := filepath.Join(kubeconfigDir, constants.KubeletKubeConfigFileName)
    config, err := clientcmd.LoadFromFile(fileName)
    if err != nil {
        return "", err
    }
    // gets the info about the current user
    authInfo := config.AuthInfos[config.Contexts[config.CurrentContext].AuthInfo]
    // gets the X509 certificate with current user credentials
    var certs []*x509.Certificate
    if len(authInfo.ClientCertificateData) > 0 {
        // if the config file uses an embedded x509 certificate (e.g. kubelet.conf created by kubeadm), parse it
        if certs, err = certutil.ParseCertsPEM(authInfo.ClientCertificateData); err != nil {
            return "", err
        }
    } else if len(authInfo.ClientCertificate) > 0 {
        // if the config file links an external x509 certificate (e.g. kubelet.conf created by TLS bootstrap), load it
        if certs, err = certutil.CertsFromFile(authInfo.ClientCertificate); err != nil {
            return "", err
        }
    } else {
        return "", errors.New("invalid kubelet.conf. X509 certificate expected")
    }
    // We are only putting one certificate in the certificate pem file, so it's safe to just pick the first one
    // TODO: Support multiple certs here in order to be able to rotate certs
    cert := certs[0]
    // gets the node name from the certificate common name
    return strings.TrimPrefix(cert.Subject.CommonName, constants.NodesUserPrefix), nil
}
So kubeadm first loads the local kubelet.conf, looks up the client-certificate-data of the user configured in the current context, parses the certificate, and takes the subject's CommonName as the node name, which is then used to look up the corresponding apiEndpoint in kubeadm-config. Let's decode the current certificate data and check the CN:
# First get the client-certificate-data from kubelet.conf and base64-decode it to obtain the certificate;
# <client-certificate-data> below stands for the value of client-certificate-data in kubelet.conf
[root@cloud-cn-master-1 ~]# echo <client-certificate-data> | base64 -d > kubelet.crt
[root@cloud-cn-master-1 ~]# openssl x509 -in kubelet.crt -text | grep -i Subject
        Subject: O=system:nodes, CN=system:node:cloud-cn-master-2
        Subject Public Key Info:
👆 OK, the cause is located: the CommonName obtained does not match the local machine. On cloud-cn-master-1, kubeadm ends up with cloud-cn-master-2's node name, so fetching the annotation for "this node" naturally fails. Even after annotating it manually, that only gets past one step: the apiEndpoint lookup in the configmap then fails for the same reason, and even with the apiEndpoint maintained by hand, the later hash check still cannot pass. So the root cause is that when the certificates were renewed, the kubelet client certificate was overwritten (on this master it ended up carrying another master's certificate), and that single mistake produced this whole chain of problems.
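For reference, a quick check to run on each master before an upgrade, to confirm that the CN embedded in kubelet.conf matches the local node (assuming the certificate is embedded in the file, as it was here):
# decode the embedded client certificate and print its subject; the CN should be system:node:<local hostname>
[root@cloud-cn-master-1 ~]# grep client-certificate-data /etc/kubernetes/kubelet.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -subject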