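Deploying a Kubernetes v1.12.1 Cluster and the Polyaxon Server on Ubuntu 16.04

Deploying Polyaxon with a Kubernetes Cluster on Ubuntu

Main steps

  1. System environment setup
    1.1 Preparation
    1.2 Check system information
    1.3 Check MAC addresses, product UUIDs, and hostnames
    1.4 Disable the firewall
    1.5 Disable SELinux
    1.6 Disable swap
    1.7 Configure the /etc/hosts file
    1.8 Example setup
  2. Install docker-ce
    2.1 Add the Docker repository
    2.2 List the available Docker versions
    2.3 Install Docker 18.06.1-ce
    2.4 Verify the Docker installation
  3. Kubernetes (k8s)
    3.1 Overview
    3.2 Install kubelet, kubeadm, and kubectl on every node
    3.3 Create a single-master cluster with kubeadm on the master
    3.4 Install the network plugin
    3.5 Master isolation
    3.6 Join the worker nodes to the master
    3.7 Testing
    3.8 Uninstall and clean up k8s
  4. Install Helm
    4.1 Install the Helm client on the master
    4.2 Install the Helm Tiller server on the master
  5. Polyaxon
    5.1 Deploy the Polyaxon server
    5.2 Install the Polyaxon client
    5.3 Uninstall Polyaxon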

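Deployment table

| | System setup | Install Docker | Install kubeadm | kubeadm init | Install network plugin | kubeadm join | Install Helm | Install Polyaxon |
|---|---|---|---|---|---|---|---|---|
| Master | ✓ | ✓ | ✓ | ✓ | ✓ | × | ✓ | ✓ |
| Slave/worker node | ✓ | ✓ | ✓ | × | × | ✓ | × | × |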

1. System environment setup (all machines)

1.1 Preparation

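  • Several machines running Ubuntu 16.04 or CentOS 7
  • Passwordless SSH login between all machines
  • 2 GB or more of RAM on each machine
  • Working network connectivity between all machines in the cluster
  • A unique MAC address, product UUID, and hostname on every machine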

1.2 Check system information

Run the following command to view system information:

lsb_release -a

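1.3 Check MAC addresses, product UUIDs, and hostnames

Kubernetes requires every machine in the cluster to have a unique MAC address, product UUID, and hostname. Check them with the following commands (reading product_uuid requires root):

# Product UUID
sudo cat /sys/class/dmi/id/product_uuid
# MAC addresses
ip link
# Hostname
cat /etc/hostname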
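These values must differ across machines, so it can help to collect them from every node and check for collisions. Below is a minimal sketch. It assumes the passwordless SSH from 1.1 plus passwordless sudo on each node. report_duplicates is a hypothetical helper name, not part of any tool.

```shell
# Read "HOST VALUE" lines on stdin and print every VALUE that appears on
# more than one host, followed by the hosts that share it.
# Prints nothing when all values are distinct.
report_duplicates() {
  awk '{ hosts[$2] = hosts[$2] " " $1; count[$2]++ }
       END { for (v in count) if (count[v] > 1) print v ":" hosts[v] }'
}

# Example (run on the master, using the IPs from section 1.8):
#   for h in 192.168.0.6 192.168.0.5 192.168.0.7; do
#     echo "$h $(ssh "$h" sudo cat /sys/class/dmi/id/product_uuid)"
#   done | report_duplicates
```

The same loop works for hostnames (cat /etc/hostname) by swapping the remote command.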

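1.4 Disable the firewall

Disable the firewall. firewalld is the default firewall on CentOS 7. Ubuntu 16.04 uses ufw instead, which can be turned off with sudo ufw disable.

sudo systemctl stop firewalld
sudo systemctl disable firewalld

On CentOS, the required ports must be opened; see: Check required ports.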

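1.5 Disable SELinux

Disable SELinux. This applies to CentOS 7; Ubuntu 16.04 does not use SELinux by default.

sudo setenforce 0
sudo vim /etc/selinux/config
# Set:
SELINUX=disabled
# Save and exit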
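The config-file edit can also be done without opening an editor. Below is a sketch that assumes GNU sed. disable_selinux_config is a hypothetical helper that takes the file path as an argument, so you can try it on a copy first.

```shell
# Rewrite the active "SELINUX=..." line of an SELinux config file to
# SELINUX=disabled. "SELINUXTYPE=..." is left alone because the pattern
# requires "=" immediately after "SELINUX". Assumes GNU sed (-i).
disable_selinux_config() {
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Example (as root, on CentOS 7):
#   setenforce 0                                 # permissive until reboot
#   disable_selinux_config /etc/selinux/config   # disabled from next boot on
```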

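1.6 Disable swap

Starting with Kubernetes 1.8, swap must be disabled. With the default configuration, the kubelet will not start while swap is on, and Polyaxon only supports Kubernetes 1.8 and later. Edit /etc/fstab and comment out the lines that reference swap: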


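Save the file, then run the following to turn swap off immediately. The fstab change keeps swap off after reboots.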

sudo swapoff -a
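The fstab edit above can also be scripted. Below is a minimal sketch. It assumes the standard whitespace-separated fstab layout, in which the third field is the filesystem type. comment_out_swap is a hypothetical helper that leaves a .bak copy next to the file.

```shell
# Comment out every active (non-comment) fstab line whose filesystem type,
# the 3rd field, is "swap". The original file is saved as FILE.bak first.
comment_out_swap() {
  fstab="$1"
  cp "$fstab" "$fstab.bak" || return 1
  # Skip existing comments; prefix "#" to active swap lines; copy the rest.
  awk '!/^[ \t]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
    "$fstab.bak" > "$fstab"
}

# Example (as root):
#   comment_out_swap /etc/fstab && swapoff -a
```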

1.7 Configure the /etc/hosts file

Run sudo vim /etc/hosts and add:

192.168.0.6 master
192.168.0.5 slave1
192.168.0.7 slave2

Save and exit.
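If you re-run the setup, appending blindly would duplicate these lines. Below is a sketch of an idempotent append. add_host_entry is a hypothetical helper; the path is a parameter so you can test it on a copy of /etc/hosts.

```shell
# Append "IP NAME" to a hosts-style file unless NAME is already mapped on a
# non-comment line (NAME is matched as a whole field, column 2 or later).
add_host_entry() {
  file="$1" ip="$2" name="$3"
  if awk -v n="$name" '!/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                       END { exit !found }' "$file"; then
    return 0   # already present, nothing to do
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Example (as root), with the addresses used in this article:
#   add_host_entry /etc/hosts 192.168.0.6 master
#   add_host_entry /etc/hosts 192.168.0.5 slave1
#   add_host_entry /etc/hosts 192.168.0.7 slave2
```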

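1.8 Example setup

This example uses three Ubuntu 16.04 hosts on the same internal network:

| | IP | hostname |
|---|---|---|
| Master | 192.168.0.6 | seeta-03 |
| Slave1 | 192.168.0.5 | seeta-02 |
| Slave2 | 192.168.0.7 | seeta-0002 |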

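2. Install docker-ce

Kubernetes manages Docker, so the Docker version must be compatible with the Kubernetes version. Compatibility is listed in the CHANGELOG of the Kubernetes release you plan to install. This article installs 1.12.1, so the relevant file is CHANGELOG-1.12.md. The figure below shows that Docker is validated up to 18.06. Within that limit, install the newest Docker version you can.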

2.1 Add the Docker repository

sudo apt-get update
sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL  | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64]  $(lsb_release -cs) stable"
sudo apt-get -y update

2.2 List the available Docker versions

sudo apt-cache madison docker-ce

2.3 Install Docker 18.06.1-ce

sudo apt install docker-ce=18.06.1~ce~3-0~ubuntu
sudo systemctl enable docker

2.4 Verify the Docker installation

Check the Docker version:

sudo docker version

Output:

Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:56 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:21 2018
  OS/Arch:          linux/amd64
  Experimental:     false

3. Kubernetes

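3.1 Overview

Kubernetes is an open-source system for managing containerized applications across multiple hosts in a cloud platform. Its goal is to make deploying containerized applications simple and efficient. It provides mechanisms for deploying, scheduling, updating, and maintaining applications. For details, see the official k8s website.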

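3.2 Install kubelet, kubeadm, and kubectl on every node

3.2.1 The tools

| Tool | Description |
|---|---|
| kubeadm | Command-line tool that bootstraps a k8s cluster |
| kubelet | Core component that runs on every node in the cluster and performs operations such as starting Pods and containers |
| kubectl | Command-line tool for operating the cluster |

3.2.2 Add the apt key
sudo apt-get update && sudo apt-get install -y apt-transport-https
curl -s .gpg | sudo apt-key add -
3.2.3 Add the Kubernetes repository on every node
# Write the file through sudo tee: with "sudo cat ... >file" the redirect itself runs without root
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb / kubernetes-xenial main
EOF
3.2.4 List the package versions available in the repository on every node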
sudo apt-cache madison kubelet

Output:

3.2.5 Install kubelet, kubeadm, and kubectl on every node
sudo apt-get update
sudo apt install kubelet=1.12.1-00 kubeadm=1.12.1-00 kubectl=1.12.1-00
sudo apt-mark hold kubelet kubeadm kubectl
3.2.6 Check the required images on every node
kubeadm config images list

Output:

k8s.gcr.io/kube-apiserver:v1.13.1
k8s.gcr.io/kube-controller-manager:v1.13.1
k8s.gcr.io/kube-scheduler:v1.13.1
k8s.gcr.io/kube-proxy:v1.13.1
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.2.24
k8s.gcr.io/coredns:1.2.2
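3.2.7 Pull the images on every node

gcr.io cannot be reached from mainland China. One option is to pull the images from a mirror and re-tag them. We take a simpler approach and configure the mirror through kubeadm's configuration file. kubeadm v1.11+ adds the command kubeadm config print-default, which prints kubeadm's default configuration so it can be saved to a file. On each node, in a safe directory such as $HOME/k8s/, run: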

kubeadm config print-default > kubeadm.conf

Change the image repository address in kubeadm.conf:

sed -i "s/imageRepository: .*/imageRepository: registry.aliyuncs\/google_containers/g" kubeadm.conf

Pin the version so that initialization does not read it from .12.txt:

# You can change the version number here
sed -i "s/kubernetesVersion: .*/kubernetesVersion: v1.12.1/g" kubeadm.conf
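The two sed edits above can be combined into a single function. Below is a sketch that assumes GNU sed. patch_kubeadm_conf is a hypothetical name, and the repository argument stands for whichever mirror you use (the value in the imageRepository sed command above). The function uses | as the sed delimiter, so the slashes in a registry path need no escaping.

```shell
# Set imageRepository and kubernetesVersion in a kubeadm config file
# produced by "kubeadm config print-default". Indentation before the key
# is preserved because only the text from the key onward is replaced.
patch_kubeadm_conf() {
  conf="$1" repo="$2" version="$3"
  sed -i \
    -e "s|imageRepository: .*|imageRepository: ${repo}|" \
    -e "s|kubernetesVersion: .*|kubernetesVersion: ${version}|" \
    "$conf"
}

# Example (REPO is a placeholder for your mirror):
#   patch_kubeadm_conf kubeadm.conf "$REPO" v1.12.1
```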

Pull the images with kubeadm config images pull, using --config to point to kubeadm.conf. Run it in the directory that contains kubeadm.conf:

kubeadm config images pull --config kubeadm.conf

This takes a while. Output:

W0102 18:43:32.305695   28207 common.go:105] WARNING: Detected resource kinds that may not apply: [InitConfiguration MasterConfiguration JoinConfiguration NodeConfiguration]
[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1alpha3, Kind=JoinConfiguration
[config/images] Pulled registry.aliyuncs/google_containers/kube-apiserver:v1.12.1
[config/images] Pulled registry.aliyuncs/google_containers/kube-controller-manager:v1.12.1
[config/images] Pulled registry.aliyuncs/google_containers/kube-scheduler:v1.12.1
[config/images] Pulled registry.aliyuncs/google_containers/kube-proxy:v1.12.1
[config/images] Pulled registry.aliyuncs/google_containers/pause:3.1
[config/images] Pulled registry.aliyuncs/google_containers/etcd:3.2.24
[config/images] Pulled registry.aliyuncs/google_containers/coredns:1.2.2

Note: the pull address of the base image pause must be handled separately; otherwise it is still pulled from k8s.gcr.io. Give it a separate tag:

sudo docker tag registry.aliyuncs/google_containers/pause:3.1 k8s.gcr.io/pause:3.1

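3.3 Create a single-master cluster with kubeadm on the master

Initialize the master node

Normally the init command also needs parameters such as advertiseAddress and --pod-network-cidr. Because we initialize from the kubeadm.conf configuration file here, we pass no other parameters on the command line and set them in kubeadm.conf instead:

# Replace 192.168.0.6 with the IP of your own master node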
sed -i "s/advertiseAddress: .*/advertiseAddress: 192.168.0.6/g" kubeadm.conf

Set the pod network CIDR (--pod-network-cidr) to 10.244.0.0/16:

# No change needed
sed -i "s/podSubnet: .*/podSubnet: \"10.244.0.0\/16\"/g" kubeadm.conf
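Before running kubeadm init, check that all four edits took effect. Below is a quick grep-style sketch, not a YAML parser. conf_value is a hypothetical helper that prints the first value found for a key, with double quotes removed.

```shell
# Print the value of the first "KEY: value" line (any indentation) in FILE,
# with double quotes stripped. Prints nothing if KEY is absent.
conf_value() {
  sed -n "s/^[[:space:]]*$2:[[:space:]]*//p" "$1" | head -n 1 | tr -d '"'
}

# Example:
#   for key in imageRepository kubernetesVersion advertiseAddress podSubnet; do
#     printf '%s = %s\n' "$key" "$(conf_value kubeadm.conf "$key")"
#   done
```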

Run the init command:

sudo kubeadm init --config kubeadm.conf

Output:

W1109 17:01:47.071494   42929 common.go:105] WARNING: Detected resource kinds that may not apply: [InitConfiguration MasterConfiguration JoinConfiguration NodeConfiguration]
[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1alpha3, Kind=JoinConfiguration
[init] using Kubernetes version: v1.12.2
[preflight] running pre-flight checks
[preflight/images] Pulling images required for setting up a Kubernetes cluster
[preflight/images] This might take a minute or two, depending on the speed of your internet connection
[preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[preflight] Activating the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [ubuntu1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.0.8]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [ubuntu1 localhost] and IPs [127.0.0.1 ::1]
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [ubuntu1 localhost] and IPs [192.168.0.8 127.0.0.1 ::1]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] valid certificates and keys now exist in "/etc/kubernetes/pki"
[certificates] Generated sa key and public key.
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] this might take a minute or longer if the control plane images have to be pulled
[apiclient] All control plane components are healthy after 57.002438 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.12" in namespace kube-system with the configuration for the kubelets in the cluster
[markmaster] Marking the node ubuntu1 as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node ubuntu1 as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "ubuntu1" as an annotation
[bootstraptoken] using token: abcdef.0123456789abcdef
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.0.6:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:5e44393289eb7e463479f93327b2593a45b32fa8afb8978d878e0f2c9bf8e29b

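Note: if kubeadm init fails, fix the configuration, run sudo kubeadm reset, and then run kubeadm init again.

Tip: save the kubeadm join command printed at the end of the kubeadm init output; you will need it later when adding the worker nodes. Mine was (this is only my example):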

kubeadm join 192.168.0.6:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:5e44393289eb7e463479f93327b2593a45b32fa8afb8978d878e0f2c9bf8e29b

To use kubectl as a non-root user, run the following (this is also part of the kubeadm init output):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

3.4 Install the network plugin

3.4.1 Check the status

Before installing, check the current status of the Pods:

kubectl get pods --all-namespaces

Output:

NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-5c545769d8-6cl9s          0/1     Pending   0          110s
kube-system   coredns-5c545769d8-h8fjj          0/1     Pending   0          111s
kube-system   etcd-ubuntu1                      1/1     Running   0          75s
kube-system   kube-apiserver-ubuntu1            1/1     Running   0          87s
kube-system   kube-controller-manager-ubuntu1   1/1     Running   0          96s
kube-system   kube-proxy-snhqr                  1/1     Running   0          111s
kube-system   kube-scheduler-ubuntu1            1/1     Running   0          98s

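As shown above, the CoreDNS Pods are Pending because no network plugin has been installed yet.

My VMs are on the 192.168.0.x subnet, which overlaps Calico's default pod CIDR (192.168.0.0/16), so Calico could not be used here. Instead I used the Canal network plugin, which combines Calico and Flannel. Canal requires the pod network CIDR 10.244.0.0/16, which we already specified for kubeadm init above (through podSubnet in kubeadm.conf).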

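3.4.2 Edit /etc/resolv.conf on every node

The Pods above include two CoreDNS Pods. On startup, CoreDNS takes its upstream DNS servers from the host's /etc/resolv.conf. If that file points to a local address, DNS queries loop back to CoreDNS itself. In that case both Pods stay in CrashLoopBackOff even after the network plugin is installed. The official guide, Troubleshooting Loops In Kubernetes Clusters, gives three fixes. We use the third: change the DNS server in /etc/resolv.conf on every host.

sudo vim /etc/resolv.conf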

Before:

nameserver 127.0.0.1

After:

nameserver 8.8.8.8
#nameserver 127.0.0.1
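The edit can be scripted so that it is applied the same way on every node. Below is a sketch under stated assumptions. fix_resolv_conf is a hypothetical helper that comments out loopback (127.x) nameservers and puts the given upstream server first. It writes back through the existing path instead of replacing the file, because on Ubuntu 16.04 /etc/resolv.conf is usually a symlink managed by resolvconf, which may also regenerate it later.

```shell
# Comment out 127.x nameservers in a resolv.conf-style FILE and list
# UPSTREAM first unless it is already present.
fix_resolv_conf() {
  file="$1" upstream="$2"
  tmp=$(mktemp) || return 1
  {
    # Put the upstream resolver first unless it is already listed.
    grep -q "^nameserver[[:space:]]*${upstream}\$" "$file" ||
      printf 'nameserver %s\n' "$upstream"
    # Comment out nameservers on the loopback range 127.0.0.0/8.
    sed 's/^nameserver[[:space:]]*127\./#&/' "$file"
  } > "$tmp"
  # Write back through the existing path (follows a symlink, keeps the inode).
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (as root, on each node):
#   fix_resolv_conf /etc/resolv.conf 8.8.8.8
```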

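Changes to /etc/resolv.conf apply to new lookups right away, so no command is needed to activate them. ldconfig only rebuilds the shared-library cache and does not affect DNS. CoreDNS Pods, however, receive the host's resolver settings when they are created. If they already exist, delete them and let their Deployment recreate them:

kubectl -n kube-system delete pod -l k8s-app=kube-dns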
3.4.3 Install the Canal network plugin on the master
# Source: .3/getting-started/kubernetes/installation/hosted/canal/rbac.yaml
kubectl apply -f .3/rbac.yaml
# Source: .3/getting-started/kubernetes/installation/hosted/canal/canal.yaml
kubectl apply -f .3/canal.yaml

For more about Canal, see Installing Calico for policy and flannel for networking.

Wait a while, then run kubectl get pods --all-namespaces again to check the plugin installation:

NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-5c545769d8-6cl9s          1/1     Running   0          7h
kube-system   coredns-5c545769d8-h8fjj          1/1     Running   0          7h
kube-system   etcd-ubuntu1                      1/1     Running   0          7h
kube-system   kube-apiserver-ubuntu1            1/1     Running   0          7h
kube-system   kube-controller-manager-ubuntu1   1/1     Running   0          7h
kube-system   kube-proxy-snhqr                  1/1     Running   0          7h
kube-system   kube-scheduler-ubuntu1            1/1     Running   0          7h

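Once every STATUS is Running, the network plugin is installed. If the two coredns Pods still do not reach Running after the steps above, try the following: reload the configuration with sudo systemctl daemon-reload, restart the kubelet with sudo service kubelet restart, and run the install commands again. If that still does not help, try rebooting the master and running the install commands again.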
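Instead of scanning the table by eye, you can filter the STATUS column. Below is a sketch that assumes the default column layout shown above, where STATUS is the fourth column. pods_not_running is a hypothetical helper. Pods of finished Jobs show Completed and would be listed as well.

```shell
# Read "kubectl get pods --all-namespaces" output on stdin and print
# "NAMESPACE/NAME STATUS" for every Pod whose STATUS is not Running.
# Empty output means every Pod is Running.
pods_not_running() {
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2, $4 }'
}

# Example:
#   kubectl get pods --all-namespaces | pods_not_running
```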

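3.5 Master isolation

By default, for security reasons, the cluster does not schedule Pods on the master node. In a development environment, however, you may have only a single master node. In that case, lift this restriction with: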

kubectl taint nodes --all node-role.kubernetes.io/master-

Output:

node/ubuntu1 untainted

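3.6 Join the worker nodes to the master

On each worker node, run as root the kubeadm join command that kubeadm init printed on the master. It has the form kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash>. Use the join command from your own successful init: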

kubeadm join 192.168.0.6:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:5e44393289eb7e463479f93327b2593a45b32fa8afb8978d878e0f2c9bf8e29b

If you have forgotten the master's --token, list it with:

sudo kubeadm token list

Output:

TOKEN                     TTL       EXPIRES                     USAGES                   DESCRIPTION   EXTRA GROUPS
pe9eow.4wywpjhj9txkvef9   23h       2019-01-08T11:28:44+08:00   authentication,signing      <none>   system:bootstrappers:kubeadm:default-node-token

By default a token is valid for 24 hours. If your token has expired, generate a new one with:

kubeadm token create

Output:

pe9eow.4wywpjhj9txkvef9

If you also do not have the --discovery-token-ca-cert-hash value, compute it on the master from the cluster CA certificate:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

Output:

5e44393289eb7e463479f93327b2593a45b32fa8afb8978d878e0f2c9bf8e29b
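The master address, the token, and the CA hash are the three parts of the join command. Below is a sketch that assembles them and checks each part's format. Bootstrap tokens are 6 and 16 lowercase alphanumeric characters joined by a dot, and the CA hash is a 64-character hex SHA-256 digest. build_join_cmd is a hypothetical helper. kubeadm can also print a complete command itself with kubeadm token create --print-join-command.

```shell
# Print a kubeadm join command from ENDPOINT (ip:port), TOKEN and the hex
# CA HASH, or fail with a message if TOKEN or HASH is malformed.
build_join_cmd() {
  endpoint="$1" token="$2" hash="$3"
  if ! printf '%s\n' "$token" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'; then
    echo "invalid token: $token" >&2; return 1
  fi
  if ! printf '%s\n' "$hash" | grep -Eq '^[0-9a-f]{64}$'; then
    echo "invalid CA cert hash: $hash" >&2; return 1
  fi
  printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
    "$endpoint" "$token" "$hash"
}

# Example (on the master; HASH is the output of the openssl pipeline above):
#   build_join_cmd 192.168.0.6:6443 "$(sudo kubeadm token create)" "$HASH"
```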

Running kubeadm join prints output like this:

[preflight] running pre-flight checks
    [WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{} ip_vs:{} ip_vs_rr:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
 2. Provide the missing builtin kernel ipvs support

[discovery] Trying to connect to API Server "192.168.0.6:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.0.6:6443"
[discovery] Requesting info from "https://192.168.0.6:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.0.8:6443"
[discovery] Successfully established connection with API Server "192.168.0.6:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.12" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "ubuntu2" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

You can now check the node status on the master with kubectl get nodes:

sudo kubectl get nodes

Output:

NAME         STATUS   ROLES    AGE   VERSION
seeta-0002   Ready    <none>   1d    v1.12.1
seeta-02     Ready    <none>   1d    v1.12.1
seeta-03     Ready    master   1d    v1.12.1

To use kubectl on the worker nodes too, copy the master's /etc/kubernetes/admin.conf to every worker node as follows. 192.168.0.7 is the IP of one of my worker nodes; replace it with your own.

# On the master:
sudo scp /etc/kubernetes/admin.conf 192.168.0.7:/etc/kubernetes/admin.conf
# On the worker node (192.168.0.7):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Now kubectl works on every node.

3.7 Testing

3.7.1 Verify kube-apiserver, kube-controller-manager, kube-scheduler, and the pod network

Deploy an Nginx Deployment with 3 Pods (because I have three machines in total):

sudo kubectl create deployment nginx --image=nginx:alpine
sudo kubectl scale deployment nginx --replicas=3

Check that the Nginx Pods are running and have been assigned cluster IPs starting with 10.244.:

sudo kubectl get pods -l app=nginx -o wide

Output:

NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE
nginx-65d5c4f7cc-652hd   1/1     Running   1          11d   10.244.0.26   seeta-03     <none>
nginx-65d5c4f7cc-csjdl   1/1     Running   0          11d   10.244.2.2    seeta-0002   <none>
nginx-65d5c4f7cc-s4n78   1/1     Running   0          11d   10.244.1.2    seeta-02     <none>
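To check the CIDR condition mechanically, compare the IP column against the 10.244.0.0/16 pod subnet set in kubeadm.conf. Below is a sketch that assumes the -o wide layout above, where IP is the sixth column. pods_outside_cidr is a hypothetical helper, and its pattern hard-codes 10.244.x.x.

```shell
# Read "kubectl get pods -o wide" output on stdin and print "NAME IP" for
# every Pod whose IP is not of the form 10.244.x.x (including <none>).
# Empty output means all Pods received an address from the pod subnet.
pods_outside_cidr() {
  awk 'NR > 1 && $6 !~ /^10\.244\.[0-9]+\.[0-9]+$/ { print $1, $6 }'
}

# Example:
#   sudo kubectl get pods -l app=nginx -o wide | pods_outside_cidr
```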
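3.7.2 Verify that kube-proxy works

Expose the Deployment outside the cluster as a NodePort service (see the official documentation):

sudo kubectl expose deployment nginx --port=80 --type=NodePort
# Show the port reachable from outside the cluster:
sudo kubectl get services nginx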

Output:

NAME    TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
nginx   NodePort   10.97.226.246   <none>        80:31504/TCP   11d

The service can be reached from outside the cluster at any NodeIP:Port. The three nodes in this example are 192.168.0.6, 192.168.0.5, and 192.168.0.7:

curl http://192.168.0.5:31504
curl http://192.168.0.6:31504
curl http://192.168.0.7:31504
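You can also extract the NodePort from the PORT(S) column instead of reading it off the table. Below is a sketch that assumes the default kubectl get services layout, where PORT(S) is the fifth column. node_port is a hypothetical helper. kubectl can also return the value directly with -o jsonpath='{.spec.ports[0].nodePort}'.

```shell
# Read "kubectl get services" output on stdin and print the NodePort that
# SERVICE maps PORT to, e.g. "80:31504/TCP" -> 31504. Prints nothing for a
# port without a NodePort (such as a plain ClusterIP "5432/TCP").
node_port() {
  awk -v svc="$1" -v port="$2" '
    $1 == svc {
      # PORT(S) may hold several comma-separated mappings.
      n = split($5, maps, ",")
      for (i = 1; i <= n; i++) {
        # "80:31504/TCP" splits into 80, 31504, TCP.
        split(maps[i], parts, "[:/]")
        if (parts[1] == port && index(maps[i], ":") > 0) print parts[2]
      }
    }'
}

# Example: curl the service on every node.
#   port=$(sudo kubectl get services nginx | node_port nginx 80)
#   for ip in 192.168.0.6 192.168.0.5 192.168.0.7; do
#     curl -s "http://$ip:$port" >/dev/null && echo "$ip ok"
#   done
```

The same helper can read the polyaxon-polyaxon-api line in section 5.2, for example node_port polyaxon-polyaxon-api 80.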
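3.7.3 Verify DNS and the pod network
# Start a Busybox-based Pod with curl and attach to it interactively
# (the radial/busyboxplus:curl image is assumed here; any image with nslookup and curl will do)
sudo kubectl run curl --image=radial/busyboxplus:curl -i --tty
# Inside the Pod, run nslookup nginx to check that the service name resolves to its cluster IP, which verifies DNS
nslookup nginx

Output like the following means DNS is working:

Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      nginx
Address 1: 10.97.226.246 nginx.default.svc.cluster.local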

Access the service by name to verify kube-proxy. Still inside the Pod, run:

curl http://nginx/

Output:

<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>body {width: 35em;margin: 0 auto;font-family: Tahoma, Verdana, Arial, sans-serif;}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p><p>For online documentation and support please refer to
<a href="/">nginx</a>.<br/>
Commercial support is available at
<a href="/">nginx</a>.</p><p><em>Thank you for using nginx.</em></p>
</body>
</html>

Access the cluster IPs of the 3 Pods one by one to verify networking across nodes:

curl http://10.244.0.26/
curl http://10.244.2.2/
curl http://10.244.1.2/

When the test is done, press Ctrl+D to leave interactive mode, then delete the curl Deployment:

sudo kubectl delete deploy curl

Output:

deployment.extensions "curl" deleted

3.8 Uninstall and clean up k8s

Run the following commands as root (I have not tested them):

kubeadm reset -f
modprobe -r ipip
lsmod
rm -rf ~/.kube/
rm -rf /etc/kubernetes/
rm -rf /etc/systemd/system/kubelet.service.d
rm -rf /etc/systemd/system/kubelet.service
rm -rf /usr/bin/kube*
rm -rf /etc/cni
rm -rf /opt/cni
rm -rf /var/lib/etcd
rm -rf /var/etcd

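4. Install Helm

4.1 Install the Helm client on the master

Download Helm 2.5 or later from the official site; a recent version is preferable. The examples below use v2.12.1. Downloading helm-v2.12.1-linux-amd64.tar.gz from the official site requires a proxy from mainland China, so I also provide a Baidu Cloud link for v2.12.1. On the master, run:

tar -zxvf helm-v2.12.1-linux-amd64.tar.gz
# The archive extracts to a directory named linux-amd64
sudo mv linux-amd64/helm /usr/local/bin/helm
# Check that the Helm client is installed. Only the client version is shown for now, because the server is not installed yet.
sudo helm version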

Output:

Client: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}

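Helm has many subcommands and flags. To use it more efficiently on the command line, it is usually recommended to install Helm's bash completion script:

helm completion bash > ~/.helmrc
echo "source ~/.helmrc" >> ~/.bashrc
source ~/.bashrc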

4.2 Install the Helm Tiller server on the master

Pull the image from a mirror registry in China:

sudo docker pull registry-hangzhou.aliyuncs/google_containers/tiller:v2.12.1
sudo docker tag registry-hangzhou.aliyuncs/google_containers/tiller:v2.12.1 gcr.io/kubernetes-helm/tiller:v2.12.1

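Configure permissions:

sudo kubectl create serviceaccount --namespace kube-system tiller
sudo kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller

Deploy Tiller with Helm:

helm init -i registry-hangzhou.aliyuncs/google_containers/tiller:v2.12.1

Then make the tiller-deploy Deployment, which helm init creates, run under the tiller service account. Alternatively, pass --service-account tiller to helm init to do this in one step.

sudo kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'

Run sudo helm version again; it may take a moment. Output like the following means Helm is deployed: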

Client: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}

5. Polyaxon

Official Polyaxon installation documentation

5.1 Deploy the Polyaxon server

On the master, create the polyaxon namespace:

sudo kubectl create namespace polyaxon

Output:

namespace "polyaxon" created

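Polyaxon is configured through a config file. You can create a config.yml or polyaxon_config.yml and fill it in according to its GitHub repository, its official site, and your own needs; they also provide a YAML generator. I deployed with the default configuration: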

helm repo add polyaxon 
helm repo update
sudo helm install polyaxon/polyaxon --name=polyaxon --namespace=polyaxon

After about 3 to 5 minutes, the following is printed:

NOTES:
Polyaxon is currently running:

1. Get the application URL by running these commands:

     NOTE: It may take a few minutes for the LoadBalancer IP to be available.
           You can watch the status by running:
           'kubectl get --namespace polyaxon svc -w polyaxon-polyaxon-api'

     export POLYAXON_IP=$(kubectl get svc --namespace polyaxon polyaxon-polyaxon-api -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
     export POLYAXON_HTTP_PORT=80
     export POLYAXON_WS_PORT=1337

     echo http://$POLYAXON_IP:$POLYAXON_HTTP_PORT

2. Setup your cli by running theses commands:
     polyaxon config set --host=$POLYAXON_IP --http_port=$POLYAXON_HTTP_PORT  --ws_port=$POLYAXON_WS_PORT

3. Log in with superuser

     USER: root
     PASSWORD: Get login password with

     kubectl get secret --namespace polyaxon polyaxon-polyaxon-secret -o jsonpath="{.data.POLYAXON_ADMIN_PASSWORD}" | base64 --decode

The NOTES contain three items: how to get Polyaxon's address, how to configure the Polyaxon CLI, and the login credentials for the client. The command in item 1 has to be changed or it fails. This cluster has no load balancer, so the LoadBalancer IP never becomes available and EXTERNAL-IP stays <pending>. Replace export POLYAXON_IP=$(kubectl get svc --namespace polyaxon polyaxon-polyaxon-api -o jsonpath='{.status.loadBalancer.ingress[0].ip}') with export POLYAXON_IP=$(kubectl get svc --namespace polyaxon polyaxon-polyaxon-api -o jsonpath='{.spec.clusterIP}'). Switch to root (su -) and run the commands in order:

export POLYAXON_IP=$(kubectl get svc --namespace polyaxon polyaxon-polyaxon-api -o jsonpath='{.spec.clusterIP}')
export POLYAXON_HTTP_PORT=80
export POLYAXON_WS_PORT=1337
echo http://$POLYAXON_IP:$POLYAXON_HTTP_PORT
# The last command prints something like: http://10.107.85.252:80

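5.2 Install the Polyaxon client

On the master, install the client:

# For Python 2, plain pip works
pip3 install -U polyaxon-cli
polyaxon config set --host=$POLYAXON_IP --http_port=$POLYAXON_HTTP_PORT  --ws_port=$POLYAXON_WS_PORT

Switch to root (su -) and log in with the client. The default username is root and the password is rootpassword:

polyaxon login --username=root
Please enter your password:
Login successful

If you are not sure of the password, run the last command from item 3 of the deployment NOTES. The password is printed without a trailing newline, directly in front of the user name in your shell prompt, so it is easy to miss: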

kubectl get secret --namespace polyaxon polyaxon-polyaxon-secret -o jsonpath="{.data.POLYAXON_ADMIN_PASSWORD}" | base64 --decode

Check which port the polyaxon-api service is mapped to:

kubectl get service -n polyaxon

Output:

NAME                       TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                 AGE
polyaxon-docker-registry   NodePort       10.98.185.92     <none>        5000:31813/TCP                          6m26s
polyaxon-polyaxon-api      LoadBalancer   10.107.85.252    <pending>     80:30945/TCP,1337:31122/TCP             6m26s
polyaxon-postgresql        ClusterIP      10.98.103.223    <none>        5432/TCP                                6m26s
polyaxon-rabbitmq          ClusterIP      10.103.114.255   <none>        4369/TCP,5672/TCP,25672/TCP,15672/TCP   6m26s
polyaxon-redis             ClusterIP      10.106.3.31      <none>        6379/TCP                                6m26s

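Note: the PORT(S) value of polyaxon-polyaxon-api is 80:30945, so 30945 is the service's port on the nodes. Open http://192.168.0.6:30945/ in a browser to use Polyaxon.

Tip: to install NFS and create a persistent volume and a persistent volume claim for persistent storage, refer to this article and the official site. I could not get it deployed with that article's configuration, though.

5.3 Uninstall Polyaxon

Delete Polyaxon: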

helm delete polyaxon --purge

If Polyaxon did not install successfully, delete it with:

helm delete polyaxon --purge --no-hooks

Delete the namespace:

kubectl delete namespace polyaxon

[init] this might take a minute or longer if the control plane images have to be pulled
[apiclient] All control plane components are healthy after 57.002438 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.12" in namespace kube-system with the configuration for the kubelets in the cluster
[markmaster] Marking the node ubuntu1 as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node ubuntu1 as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "ubuntu1" as an annotation
[bootstraptoken] using token: abcdef.0123456789abcdef
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.0.6:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:5e44393289eb7e463479f93327b2593a45b32fa8afb8978d878e0f2c9bf8e29b

Note: if kubeadm init fails, fix the configuration, run sudo kubeadm reset, and then run kubeadm init again.

Tip: write down the kubeadm join command printed at the end of the kubeadm init output. You will need it later to add the worker nodes. In my example it is:

kubeadm join 192.168.0.6:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:5e44393289eb7e463479f93327b2593a45b32fa8afb8978d878e0f2c9bf8e29b
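The join line is easy to lose in the long init output, so one option is to save the output and extract the line afterwards. The sketch below assumes the output was saved with `tee` to a file such as `kubeadm-init.log` (a hypothetical name). It also assumes the join command sits on a single line, as in the v1.12 output above. `extract_join` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: read saved "kubeadm init" output on stdin and print
# the last "kubeadm join ..." command found, with any leading text stripped.
# Assumes the command is on one line (true for the v1.12 output above).
extract_join() {
  sed -n 's/^.*\(kubeadm join .*\)$/\1/p' | tail -n 1
}

# Usage on the Master:
#   sudo kubeadm init --config kubeadm.conf | tee kubeadm-init.log
#   extract_join < kubeadm-init.log
```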

To use kubectl as a non-root user, run the following commands (they also appear in the kubeadm init output):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

3.4 Install the network plugin

3.4.1 Check the status

Before installing, check the current status of the Pods:

kubectl get pods --all-namespaces

Output:

NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-5c545769d8-6cl9s          0/1     Pending   0          110s
kube-system   coredns-5c545769d8-h8fjj          0/1     Pending   0          111s
kube-system   etcd-ubuntu1                      1/1     Running   0          75s
kube-system   kube-apiserver-ubuntu1            1/1     Running   0          87s
kube-system   kube-controller-manager-ubuntu1   1/1     Running   0          96s
kube-system   kube-proxy-snhqr                  1/1     Running   0          111s
kube-system   kube-scheduler-ubuntu1            1/1     Running   0          98s

As shown above, the CoreDNS pods are Pending because no network plugin has been installed yet.

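My VMs are on the 192.168.0.x subnet, which overlaps Calico's default pod network (192.168.0.0/16). I therefore could not use Calico and used the Canal network plugin instead, which combines Calico and Flannel. Canal requires the pod network CIDR 10.244.0.0/16, which we already set through podSubnet in kubeadm.conf (the equivalent of --pod-network-cidr=10.244.0.0/16) before running kubeadm init.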

3.4.2 Edit /etc/resolv.conf on every node

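The pods above include two CoreDNS pods. When CoreDNS starts, it reads its upstream DNS servers from the host's /etc/resolv.conf. If that upstream server is a local address, queries loop back to CoreDNS itself. In that case, the two pods stay in CrashLoopBackOff even after the network plugin is installed. The official fix is described in Troubleshooting Loops In Kubernetes Clusters, which lists three methods. We use the third one: change the DNS server in /etc/resolv.conf on every host.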

vim /etc/resolv.conf

Before:

nameserver 127.0.0.1

After:

nameserver 8.8.8.8
#nameserver 127.0.0.1

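No reload step is needed on the host, because the resolver reads /etc/resolv.conf directly. ldconfig rebuilds the shared-library cache and does not affect DNS. Pods, however, get a copy of the file when they are created. The CoreDNS pods are still Pending at this point, so they will pick up the new file when they start. If they are already running or stuck in CrashLoopBackOff, recreate them:

kubectl -n kube-system delete pod -l k8s-app=kube-dns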
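You can check whether a host is affected before installing the network plugin. In this sketch, `has_loopback_nameserver` is a hypothetical helper. It treats any uncommented `nameserver` entry in the 127.0.0.0/8 range (which also covers addresses such as 127.0.1.1), or `::1`, as the looping case described above:

```shell
#!/bin/sh
# Hypothetical check: succeed (exit 0) if FILE has an active "nameserver"
# line pointing at a loopback address (127.x.x.x or ::1).
# Commented-out lines such as "#nameserver 127.0.0.1" are ignored.
has_loopback_nameserver() {
  grep -Eq '^[[:space:]]*nameserver[[:space:]]+(127\.|::1([[:space:]]|$))' "${1:-/etc/resolv.conf}"
}

if has_loopback_nameserver /etc/resolv.conf; then
  echo "loopback nameserver found: CoreDNS may loop"
else
  echo "no loopback nameserver"
fi
```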
3.4.3 Install the Canal network plugin on the Master
# Source: .3/getting-started/kubernetes/installation/hosted/canal/rbac.yaml
kubectl apply -f .3/rbac.yaml
# Source: .3/getting-started/kubernetes/installation/hosted/canal/canal.yaml
kubectl apply -f .3/canal.yaml

For more about Canal, see Installing Calico for policy and flannel for networking.

Wait a while, then run kubectl get pods --all-namespaces again to check the plugin installation:

NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-5c545769d8-6cl9s          1/1     Running   0          7h
kube-system   coredns-5c545769d8-h8fjj          1/1     Running   0          7h
kube-system   etcd-ubuntu1                      1/1     Running   0          7h
kube-system   kube-apiserver-ubuntu1            1/1     Running   0          7h
kube-system   kube-controller-manager-ubuntu1   1/1     Running   0          7h
kube-system   kube-proxy-snhqr                  1/1     Running   0          7h
kube-system   kube-scheduler-ubuntu1            1/1     Running   0          7h

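Once every STATUS is Running, the network plugin is installed. If the two coredns pods still don't reach Running after the steps above, try the following in order:

1. Run systemctl daemon-reload to reload the configuration.
2. Restart kubelet with service kubelet restart.
3. Run the install commands again.

If that still doesn't help, reboot the Master host and run the install commands again.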
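You can script the wait instead of re-running the command by hand. In this sketch, `pods_not_running` and `wait_for_pods` are hypothetical helpers. The first parses the table format shown above, where STATUS is the 4th column when `--all-namespaces` is used. The retry interval and limit are arbitrary choices:

```shell
#!/bin/sh
# Hypothetical helper: read "kubectl get pods --all-namespaces" output on
# stdin and print NAMESPACE/NAME for every pod whose STATUS (4th column)
# is not "Running". It looks only at STATUS, not at the READY column.
pods_not_running() {
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 }'
}

# Hypothetical polling loop: retry every 10 s, give up after 60 tries (~10 min).
wait_for_pods() {
  n=0
  while [ "$n" -lt 60 ]; do
    if out=$(kubectl get pods --all-namespaces 2>/dev/null); then
      pending=$(printf '%s\n' "$out" | pods_not_running)
      if [ -z "$pending" ]; then
        echo "all pods Running"
        return 0
      fi
      printf 'waiting for:\n%s\n' "$pending"
    fi
    n=$((n + 1))
    sleep 10
  done
  return 1
}
```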

3.5 Master isolation

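By default, for security reasons, the cluster does not schedule pods on the Master node. In a development environment, however, we may have only a single Master node. In that case, lift the restriction with: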

kubectl taint nodes --all node-role.kubernetes.io/master-

Output:

node/ubuntu1 untainted

3.6 Join the worker nodes to the master

On each worker node, run the kubeadm join command that kubeadm init printed on the master. It has the form kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash>. Use the join command from your own successful init:

kubeadm join 192.168.0.6:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:5e44393289eb7e463479f93327b2593a45b32fa8afb8978d878e0f2c9bf8e29b

If you have forgotten the Master's --token, list it with:

sudo kubeadm token list

Output:

TOKEN                     TTL       EXPIRES                     USAGES                   DESCRIPTION   EXTRA GROUPS
pe9eow.4wywpjhj9txkvef9   23h       2019-01-08T11:28:44+08:00   authentication,signing      <none>   system:bootstrappers:kubeadm:default-node-token

By default a token is valid for 24 hours. If it has expired, generate a new one with:

kubeadm token create

Output:

pe9eow.4wywpjhj9txkvef9

If you also lack the --discovery-token-ca-cert-hash value, compute it with:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

Output:

5e44393289eb7e463479f93327b2593a45b32fa8afb8978d878e0f2c9bf8e29b
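You can combine the token and hash obtained above into a join command, with a quick format check first. In this sketch, `build_join_cmd` is a hypothetical helper. The token pattern (6 and 16 lowercase alphanumerics joined by a dot) is the bootstrap token format. The hash is the 64 hex digits printed by the openssl pipeline:

```shell
#!/bin/sh
# Hypothetical helper: assemble a "kubeadm join" command after checking
#   token: 6 + 16 lowercase alphanumerics separated by "." (bootstrap token format)
#   hash : 64 hex characters (SHA-256 of the CA public key, without "sha256:")
build_join_cmd() {
  endpoint="$1"; token="$2"; hash="$3"
  if ! printf '%s\n' "$token" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'; then
    echo "bad token: $token" >&2
    return 1
  fi
  if ! printf '%s\n' "$hash" | grep -Eq '^[0-9a-f]{64}$'; then
    echo "bad hash: $hash" >&2
    return 1
  fi
  printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
    "$endpoint" "$token" "$hash"
}

# Usage on the Master:
#   TOKEN=$(sudo kubeadm token create)
#   HASH=$(openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //')
#   build_join_cmd 192.168.0.6:6443 "$TOKEN" "$HASH"
```

kubeadm also provides `kubeadm token create --print-join-command`, which prints a ready-made join command in one step.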

Running kubeadm join prints:

[preflight] running pre-flight checks
  [WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{} ip_vs:{} ip_vs_rr:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support

[discovery] Trying to connect to API Server "192.168.0.6:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.0.6:6443"
[discovery] Requesting info from "https://192.168.0.6:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.0.8:6443"
[discovery] Successfully established connection with API Server "192.168.0.6:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.12" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "ubuntu2" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

Now check the node status on the Master with kubectl get nodes:

sudo kubectl get nodes

Output:

NAME         STATUS   ROLES    AGE   VERSION
seeta-0002   Ready    <none>   1d    v1.12.1
seeta-02     Ready    <none>   1d    v1.12.1
seeta-03     Ready    master   1d    v1.12.1

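To use kubectl on the worker nodes as well, copy /etc/kubernetes/admin.conf from the Master to every worker node with the commands below. 192.168.0.7 is one of my worker nodes; replace it with your own worker node IP: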

# Run on the Master:
sudo scp /etc/kubernetes/admin.conf 192.168.0.7:/etc/kubernetes/admin.conf
# Run on the worker node (192.168.0.7):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubectl now works on every node.

3.7 Testing

3.7.1 Verify kube-apiserver, kube-controller-manager, kube-scheduler and the pod network

Deploy an Nginx Deployment with 3 Pods, one for each of the three machines in this cluster:

sudo kubectl create deployment nginx --image=nginx:alpine
sudo kubectl scale deployment nginx --replicas=3

Verify that the Nginx Pods are running and have been assigned cluster IPs starting with 10.244.:

sudo kubectl get pods -l app=nginx -o wide

Output:

NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE
nginx-65d5c4f7cc-652hd   1/1     Running   1          11d   10.244.0.26   seeta-03     <none>
nginx-65d5c4f7cc-csjdl   1/1     Running   0          11d   10.244.2.2    seeta-0002   <none>
nginx-65d5c4f7cc-s4n78   1/1     Running   0          11d   10.244.1.2    seeta-02     <none>
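You can automate the check that every Nginx Pod received a 10.244. address. In this sketch, `pods_outside_cidr` is a hypothetical helper that relies on IP being the 6th column of the `-o wide` output shown above:

```shell
#!/bin/sh
# Hypothetical check: read "kubectl get pods -o wide" output (single
# namespace, so IP is the 6th column) on stdin and print NAME and IP of any
# pod whose IP does not start with 10.244., the pod CIDR set in kubeadm.conf.
pods_outside_cidr() {
  awk 'NR > 1 && $6 !~ /^10\.244\./ { print $1, $6 }'
}

# Usage on the Master (no output means every pod is inside 10.244.0.0/16):
#   sudo kubectl get pods -l app=nginx -o wide | pods_outside_cidr
```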
3.7.2 Verify that kube-proxy works:

Expose the service outside the cluster as a NodePort (see the official documentation):

sudo kubectl expose deployment nginx --port=80 --type=NodePort
# Show the port reachable from outside the cluster:
sudo kubectl get services nginx

Output:

NAME    TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
nginx   NodePort   10.97.226.246   <none>        80:31504/TCP   11d

The service can be reached from outside the cluster at any NodeIP:Port. The three nodes in this example are 192.168.0.6, 192.168.0.5 and 192.168.0.7:

curl http://192.168.0.5:31504
curl http://192.168.0.6:31504
curl http://192.168.0.7:31504
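To avoid checking every node by hand, a loop can request the page from each node and report the HTTP status. In this sketch, `check_nodeport` is a hypothetical helper, and the 5-second timeout is an arbitrary choice:

```shell
#!/bin/sh
# Hypothetical check: request http://IP:PORT/ on every node given and print
# the HTTP status code for each; returns non-zero if any node is not 200.
# curl prints 000 for -w '%{http_code}' when the connection fails.
check_nodeport() {
  port="$1"
  shift
  rc=0
  for ip in "$@"; do
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://$ip:$port/")
    echo "$ip:$port -> $code"
    [ "$code" = "200" ] || rc=1
  done
  return "$rc"
}

# Usage with the NodePort shown above:
#   check_nodeport 31504 192.168.0.5 192.168.0.6 192.168.0.7
```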
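3.7.3 Verify DNS and the pod network:
# Start a pod from the radial/busyboxplus:curl image (busybox plus curl) and enter interactive mode
sudo kubectl run curl --image=radial/busyboxplus:curl -it
# Run `nslookup nginx` to check that the name resolves to the in-cluster IP, which verifies DNS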
nslookup nginx

Output like the following means DNS works:

Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      nginx
Address 1: 10.97.226.246 nginx.default.svc.cluster.local

Access the service by name to verify kube-proxy, continuing with:

curl http://nginx/

Output:

<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>body {width: 35em;margin: 0 auto;font-family: Tahoma, Verdana, Arial, sans-serif;}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p><p>For online documentation and support please refer to
<a href="/">nginx</a>.<br/>
Commercial support is available at
<a href="/">nginx</a>.</p><p><em>Thank you for using nginx.</em></p>
</body>
</html>

Access each of the three Pods' cluster IPs to verify networking across Nodes:

curl http://10.244.0.26/
curl http://10.244.2.2/
curl http://10.244.1.2/

When testing is done, press Ctrl+D to leave interactive mode, then delete the curl deployment:

sudo kubectl delete deploy curl

Output:

deployment.extensions "curl" deleted

3.8 Uninstall and clean up Kubernetes

Run the following commands as root (untested):

kubeadm reset -f
modprobe -r ipip
lsmod
rm -rf ~/.kube/
rm -rf /etc/kubernetes/
rm -rf /etc/systemd/system/kubelet.service.d
rm -rf /etc/systemd/system/kubelet.service
rm -rf /usr/bin/kube*
rm -rf /etc/cni
rm -rf /opt/cni
rm -rf /var/lib/etcd
rm -rf /var/etcd

4. Install Helm

4.1 Install the Helm client on the Master

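Download Helm 2.5 or later from the official site, preferably a recent release. The steps below use v2.12.1 as the example. Download helm-v2.12.1-linux-amd64.tar.gz from the official site; from mainland China this requires a proxy, and a Baidu Cloud link for v2.12.1 is provided. Then run on the Master: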

tar -zxvf helm-v2.12.1-linux-amd64.tar.gz
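# After extraction the directory is named linux-amd64
sudo mv linux-amd64/helm /usr/local/bin/helm
# Check that the Helm client is installed. Only the client version is shown for now, because the server is not installed yet.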
sudo helm version

Output:

Client: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}

helm has many subcommands and flags. To use it more efficiently from the command line, install helm's bash completion script:

helm completion bash > ~/.helmrc
echo "source ~/.helmrc" >> ~/.bashrc
source ~/.bashrc

4.2 Install the Helm Tiller server on the Master

Pull the image from a mirror inside China:

sudo docker pull registry-hangzhou.aliyuncs/google_containers/tiller:v2.12.1
sudo docker tag registry-hangzhou.aliyuncs/google_containers/tiller:v2.12.1 gcr.io/kubernetes-helm/tiller:v2.12.1

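Configure permissions by creating a tiller ServiceAccount and binding it to cluster-admin:

sudo kubectl create serviceaccount --namespace kube-system tiller
sudo kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller

Deploy Tiller with Helm:

helm init -i registry-hangzhou.aliyuncs/google_containers/tiller:v2.12.1

The tiller-deploy Deployment only exists after helm init, so patch it afterwards to use the tiller ServiceAccount:

sudo kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'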

Run sudo helm version again; it may take a moment. Output like the following means Helm is deployed:

Client: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.1", GitCommit:"02a47c7249b1fc6d8fd3b94e6b4babf9d818144e", GitTreeState:"clean"}
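Because the Tiller image was pulled from a mirror, it is worth confirming that the client and server report the same version, since Helm 2 expects them to be compatible. In this sketch, `helm_versions_match` is a hypothetical helper. It parses the Helm v2 `helm version` format shown above and succeeds only when both SemVer values are present and equal:

```shell
#!/bin/sh
# Hypothetical check: read Helm v2 "helm version" output on stdin, print the
# Client and Server SemVer values, and succeed only if both exist and match.
helm_versions_match() {
  in=$(cat)
  client=$(printf '%s\n' "$in" | sed -n 's/^Client: .*SemVer:"\([^"]*\)".*/\1/p')
  server=$(printf '%s\n' "$in" | sed -n 's/^Server: .*SemVer:"\([^"]*\)".*/\1/p')
  echo "client=$client server=$server"
  [ -n "$client" ] && [ "$client" = "$server" ]
}

# Usage on the Master:
#   sudo helm version | helm_versions_match && echo "Helm is ready"
```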

5. Polyaxon

Polyaxon's official installation documentation

5.1 Deploy the Polyaxon server

On the Master, create the polyaxon namespace:

sudo kubectl create namespace polyaxon

Output:

namespace "polyaxon" created

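Polyaxon is configured through a file. You can create a config.yml or polyaxon_config.yml and fill it in based on its GitHub repository, the official site and your own needs. A YAML generator is also provided. Here I deploy with the default configuration: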

helm repo add polyaxon 
helm repo update
sudo helm install polyaxon/polyaxon --name=polyaxon --namespace=polyaxon

After about 3 to 5 minutes, the following is printed:

NOTES:
Polyaxon is currently running:

1. Get the application URL by running these commands:

  NOTE: It may take a few minutes for the LoadBalancer IP to be available.
        You can watch the status by running:
        'kubectl get --namespace polyaxon svc -w polyaxon-polyaxon-api'

  export POLYAXON_IP=$(kubectl get svc --namespace polyaxon polyaxon-polyaxon-api -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  export POLYAXON_HTTP_PORT=80
  export POLYAXON_WS_PORT=1337

  echo http://$POLYAXON_IP:$POLYAXON_HTTP_PORT

2. Setup your cli by running theses commands:
  polyaxon config set --host=$POLYAXON_IP --http_port=$POLYAXON_HTTP_PORT  --ws_port=$POLYAXON_WS_PORT

3. Log in with superuser

  USER: root
  PASSWORD: Get login password with

    kubectl get secret --namespace polyaxon polyaxon-polyaxon-secret -o jsonpath="{.data.POLYAXON_ADMIN_PASSWORD}" | base64 --decode

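The NOTES contain three items: how to get Polyaxon's address, how to configure the Polyaxon CLI, and the login credentials. The first command must be changed, otherwise it fails. On this bare-metal cluster the LoadBalancer service never gets an external IP (EXTERNAL-IP stays <pending>). So replace export POLYAXON_IP=$(kubectl get svc --namespace polyaxon polyaxon-polyaxon-api -o jsonpath='{.status.loadBalancer.ingress[0].ip}') with export POLYAXON_IP=$(kubectl get svc --namespace polyaxon polyaxon-polyaxon-api -o jsonpath='{.spec.clusterIP}'). Switch to root (su -) and run the commands in order: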

export POLYAXON_IP=$(kubectl get svc --namespace polyaxon polyaxon-polyaxon-api -o jsonpath='{.spec.clusterIP}')
export POLYAXON_HTTP_PORT=80
export POLYAXON_WS_PORT=1337
echo http://$POLYAXON_IP:$POLYAXON_HTTP_PORT
# The last command prints something like: http://10.107.85.252:80

5.2 Polyaxon client

On the Master, install the client:

# For Python 2, use pip instead
pip3 install -U polyaxon-cli
polyaxon config set --host=$POLYAXON_IP --http_port=$POLYAXON_HTTP_PORT  --ws_port=$POLYAXON_WS_PORT

Switch to root (su -) and log in with the client. The default username is root and the password is rootpassword:

polyaxon login --username=root
Please enter your password:
# Login successful

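If you are unsure of the password, run the last command from item 3 of the NOTES printed during deployment. The password is printed without a trailing newline, directly in front of the shell prompt, so it is easy to miss: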

kubectl get secret --namespace polyaxon polyaxon-polyaxon-secret -o jsonpath="{.data.POLYAXON_ADMIN_PASSWORD}" | base64 --decode
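Because the decoded password has no trailing newline, it lands directly in front of the next prompt. A small wrapper that appends a newline avoids that. `decode_secret` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: base64-decode a value (e.g. a Kubernetes secret field)
# and end the output with a newline so it does not run into the prompt.
decode_secret() {
  printf '%s' "$1" | base64 --decode
  echo
}

# Usage on the Master:
#   decode_secret "$(kubectl get secret --namespace polyaxon polyaxon-polyaxon-secret \
#       -o jsonpath='{.data.POLYAXON_ADMIN_PASSWORD}')"
```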

Check which NodePort the polyaxon-api service is mapped to:

kubectl get service -n polyaxon

Output:

NAME                       TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                 AGE
polyaxon-docker-registry   NodePort       10.98.185.92     <none>        5000:31813/TCP                          6m26s
polyaxon-polyaxon-api      LoadBalancer   10.107.85.252    <pending>     80:30945/TCP,1337:31122/TCP             6m26s
polyaxon-postgresql        ClusterIP      10.98.103.223    <none>        5432/TCP                                6m26s
polyaxon-rabbitmq          ClusterIP      10.103.114.255   <none>        4369/TCP,5672/TCP,25672/TCP,15672/TCP   6m26s
polyaxon-redis             ClusterIP      10.106.3.31      <none>        6379/TCP                                6m26s

Note: the PORT(S) of polyaxon-polyaxon-api is 80:30945, so 30945 is the service's NodePort. Open http://192.168.0.6:30945/ in a browser to use Polyaxon.

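Tip: to install NFS and create a persistent volume and persistent volume claim for persistent storage, see this article and the official site. Note that I could not get it deployed with that article's configuration.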

5.3 Uninstall Polyaxon

Delete polyaxon:

helm delete polyaxon --purge

If the polyaxon installation did not complete successfully, delete it with:

helm delete polyaxon --purge --no-hooks

Delete the namespace:

kubectl delete namespace polyaxon