Background
The official way to install a Kubernetes cluster is kubeadm, but it is fairly complex and tedious, which has led to newer installation methods such as k3s and RKE2.
k3s has a very large community and is extremely simple to install. It is designed as a lightweight Kubernetes that runs well on IoT and edge-computing devices.
According to the RKE2 documentation, RKE2 inherits the usability, ease of operation, and deployment model of k3s while staying closely aligned with upstream Kubernetes. In some places k3s diverges from upstream Kubernetes (it patches some components) in order to optimize for edge deployments; RKE2 instead ships with secure defaults and conforms to various security benchmarks, at the cost of a somewhat more involved deployment than k3s.
Overall, both k3s and RKE2 are viable choices for production; if security is your priority, choose RKE2.
Hardware requirements
A high-availability cluster consists of:
- A fixed registration address
- An odd number of server (control-plane) nodes (a minimum of 3 is recommended) running etcd, the Kubernetes API, and the other control-plane services
- (Optional) Zero or more agent nodes running your applications and services
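The odd-number requirement comes from etcd's quorum rule: the cluster stays available only while a strict majority of server nodes are up. A quick sketch of the arithmetic:

```shell
# etcd needs a majority (quorum) of servers to keep serving.
# quorum(n) = floor(n/2) + 1; tolerated failures = n - quorum(n).
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  echo "servers=$n quorum=$quorum tolerated_failures=$(( n - quorum ))"
done
```

With three servers the cluster tolerates one server failure, which is why three is the recommended minimum; an even count adds a node without adding fault tolerance.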
This guide uses Ubuntu Server 22.04 machines with 2 CPU cores and 4 GB RAM.
The IP addresses are:
192.168.100.136 (first server node)
192.168.100.141 (server node)
192.168.100.142 (server node)
192.168.100.137 (agent node, optional)
192.168.100.138 (agent node, optional)
All machines can ping each other.
The commands below require sudo privileges or a root shell.
Configuring the first server node
The node IP is 192.168.100.136.
Fetch the RKE2 installer:
$ curl -sfL https://rancher-mirror.rancher.cn/rke2/install.sh | INSTALL_RKE2_MIRROR=cn sh -
Create a custom configuration file:
$ mkdir -p /etc/rancher/rke2
$ vim /etc/rancher/rke2/config.yaml
Write the following content:
- token: an arbitrary token that identifies the cluster
- node-name: the node name; it must be globally unique because it is used for DNS routing
- tls-san: extra hostnames or IPv4/IPv6 addresses added as subject alternative names on the TLS certificate; set it to this machine's IP to avoid certificate errors against the fixed registration address
- system-default-registry: use a domestic (China) mirror registry
token: demo-server
node-name: demo-server-node
tls-san: 192.168.100.136
system-default-registry: "registry.cn-hangzhou.aliyuncs.com"
Start the service (rke2-server is managed by systemd, so it restarts automatically after a reboot or if the process crashes or is killed):
$ systemctl start rke2-server
$ systemctl enable rke2-server
After the service starts, a file containing the cluster information is generated at /etc/rancher/rke2/rke2.yaml.
Inspect the installed binaries. RKE2 installs them under /var/lib/rancher/rke2/bin/, which is not on $PATH:
$ ll /var/lib/rancher/rke2/bin/
total 300396
drwxr-xr-x 2 root root 4096 Oct 8 08:41 ./
drwxr-xr-x 4 root root 4096 Oct 8 08:41 ../
-rwxr-xr-x 1 root root 57352072 Oct 8 08:41 containerd*
-rwxr-xr-x 1 root root 7381616 Oct 8 08:41 containerd-shim*
-rwxr-xr-x 1 root root 11606088 Oct 8 08:41 containerd-shim-runc-v1*
-rwxr-xr-x 1 root root 11626984 Oct 8 08:41 containerd-shim-runc-v2*
-rwxr-xr-x 1 root root 24838144 Oct 8 08:41 crictl*
-rwxr-xr-x 1 root root 20586248 Oct 8 08:41 ctr*
-rwxr-xr-x 1 root root 48570656 Oct 8 08:41 kubectl*
-rwxr-xr-x 1 root root 114644328 Oct 8 08:41 kubelet*
-rwxr-xr-x 1 root root 10973592 Oct 8 08:41 runc*
Add the directory to the global PATH:
$ vim /etc/profile.d/rke2.sh
# add the following line
export PATH=$PATH:/var/lib/rancher/rke2/bin
Reload the environment:
$ source /etc/profile
kubectl is now on the PATH, but running it fails:
$ kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
This happens because kubectl does not know where RKE2 put its kubeconfig file.
You can point kubectl at it temporarily via an environment variable:
$ KUBECONFIG=/etc/rancher/rke2/rke2.yaml kubectl get nodes
Or make it permanent by adding another line to /etc/profile.d/rke2.sh:
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
If kubectl then fails with:
error: error loading config file "/etc/rancher/rke2/rke2.yaml": open /etc/rancher/rke2/rke2.yaml: permission denied
fix the file ownership:
$ sudo chown $USER:$USER /etc/rancher/rke2/rke2.yaml
The rke2-server process on this node listens on port 9345 for new node registrations (this is separate from the Kubernetes API, which serves on the usual port 6443). Other nodes join the cluster by pointing at it with a line in their /etc/rancher/rke2/config.yaml:
server: https://192.168.100.136:9345
Verify that you now have a single-node cluster:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
demo-server-node Ready control-plane,etcd,master 6m27s v1.24.6+rke2r1
Configuring the other server nodes
The procedure is the same as for the first server node, except for a few parameters in /etc/rancher/rke2/config.yaml.
Write the following content:
- server: the address of the first server node; note that the scheme is https
- token: must match the token in /etc/rancher/rke2/config.yaml on the first server node (192.168.100.136)
- node-name: the node name; each server node needs a distinct name
server: https://192.168.100.136:9345
token: demo-server
node-name: demo-server-node-2
tls-san: 192.168.100.136
system-default-registry: "registry.cn-hangzhou.aliyuncs.com"
Once all three server nodes are configured, run the following on any of them. Output like this means the high-availability control plane is up:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
demo-server-node Ready control-plane,etcd,master 22m v1.24.6+rke2r1
demo-server-node-2 Ready control-plane,etcd,master 4m5s v1.24.6+rke2r1
demo-server-node-3 Ready control-plane,etcd,master 22m v1.24.6+rke2r1
Agent node configuration
Fetch the installer:
$ curl -sfL https://rancher-mirror.oss-cn-beijing.aliyuncs.com/rke2/install.sh | INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_TYPE="agent" sh -
Create the configuration file:
$ mkdir -p /etc/rancher/rke2/
$ vim /etc/rancher/rke2/config.yaml
Write the following content:
- server: the address of the first server node; note that the scheme is https
- node-name: the node name; each agent node needs a distinct name
- token: must match the token in /etc/rancher/rke2/config.yaml on the first server node (192.168.100.136)
server: https://192.168.100.136:9345
node-name: demo-agent1
token: demo-server
system-default-registry: "registry.cn-hangzhou.aliyuncs.com"
Start the agent and enable it at boot:
$ systemctl start rke2-agent
$ systemctl enable rke2-agent
The two agents are set up the same way; the only difference is the node-name parameter in /etc/rancher/rke2/config.yaml.
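Since the two agent configs differ only in node-name, it can be convenient to render them from a single template. A minimal sketch (the render_agent_config helper and the /tmp output paths are illustrative, not part of RKE2):

```shell
# Render per-agent config files from one template; only node-name varies.
render_agent_config() {
  cat <<EOF
server: https://192.168.100.136:9345
node-name: $1
token: demo-server
system-default-registry: "registry.cn-hangzhou.aliyuncs.com"
EOF
}
render_agent_config demo-agent1 > /tmp/config-agent1.yaml
render_agent_config demo-agent2 > /tmp/config-agent2.yaml
# The only differing lines should be the node-name lines:
diff /tmp/config-agent1.yaml /tmp/config-agent2.yaml | grep '^[<>]'
```

Each rendered file would then be copied to /etc/rancher/rke2/config.yaml on its agent before starting rke2-agent.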
Verifying the cluster
On a server node, run:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
demo-agent1 Ready <none> 109s v1.24.6+rke2r1
demo-agent2 Ready <none> 66s v1.24.6+rke2r1
demo-server-node Ready control-plane,etcd,master 36m v1.24.6+rke2r1
demo-server-node-2 Ready control-plane,etcd,master 17m v1.24.6+rke2r1
demo-server-node-3 Ready control-plane,etcd,master 35m v1.24.6+rke2r1
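A quick way to confirm that every expected node has registered is to count the rows whose STATUS column is Ready. The sample output is inlined below so the parsing can be shown without a live cluster; on a real server node you would pipe `kubectl get nodes --no-headers` in instead:

```shell
# Count nodes whose STATUS (second column) is Ready.
sample='demo-agent1 Ready <none> 109s v1.24.6+rke2r1
demo-agent2 Ready <none> 66s v1.24.6+rke2r1
demo-server-node Ready control-plane,etcd,master 36m v1.24.6+rke2r1
demo-server-node-2 Ready control-plane,etcd,master 17m v1.24.6+rke2r1
demo-server-node-3 Ready control-plane,etcd,master 35m v1.24.6+rke2r1'
ready=$(printf '%s\n' "$sample" | awk '$2 == "Ready" { c++ } END { print c + 0 }')
echo "Ready nodes: $ready"   # expect 5: three servers plus two agents
```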
Running kubectl as a non-root user
Running kubectl as an unprivileged user fails with a permissions error:
$ kubectl get nodes
error: error loading config file "/etc/rancher/rke2/rke2.yaml": open /etc/rancher/rke2/rke2.yaml: permission denied
Fix the file ownership:
$ sudo chown $USER:$USER /etc/rancher/rke2/rke2.yaml
Configuring kubectl completion
Install bash-completion:
$ sudo apt-get install bash-completion
Add the following two lines to ~/.bashrc; they are what makes command completion work:
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
Apply the change to the current shell:
$ source ~/.bashrc
Configuring image registries
Configure a domestic (China) mirror to speed up image pulls, and optionally a private registry.
Edit /etc/rancher/rke2/registries.yaml and write:
mirrors:
  docker.io:
    endpoint:
      - "https://docker.mirrors.ustc.edu.cn"
configs:
  "harbor.example.com":
    auth:
      username: user
      password: pass
    tls:
      insecure_skip_verify: true
      # cert_file: # path to the cert file used to authenticate to the registry
      # key_file: # path to the key file for the certificate used to authenticate to the registry
      # ca_file: # path to the ca file used to verify the registry's certificate
The mirrors section
- When an image is pulled, docker.io is redirected to the domestic mirror https://docker.mirrors.ustc.edu.cn
The configs section
This section configures private registries, such as a self-hosted harbor. If you have no private registry, the configs section can be omitted.
- harbor.example.com: the address of the registry
- username and password under the auth block: the registry's login credentials
- If the registry is served over https (i.e. uses TLS), fill in the tls block
- To skip CA verification, set insecure_skip_verify: true under tls; to verify the certificate instead, set the cert_file, key_file, and ca_file parameters
Every node needs this file so that it can pull images.
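Before copying registries.yaml to every node, a crude sanity check can catch typos. This sketch writes a copy to a temp path (the real path is /etc/rancher/rke2/registries.yaml) and greps for the mirror endpoint:

```shell
# Write a sample registries.yaml to a temp path and check the mirror endpoint.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
mirrors:
  docker.io:
    endpoint:
      - "https://docker.mirrors.ustc.edu.cn"
EOF
grep -q 'docker.mirrors.ustc.edu.cn' "$cfg" && echo "mirror endpoint present"
```

RKE2 only picks up changes to this file when the rke2-server or rke2-agent service is restarted, so run the check before restarting.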
Verifying the cluster with a workload
On any server node, create a file deployment-nginx.yaml with the following content.
(Replacing the spec.template.spec.containers[].image field with a private image address is a good way to verify the private registry configuration.)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment-nginx
  labels:
    app: nginx
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: demo-nginx
        image: nginx
        ports:
        - containerPort: 80
Apply it:
$ kubectl apply -f deployment-nginx.yaml
Check the Deployment status:
$ kubectl get deployments.apps -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
demo-deployment-nginx 10/10 10 10 9m16s demo-nginx nginx app=nginx
Check the pod status:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
demo-deployment-nginx-c598b987f-4qdqh 1/1 Running 0 101s 10.42.1.32 demo-server-node-3 <none> <none>
demo-deployment-nginx-c598b987f-89tvw 1/1 Running 0 101s 10.42.0.29 demo-server-node <none> <none>
demo-deployment-nginx-c598b987f-bgqlw 1/1 Running 0 101s 10.42.1.31 demo-server-node-3 <none> <none>
demo-deployment-nginx-c598b987f-hp692 1/1 Running 0 101s 10.42.0.26 demo-server-node <none> <none>
demo-deployment-nginx-c598b987f-kg8fc 1/1 Running 0 101s 10.42.0.30 demo-server-node <none> <none>
demo-deployment-nginx-c598b987f-kvk4v 1/1 Running 0 101s 10.42.0.27 demo-server-node <none> <none>
demo-deployment-nginx-c598b987f-lg9wc 1/1 Running 0 101s 10.42.2.31 demo-server-node-2 <none> <none>
demo-deployment-nginx-c598b987f-nnjkv 1/1 Running 0 101s 10.42.0.28 demo-server-node <none> <none>
demo-deployment-nginx-c598b987f-wbnlz 1/1 Running 0 101s 10.42.1.35 demo-server-node-3 <none> <none>
demo-deployment-nginx-c598b987f-wmt94 1/1 Running 0 101s 10.42.0.31 demo-server-node <none> <none>
Uninstalling
If you need to join a node to another cluster or change the first server node, the simplest approach is to uninstall RKE2 and reinstall; the procedure is the same for server and agent nodes.
The uninstall script is located at /usr/local/bin/rke2-uninstall.sh or /usr/bin/rke2-uninstall.sh.
Once you have confirmed where the script lives, run:
$ /usr/local/bin/rke2-uninstall.sh
# or
$ rke2-uninstall.sh
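The two candidate paths can be probed in one go; this loop (a convenience sketch, not from the RKE2 docs) reports whichever copy exists on the machine:

```shell
# Probe both known install locations for the uninstall script.
found=""
for p in /usr/local/bin/rke2-uninstall.sh /usr/bin/rke2-uninstall.sh; do
  if [ -x "$p" ]; then found="$p"; fi
done
if [ -n "$found" ]; then
  echo "uninstall script: $found"
else
  echo "rke2-uninstall.sh not found on this machine"
fi
```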