Rook-ceph

Rook Overview

Rook (https://rook.io/) is a cloud-native storage orchestration system governed by the CNCF community. Rook is not itself a storage system; rather, it automates the deployment and operation of storage software through Kubernetes. For example, the rook-ceph project defines an operator and CRD resource objects in Kubernetes that manage a Ceph cluster.

Storage backends currently supported by Rook:

  • Ceph
  • EdgeFS
  • CockroachDB
  • Cassandra
  • NFS
  • Yugabyte DB

rook-ceph is currently the most mature of these. With rook-ceph, Ceph can be deployed on Kubernetes very easily and then managed entirely through Kubernetes resource objects.
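As a quick illustration of that model, once rook-ceph is installed the Ceph cluster and its pools, filesystems, and object stores all appear as ordinary Kubernetes objects (a sketch; the exact CRD list depends on the Rook version):

kubectl get crd | grep ceph.rook.io
kubectl -n rook-ceph get cephcluster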

Rook Architecture

Deploying Ceph with Rook

Environment Overview

Software     Version
CentOS       7.7
Kubernetes   1.17.4
Rook         v1.3

Each node has a spare 50 GB disk reserved for use as an OSD device.

Node disk layout:

lsblk -f
NAME   FSTYPE LABEL UUID                                 MOUNTPOINT
vda
└─vda1 ext4         995d4542-f0dd-47e6-90eb-690de3b64430 /
vdb
The vdb disk on each node will be used for the Ceph OSDs.

git clone https://github.com/rook/rook.git -b release-1.3

If the clone is slow, you can use this mirror instead:

git clone https://gitee.com/wanshaoyuan/rook.git -b release-1.3

Deployment

cd rook/cluster/examples/kubernetes/ceph

kubectl create -f common.yaml
kubectl create -f operator.yaml
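Before creating the cluster, it is worth confirming the operator is up (a sketch; the label selectors match the release-1.3 manifests):

kubectl -n rook-ceph get pods -l app=rook-ceph-operator
kubectl -n rook-ceph get pods -l app=rook-discover   # one discover pod per node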

The cluster.yaml file contains the initialization configuration for Ceph.

kubectl create -f cluster.yaml
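Bring-up takes a few minutes and is roughly staged: mon pods first, then mgr, then the osd-prepare jobs and OSD pods. It can be followed with:

kubectl -n rook-ceph get pods -w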

Parameters:
By default, rook-ceph uses every node in the cluster, and every free disk on those nodes, as OSDs. This is not recommended in production; specify the nodes and devices explicitly instead.

useAllDevices: true   # use every free device on the host as a ceph-osd disk
useAllNodes: true     # use every node in the cluster as a ceph node

Configuring specific devices on specific nodes

Set useAllNodes and useAllDevices to false:

nodes:
- name: "172.24.234.128"
  devices: # specific devices to use for storage can be specified for each node
  - name: "vdb"
- name: "172.24.234.147"
  devices:
  - name: "vdb"
- name: "172.24.234.156"
  devices:
  - name: "vdb"

Note: the name values under nodes must exactly match the node names shown by kubectl get node; if the nodes display as IPs, use IPs, and if they display as hostnames, use hostnames.
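Before filling in the nodes list, check how the cluster names its nodes, as the note above requires (a sketch):

kubectl get nodes
# Copy the values from the NAME column verbatim into cluster.yaml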

Accessing the Ceph Dashboard

kubectl apply -f dashboard-external-https.yaml

Get the access port:

kubectl get svc/rook-ceph-mgr-dashboard-external-https -n rook-ceph

NAME                                      TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
rook-ceph-mgr-dashboard-external-https    NodePort   10.43.117.2   <none>        8443:30519/TCP   53s

Get the access password:

kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 --decode
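Putting the pieces together, the dashboard can then be reached on any node (a sketch; 30519 is the NodePort from the example above and will differ per cluster, and admin is the default dashboard user):

PASSWORD=$(kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
  -o jsonpath='{.data.password}' | base64 --decode)
echo "URL:  https://<node-ip>:30519"
echo "User: admin  Password: $PASSWORD"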

Check the cluster health status:

kubectl get CephCluster -n rook-ceph
NAME        DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH
rook-ceph   /var/lib/rook     3          20m   Ready   Cluster updated successfully   HEALTH_OK
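For lower-level status than the CRD reports, the Rook toolbox pod (toolbox.yaml, shipped in the same examples directory of release-1.3) provides the ceph CLI; a sketch:

kubectl create -f toolbox.yaml
TOOLS_POD=$(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- ceph status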

Create a storage pool and StorageClass

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  # clusterID is the namespace where the rook cluster is running
  clusterID: rook-ceph
  # Ceph pool into which the RBD image shall be created
  pool: replicapool

  # RBD image format. Defaults to "2".
  imageFormat: "2"

  # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
  imageFeatures: layering

  # The secrets contain Ceph admin credentials.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # Specify the filesystem type of the volume. If not specified, csi-provisioner
  # will set default as `ext4`.
  csi.storage.k8s.io/fstype: xfs

# Delete the rbd volume when a PVC is deleted
reclaimPolicy: Delete
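Save the manifest (storageclass.yaml is an assumed filename) and apply it; a throwaway PVC can then verify that dynamic provisioning works:

kubectl apply -f storageclass.yaml

# rbd-test is a made-up PVC name for illustration
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-test
spec:
  storageClassName: rook-ceph-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

kubectl get pvc rbd-test   # should reach Bound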

Deploy a test application

kubectl apply -f /root/rook/cluster/examples/kubernetes/mysql.yaml -f /root/rook/cluster/examples/kubernetes/wordpress.yaml
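Both example manifests request PVCs from the rook-ceph-block StorageClass created above, so success is easy to check (a sketch; PVC names per the example manifests):

kubectl get pvc            # mysql-pv-claim and wp-pv-claim should be Bound
kubectl get pods -l app=wordpress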

Adding storage nodes

The disk must carry a GPT partition table, so initialize it first:

parted -s /dev/xxxx mklabel gpt

sgdisk --zap-all /dev/xxx
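Before handing the disk to Rook, it is worth confirming that no stale signatures remain (a sketch; /dev/vdd is a hypothetical device name):

lsblk -f /dev/vdd   # a clean disk shows no FSTYPE and no partitions
wipefs /dev/vdd     # with no options, wipefs only lists remaining signatures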

Edit the cluster resource:

kubectl edit CephCluster/rook-ceph -n rook-ceph

Add the corresponding node and device entries:

- config: null
  devices:
  - config: null
    name: vdb
  name: rke-node5
  resources: {}
- config: null
  devices:
  - config: null
    name: vdb
  name: rke-node6
  resources: {}
- config: null
  devices:
  - config: null
    name: vdb
  name: rke-node7
  resources: {}
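After saving, the operator runs an osd-prepare job on each new node and then starts the new OSD pods; progress can be watched with (a sketch):

kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide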

Common issues

1. RKE deployments

On Kubernetes clusters deployed with Rancher RKE, the kubelet runs inside a container, so the flexvolume plugin directory must be bind-mounted into the kubelet container; otherwise PVCs cannot be mounted into workloads.

Add the following parameters to the kubelet section of the RKE cluster.yml:

services:
  kubelet:
    extra_args:
      volume-plugin-dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
    extra_binds:
      - /usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec

2. Ubuntu 16.04 deployments

Ubuntu 16.04's default 4.4 kernel cannot map RBD volumes into workloads (the kernel reports missing image features), so the kernel must be upgraded to 4.15. Upgrade steps:

uname -a
Linux kworker2 4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

apt-get install --install-recommends linux-generic-hwe-16.04

reboot

uname -a
Linux kworker2 4.15.0-60-generic #67~16.04.1-Ubuntu SMP Mon Aug 26 08:57:33 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Tearing down the cluster

kubectl delete -f cluster.yaml
kubectl delete -f operator.yaml
kubectl delete -f common.yaml

Clean up the host directories and disks:

#!/usr/bin/env bash
DISK="/dev/sdb"
# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
# You will have to run this step for all disks.
sgdisk --zap-all $DISK
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync

# These steps only have to be run once on each node
# If rook sets up osds using ceph-volume, teardown leaves some devices mapped that lock the disks.
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
# ceph-volume setup can leave ceph-<UUID> directories in /dev (unnecessary clutter)
rm -rf /dev/ceph-*
rm -rf /var/lib/rook
rm -rf /var/lib/kubelet/plugins
rm -rf /var/lib/kubelet/plugins_registry