For a typical large cluster, the scheduling cluster and the storage cluster are usually separated and communicate with each other over the network.
So we need to build a highly available network storage cluster, and the first choice here is Ceph.
Ceph offers object storage, which stores data as key-value objects, and block devices, comparable to AWS EBS or QingCloud cloud disks.
There is also the Ceph file system, which exposes a richer interface than block storage and has to support directories, file attributes, and so on.
Let's look at what a Ceph storage cluster is made of.
First are the Monitors, which handle authentication between daemons and clients; three monitors are usually needed for a highly available setup.
Next are the Managers, which track runtime metrics and the current state of the Ceph cluster, including storage utilization; two are generally needed for high availability.
OSDs perform the actual data operations; three are needed for high availability.
MDS stores metadata, i.e. it is the component that helps the system find where a given piece of data lives.
So how do we install and use this storage cluster?
Option one is cephadm, which installs the Ceph management components in containers.
Option two is Rook, which is deployed inside Kubernetes and integrates with the Kubernetes API through an operator and custom objects.
So naturally we go with Rook.
The official Rook documentation is here:
https://rook.io/docs/rook/v1.8/quickstart.html
Let's go through the installation guide.
First, there is a hard prerequisite:
we need a clean storage device.
– Raw devices (no partitions or formatted filesystems)
– Raw partitions (no formatted filesystem)
This means we need an additional attached disk for the Ceph cluster to initialize on.
We can buy a disk from the cloud provider and attach it to the corresponding host.
Once attached, the provider tells us the device path; we can also run fdisk -l to find the attached disk ourselves, e.g. /dev/vdc.
Then wipe the disk:
dd if=/dev/zero of=/dev/vdc bs=1M status=progress
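Before moving on, it is worth double-checking that the device really shows no partitions or filesystem. A quick verification step, assuming /dev/vdc as in the example above:

lsblk -f /dev/vdc    # the FSTYPE and MOUNTPOINT columns should be empty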
Next comes the Rook deployment and initialization.
First, clone the corresponding YAML files:
$ git clone --single-branch --branch v1.8.6 https://github.com/rook/rook.git
cd rook/deploy/examples
This directory contains crds.yaml, common.yaml, and operator.yaml.
Next, edit operator.yaml
and swap the images by changing the environment variables in the YAML:
# ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.5.1"
# ROOK_CSI_REGISTRAR_IMAGE: "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.5.0"
# ROOK_CSI_RESIZER_IMAGE: "k8s.gcr.io/sig-storage/csi-resizer:v1.4.0"
# ROOK_CSI_PROVISIONER_IMAGE: "k8s.gcr.io/sig-storage/csi-provisioner:v3.1.0"
# ROOK_CSI_SNAPSHOTTER_IMAGE: "k8s.gcr.io/sig-storage/csi-snapshotter:v5.0.1"
# ROOK_CSI_ATTACHER_IMAGE: "k8s.gcr.io/sig-storage/csi-attacher:v3.4.0"
Then deploy the operator YAML:
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
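Before continuing, it is worth making sure the operator pod is up. A quick check with standard kubectl, assuming the default app=rook-ceph-operator label used by the example manifests:

kubectl -n rook-ceph get pod -l app=rook-ceph-operator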
The operator above is essentially the controller for our custom resources.
A CRD is a custom resource type in Kubernetes; Rook defines its own resource types and works entirely through them.
With this operator in place, the flow looks like this:
Rook creates the StorageClass for us;
when we create a PVC, it automatically calls the provisioner in the StorageClass, which in turn operates on the Ceph cluster;
and Ceph, depending on how the PVC is defined, provides Block, Shared FS, and other storage modes.
Block storage, for example, corresponds to the per-replica folder we used with NFS earlier: single-node read/write, suited to stateful workloads (StatefulSets).
Shared FS is shared storage, i.e. ReadWriteMany with multi-node read/write, suited to stateless applications.
That is the combined capability of Rook and Ceph.
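To make that flow concrete, here is a minimal, hedged sketch of a PVC asking for block storage; the rook-ceph-block StorageClass it names is the one created later in the block-storage section, and the PVC name is purely illustrative:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-block-pvc          # hypothetical name, for illustration only
  namespace: default
spec:
  storageClassName: rook-ceph-block   # the class backed by the Ceph cluster
  accessModes:
    - ReadWriteOnce                    # Block: single-node read/write
  resources:
    requests:
      storage: 1Gi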
Next we edit cluster.yaml so that the disks we designate are used as the storage nodes.
This is the YAML that configures the whole cluster; we mainly adjust the component settings in it, for example:
mon:
  # Set the number of mons to be started. Generally recommended to be 3.
  # For highest availability, an odd number of mons should be specified.
  count: 3
  # The mons should be on unique nodes. For production, at least 3 nodes are recommended for this reason.
  # Mons should only be allowed on the same node for test environments where data loss is acceptable.
  allowMultiplePerNode: false
As stated above, we need three monitors for high availability, so we follow the official documentation and give the cluster three monitors, and then configure the key storage-related settings:
storage: # cluster level storage configuration and selection
  useAllNodes: true
  useAllDevices: true
  #deviceFilter:
  config:
    # crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
    # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
    # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
    # journalSizeMB: "1024"  # uncomment if the disks are 20 GB or smaller
    # osdsPerDevice: "1" # this value can be overridden at the node or device level
    # encryptedDevice: "true" # the default value for this option is "false"
  # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
  # nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
  # nodes:
  #   - name: "172.17.4.201"
  #     devices: # specific devices to use for storage can be specified for each node
  #       - name: "sdb"
  #       - name: "nvme01" # multiple osds can be created on high performance devices
  #         config:
  #           osdsPerDevice: "5"
  #       - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
  #     config: # configuration can be specified at the node level which overrides the cluster level config
  #   - name: "172.17.4.301"
  #     deviceFilter: "^sd."
  # when onlyApplyOSDPlacement is false, will merge both placement.All() and placement.osd
  onlyApplyOSDPlacement: false
# The section for configuring management of daemon disruptions during upgrade or fencing.
First, set osdsPerDevice to three or more (an odd number is recommended);
this configures how many OSDs are created on each node's devices.
Then comes the per-node configuration:
the name of each node must match that node's kubernetes.io/hostname label,
and the name of each device must match the path of the attached device.
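Putting those rules together, a hedged sketch of what the node section might look like (node and device names here are placeholders; when listing nodes explicitly, useAllNodes and useAllDevices must be set to false):

storage:
  useAllNodes: false
  useAllDevices: false
  nodes:
    - name: "k8s-node1"       # must match the node's kubernetes.io/hostname label
      devices:
        - name: "vdc"         # must match the attached device, e.g. /dev/vdc
          config:
            osdsPerDevice: "1"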
Then, depending on your own environment, try replacing the image sources,
and deploy cluster.yaml.
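Deploying it follows the same pattern as before:

kubectl create -f cluster.yaml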
Then check the components; all of the following pods must be present.
# the finished deployment must include these components
NAME                                                 READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-provisioner-d77bb49c6-n5tgs         5/5     Running     0          140s
csi-cephfsplugin-provisioner-d77bb49c6-v9rvn         5/5     Running     0          140s
csi-cephfsplugin-rthrp                               3/3     Running     0          140s
csi-rbdplugin-hbsm7                                  3/3     Running     0          140s
csi-rbdplugin-provisioner-5b5cd64fd-nvk6c            6/6     Running     0          140s
csi-rbdplugin-provisioner-5b5cd64fd-q7bxl            6/6     Running     0          140s
rook-ceph-crashcollector-minikube-5b57b7c5d4-hfldl   1/1     Running     0          105s
rook-ceph-mgr-a-64cd7cdf54-j8b5p                     1/1     Running     0          77s
rook-ceph-mon-a-694bb7987d-fp9w7                     1/1     Running     0          105s
rook-ceph-mon-b-856fdd5cb9-5h2qk                     1/1     Running     0          94s
rook-ceph-mon-c-57545897fc-j576h                     1/1     Running     0          85s
rook-ceph-operator-85f5b946bd-s8grz                  1/1     Running     0          92m
rook-ceph-osd-0-6bb747b6c5-lnvb6                     1/1     Running     0          23s
rook-ceph-osd-1-7f67f9646d-44p7v                     1/1     Running     0          24s
rook-ceph-osd-2-6cd4b776ff-v4d68                     1/1     Running     0          25s
rook-ceph-osd-prepare-node1-vx2rz                    0/2     Completed   0          60s
rook-ceph-osd-prepare-node2-ab3fd                    0/2     Completed   0          60s
rook-ceph-osd-prepare-node3-w4xyz                    0/2     Completed   0          60s
If you find that some component pods are missing, you can delete some of the pods so that they are recreated and run again.
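For example (the pod name is whatever kubectl shows in your cluster, not a fixed value):

kubectl -n rook-ceph get pod
kubectl -n rook-ceph delete pod <name-of-the-missing-or-stuck-pod>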
Next we need to access the dashboard in order to manage the Rook cluster.
Note, however, how the individual Ceph components behave:
all of the OSDs work at the same time, and the same goes for the monitors,
but only one MGR is active at any given moment.
So if the corresponding Service is exposed directly via a LoadBalancer, it will be intermittently unreachable.
Instead, we configure our own Ingress and Service for access.
For Ingress access across the whole cluster we can set things up as follows:
we have one IngressController that manages all of the Ingress configuration,
and we create an Ingress with a defaultBackend rule, which holds all of the TLS certificates.
For example, first create a secret:
kubectl create secret tls itdachang.com --key tls.key --cert tls.crt
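If tls.key and tls.crt do not exist yet, a self-signed pair can be generated for testing, for example (the CN only needs to match the host you plan to serve):

openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
  -keyout tls.key -out tls.crt -subj "/CN=itdachang.com"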
Then configure an Ingress rule:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: itdachang-ingress
  namespace: default
spec:
  tls:
    - hosts:
        - basehome.com
        - rook.hoem.com
      secretName: basehome.com
  defaultBackend:
    service:
      name: nginx-svc  # the Ingress is in this namespace, so it looks for the Service in the default namespace
      port:
        number: 80
After that, whenever a new host needs TLS access, this is all we have to do.
Next, because of the way the mgr component behaves, we set up a new Ingress as follows.
First deploy a NodePort to check which mgr instance cannot be reached,
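A hedged sketch of such a test NodePort (the Service name is arbitrary; the selector labels are the ones Rook puts on its mgr pods, and Rook's examples directory also ships a ready-made dashboard-external-https.yaml for the same purpose):

apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-nodeport   # illustrative name
  namespace: rook-ceph
spec:
  type: NodePort
  selector:                 # matches every mgr pod, active or standby
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  ports:
    - name: dashboard
      port: 8443
      targetPort: 8443
      protocol: TCP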
then create our own Service that targets exactly the mgr we want to manage:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: rook-ceph-mgr
    ceph_daemon_id: a
    rook_cluster: rook-ceph
  name: rook-ceph-mgr-dashboard-active
  namespace: rook-ceph
spec:
  ports:
    - name: dashboard
      port: 8443
      protocol: TCP
      targetPort: 8443
  selector:  # which Pods this Service selects
    app: rook-ceph-mgr
    ceph_daemon_id: a
    rook_cluster: rook-ceph
  sessionAffinity: None
  type: ClusterIP
It selects the corresponding pods by their labels.
Then we create the matching Ingress,
which points at that Service:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ceph-rook-dash
  namespace: rook-ceph
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    nginx.ingress.kubernetes.io/server-snippet: |
      proxy_ssl_verify off;
spec:
  rules:
    - host: rook.hoem.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rook-ceph-mgr-dashboard-active
                port:
                  number: 8443
Now for the hands-on part in Kubernetes.
Rook provides several kinds of storage:
Block (block storage), Object (S3-like object storage), and Shared File System (a file system created for sharing files).
Let's first look at how block storage is used.
First we create a custom resource, i.e. one of Ceph's custom objects:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host  # failure domain: host or osd
  replicated:
    size: 2  # number of data replicas
Then the corresponding StorageClass, which acts as the provisioner that binds the storage:
apiVersion: storage.k8s.io/v1
kind: StorageClass  # storage driver
metadata:
  name: rook-ceph-block
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  # clusterID is the namespace where the rook cluster is running
  clusterID: rook-ceph
  # Ceph pool into which the RBD image shall be created
  pool: replicapool

  # (optional) mapOptions is a comma-separated list of map options.
  # For krbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd/#kernel-rbd-krbd-options
  # For nbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd-nbd/#options
  # mapOptions: lock_on_read,queue_depth=1024

  # (optional) unmapOptions is a comma-separated list of unmap options.
  # For krbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd/#kernel-rbd-krbd-options
  # For nbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd-nbd/#options
  # unmapOptions: force

  # RBD image format. Defaults to "2".
  imageFormat: "2"

  # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
  imageFeatures: layering

  # The secrets contain Ceph admin credentials.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # Specify the filesystem type of the volume. If not specified, csi-provisioner
  # will set default as `ext4`. Note that `xfs` is not recommended due to potential deadlock
  # in hyperconverged settings where the volume is mounted on the same node as the osds.
  csi.storage.k8s.io/fstype: ext4

# Delete the rbd volume when a PVC is deleted
reclaimPolicy: Delete
allowVolumeExpansion: true
Then we create a stateful application that issues a PVC; the PVC goes through the StorageClass to reach the storage cluster:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sts-nginx
  namespace: default
spec:
  selector:
    matchLabels:
      app: sts-nginx  # has to match .spec.template.metadata.labels
  serviceName: "sts-nginx"
  replicas: 3  # by default is 1
  template:
    metadata:
      labels:
        app: sts-nginx  # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: sts-nginx
          image: nginx
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "rook-ceph-block"
        resources:
          requests:
            storage: 20Mi
---
apiVersion: v1
kind: Service
metadata:
  name: sts-nginx
  namespace: default
spec:
  selector:
    app: sts-nginx
  type: ClusterIP
  ports:
    - name: sts-nginx
      port: 80
      targetPort: 80
      protocol: TCP
Now you can see that the StorageClass has provisioned PVs for these StatefulSet replicas to bind.
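This can be verified with standard commands, e.g.:

kubectl get pvc -n default
kubectl get pv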
Rook also provides a Shared File System, which gives us RWX (ReadWriteMany):
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph # namespace:cluster
spec:
  # The metadata pool spec. Must use replication.
  metadataPool:
    replicated:
      size: 3
      requireSafeReplicaSize: true
    parameters:
      # Inline compression mode for the data pool
      # Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression
      compression_mode: none
      # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
      # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
      #target_size_ratio: ".5"
  # The list of data pool specs. Can use replication or erasure coding.
  dataPools:
    - failureDomain: host
      replicated:
        size: 3
        # Disallow setting pool with replica 1, this could lead to data loss without recovery.
        # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
        requireSafeReplicaSize: true
      parameters:
        # Inline compression mode for the data pool
        # Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression
        compression_mode: none
        # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
        # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
        #target_size_ratio: ".5"
  # Whether to preserve filesystem after CephFilesystem CRD deletion
  preserveFilesystemOnDelete: true
  # The metadata service (mds) configuration
  metadataServer:
    # The number of active MDS instances
    activeCount: 1
    # Whether each active MDS instance will have an active standby with a warm metadata cache for faster failover.
    # If false, standbys will be available, but will not have a warm cache.
    activeStandby: true
    # The affinity rules to apply to the mds deployment
    placement:
      # nodeAffinity:
      #   requiredDuringSchedulingIgnoredDuringExecution:
      #     nodeSelectorTerms:
      #       - matchExpressions:
      #           - key: role
      #             operator: In
      #             values:
      #               - mds-node
      # topologySpreadConstraints:
      # tolerations:
      #   - key: mds-node
      #     operator: Exists
      # podAffinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - rook-ceph-mds
            # topologyKey: kubernetes.io/hostname will place MDS across different hosts
            topologyKey: kubernetes.io/hostname
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - rook-ceph-mds
              # topologyKey: */zone can be used to spread MDS across different AZ
              # Use <topologyKey: failure-domain.beta.kubernetes.io/zone> in k8s cluster if your cluster is v1.16 or lower
              # Use <topologyKey: topology.kubernetes.io/zone> in k8s cluster is v1.17 or upper
              topologyKey: topology.kubernetes.io/zone
    # A key/value list of annotations
    annotations:
    #  key: value
    # A key/value list of labels
    labels:
    #  key: value
    resources:
    # The requests and limits set here, allow the filesystem MDS Pod(s) to use half of one CPU core and 1 gigabyte of memory
    #  limits:
    #    cpu: "500m"
    #    memory: "1024Mi"
    #  requests:
    #    cpu: "500m"
    #    memory: "1024Mi"
    # priorityClassName: my-priority-class
  mirroring:
    enabled: false
And a StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # clusterID is the namespace where operator is deployed.
  clusterID: rook-ceph

  # CephFS filesystem name into which the volume shall be created
  fsName: myfs

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-data0

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

reclaimPolicy: Delete
allowVolumeExpansion: true
After that we can create multiple stateless services bound to this StorageClass:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
  namespace: default
  labels:
    app: nginx-deploy
spec:
  selector:
    matchLabels:
      app: nginx-deploy
  replicas: 3
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: nginx-deploy
    spec:
      containers:
        - name: nginx-deploy
          image: nginx
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
            - name: nginx-html-storage
              mountPath: /usr/share/nginx/html
      volumes:
        - name: localtime
          hostPath:
            path: /usr/share/zoneinfo/Asia/Shanghai
        - name: nginx-html-storage
          persistentVolumeClaim:
            claimName: nginx-pv-claim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pv-claim
  labels:
    app: nginx-deploy
spec:
  storageClassName: rook-cephfs
  accessModes:
    - ReadWriteMany  ## what would happen if this were ReadWriteOnce?
  resources:
    requests:
      storage: 10Mi
This gives us a shared folder that all the replicas use together.
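A quick way to confirm the shared behaviour is to write a file through one replica and read it back through another (pod names are placeholders; substitute real ones from kubectl get pod):

kubectl exec -it <nginx-deploy-pod-1> -- sh -c 'echo hello > /usr/share/nginx/html/index.html'
kubectl exec -it <nginx-deploy-pod-2> -- cat /usr/share/nginx/html/index.html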
Finally, a word on uninstalling Rook.
First delete the resources created from the earlier YAML files:
kubectl delete -f <the YAML files applied above>
Then run the following commands:
kubectl -n rook-ceph get cephcluster
kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge
Then clean up the
/var/lib/rook directory on every node.
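For example, on each node:

rm -rf /var/lib/rook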