The quarterly update for the Dell CSI drivers is here! But today also marks a significant milestone, as we are announcing the availability of Dell EMC Container Storage Modules (CSM).
- Container Storage Modules
- New CSI features
- Useful links
Container Storage Modules
Dell Container Storage Modules are a set of modules that aim to extend Kubernetes storage features beyond what is available in the CSI specification.
The CSM modules will expose storage enterprise features directly within Kubernetes, so developers are empowered to leverage them for their deployment in a seamless way.
Most of these modules are released as sidecars containers that work with the CSI driver for the Dell storage array technology you use.
CSM modules are open-source and freely available from https://github.com/dell/csm.
Volume Group Snapshot
Many stateful apps can run on top of multiple volumes. For example, we can have a transactional DB like Postgres with a volume for its data and another for the redo log, or Cassandra that is distributed across nodes, each having a volume, etc.
When you want to take a recoverable snapshot, it is vital to take them consistently at the exact same time.
Dell CSI Volume Group Snapshotter solves that problem for you. With the help of a CustomResourceDefinition, an additional sidecar to the Dell CSI drivers, and leveraging vanilla Kubernetes snapshots, you can manage the lifecycle of crash-consistent snapshots. It means you can create a group of volumes for which the drivers take snapshots simultaneously, then restore or move them in one shot!
To take a crash-consistent snapshot, you can either use labels on your PersistentVolumeClaim or be explicit and pass the list of PVCs you want to snap; for example:
```yaml
apiVersion: volumegroup.storage.dell.com/v1alpha2
kind: DellCsiVolumeGroupSnapshot
metadata:
  # Name must be 13 characters or less in length
  name: "vg-snaprun1"
spec:
  driverName: "csi-vxflexos.dellemc.com"
  memberReclaimPolicy: "Retain"
  volumesnapshotclass: "poweflex-snapclass"
  pvcLabel: "vgs-snap-label"
  # pvcList:
  #   - "pvcName1"
  #   - "pvcName2"
```
For the first release, CSI for PowerFlex supports Volume Group Snapshot.
Observability
The CSM Observability module is delivered as an OpenTelemetry agent that collects array-level metrics and exposes them for scraping into a Prometheus database.
The integration is as easy as creating a Prometheus ServiceMonitor; for example:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
  namespace: powerstore
spec:
  endpoints:
    - path: /metrics
      port: exporter-https
      scheme: https
      tlsConfig:
        insecureSkipVerify: true
  selector:
    matchLabels:
      app.kubernetes.io/instance: karavi-observability
      app.kubernetes.io/name: otel-collector
```
With the observability module, you will gain visibility on the capacity of the volume you manage with Dell CSI drivers and their performance in terms of bandwidth, IOPS, and response time.
Thanks to pre-canned Grafana dashboards, you will be able to browse these metrics' history and see the topology from a Kubernetes PV down to its translation as a LUN or file share in the backend array.
Kubernetes admins can also collect array-level metrics to check the overall capacity and performance directly from the Prometheus/Grafana tools they are used to.
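As an illustration, once the metrics land in Prometheus you can query them like any other series. The metric name below is hypothetical; check the dashboards shipped with the module for the exact series your driver/array combination exposes:

```promql
# Hypothetical metric name -- the real series names depend on the array
# and module version. Averages the 5-minute read latency per PVC.
avg by (persistent_volume_claim) (
  avg_over_time(powerstore_volume_read_latency_milliseconds[5m])
)
```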
For the first release, PowerFlex and PowerStore support CSM Observability.
Replication
All Dell storage arrays support replication capabilities. Replication can be asynchronous with an associated recovery point objective, synchronous between sites, or even active-active.
Each replication type serves a different purpose related to the use-case or the constraint you have on your data centers.
The Dell CSM Replication module allows creating persistent volumes with any of the three replication types (synchronous, asynchronous, and metro), assuming the underlying storage array supports it.
The Kubernetes architecture can rely on a single stretched cluster between two sites, or on two or more independent clusters. The module itself is composed of three main components:
- The Replication controller, whose role is to manage the CustomResourceDefinitions that abstract the concept of replication across the Kubernetes cluster
- The Replication sidecar for the CSI driver, which converts the Replication controller requests into actual calls on the array side
- The repctl utility, to ease the management of replication objects across multiple Kubernetes clusters
The usual workflow is that you create a PVC that is replicated with the classic Kubernetes directives by just picking the right StorageClass. You can then use repctl or edit the DellCSIReplicationGroup CRD to launch operations like Failover, Failback, Reprotect, Suspend, Synchronize, etc.
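As a sketch, requesting a replicated volume looks like any other PVC; the StorageClass name below is an assumption, and the replication parameters it would carry are defined at installation time per your driver's documentation:

```yaml
# The StorageClass "powerstore-replicated" is an assumed name for a
# replication-enabled class created by the storage admin; the CSM
# Replication sidecar handles the array-side pairing behind the scenes.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-replicated
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  # Picking the right StorageClass is all it takes to get replication
  storageClassName: powerstore-replicated
```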
For the first release, PowerMax and PowerStore support CSM Replication.
Authorization
With CSM Authorization, we are giving storage administrators back more control over storage consumption.
The authorization module is an independent service, installed and owned by the storage admin.
Within that module, the storage administrator creates access-control policies and storage quotas to make sure that Kubernetes consumers are not overconsuming storage or trying to access data that doesn't belong to them.
CSM Authorization makes multi-tenant architecture real by enforcing Role-Based Access Control on storage objects coming from multiple and independent Kubernetes clusters.
The authorization module acts as a proxy between the CSI driver and the backend array. The access is granted with an access token that can be revoked at any point in time. Quotas can be changed on the fly to limit or increase storage consumption from the different tenants.
For the first release, PowerMax and PowerFlex support CSM Authorization.
Resiliency
When dealing with stateful apps, if a node goes down, vanilla Kubernetes is pretty conservative.
Indeed, from the Kubernetes control plane, the failing node is simply seen as NotReady. That can be because the node is down, because of network partitioning between the control plane and the node, or simply because the kubelet is down. In the last two scenarios, the stateful app is still running and possibly writing data to disk. Therefore, Kubernetes won't take action and lets the admin manually trigger a Pod deletion if desired.
The CSM Resiliency module (sometimes named PodMon) aims to improve that behavior with the help of collected metrics from the array.
Since the driver has access to the storage backend from pretty much all the nodes, we can see the volume status (mapped or not) and its activity (whether there are IOPS or not). So when a node goes into the NotReady state and we see no IOPS on the volume, Resiliency relocates the Pod to a new node and cleans up whatever objects might be left over.
The entire process happens in seconds between the moment a node is seen down and the rescheduling of the Pod.
To protect an app with the Resiliency module, you only have to add the label podmon.dellemc.com/driver to it, and it is then protected.
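For example, on a StatefulSet's Pod template the label could look like the following; the label value (the driver name) and the image are assumptions to adapt to your environment:

```yaml
# The label key podmon.dellemc.com/driver is the one the Resiliency module
# watches; the value "csi-vxflexos" is an assumed driver name -- use yours.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: protected-app
spec:
  serviceName: protected-app
  selector:
    matchLabels:
      app: protected-app
  template:
    metadata:
      labels:
        app: protected-app
        podmon.dellemc.com/driver: csi-vxflexos
    spec:
      containers:
        - name: app
          image: registry.example.com/app:latest  # placeholder image
```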
For more details on the module’s design, you can check the documentation here.
For the first release, PowerFlex and Unity support CSM Resiliency.
CSM Installer
All the modules above are released either as an independent Helm chart or as options within the CSI drivers.
For more complex deployments, which may involve multiple Kubernetes clusters or a mix of modules, it is possible to use the CSM Installer.
For the first release, all drivers and modules support the CSM Installer.
New CSI features
This release brings, for every driver:
- Support of OpenShift 4.8
- Support of Kubernetes 1.22
- Support of Rancher Kubernetes Engine 2
- Normalized configurations between drivers
- Dynamic Logging Configuration
- New CSM installer
VMware Tanzu Kubernetes Grid
VMware Tanzu offers storage management via its CNS-CSI driver, but it currently lacks support for the ReadWriteMany access mode.
If your workload needs concurrent access to the filesystem, you can now rely on the CSI drivers for PowerStore, PowerScale, and Unity via the NFS protocol. All three platforms are officially supported and qualified on Tanzu.
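As a sketch, a shared volume is requested like any other ReadWriteMany PVC; the StorageClass name below is an assumption and depends on how the driver was installed:

```yaml
# The StorageClass "powerstore-nfs" is an assumed name; use the NFS-backed
# class created during your driver installation.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany   # concurrent access from multiple nodes over NFS
  resources:
    requests:
      storage: 10Gi
  storageClassName: powerstore-nfs
```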
NFS behind NAT
By default, the CSI driver creates volumes with 777 POSIX permissions on the directory.
The isiVolumePathPermissions parameter can be configured as part of the ConfigMap with the PowerScale settings or at the StorageClass level. The accepted values are public for the ACL or any combination of POSIX permission bits.
For more details, you can: