Configuring multi-tenant Cloud Pak for Data environment on OpenShift

Tomasz Hanusiak
Sep 14, 2020


In some scenarios you need to deploy multiple Cloud Pak for Data instances (for example Prod/Dev environments, an HA configuration, or separation of different services); in other cases you need to install Cloud Pak for Data next to existing solutions and want to make sure the products don’t affect each other.

Since Cloud Pak for Data deploys into a dedicated namespace there is some initial separation; this, however, may not be enough. Let me share a couple of additional mechanisms that can be used to keep the workloads independent.

Please make sure that you review your license agreements before you deploy multiple CP4D instances on top of a single OpenShift cluster.

1. Dedicating a group of workers to a namespace.

Allocating specific resources (workers) to Cloud Pak for Data can be done in various ways: taints, labels, tolerations, and so on.
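For comparison, a taint-based setup might look roughly like the sketch below; the taint key/value and the node name are purely illustrative, not a Cloud Pak for Data requirement.

oc adm taint nodes mynode dedicated=cp4d:NoSchedule

The workload then needs a matching toleration in its pod spec:

tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "cp4d"
  effect: "NoSchedule"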

In my opinion the easiest mechanism is the node-selector approach. This solution allows you to fine-tune your configuration in a few simple steps.

1. Label nodes

oc label node mynode type=cp4d
oc get node mynode --show-labels

2. Change your project configuration

oc patch namespace myproject -p '{"metadata":{"annotations":{"openshift.io/node-selector":"type=cp4d"}}}'

The above will not relocate the already running pods, but any new workload will be placed on the desired workers.
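If you also want existing workloads to move onto the labeled workers, one option (my own suggestion, not part of the original steps) is to trigger a reschedule, for example:

oc rollout restart deployment <deployment-name> -n myproject
# or simply delete a pod and let its controller recreate it on a matching node
oc delete pod mypod -n myproject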

You can verify this by describing a newly created pod and grepping for Node-Selectors:

oc describe pod mypod | grep Node-Selectors
Node-Selectors: type=cp4d

More details can be found in the OpenShift documentation.

2. Create a dedicated Storage Class for each namespace.

Usually a single Storage Class is used across multiple namespaces, and while this works fine, sometimes a project-specific storage definition may be needed.

This can be especially useful when using a storage mechanism which is not namespace-aware — for example NFS.

Instead of keeping all the data in a single location (export path), you can separate them by defining a new Storage Class; this allows you to back up or transfer the data and configure security restrictions per project.

For example (assuming NFS):

Provider 1:

- env:
  - name: PROVISIONER_NAME
    value: cpd-storage-dev.io/nfs
  - name: NFS_SERVER
    value: 10.10.10.10
  - name: NFS_PATH
    value: /nfsdev

Provider 2:

- env:
  - name: PROVISIONER_NAME
    value: cpd-storage-prod.io/nfs
  - name: NFS_SERVER
    value: 10.10.10.10
  - name: NFS_PATH
    value: /nfsprod
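The corresponding Storage Classes then point at these provisioner names; a minimal sketch (the class names below are illustrative):

cat << EOF > cp4d-storageclasses.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cpd-storage-dev
provisioner: cpd-storage-dev.io/nfs
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cpd-storage-prod
provisioner: cpd-storage-prod.io/nfs
EOF
oc create -f cp4d-storageclasses.yaml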

3. Create a new Machine Config Pool for individual projects

This approach allows you to apply different changes to specific workers instead of modifying all workers at once.

As of OpenShift Container Platform 4.3 and Cloud Pak for Data 3.0.1, we suggest the following steps to configure nodes:

https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_3.0.1/cpd/install/node-settings.html#node-settings__crio

Please note that some MachineConfig configuration is needed and that by default only one group of workers exists in an OCP cluster.

To change this, follow this process:

Label the desired nodes

oc label node <node-name> node-role.kubernetes.io/cp4d=""
oc get node

Create a new MachineConfigPool

cat << EOF > cp4dmcp.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: cp4d
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,cp4d]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/cp4d: ""
EOF
oc create -f cp4dmcp.yaml
oc get mcp

Test it with

cat << EOF > cp4dmc-test.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: cp4d
  name: 31-cp4d
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:,cp4d-test
        filesystem: root
        mode: 0644
        path: /etc/cp4dtest
EOF
oc create -f cp4dmc-test.yaml
oc get mc | grep 31-cp4d
Wait for the node to restart (monitor `oc get node`), and finally run:

ssh core@<node-name> 'cat /etc/cp4dtest'
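While waiting, you can also follow the rollout on the pool itself; the UPDATED/UPDATING columns show the progress, and a node is temporarily marked SchedulingDisabled while it is drained and rebooted:

oc get mcp cp4d
oc get node -w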

More references:
https://access.redhat.com/solutions/4287111

4. Configure egress IPs and/or redirection through a load balancer

This method allows you to control the traffic coming out of your project/namespace. You may need to do so if your worker nodes are dynamic, if you need to limit the number of firewall rules, or if you simply need to clearly differentiate between two Cloud Pak for Data instances.

The first solution involves egress IP configuration and can be achieved by modifying the NetNamespace resource and then patching hosts’ subnet settings.
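As a rough sketch (assuming OpenShift SDN; the project name, node name, and egress IP below are examples only):

oc patch netnamespace myproject --type=merge -p '{"egressIPs": ["192.168.1.100"]}'
oc patch hostsubnet <node-name> --type=merge -p '{"egressIPs": ["192.168.1.100"]}'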

Please see the OpenShift documentation on egress IPs for details.

The second solution involves redirecting all outgoing traffic through your load balancer; this is particularly useful when dealing with air-gapped environments.

Start by modifying the load balancer configuration (for example haproxy):

Assume that there is a service running on exampleHost (1.2.3.4), listening on port 678.

Edit the configuration file of the load balancer (/etc/haproxy/haproxy.cfg) and restart the load balancer after applying the changes. Add a `frontend` and a `backend` section:

frontend example
    bind *:30678
    mode tcp
    default_backend example-tcp
    option tcplog

backend example-tcp
    mode tcp
    balance source
    server exampleHost 1.2.3.4:678 check

Check that you can talk to the service through the load balancer (`curl -k -v <load_balancer_ip>:30678`).

Once the load balancer has been configured, let’s create a Service and an Endpoints object that point to the load balancer. This way we don’t need to use the load balancer details in our application, and the users can simply talk to example-service as if it were the application running on exampleHost.

cat << EOF > objects.yaml
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  clusterIP: None
  ports:
  - protocol: TCP
    port: 30678
    targetPort: 30678
---
apiVersion: v1
kind: Endpoints
metadata:
  name: example-service
subsets:
- addresses:
  - ip: <load_balancer_ip>
  ports:
  - port: 30678
EOF
oc create -f objects.yaml
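From a pod inside the same namespace you should now be able to reach the service by name alone (the headless Service resolves to the load balancer endpoint defined above), for example:

curl -k -v example-service:30678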
