My Journey with Red Hat CloudForms 4.2, a Cloud Management Platform
I first discovered the cloud about five years ago. Since then, this revolutionary technology has turned me from someone who visits a physical data center into a couch potato!
However, a question started to bother me as I realized that, at a larger scale, outsourcing IT maintenance might not be as appealing. How will enterprise organizations adopt a cloud-based IT model, given their requirements for governance, risk, and compliance? And as the cloud movement grows, how will these organizations leverage its elastic capacity cost-effectively?
There are numerous admirable characteristics of the cloud, but to me the chief one is its utility nature. If you are not leveraging this characteristic, you are not realizing the cost savings cloud technology can offer. For example, with cloud utility, a business can switch servers on and off automatically in a compliant fashion. If a server is not making the business money, switch it off. If the business could be making more money with an additional server, switch a new one on. If developers want to quickly try something, they can do so through the service catalog in the private cloud's self-service portal.
Readers may be thinking that this all sounds deceptively easy. First, let’s think through the details of the full stack, because it is not just the server that we need to worry about. On the technology front, there is the mixed bag of firewalls, load balancers, Domain Name System (DNS), middleware, and other items. On the governance side, there are change control, testing, business continuity, disaster recovery, ITIL, and more. Automating server builds, the components that provide the application platform, and the application code itself is well established. But organizations will have invested in particular technologies for governance and technical architecture reasons. For example, if AWS Elastic Load Balancing (ELB) lacks a feature their million-dollar application needs, they will run vendor load balancer appliances instead.
With all of this complexity, how would one pull all these technologies, processes, and standards together?
Ultimately, we need glue for this hybrid and bespoke world. There is also a catch in this requirement: the glue itself needs to be compliant, too. This is where Red Hat CloudForms comes in. What follows is a compilation of things I have learned over my last few projects working with CloudForms, and it should serve as an introduction for organizations interested in leveraging the technology.
My Findings and Learnings
The platform has many excellent benefits, including the following:
- CloudForms is a single-pane-of-glass platform. Reporting and capacity management, the service catalog, VM operations, compliance control, and other functions are all accessed from a single portal.
- CloudForms has out-of-the-box functions for provisioning, reconfiguration, and retirement workflows. The workflows have request, approval, and execute stages.
- Compliance is out-of-the-box. Everything is audited and all actions and events are logged.
- This is a Red Hat product, which means there is an upstream open source community project (ManageIQ) aligned with it. Open source in general is wonderful, and commercially supported, enterprise-grade open source is even better (for the organizations that I work with)!
- The product is built on Ruby and Ruby on Rails. Application architectural patterns such as decoupling, timeout, and retry come out-of-the-box as part of the framework. All you need to do is write the required Ruby scripts and plug them into the framework, where their execution has On Entry, On Exit, and On Error hooks. These Ruby scripts are very much standalone, comparable to the Ruby you would write for Chef configuration management. You don’t need to be a Ruby application developer to write the necessary code, and I have generally found that I do not need to add to the out-of-the-box gems list.
- CloudForms has role-based access control (RBAC) and tenancy capabilities. RBAC integrates with LDAP, so Microsoft Active Directory or any other LDAP directory store can be used for authentication and access control. Tenancy allows you to manage resources on a per-tenant basis: Tenant A can only access resources owned by Tenant A. Tenancy also comes with resource quota policies, so each tenant can be limited to the resources (CPU, memory, storage, number of instances, etc.) that it has paid for.
- CloudForms supports multiple providers both on and off premises, including vCenter, AWS, Google Cloud Platform, Microsoft Azure, and OpenStack. API integration for these providers is offered as part of the core platform code.
As you can imagine, some knowledge only comes from using a platform like this in the real world, and CloudForms does not have an easy on-ramp. Other use cases will follow, but for now I have focused this article on vCenter and CloudForms Service Catalog-based requests. Here are some things I’ve picked up:
- CloudForms can quickly become the default answer to every problem. Fight that urge.
An orchestrator orchestrates. CloudForms is an orchestrator. Therefore, my mantra is: “Go somewhere else first and if it is not possible, then use CloudForms”. That is, if you have an IPAM system, don’t make CloudForms do IP management. If you have a configuration management system, use that to configure systems and services. CloudForms should only be used to tell the configuration management system to create a load balancer with certain characteristics.
- CloudForms’ object structure can be contextually confusing to work with. The workflows consist of Request and Task objects, and each of these can have child objects. As the workflow progresses, it switches context from one object to the next (for example, from a request object to a task object). So a script that worked in the request context can “stop” working once it runs in the task context.
Figure 1: Service Object Relationships
In terms of code reuse, this is a big deal. I might want to do a Microsoft Active Directory group check at the approval stage (request) and again at the end stage (task) of provisioning. The framework provides mechanisms to manage this.
One mechanism is to let the script try another context if the first one is not available. That works when the script runs within a single, known context, such as request or task. But how do we reuse code that has to run in both the request and the task context?
For this, we check vmdb_object_type, which tells the script which context it is running in. There are not just the two contexts, request and task, but also vm and others.
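Here is a minimal sketch of both mechanisms in Ruby, the language CloudForms automate methods are written in: it dispatches on vmdb_object_type and, in the task branch, falls back from one task object to another. FakeEvm and the Fake* structs are hypothetical stand-ins for what CloudForms injects, so the sketch runs outside an appliance. The vmdb_object_type strings are my assumptions about what you will see; confirm them with Object Walker.

```ruby
# Hypothetical stand-ins for objects CloudForms provides at runtime.
FakeUser    = Struct.new(:userid)
FakeRequest = Struct.new(:userid)
FakeTask    = Struct.new(:miq_request)

class FakeEvm
  attr_reader :root

  def initialize(root = {})
    @root = root
  end
end

# Return the requester's userid whichever context the method runs in.
# The type strings are assumptions; verify them with Object Walker.
def requester_userid(evm)
  case evm.root['vmdb_object_type']
  when 'miq_request'
    evm.root['miq_request'].userid
  when 'service_template_provision_task', 'miq_provision'
    # Fallback: use whichever task object is present.
    task = evm.root['service_template_provision_task'] || evm.root['miq_provision']
    task.miq_request.userid
  when 'vm'
    # Button on a VM: the user who pressed the button.
    evm.root['user'].userid
  else
    raise "Unsupported context: #{evm.root['vmdb_object_type'].inspect}"
  end
end

$evm = FakeEvm.new({ 'vmdb_object_type' => 'miq_request',
                     'miq_request' => FakeRequest.new('jdoe') })
puts requester_userid($evm) # prints jdoe
```

The same AD group check can then call requester_userid at both the approval and the end stage.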
- Knowing how to debug is critical for a platform as complicated as this. Debug scripts such as Object Walker and Object Reader come in handy at the entry points of the CloudForms automation engine (e.g. from a service catalog, a button on a VM, or the API).
Object Walker can be called from your code block, and it dumps the context to automation.log, always ending with the vmdb_object_type value. Object Reader formats the dump in a more human-friendly way. Object Walker describes in detail all the CloudForms objects available to the script from which it was called. For example, userid can be accessed to “know” who the requester is. There is a wealth of very useful objects.
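Object Walker itself is a community-written script that goes much deeper than this, but the core idea fits in a few lines: iterate over $evm.root.attributes and log each key and value, finishing with the context type. FakeRoot and FakeEvm are hypothetical stand-ins so the sketch runs outside CloudForms. Inside an automate method you would call dump_root($evm) directly.

```ruby
# Hypothetical stand-ins for $evm and $evm.root.
class FakeRoot
  def initialize(attrs)
    @attrs = attrs
  end

  def attributes
    @attrs
  end

  def [](key)
    @attrs[key]
  end
end

class FakeEvm
  attr_reader :root, :logs

  def initialize(attrs)
    @root = FakeRoot.new(attrs)
    @logs = []
  end

  def log(level, message)
    @logs << "[#{level}] #{message}"
  end
end

# Log every root attribute sorted by key, then the context type last.
def dump_root(evm)
  evm.root.attributes.sort.each do |key, value|
    evm.log(:info, "root['#{key}'] = #{value.inspect} (#{value.class})")
  end
  evm.log(:info, "vmdb_object_type = #{evm.root['vmdb_object_type']}")
end

evm = FakeEvm.new({ 'vmdb_object_type' => 'vm', 'dialog_ip_addr' => '10.0.0.5' })
dump_root(evm)
puts evm.logs
```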
- CloudForms can issue duplicate server names. If your organization needs to conform to a specific server naming convention, you will most likely need to write code to work around the default behaviour.
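A minimal sketch of such a workaround, assuming a hypothetical convention of a prefix plus a three-digit sequence. The "is this name taken?" check is passed in as a callable; inside CloudForms it might wrap a VMDB lookup along the lines of $evm.vmdb(:vm).find_by_name, plus your CMDB or DNS.

```ruby
# Hypothetical naming convention: <prefix><3-digit sequence>, e.g. "lonweb007".
# `taken` is any callable that answers "is this name already in use?".
def next_free_name(prefix, taken, max: 999)
  (1..max).each do |n|
    candidate = format('%s%03d', prefix, n)
    return candidate unless taken.call(candidate)
  end
  raise "No free #{prefix} names left (checked 1..#{max})"
end

existing = %w[lonweb001 lonweb002 lonweb004]
puts next_free_name('lonweb', ->(name) { existing.include?(name) }) # prints lonweb003
```

Note that a check-then-use approach alone does not stop two parallel requests from picking the same name, so the winning name should also be reserved somewhere, such as a CMDB or DNS, before it is used.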
- Get to know the Request, Approval, and Execute workflow aspects of CloudForms. This is where the bulk of your time will likely be spent.
- Working with Requests, specifically Service Catalog based requests:
The requester logs into the portal and is presented with a service catalog form: fill in the form and click Submit.
CloudForms has out-of-the-box capability to generate the Service Catalog form. Automate > Customization > Service Dialogs is the starting point.
Figure 2: Customization form generator
This GUI allows you to build the form framework. Each field in the form is an Element and the element’s name value is the variable that stores the user’s input.
Figure 3: Element name
In this case, the variable dialog_ip_addr will be created by the CloudForms automation engine and can thus be retrieved for use in the code.
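As a sketch of the retrieval side, a method can read $evm.root['dialog_ip_addr'] and validate it before using it. FakeEvm is a hypothetical stand-in. Treating a blank field as "let IPAM choose" and rejecting CIDR notation are my own illustrative choices.

```ruby
require 'ipaddr'

# Hypothetical stand-in for $evm; in CloudForms, $evm.root holds dialog_* values.
class FakeEvm
  attr_reader :root, :logs

  def initialize(root = {})
    @root = root
    @logs = []
  end

  def log(level, message)
    @logs << "[#{level}] #{message}"
  end
end

# Read the IP address element (exposed as dialog_ip_addr).
# Returns nil when left blank; raises ArgumentError for anything that is not
# a single IPv4/IPv6 address.
def requested_ip(evm)
  raw = evm.root['dialog_ip_addr'].to_s.strip
  return nil if raw.empty?
  raise ArgumentError, 'CIDR notation not accepted' if raw.include?('/')

  IPAddr.new(raw).to_s
rescue ArgumentError
  evm.log(:error, "dialog_ip_addr is not a valid address: #{raw.inspect}")
  raise
end

$evm = FakeEvm.new({ 'dialog_ip_addr' => ' 10.0.0.5 ' })
puts requested_ip($evm) # prints 10.0.0.5
```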
Once the dialog is ready, we tie it to a Catalog Item via Services > Catalogs > Catalog Items.
Figure 4: Catalog Item
The catalog item provides entry points into the automation engine. For non-generic item types (VMware, AWS, etc.), it also provides default values for provisioning requests (e.g. network, IP address, number of CPUs, vCenter cluster). These defaults can then be overridden by the user’s inputs on the form. Catalog Items are then added to Catalogs (Services > Catalogs > Catalogs).
The user completes the form and then hits Submit.
- Approval workflows:
Welcome to the state machine. It is the mechanism that shields the developer from implementing important application architectural patterns such as retry, timeout, and on-error handling, and it is what makes CloudForms an enterprise orchestrator without an enterprise-size development team. The developer only needs to write Ruby scripts, not complete software with enterprise architectural patterns built in.
Figure 5: An Approval State Machine
One way to describe the CloudForms state machines is that they are the mechanisms that control the execution of standalone Ruby scripts. CloudForms steps through each state in the state machine, applying On Entry, On Exit, etc. controls for each if defined.
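To illustrate the division of labour, here is a minimal sketch of a "check" state. The method only reports an outcome through $evm.root['ae_result'] ('ok', 'retry' or 'error') and sets $evm.root['ae_retry_interval']. The state machine owns the loop itself, including when to give up. FakeEvm and the job_done callable are hypothetical; ae_state_retries is the retry counter I would expect on root.

```ruby
# Hypothetical stand-in for $evm.
class FakeEvm
  attr_reader :root, :logs

  def initialize(root = {})
    @root = root
    @logs = []
  end

  def log(level, message)
    @logs << "[#{level}] #{message}"
  end
end

# A "check" state: report ok when the (hypothetical) external job is done,
# otherwise ask the state machine to run this state again later.
def check_job(evm, job_done)
  if job_done.call
    evm.root['ae_result'] = 'ok'
  else
    evm.root['ae_result'] = 'retry'
    evm.root['ae_retry_interval'] = '1.minute'
    attempt = evm.root['ae_state_retries'].to_i + 1
    evm.log(:info, "Job not finished yet; scheduling retry #{attempt}")
  end
end

$evm = FakeEvm.new({ 'ae_state_retries' => 2 })
check_job($evm, -> { false })
puts $evm.root['ae_result'] # prints retry
```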
Once the service catalog form has been submitted, the approval process starts, and part of this process is a resource quota validation. The flow enters the approval state machine, which is typically the first opportunity to customize a CloudForms state machine. The organization will likely have a change control system (ServiceNow, BMC Remedy, etc.), and the developer can write integration code to automate change creation. For this, the state machine is typically extended with a few states (Ruby scripts to execute) and attributes. State machine attributes are values provided as variables to all scripts in the state machine. In figure 5 above, approval_type is an attribute, and a script can retrieve it in code with $evm.object['approval_type'].
When it comes to integration with external systems, the state machine generally holds attributes for the external system’s API URL and authentication details, with passwords encrypted by CloudForms. That way, these values are not repeated across multiple scripts; each script simply retrieves the attribute from the state machine.
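A minimal sketch of reading such attributes in one place: plain attributes come from $evm.object[...], and the password attribute is read with $evm.object.decrypt(...). Apart from approval_type, the attribute names are hypothetical. FakeStateObject is a stand-in whose "encrypted" value is simply stored in clear.

```ruby
# Hypothetical stand-in for $evm.object: plain attributes plus decrypt() for
# password-type attributes (the "encrypted" value is simply stored in clear).
class FakeStateObject
  def initialize(attrs, secrets)
    @attrs = attrs
    @secrets = secrets
  end

  def [](key)
    @attrs[key]
  end

  def decrypt(key)
    @secrets.fetch(key)
  end
end

class FakeEvm
  attr_reader :object

  def initialize(object)
    @object = object
  end
end

ChangeApiConfig = Struct.new(:url, :user, :password, :approval_type)

# Gather what a change-control integration needs from state machine attributes,
# so no script hard-codes endpoints or credentials.
# Attribute names other than approval_type are hypothetical.
def change_api_config(evm)
  obj = evm.object
  ChangeApiConfig.new(
    obj['change_api_url'],
    obj['change_api_user'],
    obj.decrypt('change_api_password'),
    obj['approval_type']
  )
end

obj = FakeStateObject.new(
  { 'change_api_url' => 'https://change.example.com/api',
    'change_api_user' => 'cf_svc', 'approval_type' => 'auto' },
  { 'change_api_password' => 's3cret' }
)
config = change_api_config(FakeEvm.new(obj))
puts config.url # prints https://change.example.com/api
```

Keeping the decrypted password inside the config object, rather than logging or copying it around, keeps it out of automation.log.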
Approval state machines can be extended for any number of reasons, since each organization has its own process before a request can proceed. That might involve checking Active Directory to ensure the requester is a member of a specific AD group.
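A minimal sketch of such an approval state, assuming the request object offers approve(approver, reason) and deny(approver, reason) as CloudForms request objects do. The group name, the 'admin' approver, and the member_of callable (which in practice might run an LDAP query against AD) are all hypothetical.

```ruby
# Hypothetical stand-in for the request object ($evm.root['miq_request']).
class FakeRequest
  attr_reader :userid, :decision, :reason

  def initialize(userid)
    @userid = userid
  end

  def approve(approver, reason)
    @decision, @reason = :approved, "#{approver}: #{reason}"
  end

  def deny(approver, reason)
    @decision, @reason = :denied, "#{approver}: #{reason}"
  end
end

class FakeEvm
  attr_reader :root

  def initialize(root)
    @root = root
  end
end

REQUIRED_GROUP = 'CF-Provisioners' # hypothetical AD group

# `member_of` is any callable (userid, group) -> true/false.
def approve_if_member(evm, member_of)
  request = evm.root['miq_request']
  if member_of.call(request.userid, REQUIRED_GROUP)
    request.approve('admin', "requester is in #{REQUIRED_GROUP}")
    evm.root['ae_result'] = 'ok'
  else
    request.deny('admin', "requester is not in #{REQUIRED_GROUP}")
    evm.root['ae_result'] = 'error'
  end
end

members = { 'jdoe' => ['CF-Provisioners'] }
lookup = ->(user, group) { members.fetch(user, []).include?(group) }
evm = FakeEvm.new({ 'miq_request' => FakeRequest.new('jdoe') })
approve_if_member(evm, lookup)
puts evm.root['miq_request'].decision # prints approved
```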
Once the approval state machine is complete, CloudForms will check whether the requester has enough resource quota to fulfil the request. It is because of the quota policy check that the developer needs to specify the number of VMs to be provisioned at the service catalog (request) stage.
- Executing the approved request
We now have an approved request, so it is time to execute it and provision something. Figure 4 showed the Provisioning Entry Point; this is the instance that is called next. An instance can be described as a localised state machine: it inherits your state machine’s schema, but you can change the value of each state. Because an instance usually executes a method (Ruby script), the developer now has a way to override the values passed to that method. This allows more granular control of a method when the global state machine values don’t work.
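One way to picture the relationship is as a merge. The following is a plain-Ruby model, not CloudForms code: each field takes the class schema's default unless the instance overrides it, and an instance cannot add fields the schema lacks. The field names and values are hypothetical.

```ruby
# Plain-Ruby model (not CloudForms code): class schema defaults, overridden
# field by field by an instance.
SCHEMA_DEFAULTS = {
  'cpus' => 2,
  'memory_mb' => 4096,
  'vcenter_cluster' => 'cluster-a'
}.freeze

def effective_values(schema_defaults, instance_overrides)
  unknown = instance_overrides.keys - schema_defaults.keys
  # An instance can only set fields that exist in the class schema.
  raise ArgumentError, "not in schema: #{unknown.join(', ')}" unless unknown.empty?

  schema_defaults.merge(instance_overrides)
end

large_vm = effective_values(SCHEMA_DEFAULTS, { 'cpus' => 8, 'memory_mb' => 32_768 })
p large_vm
```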
For service catalog-based requests, the CatalogItemInitialization instance is typically the first script to execute. Its primary function is to parse the inputs the requester provided on the form and set up variables that child tasks can use later in the flow. For example, if the user selects a specific data center on the form, we can retrieve that value once the flow reaches the data center placement process.
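The shipped CatalogItemInitialization method is considerably more involved. The following is only a simplified sketch of the idea: copy every dialog_* input onto the task with set_option, so later steps can read it back with get_option. FakeRoot, FakeTask and FakeEvm are hypothetical stand-ins, and stripping the dialog_ prefix is my own illustrative choice.

```ruby
# Hypothetical stand-ins for $evm.root and a provisioning task.
class FakeRoot
  def initialize(attrs)
    @attrs = attrs
  end

  def attributes
    @attrs
  end

  def [](key)
    @attrs[key]
  end
end

class FakeTask
  def initialize
    @options = {}
  end

  def set_option(key, value)
    @options[key] = value
  end

  def get_option(key)
    @options[key]
  end
end

class FakeEvm
  attr_reader :root

  def initialize(root)
    @root = root
  end
end

# Copy every dialog_* input onto the task as an option (prefix stripped).
def stash_dialog_inputs(evm)
  task = evm.root['service_template_provision_task']
  evm.root.attributes.each do |key, value|
    next unless key.start_with?('dialog_')

    task.set_option(key.delete_prefix('dialog_').to_sym, value)
  end
  task
end

task = FakeTask.new
root = FakeRoot.new({ 'dialog_datacenter' => 'london',
                      'dialog_ip_addr' => '10.0.0.5',
                      'service_template_provision_task' => task })
stash_dialog_inputs(FakeEvm.new(root))
puts task.get_option(:datacenter) # prints london
```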
The flow now starts the actual provisioning stage. This is of course done via a state machine.
Figure 6: A Provisioning state machine
This state machine will again be extended and modified to suit each organization’s requirements. Here you are likely to write integration scripts for IPAM, DNS, AD, Configuration Management, and ITIL.
- Handling Variables
The standalone execution of methods (Ruby scripts) via states in state machines is great, but you need to know how to pass variables between methods within a state machine and also to methods in other state machines. Ordinary variables are available between methods in a state machine, but they are all lost if a state machine retry loop is triggered. To safeguard against this, pass values within a state machine as state variables.
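A minimal sketch using the state-variable calls ($evm.state_var_exist?, $evm.set_state_var, $evm.get_state_var) to make an IPAM reservation idempotent across retries. FakeEvm only imitates those calls, and the reserve callable stands in for a hypothetical IPAM API.

```ruby
# Hypothetical stand-in for $evm's state-variable API.
class FakeEvm
  def initialize
    @state_vars = {}
  end

  def state_var_exist?(name)
    @state_vars.key?(name)
  end

  def set_state_var(name, value)
    @state_vars[name] = value
  end

  def get_state_var(name)
    @state_vars[name]
  end
end

# Reserve an IP once per state machine run, even if this state is retried.
def reservation_id(evm, reserve)
  return evm.get_state_var(:ipam_reservation_id) if evm.state_var_exist?(:ipam_reservation_id)

  id = reserve.call
  evm.set_state_var(:ipam_reservation_id, id)
  id
end

calls = 0
reserve = -> { calls += 1; "res-#{calls}" }
evm = FakeEvm.new
first = reservation_id(evm, reserve)
second = reservation_id(evm, reserve) # simulated retry pass
puts [first, second, calls].inspect # prints ["res-1", "res-1", 1]
```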
To persist variables between state machines, use the request or task options (set_option and get_option).
This is especially handy for workflows that involve a change control record — one that is created within the approval state machine and which then needs to be closed once provisioned.
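A minimal sketch of that change-control round trip. The approval side stores the change ID on the request with set_option. Later, in a provisioning task context, the task reaches its parent request through miq_request and reads the ID back with get_option. The option key, change ID and Fake* classes are hypothetical.

```ruby
# Hypothetical stand-ins for a request and a provisioning task.
class FakeRequest
  def initialize
    @options = {}
  end

  def set_option(key, value)
    @options[key] = value
  end

  def get_option(key)
    @options[key]
  end
end

FakeTask = Struct.new(:miq_request)

class FakeEvm
  attr_reader :root

  def initialize(root)
    @root = root
  end
end

# Approval state machine: remember the change ticket on the request.
def record_change(evm, change_id)
  evm.root['miq_request'].set_option(:change_id, change_id)
end

# Provisioning state machine, later and in task context: read it back to close it.
def change_to_close(evm)
  evm.root['miq_provision'].miq_request.get_option(:change_id)
end

request = FakeRequest.new
record_change(FakeEvm.new({ 'miq_request' => request }), 'CHG0012345')
puts change_to_close(FakeEvm.new({ 'miq_provision' => FakeTask.new(request) })) # prints CHG0012345
```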
Another use case for variable persistence is the server object. For instance, the IPAM ID of a server’s IP address is required to release that address. The ID is returned during provisioning and can be attached to the server object as a custom attribute, then retrieved during retirement.
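A minimal sketch using the VM object's custom_set and custom_get. The attribute name ipam_ip_id and the release callable wrapping the IPAM API are hypothetical, and FakeVm only imitates the two calls.

```ruby
# Hypothetical stand-in for a VM object with custom attributes.
class FakeVm
  def initialize
    @custom = {}
  end

  def custom_set(key, value)
    @custom[key.to_s] = value
  end

  def custom_get(key)
    @custom[key.to_s]
  end
end

class FakeEvm
  attr_reader :root

  def initialize(root)
    @root = root
  end
end

# Provisioning: remember which IPAM record backs this VM's address.
def remember_ipam_id(vm, ipam_id)
  vm.custom_set(:ipam_ip_id, ipam_id.to_s)
end

# Retirement: look the ID up again and release it.
def release_ipam_address(evm, release)
  vm = evm.root['vm']
  ipam_id = vm.custom_get(:ipam_ip_id)
  raise 'VM has no ipam_ip_id custom attribute' if ipam_id.nil?

  release.call(ipam_id)
end

vm = FakeVm.new
remember_ipam_id(vm, 4711)
released = []
release_ipam_address(FakeEvm.new({ 'vm' => vm }), ->(id) { released << id })
puts released.inspect # prints ["4711"]
```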
- Datastore structure and organization
- It is essential to maintain proper hygiene in your datastore/domain structures. A good place to start is to house common functions together, as in the sample shown in Figure 7.
Figure 7: Example datastore structure
- The integration functions should have generic, high-level names, e.g. LDAP, IPAM, etc., which allows for multiple specific technologies below them. For example, you may have multiple LDAP stores, including Microsoft Active Directory and/or 389 Directory Server, or you could have Infoblox and/or FusionLayer as your IPAM providers. Use $evm.instantiate to call a method in a different state machine.
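A minimal sketch of calling a generically named integration through $evm.instantiate. It assumes the called method shares the caller's $evm.root, so inputs and results pass through root; verify that against your own flows. The instance path and root keys are hypothetical, and FakeEvm imitates instantiate by running a registered handler.

```ruby
# Hypothetical stand-in: instantiate runs the handler registered for a path,
# passing it the shared root hash.
class FakeEvm
  attr_reader :root

  def initialize(handlers)
    @root = {}
    @handlers = handlers
  end

  def instantiate(path)
    @handlers.fetch(path).call(@root)
  end
end

# Generic path: vendor-specific code (Infoblox, FusionLayer, ...) sits behind it.
IPAM_ACQUIRE = '/Integration/IPAM/Methods/AcquireIP' # hypothetical instance path

def acquire_ip(evm, hostname)
  evm.root['ipam_hostname'] = hostname
  evm.instantiate(IPAM_ACQUIRE)
  evm.root['ipam_ip_address'] || raise("IPAM returned no address for #{hostname}")
end

fake_infoblox = ->(root) { root['ipam_ip_address'] = '10.1.2.3' }
evm = FakeEvm.new({ IPAM_ACQUIRE => fake_infoblox })
puts acquire_ip(evm, 'lonweb003') # prints 10.1.2.3
```

Swapping IPAM products then means replacing the code behind the generic path, while callers stay unchanged.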
- A useful characteristic is that domains are searched in a sequence that you can order. You could have the same method in multiple domains, and CloudForms will execute the one in the first domain in the sequence. This is great for ad-hoc testing of new code.
To do this, copy the method under consideration into a temporary domain that executes before the original domain. You can now change the code and test it without affecting the original domain and instantaneously revert to the original by deleting the copied method.
Figure 8: Using domain sequence for testing
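To make the lookup rule concrete, here is a plain-Ruby model of it, not CloudForms code: domains are listed highest priority first, and the first enabled domain containing the path wins. The domain names other than the base ManageIQ domain, and the method path, are hypothetical.

```ruby
# Plain-Ruby model (not CloudForms code) of domain precedence.
def resolve(domains, path)
  domains.each do |domain|
    next unless domain[:enabled]

    code = domain[:methods][path]
    return [domain[:name], code] if code
  end
  nil
end

PATH = '/Integration/IPAM/Methods/AcquireIP' # hypothetical method path

domains = [
  { name: 'TEST',     enabled: true, methods: { PATH => 'new, experimental code' } },
  { name: 'COMPANY',  enabled: true, methods: { PATH => 'production code' } },
  { name: 'ManageIQ', enabled: true, methods: {} }
]

p resolve(domains, PATH) # TEST copy is picked up while it exists
domains.first[:methods].delete(PATH)
p resolve(domains, PATH) # deleting the copy reverts to COMPANY
```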