Wednesday, November 12, 2014

Part 3: High Availability options with vRealize Operations Manager!

With this part of the series, I will pick up right where I left off in my last article. In the previous post, I covered the architecture of vROps along with the various services and node types available in this release. At the end of that article, I spoke about the benefit of a cluster-like architecture, which not only provides scalability to the entire solution but also lets you protect it by building in resiliency.

The cluster architecture of vROps is not about scaling individual services within the solution; it is about making these services modular by scaling them in a uniform unit. That uniform unit is a DATA NODE. Hence, instead of scaling out just the PERSISTENCE layer by, say, adding more memory, you add a new DATA NODE, which scales all the services equally. This not only makes the solution modular, it also ensures that standardisation is maintained as you scale, resulting in predictable and optimised performance.

For ease, let's take a scenario where we have one Master Node, one Master Replica and one Data Node, as shown in the figure below:-

Taking the above architecture, let us discuss a few scenarios which the vROps cluster handles for data distribution, redundancy and resiliency.


Each resource is assigned to one node, and all the analytics work for that resource is done by that node itself. Only in case of failure does the standby node, which holds the replicated data, become the Active Node for that resource.
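The ownership model above can be sketched in a few lines of plain Python. This is a conceptual illustration only, not vROps code, and the node and VM names are made up:

```python
# Conceptual sketch: every resource has one owning (active) node plus a
# replica on a different node; the replica is promoted only when the owner
# fails. Node and VM names are illustrative.
def build_assignments(resources, nodes):
    """Give each resource an active node and a replica on another node."""
    assignments = {}
    for i, res in enumerate(resources):
        assignments[res] = {
            "active": nodes[i % len(nodes)],         # owner does the analytics work
            "replica": nodes[(i + 1) % len(nodes)],  # standby copy of the data
        }
    return assignments

def fail_node(assignments, dead_node):
    """Promote the replica for every resource whose active node failed."""
    for entry in assignments.values():
        if entry["active"] == dead_node:
            entry["active"], entry["replica"] = entry["replica"], None
    return assignments

nodes = ["master", "master-replica", "data-1"]
assignments = build_assignments(["vm-%d" % n for n in range(6)], nodes)
fail_node(assignments, "data-1")
print(assignments["vm-2"]["active"])   # vm-2 was owned by data-1; now "master"
```

The point of the sketch is simply that ownership is per resource, so losing one node only forces promotion for the resources that node owned; everything else keeps running untouched.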


With this architecture, data retrieval for 'HOT DATA' is extremely fast, since the data sits in the "in-memory database" layer of the cluster. Now that we know how the data is collected and retrieved, we can look at the various failure scenarios which a vROps cluster can survive.

MASTER NODE FAILURE - If the master node fails, the complete responsibility of the master is taken over by the Master Replica. The Replica gets promoted and ensures that the solution stays available at all times. If the Master Node failed due to a hardware failure and comes back online with the help of vSphere HA, it will be configured as the Replica Node thereafter.

DATA NODE FAILURE - If a Data Node fails, the resources owned by that node are promoted on the surviving nodes which hold replica copies of those resources. The new owning node is responsible for data collection from that point onwards. If a data node failed due to hardware failure and vSphere HA brings it back on a surviving ESXi host in the vSphere cluster, the node automatically rejoins the vROps cluster and the data points are synced to it. If a data node has been out of the cluster for a long time (more than 24 hours), it is better to re-create that node from scratch rather than rebuild/re-sync the data.

IMPACT ON DATA FLOWS DURING FAILURE - If a node fails while data is being queried or collected in the vROps cluster, there is no impact on data availability, as the surviving nodes serve that data through the replicated resources they hold.

Each node of the cluster should be placed on a separate ESXi host using DRS anti-affinity rules, so that a single host failure cannot impact more than one node in the cluster and cause data loss or availability issues.

Now that we know how the vROps cluster architecture works, in the next part of this series we will have a look at the various deployment models which come into the picture when you plan to deploy vROps 6.0 in your infrastructure.

Till then.... Stay Tuned :-)

Share & Spread the Knowledge!!

Monday, November 10, 2014

Part 2 : vRealize Operations Manager Architecture Deep-dive!

In my previous post I gave you an overview of vRealize Operations Manager 6.0. In that post, I spoke briefly about the architectural changes and differences between vCOps 5.x and vROps 6.0. With this post, I will take it a few levels deeper to explain the entire architecture of vROps 6.0.

One of the biggest changes in vROps 6.0 is the scale-out architecture of the application, which not only allows you to monitor more resources, but also brings RAIN-like (Redundant Array of Independent Nodes) resiliency to the application layer. I will talk about the technology behind this in a moment, but before that, let's have a look at a graphic which gives us an overview of the ARCHITECTURE of the vROps appliance.

For those who have worked on vCOps 5.x, you will immediately notice that the above logical architecture of vROps shows just a single VM / appliance. This is not a mistake. This single node is the complete vROps solution, as it has the formerly separate Analytics and UI VMs converged into a single VM. This not only makes things simpler to deploy, but makes them a lot easier to manage as well. With this, let me give you an overview of each layer, starting from the topmost stack.

UI: ADMIN/PRODUCT - With vROps 6.0, the Admin UI, vSphere UI & the Custom UI are converged into a single UI. When you first launch the web UI using the IP address, you are placed in a first-time setup wizard, which is the Admin UI. On subsequent connections, you land on the Product UI, a single user interface for the vSphere objects and customization options. The Admin UI is hence used for first-time setup and then for cluster management activities such as adding data nodes, removing data nodes, bringing the cluster online, etc. The Product UI, on the other hand, is for application access, where you can set up policies, alerts, custom dashboards, management packs and a plethora of other tasks carried over from the previous version of the product and, of course, all the new stuff which I discussed in my previous post.

COLLECTOR - The responsibility of the collector does not change much from previous versions of the product. The collector, as always, is responsible for capturing the data coming in through the adapters. The enhancement here is the introduction of extensible, published APIs which can now be used to inject data from 3rd-party sources or perform ETL operations through other tools in the datacenter. The APIs are published and can be used by customers to extend the goodness of vROps across their infrastructure & application platforms.

CONTROLLER - The controller is the brain of the collection & retrieval engine. It is responsible for mapping the collected data to the right resources and for retrieving data for incoming queries. It also plays a vital role in keeping the remote collectors informed about the changes happening in the system and the work they need to do to ensure consistency of data for all the resources being monitored.

ANALYTICS - The role of the Analytics stack does not change much. This engine ensures that all the patented algorithms within vROps are applied to the collected data, and that functions such as super-metrics, dynamic threshold calculation, alerts, etc. are calculated and made available for viewing, for providing recommendations and for taking actions.

PERSISTENCE - While all this happens in the layers above, the mastermind is the Persistence layer, which gives vROps the performance required for monitoring thousands of objects whose data is collected, stored, analyzed and retrieved at the speed of light. This persistence layer works as a data service for all of the layers above it, and its agility comes from an in-memory database powered by Pivotal GemFire. GemFire not only handles persistence of data, it also makes vROps cloud-scale by easily scaling the vROps application out across multiple nodes. This brings the scalability, performance & availability which were missing in previous versions of the product.

DATABASES - Along with the architectural change, vROps 6.0 also changes the way the databases work. Let me give you a quick brief on how these databases function, as they are the backbone of the deployment:

  • FSDB - The File System Database is present on all the NODES of a vROps 6.0 cluster deployment. This is where all the collected metrics are stored in raw format.

  • xDB (HIS) - The xDB is where the Historical Inventory Service data is stored. It is available only on the MASTER node, the first node of the vROps cluster, and is also part of the REPLICA node, which is a true copy of the MASTER node for failover purposes.

  • GLOBAL xDB - This is where the user preferences, alerts & alarms are stored; in short, all the customization related to vROps. Like the xDB, it is available only on the MASTER node, the first node of the vROps cluster, and is also part of the REPLICA node, which is a true copy of the MASTER node for failover purposes.

We will have more clarity once we look at the cluster architecture of vROps 6.0, so let's now dive into it in more detail. The graphic below shows how a vROps 6.0 cluster can scale out by adding new DATA NODES, and how one of the data nodes can act as a MASTER REPLICA to ensure that we always have a resilient master in case the MASTER goes down due to hardware or application failure. Remember, we have a RAIN architecture, so the MASTER role stays up and collection continues even through hardware or application level failures. Here is the cluster architecture represented through a graphic:-

With vROps 6.0, you have the concept of different kinds of nodes which can make up the vROps cluster. Let me give you a brief description about each node type in a cluster:-

MASTER NODE - As the name suggests, this is the MASTER of the cluster. It is essentially the first node of the cluster, if you plan to build one; I will talk about the various deployment models as we move forward in this series. This node holds the Global xDB (Postgres), the xDB, as well as the FSDB. In essence, this node is where all the customization of your entire vROps solution lies: things such as user preferences, policies and the entire brain of the solution.

REPLICA NODE - Doing justice to its name, the Replica Node, also called the 'Master Replica', is an exact copy of the master node. It is there to give resiliency to the solution; in the vROps GUI this is identified as enabling High Availability. This node does no work of its own, but watches the master node at all times and syncs with it to ensure it can take the master's place once the master fails.

DATA NODE - Every node which collects data in the vROps cluster is a Data Node. Its function is to collect data from your environment through the adapters assigned to it. These nodes are what allow you to keep scaling your cluster by adding new members.

REMOTE COLLECTOR - The remote collector is not a new concept in vROps, but it is now the only way to get data from an environment which is not on the same LAN. In other words, you have to install a REMOTE COLLECTOR if you need to fetch data from a remote location into a centralized vROps cluster/node. The good news is that it is the same appliance you install anyway; you just choose the collector role during the install, which keeps things simple. A collector does not have the CONTROLLER, ANALYTICS or PERSISTENCE layers, since they are not required; it sends the data to the centralized controller, where the data is processed by the Analytics engine.

With this, I will close this article. In my next article, I will give you an overview of how this cluster architecture provides resiliency to the vROps solution and ensures that, even in case of node failures or data loss, vROps can continue to function normally and fetch and load the collected data into the system.



Thursday, November 6, 2014

Part 1 : vCOps 5.x to vROps 6.0 - Intelligent Operations from Apps to Storage!

Back in the month of August, I had an opportunity to get an early look at vRealize Operations Manager. I jumped out of my seat when I saw the new UI and the amazing features. As much as I wanted to share the goodness of this version of vCOps, now called vROps, I knew I had to wait for the product to move from Beta to RTM and finally to a release build. I have been closely following the progress and have deployed a number of beta builds to provide constant feedback to product management and, of course, to learn the goodness which vROps brings along with it.

Although there are quite a number of posts out there which have already given an overview of what's new in this version of vROps, I want to give you a high-level overview here and then double click into the new architecture and features in the days to come. I am also planning a series on Installation & Configuration Best Practices, in line with the series which I did for the 5.x versions.

Before we begin, let's start with a quick graphic which gives a 30,000-foot overview of what's new with vRealize Operations Manager 6.0.

Let me dig down into each of these points and give you a preview of these enhancements:-

SCALABILITY - With vROps 6.0, the entire architecture of the application has changed. vROps is now enabled for horizontal scaling along with the existing option of vertical scaling. This change makes the product bigger and better: you can now deploy vROps in a cluster mode and scale the application on demand. This allows you to add new objects which you want to monitor using vROps, and needless to say, it will surpass the limit of 12,000 VMs which vCOps 5.x was bound by. This is essentially made possible by applying a RAID-like concept, a.k.a. RAIN (Redundant Array of Independent Nodes), to the architecture. In short, the first node you deploy will be a MASTER NODE; you can then create a REPLICA NODE to provide high availability to the vROps cluster; and every additional node you add to the cluster will be a DATA NODE, scaling up the number of objects you can monitor with the vROps cluster.

USER INTERFACE - This is definitely the best enhancement for me. vROps 6.0 has a single UI which is not only intuitive, but also has a great look and feel. This means the 3 different UIs, i.e. Admin, vSphere & Custom, all come together in this unified UI with all the features built in; the features are unlocked based on the license you hold. The interesting thing is that it's not just an interface change, but a change in the way vROps treats vSphere & non-vSphere objects. All objects within vROps 6.0 are so-called first-class citizens; in other words, all objects and related metrics are treated equally whether they come from the VMware adapter (vCenter Server) or from a 3rd-party adapter.

ALERTS - With vROps 6.0, the alerting mechanism is not only based on hard & dynamic thresholds of the available metrics & super metrics; you also have the option to take it to the next level by creating alerts for multiple symptoms which you spot in your datacenter. In simpler terms, you can feed human intelligence into the alerting system to ensure that you not only get alerts on things you are not aware of, but also on things you feel are important to your datacenter operations. While symptoms make the alerts intelligent, there is an additional enhancement to make these alerts actionable through a remediation engine built into vROps. This feature really brings the concept of the Self-Healing Datacenter alive and ready to use with this release. vROps 6.0 has actions built into the product based on python scripts. These actions can help you with some basic tasks such as powering VMs off/on, snapshots, adding resources, etc. While the built-in list is extensive, you also have the option to write your own actions. If this was not enough, VMware has integrated the automation & orchestration jewel, a.k.a. vRealize Orchestrator (formerly vCenter Orchestrator (vCO)), via an adapter which can be used to trigger controlled workflows from vROps into the rest of your datacenter's ecosystem.

VISUALIZATION - This is definitely one of the biggest enhancements, based on the various requests from customers and feedback from field employees like myself to the engineering groups. With vRealize Operations Manager, you can now create custom reports without digging into XML coding. With 5.x you only had a fixed set of reports available; for anything extra it was a herculean task of wading through XML and hoping you would end up with the report you were looking for. While vCOps was not designed to be a reporting tool, traditional IT still believes in looking at reports to make decisions. vROps 6.0 makes this easy with the concept of VIEWS. You can build views based on the dashboard widgets, or on things such as lists, graphs, pie charts, etc., and plot the required performance or capacity data in these views. These views can then be placed either in reports or in dashboards, as the consumer requires. The beauty is that all of this is available through simple GUI options instead of going behind the scenes. This takes custom dashboards and reporting to a completely new level, and I am looking forward to creating some useful dashboards and sharing them with you in the days to come. With this, I should also mention that vROps will no longer use 2 virtual machines (Analytics & UI); both are now clubbed into one, and each vROps node will be a single VM with both engines running in it.

CAPACITY MANAGEMENT - Capacity management with vROps does not change much from a policy perspective. Allocation & demand are still in the game; however, since all objects are FIRST-CLASS citizens, capacity management is now applicable to anything and everything. While capacity management in vCOps 5.x was limited to vSphere objects, with vROps it is possible to apply capacity management policies to every object being monitored. The other crucial change in 6.0 is the PROJECTS feature, an enhancement to What-If Analysis. In essence, you can not only forecast using the What-If Analysis engine, but also convert your analysis into a Project and save it within the vROps engine, which was not possible with the previous version. You also get the additional commit option within a project, which automatically does the capacity modelling for you and reserves capacity for upcoming projects in an organisation. Remember, this is not limited to the virtual infrastructure; this functionality can be applied to anything monitored by vROps. I can see this feature potentially being tapped by project management tools out there through the NEW API (I almost missed this) available with vROps.

AND MANY MORE....  :-)

Needless to say, this is just a teaser, an introductory post. In the days to come, we will double click into each of these areas and many other features which I have not even highlighted yet. Hope you enjoy the post. As always, I would love to hear your thoughts, feedback or constructive criticism :-)



Monday, October 6, 2014

Creating a One Click Datacenter Capacity & Cluster Performance Dashboard!

I have received a number of requests in the past to modify the famous Cluster Capacity Dashboard. Today I have modified it to create a similar dashboard which shows capacity from a Datacenter point of view. So if you have a large install base of vCenter Operations Manager with multiple datacenters, you can use this dashboard to get a complete overview of each datacenter. I call it the Executive Dashboard.

Here is how the dashboard looks:-

This dashboard allows you to click on the datacenter for which you want to see the capacity overview. At the same time, you can see the Top 5 clusters across all the datacenters with the highest IOPS and memory usage. As always, I have made it very simple to replicate. You have to download the following files.

EXEC.XML - Click on the link to Download the file and import this to the following location within the UI VM.


Do remember to give this file 644 permissions (read/write for the owner, read-only for group and others), as shown in one of my articles from the past. Here is the link:-

Once you have imported that file, you just need to import another file as a dashboard in the vCOps CUSTOM UI: EXEC-DASHBOARD.XML (click to download). In case you do not see the list of datacenters, edit the Resources widget on the left pane, browse to Resource Kinds -> Datacenter, click on All Attributes once and click OK. This will get you the list of datacenters and you will be good to go!!
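The 644 permission step mentioned above can be sketched as follows. This is plain Python rather than the UI VM's shell, and a temporary directory stands in for the dashboards directory on the UI VM so the example runs anywhere; the file name EXEC.XML is the one from this article.

```python
# Minimal sketch of setting "644" on the imported dashboard file.
# 0o644 = read/write for the owner, read-only for group and others.
import os
import stat
import tempfile

dash_dir = tempfile.mkdtemp()                 # stand-in for the dashboards directory
target = os.path.join(dash_dir, "EXEC.XML")
with open(target, "w") as f:
    f.write("<dashboard/>")                   # stand-in for the downloaded XML

os.chmod(target, 0o644)
mode = stat.S_IMODE(os.stat(target).st_mode)
print(oct(mode))                              # 0o644
```

On the UI VM itself, the equivalent is simply running chmod 644 on the uploaded file (or setting the same permissions from WinSCP's properties dialog).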

That's it.. This should do the trick for you. Please share your comments and feedback on how this vCOps dashboard has helped you with your work.

Share & Spread the Knowledge!!

Friday, September 26, 2014

Recover Deleted Custom Dashboards from vCenter Operations Manager

Some time back, I wrote an article highlighting the custom dashboards which are available out of the box with the release of vCenter Operations Manager 5.7. I have been asked how these dashboards can be recovered if someone deletes them accidentally.

In reality, these dashboards are a bunch of XML files which get imported as dashboards when you install the vCOps vApp for the first time. A shell script behind this import is executed post-installation and imports these out-of-the-box dashboards into your vCenter Operations Manager instance.

Hence, if you end up deleting these dashboards, you need to find that shell script and run it again to import the dashboards back into your vCOps instance. Now that we know the science behind these dashboards, let's look at where this shell script is located.

I am using WinSCP to log in to the UI VM and browse to the /usr/lib/vmware-vcops/user/conf/dashboards directory, as shown in the screenshot. Here you will find the shell script "" and, most importantly, all the XML exports of the dashboards, which can also be imported back into vCOps individually.

If you wish to import all the dashboards, you need to execute that shell script using either an SSH client such as PuTTY or a console connection to the UI VM. If you run this script, it will import all the dashboards listed in the screenshot above, and if any of those dashboards already exists, you will end up with duplicate instances of the same dashboard, which in my opinion is not cool.

Hence, if you want to recover only a subset of the listed dashboards, either edit the shell script (if you are good with scripting), or just grab the XML file you want to import and use the IMPORT DASHBOARD option in the Custom UI.
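If you prefer to script the subset selection, here is a minimal sketch in plain Python. The dashboard file names are invented for illustration, and a temp directory stands in for /usr/lib/vmware-vcops/user/conf/dashboards so the example runs anywhere:

```python
# Sketch: list only the .xml dashboard exports so you can pick the ones to
# re-import by hand, instead of running the full import script (which would
# re-import everything and create duplicates). File names are illustrative.
import glob
import os
import tempfile

dash_dir = tempfile.mkdtemp()                 # stand-in for the dashboards directory
for name in ("VM_Performance.xml", "Host_Utilization.xml", "import_all.sh"):
    open(os.path.join(dash_dir, name), "w").close()

# Only the .xml files are dashboard exports; the shell script is skipped.
exports = sorted(os.path.basename(p)
                 for p in glob.glob(os.path.join(dash_dir, "*.xml")))
print(exports)    # ['Host_Utilization.xml', 'VM_Performance.xml']
```

Each file in that list can then be imported individually through the IMPORT DASHBOARD option in the Custom UI.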

With this, I will close this article and get on with the 360 Degree Capacity Dashboard I am working on. I will share that for sure ;-)

Till then...