Just Moved a Client's Application Server from a Virtual Back to a Physical Server - Here's Why...
Richard Fieldhouse (10 April 2013)
Introduction
It's running rather against the trend to move our client's application server back out of the virtual world, so I've decided to explain a bit about what happened and why it had to be done.
Virtualisation can do much for a business in standardising and streamlining the IT operation, but there are sometimes pitfalls and unintended consequences which can come to light later on.
It's important not to brush these issues under the carpet, so that the next virtualisation might deliver the dividend without the problems that have come to light here. I shall try to be objective and explain the good parts and the bad for anyone contemplating moving from physical to virtual (P2V) or back again (V2P).
Why do Companies Virtualise their IT?
Most businesses combine a number of standard processes, like personnel and accounts, with special, normally creative, skill centres. IT systems initially supported the standard processes, but over time specialist, tailored IT systems also began to help the creative, distinctive parts of the business.
Often these various systems ended up on a variety of different servers at different offices, each serving local client PCs. In a bid to improve efficiency, and often to deliver the synergies promised at the time of mergers, many businesses pressed hard to rationalise IT systems. Virtualisation looked like the answer to everyone's prayers. It allows the identities of, say, ten server computers, together with the systems they support, to be transferred so that they continue to exist as ten virtual servers but now run on only five physical machines. This technical driver is also handy in pushing for the centralisation of the servers for all the company's IT systems at a single geographic location.
Benefits and Consequences
There are obvious savings here in capital and running costs. The toolset used to deliver the virtualisation - in this case VMware - generally works pretty well. It's possible to view and balance the resources used by each server to some extent, to take better backups than could be managed before, and to reconfigure some things on the fly. In this way, greedy applications can be given a bigger share of the resource cake.
Staff changes accompany this change in the physical IT setup, offering the opportunity for additional savings. There's also some change in the skill mix: the demand for an increased understanding of the virtualisation toolset puts added pressure on managers to drop the older skills involved in supporting the various applications and the platforms on which those applications run.
In order to get by with a reduced knowledge of the underlying operating systems and the layered applications like Microsoft Office and accounts packages, a policy of resisting the updates pushed by suppliers can be effective. In some ways suppliers like Microsoft have brought this upon themselves by repeatedly pushing clients to buy into new versions of their various products even when the benefits are relatively minor.
Gradual Accumulation of Problems
While a virtual server can be fed some extra resources if performance issues arise, visibility into the underlying virtualised systems is reduced. With fewer people to do the day-to-day housekeeping, there is a serious temptation for all of this to be left to one side.
When these problems finally come to light, things can be quite a mess, with a virtual server groaning under the weight of sometimes more than a decade's worth of departed staff members' user accounts, printer installations for printers that have long since been forgotten, and databases tolerating corrupted indexes. This last problem is not confined to applications - the databases behind the operating system itself, and those behind Microsoft Exchange, are often the worst offenders. Underneath all this, the problems of fragmented data storage - often ignored on virtual servers - can balloon to unimagined levels.
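As a rough illustration of the kind of audit that brings this accumulation to light, here is a minimal Python sketch. It assumes the old server's user accounts have been exported to a hypothetical accounts_export.csv with username and last_logon columns; the two-year threshold is also just an assumption, and a real export from Active Directory or similar would need its own field mapping:

    # audit_accounts.py - flag user accounts with no recent logon.
    # Assumes a hypothetical CSV export with columns: username, last_logon
    # where last_logon is an ISO date (YYYY-MM-DD), or blank if never used.
    import csv
    from datetime import datetime, timedelta

    THRESHOLD = timedelta(days=2 * 365)   # treat two years idle as stale
    now = datetime.now()

    with open("accounts_export.csv", newline="") as f:
        for row in csv.DictReader(f):
            last = row["last_logon"].strip()
            if not last:
                print(f"NEVER USED: {row['username']}")
                continue
            idle = now - datetime.strptime(last, "%Y-%m-%d")
            if idle > THRESHOLD:
                print(f"STALE ({idle.days} days): {row['username']}")

The same approach works for exported printer lists or database inventories - the point is simply to put a number on how much of the estate is dead weight.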
What to do when the Problems Appear
Having read the above, you'll be in little doubt that 99% of the old system data will be thrown away. But the key is how you find and care for the 1% that needs to be kept - that can be enough to restore health to the whole of an ailing business... or enough to send an apparently viable one to the wall.
Some of this 1% will be contact information, and some of it must be kept (and retrievable) for contractual reasons. But some of it, too, will be in the systems themselves - not the standard ones like word processing or email, but those that have been tailored to your business over the years - the ones that mould the way your staff interact with clients and other stakeholders.
These processes can be what has differentiated your business from its competitors - if you and your competitors alike find yourselves running the same systems, you may also find your practices twisted by those systems to match someone else's market conditions - or not to match the real world, or a UK market, at all. Now is not the time to be saving money, or alienating the people you can still find who know something of how your systems work and can help you find and sustain the 1% of data and functionality that needs to be kept.
While it's crucial to seek out the value in your legacy systems, it can still be right not to move to new versions of operating systems or standard software. A new installation of the existing systems can often be the best solution. This allows you to fetch across things like user accounts on an opt-in basis while not spending money on new licences or adapting to new versions of Word or Excel or whatever.
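By way of a sketch, migrating on an opt-in basis can be as simple as carrying across only the accounts someone has explicitly approved, rather than copying everything and weeding it out later. The file names here (approved_accounts.txt, a hand-maintained list, and the same hypothetical accounts_export.csv as above) are illustrative only:

    # build_migration_list.py - carry across only explicitly approved accounts.
    # approved_accounts.txt is a hypothetical hand-maintained list, one name
    # per line; accounts_export.csv is the export from the old server.
    import csv

    with open("approved_accounts.txt") as f:
        approved = {line.strip() for line in f if line.strip()}

    with open("accounts_export.csv", newline="") as src, \
         open("accounts_to_create.csv", "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["username"] in approved:
                writer.writerow(row)   # only opted-in accounts reach the new box

Everything not on the approved list simply never makes it onto the fresh installation, which is a far cleaner default than deleting after the fact.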
So, while a server which hosts only standard systems would perhaps be best replaced by a completely new one, or outsourced altogether, for servers holding specialised systems the priority is to preserve the function. It's these specialised servers where a rebuild of the existing environment can be the best solution.
[Aside on Blame]
[One thing it's best to avoid is getting hung up on who is to blame. A number of people were probably forced into cutting costs over a long period of time, and cajoled into making good on other people's promises. But it's also worth remembering that IT systems were generally being replaced completely every four or five years through the nineties, and much about the systems we have today was not really built to last as long as we need it to now that an operating system installation can outlive its hardware. This exoneration does not quite extend to Microsoft, though. There are areas where housekeeping processes - for instance, those we have to use to repair the old system databases - are so bad that one could almost imagine they were trying to bring forward the point at which an old operating system has to be declared obsolete.]
Why must the Move be to a Physical Machine?
It's often tempting to think that moving a problematic installation to a new virtual server will be the right thing to do straight away. A move like this, though, carries with it an assumption that the symptoms you see are truly caused on the server where they appear. This may not be the case.
It's probably a waste of resources to indulge in a witch hunt trying to find the true cause of slow running, freezing, or apparently random errors appearing on previously reliable systems. The kind of problems I've described above are unlikely to be confined to a single system, but rather, they probably result from a malaise that will be affecting many of your installations in similar ways.
The safe way to ensure that the new installation is clean is to put it on clean isolated hardware. This can also avoid damaging disputes with the suppliers of the systems that are apparently misbehaving.
How to Keep Things Running Smoothly Next Time
While this solution is unlikely to be too popular with the bean counters, the key to avoiding the issues described here is maintaining the skills on hand to look after your systems. There are two parts to this.
Firstly, for standard systems that you choose to keep in house, the process of looking after the system must be sustained and managed. Left to their own devices, an IT support team can lose sight of the big picture and increasingly concentrate on routine backups, printers and passwords - a situation reminiscent of the team in the TV series The IT Crowd, who had become past masters in the art of "repairing" IT systems by switching the PC off and then switching it on again.
Although it requires more proactive management, it is important that the team keep on top of housekeeping tasks on virtual servers. Perhaps the most important of these is the complete removal of obsolete accounts, which occupy space, attract spam and can be a security risk.
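One cautious way to handle that removal is in two stages: disable the account first, and delete it only after a grace period has passed without complaint. Here is a sketch of the bookkeeping, again against a hypothetical CSV of candidate accounts - the actual disabling and deleting would be done with your platform's own tools:

    # staged_removal.py - two-stage cleanup: disable now, delete later.
    # stale_accounts.csv is hypothetical, with columns: username, disabled_on
    # (an ISO date, left blank until the account has been disabled).
    import csv
    from datetime import datetime, timedelta

    GRACE = timedelta(days=90)   # wait three months between disable and delete
    now = datetime.now()

    with open("stale_accounts.csv", newline="") as f:
        for row in csv.DictReader(f):
            disabled = row["disabled_on"].strip()
            if not disabled:
                print(f"DISABLE NOW: {row['username']}")
            elif now - datetime.strptime(disabled, "%Y-%m-%d") > GRACE:
                print(f"SAFE TO DELETE: {row['username']}")

The grace period means a mistake costs a phone call and a re-enable, not a restore from backup.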
Secondly, for specialist systems, whether they were developed in house or by a third party, the importance of continuity needs to be kept in mind. A simple ongoing support contract is not really sufficient. To maintain skills in such a system it's essential to share the task of succession planning. And, in turn, to justify the resource required for this, and to build upon the market advantage these systems can bring, it's best to sustain a steady level of ongoing system development. Yes, that does sound expensive in hard times, doesn't it? But keep in mind that the alternative is that key knowledge of how a business, perhaps employing thousands, and its systems interact can become confined to a very small number of people.
Cautionary Tale
So, just to give an idea of how important this issue can become, and as an alternative take on why Microsoft have been cajoling us into ever larger systems, consider things from their point of view. Their fortune is based around applications like Word, which actually date back to around 1990.
When Word opens a document today it takes ages - a surprisingly large number of files are opened in the background. What for? Do the people at Microsoft actually know? There could be modules of code running there for which the coders aren't around any more: some of them working on other things; some of them on their yachts; and some of them dead.
Perhaps, now, there is no one around who is prepared to risk the consequences of cutting out modules which, in fact, aren't contributing to the function of the system any more.
I'm not saying, for sure, that this has happened - just that it might have. It is, after all, a consistent explanation for why Microsoft have struggled for so long to cut down the footprint of their code, even though the need is so great (to help them compete properly now that iPads and Android tablets are on the rise).