Five keys to IT Operations

Any developer who builds a new system dreams of the day their software goes live, and real users start pounding on their application.  This is perhaps the most tangible validation of their work that some developers ever have.  On that day, it’s pretty common for developers to either be the team responsible for running the system, or at least to be working shoulder-to-shoulder with the people who are running it.

For those lucky souls who “grow into” this role, though, it may be helpful to have a sense of perspective about what success in Operations means.  Although the specific technical and design details depend on your implementation, your job ultimately boils down to some simple objectives:

1. Understand what’s supposed to happen

While Development is all about designing, building, and testing, Operations is all about Execution.  Go rent Apollo 13, or better yet, pick up a copy of Failure is Not an Option by Gene Kranz.  Happiness in Operations means understanding what’s going to happen before it happens, and there’s no such thing as a good surprise.  You can see this in the checklists that all the NASA engineers used, and when someone tries to tell you that you’re too smart for checklists, remember that those guys were, in fact, rocket scientists.  Nuff said?

Virtually nothing in Operations is of any use whatsoever unless everyone who’s touching a keyboard knows the exact same stuff, which implies that checklists are written down in a way that ensures everyone’s got the same information.  If you don’t have anything better to start with, get a whiteboard and write, “8am: Make sure the servers are still on,” and then start filling in from there.

2. Know what’s actually happening

This is the IT equivalent of watching the gauges on your car’s dashboard.  Failure to pay attention to the gauges could prove costly when your engine starts shooting connecting rods through the hood.

In operations, you’re watching for failures and performance problems, hopefully in time to react to them before your customers start complaining.  You’re also watching for unusual activity that could indicate problems with other systems you interface with, or even hacker activity.  When you get more sophisticated about what you’re watching, you may even be able to provide design guidance on which features your customers use the most, or whether there are parts of your application(s) that users seem to be struggling with.  Please make sure you’re covering the basics first, though: server uptime, exceptions, and application performance.

As usual, tools help here.  It’s a whole lot more efficient and effective to have a tool checking to make sure servers are responding properly.  Fortunately, there are all sorts of tools like this, including some free ones.
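
To give a feel for what the simplest of these tools does, here’s a minimal sketch in Python that polls a list of URLs and sorts each one into OK, SLOW, or FAILING.  The function names, the two-second threshold, and the status labels are all made up for this example, and a real monitoring tool does a great deal more (alerting, history, dashboards).

```python
import time
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_probe(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the server can't be reached."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code
    except (URLError, OSError):
        # DNS failure, refused connection, timeout, and so on.
        return 0


def check_servers(
    urls: Iterable[str],
    probe: Callable[[str], int] = http_probe,
    slow_after: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Probe each URL once and label it OK, SLOW, or FAILING."""
    results = {}
    for url in urls:
        started = clock()
        status = probe(url)
        elapsed = clock() - started
        if not 200 <= status < 400:
            results[url] = "FAILING"
        elif elapsed > slow_after:
            results[url] = "SLOW"
        else:
            results[url] = "OK"
    return results
```

Run something like this on a schedule against your own URLs and you’ve automated the “8am: Make sure the servers are still on” line from the whiteboard.  The `probe` and `clock` parameters exist only so the logic can be exercised without a network.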

Important:  Be sure to understand the difference between IT Operations and Business Operations.  These can, in some cases, be co-resident, but remember that one is focused on your systems and the other is focused on the business.  These two aspects of Operations should communicate liberally back and forth, but it’s important to understand the difference between technical status and management and business status and management.

3. Communicate status

Since it’s Operations’ job to know what’s happening, they therefore serve as a fount of knowledge for other departments.  In a lot of cases, proactive communication is more effective than “pull” communication, and again, whenever you can drive decision-making out of the process, it’s a good thing.  Therefore, operations should know in advance what sort of events should trigger communication, and to whom they’d be communicating.  Some of this could, in fact, be automated.
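
One way to drive the decision-making out of the process is to write the rules down as data before anything goes wrong.  The Python sketch below is purely illustrative: the event names, severity numbers, and audiences are invented, and your own list will look different.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class NotificationRule:
    event: str                  # what happened
    min_severity: int           # lowest severity at which this rule applies
    audiences: Tuple[str, ...]  # who hears about it


# Hypothetical rules, agreed on in advance rather than during an outage.
RULES = (
    NotificationRule("site_down", 1, ("on_call", "help_desk", "management")),
    NotificationRule("slow_response", 2, ("on_call",)),
    NotificationRule("slow_response", 4, ("help_desk",)),
    NotificationRule("disk_nearly_full", 1, ("on_call",)),
)


def audiences_for(
    event: str, severity: int, rules: Sequence[NotificationRule] = RULES
) -> List[str]:
    """Return the sorted, de-duplicated audiences to notify for an event."""
    matched = set()
    for rule in rules:
        if rule.event == event and severity >= rule.min_severity:
            matched.update(rule.audiences)
    return sorted(matched)
```

With a table like this, nobody has to decide who gets called while the building is on fire.  The table answers the question, and the same table can feed an automated notifier later.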

Status is typically focused on what’s happening right now, but a complete understanding of status also includes a sense for whether measures are trending in one direction or another.  Data about how your system performs over time, for instance, can tell you a lot about whether a performance metric you’re seeing right now is a blip or part of a trend that’s moving steadily toward a big problem.  This sort of long-term information should also help you see performance or resource constraints in time to react to them before they affect customers.
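
To make the blip-versus-trend distinction concrete, here’s a rough sketch in Python.  It fits a straight line to the history of a metric (say, response times in milliseconds) and then looks at the newest reading.  The thresholds and labels are assumptions chosen for illustration, not recommended values, and the sketch assumes the metric is positive.

```python
from statistics import mean, median
from typing import Sequence


def least_squares_slope(values: Sequence[float]) -> float:
    """Slope of the best-fit line through (0, v0), (1, v1), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = mean(values)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def classify_latest(
    samples: Sequence[float], rising_rate: float = 0.02, blip_factor: float = 1.5
) -> str:
    """Label the newest sample as part of a 'trend', a 'blip', or 'steady'."""
    if len(samples) < 4:
        raise ValueError("need at least four samples")
    *history, latest = samples
    baseline = mean(history)
    # Growth per sample, as a fraction of the historical average.
    growth = least_squares_slope(history) / baseline if baseline else 0.0
    if growth > rising_rate:
        return "trend"   # the history itself is climbing steadily
    if latest > blip_factor * median(history):
        return "blip"    # flat history, but the newest reading spiked
    return "steady"
```

A flat history followed by one ugly number comes back as a blip, while a history that has been creeping upward all along comes back as a trend, even if the latest reading doesn’t look alarming on its own.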

4. Handle catastrophes

Sometimes, bad things happen to good applications.  When the sh*t hits the fan, it’s absolutely imperative that the cure isn’t worse than the disease.  Go watch Apollo 13 again.  Since everything that normally happens in operations should happen according to a checklist or procedure, it should be glaringly obvious to everyone (to the point of discomfort) that you’re now operating off-script.

I’ve heard pilots describe their jobs as “hours of boredom punctuated with seconds of sheer terror.”  This is when you want to open the cockpit door and see Sully sitting there.  Sully uses checklists, too, by the way.

5. Maintenance and planning

Since operations has done such a good job of ensuring our system is running like a Swiss watch, they’ve got some time left to plan for future improvements.  With any luck, this might include stuff like:

  • Preparing and managing hardware and/or virtual servers.
  • Planning infrastructure changes for upcoming software releases.  This is actually a very important form of developer support, because this is where operations and development work together to make sure you can deploy the things you’re building without any undue drama.
  • Tuning / tweaking system monitoring and management tools.
  • Analysis to assist development – where are your servers stressed, what custom tasks do you deal with today that might be built into the application, etc.

This list is just a start, of course, but it’s a pretty good start.

What tips would you add?

Reason #358 why I hate Flash

I use four PCs on a regular basis (two work PCs, plus a laptop and desktop at home), and all four run Windows: one Windows Server 2003, one Server 2008, and two Windows 7.  All of these boxes are either on 24×7 or hibernated between uses, so the only time I reboot them is to install Windows updates.

And every… single… time… I reboot any of these machines, I’m greeted by a prompt to update Flash.

I typically go ahead and let Flash do what it wants to do, and yet it keeps coming back, over and over and over again.  Based on this, I’m forced to conclude that either (1) Flash isn’t really updating correctly, or (2) it really does have a new update to install every single time I reboot.

Neither of these is acceptable.  Adobe, you’re not building an OS here.  Get it right and get out of my way.  If there’s a *real* new version or a *real* security disaster, then let me know about it, but I just refuse to believe that there are really that many emergencies that you need to install something every single time I reboot.

If you’re wondering why folks like Apple have made such a big stink about getting Flash off their systems, this is exactly the sort of issue they had in mind.

The bug “event horizon”

Black holes are fearsome astronomical phenomena, so dense and with a gravity well so deep that their enormous mass is compressed into a vanishingly small space.  Anything that approaches these monsters is almost certain to be sucked into the hole, and there is a region surrounding each of them where escape is impossible, even for light itself; the boundary of that region is called the Event Horizon.

Bugs are like that, too.

A Bug’s Life

First-year software development students (and even MBAs) learn that bugs in software are more time-consuming (and thus, expensive) to fix the further along in the software development process they’re caught.  The easiest bug to fix is the one you prevent in the first place with proper architecture and design.  Sadly, these quantum bugs are hard to quantify, so good design rarely gets to take credit for these precognitive fixes.

Of the bugs that are actually found, the ones caught during development (perhaps by TDD or unit tests) are wonderfully cheap.  In many cases, mere seconds pass between detection and eradication, the developers’ fingers never lifting from the keyboard. These bugs, in fact, are the next best thing to the ones that never existed in the first place, since most of them are never recorded in a bug tracking system and are probably never seen beyond the desk of the developer who found and fixed them.

After code is checked into a version control system, bugs move through a build process and on to testers, while simultaneously being propagated to other developers’ desktops via the source code.  Bugs found at this point are slightly more costly to fix, because you’ve involved other people, and there’s very likely a process and tracking system engaged to help keep track of the critters at this point.

But there’s another reason these bugs are more costly to fix — context switches.  As soon as a developer checks in his code, he begins moving on to his next task.  He’ll forget all about the code he was working on, including in some cases tearing down whatever virtual infrastructure existed to develop that code.  When he’s just about eyeball-deep in his next task, that bug comes home to bite him — a big context switch.  If the bug is urgent, he’s going to have to drop everything and get back up to speed on that code again, leaving behind any progress he managed to make on the new project.

The New Normal

Given that a developer’s job is to develop, you’d like him to be able to devote his full attention to creating solid designs and stout code.  In most cases, the work we’re asking of these people is really fairly difficult to do well, and it can be nearly impossible if he’s not able to concentrate.  When design is interrupted by a bug or two per week, there’s very likely an impact in terms of productivity as the developer changes contexts, but concentration and software design shouldn’t be adversely affected to a large degree.

As interruptions become more frequent and prolonged, however, there can be an impact on the quality of new code that’s produced, as well.  Design becomes disjointed and inconsistent.  Code becomes sloppy.  More new bugs are introduced.  We’ve introduced a vicious cycle of downwardly-spiraling quality.

As bug counts grow, they can start to have an impact beyond development and QA areas.  Help desks can become buried in calls; bugs begin to go un-recorded and uncorrected.  The noise level caused by poor software quality can contribute further to the downward spiral of quality.

Singularity

When code quality suffers so completely that the organization cannot produce high-quality code, the transformation of the organization is complete.  Buggy code eventually results in even buggier code, and software collapses upon itself as if it were a black hole.

Is the Demise Inevitable?

Unlike a real black hole, the problem of software quality can be stemmed — especially if it’s caught early.  Much like bugs themselves, though, the later this issue is addressed, the more difficult and expensive it will be to fix the problem.  When you consider the cost of bugs in your organization, don’t forget the cumulative effects of bugs on the quality of the work you’re able to produce.

 

CodeBetter.Com wants to send you to Redmond!

Here’s a great way for you to get to Redmond this fall for one of the most popular conferences of the season.  CodeBetter.Com is giving away one conference pass to Visual Studio Live! in Redmond (October 17th – 21st, 2011), plus $500 toward travel / hotel costs.  To enter, visit the CodeBetter.Com post “CodeBetter.Com wants to send you to Redmond!” and comment / tweet / trackback to that post.  Note: you can trackback here if you want, but that’s not going to help you win the contest!  They’re going to pick a winner and announce the final recipient this Tuesday, September 20th at 12:00 EST.

Saving Microsoft

The last few years have been trying times for Microsoft.  Late to jump on the web bandwagon, they’ve never owned that platform the way they owned the desktop.  Internet Explorer remains a decidedly un-sexy choice for web browsing, and Microsoft might never recover from the black eye that was Vista.  Office now faces some really credible online competition, and Bing is still light years behind Google in the search engine war.

But the most unkindest cut of all has to be watching Apple ascend to become the world’s most valuable company.  I mean, it wasn’t enough to watch the Mac chip away at Windows (many Windows developers, in fact, claim that MacBooks are the best portable Windows development boxes).  The iPod never even flinched when the Zune came along, and the iPhone, of course, completely decimated Windows Mobile, which was already under heavy pressure from Palm.  The coup de grâce might just be the iPad, which now has some declaring the death of the traditional PC.  While it remains to be seen if (or when) PCs are really dead, there’s no denying that there’s already been a noticeable impact on PC sales, and it’s very possible that this was a factor in HP’s decision to get out of the PC business this week.

Will the last one to leave Redmond…

So is that really the end for Microsoft?

Not necessarily.  I don’t think we’re ever going to see the heady days of near-monopoly that Microsoft enjoyed in the early ’90s, but Microsoft is still huge, they make a lot of money, and you can still find their software on most business computers.  The Xbox is doing well, Bing refuses to give up, and the newly-reborn Windows Phone 7 seems to be winning fans every day.  Microsoft’s Skype acquisition could give WP7 another shot in the arm.

But there’s no mistaking the fact that “business as usual” isn’t getting the job done for Microsoft.  The Windows folks are now hard at work on Windows 8, but early discussions about HTML5 support in Windows 8 have caused quite a lot of anxiety among Silverlight developers, who fear they’re now stuck on a legacy platform.

Mark my words:  If Microsoft’s Windows 8 strategy causes more developers to flee to the iOS or Android platforms, you can go ahead and cue the fat lady.  You see, since the very earliest days of Windows, it’s been the growth and productivity of Microsoft’s development platform that’s attracted developers, who built the applications, which attracted the users.  Ballmer had it right way back in 2000 – they’re in big trouble without “developers, developers, developers.”

What developers want

Despite the momentum of iOS and Android, neither of these platforms can touch Visual Studio for developer productivity.  All things being equal, this should make Visual Studio the automatic winner, and when Windows ruled the desktop, it was.  Now that Windows is no longer the gorilla it once was, though, VS can’t win on productivity because you can’t use Visual Studio to produce apps that run on all the devices you want to support.

That’s the “aha” moment.

Forget Windows vs. WPF vs. Silverlight.  Visual Studio needs to support development of applications that run on all those platforms, plus iOS, plus Android.  There’s no reason Microsoft can’t deliver this functionality in Visual Studio, and nothing short of this will be remotely close to good enough.

Don’t take my word for it

About a week ago, Charlie Kindel announced that he was leaving Microsoft to go do his own thing.  Charlie was the GM of the Windows Phone Developer Ecosystem, and before that, he led the Windows Home Server team.  Although this background admittedly makes Charlie a little biased, in an interview on Geekwire, Todd Bishop asked Charlie an interesting question about mobile platform development:

If you build an app for your new company, which mobile platform will you target first?

Kindel: Hypothetically, if my new company were to build mobile apps, we’d target WP7 first. You know the old saying “Code Talks”:  I know I can build a beautiful and functional WP7 app in a fraction of the time it would take to build an iOS or Android app. Startups are about executing quickly. But I’m sure we’d quickly take what we learned there and apply it on all the popular devices.

Right there, you have the value proposition for a cross-platform development tool, because although I think Charlie is right about the productivity gains on Visual Studio, I’m skeptical that most startups are really going to target WP7 ahead of iOS or Android.  In fact, we’re now on the verge of HTML5 being that go-to platform, and right now, HTML5 development tools are so immature that Visual Studio just doesn’t have a productivity edge vs. anything else.

This is a big image problem for Microsoft now, and as more business apps need to target multiple platforms, it’s going to start costing Microsoft more market share and more profits from its last real stronghold: businesses.

So, what do I want to see?

Ok, here’s the todo list, Microsoft:

  • It’s beyond ridiculous that WPF and Silverlight continue to be separate.  I understand that you can do more stuff on the desktop than you can do when you’re deployed as an internet app.  Fine.  You don’t need a whole other client framework for that.  Stop it. Now.  Thank you.
  • I want to use the VS2010 layout tools I’d use for Silverlight development to build an HTML5 application.
  • I want to have all the declarative validation I create using data annotations create JavaScript for those HTML5 applications.  We’ve seen hints of this in MVC already.
  • While we’re at it, I want to deploy the same application to the Windows desktop (or tablet) or a Silverlight client, or an HTML5 client, or even a native iOS or Android client.  All of these clients use declarative, hierarchical UI layout frameworks, and all of them can support some form of .NET via Mono.
  • Xbox, too — HTML5 would be fine, but I want to run the same apps there, too.
  • If you can’t (or won’t) make Silverlight cross-platform on the client, then can it and double down on HTML5.
  • End the fractured development tool practices that have plagued Microsoft.  Silverlight vs. WPF is the classic example, but there have been countless examples of competing data access technologies and other frameworks, too.  It feels confused and disjointed, and it’s not helping.

With these things in place, Microsoft would have a real shot at being the premier development platform for business applications for another generation.  I know that the position in the past has been to tie Microsoft development to Microsoft deployment platforms, but that fight is lost, and it’s time to garrison the last thing that Microsoft still does better than anyone else on the planet.

With developers in hand, there’s no reason Microsoft can’t fight to take back OS platforms on phones, tablets, and so on, but if they lose the battle for developers, the flow of apps will dry up, and there will literally be no way to stop the hemorrhaging.

 
