Can I get the impossible delivered next week?

How many times have you participated in this typical IT conversation?

Business: “How long is it going to take to enable that new FlimFlam engage-the-client-better module with those extra customizations we talked about?”

IT: “Last time we looked at that we said it would take a little over three months including changing the custom code in order to be able to use that new module.”

Business: “Well, we need it turned on by the end of next month when the marketing campaign starts.”

IT: “Um, err, change all the custom code, install the new module and enable it with those additional features in two not three months?”

Business: “Yes.”

The theme of this classic IT delivery challenge is the same:

In order to meet a communicated business need within a business defined time-frame, a perceived number of technical tasks need to be accomplished that don’t initially appear feasible in that given time-frame.

The initial reaction by most technically minded people is that this is completely impossible.  And yes, Agile or other project delivery methodologies have built-in capabilities to handle fixed dates and variable scopes.  But if you are faced with this common line of questioning, I am going out on a limb and guessing you are not working in a truly Agile shop, or else this question isn’t likely to be asked in this manner in the first place.

So, you rush out and grab your FlimFlam experts to communicate the need and the desired end date.  You tell them this is the most important thing to work on and then you go back to your desk to get ready for the next crisis of systems delivery.

Nope.

Consider taking the time to put together a work delivery estimate as your first priority.  Why do this seemingly futile exercise when the business has already stated what they need and when they need it by?

You need to have some estimate data in hand to have a conversation with the business on what is truly feasible within a pre-communicated time-frame.

This conversation serves a few purposes that are critical to you and your team’s ability to deliver a quality solution:

  • Establishes what in-flight work will be put on hold/delayed until this higher priority request is completed.
  • Durations of time on sub-tasks enable the business to prioritize what features they truly need versus those that are just “nice to have”.
  • Establishes a baseline so when other high priority requests come in or new feature requests are added, you can dust off the previous estimate, revise with the new needs, and re-engage in the conversation.

I can’t count the number of times I have chatted with a business sponsor that swore up and down they just had to have every item in their request delivered by an irrational date.  Yet when presented with some work estimate that indicated everything wasn’t realistically possible within the time-frame [based on the cumulative hours/weeks associated with their request broken down into granular tasks], that same business sponsor started cutting “low priority” features left and right to meet their date.

Business: “You mean it is going to take a week just to get that data on the screen within the application?  We can just use that old report that shows the same data on paper.  Scratch that feature.”

Technical/Engineering Challenge

The typical technical/engineer mindset applied to the theme of delivering unfamiliar technology within an aggressive (arbitrary?) time frame is to think it impossible given the number of unknowns.  Get ready for panic and shortness of breath from your less seasoned technical staff.  Giving them coaching and a framework to break down the seemingly huge and complicated requests into a logical sequence of executable tasks is the other side of the estimation challenge.

To help enable the technical resources to break down “the work” into meaningful, estimable and negotiable chunks, I’ve attached a simple work estimation spreadsheet:

Sample IT Work Estimation Template 10-05-10 [xls]

Sample IT Work Estimation Template 10-05-10 [xlsx]

Below is a brief explanation of how the template works:

There are three tabs:

1.      Template = where one fills in the data to create an IT work estimate.

2.      Calculations = where certain values are used in calculations on the Template sheet.  Changing numbers here causes the whole estimate to change.

3.      Assumptions = certain assumptions, copyright and reference back to this article for explaining how the template works.

Template

The top portion of the template (red arrow below) is designed to capture the basics about the estimate to tell it apart from other estimates: Name, project, dates, etc.  Think of all the fields that would help you and your organization know what you are estimating.

The main section of the template is where the work breakdown and granular task estimating occurs.  Gray fields throughout are auto-calculated but can be typed over if needed.  The first task heading “1. Architecture Tasks” shown below is a section to capture the non-negotiable tasks that need to be executed in order for any business functionality to be delivered.  This could be getting servers installed, setting up a new project in MS Visual Studio 2010, or creating a new code branch for the required new features, which involves an act of Congress in your organization.

The “description” column is for the estimator to describe, in low tech language, what granular task is needed.  Since one may need to sit down and discuss this with a non-technical business person, some effort to use language that is more descriptive and less heads down techie would be beneficial.

The “Low” and “High” columns are for the estimator to enter the approximate minimum and maximum hours needed to complete the task.  The “Average” column in gray is calculated as simply the average of the two, to be used for the roll-up calculations at the end of the template.  If there are a number of unknowns, or the task could be quick but, depending on X, Y or Z, might run long, use a wide range for “low” and “high”.  This also presents the business conversational element of “Well, if this step goes well, it could be as short as 3 hours.  But if it turns out we do need to engage the storage team and get more storage allocated, then this could take as long as 20 hours.”  This is useful in setting the business’s expectations around variability in the estimate so they don’t get too fixed on a single, implicit number when things don’t follow the “happy path”.

The “Actual” column is useful for the engineer to record the actual time it took them to complete the tasks or the number of hours consumed up to the date the template is revised for some status reporting reason.  This also paves the way for estimators to get better at more evidence based estimating/scheduling.

The “Complete” column is for the task executor to record whether they indeed completed the task or whether it still needs work to finish.  Entering a “Y” means the task is complete.  Blank or anything else means the task still needs work to complete.

The “Estimate Remaining” is gray and thus calculated as the remaining hours of the “Estimate Average” minus the “Actual” if the “Complete” column doesn’t have a “Y”.  If the “Actual” turned out to be more than the “Estimate Average” then the column is left blank.
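The column rules described so far can be sketched in a few lines of Python.  This is only an illustration of the logic as I have described it, not the spreadsheet's actual formulas: the `Task` class and its field names are hypothetical, and treating a completed task as having zero hours remaining is an assumption, since the text above only covers tasks without a “Y”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """One row of the estimate (hypothetical structure, not the real sheet)."""
    description: str
    low: float               # minimum estimated hours
    high: float              # maximum estimated hours
    actual: float = 0.0      # hours consumed so far
    complete: bool = False   # True when the "Complete" column holds a "Y"

    @property
    def average(self) -> float:
        # Gray "Average" column: midpoint of the low and high estimates.
        return (self.low + self.high) / 2

    @property
    def remaining(self) -> Optional[float]:
        # Gray "Estimate Remaining" column.
        if self.complete:
            return 0.0       # assumption: a finished task has nothing left
        if self.actual > self.average:
            return None      # overrun: the sheet leaves the cell blank
        return self.average - self.actual


# The storage example from the text: 3 hours if all goes well, 20 if not.
storage = Task("Allocate storage", low=3, high=20, actual=4)
print(storage.average, storage.remaining)
```

Returning `None` for an overrun mirrors the blank cell, which keeps a blown estimate from showing up as negative hours in the roll-up.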

The “Notes” column (not shown here) is for any task notes that help the estimator or anyone reading the estimate to know some additional details about the task that are driving the estimated hours.

The next section called “2. Development of Functional Unit 1” represents a block of work that rolls up to some business identifiable chunk of work.  Feel free to change the section name to reflect something all project stakeholders would understand.  These sections are designed to be the negotiable features that can anchor the business conversation to determine exactly which features are needed and the corresponding work durations.  Feel free to cut/paste as many new sections as needed to represent the requested work breakdown.

In the above example from the template, Sample Task 2.1 represents a task that was estimated to be 7.5 hours but actually took 4.5 and is complete.

Sample Task 2.2 represents a task with 18.00 hours completed so far, so the remaining hours are calculated as 5 against the 23 hour estimate.

The “Sub Total” represents the min/max of the total work effort (142.25 and 181.25 in this example, which is a full ~40 hour delta) for an average of 161.75 hours of which 65 have been completed and 101 hours remain against the average.  Thus, the business expectations can be set relative to the ~40 hour swing to help the planning for best and worst case delivery.

Below the “Sub Total” section are tasks that are relative to the overall IT work:

In this section, any standard deliverables or work associated with doing the actual development work can be captured.  In this example section, I added “unit testing”, which I calculated as a percentage of the sub total hours of development work above.  The percentage is pulled from the “Calculations” tab.  In this case, I am calculating that unit testing takes 80% of the hours estimated towards development.  You can add/remove entries or adjust calculations on the “Calculations” tab to capture the hours needed to deliver a quality solution that so many developers forget to include in their hard core development work estimating.

The two documentation entries represent either a fixed amount of time, in this case 10 hours, or hours that are a percentage of the total development work, in this case, 20% of the total hours.  Feel free to add and subtract items that come up regularly to make the overall estimate more complete.  Need production turn over documentation?  Add an entry here.  Need some change control document to push a solution into the next environment?  Add an entry here.  Over time, this section will settle into capturing the work that is regularly needed in every project but is easy to forget to estimate for each time.
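As a rough sketch of how these overhead rows work, the snippet below applies the percentages quoted above (80% for unit testing, 20% for documentation) and the fixed 10 hour documentation entry to the 161.75 hour development sub total.  The function name and dictionary layout are my own invention for illustration, and the result is not meant to reproduce the totals in the spreadsheet, which may include rows not discussed here.

```python
def overhead_hours(dev_subtotal, percent_of_dev, fixed_hours):
    """Return a dict of overhead item -> hours.

    percent_of_dev: item name -> fraction of the development sub total
    fixed_hours:    item name -> flat number of hours
    """
    items = {name: dev_subtotal * pct for name, pct in percent_of_dev.items()}
    items.update(fixed_hours)
    return items


dev_subtotal = 161.75  # average development hours from the "Sub Total" row

overhead = overhead_hours(
    dev_subtotal,
    percent_of_dev={"Unit testing": 0.80, "Documentation (percentage)": 0.20},
    fixed_hours={"Documentation (fixed)": 10.0},
)

for name, hours in overhead.items():
    print(f"{name:<28} {hours:7.2f}")

# Development plus the overhead rows modeled here.
print(f"{'Total':<28} {dev_subtotal + sum(overhead.values()):7.2f}")
```

Keeping the percentages in one place, as the “Calculations” tab does, means a single change flows through every estimate built on it.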

Lastly, the final section includes the total hours for all the work, which is especially useful in answering that initial question: “How long is it going to take to enable that new FlimFlam engage-the-client-better module?”  One additional element that helps beyond the “how many hours” is the “Total Work Days” calculation, which is based on a more realistic number of productive hours per day plus a reduction in time to cover other assignments that aren’t specific to project work, such as researching a new technology or working on a special assignment of some kind.  The calculations are in the “Calculations” tab.  In this example, the productive work day is 6 hours (not 8 as some might consider) and 80% of those 6 hours should go to projects such as this one.  Hence the “Total Work Days” is greater than simply 348.50 divided by 8.  Again, feel free to adjust these calculations to aid in matching what your resources truly can dedicate to a particular project.  Want to show the “cost” of assigning two concurrent projects to a single resource?  As an example, drop the hours to 3 and add another 20% to cover the cost of “context switching” and see how your estimates come to reflect reality a bit more.
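To make the arithmetic concrete, here is a minimal sketch of the “Total Work Days” idea using the numbers from this example.  The function and parameter names are hypothetical, and modeling the context switching cost as a 20% markup on the total hours is my assumption about one reasonable way to apply it, not a description of the spreadsheet formula.

```python
def total_work_days(total_hours,
                    productive_hours_per_day=6.0,
                    project_allocation=0.8,
                    context_switch_penalty=0.0):
    """Convert estimated hours into calendar work days for one resource.

    context_switch_penalty is an assumed markup on the hours
    (0.2 adds 20%) to reflect juggling concurrent projects.
    """
    effective_hours = total_hours * (1 + context_switch_penalty)
    hours_per_day = productive_hours_per_day * project_allocation
    return effective_hours / hours_per_day


total = 348.50

naive = total / 8                      # pretending every day yields 8 project hours
realistic = total_work_days(total)     # 6 productive hours, 80% on this project
two_projects = total_work_days(total,
                               productive_hours_per_day=3.0,
                               context_switch_penalty=0.2)

print(f"naive: {naive:.1f}  realistic: {realistic:.1f}  two projects: {two_projects:.1f}")
```

The gap between the naive and realistic figures is usually the most persuasive number to bring to the business conversation.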

Additional Value

Once established as the “baseline” estimate, as new requests/changes come in, add them to the previous estimation sheet, change the date and quickly be able to predict when the overall solution will now be delivered.  Get ready for another “what is pushing out the date?” discussion armed with your estimating data.

Once established on multiple projects in the “portfolio”, now you can hold these up against competing resources to show “if project X goes first, when would I get project Y next?”  This is extremely handy if your resources effectively cost “zero dollars” in a non-charge back type corporate IT model.
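The “if project X goes first, when would I get project Y?” question can be sketched by walking the projects in priority order and accumulating work days.  The project names and the 120 hour figure for Project Y below are made up for illustration, and the sketch assumes a single resource working the projects strictly one after another at 6 productive hours per day with 80% allocated to project work.

```python
def delivery_schedule(projects,
                      productive_hours_per_day=6.0,
                      project_allocation=0.8):
    """Return [(name, cumulative work days at finish)] for projects done in order."""
    hours_per_day = productive_hours_per_day * project_allocation
    schedule = []
    elapsed_days = 0.0
    for name, total_hours in projects:
        elapsed_days += total_hours / hours_per_day
        schedule.append((name, round(elapsed_days, 1)))
    return schedule


# Hypothetical portfolio: total estimated hours per project.
x_first = delivery_schedule([("Project X", 348.5), ("Project Y", 120.0)])
y_first = delivery_schedule([("Project Y", 120.0), ("Project X", 348.5)])

print("X first:", x_first)
print("Y first:", y_first)
```

Swapping the order shows each sponsor exactly what going second costs them in work days.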

Please give this template a try and let me know feedback on how effective it is with technical resources as well as a conversation tool with project sponsors.

For additional practical estimation articles, consider these I’ve written in the past:

Also, for an excellent article on using a similar technique on prioritizing multiple projects, consider this great post by Peter Kretzman, “The Practical CIO: Difficulties in project prioritization & selection, part 2“.

For additional caution in getting too carried away with the accuracy of your estimate, consider Todd Williams’s article “Good Estimates Only Have a 50% Chance of Being Made”.

Related posts:

  1. More Pitfalls of Work Estimation – Part 1
  2. The Art of IT Work Estimation
  3. Agile versus Classic IT Budgeting

Is everyone at the same level of competency?

As I was trolling through the RSS feeds of blogs I try to keep current on, I ran across another thought-provoking post by Todd (@backfromred) entitled “The Failure in Gating Process”. I wrote a brief comment to share with Todd and his readers. In the comment I stated that I both agreed and disagreed with Todd’s premise. This blog article expands on that conflict.

Todd’s premise is that PMO-enforced project gates:

  • Interrupt project momentum
  • Give management and sponsorship stakeholders an excuse to zone out until “quality gate #4 next month” before paying attention to the project
  • Lose efficiency to stopping, starting and context switching

It is difficult to argue that gating processes or projects doesn’t have a “cost” associated with the stoppage.

Todd’s solution is better stakeholder engagement.

Again, I can’t disagree with his solution in theory. Where I disagree is in the practical implementation of “better stakeholder engagement” in the real world. First, project teams need strong, competent and knowledgeable participation from these disciplines:

  • Project and Executive Sponsorship and Prioritization
  • Project and Program Management
  • Enterprise and Application Architectural Alignment
  • Requirements Gathering, Documenting and Tracking
  • Business/Systems Analysis
  • Vendor Management (if 3rd party providers involved in solution delivery)
  • Technical Development/Integration Leadership/Delivery
  • Testing (from unit through final business acceptance)

Simply having resources “engaged” from these disciplines does not equal success if the engaged resources lack the competency and/or knowledge in their discipline to bring into the project fold.

How practical is it to have strong, competent and knowledgeable resources on every IT project in your portfolio?

I argue it is rare and even if all those resources exist in the same organization, they are assigned to the highest priority and visible project(s) alone. This leaves mid to low priority projects with a less than optimal resource mix.

How do you get timely visibility into when a less than optimal resource mix staffed project team is trending off the optimal trajectory?

You need a project gating system applied to specific projects to strategically pause, evaluate and course correct those specific projects.

I chose the previous statement’s phrasing carefully and specifically to avoid claiming that a default project gating system is a “silver bullet”1. Again, I’m not arguing against Todd’s claim that better stakeholder engagement is needed for successful IT projects. I’m saying the practical application in the real world still requires a project quality gating system due to the inability to have strong, competent and knowledgeable resources on every project.

I have some thoughts on attributes of a more efficient project quality gating system, but I think I’ll need another article to collect and share these thoughts bouncing around in my head.

1. Top notch blog publisher Peter Kretzman on his blog “CTO/CIO Perspectives” has an excellent article on IT “silver bullets” and how to get early indications of projects in jeopardy.

Related posts:

  1. Agile versus Classic IT Budgeting

Organization Silos Impeding your Enterprise Architecture?

There are countless sources extolling the benefits of a strong enterprise architecture strategy. The experts all agree on the value of an effective enterprise architecture, and even more so the larger the organization’s consumption of IT services. But recently, I’ve been reminded that the IT organizational structure, especially the structure aligned to providing new projects and new technical solutions, has a dramatic impact on the ability of an organization to realize the benefits of an established enterprise architecture.

In short, the more the IT new project and solution delivery organization is directly aligned with the individual business unit or group it supports, the more likely a spaghetti architecture will be the result. To help outline this point, below is a quick graphic of a sample organization chart that shows the two extremes:

Extreme A = Business group/unit functionally aligned:

Blog - Organizational Structure and Enterprise Architecture A

Extreme B = Common function/service aligned:

Blog - Organizational Structure and Enterprise Architecture B

The “extreme A” example is functioning with each vertical group acting as a silo. Each silo is held accountable for delivering solutions to their business unit. In turn, each business unit will drive their IT silo to meet their needs exclusively. There is no inherent need to collaborate with their peers supporting other business units even if there are “enterprise architecture goals”. In my observation, collaboration might actually be viewed as a distraction from getting work done. There is essentially no organizational driver to force standard solutions and re-use of technology assets.

Sure, an architect in one group might have a strong personal relationship with an architect in another group, but unless they are trying to share ideas that happen to be at relatively the same level of “maturity” for each silo’s needs, they are unlikely to be able to produce a common, re-useable technology used across both silos. Architect A attempts to work out a common solution with Architect B, but then Architect B’s project gets accelerated. Suddenly Architect B has to quickly assemble a slimmed down solution that can’t be dependent on Architect A’s requirements or time-line. And sure, everyone might meet and agree to “retrofit” Architect B’s project with a more standardized solution shared with Architect A’s project in a later phase/iteration, but it would take an amazing level of cross-silo organizational governance to make sure that happens. If that standardization would in any way delay Architect A’s silo from delivering for the business, the standardization most likely gets pushed farther and farther out until no one remembers, or no one is even left who remembers, why the architecture alignment was needed in the first place.

Other potentially non-technology specific negative byproducts can evolve from this structure:

  • Strong versus weak silos

One silo may be more effective at deploying more current technology by the very nature of the business unit’s needs. As an example, consider a customer product or engineering unit compared to, say, finance, or a unit that uses in-house developed technology versus another with outsourced/SaaS technology. This creates a problem of talented architects, developers and others actively seeking out roles within the perceived “strong silo”, creating an even stronger silo compared to the other silos.

  • Overall increased IT cost of ownership

As each silo is standing up technology to meet the business unit’s needs, they are all solving a significant number of basically the same problems with different solutions. Those solutions come with recurring maintenance, product end of life and all the traditional support and vendor management overhead. (I’ve written extensively about vendor management in past articles.) Then, to drive up costs even further, when some business need such as a “single view of the customer” or some other cross silo business challenge pops up, the cost to map all the data across the disparate systems is exceedingly high. In addition, not only does the mapping need to occur, but the technical (and probably some business) workflow needs to be altered to continue to keep all the data in sync. Someone suggests we need “Master Data Management” and now you have the cost and complexity of implementing a system to manage the data across all your systems.

The “extreme B” example brings the solution needs from each separate business unit/group into a common functionally aligned project delivery discipline. The goal is to leverage best practices/success stories from working with one business unit/group across to the other business units/groups via the common management structure by discipline. More specifically, the goals and objectives of each IT service discipline or function can be aligned to efficiencies and re-use within the specific discipline. Project Managers ensure they have common mechanisms to track and report on the project process. Business Analysts make sure they have a common framework for capturing requirements. Architects develop shared frameworks for common requirements across all projects. Developers follow a consistent coding standard and reuse objects for data access, error reporting, instrumentation, etc. The same goes for the other disciplines. Each discipline can focus on efficiencies aligned within their area of responsibility. And since each discipline has a somewhat singular work input and output for all business units, there is complete enterprise visibility into what is needed from each discipline. Thus, ultimately, the architecture team is looking across the entire enterprise at all the requirements and multi-year plans and can be charged with common frameworks for essential IT needs:

  • Authentication
  • Authorization
  • Entitlement Management
  • Business Rules/Workflow Management
  • Business Event Management
  • Reporting
  • Data Structure, Storage, Management, Retention, Archiving
  • Auditing and Compliance
  • Capacity Planning
  • Disaster Recovery

… and probably a bunch more that don’t immediately come to mind!

In conclusion, a strong enterprise architecture strategy gets a significant boost from a discipline aligned organizational structure rather than a business unit/group aligned structure.

Anyone have any thoughts to support and/or contradict my thoughts here?

Related posts:

  1. Does Agile reduce Application Over Architecture?
  2. Vendor Management – Part 5 – More on Who Owns the Relationship

Resource Thrashing

What do I work on now?

Have you ever stepped back and observed a [maybe yours?] MidWestern IT technical team and wondered why all the engineers seem legitimately busy, yet there doesn’t appear to be a proportional amount of production (or test, or QA) project changes and/or deployments?  Phones are ringing, emails are being sent, multiple instant message chat session windows are open, requirements and design documents are being revised and shared but environment changes aren’t being implemented.  Sure, your organization may be large enough that the duration from project kick-off until the first production deployment could be a year and a half or more.  In a past life, I was responsible for a single sign-on system spanning our customer web products where, once we completed a system upgrade of some sort, we had to immediately kick off a project to begin the next upgrade because the software and OS refresh cycles were averaging 1.5 years from start to finish.  Yet, have you ever considered that your resources might be getting thrashed with too many concurrent requests from too many sources without any easy mechanism to determine what to work on first, what to work on next and what can be put on hold?

I consider team member resource thrashing to be equivalent to an application server that is being overwhelmed with requests from many clients (such as web servers in a web tier).  If you have ever observed high volume systems, such as heavily used Internet web applications, there is a threshold of total requests at which the system can no longer service all the requests and disasters occur.  As the client request count rapidly approaches this threshold, the application servers continue to spin up more threads to assign to each request.  The closer to this threshold, the more CPU cycles are expended in starting, pausing and managing the threads themselves and not the work the threads need to do to actually complete the original work request itself.  The thread count climbs, the CPU cycles to manage that thread count climb and the whole system eventually falls over and the tough job of recovery under extreme load begins.
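A deliberately crude toy model shows the shape of this curve.  It assumes each active thread costs a fixed slice of CPU just to be managed (the 2% figure is arbitrary, chosen only for illustration), which is far simpler than how a real scheduler behaves, but it captures the point that management overhead eventually crowds out the real work.

```python
def useful_fraction(active_threads, overhead_per_thread=0.02):
    """Share of CPU left for real work after thread management overhead.

    Toy model: every active thread consumes a fixed fraction of the CPU
    for starting, pausing and bookkeeping, regardless of what it is doing.
    """
    return max(0.0, 1.0 - active_threads * overhead_per_thread)


def useful_share_per_request(active_threads, overhead_per_thread=0.02):
    """CPU share each request actually gets for its own work."""
    if active_threads == 0:
        return 0.0
    return useful_fraction(active_threads, overhead_per_thread) / active_threads


for threads in (1, 10, 25, 40, 50):
    print(f"{threads:>3} threads: "
          f"{useful_fraction(threads):.0%} of CPU doing real work, "
          f"{useful_share_per_request(threads):.2%} per request")
```

Swap “threads” for “concurrent requests per engineer” and “CPU” for “hours in the day” and the same picture applies to a team.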

This thread thrashing is analogous to team resource thrashing.  There clearly is work for the team to be doing, but so many disparate requests are coming into the resources on the team that every team member is barely able to get someone off the phone with one request before an email arrives with additional requests.

Now, I can’t take full credit for this succinctly brilliant assessment of a common large corporate IT occurrence.  In a past professional life, my wise senior manager was astute enough to identify this depiction during a production issue.  The enterprise service my team was troubleshooting was causing performance issues across a number of highly visible customer web applications.  Many teams were demanding status, asking random technical questions and posing endless theories to the few resources on my team as to why the systems were behaving as they were.  We could barely get any real technical analysis completed as we had to appease this growing horde of interested bystanders or otherwise risk being cast negatively as “non-partnering”.  Presenting resource thrashing as a barrier to what they ultimately wanted (answers) caused the external groups to take some pause.  This allowed us to dig in, really figure out what was wrong and fix it.

I’ve written a series of articles on making it a priority to establish a single view of the work for your team in order to be able to do effective resource planning.  But in the process of determining the single view, also count up how many different work request sources exist.  You may be surprised to find in excess of eight, nine, ten or more ways work can be requested of a single resource.

So what is the big deal?  If someone is getting ten different sources of requests for work, how does that someone figure out what to work on and what can wait?  Most likely, the “squeaky wheel gets the most grease” adage takes over.  Additionally, many project failure assessments find that excessive multitasking can cause key deliverables to be missed or delivered below quality.  (Third most frequent cause of IT project failure in a survey by Steve McConnell with construx.com)

What can you do to improve this situation?

Draw a picture

One of the first things you may want to seriously consider doing is to sit down with the individuals on your team and draw up a picture of all the work request sources.  Peer management may be generally aware that this resource thrashing is going on, but present them with a picture of just how many sources there are and it makes a bigger impact.  By making the problem more visual, it sets the tone for why external groups requesting work aren’t getting the expedient service they are expecting.
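If a whiteboard isn’t handy, even a quick tally can serve as a first draft of that picture.  The sketch below counts requests per source and prints a crude text bar chart; the sources and requests listed are entirely made up, so substitute whatever your team members actually report.

```python
from collections import Counter

# Hypothetical week of work requests for one engineer: (source, request).
requests = [
    ("email", "refresh QA data"),
    ("instant message", "why is login slow?"),
    ("project plan", "build feature A"),
    ("email", "add a report column"),
    ("hallway conversation", "quick favor"),
    ("help desk ticket", "reset service account"),
    ("email", "status update please"),
    ("instant message", "can you join this call?"),
]


def tally_sources(work_requests):
    """Count how many requests arrived through each source."""
    return Counter(source for source, _ in work_requests)


def text_chart(tally):
    """Render the tally as one bar per source, busiest first."""
    return "\n".join(
        f"{source:<22} {'#' * count} ({count})"
        for source, count in tally.most_common()
    )


tally = tally_sources(requests)
print(f"{len(tally)} distinct request sources")
print(text_chart(tally))
```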

Identify prioritization mechanisms

Another thing you can do is identify the most critical groups and start a dialog to establish a means of prioritizing work requests.  Is it the project management office that is feeling the most pain from having neither a predictable resource model nor consistently met delivery timelines?  Get your single view of the work and the companion backlog of requests in front of them and get them to help prioritize.  Set up a recurring meeting to bring revised views of the work requests for consistent re-prioritization, as the winds of “this is hotter than any other project” blow regularly.  Is it direct product managers or business stakeholders that have to work directly with IT that are frustrated?  Do the same thing as with a project management office, just be prepared to have to cautiously identify other business groups that are claiming higher priority, and be prepared for some grandstanding or other self-important posturing as the business groups with the most impact on the bottom line eventually rise to the top of your list.

I would like to re-stress, it may seem like a lot of data gathering and schedule building, but I’ve never seen this type of team resource thrashing challenge get solved by any magic external group or process.  As an IT manager, you may have to roll up your sleeves and dig into the scheduling data.  Finally, this isn’t a “fire and forget” exercise.  Be prepared to repeat this data true-up on a regular basis to continue to establish a clearer prioritized path forward.

Anyone have any other tips on how to get in front of a team of thrashed resources?


Get too tactical and get set on fire

I read Neil’s “How to do Nothing” blog article on the topic of effective management back when I first discovered his blog earlier this year (2010).  I was struck by the pure simplicity of his eight attributes of effective team management at the time.  Neil was gracious enough to comment on a recent article I published on agile project management reducing application over architecture.  I went back and re-read his article and realized it is even more impactful on successful team management than I originally thought.  I thought I would take a few paragraphs to further extend the concepts Neil outlined in this article.

Anticipate rather than react

Neil suggests that frequently having your hair on proverbial fire by getting hands-on in addressing issues isn’t the most effective approach.  I agree with him that in these hair-on-fire situations, it can be exhilarating to roll up your sleeves, jump in and be the hero that saves the project or restores production service, etc.  Once a sense of calm returns after the hero has saved the day, the hero starts itching for the next crisis to be the savior and the behavior gets repeated.  The hero may get initial fame and glory, but it is short lived.  In my experience in multiple MidWestern companies, the hero ultimately gets revealed for his/her display of reactionary management.  Senior management begins to grow tired of the peaks and valleys of crisis, pending doom, disaster avoided, pause and repeat for the next crisis.  I’ve also seen the hero scratch his/her head when the next re-organization comes along and the hero finds him/herself as a technical lead over a team with a new management layer above them.

Maintain relationships outside the team

I definitely agree with Neil here.  I would even add the larger the organization, the more this is essential.  As your team provides a service to a larger project or effort, the number of ways other teams can throw a wrench in the works is almost immeasurable.  In addition, the message that gets filtered to your team and then to you may be completely disconnected from the real blockage.  Being able to pick up the phone and call a peer manager, with whom you have established a rapport, and get right to the real issue is invaluable.  Once the real issue is known, you can offer guidance to your team on how to navigate to a successful path forward.  Without this knowledge gleaned from a peer manager, you can easily get caught up in the panic the blockage creates and risk being set on fire in the process.

Big visible task boards

In a word: absolutely!  By all means, assuming everyone knows what to do is a recipe for disaster.  A disaster in a sense that tasks won’t get executed according to plan and you will be dragged back into strategizing on how to go forward while accounting for the missteps of the past.  As Neil says, use a whiteboard or in my case, use some electronic task tracking system that isn’t overly cumbersome yet makes it unbelievably clear who is working on what, doing what and when.  Have you ever considered using your bug or defect tracking system as a light weight task tracking mechanism?

Team collaboration

I’ve written on this topic in the past relative to leveraging self-organizing, strong teams with a focus on intense collaboration.  If you make yourself the keeper of all knowledge, you will constantly be engaged to assist with tactical decision making and thus be at risk of, again, being set on fire.  If you drive team members who ask you questions back to fellow team members, they will start going to fellow team members directly.  Clearly, once this occurs, you will need to be cognizant of when to specifically instruct your team to collaborate with you, such as when you are engaged in some issue that you aren’t at a point of delegating just yet.

Small incremental changes

If you are considering pitching or implementing a significant change that you have dreamed up that will get everyone working better and more efficiently and at the same time cure cancer, you might want to put on the brakes.  People accept change in a variety of ways.  The more aggressive the change from the current norm, the more likely you will have to invest additional time in addressing how each impacted individual reacts to the change.  If you stretch your change implementation out over time so it seems like you are providing “just in time” solutions to the ultimate problem, you will most likely achieve a more successful end result all around.  Plus, with each incremental small change, you can more easily course correct or tweak your next change to be even more effective and hopefully, even less visible.

Inspect and adapt

Getting the team to contribute ideas and suggest process improvements takes the decision-making pressure off of you and empowers the people doing the work, who are most impacted by the process, to suggest the most effective improvements.  Plus, if the team is on board with an improvement, they are more than likely to make the implementation successful, because it is their improvement.  On the flip side, how motivated are you if you have the ultimate “my boss told me to do it, thus if it doesn’t work, not my problem” excuse at the ready when the slightest problem is encountered?  What motivation does anyone have in that situation to try and make it work?

Hire great people

This almost goes without saying … hire great people and then just get out of their way so they can do good work.

Commit to personal development

You may not have the flexibility to move poor performers out and attract top performers in, and let’s face it, MidWestern companies don’t always have the most flexible staffing models nor the local talent pool to quickly make team changes happen.  In that case, bringing up the level of your current team is more than likely your best option.  Take advantage of any tuition or training budgets available to strongly encourage folks to get away from their desks and get exposed to some additional improvement perspectives.  Consider incremental “stretch” assignments to use as opportunities to challenge those that have a weakness in a certain skill set.  Catch up with them “offline” and chat with them about what problems or challenges they are facing and offer some tips to help them get over those hurdles.

Summary

In summary, the concept that a strong manager is perceived to be “doing nothing” is a compelling goal for any team manager to attain.  By essentially making it a top priority for you to empower your team to function as autonomously as possible, it allows you to focus on inter-company relationship building and other functions only you can do.  It also allows you to identify process and skill set weaknesses without being tactically involved in the work, to consider small, incremental improvements and then absorb the result.


Don't give up on the Gantt!

I’ve written before that I am not an “Agile” nor “agile” development nor project management expert. I’ve previously proposed that one byproduct of “agile” development and project management in general is a reduction in over-architected software solutions. With project requirements being represented as stories and tools such as Kanban boards for lean software development to show the flow of work through a process, one might think the classic waterfall project management work and schedule reporting tool, the Gantt chart, is obsolete. Before you abandon this tried and true project schedule reporting solution for the more transparent status inherent in agile project management, you may want to keep reading.

So, you have succeeded in transitioning your IT project management and delivery methodology to one that is more aligned with “agile” than “waterfall”. Your non-IT stakeholders are more engaged than ever in the project requirements definition, prioritization and sprint/release scheduling process. You are tempted to stop trying to use MS Project or other tools to represent schedules in Gantt form. In a word: “don’t”.

One of the main criticisms of complete agile project management is: “The dangers are the loss of recognition that systems/solutions change continually over time as well as team members.” Put in more direct, bullet point form, here is how I have heard this criticism expressed by product or management stakeholders:

  • What is the big picture?

  • I have the big picture in mind, when am I going to get X?

  • At the current burn rate, how much time will be invested before I get Y?

  • If I add/subtract resources, what will be the impact on the big picture?

These are all criticism subsets that can be directly addressed with the data collection associated with producing a classic Gantt chart. So don’t throw away those Gantt chart creation disciplines just yet.

What is the big picture? I have the big picture in mind, when am I going to get X?

Let the sprint/release iterations continue, but don’t let too much time go by past the releases before you meet with the product stakeholders and update your rendition of the product road map. (You have your road map, right? You aren’t letting the business surprise you with all requests, right?) This is a great opportunity to get an early indicator if product stakeholders have become caught up in their immediate needs and lost sight of the “cost” of those features. By “cost” I mean the investment in features now means pushing out the product road map. You can gently remind them how the “cost” appears graphically in a Gantt. The Gantt view of their product can clearly show “milestone 45” getting pushed into the next quarter due to their recent feature bonanza. It is much easier to have the “hey, I thought I was getting milestone 45 this quarter” discussion as soon as the schedule shows initial signs of slippage rather than at the start of the next quarter.
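The “milestone 45 slips into the next quarter” conversation can be backed by a very small calculation. The following is only a rough sketch: the dates, the ten days of added feature work and the helper names (add_working_days, quarter) are made up for illustration, and it assumes a Monday to Friday work week with no holidays.

```python
from datetime import date, timedelta


def add_working_days(start: date, days: int) -> date:
    """Return the date that is `days` working days (Mon-Fri) after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def quarter(d: date) -> int:
    """Calendar quarter (1-4) that a date falls in."""
    return (d.month - 1) // 3 + 1


# Hypothetical numbers: the milestone was planned for late Q1 and the
# stakeholders have since added 10 working days of extra features.
planned = date(2024, 3, 22)
slipped = add_working_days(planned, 10)

print(f"Planned: {planned} (Q{quarter(planned)})")
print(f"Now:     {slipped} (Q{quarter(slipped)})")
```

With these example numbers the milestone moves from the first quarter into the second, which is exactly the kind of early signal worth showing on the Gantt before the quarter ends.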

At the current burn rate, how much time will be invested before I get Y? If I add/subtract resources, what will be the impact on the big picture?

Here again is where your mastery of tools such as MS Project and the Gantt chart view of the product road map are exceedingly important. If you are working for a MidWestern company, rather than say, a start-up, you can’t ignore traditional budget and resource management constraints. As a start-up in growth mode, your focus is getting your product out the door with the resources you have or your resources plus the onboarding of additional resources. In a MidWestern company, you are most likely trying to maximize the resources you have or being asked to reduce your head count while still meeting project expectations. Thus, prioritizing features to be delivered against a shrinking resource pool is a given. You need something beyond the agile sprints/releases under way to manage project stakeholder expectations on what can be accomplished when.

The Gantt view of the work allows for a graphical view of “if this is more important, what else is impacted” realities. Additionally, add in the need to manage a fixed pool of resources across multiple agile projects and you absolutely have to have some way to represent a view of the prioritized work across all resources. In addition, you may need additional tools to help represent the “what if our team member Sally gets assigned to the VP’s ‘special project’, what does that do to our resource model and what can get done when?”
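A “what if Sally gets reassigned” question can be roughed out before opening any scheduling tool. The sketch below is an assumption-laden illustration, not a replacement for a real resource model: the 100 person-days of remaining work, the team of four and the 0.8 focus factor (the share of each week actually spent on project work) are invented numbers, and weeks_to_finish is a hypothetical helper.

```python
import math


def weeks_to_finish(remaining_person_days: float,
                    people: int,
                    days_per_week: float = 5.0,
                    focus: float = 0.8) -> int:
    """Whole weeks needed to burn down the remaining work.

    `focus` is an assumed fraction of each week that goes to project work
    rather than meetings, support and other interruptions.
    """
    if people <= 0:
        raise ValueError("at least one person is needed to finish the work")
    weekly_capacity = people * days_per_week * focus
    return math.ceil(remaining_person_days / weekly_capacity)


# Hypothetical scenario: 100 person-days left, team of four.
baseline = weeks_to_finish(100, people=4)
without_sally = weeks_to_finish(100, people=3)

print(f"Current team:  {baseline} weeks")
print(f"Without Sally: {without_sally} weeks")
print(f"Impact:        {without_sally - baseline} weeks later")
```

Even a crude number like this gives the stakeholders something concrete to react to, and the detailed answer can then be worked out in the Gantt view.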

Conclusion

As an IT manager or IT project lead in a MidWestern company that is moving towards more agile project management and technology delivery, you may be tempted to relax the project management disciplines that come with the more traditional waterfall approach, specifically tracking project schedules in a Gantt chart. In order to avoid the inevitable product stakeholder expectation mismatches as well as pending budget cutbacks and/or uncontrollable resource re-allocations, keep collecting that “program management” data. Continue to have recurring meetings to review the “big picture” Gantt chart view of the work and use other tools to reflect “what if” scenarios and their impact on the big picture.

Over Architected?

First off, I am by no means an “Agile” (nor even “agile” with a small “a”) software development expert.  There are many individuals out there that are far more experienced and in a much better position to speak authoritatively about the Agile methodology and its associated principles on driving efficient and effective software development.  A few blogs where I consistently find great Agile perspectives are included at the end of this article.  But as I’ve participated in Agile and agile-like development efforts over the years, I’m finding an interesting pattern.  Agile and agile-like approaches have a positive byproduct of reducing the occurrence of over-architected software solutions. Over-architected solutions put stress on the delivery of a software application project as well as drive up the cost of software development and maintenance, in general, disproportionately to the business value produced.

As an example, a sample development effort starts out with:

Product: “We need a super widget in the product by next release, can we have that?”

Project: “We are going to need detailed requirements for the super widget in order to start developing it.”

Product: “Oh, it needs to be able to interface with the dry cleaners to know when it is time to pick up the laundry as well as make coffee for the customer before they get out of bed in the morning.  Basically the same features as our competitor’s recent release, but with these additional benefits. <Or some other description that is actionable at a high level, but lacks the detailed requirements needed to feed a development team with actionable development tasks>”

Development: “Ok, we’ll get started since we don’t have much time before the code freeze for the next release.”

… Time goes by …

A project status meeting occurs sometime in the future.

Product: “So, where are we with that super widget?”

Development: “We have the basic framework setup but it isn’t going to have all the features working by the next release.”

Product: “But I thought …”

Development: “But you said …”

Project: “What happened?  How come we are X days from the code freeze and we don’t have a viable solution?  I thought …”

Now sure, this isn’t the most perfect example of capturing how the requirements drift between stakeholders leads into an over architect-ed solution, but hopefully you get the scenario.  Or possibly another example would be when a solution is developed and released into the production environment.  Extending the example above, weeks later, when enhancements, tweaks or feature extensions are requested, a tense conversation occurs around:

Product: “But I thought the super widget would do X?  How come I am hearing it will take 30 hours of development to get X?  The testing cycle is already elongated due to the complexity of the super widget thus I thought it included X?”

Development: “But we said that the framework would support X, but we never said X would actually function without more development!  We developed the super widget to do W, Y and Z but only stubbed out X.”

Project: “But according to the requirements, it says X should be …”

And yes, an argument can be made that if:

  • more effective requirements gathering occurred
  • more effective project management captured more depth of what would be developed and available when
  • more effective product management defined a more exhaustive product feature road map that more clearly outlined what would be available and not available when, feature-wise

… These problems wouldn’t have occurred.  But the nature of an agile-like approach puts a tighter focus on all the stakeholders:

Product: They can share the “overall vision” of what they ultimately desire the product to do, but they are forced to consider what they really need within the shorter duration of the agile-like release schedule.  Thus, product walks away with a clearer picture of what they are really getting in the next release.

Development: They get the benefit of the product’s “overall vision”, yet, they get to quickly dive into the critical features and start the dialog of how long different feature components will take to develop.  Thus, development knows exactly what they need to do now for the next release, yet they benefit from knowing where this product feature is going in the future.

Project: As long as they keep the product and development stakeholders talking about granularly defining what needs to really be built by when, the project management function has much greater clarity into what is going on and what details to track.

From what I am observing, all of the above create a stakeholder forum of information sharing that reduces the likelihood that an over-architected application will get developed. Most importantly, instead of leaving the feature set open and vague enough to allow a creative and motivated development team to start building and building and building only to re-surface with a highly complex solution to a loosely defined problem or need, it brings more cohesion around what is really needed first.  Once the “first” has been built, the “seconds” and “thirds” get built in line with the product roadmap.

In researching this theory, I wasn’t able to find any articles that linked agile-like development efforts to a direct reduction in software over architect-ing.  This article entitled “Agile Architecture: Strategies for Scaling Agile Development” had some interesting content on baking architecture into an agile-like effort.

Anyone else have any direct experience in agile-like compared to waterfall-like development efforts yielding less application over architecture?  Can anyone share any links to good web articles on this topic?

Agile related blogs I follow:

David’s Software Development Survival Guide

http://softwaresurvival.blogspot.com/

NOOP.NL

http://www.noop.nl/

Software Project Management

http://blog.brodzinski.com/

fragile

http://fragile.org.uk/

Regular Geek

http://regulargeek.com/

Critical Results

http://criticalresults.com/


Know when to call in some help during an outage

Anyone that works in the Information Technology field knows that production technology systems, from time to time, will have problems. From a functional defect that has everyone scratching their heads as to how it wasn’t discovered by seemingly endless rounds of QA to full blown hardware failures that take down entire suites of applications, no matter how much is invested in “highly available” and “redundant” technologies, failures are bound to occur. For IT Managers and IT Engineers, how one handles these failures from inception through service restoration and finally root cause analysis is critical. Sure, the priority is to restore full service availability as soon as possible. But, if you neglect some key technical support quality attributes in the process, which I’ll highlight in this series of articles, you may find you both succeeded and failed in restoring service at the same time. Succeeded and failed at the same time you wonder? Please read on and I will attempt to shed some light on this success with failure construct and considerations on how to avoid the failure “pitfalls”.

Pitfall = Challenges in an Extended Outage

So, you’ve bought into the need to be responsive based on a previous article touting the benefits to you (being viewed as a leader and raise and bonus positives) and your organization (calmly restore production IT services to normal working order). You’ve communicated in a personal style with incremental positive facts and indicated at what timing points you will be updating the stakeholders on your progress as indicated in the previous article. If the problem can be easily identified and corrected quickly including a rather direct way to explain why it happened, pat yourself on the back for a job well done. Now get ready for the aftermath of re-explaining what happened a handful of times over and possibly participating in some post issue shoring up of the technology (see root cause analysis considerations posted here previously). But what happens when the status reporting is going on longer and longer and you can tell that the natives are getting restless as they are starting to grow concerned at the length of the outage and at the lack of a clear “it will be fixed in 5 minutes” status report? When an outage becomes an extended outage, it is time to ratchet up the communication plan and bring in some help.

Problem isn’t Obviously Fixable in Short Order? Get Help

Most likely, as time is going by, more people are aware of the outage and thus the list of stakeholders is growing larger. Also, the likelihood those stakeholders are senior technical people offering to give you a hand is slim and none … and slim left town as the saying goes. I would venture to say that the stakeholders are a growing list of non-technical people that are impacted in some way by the production situation continuing to be a problem. More and more managers on the operations and product side of the service are getting engaged as possible customer complaints are mounting or call center call volumes are reaching levels of concern. There may be more people engaged to discuss what to do if the outage continues and an alternative, possibly more manual means is needed to meet customer SLAs. By the way, manual usually means more work done by people, hence more people getting engaged to see if they have to bring in even more people to ensure the alternative service delivery option has the right, skilled and trained staff. Company marketing resources could be engaged to offer advice on how best to let customers know the service is having a greater than normal duration outage and what the company plans to do to service their needs. I am not trying to paint a picture of doom and gloom for the primarily technical audience for this article. I know the technical mind wants to have all the people just stop talking so the real work of fixing the technology can take place. But on the business side of the technology in trouble, there are company stakeholders and customers of some form or another that are materially impacted in some way by having the usually highly reliable technology fail to function correctly.

Thus, as time goes by, your incrementally positive but not “it’s fixed” communications aren’t enough to appease the masses. You are either going to have to spend more and more time explaining to new people joining the situation what happened when, what has been ruled out, what is next to investigate, etc. or risk becoming non-communicative in order to get some focused time to fix things, thus putting all your hard work at risk as outlined in this previous article. It is time to ask for some help.

Hopefully you have already engaged your management to keep them apprised of the situation as suggested in this previous series of articles. Thus, you may already be getting asked if you need help because you have informed your management and thus they are starting to ask the “hey, you are doing a good job, but can we help?” type of questions.

Ask for and accept help

I can’t stress it enough: avoid the notion that the fix is “just around the corner and if I only spend 10 more minutes researching …”. Ask for and accept help. To start, get someone engaged to be the status communicator so you have fewer distractions and more time to dig into the problem. The status communicator needs to have a level of competence in the following skill areas:

  1. Enough of a technical background to take technical status bits from you and quickly understand what you are saying without a 5 hour white-board deep dive session.

  2. Ability to communicate in “business speak” not “techno-speak”.

  3. Enough understanding of the players involved organizational chart-wise to know how and when to communicate with stakeholders and when to recognize the VP of Product is looking for status and it is time to get your VP peer manager involved.

Your manager is in the best position to act in this capacity if they aren’t already doing so. As managers, you stand to lose huge management credibility and leadership points if you just sit on the sidelines and hope the problem goes away or you are somehow hoping for plausible deniability to relieve you of your responsibility in this situation. Roll up your sleeves and get engaged. Start sharing what is going on in a polite but authoritative tone to build confidence and most importantly, buy more time for your engineers to dig in and figure out what is going wrong and fix it.  This previous series of articles offers additional tips.

In summary, as the outage is dragging on, be mindful that not everyone involved has the priority of discovering the coveted technical root cause. For engineers, as an extended outage is building, don’t keep trying to take on the roles of technical investigator and communications expert. Get help. Managers, get involved and start shielding your engineers from the constant barrage of status requests and allow them more focused attention on digging in, finding out what is really going on and getting it fixed.

We’ve extended the need for responsiveness to reports of production support problems to include an initial take on the art of creating an effective status communication approach as well as when to admit you need help and get your manager and/or team lead involved directly. Look for additional articles to identify more technical support pitfalls and steps to take to avoid them.

Respond and forget, right?

Anyone that works in the Information Technology field knows that production technology systems, from time to time, will have problems.  From a functional defect that has everyone scratching their heads as to how it wasn’t discovered by seemingly endless rounds of QA to full blown hardware failures that take down entire suites of applications, no matter how much is invested in “highly available” and “redundant” technologies, failures are bound to occur.  For IT Managers and IT Engineers, how one handles these failures from inception through service restoration and finally root cause analysis is critical.  Sure, the priority is to restore full service availability as soon as possible.  But, if you neglect some key technical support quality attributes in the process, which I’ll highlight in this series of articles, you may find you both succeeded and failed in restoring service at the same time.  Succeeded and failed at the same time you wonder?  Please read on and I will attempt to shed some light on this success with failure construct and considerations on how to avoid the failure “pitfalls”.

Pitfall = Providing Status

So, you’ve bought into the need to be responsive based on the previous article touting the benefits to you (being viewed as a leader and raise and bonus positives which are always good) and to your organization (calmly restoring production IT services to normal working order).  So, all you have to do is “respond” by sending an email right away, jumping on a conference line quickly or changing a status in a production trouble ticket tracking system promptly and you are done, right?  You can now disappear into the depths of your log files and your performance counters and your packet traces only to resurface when you have found the real cause of the problem, right?  Never underestimate the extent to which people, lacking timely information, will panic.

To help illustrate, we can extend the example from the responsiveness article of needing a plumber to call you back quickly to address the hot water heater that is pouring water all over your basement floor and not delivering any hot water to any faucet in your house.  Consider that a plumber does call you back promptly to indicate they are able to start looking into your leaking hot water tank right away.  But after that responsive call back, time keeps ticking by without any indication if your tank can be fixed or needs to be replaced or is about to explode and flood your basement in the process.

Note: Yes, you can walk down into the basement to physically see the plumber’s progress or lack thereof, but pretend you can’t easily do that to allow this extended plumbing example to help frame the context for this article.  Let’s say you left your home for work right after you confirmed the plumber was engaged to fix your problem.

So, without any further status from the plumber besides his or her initial: “Yes, I will look into your hot water tank problem right away”, how do you know what is going on?  The plumber could be minutes away from turning off the water main to stop the river forming in your basement followed quickly by unloading a delivery truck approaching your house with a brand new hot water heater, or sitting down on the couch to catch a baseball game on TV, completely ignoring your water dilemma.  Thus, how do you know what is going on?  You don’t, unless you are physically watching the plumber’s every move or the plumber is providing frequent status as to what is going on with your hot water crisis.

Frequently provide status

So, how does one keep the panic to a minimum once initially responding to the production issue?  Reduce panic by frequently communicating status of what is going on in the troubleshooting process.  This sounds simple enough, just keep everyone informed:

  • “I just VPNed into the network”
  • “I am pulling up a terminal session with the server now.”
  • “I am typing my user name.”
  • “I am typing my password.”
  • “Ooops, wrong password, trying again.”
  • “I am now at a command shell …”

Obviously, that is going too far into the over communication side of the status equation.  What you are trying to find is the artful balance in the level of detail and frequency to share status.  As in all things technological, there is no silver bullet, no industry established check list and no “do this and it will work for every situation” rule written on a stone tablet somewhere to implement with guaranteed success.  One has to put some energy into looking for clues as to what is going to work best in the given situation and then constantly monitor the results of your communication approach to tweak as necessary.

But this sure seems like a lot of work that doesn’t get directly at fixing the true technical problem?  Correct.  As I mentioned previously, you can dedicate all your efforts to fixing the problem as quickly as possible, but be prepared for the consequences of various negative backlashes surrounding non-technical and peer management’s frustration of being left in the dark for who knows how long starting from problem occurrence and ending at problem resolution.  Plus, you can safely anticipate the root cause analysis aftermath being painful and extended due to this lack of communication frustration you have helped create.  Thus, I am arguing the time invested up front in an effective communication approach will pay large dividends in avoiding post service restoration negativity and an elongated investment in root cause analysis malaise.

Art of an Effective Status Communication Approach

So how does one determine a successful status communication approach?  First, suspend your technical or engineering brain that puts speedy problem resolution as the highest priority in any production outage situation.  Recall that once you put aside the technology, people are involved in the production outage.  Harking back to the plumbing crisis example above, if you are at work wondering how much your water bill is going to be as your basement floods, what would be your reaction to getting a call or a note from your plumber saying:

“Hey, this is Bob the plumber, just wanted to let you know I stopped the geyser erupting in your basement.  A replacement water tank is on a delivery truck and should be arriving at your house within the hour.  I’ll let you know when it gets here and what the next steps are in about an hour or so.”

Imagine the feeling of relief at getting such an update at work.  Now, carry those feelings of relief over to the other people involved in the production outage situation.  They are fretting over lost revenues or having to explain to their management what happened, why and what is going to be put in place so it never happens again, with absolutely no clue at the moment on answers to any of those questions.

Can you make everyone relax and go about their day with a smile with a few simple sentences on what is going on?  Not a chance, but you can help keep the people involved more calm and less likely to break out in irrationality by providing indications of where you are in the troubleshooting process.

Consider this revision to the step by step over communication example from above:

“Everyone, this is Bob from systems support.  I was able to get online and successfully access the production server that is hosting the application that is involved in the production outage.  This is a good sign in that we are able to start debugging immediately without any infrastructure barriers at this point.  I will now start investigating the error logs that should give some further technical direction on what is going on.  I will let everyone know what I discover in 15 minutes from now.”

Similar to the status update from your plumber, there are key elements in this status message that address the human side of the outage:

  1. Saying your name

Saying your name seems overly simplistic, but giving your name instead of hiding behind the anonymity of an artificial company group such as “systems support” makes a small but important personal connection to all of the people involved who are likely to panic at a moment’s notice.  This is similar logic as to why people prefer talking to a human rather than interacting with an automated “push or say 1 and then enter your 45 digit account number” system when calling to resolve an incorrect cell phone, gas or electric bill.

  2. Providing legitimate positive news, even if it is somewhat insignificant to correcting the real problem

Again, seems simplistic, but by indicating you were able to get online and get into some level of technology to begin troubleshooting, it helps to give additional confidence to the non-technical individuals participating in the outage that some potential barriers to real problem resolution have been crossed.  Look for opportunities to share facts that narrow the problem down, even if they only narrow the problem down ever so slightly.  The increased feeling of progress that narrowing down the problem creates helps to reinforce feelings of increasing control over a seemingly out of control situation for the non-technical people involved.  Again, you are looking for balance.  “I successfully typed my password” does not inspire that much confidence.  Thus look for real progress facts that can be shared that focus on narrowing the problem scope rather than just facts for the sake of facts.  Lastly, I chose the word “facts” specifically.  Make sure you communicate facts and not speculation at this early problem engagement level.  I’ll cover some suggestions on how to share speculation in another article.

  3. Indicating when the next status communication will occur

Giving people an indication of when they can anticipate an update on what is going on or what you are doing provides two significant benefits.  The first is that it allows everyone participating in the outage who is not directly involved in restoring service to relax just a bit and prepare for when they need to be engaged next.  They know there is nothing tactical they can really do to solve the immediate problem.  They know they are effectively 100% dependent on technical resources to do the real work of finding the problem and fixing it.  They desperately want to hear: “the problem is X and I’ve fixed it.”  But since neither you nor anyone else is at that point in the troubleshooting process, a time in the not too distant future when such a phrase might be uttered is the next best thing.

The second is it gives you much needed breathing room.  Instead of hearing “Is it fixed yet? How about now?  Now?  Maybe now?” every couple of minutes, you’ve clearly set the expectation that you need some uninterrupted time to do some digging in order to provide anything valuable as far as investigative analysis.  Thus, you now have some time to completely disengage from the noise associated with the problem and roll up your sleeves and immerse yourself in performance and log data to try and figure out what is going wrong with the technology.

Communicating Status – Approach in Summary

  1. Use your name and thus communicate in a more personal tone to increase confidence in non-technical participants … avoiding the opposite completely impersonal tone of “tech resource number 12”
  2. Provide positive news to further increase confidence and reduce the panic building in others with facts (not opinions), even if those facts are small troubleshooting milestones and not grandiose “ah ha!” findings.  Keep a balance by avoiding facts that are too small, such as “I pressed enter and …”.
  3. Indicate you need time to dig deeper and set the expectation of when others will receive the next status update from you, to buy uninterrupted investigation time and allow others to put off panicking for a period of time.
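
The three elements above can be sketched as a small message template.  This is only an illustration, not a prescribed tool: the function name `build_status_update`, its parameters and the sample date are hypothetical, and the 15 minute default simply mirrors Bob’s example message.

```python
from datetime import datetime, timedelta


def build_status_update(name, team, progress_fact, next_step, now, minutes_to_next=15):
    """Compose a status message from the three elements summarized above.

    All names and parameters here are hypothetical, chosen for illustration.
    """
    if not name.strip():
        raise ValueError("a status update should come from a named person")
    if not progress_fact.strip():
        raise ValueError("share at least one real, verifiable progress fact")
    next_update = now + timedelta(minutes=minutes_to_next)
    return (
        f"Everyone, this is {name} from {team}. "   # 1. say your name
        f"{progress_fact} "                          # 2. factual progress, not speculation
        f"Next, {next_step} "
        f"I will post the next update in {minutes_to_next} minutes, "
        f"at {next_update:%H:%M}."                   # 3. when the next status will arrive
    )


# Example usage with an arbitrary sample timestamp.
message = build_status_update(
    name="Bob",
    team="systems support",
    progress_fact="I am logged in to the production server hosting the affected application.",
    next_step="I will review the error logs.",
    now=datetime(2024, 1, 1, 9, 0),
)
print(message)
```

Requiring a name and a progress fact up front is a design choice for this sketch: it makes it hard to send an anonymous or empty update by accident.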

We’ve extended the need for responsiveness to reports of production support problems to include an initial take on the art of creating an effective status communication approach.  Look for additional articles to identify more technical support pitfalls and steps to take to avoid them.


Respond ASAP!

Anyone who works in the Information Technology field knows that production technology systems, from time to time, will have problems.  From a functional defect that has everyone scratching their heads as to how it wasn’t discovered by seemingly endless rounds of QA* to full-blown hardware failures that take down entire suites of applications, no matter how much is invested in “highly available” and “redundant” technologies, failures are bound to occur.  For IT Managers and IT Engineers, how one handles these failures from inception through service restoration and finally root cause analysis is critical.  Sure, the priority is to restore full service availability as soon as possible.  But, if you neglect some key technical support quality attributes in the process, which I’ll highlight in this series of articles, you may find you both succeeded and failed in restoring service at the same time.  Succeeded and failed at the same time, you wonder?  Please read on and I will attempt to shed some light on this success with failure construct and on how to avoid the failure “pitfalls”.

The First Pitfall = Responsiveness

The very first pitfall in providing any level of technical support to a production system is to fail to be responsive.  Imagine for a minute that you are a home owner and you don’t know the first thing about plumbing (and maybe you really don’t know much about plumbing).  You turn on the faucet expecting to feel some soothing hot water spray over your hands and yet, the water remains freezing cold.  You wait the typical number of seconds by which you would expect to at least sense the water turning from freezing cold to mildly tolerable; yet it isn’t happening.  You turn off the faucet in disgust and trudge down to the basement to confront the source of pleasant, warm water: the hot water heater.  The hot water heater is where you expected it to be, but what you didn’t expect to discover is a steady stream of water pouring from underneath the unit and running a short length to your basement floor drain.  Since you don’t know the first thing about plumbing, your immediate and only thought is to call a plumber to come over as soon as possible.

Assume you have contact information for three plumbers in the area who have either done satisfactory work for you before or been recommended by friends and neighbors as prompt and reliable.  You hastily dial each plumber only to be greeted by pleasant yet unhelpful answering machines:

“… your call is very important to us.  Please leave a message and someone will be with you shortly <beep>”

You leave a hasty yet detailed message about the unnatural spring that has sprung forth from your hot water heater in the basement.  You provide a home phone, cell phone and your spouse/roommate/significant other’s contact information to ensure there are multiple ways your urgently needed plumber can contact you.

Minutes go by.

Hours go by.

Hours turn into days …

Okay, you get the picture.  What you and your lack of hot water disaster need is some initial response from a plumber.  Without any response of any kind, you can only plunge deeper and deeper into panic that you will forever be taking cold showers and personally draining the local fresh water lake via your leaking hot water heater.

I believe this analogy translates well to the notion of being responsive to production support issues.  Replace the panicked home owner and the leaking hot water heater with an IT manager who is responsible for a technology service and knows the technology is broken, and replace the silent plumbers with an engineer who has not responded.  The IT manager has no clue if or when the problem will be attended to, and you can imagine the panic the IT manager is feeling.

Thus, failing to respond promptly to a production support call or page when the technology you are responsible for could be broken is akin to a previously reliable plumber never even returning your call about your broken hot water heater.

To Avoid the First Pitfall = Responsiveness = Respond in a Timely Fashion

Yes, this may seem ever so simple, but responding to a call or page related to a production support issue in the expected manner will go an exceedingly long way to avoid the first pitfall and put the IT manager’s mind at ease.  Respond in the “expected manner”?  Yes, if the expectation is that you verbally answer a call, or dial into a conference line or bridge to announce your availability, or log into an automated support management system and perform some simple acknowledgement that you are aware your services are needed, then do just that.  Nothing will be gained by sending an email to your manager when the clear expectation is that you join a conference bridge line to be informed of the situation.  It may seem painful, irritating, not worth your energy, etc.  But allowing your immediate distaste for the production support situation that you are about to be dragged into to block you from “doing the right thing” will create the perception that you are not reliable and thus not a leader.  You don’t want those negative perceptions to be linked to your professional image, especially around raise and bonus time.  Plus, once you have been linked to those perceptions, it is going to take above and beyond effort for some time to reverse them.

Know the response expectations

As mentioned above, make sure you know what the response expectations are.  Make sure you have a clear understanding of what the SLAs** are for the services for which you will be contacted.  If you have 30 minutes to respond, then make every effort to respond within those 30 minutes; the sooner the better.  Are you supposed to respond via phone, email or join a conference line?  Make sure you are clear on what you are supposed to do.  Are you supposed to log in to a problem management system and update the status on a support ticket?  Make sure you have confirmed you can do this remotely as well as in the office (assuming you are providing off hours support).  Know the customer SLAs for the service you are supporting.  If the service is available to customers 24/7 but the real customer service agreement is from 7am to 7pm, know that.  Then, if a call comes in before 7pm, assume the system needs urgent attention until someone of authority indicates the problem can be handled the next day, not the other way around.  Are there priority levels assigned to the problems that get communicated out?  If so, make sure you know them so that you are confident you can ignore a priority twelve problem till the next day, and so on.
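
One way to keep these expectations in front of you is to write them down as data.  The sketch below is an assumption-laden illustration: only the 30 minute response target and the 7am to 7pm agreement come from the examples above, while the other priority levels, their targets, the function names and the sample dates are hypothetical placeholders for whatever your own SLAs say.

```python
from datetime import datetime, time, timedelta

# Response targets in minutes, keyed by priority level.
# Only the 30 minute figure comes from the article; 2 and 3 are hypothetical.
RESPONSE_MINUTES = {1: 30, 2: 120, 3: 480}

# Contracted customer service window from the article's example (7am to 7pm).
SUPPORT_START = time(7, 0)
SUPPORT_END = time(19, 0)


def in_contracted_hours(reported_at):
    """True if the problem was reported inside the contracted window."""
    return SUPPORT_START <= reported_at.time() < SUPPORT_END


def respond_by(reported_at, priority):
    """Latest time by which a response is expected.

    An unknown priority falls back to the tightest target, on the
    assumption that it is safer to treat an unclear problem as urgent.
    """
    minutes = RESPONSE_MINUTES.get(priority, min(RESPONSE_MINUTES.values()))
    return reported_at + timedelta(minutes=minutes)


# Example usage with an arbitrary sample date.
reported = datetime(2024, 1, 1, 18, 45)
print(in_contracted_hours(reported))   # reported at 6:45pm, inside the window
print(respond_by(reported, 1))         # 30 minutes after the report
```

Falling back to the tightest target for an unrecognized priority follows the mindset described above: assume urgency until someone of authority says otherwise.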

Additionally, even though there are SLAs with different response times by priority, make every effort to understand what really constitutes a priority for the service rather than relying on arbitrary numbers.  Is there a particular “high value” customer or customer group that requires high touch service?  Is there a particular business function that is mission critical or that, if not completed successfully in a timely fashion, will create a rippling effect of additional problems within the support organization?  Develop a firm grasp of these unique support situations.  Even though they technically might not match the “priority 1 – entire system is down” criteria, they are still viewed by senior stakeholders as important to the business.  Hence, treating them as such will go a long way to create the perception that you care about your role and the company and that you have leadership potential.  Put another way, a positive perception at raise and bonus time can’t be a bad thing.

Lastly, consider that response is just that: response.  Compare these two examples of responsiveness to the same problem:

Example A

Bob on his cell phone calling into the production situation conference line: “Hey, this is Bob from FlimFlam support, I just got a text that there is a problem and to join this line.  How can I help?”

Voice on conference line: <Briefly explains that the production system is throwing error codes left and right and the system is essentially unusable>

Bob: “Hmm, I don’t know what could be causing that situation off the top of my head.  I am in the car and about 30 minutes away from being online to begin troubleshooting.  Is that a problem?”

Example B

Bob, checking his cell phone for a text message to join a production situation conference line, thinks to himself: “I bet FlimFlam is throwing errors again.  I’ll get home, get online, see what might be going on and then join the conference line.”

In both examples, the time to resolve the production situation is probably about the same.  I would actually argue that the time to restore service is probably quicker in example B than in example A, because less time is spent on communication and interaction and more on hands on technical troubleshooting.  But even if it takes 60 minutes to restore service in example A compared to only 45 minutes in example B, the perception of the quality of technical support provided in example A is much higher than in example B due to the higher level of communication and responsiveness involved in example A.  Going back to the leaking hot water heater example from the beginning of this article, the silent 30+ minute commute in example B, from receiving the text message to finally joining the call, is similar to leaving a message for a plumber and not hearing back.  The perceived lack of responsiveness will work against any heroic technical feats of system restoration, because those who don’t fully appreciate that you pulled off a systems miracle behind the scenes are only aware of the stress you caused them by not responding promptly.  Address their communication needs first and the technical needs second.

Look for additional articles to identify more technical support pitfalls and steps to take to avoid them.

* QA, Quality Assurance, the process or set of processes used to measure and assure the quality of a product.

** SLA, Service Level Agreement, is a part of a service contract where the level of service is formally defined. In practice, the term SLA is sometimes used to refer to the contracted delivery time (of the service) or performance.
