
2007/06/03

Turnarounds

My previous post on 'Digging Out', a methodology built from experience in a number of turnarounds, can't stand alone without some justification:
What have I done to be credible?

Here's a sample taken from a version of my CV:
Completing large business critical projects on-time and on-spec. In complex political environments achieving all client outcomes without formal authority.
  • ABN online registration system - Business Entry Point, DEWRSB.
  • Y2K conversion - Goodman Fielders (Milling and Baking)
  • Y2K remediation, CDM
  • DFAT (ADCNET Release 3, ADCNET build/release system)
  • TNT
  • Unisys/Customs (EDI & Finest/Nomad)
  • CSIRO - all Australian daily weather database
  • Diskrom ($350k project income in a year)
ABN registrations:
The ATO paid for and ran the project. The software contractor was combative and unhelpful. The environment was complex - around 6 different organisations for this one project, and another 10 projects. Getting anything done required a huge amount of effort, negotiation and discussion.

The software contractor hadn't built any load monitoring and response time facilities into the system nor made any provision for capacity planning and performance analysis.

On my own initiative, I designed and built some simple but sufficient tools and did the performance analysis, accurately predicting the final peak load - 20 times the design load. After diagnosing a catastrophic failure mode, I designed a simple solution.

The site's 100% availability over 3 months was not accidental. It directly contributed to 600,000 of the 3.3M registrations being done on-line (around 15 times the estimate) and to the site not attracting bad press like other aspects of the operation. Definitely a high-profile site.

The software contractor had to be dragged kicking and screaming all the way through the process. But I got the outcome the client needed - the site kept running with good response time on the busiest day. Some years later I analysed the performance data from the time and uncovered a few more nascent problems that we'd skirted around.

Goodman Fielders (Milling and Baking)
This was a Y2K project - they needed to migrate a legacy (Universe/Pick) application to a Y2K compliant platform and simultaneously upgrade the software and re-merge all 6 variants.

The application ran their business - ordering, tracking, accounting - the whole shooting match.
And for accounting systems, the Y2K deadline was the start of the financial year - 1-July-1999.

The work got done. I contributed two major items: deferring non-accounting conversion and moving the new single system to a properly managed facility.


DFAT (ADCNET Release 3, UNCLgate, ADCNET build/release system)
This was bizarre: I ended up sitting 20 feet from the desk I had used when I worked on the system being replaced ('the IBM'), back when it was being commissioned.

ADCNET failed and went to trial with the developer losing on all counts. It's worth reading the decision on 'austlii.edu.au' [Federal Court]. That account is certainly more detailed than the ANAO report. It was obvious in 1995 that the project could never deliver, let alone by the deadline. So I did my tasks and went my way.

I was then called back to debug and test an email gateway between the IBM and ADCNET (R2) for Unclassified messages. This was the first time I realised that being better than the incumbent staff in their own disciplines was 'a career limiting move'. Showing experienced, supposedly expert, programmers how to read basic Unix 'man' pages and act on them was a real lesson. A major problem that caused queued messages to be discarded was found through my testing and fixed, along with a bunch of the usual monitoring, administration and performance issues.

I was called back again to help with the Y2K conversion of ADCNET (departmental staff were doing it). The system was over a million lines of code and the release/development environment was bespoke. The required maintenance work on the dependencies/make side of the software had never been done. A few months of part-time work saw all that tamed.

TNT
Went for a year as an admin. Did what I could, but they were past redemption... Bought out by the Dutch Post Office (KPN) soon after I'd arrived.
Presented a paper at a SAGE-AU conference detailing my experience with their 'technical guru' - who'd considered himself "World's Greatest Sys Admin". Google will find the paper for you.
It was so woeful, it defies description.

Unisys/Customs (EDI & Finest/Nomad)
In early 1995 I was called in to replace a SysAdmin for 8 weeks on the Customs "EDI Gateway" project. The project and site were a disaster - so much so that Unisys listed it as a "World Wide Alert" - the step before they lost the customer and hit the law courts.

In two months the team stabilised the systems, going from multiple kernel 'panics' [a very bad thing] per week, 8-10 hour delays in the busy hour and lost messages - to 100% uptime over 6 systems, 1-2 second turnarounds and reasonable documentation, change processes and monitoring/diagnosis tools. The Unisys managers were very appreciative of my efforts and contributions. This was the same sort of chaos that was evident in the 2005 Customs Cargo clearance System debacle. [The 'COMPILE' system ran on Unisys 2200 and was being replaced over a 10-year period. It was the back-end for the EDI systems I worked on.]

Unisys were appreciative enough that I was called back for another few months to stabilise another system, running ADABAS/Natural legacy applications that provided the Financial Management & Information Systems and Payroll/Personnel system. Another high-profile, critical system.

CSIRO - all Australian daily weather database
The research scientists on the project I worked for created some tools to analyse weather data - and had found a commercial partner to sell them. The partner was not entirely happy due to extended delays and many unkept promises. I'd been told that to buy the entire dataset - a Very Good Thing for the commercial partner - was not affordable, around $20,000 for the 100 datasets from the Bureau of Meteorology. When I contacted the BoM, they not only provided the index in digital form for free, but the whole daily datasets would cost around $1,750. I scammed access to another machine with the right tape drive, wrote scripts and did magic - and stayed up overnight reading the 70-odd tapes. In pre-Linux days, there was no easy way to compress the data and move it around.

The whole dataset as supplied was 10GB raw - and I only had a 1GB drive on my server [$3,000 for the drive!].

It took 6 weeks to fully process the data into their file format. And of course I had to invent a rational file scheme and later wrote a program to specifically scan and select datasets from the collection.

The Commercial Partner got to release the product at the ABARE 'Outlook' conference with a blaze of CSIRO publicity. Don't know what the sales were - but they were many times better.
The research scientist got a major promotion directly as a result, and I was forced to leave for having made it all possible.

Diskrom ($350k project income in a year)
In under a year I learnt SGML, hypertext and enough about handling large text databases [1991 - before the Web had arrived], took over and completed 3-4 stalled and failing projects, migrated datasets and systems, designed tools and file structures/naming conventions, completed the first merge of the Income Tax Assessment Act with the Butterworths commentary, and sped up processing of a critical step by 2,000 times - all of which directly contributed $350,000 in revenue [apportioned to my effort] - or around 12 times my salary.

So it's natural that everybody else in the office was given a pay rise, while I was told that I was technically brilliant but not worthy of a rise, and one of the 'political players' was promoted to manage the group. With a number of other key technical 'resources' I left to pursue other avenues.

Diskrom was shut down just a few years later when a new chief of the AGPS (Aus. Gov. Printing Service) reviewed the contract and decided they were being ripped off. They'd provided all the infrastructure and services, with the commercial partner paying for staff and computers - and despite lucrative contracts and overseas work, had never seen any return.

Digging Out - Turning around challenged Technical Projects/Environments

Something I wrote in 2002:

‘Digging Out’ - 7 Steps to regaining control

This is a process to regain administrative control of a set of systems. It can be practised alone or by groups and does not require explicit management approval, although that will help.

‘Entropy’ is the constant enemy of good systems administration – if it has blown out of control, steps must be taken to address it and regain control. The nature of systems administration is that there is always more than can be done, so deciding what not to do, where to stop, becomes critical in managing work loads. The approach is to ‘work smarter, not harder’. Administrators must have sufficient research, thinking & analysis time to achieve this – about 20% ‘free time’ is a good target.

This process is based on good troubleshooting technique, the project management method (plan, schedule, control) and the quality cycle (measure, analyse, act, review).

The big difference from normal deadline based project management is the task focus, not time. Tasks will take whatever time can be spared from the usual run of crises and ‘urgent’ requests until the entropy is under (enough) control.

Recognition

Do you have a problem? Are you unable to complete your administration tasks to your satisfaction within a reasonable work week? Most importantly, do you feel increasing pressure to perform, ‘stressed’?

Gather

The first step of the quality cycle is ‘Measure’. First you have to consciously capture all the things that 1) you would like to do to make your life easier and 2) take up good chunks of your time.

The important thing is to recognise and capture real data. As the foundation, this step requires consistent, focussed attention and discipline.

The method of data capture is unimportant. Whatever works for the individual and fits naturally in their work cycle – it must NOT take significant extra time or effort.
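
As one illustration of how light the capture step can be, here is a minimal sketch in Python that appends one line per observation to a plain CSV file. The function names, the "wish"/"sink" labels and the column layout are all assumptions made for this example, not part of the original process - a paper notebook does the same job.

```python
import csv
from datetime import date
from pathlib import Path

# Hypothetical column layout for the log; not prescribed by the process.
LOG_FIELDS = ["date", "kind", "minutes", "note"]


def capture(log_path, kind, note, minutes=0, day=None):
    """Append one observation to the log file.

    kind is "wish" (something that would make life easier) or
    "sink" (something that took up a good chunk of time).
    """
    if kind not in ("wish", "sink"):
        raise ValueError("kind must be 'wish' or 'sink'")
    path = Path(log_path)
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Write the header once, when the file is first created.
            writer.writerow(LOG_FIELDS)
        writer.writerow([(day or date.today()).isoformat(), kind, minutes, note])


def read_log(log_path):
    """Return every captured entry as a list of dicts, oldest first."""
    with Path(log_path).open(newline="") as handle:
        return list(csv.DictReader(handle))
```

A call such as `capture("digging_out.csv", "sink", "restoring files by hand", minutes=90)` takes a few seconds, which keeps the capture cost well below the cost of the problem being recorded.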

Analyse

Group, Rewrite, Prioritise.

Create a ‘hard’ list of specific tasks that can be implemented as mini projects that can be self managed. Individual tasks must be achievable in reasonable time – such as 1-2 days effort. Remember you are already overloaded and less than fully productive from accumulated stress.

Order the list by 1) business impact and 2) Daily Work-time gained.
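
The ordering above can be sketched in a few lines of Python. This is only an illustration: the `Task` fields, the 1-5 impact scale and the 2-day effort ceiling (taken from the '1-2 days effort' guideline) are assumptions chosen for the example, and a spreadsheet sorted on two columns would serve just as well.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    business_impact: int        # assumed scale: 1 (low) to 5 (high)
    minutes_saved_per_day: int  # estimated daily work-time gained
    effort_days: float          # estimated effort to complete


def prioritise(tasks, max_effort_days=2.0):
    """Return (ready, too_big).

    ready   - tasks small enough to start, ordered by business impact
              and then by daily work-time gained, highest first.
    too_big - tasks over the effort ceiling, to be broken into smaller
              sub-projects before they are scheduled.
    """
    ready = [t for t in tasks if t.effort_days <= max_effort_days]
    too_big = [t for t in tasks if t.effort_days > max_effort_days]
    ready.sort(key=lambda t: (-t.business_impact, -t.minutes_saved_per_day))
    return ready, too_big


tasks = [
    Task("Automate log rotation", 2, 30, 1.0),
    Task("Fix nightly backup failures", 5, 20, 2.0),
    Task("Document restore procedure", 5, 5, 1.5),
    Task("Rebuild mail server", 4, 40, 10.0),
]
ready, too_big = prioritise(tasks)
for task in ready:
    print(task.name)
```

With these made-up numbers the backup fix comes first, then the restore documentation, then log rotation; the mail server rebuild lands in `too_big` until it has been split up.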

The initial priority is to gain some ‘freeboard’ – time to plan, organise and anticipate, not just react.

Prioritisation can be done alone if there is not explicit management interest.

It will surprise you what management are prepared to let slide – this can save you considerable time and angst.

Act


Having chosen your first target, create time to achieve it. This requires discipline and focus. Every day you will have to purposefully make time to progress your goal. This means for a short period spending more time working or postponing

Do not choose large single projects initially, break them into small sub projects.

When you start, schedule both regular reviews and a ‘drop-dead’ review meeting – a time by which if you haven’t made appreciable progress on your task to review

Review

How did it go? Did you achieve what you wanted? Importantly, have you uncovered additional tasks? Are some tasks you’ve identified not necessary?

If your managers are involved, regular meetings to summarise and report on progress and obstacles will keep both you and them focussed and motivated.

‘Lightweight’, low time-impact processes are the watchword here. You are trying to regain ‘freeboard’, you do NOT need additional millstones dragging you further into the quagmire.

Iterate

Choose what to do next. If you’ve identified extra or unnecessary work items, re-analyse.

When do you stop this emergency management mode? When you’ve gained enough freeboard to work effectively.

A short time after the systems are back in control and you are working (close to) normal hours, you should consider scheduling a break. You’ve been overworking for some time and have lost motivation and effectiveness. A break should help you freshen up, gain some perspective and generate ideas for what to do next.

Maintain

What are you and your managers going to do to keep on top of things? How did you slide into the ‘tar pit’ in the first place? What measures or indicators are available to warn if this repeats?

How will you prevent continuous overload from recurring?