## 2010/07/30

### What can you learn from a self-proclaimed "World's Greatest"?

Note: This document is copyright© Steve Jenkin 1998-2010. It may not be
reproduced, modified or distributed in any way without the explicit
permission of the author. [Which you can expect to be given.]

Lessons from the World's Greatest Sys Admin - July 1998
Presented at SAGE-AU Conference, July 1998
Contents
• Introduction
• Background
• Some WGSA Attributes
• Sayings of the WGSA
• Some Sound Management Laws
• So What?
• How do you work with a "World's Greatest ..."
• Some "Good Stuff" I learnt from friends.
• Some of the WGSA's work
• Summary

# Introduction

## Why

(2005) The most frequent comment I received on the 'WGSA' talk was:
"So you think you are the World's Best Sys Admin?"

Answer: No, I am not the "WGSA".

This paper is about someone who is really an amalgam of a number of people who regarded themselves as "The World's Best Sys Admin". They never verbalised this opinion - they just lived it.

Unfortunately, as is the case with all the self-appointed 'gurus' I've met, they had limited raw talent and an arrogance that prevented them from admitting less-than-perfect performance, taking on board any useful criticism or correction, or learning new tools, techniques, processes, and organisational methods from others they didn't consider an authority.

or What Not To Do...

I included a section on Good Things I've Learned to show that I wasn't totally preoccupied with the negative :-) But there were just too many good stories, and I really had to let off steam over this...

Do you think you know the identity of "The WGSA"?

You don't. For those who may even think it's them: no, it is not you.

The observations and opinions here came over a considerable period of time. It's not a single person - and it's not just Administrators. I've met "WG Programmer", Architect, Designer, Tester, Integrator, Networker, Technical Manager and CIO.

So - onto the main game - the paper as presented to SAGE-AU in Old Parliament House.
If you were there - did you catch any of the lollies I threw, or even a chocolate egg?

Feedback is something I'm interested in.

Drop me a line if you have your own stories, can add more useful models, or if you're late to this and found it useful. Suggestions for improvement gratefully accepted and acknowledged.

## Why?

To codify and inform.

Once a problem is recognised and named, you can start to understand and address it.

## Audience

- If you work for one.
- If you work with one.
Managers
- If you have one working for you.

## Format

• Talk
• War stories
• Feedback
• And lots of Opinion.

## The BIG Questions

• So What?
• How do you work with one?

# Background

I spent a year in 96/97 contracting in Sydney for what should've been a large, prosperous Australian multinational. They hadn't paid a dividend since 1990 and were taken over by a Dutch company at the end of 1996.

The I.T. group was appalling.
Staff turnover in the Unix Support and Networking areas was high - close to 100% in 12-18 months! The company spent under 1% of turnover on I.T. - against the industry average of 5+%.

It felt like we were doing the impossible - and we were.

They'd outsourced their mainframe, embraced 'open systems', installed a large scale WAN, gone client-server, were developing GUI and O-O applications and had an Internet presence.

They'd also radically downsized in two steps:- from 200+ to 30-40 staff in 18 months.

They were fully Buzzword Compliant, but were going nowhere.

I was privileged to meet two people - both ex-telecoms technicians who had moved into computing. One, the self-proclaimed WGSA, had been responsible for setting up the Unix environment and its associated X.25 network, and had been the Unix support manager for a couple of years, before finally taking a job in 'Technology Planning' - but just doing more of the same.

The other ex-tech I've remained good friends with. He moved into Networking after a career in Civil Aviation, then at a TAFE. He had enough PC, Unix, and Internet knowledge 'to be dangerous'. He'd left behind at the TAFE an environment where just 2 of them supported and ran the whole state TAFE network, +1 for Unix, 1 for printers and passwords, and 2 on the HelpDesk. When the lot was outsourced and a crack systems company took over - they boast they can cut 10%-20% from any operation - they ended up having to spend more.

The contrast was stark and savage - one had left behind a legacy of chaos and disorder, the other was undoing the damage and providing real business productivity.

This talk is about that experience and what I've learned.

These are my values and principles. Your mileage may vary.
• Know why you're there - To satisfy others' business needs.
• Know what you Know, Know what you Don't Know, and don't be afraid to get assistance.
• Obey Sound Management Laws.
• Learn, Develop, and Stay Current.
We learn through Invention, Discovery, and Failure
"That which isn't growing, is dying"
• Give Value for Money.
• Actively seek ways to put yourself out of work.
• Minimise recurrent costs - wages, maintenance/support/rental charges
• Maximise Reusability, Flexibility, Functionality, Reliability/Robustness.
• Provide what's needed, not what's apparently wanted.
• Listen and communicate with your users.
• Provide Solutions.
• Focus on Outcomes.

# Some WGSA Attributes

• They don't exist outside fertile ground. They have to be allowed
and encouraged by management and peers.
• Only Dysfunctional people thrive and rise in dysfunctional
environments.
Good people leave broken places - possibly after fighting for a time.
The only other alternative is to withdraw and retreat into minimal
performance.
• People are the ONLY asset of I.T. Organisations

  • Hardware: $1M to zero in 3 years
  • Software: $100k to zero in 3 minutes
  • Network: $250/point to zero in 3 seconds
• Indicators:
  • High staff turnover
  • High contractor ratio
  • N.I.H. - Resistance to Change
  • Lack of "Professional" work habits - Defined Processes, Designated Responsibilities, Delegated Authority
  • Lack of History, Documentation, Policy, Procedures, Config Mgt, Version Control, Handovers, Induction
  • Chaos and Frenzy. Apparently understaffed and overworked - definitely unorganised. "No time to fix problems, too busy fixing faults."
  • Nobody tasked with automating jobs or passing work back to level 1 support.
  • Every install project goes into crash mode. No standard, fast, system builds.
  • Maintenance frenzy - never seems to get any better.
  • Lack of fault analysis, reviews, Post Mortems, Post Implementation Reviews, capacity planning.
  • Single source of innovation and improvement - The "Guru"
  • The "BIG BANG FIX" is coming [Or the "Silver Bullet"]. Nothing can be done, because "someone" [WGSA or friend] is creating "the Solution to All Our Problems".
  • (Senior) Management "Swooping" is allowed and tolerated.
  • "Don't show me problems, Show me solutions"
  • Mentoring and skills transfer absent
  • Constant reactive, not proactive, administration.
  • No organisational accountability - people can fail to do any task, routine or project, with impunity.
  • Blaming and Recrimination normal. No attempt to perform 'root cause analysis' and rectify faults.
  • No recognition or rewards for work well done.
  • Few Diagnostic, debugging, or troubleshooting Tools - even for common failure modes.
  • No Communication - up or down.
  • No Performance Indicators or Measurement/Assessment

# About The WGSA

Of course he was the best. He had read every single 'white paper' from the vendor, and with his photographic memory, could recite it all back. All he needed to know was in those papers, and the manuals he'd read. He didn't need to meet and talk with his peers; he had none anyway!
He had no need of professional organisations or finding out what had worked, or not, for other people. If he didn't have time to do something himself, he would get in a contractor, create a project, or hire a consultant. Funnily, these people were always only of very modest ability. The projects mostly ran out of money in "phase 1", when only the basic work was being done and well before the real benefits were to accrue.

He'd written 25,000 lines of shell script to provide a "common" menuing and execution environment. It was a most flexible, adaptable, and configurable environment - and surprisingly similar to that run by his previous employer. Just the thing to control 12 machines... It was a real engineering triumph - for 1982! He'd built and deployed all this with no version control, configuration management, or documented release and maintenance procedures - and certainly no review.

His crowning glory, "Xferutility", 7,000 lines in a single script, heavily utilised 'comes from' control files [they just appeared places, with no trace of whence they came], and could use 'rcp', 'ftp', and e-mail to achieve the functionality of uucp. Plus, it was the transfer mechanism, the interactive menu, the scheduler, and the status reporter. All things to all people bar those left to maintain it.

Having apparently not done "Programming 1A", he'd not been introduced to the concepts of "coupling and cohesion" - put together everything that belongs together, separate unrelated concerns - and least necessary complexity. To go from the login prompt to the first displayed menu, over a dozen files or scripts were executed - often in perverse order. The system drive defaults would overwrite the local definitions!

He also seemed unaware of basic capacity planning issues - like tracking the number of systems in the machine room and providing adequate rack space and cabling. Backups were another story entirely. The I.T. department policy was to have separate small systems for every division, no two the same. In 12 months it went from 12 systems in the machine room to 23. And then to 35+ in the next 6 months.

Having labelled me "a cannonball contractor who won't be around in the long term", he resigned the week he penned it, took an overseas holiday [run in the same flexible fashion], and rejoined his previous employer, through a services company, performing Network Management.

# Sayings of the WGSA

A few of these are paraphrases.

What I often find myself saying is: "Why would you want it any other way?" and (2005) "Would you expect any less?" The answers to these questions are usually "Yes, any other way" and "NO!".

Sayings and tactics of WGSA and friends:
• A basic tactic: Plan, Plan, Plan - and produce massive documents everyone else has to review. Nothing will actually get done.
• Another basic tactic: Reality is at Fault, Adjust your Perceptions.
• "It's worked that way for 3 years - it couldn't be broken now." [A basic tactic. You obviously have got the nature of the fault wrong. Ignorance and Rigidity are a powerful combination.]
• Another basic tactic: Concentrate on the trivial, the Big Issues will fix themselves.
• "You don't understand the full range of issues or complexities." [I know, you don't.]
• "It works/worked fine for me..." [Hasn't told you or Reality is at fault.]
• "Read the documentation I wrote." [But hasn't told you about.]
• "You have to fully document that." [An attempt to divert, stall, or put you off.]
• "The client doesn't want that." [Were they ever asked? Were they ever given options?]
• "They [the clients] never asked." [Deflection. Clients are expected to be technical experts.]
• "It's UnAuthorised." [But where is the Policy on that?]
• "It's not Standard" [It's free. We have to pay heaps or the other boys will think we're not cool.]
• "We can't afford that." [May be true, but unlikely based on the money chucked around on junkets/trinkets for the favoured few.]
• "It's freeware. It's not supported." [Often said without a hint of irony in response to 'costs too much'.]
• "If you can Cost Justify that..." [A stalling tactic. Nothing you put up will ever get approved.]
• "You Just ..." [Makes you out to be a fool/incompetent, even though there is no way you could've known.]
• "Why haven't you ... <said angrily>" [So how would you know to do that, when you haven't been told about it?]
• "It's really flexible/efficient/configurable/Easy when you use it..." [Defending a wildly over-complicated script]
• "We need it because ..." or "We have to do it that way." [Of course there is nothing written to back it up. The WGSA wrote it, so it's going to stay.]
• "We won't discuss that [now]." [No argument if there is no discussion.]
• "That's not the way we do it around here." [No change is possible. Of course, nothing is written down and there is no Policy to back that up.]
• "You can't do/say that." [Controlling.]
• "What is the Vendor's policy on replacing that?" [Deflect and control. Of course the vendor doesn't have a written policy on when something is broken.]
• "The Vendor's White Paper/Documentation says ..." [Appeal to another Authority. Stifle argument. Don't let facts or prior experience get in the way.]
• "The Consultant's Report says ..." [Appeal to another Authority.]

Remember, there are rules for him and another set for you. He will ignore e-mail, talk about you behind your back, set impossible deadlines [for you], and not keep his promises. Don't expect to be told about important stuff that impacts you, or that you happen to be expert in. You won't get invited to meetings, see reports, or be involved in the 'discussions' held before major decisions are announced. Rumour, disinformation, and 'Need to Know' are powerful tools for the WGSA.
He will casually drop bombshells, regularly spring 'surprises' on you, and practises 'Divide and Conquer' extremely well. He allocates work, but will never help or clarify what he wants. And of course, won't follow up on it. He may fly into a 'justifiable rage' if he comes back in a month and something hasn't been done to his satisfaction... It's not easy being so perfect and all-knowing all the time.

Rational argument won't work with the WGSA. What matters is that he thought it up, he's important, and the bosses [his mates] think he is an absolute Guru on everything. And if you ever get close to criticising him or winning an argument - slander and libel work just fine for him.

# Some Sound Management Laws

(2005) Note: I don't try to come up with any principles or Laws that The "WGSA" follows. There is probably only one: "Seize Every Opportunity". Which isn't a bad dictum, if it respects other people, fulfils your business's needs and goals, and isn't only about advancing your personal agenda.

My version of "Sound Management Laws" is presented for you to consider and understand where I am "Coming From":
• Delegate Authority with Responsibility and Accountability.
• Follow up, Follow through, Be Consistent.
• Value and Empower your staff: People are your only asset.
• Do It NOW!
• Follow The Quality Circle: Plan, Act, Evaluate.
• Encourage and Reward Professional Behaviour, deal quickly with repeats of poor behaviour.
• Lead by Example.
• Forge, maintain, and support Teams.

In I.T. there are special management considerations:
• Users come first
• Satisfy Business Needs
• Actively sell your successes and services to your users.
• Constantly set and manage users' expectations.
• Inform, advise, consult
• Be Honest and forthright - especially about your mistakes and failures. Take care to explain Why it won't happen again.
• Be Proactive. You get to drive the technology, they drive the business operations.
• Know Yourself, Your Staff, Your Tools.
• Never take on a job you cannot do.
• Don't give others jobs they can't do.
• Risk Management, Reviews, and 'Performance Audits' are your chief tools in establishing a Learning organisation.

Good working relationships between management and staff take time and effort to develop. They proceed through the following stages and are fragile. The whole lot, years of work, can be destroyed in an instant with a lie. What management want are people they trust, who work very hard and consistently produce quality work. People who hold the company's best interests to heart.

Development Stages of People and Teams:
• Honesty, Integrity, Openness, Frankness, Consistency
• TRUST
• RESPECT
• LOYALTY
• COMMITMENT, CARING

# So What?

Since the advent of the 486 in ~91, cheap LANs in ~94, and the Net in ~96, I.T. systems and infrastructure have become essential and critical for all business operations. Systems Administration, Networking, Help Desk, and Database Admin are the glue that holds it all together from day to day.

There is a myth that software doesn't wear out like machinery. The bits don't change, so it must be OK! By implication, you don't need to "maintain" systems and software, like you do machines. So why aren't we all running 286s and DOS 3.3? It's called 'bit rot'. The software doesn't change, but the environment does - which gives the same net effect. Year 2000 isn't a problem until your clock says 01/01/00.

My argument is that, ignoring management and leadership issues, company profitability is related directly to staff efficiency [$ cost / $ sales] and new product evolution. These are driven directly by I.T. capability, which requires systems be constantly upgraded and enhanced - just to stay where you are! Similarly, I.T. operations staff must be continually increasing their own efficiency just to keep up.

(2005) See the 2003 Harvard Business Review article "I.T. Doesn't Matter" by Nicholas G. Carr.
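The Year 2000 example of 'bit rot' can be made concrete. The Python sketch below is hypothetical, not code from any system described here. It is an expiry check that compares two-digit year strings. It works for years, then gives the wrong answer the day the clock reads 00, even though not a byte of it has changed. The windowed version shows one common remediation, assuming a pivot year of 50.

```python
from datetime import date


def expired_two_digit(expiry_yy: str, today: date) -> bool:
    """Legacy-style check: compares two-digit year strings such as '99' and '00'.

    The code is unchanged in 2000, but '99' < '00' is False, so an item that
    expired in 1999 suddenly looks valid again.
    """
    return expiry_yy < today.strftime("%y")


def expired_windowed(expiry_yy: str, today: date, pivot: int = 50) -> bool:
    """Same check using a fixed century window (assumed pivot of 50):
    00-49 are read as 20xx, 50-99 as 19xx."""
    yy = int(expiry_yy)
    expiry_year = 2000 + yy if yy < pivot else 1900 + yy
    return expiry_year < today.year
```

With today set to 1 January 2000, `expired_two_digit("99", today)` returns False while `expired_windowed("99", today)` returns True. The environment (the clock) changed, not the code.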
Effective Systems Administration is the single greatest point of leverage in the I.T. infrastructure - which is itself the single greatest point of leverage in an organisation. It amplifies and extends the thinking, analysis, and decision making ability of the people in the organisation. Even sometimes the managers. It can even provide some corporate memory - a prerequisite for Knowledge.

It's obvious the software in airplanes, spaceships, nuclear reactors, medical instruments, weapons systems, banks, and ATMs has to be correct, robust, and dependable or there are disastrous, often immediate, consequences. People die or billions go missing. [Roll on NT - reactor control!]

What's not obvious is the long, lingering decline and demise of businesses - large and small. The cost to Australia of losing a multi-billion dollar multinational company is incalculable. Well managed and well led, it could still be a potent force on the global stage. Instead we have lost profits, destroyed assets, and put a few thousand people out of work.

(2005) On May 28 2001, Australia's fourth largest telco, One.Tel, ceased trading on the ASX. The Packer and Murdoch families, who control the media conglomerates PBL and News Corporation, lost about A$1 billion in the debacle. A major factor in the failure was uncollected "receivables". The computer billing system was faulty.

One.Tel closely followed the failure of HIH Insurance and Impulse Airlines.

That's a disaster 10 times bigger than TWA-800 going down outside New York just after take-off in 1996, and they are still fishing out pieces. Just because it is in glorious slow motion - taking a decade, not a minute, to unfold - doesn't mean we shouldn't be as concerned with businesses going down as with aircraft crashes. People's lives are destroyed and assets lost just as thoroughly in both types of crash.

The government and professional bodies should be just as concerned with these outcomes and ensuring they can never happen again.

# How do you work with a "World's Greatest ..."

My style has been described as "Straight Up the Middle, with lots of smoke and noise."

My only advice is to recognise an intractable situation early and leave as quickly as you can. It's a luxury I can afford, having no dependants and a low level of debt.

# Some "Good Stuff" I learnt from friends.

• Know what's important. Focus on that, ignore the trivia.
• Practice - Order, Discipline, Rigour.
• The job isn't done until your records are up to date.
• Professionals do for $100k what anyone can do for $1M.
• Remember Good Ways to do things when you see them.
• ASK other people - what works? What doesn't?
• STANDARDISE. Make it so there is just one way things happen.
• Be prepared to work odd hours to not impact your users.
• Clean up as you go.
• The details are important.
• DON'T accept a job you can't do.
• Be Proactive, not Reactive.
• Practice 'Root Cause Analysis' - fix faults and processes, not just symptoms.
• You have to stay on the leading edge. This takes lots of time and experimentation.
• There is NO substitute for ability, experience, and general knowledge.
• Aim for 100% reliability. Know what you have to do to achieve it.
• If you make Rules, apply them without exception.
You may get called The Network Nazi, but it will all work and you will be respected.
• Be personally flexible when dealing with users. Meet their needs, not just their expressed desire. This may involve some education.
• Let users know what's happening.
• Protect your staff from the vicissitudes of Management.
• Freeware is FINE. If it meets the need, use it.
• Know and Explore your tools.

# Some of the WGSA's work

Here is a [longish] list of some of the wonderful technical and process problems I came across. Remember this was a largish, not huge, enterprise. There were only 75 Unix hosts, a thousand or so users [total], and a network that reached fewer than 100 sites.

Many of the systems were front-ends to the mainframe or a production system for the business.
The Unix support team was mostly 3 people, sometimes with a manager, sometimes with people doing performance analysis/reporting, or 'implementations' - such as HP Openview [I.T. Operations].

• Common Environment: 25,000 lines of Shell Script. A good technology for 1982, not 1997.
Very poorly written. Basic programming rules of 'Coupling and Cohesion' violated.
• All actions implemented as shell functions, but merged with interactive menu system. Extremely heavy reliance on Environment variables, with perverse re-mapping of names.
• 'Standard Operating Environment'. More shell script! No concept of standard builds, current patch levels, consistent program versions, or automatic software updates. 12 or more months of wasted effort. [Sold to management initially on the great results from HP's internal network. With 100,000 PCs and 23,000 Unix hosts spread over 660 sites, they saved US$200M/year in support costs alone by adopting a 'Common Operating Environment'. That was based on keeping all systems up to the same versions of software and config files.]
• Xferutility: 7,000 lines of shell script, doing a subset of uucp's functionality. Insidious bugs like:-
  • Using the (local) return code of 'rsh' and thinking it was all working.
  • Using rcp and not checking for a previous aborted transfer. [Destination file ends up with zero modes. Not writable by owner. Copy aborts, but script keeps chugging along.]
  • HP-UX 10 'bug'. #!/bin/ksh missing. Default '/sbin/sh' used with surprising results - 'exit' doesn't work.
• /usr/local/bin banned. All executables and tools to reside in admins' home directories.
• Common Admin logons banned. But 'essential utilities', like Xferutility, used a common account with .rhosts trusted all over the place, and even privileged access possible with sudo.
• Common User Home directories basic to functioning of 'Common Environment' scripts. Ran ~/.profile to start menu, which [eventually] ran ~/$LOGNAME.profile.
• NO master passwd file. No unique UIDs, but notionally unique LOGNAMEs.
• NO mechanism to add or remove users from multiple machines.
• NO retiring of unused accounts. No checking for intrusions.
• Default password of LOGNAME. Never checked and never reset.
• Help Desk's 'Password reset' function broken on most machines. No corrective action taken.
• Crack broke 80% of the passwords on the central admin hub. [Including that of WGSA]. Nothing was done.
• WGSA login setup on all systems, with .rhosts back to the admin hub, and 'sudo' access to 'mv' and 'cp'. WGSA had two passwords, family member names + digit. These were well publicised to all admins, and others.
• NO definitive list of managed hosts.
• DNS control files rebuilt every time from a 'hosts' file with 'host_to_named'.
• NO alternate DNS primary.
A single central machine contained all the network services - DNS, e-mail, dial-in access, administration, master copies of scripts and system config files, root passwords for all machines. This 'admin hub' was trusted, and could access all other systems. There was no fail-over system or contingency plan for massive failure.
• Crippled DNS secondaries.
This was for 'security'. There was NO IP access control in the network. A user with only a little knowledge could navigate the entire network. There was an IP path back to the central DNS, and the IP numbers were allocated in an orderly fashion.
• Internal domain left at: XXXXX.com. Even where a firewall was installed with the domain of XXXXX.com.au!
• Even with over 2000 device entries in the DNS, and a strong numbering plan initiated by Networks, running sub-domains was firmly and frequently rejected.
• Win-NT and DHCP posed no problem for the DNS. Permanent number leases were granted.
• 10 or more IP address ranges in use, including a Class-B [which the company owned], and other cute addresses like 150.150.x.x [Wells Fargo's!]
• IP over X.25 was chosen in 1994.
Routers were 'too expensive'. By March 1996 there were massive network failures - morning and afternoon - due to overload of the $250k X.25 switches. Expensive terminal servers were deployed widely, 'because they handle IP over X.25'. Most production support problems related to config mgt, Network, printers, or terminal servers.
• Untested backup tapes. In spite of a failure resulting in almost total loss of backup tapes for a system, no testing of readability of backups was performed.
• Configuration Management consisted of copies of scripts in the WGSA's home directory. NO mechanism for rolling out fixes to faults as found.
• Version Control consisted of block comments at the start of the scripts.
• Common Code duplicated across 'menus'.
• Hard coded 'user types' in Common Environment scripts.
• File names not distinguished by hostname. All called 'AdminMenu' for the 'Admin' user.
• Non-standard capitalisation of file Names and environment variables.
• Very early version of 'sudo' used and modified. Non-standard config files. No repository of config files. No version control. [And WGSA didn't believe me when I found a long standing bug in his code.]
• No reviews of code, scripts, systems.
• Little testing of new code. Try it live!
• No documented procedures for standard tasks.
• No records of faults fixed.
• No regular analysis or reporting of production faults.
• No running sheets on production faults.
• No weekly section meeting. No dissemination of information, plans.
• No standard machine builds. [Complex and long procedures to build the production systems - with many variants.]
• No capability to track or report critical file changes on production systems.
• Network Naming Standard defined [but not for Printers and print queues]: ux div 2 loc nr : 11 chars. Accepted by hostname, not by uname.
  • ux = Unix
  • div = Division 3 letter code
  • 2 = 1st digit of state postcode
  • loc = 3 letter code for town/suburb, arbitrarily assigned
  • nr = 2 digit machine number
  WGSA Response: Set hostname to the long name, and uname to the old short name! [So what's the standard??]
• X.400 was chosen as the 'Standard external E-mail system'.
• HP Openview [@$100k ?] was chosen as the corporate mail system - 'because it could make an address a program'.
• External E-mail addresses were:- Firstname_Lastname@XXXXX.com.au
It took a long and bloody fight to get a script into production that used the Net standard of 'First.Last@XXXXX.com.au', generated all the usual abbreviations, and allowed specific people to be included/excluded. This of course was removed a week or so after I left... [Only for them to hurriedly fall back to a manual list once they found a mail-loop problem.]
• There were over 10 printing mechanisms, no map of network printers, and no naming standard for printers. [There was a printer called 'printer', and more than one called 'laser'.] Of course, nothing was documented on how it all worked, what got changed, or subtle faults found.
• No disaster recovery or contingency plans existed. Hardware in the old AIX boxes occasionally died and caused not inconsiderable panic to the new admins.
• The machine room had no sensible layout - even though it was newly installed in 96. There was a single ethernet for all the production, development, accounting, and maintenance systems.
• Disk Layouts were recorded nowhere.
• There was no consistency or standard way to lay out Logical Volumes on disks.
• The Journalling Filesystem [Veritas], was supposedly 'banned' from all HP-UX 10 systems. The defrag and on-the-fly extend utilities were an extra [pay for] package, so the 'free' part couldn't be used.
The watchword for the I.T. branch was 'CHEAP'.
[Do you think that was in any way related to the company dying?]
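Among the Xferutility bugs listed above, the script didn't notice an aborted rcp and kept chugging along. A common defence is to copy to a temporary name, verify the result, and only then rename it into place, so a half-finished copy never masquerades as the real file. The Python sketch below shows that pattern for a local copy only. `safe_copy` and `sha256_of` are hypothetical names. A real transfer tool would substitute its own network copy step and, above all, check that step's exit status.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def safe_copy(src: str, dest: str) -> None:
    """Copy src to dest so that dest is either the complete new file or untouched.

    Data goes to a temporary file in dest's directory, is checked against the
    source, and only then is renamed over dest. If the process is killed
    mid-copy, at worst a stray .xfer-* file remains; dest is never half-written.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp = tempfile.mkstemp(dir=dest_dir, prefix=".xfer-")
    os.close(fd)
    try:
        shutil.copy2(src, tmp)  # content plus permission bits and timestamps
        if sha256_of(tmp) != sha256_of(src):
            raise IOError(f"verification failed copying {src} to {dest}")
        os.replace(tmp, dest)  # rename within one filesystem is atomic on POSIX
    except BaseException:
        # Remove the partial temp file, then let the caller see the failure.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

The key design choice is that failure is loud: an exception propagates instead of being swallowed. The temporary file lives in the destination directory so the final rename never crosses filesystems.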
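Two items in the list above, no master passwd file and no unique UIDs, are the kind of problem a short audit can surface. Here is a minimal Python sketch. It assumes the `/etc/passwd` text from each host has already been collected into a dict keyed by hostname (fetching it is left out). The function names are hypothetical. It reports login names whose UID differs between hosts, and UIDs shared by more than one login.

```python
from collections import defaultdict


def parse_passwd(text):
    """Map LOGNAME -> UID from /etc/passwd-format text (name:passwd:uid:gid:...)."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        users[fields[0]] = int(fields[2])
    return users


def uid_conflicts(hosts):
    """LOGNAMEs that have different UIDs on different hosts."""
    uids_by_name = defaultdict(set)
    for text in hosts.values():
        for name, uid in parse_passwd(text).items():
            uids_by_name[name].add(uid)
    return {name: uids for name, uids in uids_by_name.items() if len(uids) > 1}


def shared_uids(hosts):
    """UIDs used by more than one LOGNAME anywhere in the estate."""
    names_by_uid = defaultdict(set)
    for text in hosts.values():
        for name, uid in parse_passwd(text).items():
            names_by_uid[uid].add(name)
    return {uid: names for uid, names in names_by_uid.items() if len(names) > 1}
```

Either kind of clash matters as soon as files move between machines (NFS, rcp, tar), because ownership travels as a number, not a name.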
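The Network Naming Standard in the list above is regular enough to check mechanically: ux + div + 2 + loc + nr is 2 + 3 + 1 + 3 + 2 = 11 characters. This Python sketch assumes lower-case letters for the division and location codes. The example name `uxfin2syd01` and both function names are invented for illustration. The second check flags the WGSA's workaround, a hostname that differs from the uname nodename.

```python
import re

# ux + div (3 letters) + state postcode digit + loc (3 letters) + nr (2 digits).
# Assumption: the codes are lower-case ASCII letters.
HOSTNAME_RE = re.compile(
    r"ux(?P<div>[a-z]{3})(?P<state>[0-9])(?P<loc>[a-z]{3})(?P<nr>[0-9]{2})"
)


def parse_hostname(name):
    """Return the named fields of a conforming hostname, or None."""
    m = HOSTNAME_RE.fullmatch(name)
    return m.groupdict() if m else None


def check_host(hostname, nodename):
    """List the ways a host breaks the standard (an empty list means it conforms)."""
    problems = []
    if parse_hostname(hostname) is None:
        problems.append(f"hostname {hostname!r} does not follow ux-div-state-loc-nr")
    if nodename != hostname:
        problems.append(f"uname nodename {nodename!r} differs from hostname {hostname!r}")
    return problems
```

On a Unix host the two inputs could come from `socket.gethostname()` and `os.uname().nodename`. Run across every managed host, this check would also produce the definitive host list the network lacked.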

# Summary

There are some people out there who don't just think, but know, they are the best.

They are dangerous.

Left unchecked they will not only make life a misery for everyone around them, they will help bring companies, even very large ones, down.

What singles them out is their inability to take input from others.

Typical behaviours are:
• Rigidity. Nothing can be changed.
• Control. They have to say how everything is done.
• Fixation. Things have to be done their way or not at all.
• Discipline, rigour, defined processes. Usually absent. Always perverted.
• Favoured Few. There is always an inner sanctum who control everything.
If they are well settled and well regarded, the organisation is dysfunctional. Staying will be soul destroying.

The only defence I know against them, once entrenched, is to leave.

And thank you all for your patience. I hope you have taken something away from all this...