Saturday, August 3, 2013

SpaceX Glitches - Countering Over-Engineering

Over-engineering? (Portland, OR)
SpaceX is arguably the most successful private/commercial/new space company to date. After becoming the third entity to develop the technology and hardware that can bring cargo to the International Space Station and back, it's taking strides towards bringing people to the coveted destination in the sky, working on version 1.1 of the Falcon 9 rocket and also on a reusable rocket, the Grasshopper.

With all the success SpaceX is having, we should not forget the problems and misses of recent launches, for example an in-flight engine shutdown and problems with the thrusters required to get the Dragon to the ISS once in orbit.
From the SpaceX Updates page:
After Dragon achieved orbit, the spacecraft experienced an issue with a propellant valve. One thruster pod is running. We are trying to bring up the remaining three. 
One may look at these issues and be concerned about SpaceX design, manufacturing quality, engineering, redundancy and what not. I am actually encouraged by these glitches.

From where I am sitting as a software industry professional of 17 years, it looks like SpaceX is taking a page out of the software industry's book, where over-engineering is the silent killer of projects, causing them to be late, costly, and even dead on a much-delayed arrival. Obviously, if these problems turned missions into catastrophes we wouldn't be praising SpaceX, but at the same time I would also be worried if SpaceX didn't have any problems whatsoever. The space (no pun intended) between failure and over-engineering is the sweet spot that allows companies and projects to succeed by moving as fast as possible with positive results, which in turn keeps costs down.

As a young engineer, I confess, I wanted to create the perfect design for everything I did, build it with no compromises and call it done when there were no bugs left. Many gray hairs and much time in the real world later, there's a phrase I repeat every time an existing component, application or software architecture is compared to a suggested one - "Unwritten applications are perfect". There are no compromises, no bugs, no problems in a product that lives as a PowerPoint presentation, a Word document or the back of a napkin.

As a product moves through the digestive tract that leads from people's minds to a product, it reaches a rite of passage, a decision point if you will - continue to tinker until it is "perfect", which is asymptotically never, or make it good enough, willing to iterate after it ships, realizing this version is not the last. As a perfectionist in rehab, I find it important to remember that. A French proverb captures it well - "Le mieux est l'ennemi du bien", the best is the enemy of the good (relevant also to this post, on which I've been working on and off for a couple of months...)

Consciously making the choice in advance to release a real product rather than an imaginary one means it will have problems. But when these happen, no one is surprised, and the system as a whole is robust enough to counteract them and is built with change in mind - from cameras with firmware that can be upgraded, to software components that validate their input, all the way to airplanes being able to land with no engines or a Falcon 9 rocket that can operate with only eight operational engines.
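To put rough numbers on the "eight engines are enough" idea, here is a minimal sketch of a k-out-of-n reliability calculation. The per-engine success probability is a made-up figure for illustration only, not SpaceX data, and the sketch assumes engines fail independently of each other, which real hardware does not guarantee.

```python
from math import comb


def k_of_n_reliability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n components work.

    Assumes every component works independently with the same
    probability p (a simplifying assumption for this sketch).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-engine success probability, chosen only for illustration.
p_engine = 0.99

# A design that needs all nine engines to work.
all_nine = k_of_n_reliability(9, 9, p_engine)

# A design that tolerates the loss of any single engine.
eight_of_nine = k_of_n_reliability(9, 8, p_engine)

print(f"needs 9 of 9: {all_nine:.4f}")
print(f"needs 8 of 9: {eight_of_nine:.4f}")
```

Under these assumptions, tolerating one failed engine moves the system from roughly 91% to well above 99%, without making any single engine better. That is the system-level kind of robustness, as opposed to perfecting each component.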

Let's face it - sh*t happens. Assuming it doesn't or, worse, assuming one can work on a system until it is perfect both carry great risk to actually completing a mission or releasing a viable product.

In the next few years I bet SpaceX will show us that it can continue to iterate on its success and yield better and better products, and that over-engineering should be avoided not only in software but also in rocket science.

2 comments:

D. Messier said...

The approach works as long as SpaceX is launching cargo vehicles filled with clothing, food and other easily replaceable commodities. This is why the commercial cargo program is a perfect vehicle for SpaceX to gain experience and make mistakes. NASA is not going to be overly surprised if they lose a Dragon (as almost happened on the last flight). Nor will the ISS program really be much affected unless SpaceX and Orbital start losing a lot of freighters.

This also explains why the military is a bit leery of using SpaceX until the bugs are worked out. Those satellites are worth a billion dollars or more and are essential to military readiness and national security. The costs of losing one -- and replacing it -- far outweigh any money they would save. You want rockets of extremely high reliability with proven, set designs that have flown over and over again.

And it goes without saying that NASA will want the same thing with commercial crew.

Amnon I. Govrin said...

Thanks for your comment, Doug!
So far SpaceX have shown they can walk that line of creating a robust enough system. I completely agree that the stakes are higher with larger, heavier, more expensive and more alive payloads.
Glitches are a part of the game, and are to be expected from complex systems. If they continue their streak of successes, I'd prefer to see that the same problem doesn't come up twice. At the same time, it is encouraging that 2 parachutes out of 3 are enough or that 8 engines out of 9 are enough - it's system-design robustness and reliability vs. specific component robustness.
Obviously I am implicitly assuming, or at least hoping, that they won't have a catastrophic cascade of failures. Going back to software, there are cases when, after release, a cascade of events leads to a recall or an urgent fix, which in the case of a rocket catastrophe is regretfully not an option.