posted: October 14, 2018
tl;dr: Team harmony trumps purely rational, technical decision making...
This is the second of two posts on Japanese business culture, gleaned from my time working at a smartwatch startup company, AT&E Laboratories, that had a partnership with Seiko in the late 1980s and early 1990s. Part one covers after-work drinking.
The wristwatch pager had a low-power CMOS microcontroller designed and manufactured by Seiko, optimized for use in digital watches. It didn’t resemble any processor I had ever run across: it had a 4-bit data bus and a 12-bit instruction bus. The only way to program it was in its own assembly language. Ultimately we made that microcontroller do amazing things; someday I’ll write a post just on that topic.
Because this was a custom Seiko microcontroller with its own assembly language that only Seiko engineers had experience using, the responsibility for writing the software for the watch was given to (or claimed by - I wasn’t part of the discussions) Seiko. AT&E’s main responsibility was the RF chips for the digitized FM subcarrier receiver. I also designed a standard cell array chip that packetized the incoming bitstream and presented fixed-length data packets to the microcontroller. The software had to manage all these chips, parse incoming packets, control the display of information on the LCD panel, and process commands from the user pressing the watch’s buttons.
When I joined AT&E, the first-generation watch was being internally tested in preparation for the first public field trials. There were some concerns about the quality of the watch software, so my initial assignment was that of a test engineer. Another engineer had rigged together a testbed driven by an IBM PC, and I wrote a bunch of test scripts. I also did a lot of unstructured testing: manual testing, often following hunches, to come up with sequences of events that surfaced new issues, which might then become a new test script. I actually enjoy testing, for several reasons: it provides exposure to the system as a whole; writing test software is still writing software, albeit a different kind of software; and I really like finding and fixing bugs before they get out to customers.
I found numerous issues, which I diligently documented and fed back to the software team at Seiko in Japan. Some of them were quite serious: there were multiple ways to crash the watch, whether by pressing buttons and executing certain command sequences, or by receiving certain time-bomb packets that would crash the software when it processed them. Apple Watch users are probably used to having their watch crash on occasion, but at the time it was unheard of for a digital watch to crash. It was first and foremost a watch: every time you glanced at it, you expected to see the time of day, not a boot sequence. You expected to see the seconds value increment every second on the second, and to never see a delay in incrementing the value, or to see the display freeze for a while and then skip ahead in time. That was the quality standard we had to achieve, and the watch software was nowhere close.
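To make that timekeeping standard concrete, here is a minimal sketch of how it could be written as an automated check. It is not the actual testbed code: the capture format (one reading of the displayed seconds-of-day per real elapsed second) and the function name `find_display_glitches` are assumptions made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def find_display_glitches(samples):
    """Return (index, previous, current) for every reading that breaks the rule.

    `samples` is a hypothetical capture: the seconds-of-day value shown on the
    display, read once per real elapsed second. A healthy watch advances by
    exactly one second per reading, wrapping to 0 at midnight.
    """
    glitches = []
    for i in range(1, len(samples)):
        expected = (samples[i - 1] + 1) % SECONDS_PER_DAY
        if samples[i] != expected:
            glitches.append((i, samples[i - 1], samples[i]))
    return glitches

# A healthy run across midnight produces no findings.
print(find_display_glitches([86398, 86399, 0, 1, 2]))  # []

# A display that freezes at 11 and then skips ahead is flagged three times:
# twice for the freeze, once for the jump.
print(find_display_glitches([10, 11, 11, 11, 14, 15]))
# [(2, 11, 11), (3, 11, 11), (4, 11, 14)]
```

A crash and reboot would show up the same way, as a reading that does not follow its predecessor by exactly one second.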
The Seiko team worked many long hours trying to fix the problems I was reporting. However, every time they gave me a new release of code, which might fix just a few of the most serious issues, other features would break or there would be other ways to surface the same problems. The total issue list was not going down, and we were not converging on a release candidate. I’m a firm believer that you cannot test in quality: quality has to be designed in from the start. This project proved that once and for all for me.
My boss saw these struggles and got his hands on the source code, which he shared with me and others. Even though we had to learn a new microcontroller architecture, instruction set, and assembly language, it was clear that the software was an unruly, unstructured mess. To be fair to the original developers, there was a huge constraint they were dealing with: the processor had very little memory for what it had to do, and so all sorts of tricks were needed to conserve memory. Seemingly any time the same three assembly language instructions had to be used in two different places, they were turned into a subroutine, perhaps with a flag or two to modify the behavior slightly. There were global variables galore. The thread of execution jumped all around the entire memory space. It was the ultimate in spaghetti code. It looked like code I might have written before I had taken my first Computer Science class and learned about structured programming, functions, modules, and the importance of variable scoping. I half-seriously considered moving to Japan to teach the Japanese the fundamentals of modern software development.
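For readers who haven’t lived through this style of code, here is a small sketch of why it is so hard to debug. It is written in Python rather than the watch’s assembly language, and every name in it (`blink_mode`, `update_digits`, `show_message_count`, and so on) is invented for illustration; none of it comes from the Seiko source. The first half shows a shared routine steered by a global flag; the second half shows the same behavior with the flag passed explicitly.

```python
# --- Coupled style: one shared routine, steered by a global flag ----------
blink_mode = False          # hypothetical global flag
display_digits = [0, 0]     # hypothetical global display buffer

def update_digits(value):
    """One routine shared by every caller to save memory."""
    global display_digits
    if blink_mode:
        display_digits = [None, None]   # blank the digits for a blink phase
    else:
        display_digits = [value // 10, value % 10]

def show_message_count(count):
    """Pager feature: blink the unread-message count."""
    global blink_mode
    blink_mode = True
    update_digits(count)
    # The flag is never cleared, so the next caller inherits it.

def show_seconds(seconds):
    """Timekeeping feature: expects plain, non-blinking digits."""
    update_digits(seconds)

# --- Structured style: behavior passed explicitly, no shared state --------
def render_digits(value, blink=False):
    """Return the two display digits without touching any global."""
    if blink:
        return [None, None]
    return [value // 10, value % 10]

show_seconds(42)
print(display_digits)        # [4, 2]
show_message_count(3)
show_seconds(43)
print(display_digits)        # [None, None]: the clock broke after a pager call
print(render_digits(43))     # [4, 3] regardless of what ran before
```

In the coupled version, a change made for the pager feature silently breaks timekeeping, and the only way to find out is to run the right sequence of events. The structured version avoids that, though on a processor this small the extra parameter passing costs memory, which is exactly the constraint the original developers were fighting.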
The code’s structure (or lack thereof) explained why the testing was not converging on a release candidate: the code was too interrelated to debug. My boss, completely on AT&E’s dime and without the prior approval or even knowledge of Seiko, launched a project to write, from scratch, all the software for the watch. I continued testing while he and a small team toiled away (I did get to write the code that interfaced to the packet deserializer chip I designed). Even though the AT&E team started while the Seiko team’s code was in test, and even though no one at AT&E had ever used the Seiko microcontroller before, within a few months AT&E had software which could actually pass all my test cases and which appeared to be rock solid. I couldn’t make it crash, and I was still able to crash the Seiko code.
By this time the public rollout of the watch was months behind schedule, due in part to the watch software quality issues. When upper management at AT&E saw the test results of the AT&E software, they were thrilled: we finally had a release candidate that could be burned into the ROM of what they hoped would be tens or hundreds of thousands of watches (there was no way to do a field software upgrade). All that needed to be done was to inform Seiko of the existence of the AT&E software and show them the test results. The best, most rational way forward would then be to kill the Seiko software and go with the AT&E software. My boss flew to Japan with the top brass from AT&E to make this happen.
The most rational way forward, however, had not taken into account people’s feelings and the cultural imperative to save face. Saving face means providing a graceful way for someone to reach an end, rather than it being obvious to all that the person has flaws which led to a defeat. Saving face matters in many cultures, and it can also be expressed as politeness and civility. It is important in Japan and elsewhere in Asia, and it was particularly important here. Even the existence of the AT&E software project was insulting to the Seiko team, because it represented a lack of faith that the Seiko team would deliver. The test results which showed the AT&E software to be better were career-threatening to the Seiko team. Needless to say, the Seiko folks did not share the enthusiasm for the AT&E software felt by upper management at AT&E.
A face-saving solution was eventually developed and agreed to by both sides: a new software project would be launched to produce a “best of both worlds” solution that would take the best code from each team’s software so that each team contributed to the final software. What this meant in practice was that the timekeeping math routines from the Seiko software (one of the more isolated portions of the Seiko code) were ported into the AT&E software, even though there was nothing wrong with the timekeeping in the AT&E software. Comments and copyright notices were rewritten to provide equal billing to both companies and their developers. It took more time to develop this release, which further delayed the public launch of the watch, but it was a necessary step to preserve the companies’ partnership and to save face. People’s feelings matter, often more than purely technical concerns.