The problem with rich, deep interfaces

The problem with rich, deep interfaces is that when you make a mistake, it can be a really big mistake. And you don't know what you've just done.

Definitions

Before we get any deeper into the topic, I'd like to define a few terms I have been and will be using. These are my own inventions, because I'm completely self-taught in the field of interaction design. Corrections are welcome, of course.

Rich interface
A rich interface has a large number of interaction handles in whatever interaction modes it supports. For example, a program may have a large number of keyboard shortcuts, like the document editing pane in Textpad, where [CTRL]+[1] may be set up to compile the current document as a Java program, and [CTRL]+[SHIFT]+[U] capitalizes the first letter of each word in the current selection. Rich touch screen interfaces might allow multiple-touch commands, pressure-variant commands, or timing-variant commands. Right-click menus (context menus, for Mac users) add a menu to nearly every point on the screen, differing in content depending upon the context. The classic rich interface is the airplane cockpit, with a huge number of controls.
Deep interface
A deep interface allows the user to modify data, programs, or their environment at a deep level, by bypassing what would normally be a longer sequence of commands in another program. For example, in a file manager program, the context menu for a group of selected files might include an "Archive these files..." option, allowing a (nearly) one-click operation that saves those files in a zip file. Without that command, the user would have to 1) open an archive program, 2) select a set of files to include, and 3) save the archive file to some location. Deep interfaces are not minimal; that is, there is redundancy present in the command set. The classic deep interface is the right-click menu, allowing common complex actions to be executed easily.

The myth of the skilled user

A skilled user appears to handle a rich, deep interface as if he or she is conducting an orchestra, right? Except we know that a conductor is not actually controlling each person or instrument, but is instead serving as a focal point or synchronizer and has more control over the feel of the music, not the actual notes played. Can you imagine an interface that would allow a conductor to control an entire orchestra of instruments? It would likely fail to be sufficient for even the most proficient user.

Let me switch to another interface for a moment. Have you ever used the Mouse Gestures extension for Firefox? The left mouse button does most of the work: clicking, holding, dragging, and double-clicking. Why does the right button only work with clicking? Mouse Gestures exploits the otherwise wasted potential of right-button dragging. Pre-defined combinations of drag directions trigger actions within the browser, and users can define their own as well. It can save the user a lot of menu navigation. Unfortunately, when you accidentally drag with the right mouse button, BAM, everything changes. What's worse, you're not sure what command you have just issued to the browser. You could have turned off JavaScript, which may result in hard-to-diagnose problems in later browsing. You could have changed the browser's User Agent string or cache settings, or closed or opened a tab. Even the geek who plays a computer like an orchestra makes mistakes sometimes. Miskeys and misclicks plague the best of us.

Flailing

Typos and accidental clicks occur on a regular basis, even though keyboards and mice present a fairly discrete interface. You've either clicked or you haven't. But what happens with more "organic" interfaces, like gesture interpretation, a la Minority Report? Imagine entering the wrong command by accident, then waving your hands as files disappear and you begin to panic. The flailing-about is interpreted as another set of commands, and more unexpected events occur. You're screwed.

We've all seen this happen to others with the keyboard + mouse interface, and it has happened to most of us as well. You click on a button, nothing happens, you click again, the computer responds to both, you try to quickly cancel but accidentally hit the wrong button, something else starts happening... Flailing is a very real problem. (At least one Macintosh program will detect random keypresses and literally ask if your cat is walking on the keyboard, thereby blocking potential accidental commands from executing.)
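
To make the idea of catching a flail concrete, here is a minimal sketch in Python. It is not how that Macintosh program (or any real program) works; it simply assumes that a burst of input events arriving faster than a deliberate user would plausibly produce them is worth a second look. The class name FlailDetector and the max_events and window_seconds thresholds are my own inventions for illustration.

```python
from collections import deque


class FlailDetector:
    """Flags bursts of input events inside a short sliding time window.

    Hypothetical helper: the default thresholds are guesses, not measured values.
    """

    def __init__(self, max_events=6, window_seconds=1.0):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Record one input event (time in seconds).

        Returns True when more than max_events events fall inside the window,
        which this sketch treats as possible flailing.
        """
        self._timestamps.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_events


detector = FlailDetector(max_events=3, window_seconds=1.0)
burst = [detector.record(t) for t in (0.0, 0.1, 0.2, 0.3, 0.4)]
calm = detector.record(5.0)
```

When record() returns True, an interface built this way could hold the queued commands and ask the user to confirm before running them. Timestamps are passed in explicitly so the logic can be exercised without a real event loop.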

Technology imitating nature

Interfaces that work with existing gestures and actions tend to fare the best. The drag-and-drop action is highly intuitive, allowing users to discover other uses that the programmers have provided for. (Discovery is the process of acting outside the known rules of the interface and being pleasantly surprised; this includes acting on a hunch and being right about it. Discovery's evil twin is the typo or mis-click.) For example, I used to send files over instant messenger by right-clicking on a contact, selecting "Send File...", finding the file, and hitting OK. One day I decided, on a whim, to drag a file onto a buddy on my list. It immediately started a file transfer. This kind of feature discovery drives the best interface interactions. Users then have no use for a manual or help file; all the information they need is accessible through discovery.

A panicking user provides an interesting problem, though. They may make many unintended command inputs. An ideal interface uses a known "panic action" as the "stop" command. An article in Discover on a by-wire hydrogen car illustrates this wonderfully. In the Hy-Wire's joystick-like control system, the driver squeezes to stop. This perfectly matches the natural panic response as the driver sees an obstacle ahead. Touch screen programmers have a great deal of difficult work to do when interpreting chorded (multiple-touch) gestures -- the controls are often built in response to pre-existing gesture patterns, instead of forcing the human to conform to the machine.

Unfortunately, there isn't always an intuitive action command for the user to stumble upon, and there may simply be too many commands for an efficient action-command mapping to be built. Additionally, similar actions in different programs will have different effects. (I use two popular graphics editing programs, Adobe Photoshop and The GIMP, and I am constantly confusing the letter-to-tool associations. Even Undo works slightly differently.) The inevitable result is that even the most proficient users will make mistakes, sometimes because of their very proficiency -- interacting with multiple systems which vary with respect to intuitiveness will always cause some gear-shifting problems.

Insufficient feedback and recovery

The worst that can happen with a rich interface is a lack of feedback and recovery mechanisms. In other words: "What just happened?" and "How do I undo that?" In general, actions either affect the data or the interface, and many users do not understand the difference. Upon accidentally entering full-screen mode, for instance, a user's first response might be to hit [CTRL]+[Z] (undo), not realizing that the interface has changed, not the data. Most programs provide data recovery in the form of "undo", whereby the data may be reverted to a recent version, but none that I know of provide recovery for the interface. Even so, Macintosh computers provide excellent feedback in the form of animations, showing exactly what is changing in the interface.
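
As a rough sketch of what recovery for the interface could look like, the Python below keeps data edits and interface changes on the same undo stack, and returns a short description of each command so the program has something to show as feedback. This is an assumption about one possible design, not a description of any existing program; UndoHistory, set_value, and the state dictionary are all hypothetical names.

```python
class UndoHistory:
    """One undo stack shared by data edits and interface changes (hypothetical)."""

    def __init__(self):
        self._undo_stack = []

    def perform(self, description, do, undo):
        """Run a command, remember how to reverse it, and return feedback text."""
        do()
        self._undo_stack.append((description, undo))
        return description

    def undo(self):
        """Reverse the most recent command; return its description, or None."""
        if not self._undo_stack:
            return None
        description, undo = self._undo_stack.pop()
        undo()
        return description


def set_value(history, state, key, new_value):
    """Change one entry of state through the history so it can be undone."""
    old_value = state[key]

    def do():
        state[key] = new_value

    def undo():
        state[key] = old_value

    return history.perform(f"set {key} to {new_value!r}", do, undo)


# "text" stands in for document data, "fullscreen" for interface state.
state = {"text": "draft", "fullscreen": False}
history = UndoHistory()
set_value(history, state, "text", "draft v2")
set_value(history, state, "fullscreen", True)  # the accidental keypress
undone = history.undo()  # leaves full-screen mode, keeps the text edit
```

With this arrangement, the user who lands in full-screen mode by accident and reaches for undo gets the result they expected, and the returned description answers "What just happened?"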

Flailing and lack of feedback and recovery form a vicious cycle.

A solution?

And that's the problem with rich, deep interfaces, at least insofar as they are implemented today. They provide a great deal of power with few safeguards, erring on the side of speed rather than caution. Each easily-accessible feature opens a world of both possibility and frustration. However, there is a way out:

  • Keep the interface intuitive by modeling it after existing user behaviors.
  • Provide feedback to the user when they have executed a command, allowing them to respond appropriately.
  • Provide recovery mechanisms so the user may undo both data and interface alterations.
  • Detect flailing and ask the user to verify that they indeed know what they are doing.
  • Respect conventions set by other interfaces, enabling the user to apply knowledge outside of your interface.

These tips can be applied to any sort of interface, of course, but they are especially important with the rich, deep varieties.


Responses: 1 so far

  1. Cory Capron says:

    Every time I go to read this something comes up before I finish it and I end up starting over. Interesting read.

    We are totally beyond due for an epic. Not sure when a good walkabout would fit, but I'm game for one of them too. Kinda sleepy at the moment, but you know my number.
