Following its release in 2004, Halo 2 instantly became the most popular multiplayer game on Xbox Live. It held that position for almost two years, and you can make a decent argument that the primary reason Xbox Live survived its infancy was the massive popularity of this single title. During the game’s six-year lifetime, more than 6.6 million players played over 499 million hours of Halo 2 online multiplayer. The development team at Bungie took a bold risk in building a new type of online experience, and it was a massive success and made millions of people happy.
Which is why I’m glad I didn’t succeed in killing it in the lab.
During development, Halo 2 was codenamed “Prophets” after the new race of aliens being added to the Halo universe. At the time, most researchers at Microsoft supported three to five titles each, but because this was a major tentpole title for the original Xbox, two user experience researchers were assigned to help with the game full-time: Randy Pagulayan and me. Both of us were trained scientists with PhDs in experimental psychology and early members of Microsoft’s Games User Research team. Our job was to use qualitative and quantitative techniques like usability studies, playtests, and surveys to give design teams insights into how their games would be received after they were released.
This is a story about a time when I failed to be a good prophet, where my attempts to project research data into the future led to a conflict between the research team at Microsoft and the design team at Bungie. Usually, public discussions about games user research focus on the times we were right, the times when data fixed game design. This story is one of the other times, when two otherwise competent researchers drew the wrong conclusions about an innovative piece of game design and made bad recommendations, and how the game succeeded in spite of that.
Prior to Halo 2, most online games didn’t have matchmaking. Instead, the default solution to finding other people to play with online was to use lobbies. Players would select a lobby from a list, reading short descriptions to decide which one was right for them. If the lobby turned out to be occupied by jerks or more talented players, you could back out and choose a new lobby to suit your tastes.
The great advantage of these lobby systems was control. The lobby creator had the ability to set up a highly curated experience, allowing just the maps and game modes that they liked, kicking out players who didn’t play their way. It was routine to see lobbies that proudly announced “no snipers” or “<specific map name> free-for-all only.”
In contrast, the proposed Halo 2 system took almost all choices away, replacing them with a system where players only got to choose the general type of match (e.g. Free for all, Big Team Battle, etc.) and then Bungie would choose the map, gametype, and opponents. The image below is a near-final pre-release screenshot of the “Optimatch hopper.” (The terminology we used proved not to have the same staying power as the system itself.)
Here’s how GameSpy described the Halo 2 system in an article published before the game was released:
“… In an interesting twist, the gametypes, maps, vehicles, and just about everything else are set by Bungie.
“While this might sound weird at first, it’s a good idea for a number of reasons. By guaranteeing that everyone is optimized for the same type of game, Bungie can ensure that all of the games will run smoothly. They can also be positive that all the rankings will be consistent, since nearly everyone will be playing on the same maps with roughly the same number of players. At any time, they can just push some updates to Xbox Live, and everyone will be playing new games. The ranking system is set up by match type, so you might be #25 in the Assault mode, but only #78 in Slayer …”
This description sounds incredibly mundane and obvious now, but that’s because this system succeeded so well that it became the new standard for multiplayer games going forward. Halo 2 won so completely that it’s hard to imagine how online play worked before.
Again, I’m really glad I wasn’t able to kill it.
Our task as researchers was to make sure players would understand the new paradigm. Since it was so different from what our players were used to and from what had been done in the first Halo, we wanted to put the new design in front of real players as early as possible, starting with paper prototypes and written descriptions. Players were shown descriptions and wireframe interfaces for several different options of how they could play multiplayer games, including the new matchmaking system and private games, but not including traditional user-created lobbies.
The overwhelming reaction we got from our participants was, “We understand but we hate it.” Almost unanimously, the players we talked to told us they wanted the level of personal control a lobby system gave them and didn’t think the benefits of the new matchmaking system were worth what they were giving up. It’s hard to imagine now, but the “push one button and trust us” approach came across as creepy and controlling to players who were used to choosing for themselves.
Seeing ourselves as righteous champions of the users, Randy and I went to Bungie and told them that players hated the new design and that we should consider other ways of doing multiplayer. The designers stuck to their guns, insisting that their vision of the future was better than the status quo, and history has proven them absolutely right. Players loved the new system and it became the gold standard for online gameplay. The Halo 2 matchmaking study has been the single biggest “miss” of my career. (Well, my career so far.)
What happened? How did our study produce results so at odds with what actually happened after release? The answer is two intertwined mistakes, one made by the participants and one made by the researchers.
The participants’ mistake was one of “affective forecasting,” guessing how they’d feel in a hypothetical situation. There’s an amazing amount of literature about how bad humans are at estimating how hypothetical situations will affect them emotionally. Even big life-changing events such as becoming paraplegic or winning the lottery are difficult to judge in the abstract.
If you’d asked the research team at the time, we would have said, “Of course humans are bad at affective forecasting, but that’s not what we’re doing.” Our study originally began with the question “Will players understand this system?” That was a legitimate research question, and we were able to answer it with a solid “yes.” But when participants also expressed opinions about the system, we treated those opinions as truth rather than as guesses.
The crux of the problem was that our participants had never experienced an online shooter with real matchmaking. Again, this seems ridiculous now, because matchmaking is a standard feature in every online multiplayer game. But at the time, most of our participants had only played multiplayer on their local network or at best on a dorm network. Less than 16% of Americans had broadband when we ran this study in 2003. We were effectively asking our participants to make a judgment between a known experience (current lobbies) and an unknown experience (fair and accurate online matchmaking in a large online population). For the current system, they understood both the costs and benefits. For the proposed system of matchmaking, they could only really understand what they were giving up. This made the proposed system seem like a much worse change than it actually was.
So when our participants told us that they would not enjoy the system, we as researchers then made our own mistake and conveyed those comments as accurately representing how most players would feel about the system after they’d actually played it. And after some heated arguments and back and forth, Bungie chose to push on ahead with their novel matchmaking system over our objections, which turned out to be exactly the right call.
Of course, when the Bungie designers overruled the research team, they weren’t pointing out our methodological flaws or making an argument against affective forecasting. They had a uniquely clear design vision which had been built on solid principles and then hotly debated within the studio, producing a battle-hardened belief on the part of key engineering and design leaders that this was the way to produce a great multiplayer experience. One of them privately told me afterwards that no possible set of results from this study would have convinced them to change course. Almost always, when I’ve presented research findings to teams and been overruled like this, I get to say “I told you so” in some highly professional way after the game ships. But on Halo 2, the other side of the argument turned out to be correct and the gaming world is better for it.
What we as researchers should have done is be more discriminating in how we presented the results. Our data was true when looked at from a specific angle: “Here’s what some players will say when they hear about the system for the first time.” From there, we could have worked with the team to test different ways of presenting the system to improve that first impression and increase the speed at which players realized the true value of the system. The data wasn’t fundamentally bad; our mistake was taking it at face value.
I’ve thought quite a bit about this incident since, and here are a few of the lessons I’ve taken away. Hopefully, sharing this story will allow others to skip past these particular mistakes and make more interesting new mistakes of their own.
Lesson 1: Sometimes researchers should lose the argument
UX researchers tend to get into the habit of thinking that we are discovering capital-T Truth. This can lead to a lot of frustration when other parties in the development process don’t accept our findings. Now, we’re usually right, but false positives, false negatives, and outright mistakes are always possible.
Games user research is a vital voice in the development process even though we’re no more perfect than anyone else involved. We’re supposed to advocate passionately for our understanding of the player experience, but we’re not always meant to win. In fact, I’d argue that, just like for the players in our games, there is an ideal level of failure for researchers. If every study is equally successful, it just means that we aren’t innovating enough or taking on challenging research topics. We need to take risks, and that means we have to lose sometimes.
Lesson 2: Being wrong isn’t the end of the research relationship
The Halo 2 user research effort was an intense experience. Microsoft made a heavy bet on this title, dedicating a level of research bandwidth that would have otherwise supported half a dozen games. Bungie, a studio that was notoriously selective about its partners, took a leap of faith in allowing us unprecedented access to its development process during one of the fiercest crunches in its history. There was enormous pressure on Randy and me to deliver value, to turn user feedback into design impact on a game that was important to so very many people. Faced with a clear message from our participants, terrified that the design was going to negatively affect the experiences of millions of players, we made the decision that this was a fight we needed to have. We were wrong and we lost.
This story happened in 2003, in the middle of the Halo 2 development cycle, and we went on to do quite a number of other successful studies on the game. After it shipped, Randy and I dove into supporting the sequel, Halo 3, which became one of the most successful games user research efforts ever. The same two researchers, the same designers, the same franchise, and our work ended up on the cover of Wired magazine and was a major milestone in the adoption of user research in the games industry. In fact, this study and its failure directly contributed to those subsequent successes. One of the conclusions that Bungie leadership drew was that research needed to be more closely integrated with the design team to prevent this kind of thing from happening again, and that integration was key to our Halo 3 effort. I even went on to be hired directly by Bungie to create and lead their own internal research team a few years later.
Researchers aren’t perfect, but our partners don’t actually need us to be perfect. They need us to honestly represent the player voice to the best of our ability, to push ourselves to innovate and take risks, and to admit and adapt when we’re wrong.
Lesson 3: Research and design operate on the same playing field by different rules
Players can only speak from their own experience, either in their past play or what’s immediately in front of them in the lab. Since a researcher’s job boils down to amplifying the player voice, we share the same limitation. Our prophecies are only as true as what the players are reacting to.
Designers don’t share that limitation. They can come up with ideas that bear little or no relation to what’s come before, which can make those ideas difficult to test early enough to do any good. There are ways to evaluate novel ideas, but as researchers we need to recognize that those ways are much riskier than our other tools and temper our conclusions accordingly.
Ironically, a good counterexample of presenting a novel experience in an understandable way was demonstrated by the Bungie team during Halo 2’s development. The matchmaking system discussed here was only one part of a larger set of new multiplayer features introduced in Halo 2, and many of the other features encountered similar resistance. In order to convince skeptical Microsoft execs, the design team created a video simulating what playing with friends would be like in the final product. While this particular solution wouldn’t have worked for matchmaking, it’s an example of the extra level of effort and creativity it takes to convey novel experiences.
Lesson 4: Making methodological mistakes and facing their consequences is the best way to understand research design
I certainly knew about the problems with affective forecasting before this study. But I still let myself be drawn in by the participants’ strong opinions of the new system and presented their forecasts as facts. Having had that happen once and experienced the humiliating consequences, I’ve been a lot less prone to making that particular error ever since.
It’s all well and good to memorize the principles of good research design, but you will never feel them in your bones until you violate those principles and experience the results first hand. It doesn’t matter if you broke the rules intentionally or if there were extenuating circumstances. Having to throw out days or weeks of hard work due to methodological problems leaves useful scars. Coloring outside the lines can be the best way to learn why the lines were needed in the first place.
But the other thing you discover is that sometimes … you get away with it. Some rules of good research turn out to have the force of natural law, while others are merely guidelines. Games user research is an applied field, done at breakneck speed with limited resources under messy conditions. Not every study is going to be a perfect jewel of experimental design. Stakeholders will push for changes to the study plan, participants will fail to show up, equipment will break, and the job of a researcher becomes about choosing the least damaging way to adjust to circumstances. Understanding which principles have flex to them and which are inviolable makes us better researchers, able to adapt and deliver the greatest value to our teams and our games.
Innovative design means taking risks, and in this case the risks taken by the Bungie design team paid off in spectacular fashion. The larger role of user research in the development process is about helping to offset those design risks, enabling our designers to try new things while detecting and fixing potential problems before they frustrate real players. But inside of our profession, we’re also taking our own little risks, making judgment calls about study designs and which issues are worth fighting for. The particular choices we made in this case didn’t pay off, but that doesn’t change the fact that research risks are necessary. Good researchers will always have to use their experience and their gut to take just the right level of risk so as to be the best possible partners to our designers and produce the best games for our players.
Special thanks to Randy Pagulayan for helping with this article and for being brave enough to share the public shame with me. And thanks to Chris Butcher, David Candland, Curtis Creamer, Max Hoberman, and Jason Jones for helping to refresh my memory about this story and for suggestions on the best way to tell it.
John Hopson is a 16-year veteran of the games industry, having been the lead researcher for games and series such as Halo, Age of Empires, Destiny, World of Warcraft, Overwatch, and Hearthstone. He has a Ph.D. in experimental psychology from Duke University and is the author of a number of articles on the intersection of psychology and games, including the infamous “Behavioral Game Design.” John is currently the head of analytics for ArenaNet. The views expressed here do not represent those of ArenaNet or its employees.