Amazon released its list of 20 finalist sites to the public, inching ever closer to picking a home for its second headquarters.
Then came the question from on high. On high being Senior Vice President of News Bill Church. He wanted to know how GateHouse Media could tell the Amazon HQ2 story from a new angle and how we could make it fun.
GateHouse had a vested interest in the story since we own several publications in finalist cities or their surrounding areas or states.
For the answer to that question, Church turned to Data Projects Editor Emily Le Coz and Director of Innovation Tony Elkins.
Le Coz and Elkins gathered their team: data reporter Lucille Sherman, de//space (GateHouse's innovation lab) project designers Mara Corbett and Tyson Bird, and developer Dak Le.
It started with a couple of blue-sky comments as we talked about all the angles other media outlets already covered — the positives, the negatives, the expert analysis — and evolved into the most technically advanced project de//space has taken on to date (gatehousenews.com/amazon).
We knew we didn't want to tell the same story as everyone else. We wanted to base our story strictly off the data, and more importantly, we wanted the reader to be in charge. It had to be a game.
We designed a few intro screens to begin the storytelling process. We had a look, and the game was beginning to take shape, but we needed a powerful algorithm to drive it.
It would have to score the cities based on the users' choices. And because we wanted the users to drive the story, their choices — no matter how misguided — had to result in a winning site generated from real data.
At the same time, we had to figure out how the website would automatically score the users' results and compare them to other cities. We also had to account for the fact that every driver and question is different — some numeric, some subjective.
Le Coz, Sherman and Le spent a lot of time thinking and came up with a solution: Ask the user to rank the drivers at the start, then weight the points by that ranking. That way, the user's belief about which site should win has some bearing, but it's always measured against the hard data Le Coz and Sherman collected.
This was the right option for the scoring function. However, it added a great deal of complexity because there isn't one way to start the game — there are seven. The code had to be rewritten to be more dynamic, because there wasn't a straightforward path from start to finish. The entire site depends on your selection in the driver ranking.
After the first beta test, we discovered another issue: Some cities (New York) rank really high. New York has a large, diverse population that takes transit, is environmentally conscious, is educated, etcetera. Regardless of the driver rank, New York scores high.
We had to re-evaluate the algorithm. In the end, we gave the subjective questions a weight of their own, so if you think a smaller city should have HQ2, the scoring function assigns more points to the smaller sites. This gives the non-New York sites a chance when they get beaten out in categories like connectivity or diversity.
In the end, the code behind the scoring function works like this:
The user's ranking of the drivers (1-7) is written into an array in the order they're selected. If you rank sustainability as #1, the first item in the array is sustainability. This array organizes the presentation of the questions; in this example, it means you'll see sustainability first. It also means sustainability gets the highest multiplier (weight) in the scoring function.
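As a rough sketch of that step (the driver names and function name here are illustrative assumptions, not the production code), the ranking-to-weight logic might look like:

```typescript
// Illustrative driver names; the real game uses the seven drivers
// the user ranks on the intro screens, in the order they pick them.
const driverOrder: string[] = [
  "sustainability", "connectivity", "diversity", "education",
  "transit", "cost", "workforce",
];

// The first-ranked driver gets the highest multiplier (x7);
// the last-ranked driver gets x1.
function driverWeight(driver: string): number {
  return driverOrder.length - driverOrder.indexOf(driver);
}
```

The same array then drives both the question order and the scoring weights, so a single user choice controls the whole experience.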
On the backend, there are 21 arrays, one per question, with all 20 sites ranked inside each. Le Coz and Sherman did the heavy lifting of building those rankings, which let the site that "best matches" each answer receive the most points. For example, on the cell service question, New York and Newark receive the maximum points because both have nearly 95% cell coverage.
Within each question, the answers are assigned a numeric value from 5 to 1. In most cases, a "5" answer is something like "matters a lot" and a "1" is "doesn't matter." When the right advance arrow is clicked, the value of the answer (5 through 1) is multiplied by the factor specified by the driver (first driver x7, second x6, etc.), and the result is awarded to the sites in the ranked array described above. The point value is reduced by .25 for each successive site, so the site that best matches receives the maximum points and the site that least matches receives the least.
There are two exceptions to the above. First, some sites have the exact same data value. This is often the case with Washington, D.C., and Northern Virginia, because they're both in the same metro area. In that case, the .25 reduction is skipped: the algorithm assigns a point value to D.C., assigns that same point value to Northern Virginia, then resumes reducing by .25 when moving on to the next site.
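A minimal sketch of that per-question scoring, including the tie exception, might look like the following (the function and field names are assumptions for illustration; the production code differs):

```typescript
// A site as it appears in one question's ranked array:
// best match first, with the underlying data value kept
// so tied sites can be detected.
interface RankedSite {
  name: string;
  value: number;
}

// Award points for one question. `answerValue` is 5 ("matters a lot")
// down to 1 ("doesn't matter"); `weight` is x7 for the top-ranked
// driver down to x1. Points step down by .25 per site, except that
// sites with identical data values share the same points.
function scoreQuestion(
  rankedSites: RankedSite[],
  answerValue: number,
  weight: number,
  totals: Map<string, number>,
): void {
  let points = answerValue * weight;
  for (let i = 0; i < rankedSites.length; i++) {
    // Only reduce when the data value actually changes, so ties
    // (e.g. D.C. and Northern Virginia) receive equal points.
    if (i > 0 && rankedSites[i].value !== rankedSites[i - 1].value) {
      points -= 0.25;
    }
    const prev = totals.get(rankedSites[i].name) ?? 0;
    totals.set(rankedSites[i].name, prev + points);
  }
}
```

Running this once per question accumulates each site's total in the shared map, ready for the final ranking.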
The second exception is on "subjective" questions, like site size. These points are assigned in a "ripple" out from the user's selection. For example, if the user selects "medium metro," the medium cities receive 5 (multiplied by the special subjective factor) and the largest and smallest cities receive less than 1.
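One way to sketch that ripple (the constants and falloff here are illustrative assumptions; the article only tells us the chosen category scores 5 and the far ends score less than 1 before weighting):

```typescript
// Hypothetical subjective-question multiplier; the production
// constant is the "weight of their own" described earlier.
const SUBJECTIVE_WEIGHT = 4;

// Score a site by how far its size category sits from the one the
// user chose: full credit at distance 0, fading as the ripple spreads.
function rippleScore(siteIndex: number, chosenIndex: number): number {
  const distance = Math.abs(siteIndex - chosenIndex);
  return Math.max(5 - 2 * distance, 0.5) * SUBJECTIVE_WEIGHT;
}
```

The exact falloff curve matters less than the shape: the chosen category wins the question, its neighbors stay competitive, and the extremes still earn a little something.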
On the last set of questions, clicking the right advance arrow fires a chain of functions that totals the sites' scores from the previous questions, ranks the sites by point value, chooses the top three and calls a Google Sheet to pull in the data points that match the winning site. That's why you sometimes see a half-second "glitch" where the screen displays the site but not the state or data points: we're grabbing a lot of data from that Google Sheet. The site also has a PHP function that writes the user's winner to a SQL table so we can return the number of times each site has won. Every site has won at least once.
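The totaling-and-ranking step of that chain can be sketched in a few lines (the function name is an assumption; the data fetch and SQL write are omitted):

```typescript
// Given each site's accumulated score, sort descending by points
// and return the names of the top three finishers.
function topThree(totals: Map<string, number>): string[] {
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, 3)
    .map(([name]) => name);
}
```

The real site then uses the winner's name to look up its matching data points in the Google Sheet and to log the win via PHP.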
The story was cleverly written with spaces where each winner's unique numbers could drop in without disrupting the copy. It reads the same for everyone — the only difference is the numbers.
We needed the writing and design and UX to engage readers through all 21 questions, or it wouldn't matter how complex the algorithm was. And that meant we couldn't be timid. Or expected. Or cluttered.
The final site reflects a strong sense of personality and style. We even spent probably more time than we should have designing share cards for each winning site to share on multiple social platforms.
In the end, we met our goal. We told the story everyone else was telling, but in a unique way. And more importantly, we put the reader in charge of the data: they got to be Jeff Bezos, without, unfortunately, the paycheck that comes along with it.
TONY ELKINS is the Director of Innovation for GateHouse Media in Austin, Texas, and runs de//space, the company's innovation lab. He is the former Assistant Managing Editor of the Sarasota Herald-Tribune and contributed to teams that won two Pulitzer Prizes and were finalists for two others.
MARA CORBETT is a projects designer with a focus on user experience, project management and typography. She graduated from Syracuse University with a degree in graphic design after spending four years involved with student media, including one year as Editor in Chief of The Daily Orange, the award-winning independent student newspaper of Syracuse, New York.
TYSON BIRD is a projects designer with a focus on development, video and data visualization. Originally from Sandpoint, Idaho, he studied journalism graphics and entrepreneurial management at Ball State University. He now blurs the line between design and development, bringing news design and video to life online and learning new skills as needed for projects.