Category Archives: Motivation

The First Sentence of the Great Analytics Novel

Thedarktower7 I’ve written many times before about the importance of promoting O.R. to the general public. One of the ideas that’s been suggested by several people is the possibility of writing a work of fiction whose main character (our hero) is an O.R./Analytics person. I still believe this is a great idea, if executed properly.

Today, my wife brought to my attention The Bulwer-Lytton Fiction Contest, which, according to their web page, consists of the following:

Since 1982 the English Department at San Jose State University has sponsored the Bulwer-Lytton Fiction Contest, a whimsical literary competition that challenges entrants to compose the opening sentence to the worst of all possible novels. The contest (hereafter referred to as the BLFC) was the brainchild (or Rosemary’s baby) of Professor Scott Rice, whose graduate school excavations unearthed the source of the line “It was a dark and stormy night.” Sentenced to write a seminar paper on a minor Victorian novelist, he chose the man with the funny hyphenated name, Edward George Bulwer-Lytton, who was best known for perpetrating The Last Days of PompeiiEugene AramRienziThe CaxtonsThe Coming Race, and – not least – Paul Clifford, whose famous opener has been plagiarized repeatedly by the cartoon beagle Snoopy. No less impressively, Lytton coined phrases that have become common parlance in our language: “the pen is mightier than the sword,” “the great unwashed,” and “the almighty dollar” (the latter from The Coming Race, now available from Broadview Press).

Just like an awful first sentence can be a good indicator of a terrible book, the converse can also be true. Take, for example, the first sentence of Stephen King’s The Dark Tower series, which I happen to be reading (and loving) as we speak:

The man in black fled across the desert, and the gunslinger followed.

It’s such a strong, mysterious, and captivating sentence…

…which brings me to the point of this post. If it’s going to be difficult to write The Great Analytics Novel, what if we start by thinking about what would be the perfect, most compelling sentence to start such a novel? Yes, I propose a contest. Let’s use our artistic abilities and suggest starting sentences. Feel free to add them as comments to this post. Who knows? Maybe someone will get inspired and start writing the novel.

Here’s mine:

Upon using the word “mathematical” he knew he had lost the battle for, despite the dramatic cost savings, their logical reasoning was instantly halted, like a snowshoe hare frozen in fear of its chief predator: the Canada lynx.

I can’t wait to read your submissions!

3 Comments

Filed under Analytics, Books, Challenge, INFORMS Public Information Committee, Motivation, Promoting OR

Optimally Resting NBA Players

To celebrate the start of the 2013-2014 NBA season this past Tuesday, I decided to write a post on basketball. More specifically, on the important issue of how to give players some much needed rest in an “optimal” way. My inspiration came from an article by Michael Wallace published on ESPN.com on October 19. Here are some relevant excerpts:

After playing in the Miami Heat’s first five preseason games, LeBron James sat out Saturday night’s 121-96 victory over the San Antonio Spurs to rest…James said the decision to sit was part of the team’s “maintenance” process. Heat teammate Dwyane Wade played Saturday and scored 25 points in 26 minutes, but previously skipped three preseason games…”No, no injuries — just not suiting up,” James said. “It’s OK for LeBron to take one off.”

The key term here is maintenance process. You may also recall that, back in November 2012, the Spurs were fined $250,000 by the league after coach Popovich sent Duncan, Parker, Ginobili, and Green home right before a game against the Miami Heat.

So we want to rest our players to keep them healthy, but this cannot come at the expense of losing games. There are many factors to be taken into account here, such as players’ current physical condition, strength and tightness of schedule, and match-ups (how well a team stacks up against another team), to name a few. This is definitely not an easy problem. However, some insight is better than no insight at all. Therefore, let’s see what we can do with a simple O.R. model, and then we can talk about the strengths and weaknesses of our initial approach. (Here’s where you, dear reader, are supposed to chime in!)

Let’s begin with two simple assumptions: (i) when it comes to resting, we have to take players’ individual needs into account, i.e., we’ll use player-specific data; and (ii) when it comes to the likelihood of beating an opposing team, it’s better to think in terms of full lineups, rather than in terms of individual players, i.e., we’ll use lineup-specific data. The data in assumption (i) comes from doctors, players’ medical records, and coaches’ strategies. In essence, it boils down to one number: how many minutes, at most, should each player play in each game, under ideal circumstances. A useful measure of the strength of a lineup is its adjusted plus-minus score (see, for example, the work of Wayne Winston and his book Mathletics). In summary, it’s a number that tells you how many points a given lineup plays above (or below) an average lineup in the league over 48 minutes (or over 100 possessions, or another metric of reference).

For the sake of explanation, I’ll pretend to be in charge of resting Miami Heat players (surprise!). I’ll refer to a generic lineup by the letter i (i=1,\ldots,8), to a generic player by the letter j (j= LeBron, D-Wade, …, Andersen (Bird Man)), and to a generic game by the letter k.

We’re now ready to begin. Fasten your seat belts!

What are the decisions to be made? Let’s consider a planning horizon that consists of the next 7 games (or pick your favorite number). So k=1,\ldots,7. For the Heat, the first 7 games of the 2013-2014 season are against the following teams: Bulls, 76ers, Nets, Wizards, Raptors, Clippers, and Celtics. For each one of my potential lineups i and each game k, I want to figure out the number of minutes I should use lineup i during game k. Because this is an unknown number right now, it’s a variable in the model. Let’s call it x_{ik}. Note it’s also OK to think of x_{ik} as a percentage, rather than minutes. I’ll adopt the latter interpretation.

What are the constraints in this problem? There are three main constraints to worry about: (a) make sure to pick enough lineups to play each game in its entirety; (b) make sure your lineups are good enough to hopefully beat your opponents in each game; (c) keep track of players’ minutes, and don’t let them get out of hand. The next step is to represent each constraint mathematically.

Constraint (a): Pick enough lineups to completely cover each game. For every game k, we want to impose the following constraint:

\displaystyle \sum_{i=1}^{10} x_{ik}=1

This means that if we sum the percentage of time each lineup is used during game k, we reach 100%.

Constraint (b): Choose your lineups so that you expect to score enough points in every game to beat your opponents. In this example, I’ll focus on plus-minus scores, but as a coach you could focus on any metric that matters to you. Given a lineup i, let p_i be its adjusted plus-minus score. For example, the lineup of LeBron, Wade, Bosh, Chalmers, and Allen in the 2012-2013 season had the amazing p_i score of +36.9 (you can obtain these numbers, and many other neat statistics, from the web site stats.nba.com). Now let’s say you have the plus-minus score of your opponent in game k, which we’ll call P_k. One way to increase your chances of victory is by requiring that the expected plus-minus score of your lineup combination in game k exceed P_k by a certain amount. Therefore, for every game k, we write the following constraint:

\displaystyle \sum_{i=1}^{10} p_i x_{ik} \geq P_k + 0.5

I want to emphasize two things. First, p_i can be any measure of goodness of your lineup, and it can take into account the specific opponent in game k. Likewise, P_k can be any measure of goodness of team k, as long as it’s consistent with p_i. Second, you’re not restricted to having only one of these constraints. If many measures of goodness matter to you, add them all in. For example, if you’re playing a team that’s particularly good at rebounding and you believe that rebounding is the key to beating them (e.g. Heat vs. Pacers), then either replace the constraint above with the analogous rebounding version, or include the rebounding version in addition to the constraint above. Finally, note that I picked 0.5 as a fixed amount by which to exceed P_k, but it could be any number you wish, of course. It can even be a number that varies depending on the opponent.

Constraint (c): Keep track of how many minutes your players are playing above and beyond what you’d like them to play. For any given player j and any given game k, let m_{jk} be j‘s ideal number of playing minutes in game k (make it zero if you want the player to sit out). When it’s not possible to match m_{jk} exactly, we need to know how many minutes player j played under or over m_{jk}. Let’s call these two unknown numbers (variables) u_{jk} and o_{jk}, respectively. So, for every player j and game k, we write the following constraint:

\displaystyle 48\left(\sum_{i \text{ that includes } j} x_{ik}\right) + u_{jk} - o_{jk}=m_{jk}

The expression “i that includes j” under the summation means that we’re summing variables x_{ik} for all lineups of which j is a member. We’re multiplying the summation by 48 minutes because x_{ik} is in percentage points and m_{jk} is in minutes.

What is our goal? (a.k.a. objective function) It’s simple: we don’t want players to play too many minutes above m_{jk}. Because this overage amount is captured by variable o_{jk}, we can write our goal as:

\displaystyle \text{minimize } \sum_{j=1}^{9} \sum_{k=1}^{7} o_{jk}

This minimizes the total overage in playing minutes. For a more balanced solution, it’s also possible to minimize the maximum overage over all players, or add weights in front of the o_{jk} variables to give preference to some players.

Now what? Well, the next step would be to solve this model and see what happens. I created a Microsoft Excel spreadsheet that can be solved with Excel Solver or OpenSolver. You can download it from here. Feel free to adapt it to your own needs and play around with it (this is the fun part!). Because my model was limited in size (I can’t use OpenSolver on my Mac at home), the solution isn’t very good (too many overage minutes). However, by adding more players and more lineups, the quality will certainly improve (use OpenSolver to break free from limits on model size). Here are some notes to help you understand the spreadsheet:

  • Variables x_{ik} are in the range B18:H25.
  • Variables u_{jk} and o_{jk} are in ranges B56:J62 and B65:J71, respectively.
  • Constraints (a) are implemented in rows 27, 28, 29.
  • Constraints (b) are implemented in rows 33, 34, 35.
  • The left-hand side of constraints (c) are in the range B74:J80. This range is required to be equal to the range B47:J53 (where the m_{jk} are) inside the Solver window.
  • The objective function whose formula appears above is in cell J21.

What are the pros and cons of this model? Can you make it better? No model is perfect. There are always real-life details that get omitted. The art of modeling is creating a model that is detailed enough to provide useful answers, but not too detailed to the point of requiring an unreasonable amount of time to solve. The definitions of “detailed enough” and “unreasonable amount of time” are mostly client-specific. (What would please Erik Spoelstra and his coaching staff?) What do you think are the main strengths and weaknesses in the model I describe above? What would you change? Good data is a big issue in this particular case. If you don’t like my data, can you propose alternative sources that are practical? I believe there’s plenty to talk about in this context, and I’m looking forward to receiving your feedback. Maybe we can converge to a model that is good enough for me to go knocking on the Miami Heat’s door! (Don’t worry. In the unlikely event they open the door, I’ll share the consulting fees.)

3 Comments

Filed under Applications, Linear Programming, Modeling, Motivation, Sports

Did You See Any OR During Apple’s iPhone 5 Announcement? I did!

On September 12, Apple finally announced its much-awaited iPhone 5. I didn’t have time to watch the keynote speech, but I watched the shorter 7-minute video that’s posted on Apple’s web site featuring Jony Ive, Apple’s Senior Vice President, Design. In that video, at around the 5-minute, 26-second mark, something they said caught my attention: the way they put parts together during the assembly process. I encourage you to watch that part of the video before reading on.

Jony Ive says:

Never before, have we built a product with this extraordinary level of fit and finish. We’ve developed manufacturing processes that are our most complex and ambitious.

And on Apple’s web site, they say this:

During manufacturing, each iPhone 5 aluminum housing is photographed by two high-powered 29MP cameras. A machine then examines the images and compares them against 725 unique inlays to find the most precise match for every single iPhone.

So let’s see if I understood this correctly. In a typical manufacturing operation, the multiple parts that get put together to create a product are put together without much fuss. A machine makes part A, another machine makes part B, and perhaps a robotic arm or a third machine takes any one of the many part A’s that are coming down a conveyor belt and attaches it to any one of the many part B’s that are coming down another conveyor belt. What Apple did was to improve on the “any one” choice. I don’t know if Apple pioneered this idea, I’d say probably not, but this is the first time I hear about something like this. If you’ve seen this before, let me know in the comments.

Before OR comes into play, Computer Science does its job in the form of computer vision / image processing algorithms. The photographs of the parts are analyzed and (I’m guessing) a fitness score is calculated for every possible matching pair of parts A (the housing) and B (the inlay). What happens next? How do they pick the winning match? Here are some possibilities:

  1. Each part A is matched with the part B, among the 725 candidates, that produces the best matching score.
  2. A 725 by 725 matrix of fitness scores is created between 725 parts of type A and 725 parts of type B, and the best 725 matches are chosen so as to maximize the overall fitness score (i.e. the sum of the fitness scores of all the chosen matches).
  3. Proceed as in the previous case, but pick the 725 matches that maximize the minimum fitness score. That is, we worry about the worst case and don’t let the worst match be too bad when compared to the best match.

After these 725 pairs are put together, new sets of parts A and B come down the conveyor belt and the matching process is repeated. Possibility number 1 is the fastest (e.g. do a binary search, or build a priority queue), but not necessarily the best because every now and then a bad match will have to be made. Possibilities 2 (an assignment problem) and 3 (assignment problem with a max-min objective) are better, in my opinion, with the third one being my favorite. They are, however, more time consuming than possibility 1. Jony Ive says the choice is made “instantaneously”, which doesn’t preclude something fancier than possibility 1 from being used given the assignment problems are pretty small.

The result? In the words of Jony Ive:

The variances from product to product, we now measure in microns.

It is well-known that OR plays a very important role in manufacturing (facility layout, machine/job scheduling, etc.) but it’s not every day that people stop to think about what happens in a manufacturing plant. This highly-popular announcement being watched by so many people around the world painted a very clear picture of the kinds of problems high-tech manufacturing facilities face. I think it’s a great example of what OR can do, and how relevant it is to our companies and our lives.

Leave a comment

Filed under Applications, iPhone, Motivation, Promoting OR

Enforcing a Restricted Smoking Policy on the UM Campus: a TSP Variant

The Coral Gables campus of the University of Miami is slowly transitioning into a smoke-free campus. (I can’t wait for that to happen.) Presently, there are a number of designated smoking areas (DSAs) around campus and nobody is supposed to smoke anywhere else. Here’s a map of campus with red dots representing DSAs (right-click on it and open it in a new tab to see a larger version):

Unfortunately, enforcement of this smoking policy is nowhere to be seen. The result? Lots of students smoking wherever they want and, even worse, smoking while walking around campus, which is a great way to maximize their air pollution effect. Don’t you love people who live in the universe of me, myself, and I? But let me stop ranting and return to operations research…

As someone who does not enjoy (and is allergic to) cigarette smoke, I started thinking about how to use OR to help with the enforcement effort. Let’s say there will be an enforcer (uniformed official) whose job is to walk around campus in search of violators. Based on violation reports submitted by students, faculty and staff, the University can draw a second set of colored dots, say black, on the above map. These black dots represent the non-smoking areas in which violations have been reported most often. For simplicity, let’s call them violation areas, or VAs.

In possession of the VA map, what is the enforcer supposed to do? You probably answered “walk around campus visiting each VA”. If you’re now thinking about the Traveling Salesman Problem (TSP), you’re on the right track. The enforcer has to visit each VA and return to his/her starting point. However, this is not quite like a pure TSP. Let me explain why. First of all, unlike the pure TSP, the enforcer has to make multiple passes through the VAs on a single day. Secondly, it’s also likely that some VAs are more popular than others. Therefore, we’d like the enforcer to visit them more often. Finally, we want the multiple visits to each VA to be spread throughout the day. With these considerations in mind, let me define the Smoking Policy Enforcement Problem (SPEP): We are given a set of n locations on a map. For each location i, let v_i be the minimum number of times the enforcer has to visit i during the day, and let s_i be the minimum separation between consecutive visits to location i. In other words, each time the enforcer visits i, he/she has to visit at least s_i other locations before returning to i. The goal is to find a route for the enforcer that satisfies the visitation requirements (v_i and s_i) while minimizing the total distance traveled.

After a few Google searches, I discovered that the SPEP is not a new problem. This shouldn’t have come as a surprise, given the TSP is one of the most studied problems in the history of OR. The article I found, written by R. Cheng, M. Gen, and M. Sasaki, is entitled “Film-copy Deliverer Problem Using Genetic Algorithms” and appeared in Computers and Industrial Engineering 29(1), pp. 549-553, 1995. Here’s how they define the problem:

There are a few minor differences with respect to the SPEP. In the above definition, s_i=1 for every location i. What they call d_i is what I call v_i, and they require exactly d_i visits, whereas I require at least v_i visits.

I wasn’t aware of this TSP variant and I think it’s a very interesting problem. I’m happy to have found yet another application for it. Can you think of other contexts in which this problem appears? Let me know in the comments.

2 Comments

Filed under Applications, Motivation, Traveling Salesman Problem

Choosing Summer Camps for Your Kids

Today I’m going to write about a decision that’s made by many American families each year: how to pick summer camps for our kids. There are several issues to take into account, such as cost, benefit, hours, and kids’ preferences. I’ll introduce an optimization model for summer camp selection through a numerical example. The example portrays a large family, but the same ideas apply if a few smaller families want to get together and solve this problem. This way they can take advantage of the discounts and take turns driving the kids around.

The Joneses have six kids: Amy, Beth, Cathy, David, Earl and Fred (yes, their first names are alphabetically sorted, matching their increasing order of age; Mr. and Mrs. Jones always knew they’d have six kids and hence named their firstborn with an ‘F’ name). This year, they’ve narrowed down their list of potential summer camps to the following ten: Math, Chess, Nature, Crafts, Cooking, Gymnastics, Soccer, Tennis, Diving, and Fishing. The Nature camp takes kids on a hike through the woods with the guidance of a biologist; they make frequent stops upon encountering specific plants and animals, during which a mini science lecture is delivered (pretty cool!). The Cooking camp involves cooking chemistry instruction, à la Alton Brown (also pretty cool).

The following table contains some data related to each camp:

The Cost column indicates the cost per child. The Discount column indicates the percentage discount that each child enrolled after the first would receive on the cost of each camp. For instance, if three children are enrolled in Math camp, the first would cost $1100, and the second and third would cost $770 each (30% less). The Hours column is self-explanatory and the last two columns indicate whether or not that particular camp develops mental and physical abilities, respectively (a value of one = yes, zero=no).

The next table shows some of the child-specific requirements:

For example, the Joneses want Fred to attend at least 3 camps that develop mental abilities, and at least 1 camp that develops physical abilities. The last two columns in the above table indicate the minimum and maximum number of camp hours for each child over the 9-week summer break.

The next thing parents need to take into account are their children’s preferences. So here they are:

The smaller the number in the above table, the more desirable a particular camp is. For example, Amy is a bit of a math nerd, and if we were to flip David’s preference scores for Math and Tennis, he could be classified as a bit of a jock. Some conflicts exist, in the sense that not all camps are compatible with each other in terms of time schedules. In this particular case, let’s assume that no child can attend both the Soccer and Tennis camps, or both the Nature and Soccer camps. Here’s how we are going to use this preference table to create a sense of fairness among the children: whenever a child that prefers camp X to camp Y goes to camp Y and doesn’t go to camp X, nobody else gets to go to camp X either. For example, if Amy goes to Nature camp and isn’t sent to either Math or Chess camp, none of her siblings are allowed to go to Math or Chess either. Conversely, if the Joneses decide to send Earl to Chess camp and Fred to Tennis camp, they must also send Earl to Tennis camp (because Earl prefers Tennis to Chess, and “Fred is going! Why can’t I go too!”). Clearly, there are other ways to use/interpret this table, such as trying to send everyone to at least one of their top N choices, but we won’t consider those alternatives here.

After taking all of the above issues and conditions into account, here’s a solution that satisfies all the requirements while resulting in the minimum cost of $22,180.00 (You guessed it…the Joneses are probably *not* among the 99%):

Amy goes to Math, Crafts, Cooking, and Tennis; Beth goes to Math, Cooking, Tennis, and Fishing; Cathy goes to Math, Crafts, Tennis, and Fishing; David goes to Math, Cooking, and Fishing; Earl goes to Math, Tennis, and Fishing; and Fred goes to Math, Crafts, Cooking, and Tennis. Mmm…interestingly, everyone goes to Math camp. I think the Joneses are on to something…

Depending on your own requirements, preferences, and costs your solution may differ, of course. But this should give you an idea of how this simple problem can easily become very complicated to solve. No need to fear, though! Operations Research is here!

Food for Thought: Here’s an interesting question that helps illustrate how high-quality solutions can be counterintuitive: by looking at the preferences table, we see that everyone prefers Soccer to Tennis. In addition, Soccer camp is less expensive than Tennis camp. So how come we send almost everyone to Tennis camp? Isn’t that strange? Let me know what you think in the comments below! That’s one of the advantages of using an analytical approach to decision making: it helps us find solutions we wouldn’t even consider otherwise because they don’t seem to make sense (at least not at first).

If you’re curious about how I managed to find the optimal solution, read on!

Details of the Analysis:

To find the minimum-cost solution, we can create a mathematical representation of the problem, a.k.a. a model, and then solve this model with the help of a computer. Let’s see how.

The first obvious decision to make is who goes where. So let the binary variable x_{ij} equal 1 when child i goes to camp j, and equal to 0 otherwise. We’ll also need another binary variable y_j that is equal to 1 when at least one child goes to camp j and equal to 0 when none of the children go to camp j. We are now ready to write our objective function and constraints. I’ll refer to the problem data using the column headings of the tables above. The subscript i will always refer to a child, and the subscript j will always refer to a camp.

To minimize the total cost, we write the following objective function:

\displaystyle \min \sum_i \sum_j (1-\mathrm{Discount}_j)\mathrm{Cost}_j x_{ij} + \sum_j \mathrm{Discount}_j \mathrm{Cost}_j y_j

Note how we are using the y_j variable to handle the discount for sending more than one child to camp j: we charge every child the discounted price in the double summation and add the discount back in only once if y_j=1.

Now we have to deal with the four requirements: minimum and maximum hours, mental activity, and physical activity. For every child i, we have to write the following four constraints:

\displaystyle \sum_j \mathrm{Hours}_j x_{ij} \geq \mathrm{MinTimeReq}_i

\displaystyle \sum_j \mathrm{Hours}_j x_{ij} \leq \mathrm{MaxTimeReq}_i

\displaystyle \sum_j \mathrm{IsMental}_j x_{ij} \geq \mathrm{MentalReq}_i

\displaystyle \sum_j \mathrm{IsPhysical}_j x_{ij} \geq \mathrm{PhysicalReq}_i

Next, we enforce the preference rules. Let’s recall the example involving Earl and Fred: if Earl goes to Chess camp and someone else (it doesn’t matter who) goes to Tennis camp, then Earl has to go to Tennis camp as well. Here’s what this constraint would look like:

x_{\mathrm{Earl},\mathrm{Chess}} + y_{\mathrm{Tennis}} - x_{\mathrm{Earl},\mathrm{Tennis}} \leq 1

Of course, we have to repeat this constraint for every child i and every pair of camps j_1 and j_2 such that child i prefers j_1 to j_2 in the following way:

x_{ij_2} + y_{j_1} - x_{ij_1} \leq 1

The camp compatibility constraints say that no child i can attend both Soccer and Tennis, or both Nature and Soccer, therefore:

x_{i,\mathrm{Soccer}} + x_{i,\mathrm{Tennis}} \leq 1

x_{i,\mathrm{Nature}} + x_{i,\mathrm{Soccer}} \leq 1

Finally, we need to relate the x_{ij} and y_j variables by stating that unless y_j=1 , no x_{ij} can be equal to 1 . So we write the following constraint for all values of i and j :

x_{ij} \leq y_j

And that’s the end of our model. Here’s a representation of this mathematical model in AMPL in case you want to play with it yourself. This is the model I used to obtain the numerical results reported above. Enjoy!

5 Comments

Filed under Applications, INFORMS Monthly Blog Challenge, Integer Programming, Mathematical Programming, Modeling, Motivation, Promoting OR, Summer camp

Political Districting and OR

When I think about how Operations Research can contribute to the world of politics, one of the first things that come to mind is the redistricting process which, of course, has been receiving a lot of attention after the 2010 census. Here’s an excerpt from the census.gov web site:

Another major use for decennial census data is for geographically defining state legislative districts, a “redistricting” process that begins in 2011. The census data allow state officials to realign congressional and state legislative districts in their states, taking into account population shifts since the last census and assuring equal representation for their constituents in compliance with the “one-person, one-vote” principle of the 1965 Voting Rights Act.

One of my favorite articles on the topic is “An Optimzation-Based Heuristic for Political Districting” by Anuj Mehrotra, Ellis Johnson and George Nemhauser, which appeared in Management Science in 1998 (henceforth referred to as the MJN paper). For a districting plan to be acceptable, it has to be fair/unbiased (to avoid gerrymandering), where fairness can be defined in terms of criteria that districts have to satisfy, such as population equality, contiguity, and compactness. Contiguity means that the district cannot have holes in it. Compactness is a bit more difficult to define, but it essentially means that districts should look more like a circle or square than like a snake or a tree branch. For example, would you say that this state senate district in Florida is compact?

Another important issue is that the methodology used to create the districts be transparent. This is a strong point in favor of using mathematical techniques such as those of Operations Research. The algorithm and software code used to create the districts can be made publicly available for anyone to inspect. Specific questions about the methodology can be answered precisely and unambiguously. There’s no beating around the bush. I applaud efforts such as the Public Mapping Project, which attempts to make the districting process more transparent and increase public participation. Nevertheless, given how difficult the problem is (in both practical and mathematical terms), I believe that it would also be important to consider plans created with the help of OR. As no mathematical model is perfect, those plans will probably need to be fine-tuned by a human being using tools such as this district builder. Of crucial importance, however, is the need for clear rules to guide such modifications. When is it OK to move part of a district’s border? by how much? etc. Given that current OR software tools are capable of producing multiple high-quality solutions at the end of a run, perhaps an even better alternative is to present the decision makers with a myriad of OR-generated and man-made plans, and let them pick the most adequate one. This practice is common in sports scheduling, where the leagues typically ask schedulers to provide them with many alternative solutions to choose from.

At this point, I’d like to switch gears and explore the districting problem in more detail from an OR point of view (using some ideas from the MJN paper). Districts are made up of smaller geographical units, for example counties (or pieces of counties). For illustration purposes, if a state like Ohio, which has 88 counties, is creating 18 congressional districts and those districts are to be made up of counties only, the potential number of distinct districts is

{88 \choose 18} = 2,\!418,\!561,\!960,\!739,\!869,\!780

That number is greater than the number of miles I’d cover if I could travel at the speed of light for 400,000 years! (Without a bathroom break!) Obviously, many of those districts are not legitimate because they violate contiguity and/or compactness constraints. However, even if those invalid districts are eliminated, the remaining number of possible districts is still very large. Ideally, however, we’d like to use district building blocks that are much smaller than counties, such as zip codes, but that is still very challenging (and would increase the above number even more!). In the MJN paper, the authors create an initial plan using counties as building blocks and then fine tune the districts’ borders to equalize populations. Here are the “before” and “after” districting plans for South Carolina:

Pretty good improvement, huh? Their basic methodology is to associate one decision with each potential district indicating whether or not it is part of the final plan (a binary variable). The optimization model then selects a subset of the districts that covers each county exactly once (a set partitioning problem). Because the number of variables is huge (as we’ve seen above), they resort to a technique called branch-and-price. In short, branch-and-price solves an optimization problem by considering only a subset of its variables at a time and by adding in promising variables little-by-little as the search progresses. It’s possible to define mathematically what a missing “good” variable is, and to construct one (i.e. a new district) on-the-fly. During this process, many set partitioning problems have to be solved. Because these intermediate problems are not always solved to optimality, the final solution obtained in MJN, albeit very good, isn’t technically an optimal solution, and that’s why they use the word “heuristic” in the title of their paper.

Acknowledgment: I’d like to thank Brady Hunsaker for providing a nice collection of links in the comments section of this blog post by Michael Trick.

Update: Here is another post I came across this week that also talks about using OR for political districting.

9 Comments

Filed under Applications, INFORMS Monthly Blog Challenge, Integer Programming, Modeling, Motivation, Politics, Promoting OR

Ten Freshmen + 46 Slides = 1 Hour of OR Fun

I just finished my presentation to business undergraduate students and, from what I could tell by looking at them, I think it was successful. Of course the real test will be whether someone stops by my office saying “I love OR! Can I work with you?”. I want to thank our vice dean for this opportunity and I am looking forward to doing it again next year.

I closed the presentation with a little “quiz” based on a very nice paper by Brown, Klein, Rosenthal and Washburn entitled Steaming on Convex Hulls. Here’s how it goes (you can open the image on a new window to make it larger):

An aircraft carrier can run with 2 or 4 engines online. The graph below shows gallons of gasoline used per hour versus possible speeds for each engine configuration. How would you run the ship to cover 100 miles in 4 hours?

According to the article, the Navy spends over 1 billion dollars a year on surface combatants alone. An officer who became a ship commander after graduating from the academy was smart enough to solve the above problem the right way. His ship was saving so much fuel that it had to be inspected under the suspicion that it was violating safety regulations. But we all know it wasn’t. It was just a case of using analytical techniques to make better decisions.

8 Comments

Filed under Applications, Linear Programming, Motivation, Promoting OR

Getting Freshmen Excited About OR

The vice dean for undergraduate programs at the school of business asked me to make a presentation to a group of freshmen. My job is to tell them about the field of Management Science and the research that goes on in my department. The main goal is to get these students excited about research early on. Hopefully, they’ll get involved in undergraduate research projects and even consider joining our PhD program further down the road. My understanding is that every department in the school will make a similar presentation, but I’ll tell the students that OR is by far “the coolest topic” (sorry “other departments”, but I think it is!).

I think this is a great idea, especially because Management Science (or OR) is not a required class for all business majors and I believe that every business school graduate should at least know what OR is and what it can do for you (fortunately, OR is a required class for all MBA students in our school).

I’m putting together a presentation with the following outline:

  1. Introduction (who I am, my background, etc.)
  2. What is Management Science? (that’s where I tell them to use the name OR instead :-)
  3. Real-life applications of OR
  4. Research interests of the Management Science department (with a focus on my interests, at the request of the vice dean)
  5. Research opportunities for undergraduate students

The room is booked for 1 hour and 45 minutes, and I was told I can use as much time as I want. Boy, that’s a lot! For item number 3, I’ll pick a diverse collection of applications covering a wide range of topics. For item 4, I’ll tell them about things my colleagues have worked on, things I’ve done, and things I’m currently doing. Then I’ll move on to item 5 and close the presentation with problems on which I’d like to work with an undergraduate student (nothing that requires advanced OR knowledge, of course). One caveat is that I must tell them that my projects require some knowledge of computer programming and basic understanding of linear and integer programming (which they could get by taking one of our classes or by reading on their own).

I’ve also put together a Google document entitled A Hyperlinked Introduction to the World of Operations Research and Management Science, which I’m going to hand out at the end of the talk.

The purpose of this document is to function as an organized list of links to OR resources and interesting applications that the students can easily navigate to. It contains a superset of the real-world applications I’m going to tell them about, and it’s supposed to complement my talk. I hope this turns out to be useful to other people as well. Feel free to use it and let me know if you have any suggestions for improvement. I’m sure there are many interesting links that I forgot to include there.

6 Comments

Filed under Motivation, Promoting OR, Research

An OR Song: Math’s in Hiding

One of my goals with this blog is to help spread the word about the fabulous field of Operations Research (OR). I’m saddened, however, by how much the average person still hates mathematics and sees it as being of no use outside the classroom. Popular movies and TV shows (with very few exceptions) don’t portray math positively either (see, for example, this post by my former Carnegie Mellon classmate Abraham Flaxman).

Why do most people think it’s OK to hate math? I bet that if someone came on TV and said “I hate History” it would be a big deal, and critics would promptly jump in to provide explanations about the merits of the field. But hating math is commonplace; it’s “normal”. I beg to differ. I believe that it’s just a matter of teaching it differently and showing students, from the very beginning, all the wonderful things that math can accomplish. This is a crucial change that needs to begin in high school (if not earlier).

So I decided to write a song. More precisely, I decided to write alternative lyrics for an existing song. After all, everyone likes music, right? The song I chose to hijack was “Hallelujah”, written by Leonard Cohen in the 80′s. It’s a beautiful song that has been recorded by many different artists, the late Jeff Buckley being among the most popular ones (here’s a YouTube video). Personally, I prefer Allison Crowe’s interpretation because it’s a little more dramatic.

Before I show you the lyrics, a few comments are in order:

  • Although the song is about OR, I use the word “math” throughout because it is a more recognizable word. Besides, OR is about using math to improve decision making, and part of my goal is to help people realize that (there’s a shout out to INFORMS at the end :-)
  • I decided to refer to math as “she” because (i) it creates a parallel to the “she” in the original lyrics; (ii) this kind of anthropomorphism has been used before; (iii) it sounds more melodic and poetic than using “it”. However, if you are offended by that, or think this is politically incorrect, just replace “she/her” with “it/its”.
  • I’ve put the original Hallelujah lyrics side-by-side with my lyrics to make it easier for you to sing it. I chose the words carefully so that the cadence/tempo would remain the same. Trust me. I’ve tried it myself and it works.
  • I only have the lyrics so far (no recording) because I’m not a good singer, and I haven’t found anyone to sing it yet. I’d like to have a YouTube video clip with the lyrics as subtitles and a photo slide show, in sync with lyrics, showing examples of OR success stories.
  • I’ve included a few footnote questions about subtle references that appear in the lyrics (OK, I admit, most of them are easy). I figured it would be a fun little quiz anyway. If you know the answers, write them down in the comments below.

So, without further ado, here it is: Math’s in Hiding. Enjoy! And let me know what you think. I’d love to hear your thoughts!

Going back to the topic of who could record this: ideally it should be someone who is familiar with OR. Well, I just happen to know the perfect person. Her name is Lucia. In addition to being a very talented singer and song writer (her debut album was released last year and she recently sang the national anthem at a Marlins game), she was my student in the MBA Optimization class a few years ago. Wouldn’t it be great to have an OR-themed concert at an INFORMS meeting with free tickets distributed to the general public?

Regardless of whether my lyrics are good or bad, I believe that songs like this are another way for us to try to plant the seed of love-for-math in the heart of future generations. At least it’s worth a try.

6 Comments

Filed under Motivation, Music, Promoting OR, Teaching

Don’t Get Discouraged! It Happens to Everyone

Once in a while, I feel a bit tired and discouraged, especially when I find myself stuck while trying to solve a particular problem. I attack it from different angles, try special cases, write down thousands of lines of code, and the little brat doesn’t give in. Sometimes, the best conclusion I am able to obtain is “none of these methods work”. Frustration sets in and, many years ago, I used to think to myself: “I wish I were as smart as person X; they never run into trouble like this.” But guess what? They do! An extremely successful researcher, whom I greatly respect and admire, once told me this:

At least half of my research has been unsuccessful.

Here’s another quote that I enjoy very much. It’s by Egon Balas and it appears in the epilogue of his book Will to Freedom:

…This is the flavor of mathematical discovery. It is an uneven process that often becomes hectic, with periods of sleepless of half-sleepless nights. It requires the kind of passionate concentration in the grip of which you forget about everything else for a while. To be successful at it, you must have “fire in your belly.” And it certainly helps if your basic inclination is to persist and not give up in the face of difficulties, not to become dejected in case of setbacks, but to try again and again until you manage to find the right way.

In the same book, Balas refers to a former colleague and collaborator of his, Grigore Moisil (a famous Romanian algebraist), who had some interesting views on how to do mathematics:

Mathematics is not necessarily done at your desk. Mathematics is done when you wake up in the morning and do not immediately get out of bed; it’s done in the bathtub; it’s done while sitting on the toilet; it’s done while you are dressing; and it’s done while you are taking a walk.

I agree with Moisil. That’s part of the fun with math; you can work on it just about anywhere. (I’m particularly fond of all kinds of waiting rooms: doctor’s/dentist’s office, airport gates, nail salon while waiting for wifey, etc.) In fact, changing your work environment may actually help. I’ve had many good ideas away from my desk.

Do you have a motivational quote or text to which you refer when feeling a bit discouraged? I’d love to read about them in the comments.

6 Comments

Filed under Motivation, Research