Category Archives: Integer Programming

Bringing Research into the Classroom: Can Relevant and Impactful be Easy to Explain?

math-equation_chalkboard O.R. researchers and practitioners are constantly churning out papers that tackle a wide variety of important and hard-to-solve practical problems. On one hand, as a researcher, I understand how difficult these problems can be and how it’s often the case that fancy math and complex algorithms need to be used. On the other hand, as someone who teaches optimization to MBA students who aren’t easily excited by mathematics, I’m always looking for motivational examples that are both interesting and not too complex to be understood in 5 minutes. (That’s the little slot of time I reserve at the beginning of my lectures to go over an application before the lecture itself starts.)

Every now and then, I come across a paper that fits the bill perfectly: it addresses an important problem, produces impactful results, and (here comes the rare part), accomplishes the previous two goals by using math that my MBA students can follow 100%, while being confident that they themselves could replicate it given what they learned in my course (the optimization models).

The paper to which I’m referring has recently appeared in Operations Research (Articles in Advance, January 2017): The Impact of Linear Optimization on Promotion Planning, by Maxime C. Cohen, Ngai-Hang Zachary Leung, Kiran Panchamgam, Georgia Perakis, and Anthony Smith (http://dx.doi.org/10.1287/opre.2016.1573).

If I had to pick one word to describe this paper, it would be BEAUTIFUL.

I immediately proceeded to put together a 5-minute summary presentation (8 slides) to cover the problem, approach, and results. I’ll be showing this to 100 of my MBA students on this coming Tuesday (Valentine’s Day!). I hope they love it as much as I did. Feel free to show this presentation to your own students if you wish, and let me know how it went down in the comments.

A recent Poets & Quants article explains how business schools with the highest quality teaching strive to bring their faculty’s research into the classroom so that students get to learn the latest and greatest ideas. The O.R. paper above is a perfect example of when this can be done effectively.

Advertisements

Leave a comment

Filed under Analytics, Applications, Integer Programming, Linear Programming, Modeling, Motivation, Promoting OR, Research, Teaching

How to Build the Best Fantasy Football Team, Part 2

UPDATE on 10/5/2015: Explained how to model a requirement of baseball leagues (Requirement 4).

UPDATE on 10/8/2015: Explained how to model a different objective function (Requirement 5).

 

fantasy-football-ringYesterday, I wrote a post describing an optimization model for picking a set of players for a fantasy football team that maximizes the teams’ point projection, while respecting a given budget and team composition constraints. In this post I’ll assume you’re familiar with that model. (If you are not, please spend a few minutes reading this first.)

Fellow O.R. blogger and Analytics expert Matthew Galati pointed out that my model did not include all of the team-building constraints that appear on popular fantasy football web sites. Therefore, I’m writing this follow-up post to address this issue. (Thanks, Matthew!) My MBA student Kevin Bustillo was kind enough to compile a list of rules from three sites for me. (Thanks, Kevin!) After looking at them, it seems my previous model fails to deal with three kinds of requirements:

  1. Rosters must include players from at least N_1 different NFL teams (N_1=2 for Draft Kings and N_1=3 for both Fan Duel and Yahoo!).
  2. Rosters cannot have more than N_2 players from the same team (N_2=4 for Fan Duel and N_2=6 for Yahoo! Draft Kings does not seem to have this requirement).
  3. Players in the roster must represent at least N_3 different football games (Only Draft Kings seems to have this requirement, with N_3=2).

Let’s see what the math would look like for each of the three requirements above. (Converting this math into Excel formulas shouldn’t be a problem if you follow the methodology I used in my previous post.) I’ll be using the same variables I had before (recall that binary variable x_i indicates whether or not player i is on the team).

Requirement 1

Last time I checked, the NFL had 32 teams, so let’s index them with the letter j=1,2,\ldots,32 and create 32 new binary variables called y_j, each of which is equal to 1 when at least one player from team j is on our team, and equal to zero otherwise. The requirement that our team must include players from at least N_1 teams can be written as this constraint:

\displaystyle \sum_{j=1}^{32} y_j \geq N_1

The above constraint alone, however, won’t do anything unless the y_j variables are connected with the x_i variables via additional constraints. The behavior that we want to enforce is that a given y_j can only be allowed to equal 1, if at least one of the players from team j has its corresponding x variable equal to 1. To make this happen, we add the constraint below for each team j:

\displaystyle y_j \leq \sum_{\text{all players } i \text{ that belong to team } j} x_i

For example, if the Miami Dolphins are team number 1 and their players are numbered from 1 to 20, this constraint would look like this: y_1 \leq x_1 + x_2 + \cdots + x_{20}

Requirement 2

Repeat the following constraint for every team j:

\displaystyle \sum_{\text{all players } i \text{ that belong to team } j} x_i \leq N_2

Assuming again that the first 2o players represent all the players from the Miami Dolphins, this constraint on Fan Duel would look like this: x_1 + x_2 + \cdots + x_{20} \leq 4

Requirement 3

My understanding of this requirement is that it applies to short-term leagues that get decided after a given collection of games takes place (it could even be a single-day league). This could be implemented in a way that’s very similar to what I did for requirement 1. Create one binary z_g variable for each game g. It will be equal to 1 if your team includes at least one player who’s participating in game g, and equal to zero otherwise. Then, you need this constraint

\displaystyle \sum_{\text{all games } g} z_g \geq N_3

as well as the constraint below repeated for each game g:

\displaystyle z_g \leq \sum_{\text{all players } i \text{ that participate in game } g} x_i


Additional Requirements Submitted by Readers

I earlier claimed that this model can be adapted to fit fantasy leagues other than football. So here’s a question I received from one of my readers:

For fantasy baseball, some players can play multiple positions. E.g. Miguel Cabrera can play 1B or 3B. I currently use OpenSolver for DFS and haven’t found a good way to incorporate this into my model. Any ideas?

Let’s call this…

Requirement 4: What if some players can be added to the team at one of several positions?

Here’s how to take care of this. Given a player i, let the index t=1,2,\ldots,T_i represent the different positions he/she can play. Instead of having a binary variable x_i representing whether or not i is on the team, we have binary variables x_{it} (as many as there are possible values for t) representing whether or not player i is on the team at position t. Because a player can either not be picked or picked to play one position, we need the following constraint for each of these multi-position players:

\displaystyle \sum_{t=1}^{T_i} x_{it} \leq 1

Because we have replaced x_i with a collection of x_{it}‘s, we need to replace all occurrences of x_i in our model with (x_{i1} + x_{i2} + \cdots + x_{iT_i}).

In the Miguel Cabrera example above, let’s say Cabrera’s player ID (the index i) is 3, and that t=1 represents the first-base position, and t=2 represents the third-base position. The constraint above would become

x_{31} + x_{32} \leq 1

And we would replace all occurrences of x_3 in our model with (x_{31} + x_{32}).

That’s it!


Reader rs181602 asked me the following question:

I was wondering, is there a way to add an additional constraint that maximizes the minimum rating of the chosen players, if each player has some rating score. I tried to think that out, but can’t seem to get it to be linear.

Let’s call this…

Requirement 5: What if I want to maximize the point projection of the worst player on the team? (In other words, how do I make my worst player as good as possible?)

It’s possible to write a linear model to accomplish this. Technically speaking, we would be changing the objective function from maximizing the total point projection of all players on the team to maximizing the point projection of the worst player on the team. (There’s a way to do both together (sort of). I’ll say a few words about that later on.)

Here we go. Because we don’t know what the projection of the worst player is, let’s create a variable to represent it and call it z. The objective then becomes:

\max z

You might have imagined, however, that this isn’t enough. We defined in words what we want z to be, but we still need formulas to make z behave the way we want. Let M be the largest point projection among all players that could potentially be on our team. It should be clear to you that the constraint z\leq M is a valid ceiling on the value of z. In fact, the value of z will be limited above by 9 values/ceilings: the 9 point projections of the players on the team. We want the lowest of these ceilings to be as high as possible.

When a player i is not on the team (x_i=0), his point projection p_i should not interfere with the value of z. When player i is on the team (x_i=1), we would like p_i to become a ceiling for z, by enforcing z\leq p_i. The way to make this happen is to write a constraint that changes its behavior depending on the value of x_i, as follows:

z \leq p_ix_i + M(1-x_i)

We need one of these for each player. To see why the constraint above works, consider the two possibilities for x_i. When x_i=0 (player not on the team), the constraint reduces to z\leq M (the obvious ceiling), and when x_i=1 (player on the team), the constraint reduces to z\leq p_i (the ceiling we want to push up).

BONUS: What if I want, among all possible teams that have the maximum total point projection, the one team whose worst player is as good as possible? To do this, you solve two optimization problems. First solve the original model maximizing the total point projection. Then switch to this \max z model and include a constraint saying that the total point projection of your team (the objective formula of the first model) should equal the total maximum value you found earlier.

That’s it!


And that does it, folks!

Does your league have other requirements I have not addressed here? If so, let me know in the comments. I’m sure most (if not all) of them can be incorporated.

24 Comments

Filed under Analytics, Applications, Integer Programming, Modeling, Motivation, Sports

How to Build the Best Fantasy Football Team

Note 1: This is Part 1 of a two-part post on building fantasy league teams. Read this first and then read Part 2 here.

Note 2: Although the title says “Fantasy Football”, the model I describe below can, in principle, be modified to fit any fantasy league for any sport.

footballI’ve been recently approached by several people (some students, some friends) regarding the creation of optimal teams for fantasy football leagues. With the recent surge of betting sites like Fan Duel and Draft Kings, this has become a multi-million (or should I say, billion?) dollar industry. So I figured I’d write down a simple recipe to help everybody out. We’re about to use Prescriptive Analytics to bet on sports. Are you ready? Let’s do this! I’ll start with the math model and then show you how to make it all work using a spreadsheet.

The Rules

The fantasy football team rules state that a team must consist of:

  • 1 quarterback (QB)
  • 2 running backs (RB)
  • 3 wide receivers (WR)
  • 1 tight end (TE)
  • 1 kicker
  • 1 defense

Some leagues also have what’s called a “flex player”, which could be either a RB, WR, or TE. I’ll explain how to handle the flex player below. In addition, players have a cost and the person creating the team has a budget, call it B, to abide by (usually B is $50,000 or $60,000).

The Data

For each player i, we are given the cost mentioned above, call it c_i, and a point projection p_i. The latter is an estimate of how many points we expect that player to score in a given week or game. When it comes to the defense, although it doesn’t always score, there’s also a way to calculate points for it (e.g. points prevented). How do these point projections get calculated, you may ask? This is where Predictive Analytics come into play. It’s essentially forecasting. You look at past/recent performance, you look at the upcoming opponent, you look at players’ health, etc. There are web sites that provide you with these projections, or you can calculate your own. The more accurate you are at these predictions, the more likely you are to cash in on the bets. Here, we’ll take these numbers as given.

The Optimization Model

The main decisions to be made are simple: which players should be on our team? This can be modeled as a yes/no decision variable for each player. So let’s create a binary variable called x_i which can only take two values: it’s equal to the value 1 when player i is on our team, and it’s equal to the value zero when player i is not on our team. The value of i (the player ID) ranges from 1 to the total number of players available to us.

Our objective is to create a team with the largest possible aggregate value of projected points. That is, we want to maximize the sum of point projections of all players we include on the team. This formula looks like this:

\max \displaystyle \sum_{\text{all } i} p_i x_i

The formula above works because when a player is on the team (x_i=1), its p_i gets multiplied by one and is added to the sum, and when a player isn’t on the team (x_i=0) its p_i gets multiplied by zero and doesn’t get added to the final sum. The mechanism I just described is the main idea behind what makes all formulas in this model work. For example, if the point predictions for the first 3 players are 12, 20, and 10, the maximization function start as: \max 12x_1 + 20x_2 + 10x_3 + \cdots

The budget constraint can be written by saying that the sum of the costs of all players on our team has to be less than or equal to our budget B, like this:

\displaystyle \sum_{\text{all }i} c_i x_i \leq B

For example, if the first 3 players cost 9000, 8500, and 11000, and our budget is 60,000, the above formula would look like this: 9000x_1 + 8500x_2 + 11000x_3 + \cdots \leq 60000.

To enforce that the team has the right number of players in each position, we do it position by position. For example, to require that the team have one quarterback, we write:

\displaystyle \sum_{\text{all } i \text{ that are quarterbacks}} x_i = 1

To require that the team have two running backs and three wide receivers, we write:

\displaystyle \sum_{\text{all } i \text{ that are running backs}} x_i = 2

\displaystyle \sum_{\text{all } i \text{ that are wide receivers}} x_i = 3

The constraints for the remaining positions would be:

\displaystyle \sum_{\text{all } i \text{ that are tight ends}} x_i = 1

\displaystyle \sum_{\text{all } i \text{ that are kickers}} x_i = 1

\displaystyle \sum_{\text{all } i \text{ that are defenses}} x_i = 1

The Curious Case of the Flex Player

The flex player adds an interesting twist to this model. It’s a player that, if I understand correctly, takes the place of the kicker (meaning we would not have the kicker constraint above) and can be either a RB, WR, or TE. Therefore, right away, we have a new decision to make: what kind of player should the flex be? Let’s create three new yes/no variables to represent this decision: f_{\text{RB}}, f_{\text{WR}}, and f_{\text{TE}}. These variables mean, respectively: is the flex RB?, is the flex WR?, and is the flex TE? To indicate that only one of these things can be true, we write the constraint below:

f_{\text{RB}} + f_{\text{WR}} + f_{\text{TE}} = 1

In addition, having a flex player is equivalent to increasing the right-hand side of the constraints that count the number of RB, WR, and TE by one, but only for a single one of those constraints. We achieve this by changing these constraints from the format they had above to the following:

\displaystyle \sum_{\text{all } i \text{ that are running backs}} x_i = 2 + f_{\text{RB}}

\displaystyle \sum_{\text{all } i \text{ that are wide receivers}} x_i = 3 + f_{\text{WR}}

\displaystyle \sum_{\text{all } i \text{ that are tight ends}} x_i = 1 + f_{\text{TE}}

Note that because only one of the f variables can be equal to 1, only one of the three constraints above will have its right-hand side increased from its original value of 2, 3, or 1.

Other Potential Requirements

Due to personal preference, inside information, or other esoteric considerations, one might want to include other requirements in this model. For example, if I want the best team that includes player number 8 and excludes player number 22, I simply have to force the x variable of player 8 to be 1, and the x variable of player 22 to be zero. Another constraint that may come in handy is to say that if player 9 is on the team, then player 10 also has to be on the team. This is achieved by:

x_9 \leq x_{10}

If you wanted the opposite, that is if player 9 is on the team then player 10 is NOT on the team, you’d write:

x_9 + x_{10} \leq 1

Other conditions along these lines are also possible.

Putting It All Together

If you were patient enough to stick with me all the way through here, you’re eager to put this math to work. Let’s do it using Microsoft Excel. Start by downloading this spreadsheet and opening it on your computer. Here’s what it contains:

  • Column A: list of player names.
  • Column B: yes/no decisions for whether a player is on the team (these are the x variables that Excel Solver will compute for us).
  • Columns C through H: flags indicating whether or not a player is of a given type (0 = no, 1 = yes).
  • Columns I and J: the cost and point projections for each player.

Now scroll down so that you can see rows 144 through 150. The cells in column B are currently empty because we haven’t chosen which players to add to the team yet. But if those choices had been made (that is, if we had filled column B with 0’s and 1’s), multiplying column B with column C in a cell-wise fashion and adding it all up would tell you how many quarterbacks you have. I have included this multiplication in cell C144 using the SUMPRODUCT formula. In a similar fashion, cells D144:H144 calculate how many players of each kind we’d have once the cells in column B receive values. The calculations of total team cost and total projected points for the team are analogous to the previous calculations and also use the SUMPRODUCT formula (see cells I144 and J144). You can try picking some players by hand (putting 1’s in some cells of column B) to see how the values of the cells in row 144 will change.

If you now open the Excel Solver window (under the Data tab, if your Solver add-in is active), you’ll see that I already have the entire model set up for you. If you’ve never used Excel Solver before, the following two-part video will get you started with it: part 1 and part 2.

The objective cell is J144, and that’s what we want to maximize. The variables (a.k.a. changing cells) are the player selections in column B, plus the flex-player type decisions (cells D147:F147). The constraints say that: (1) the actual number of players of each type (C144:H144) are equal to the desired number of each type (C146:H146), (2) the total cost of the team (I144) doesn’t exceed the budget (I146), (3) the three flex-player binary variables add up to 1 (D150 = F150), and, (4) all variables in the problem are binary. (I set the required number of kickers in cell G146 to zero because we are using the flex-player option. If you can have both a flex player and a kicker, just type a 1 in cell G146.) If you click on the “Solve” button, you’ll see that the best answer is a team that costs exactly $50,000 and has a total projected point value of 78.3. Its flex player ended up being an RB.

This model is small enough that I can solve it with the free student version of Excel Solver (which comes by default with any Office installation). If you happen to have more players and your total variable count exceeds 200, the free solver won’t work. But don’t despair! There exists a great Solver add-in for Excel that is also free and has no size limit. It’s called OpenSolver, and it will work with the exact same setup I have here.

That’s it! If you have any questions or remarks, feel free to leave me a note in the comments below.

UPDATE: In a follow-up post, I explain how to model a few additional fantasy-league requirements that are not included in the model above.

18 Comments

Filed under Analytics, Applications, Integer Programming, Modeling, Motivation, Sports

Improving Traveling Umpire Solutions the Miami Heat Way: Not one, not two, not three…

Those who know me are aware of my strong passion for basketball, so I had to find a way to relate this post to my favorite sport. Fans of basketball in general, and of the Miami Heat in particular, might be familiar with this video clip in which LeBron James makes a bold prediction. Back in 2010, when asked how many titles the Heat’s big three would win together, he replies “Not one, not two, not three, not four, not five, not six, not seven, …” While I’d love to see them win 8 titles, it sounds a bit (a lot) unlikely. But I can’t complain about their record so far. Winning 2 titles in 3 finals’ appearances isn’t bad at all. But what does this have to do with baseball umpires? Let’s get back to OR for a moment.

A couple of years ago, I wrote a post about scheduling baseball umpires. In that same article I co-authored with Hakan Yildiz and Michael Trick, we talked about a problem called the Traveling Umpire Problem (TUP), which doesn’t include all the details from the real problem faced by MLB but captures the most important features that make the problem difficult. Here’s a short description (detailed description here):

Given a double round-robin tournament with 2N teams, the traveling umpire problem consists of determining which games will be handled by each one of N umpire crews during the tournament. The objective is to minimize the total distance traveled by the umpires, while respecting constraints that include visiting every team at home, and not seeing a team or venue too often.

And when I say difficult, let me tell you something, it’s really hard to solve. For example, there are 16-team instances (only 8 umpires) for which no feasible solution is known.

Two of my Brazilian colleagues, Lucas de Oliveira and Cid de Souza, got interested in the TUP and asked me to join them in an effort to try to improve the quality of some of the best-known solutions in the TUP benchmark. There are 25 instances in the benchmark for which we know a feasible solution (upper bound) and a lower bound, but not the optimal value. Today, we’re very happy to report that we managed to improve the quality of many of those feasible solutions. How many, you ask? I’ll let LeBron James himself answer that question:

“Not one, not two, not three, … not ten, … not eighteen, … not twenty-three, but 24 out of 25.”

OK, LeBron got a bit carried away there. And he forgot to say we improved 25 out of the 25 best-known lower bounds too. This means those pesky optimal solutions are now sandwiched between numbers much closer to each other.

Here’s the approach we took. First, we strengthened a known optimization model for the TUP, making it capable of producing better bounds and better solutions in less time. Then, we used this stronger model to implement a relax-and-fix heuristic. It works as follows. Waiting for the optimization model to find the optimal solution would take forever because there are too many binary decision variables (they tell you which venues each umpire visits in each round of the tournament). At first, we require that only the decisions in round 1 of the tournament be binary (i.e. which games the umpires will be assigned to in round 1) and solve the problem. This solves pretty fast, but allows for umpires to be figuratively cut into pieces and spread over multiple venues in later rounds. Not a problem. That’s the beauty of math models: we test crazy ideas on a computer and don’t slice people in real life. We fix those round-1 decisions, require that only round-2 variables be binary, and solve again. This process gets repeated until the last round. In the end, we are not guaranteed to find the very best solution, but we typically find a pretty good one.

Some possible variations of the above would be to work with two (or more) rounds of binary variables at a time, start from the middle or from the end of the tournament, etc. If you’re interested in more details, our paper can be downloaded here. Our best solutions and lower bounds appear in Table 10 on page 22.

We had a lot of fun working on the TUP, and we hope these new results can help get more people excited about working on this very challenging problem.

Leave a comment

Filed under Applications, Heuristics, Integer Programming, Research, Sports, Traveling Umpire Problem

Adjusting Microwave Cook Times with OR: Inspired By An xkcd Comic

I’m a big fan of xkcd.com, a webcomic written by Randall Munroe. Last Monday’s comic, entitled “Nine”, became the inspiration for this post. Here it is:

The alt-text reads:

FYI: If you get curious and start trying to calculate the time adjustment function that minimizes the gap between the most-used and least-used digit (for a representative sample of common cook times) without altering any time by more than 10%, and someone asks you what you’re doing, it’s easier to just lie.

It seems Randall is trying to find (or has already found) a closed-form function to accomplish this task. I don’t endorse number discrimination either; unlike my wife who insists on adjusting restaurant tips so that the cents portion of the total amount is either zero or 50, but I digress… I’m not sure exactly how to find these adjusted cook times with a closed-form function, but I can certainly do it with OR, more specifically with an integer program. So here we go. For simplicity, I’ll restrict myself to cook times under 10 minutes.

Let’s begin with an example. As I think about the microwave usage in my house, I end up with the following cook times and how often each one is used:

\begin{tabular}{c|c}  {\bf Cook Time} & {\bf Usage Frequency (\%)} \\  \hline  :30 & 20 \\  1:00 & 30 \\  1:30 & 10 \\  2:00 & 30 \\  4:30 & 10  \end{tabular}

So let’s first calculate the usage frequency of each digit from zero to nine. The above table can be interpreted in the following way. For every 100 times I use the microwave, 20 of those times I type a 3 followed by a zero (to input a 30-second cook time), 30 of those times I type a 1 followed by two zeroes, etc. Therefore, during these 100 uses of the microwave, I type a total of 280 digits. Out of those 280 digits, 160 are zeroes, 40 are 1’s, 30 are 2’s, and so on. Hence, the usage frequency of zero—the most-used digit—is \frac{160}{280} \approx 57.1\%. (Usage frequencies for the remaining digits can be calculated in a similar way.) Digits 5 through 9 are apparently never used in my house, so the current difference between the most-used and least-used digit in my house is (57.1-0)%.

If, as Randall suggests, I’m allowed to adjust cook times by no more than 10% (up or down) and I want to minimize the difference in usage between the most-used and least-used digit, here’s one possible adjustment table:

\begin{tabular}{c|c}  {\bf Original Cook Time} & {\bf Adjusted Cook Time} \\  \hline  :30 & :31 \\  1:00 & 0:58 \\  1:30 & 1:36 \\  2:00 & 2:09 \\  4:30 & 4:47  \end{tabular}

Now let’s compare the usage frequency of each digit before and after the adjustment:

\begin{tabular}{c|cc}  & \multicolumn{2}{c}{\bf Usage Frequency (\%)}\\  {\bf Digit} & {\bf Before Adjustment} & {\bf After Adjustment} \\  \hline  0 & 57.1 & 12 \\  1 & 14.3 & 12 \\  2 & 10.7 & 12 \\  3 & 14.3 & 12 \\  4 & 3.6 & 8 \\  5 & 0 & 12 \\  6 & 0 & 4 \\  7 & 0 & 4 \\  8 & 0 & 12 \\  9 & 0 & 12  \end{tabular}

After the adjustment, the most frequently used digits are 0, 1, 2, 3, 5, 8, and 9 (12% of the time), whereas the least frequently used digits are 6 and 7 (4% of the time). The difference now is 12-4=8%, which is significantly less than 57.1%. In my household, there’s absolutely no way to do better than that. Guaranteed! (Note: this doesn’t mean there aren’t other adjustment tables that achieve the same 8% difference. In fact, there are many other ways to achieve the 8% difference.)

If you’re curious about how I computed the time-adjustment table, read on. I’ll explain the optimization model that was run behind the scenes and I’ll even provide you with an Excel spreadsheet that allows you to compute your own adjusted cook times. Let the fun begin!

Let T be the set of typical cook times for the household in question. In my case, T=\{\text{:30, 1:00, 1:30, 2:00, 4:30}\}. For each i \in T, let R(i) be the set of cook times that fall within 10%—or any other range you want—of i. For example, R(\text{:30})=\{\text{:27, :28, :29, :30, :31, :32, :33}\}. In addition, for each i \in T, let f(i) be the usage frequency of cook time i. In my example, f(\text{:30}) = 20, f(\text{1:00})=30, and so on.

For each i \in T and j \in R(i), create a binary variable y_{ij} that is equal to 1 if cook time i is to be adjusted to cook time j, and equal to zero otherwise. There are \sum_{i \in T} |R(i)| such variables. In my example, 119 of them. Because each original cook time has to be adjusted to (or mapped to) a unique (likely different) cook time, the first constraints of our optimization model are

\displaystyle \sum_{j \in R(i)} y_{ij} = 1, \enspace \text{for all} \; i \in T

To be able to calculate the difference between the most-used and least-used digit (in order to minimize it), we need to know how many times each digit is used. Let this quantity be represented by variable z_d, for all d \in \{0,1,\ldots,9\}. In my house, before the adjustment, z_0=160 and z_1=40. We now need to relate variables z_d and y_{ij}.

For each d \in \{0,\ldots,9\} and j \in \bigcup_{i \in T} R(i), let c_d(j) equal the number of times digit d appears in cook time j. For example, c_0(\text{1:00})=2, c_3(\text{1:30})=1, and c_2(\text{4:30})=0. We are now ready to write the following constraint

\displaystyle z_d = \sum_{i \in T} \sum_{j \in R(i)} f(i) c_d(j) y_{ij}, \enspace \text{for all} \; d \in \{0,\ldots,9\}

Once the adjusted cook times are chosen by setting the appropriate y_{ij} variables to 1, the above constraint will count the total number of times each digit d is used, storing that value in z_d.

Our goal is to minimize the maximum difference, in absolute value, between all distinct pairs of z_d variables. Because the absolute value function is not linear, and we want to preserve linearity in our optimization model (why?), I’m going to use a standard modeling trick. Let w be a new variable representing the maximum difference between any distinct pair of z_d variables. The objective function is simple: \text{minimize} \; w. To create a connection between z_d and w, we include the following constraints in the model

\displaystyle z_{d_1} - z_{d_2} \leq w, \enspace \text{for all} \; d_1 \neq d_2 \in \{0,\ldots,9\}

With 10 digits, we end up with 90 such constraints, for a grand total of 105 constraints, plus 130 variables. This model is small enough to be solved with the student version of Excel Solver. I would, however, recommend using OpenSolver, if you can.

Here’s my Excel sheet that implements the model described above. It shouldn’t be too hard to understand, but feel free to ask me questions about it in the comments below. Variable cells are painted gray. Variables y_{ij} are in column E, variables z_d are in the range K3:T3, and variable w is in cell K96. The c_d(j) values and the formulas relating z_d with y_{ij} are calculated with the help of SUMIF functions in the range K3:T3. The differences between all pairs of z_d variables are in column U. All Solver parameters are already typed in. The final values assigned to z_d variables (range K3:T3) represent absolute counts. To obtain the percentages I list above, divide those values by the sum of all z_d‘s. (The same applies to the value of w in cell K96.)

Feel free to modify this spreadsheet with your own favorite cook times and help end number discrimination in the microwaving world! Enjoy!

2 Comments

Filed under Applications, Integer Programming, Modeling

Improving a Homework Problem from Ragsdale’s Textbook

UPDATE (1/15/2013): Cliff Ragsdale was kind enough to include the modification I describe below in the 7th edition of his book (it’s now problem 32 in Chapter 6). He even named a character after me! Thanks, Cliff!

When I teach the OR class to MBA students, I adopt Cliff Ragsdale’s textbook entitled “Spreadsheet Modeling and Decision Analysis“, which is now in its sixth edition. I like this book and I’m used to teaching with it. In addition, it has a large and diverse collection of interesting exercises/problems that I use both as homework problems and as inspiration for exam questions.

One of my favorite problems to assign as homework is problem number 30 in the Integer Linear Programming chapter (Chapter 6). (This number refers to the 6th edition of the book; in the 5th edition it’s problem number 29, and in the 4th edition it’s problem number 26.) Here’s the statement:

The emergency services coordinator of Clarke County is interested in locating the county’s two ambulances to maximize the number of residents that can be reached within four minutes in emergency situations. The county is divided into five regions, and the average times required to travel from one region to the next are summarized in the following table:

The population in regions 1, 2, 3, 4, and 5 are estimated as 45,000,  65,000,  28,000,  52,000, and 43,000, respectively. In which two regions should the ambulances be placed?

I love this problem. It exercises important concepts and unearths many misconceptions. It’s challenging, but not impossible, and it forces students to think about connecting distinct—albeit related—sets of variables; a common omission in models created by novice modelers. BUT, in its present form, in my humble opinion, it falls short of the masterpiece it can be. There are two main issues with the current version of this problem (think about it for a while and you’ll see what I mean):

  1. It’s easy for students to eyeball an optimal solution. So they come back to my office and say: “I don’t know what the point of this problem is; the answer is obviously equal to …” Many of them don’t even try to create a math model.
  2. Even if you model it incorrectly, that is, by choosing the wrong variables which will end up double-counting the number of people covered by the ambulances, the solution that you get is still equal to the correct solution. So when I take points off for the incorrect model, the students come back and say “But I got the right answer!”

After a few years of facing these issues, I decided I had had enough. So I changed the problem data to achieve the following (“evil”) goals:

  1. It’s not as easy to eyeball an optimal solution as it was before.
  2. If you write a model assuming every region has to be covered (which is not a requirement to begin with), you’ll get an infeasible model. In the original case, this doesn’t happen. I didn’t like that because this isn’t an explicit assumption and many students would add it in.
  3. If you pick the wrong set of variables and double-count the number of people covered, you’ll end up with an incorrect (sub-optimal) solution.

These improvements are obtained by adding a sixth region, changing the table of distances, and changing the population numbers as follows:

The new population numbers (in 1000’s) for regions 1 through 6 are, respectively, 21, 35, 15, 60, 20, and 37.

I am now much happier with this problem and my students are getting a lot more out of it (I think). At least I can tell you one thing: they’re spending a lot more time thinking about it and asking me intelligent questions. Isn’t that the whole purpose of homework? Maybe they hate me a bit more now, but I don’t mind practicing some tough love.

Feel free to use my modification if you wish. I’d love to see it included in the 7th edition of Cliff’s book.

Note to instructors: if you want to have the solution to the new version of the problem, including the Excel model, just drop me a line: tallys at miami dot edu.

Note to students: to preserve the usefulness of this problem, I cannot provide you with the solution, but if you become an MBA student at the University of Miami, I’ll give you some hints.

7 Comments

Filed under Books, Integer Programming, Modeling, Teaching

Choosing Summer Camps for Your Kids

Today I’m going to write about a decision that’s made by many American families each year: how to pick summer camps for our kids. There are several issues to take into account, such as cost, benefit, hours, and kids’ preferences. I’ll introduce an optimization model for summer camp selection through a numerical example. The example portrays a large family, but the same ideas apply if a few smaller families want to get together and solve this problem. This way they can take advantage of the discounts and take turns driving the kids around.

The Joneses have six kids: Amy, Beth, Cathy, David, Earl and Fred (yes, their first names are alphabetically sorted, matching their increasing order of age; Mr. and Mrs. Jones always knew they’d have six kids and hence named their firstborn with an ‘F’ name). This year, they’ve narrowed down their list of potential summer camps to the following ten: Math, Chess, Nature, Crafts, Cooking, Gymnastics, Soccer, Tennis, Diving, and Fishing. The Nature camp takes kids on a hike through the woods with the guidance of a biologist; they make frequent stops upon encountering specific plants and animals, during which a mini science lecture is delivered (pretty cool!). The Cooking camp involves cooking chemistry instruction, à la Alton Brown (also pretty cool).

The following table contains some data related to each camp:

The Cost column indicates the cost per child. The Discount column indicates the percentage discount that each child enrolled after the first would receive on the cost of each camp. For instance, if three children are enrolled in Math camp, the first would cost $1100, and the second and third would cost $770 each (30% less). The Hours column is self-explanatory and the last two columns indicate whether or not that particular camp develops mental and physical abilities, respectively (a value of one = yes, zero=no).

The next table shows some of the child-specific requirements:

For example, the Joneses want Fred to attend at least 3 camps that develop mental abilities, and at least 1 camp that develops physical abilities. The last two columns in the above table indicate the minimum and maximum number of camp hours for each child over the 9-week summer break.

The next thing parents need to take into account are their children’s preferences. So here they are:

The smaller the number in the above table, the more desirable a particular camp is. For example, Amy is a bit of a math nerd, and if we were to flip David’s preference scores for Math and Tennis, he could be classified as a bit of a jock. Some conflicts exist, in the sense that not all camps are compatible with each other in terms of time schedules. In this particular case, let’s assume that no child can attend both the Soccer and Tennis camps, or both the Nature and Soccer camps. Here’s how we are going to use this preference table to create a sense of fairness among the children: whenever a child that prefers camp X to camp Y goes to camp Y and doesn’t go to camp X, nobody else gets to go to camp X either. For example, if Amy goes to Nature camp and isn’t sent to either Math or Chess camp, none of her siblings are allowed to go to Math or Chess either. Conversely, if the Joneses decide to send Earl to Chess camp and Fred to Tennis camp, they must also send Earl to Tennis camp (because Earl prefers Tennis to Chess, and “Fred is going! Why can’t I go too!”). Clearly, there are other ways to use/interpret this table, such as trying to send everyone to at least one of their top N choices, but we won’t consider those alternatives here.

After taking all of the above issues and conditions into account, here’s a solution that satisfies all the requirements while resulting in the minimum cost of $22,180.00 (You guessed it…the Joneses are probably *not* among the 99%):

Amy goes to Math, Crafts, Cooking, and Tennis; Beth goes to Math, Cooking, Tennis, and Fishing; Cathy goes to Math, Crafts, Tennis, and Fishing; David goes to Math, Cooking, and Fishing; Earl goes to Math, Tennis, and Fishing; and Fred goes to Math, Crafts, Cooking, and Tennis. Mmm…interestingly, everyone goes to Math camp. I think the Joneses are on to something…

Depending on your own requirements, preferences, and costs your solution may differ, of course. But this should give you an idea of how this simple problem can easily become very complicated to solve. No need to fear, though! Operations Research is here!

Food for Thought: Here’s an interesting question that helps illustrate how high-quality solutions can be counterintuitive: by looking at the preferences table, we see that everyone prefers Soccer to Tennis. In addition, Soccer camp is less expensive than Tennis camp. So how come we send almost everyone to Tennis camp? Isn’t that strange? Let me know what you think in the comments below! That’s one of the advantages of using an analytical approach to decision making: it helps us find solutions we wouldn’t even consider otherwise because they don’t seem to make sense (at least not at first).

If you’re curious about how I managed to find the optimal solution, read on!

Details of the Analysis:

To find the minimum-cost solution, we can create a mathematical representation of the problem, a.k.a. a model, and then solve this model with the help of a computer. Let’s see how.

The first obvious decision to make is who goes where. So let the binary variable x_{ij} equal 1 when child i goes to camp j, and equal to 0 otherwise. We’ll also need another binary variable y_j that is equal to 1 when at least one child goes to camp j and equal to 0 when none of the children go to camp j. We are now ready to write our objective function and constraints. I’ll refer to the problem data using the column headings of the tables above. The subscript i will always refer to a child, and the subscript j will always refer to a camp.

To minimize the total cost, we write the following objective function:

\displaystyle \min \sum_i \sum_j (1-\mathrm{Discount}_j)\mathrm{Cost}_j x_{ij} + \sum_j \mathrm{Discount}_j \mathrm{Cost}_j y_j

Note how we are using the y_j variable to handle the discount for sending more than one child to camp j: we charge every child the discounted price in the double summation and add the discount back in only once if y_j=1.

Now we have to deal with the four requirements: minimum and maximum hours, mental activity, and physical activity. For every child i, we have to write the following four constraints:

\displaystyle \sum_j \mathrm{Hours}_j x_{ij} \geq \mathrm{MinTimeReq}_i

\displaystyle \sum_j \mathrm{Hours}_j x_{ij} \leq \mathrm{MaxTimeReq}_i

\displaystyle \sum_j \mathrm{IsMental}_j x_{ij} \geq \mathrm{MentalReq}_i

\displaystyle \sum_j \mathrm{IsPhysical}_j x_{ij} \geq \mathrm{PhysicalReq}_i

Next, we enforce the preference rules. Let’s recall the example involving Earl and Fred: if Earl goes to Chess camp and someone else (it doesn’t matter who) goes to Tennis camp, then Earl has to go to Tennis camp as well. Here’s what this constraint would look like:

x_{\mathrm{Earl},\mathrm{Chess}} + y_{\mathrm{Tennis}} - x_{\mathrm{Earl},\mathrm{Tennis}} \leq 1

Of course, we have to repeat this constraint for every child i and every pair of camps j_1 and j_2 such that child i prefers j_1 to j_2 in the following way:

x_{ij_2} + y_{j_1} - x_{ij_1} \leq 1

The camp compatibility constraints say that no child i can attend both Soccer and Tennis, or both Nature and Soccer, therefore:

x_{i,\mathrm{Soccer}} + x_{i,\mathrm{Tennis}} \leq 1

x_{i,\mathrm{Nature}} + x_{i,\mathrm{Soccer}} \leq 1

Finally, we need to relate the x_{ij} and y_j variables by stating that unless y_j=1 , no x_{ij} can be equal to 1 . So we write the following constraint for all values of i and j :

x_{ij} \leq y_j

And that’s the end of our model. Here’s a representation of this mathematical model in AMPL in case you want to play with it yourself. This is the model I used to obtain the numerical results reported above. Enjoy!

5 Comments

Filed under Applications, INFORMS Monthly Blog Challenge, Integer Programming, Mathematical Programming, Modeling, Motivation, Promoting OR, Summer camp