Category Archives: Integer Programming

Adjusting Microwave Cook Times with OR: Inspired By An xkcd Comic

I’m a big fan of xkcd.com, a webcomic written by Randall Munroe. Last Monday’s comic, entitled “Nine”, became the inspiration for this post. Here it is:

The alt-text reads:

FYI: If you get curious and start trying to calculate the time adjustment function that minimizes the gap between the most-used and least-used digit (for a representative sample of common cook times) without altering any time by more than 10%, and someone asks you what you’re doing, it’s easier to just lie.

It seems Randall is trying to find (or has already found) a closed-form function to accomplish this task. I don’t endorse number discrimination either; unlike my wife who insists on adjusting restaurant tips so that the cents portion of the total amount is either zero or 50, but I digress… I’m not sure exactly how to find these adjusted cook times with a closed-form function, but I can certainly do it with OR, more specifically with an integer program. So here we go. For simplicity, I’ll restrict myself to cook times under 10 minutes.

Let’s begin with an example. As I think about the microwave usage in my house, I end up with the following cook times and how often each one is used:

\begin{tabular}{c|c}  {\bf Cook Time} & {\bf Usage Frequency (\%)} \\  \hline  :30 & 20 \\  1:00 & 30 \\  1:30 & 10 \\  2:00 & 30 \\  4:30 & 10  \end{tabular}

So let’s first calculate the usage frequency of each digit from zero to nine. The above table can be interpreted in the following way. For every 100 times I use the microwave, 20 of those times I type a 3 followed by a zero (to input a 30-second cook time), 30 of those times I type a 1 followed by two zeroes, etc. Therefore, during these 100 uses of the microwave, I type a total of 280 digits. Out of those 280 digits, 160 are zeroes, 40 are 1′s, 30 are 2′s, and so on. Hence, the usage frequency of zero—the most-used digit—is \frac{160}{280} \approx 57.1\%. (Usage frequencies for the remaining digits can be calculated in a similar way.) Digits 5 through 9 are apparently never used in my house, so the current difference between the most-used and least-used digit in my house is (57.1-0)%.

If, as Randall suggests, I’m allowed to adjust cook times by no more than 10% (up or down) and I want to minimize the difference in usage between the most-used and least-used digit, here’s one possible adjustment table:

\begin{tabular}{c|c}  {\bf Original Cook Time} & {\bf Adjusted Cook Time} \\  \hline  :30 & :31 \\  1:00 & 0:58 \\  1:30 & 1:36 \\  2:00 & 2:09 \\  4:30 & 4:47  \end{tabular}

Now let’s compare the usage frequency of each digit before and after the adjustment:

\begin{tabular}{c|cc}  & \multicolumn{2}{c}{\bf Usage Frequency (\%)}\\  {\bf Digit} & {\bf Before Adjustment} & {\bf After Adjustment} \\  \hline  0 & 57.1 & 12 \\  1 & 14.3 & 12 \\  2 & 10.7 & 12 \\  3 & 14.3 & 12 \\  4 & 3.6 & 8 \\  5 & 0 & 12 \\  6 & 0 & 4 \\  7 & 0 & 4 \\  8 & 0 & 12 \\  9 & 0 & 12  \end{tabular}

After the adjustment, the most frequently used digits are 0, 1, 2, 3, 5, 8, and 9 (12% of the time), whereas the least frequently used digits are 6 and 7 (4% of the time). The difference now is 12-4=8%, which is significantly less than 57.1%. In my household, there’s absolutely no way to do better than that. Guaranteed! (Note: this doesn’t mean there aren’t other adjustment tables that achieve the same 8% difference. In fact, there are many other ways to achieve the 8% difference.)

If you’re curious about how I computed the time-adjustment table, read on. I’ll explain the optimization model that was run behind the scenes and I’ll even provide you with an Excel spreadsheet that allows you to compute your own adjusted cook times. Let the fun begin!

Let T be the set of typical cook times for the household in question. In my case, T=\{\text{:30, 1:00, 1:30, 2:00, 4:30}\}. For each i \in T, let R(i) be the set of cook times that fall within 10%—or any other range you want—of i. For example, R(\text{:30})=\{\text{:27, :28, :29, :30, :31, :32, :33}\}. In addition, for each i \in T, let f(i) be the usage frequency of cook time i. In my example, f(\text{:30}) = 20, f(\text{1:00})=30, and so on.

For each i \in T and j \in R(i), create a binary variable y_{ij} that is equal to 1 if cook time i is to be adjusted to cook time j, and equal to zero otherwise. There are \sum_{i \in T} |R(i)| such variables. In my example, 119 of them. Because each original cook time has to be adjusted to (or mapped to) a unique (likely different) cook time, the first constraints of our optimization model are

\displaystyle \sum_{j \in R(i)} y_{ij} = 1, \enspace \text{for all} \; i \in T

To be able to calculate the difference between the most-used and least-used digit (in order to minimize it), we need to know how many times each digit is used. Let this quantity be represented by variable z_d, for all d \in \{0,1,\ldots,9\}. In my house, before the adjustment, z_0=160 and z_1=40. We now need to relate variables z_d and y_{ij}.

For each d \in \{0,\ldots,9\} and j \in \bigcup_{i \in T} R(i), let c_d(j) equal the number of times digit d appears in cook time j. For example, c_0(\text{1:00})=2, c_3(\text{1:30})=1, and c_2(\text{4:30})=0. We are now ready to write the following constraint

\displaystyle z_d = \sum_{i \in T} \sum_{j \in R(i)} f(i) c_d(j) y_{ij}, \enspace \text{for all} \; d \in \{0,\ldots,9\}

Once the adjusted cook times are chosen by setting the appropriate y_{ij} variables to 1, the above constraint will count the total number of times each digit d is used, storing that value in z_d.

Our goal is to minimize the maximum difference, in absolute value, between all distinct pairs of z_d variables. Because the absolute value function is not linear, and we want to preserve linearity in our optimization model (why?), I’m going to use a standard modeling trick. Let w be a new variable representing the maximum difference between any distinct pair of z_d variables. The objective function is simple: \text{minimize} \; w. To create a connection between z_d and w, we include the following constraints in the model

\displaystyle z_{d_1} - z_{d_2} \leq w, \enspace \text{for all} \; d_1 \neq d_2 \in \{0,\ldots,9\}

With 10 digits, we end up with 90 such constraints, for a grand total of 105 constraints, plus 130 variables. This model is small enough to be solved with the student version of Excel Solver. I would, however, recommend using OpenSolver, if you can.

Here’s my Excel sheet that implements the model described above. It shouldn’t be too hard to understand, but feel free to ask me questions about it in the comments below. Variable cells are painted gray. Variables y_{ij} are in column E, variables z_d are in the range K3:T3, and variable w is in cell K96. The c_d(j) values and the formulas relating z_d with y_{ij} are calculated with the help of SUMIF functions in the range K3:T3. The differences between all pairs of z_d variables are in column U. All Solver parameters are already typed in. The final values assigned to z_d variables (range K3:T3) represent absolute counts. To obtain the percentages I list above, divide those values by the sum of all z_d‘s. (The same applies to the value of w in cell K96.)

Feel free to modify this spreadsheet with your own favorite cook times and help end number discrimination in the microwaving world! Enjoy!

2 Comments

Filed under Applications, Integer Programming, Modeling

Improving a Homework Problem from Ragsdale’s Textbook

When I teach the OR class to MBA students, I adopt Cliff Ragsdale’s textbook entitled “Spreadsheet Modeling and Decision Analysis“, which is now in its sixth edition. I like this book and I’m used to teaching with it. In addition, it has a large and diverse collection of interesting exercises/problems that I use both as homework problems and as inspiration for exam questions.

One of my favorite problems to assign as homework is problem number 30 in the Integer Linear Programming chapter (Chapter 6). (This number refers to the 6th edition of the book; in the 5th edition it’s problem number 29, and in the 4th edition it’s problem number 26.) Here’s the statement:

The emergency services coordinator of Clarke County is interested in locating the county’s two ambulances to maximize the number of residents that can be reached within four minutes in emergency situations. The county is divided into five regions, and the average times required to travel from one region to the next are summarized in the following table:

The population in regions 1, 2, 3, 4, and 5 are estimated as 45,000,  65,000,  28,000,  52,000, and 43,000, respectively. In which two regions should the ambulances be placed?

I love this problem. It exercises important concepts and unearths many misconceptions. It’s challenging, but not impossible, and it forces students to think about connecting distinct—albeit related—sets of variables; a common omission in models created by novice modelers. BUT, in its present form, in my humble opinion, it falls short of the masterpiece it can be. There are two main issues with the current version of this problem (think about it for a while and you’ll see what I mean):

  1. It’s easy for students to eyeball an optimal solution. So they come back to my office and say: “I don’t know what the point of this problem is; the answer is obviously equal to …” Many of them don’t even try to create a math model.
  2. Even if you model it incorrectly, that is, by choosing the wrong variables which will end up double-counting the number of people covered by the ambulances, the solution that you get is still equal to the correct solution. So when I take points off for the incorrect model, the students come back and say “But I got the right answer!”

After a few years of facing these issues, I decided I had had enough. So I changed the problem data to achieve the following (“evil”) goals:

  1. It’s not as easy to eyeball an optimal solution as it was before.
  2. If you write a model assuming every region has to be covered (which is not a requirement to begin with), you’ll get an infeasible model. In the original case, this doesn’t happen. I didn’t like that because this isn’t an explicit assumption and many students would add it in.
  3. If you pick the wrong set of variables and double-count the number of people covered, you’ll end up with an incorrect (sub-optimal) solution.

These improvements are obtained by adding a sixth region, changing the table of distances, and changing the population numbers as follows:

The new population numbers (in 1000′s) for regions 1 through 6 are, respectively, 21, 35, 15, 60, 20, and 37.

I am now much happier with this problem and my students are getting a lot more out of it (I think). At least I can tell you one thing: they’re spending a lot more time thinking about it and asking me intelligent questions. Isn’t that the whole purpose of homework? Maybe they hate me a bit more now, but I don’t mind practicing some tough love.

Feel free to use my modification if you wish. I’d love to see it included in the 7th edition of Cliff’s book.

Note to instructors: if you want to have the solution to the new version of the problem, including the Excel model, just drop me a line: tallys at miami dot edu.

Note to students: to preserve the usefulness of this problem, I cannot provide you with the solution, but if you become an MBA student at the University of Miami, I’ll give you some hints.

7 Comments

Filed under Books, Integer Programming, Modeling, Teaching

Choosing Summer Camps for Your Kids

Today I’m going to write about a decision that’s made by many American families each year: how to pick summer camps for our kids. There are several issues to take into account, such as cost, benefit, hours, and kids’ preferences. I’ll introduce an optimization model for summer camp selection through a numerical example. The example portrays a large family, but the same ideas apply if a few smaller families want to get together and solve this problem. This way they can take advantage of the discounts and take turns driving the kids around.

The Joneses have six kids: Amy, Beth, Cathy, David, Earl and Fred (yes, their first names are alphabetically sorted, matching their increasing order of age; Mr. and Mrs. Jones always knew they’d have six kids and hence named their firstborn with an ‘F’ name). This year, they’ve narrowed down their list of potential summer camps to the following ten: Math, Chess, Nature, Crafts, Cooking, Gymnastics, Soccer, Tennis, Diving, and Fishing. The Nature camp takes kids on a hike through the woods with the guidance of a biologist; they make frequent stops upon encountering specific plants and animals, during which a mini science lecture is delivered (pretty cool!). The Cooking camp involves cooking chemistry instruction, à la Alton Brown (also pretty cool).

The following table contains some data related to each camp:

The Cost column indicates the cost per child. The Discount column indicates the percentage discount that each child enrolled after the first would receive on the cost of each camp. For instance, if three children are enrolled in Math camp, the first would cost $1100, and the second and third would cost $770 each (30% less). The Hours column is self-explanatory and the last two columns indicate whether or not that particular camp develops mental and physical abilities, respectively (a value of one = yes, zero=no).

The next table shows some of the child-specific requirements:

For example, the Joneses want Fred to attend at least 3 camps that develop mental abilities, and at least 1 camp that develops physical abilities. The last two columns in the above table indicate the minimum and maximum number of camp hours for each child over the 9-week summer break.

The next thing parents need to take into account are their children’s preferences. So here they are:

The smaller the number in the above table, the more desirable a particular camp is. For example, Amy is a bit of a math nerd, and if we were to flip David’s preference scores for Math and Tennis, he could be classified as a bit of a jock. Some conflicts exist, in the sense that not all camps are compatible with each other in terms of time schedules. In this particular case, let’s assume that no child can attend both the Soccer and Tennis camps, or both the Nature and Soccer camps. Here’s how we are going to use this preference table to create a sense of fairness among the children: whenever a child that prefers camp X to camp Y goes to camp Y and doesn’t go to camp X, nobody else gets to go to camp X either. For example, if Amy goes to Nature camp and isn’t sent to either Math or Chess camp, none of her siblings are allowed to go to Math or Chess either. Conversely, if the Joneses decide to send Earl to Chess camp and Fred to Tennis camp, they must also send Earl to Tennis camp (because Earl prefers Tennis to Chess, and “Fred is going! Why can’t I go too!”). Clearly, there are other ways to use/interpret this table, such as trying to send everyone to at least one of their top N choices, but we won’t consider those alternatives here.

After taking all of the above issues and conditions into account, here’s a solution that satisfies all the requirements while resulting in the minimum cost of $22,180.00 (You guessed it…the Joneses are probably *not* among the 99%):

Amy goes to Math, Crafts, Cooking, and Tennis; Beth goes to Math, Cooking, Tennis, and Fishing; Cathy goes to Math, Crafts, Tennis, and Fishing; David goes to Math, Cooking, and Fishing; Earl goes to Math, Tennis, and Fishing; and Fred goes to Math, Crafts, Cooking, and Tennis. Mmm…interestingly, everyone goes to Math camp. I think the Joneses are on to something…

Depending on your own requirements, preferences, and costs your solution may differ, of course. But this should give you an idea of how this simple problem can easily become very complicated to solve. No need to fear, though! Operations Research is here!

Food for Thought: Here’s an interesting question that helps illustrate how high-quality solutions can be counterintuitive: by looking at the preferences table, we see that everyone prefers Soccer to Tennis. In addition, Soccer camp is less expensive than Tennis camp. So how come we send almost everyone to Tennis camp? Isn’t that strange? Let me know what you think in the comments below! That’s one of the advantages of using an analytical approach to decision making: it helps us find solutions we wouldn’t even consider otherwise because they don’t seem to make sense (at least not at first).

If you’re curious about how I managed to find the optimal solution, read on!

Details of the Analysis:

To find the minimum-cost solution, we can create a mathematical representation of the problem, a.k.a. a model, and then solve this model with the help of a computer. Let’s see how.

The first obvious decision to make is who goes where. So let the binary variable x_{ij} equal 1 when child i goes to camp j, and equal to 0 otherwise. We’ll also need another binary variable y_j that is equal to 1 when at least one child goes to camp j and equal to 0 when none of the children go to camp j. We are now ready to write our objective function and constraints. I’ll refer to the problem data using the column headings of the tables above. The subscript i will always refer to a child, and the subscript j will always refer to a camp.

To minimize the total cost, we write the following objective function:

\displaystyle \min \sum_i \sum_j (1-\mathrm{Discount}_j)\mathrm{Cost}_j x_{ij} + \sum_j \mathrm{Discount}_j \mathrm{Cost}_j y_j

Note how we are using the y_j variable to handle the discount for sending more than one child to camp j: we charge every child the discounted price in the double summation and add the discount back in only once if y_j=1.

Now we have to deal with the four requirements: minimum and maximum hours, mental activity, and physical activity. For every child i, we have to write the following four constraints:

\displaystyle \sum_j \mathrm{Hours}_j x_{ij} \geq \mathrm{MinTimeReq}_i

\displaystyle \sum_j \mathrm{Hours}_j x_{ij} \leq \mathrm{MaxTimeReq}_i

\displaystyle \sum_j \mathrm{IsMental}_j x_{ij} \geq \mathrm{MentalReq}_i

\displaystyle \sum_j \mathrm{IsPhysical}_j x_{ij} \geq \mathrm{PhysicalReq}_i

Next, we enforce the preference rules. Let’s recall the example involving Earl and Fred: if Earl goes to Chess camp and someone else (it doesn’t matter who) goes to Tennis camp, then Earl has to go to Tennis camp as well. Here’s what this constraint would look like:

x_{\mathrm{Earl},\mathrm{Chess}} + y_{\mathrm{Tennis}} - x_{\mathrm{Earl},\mathrm{Tennis}} \leq 1

Of course, we have to repeat this constraint for every child i and every pair of camps j_1 and j_2 such that child i prefers j_1 to j_2 in the following way:

x_{ij_2} + y_{j_1} - x_{ij_1} \leq 1

The camp compatibility constraints say that no child i can attend both Soccer and Tennis, or both Nature and Soccer, therefore:

x_{i,\mathrm{Soccer}} + x_{i,\mathrm{Tennis}} \leq 1

x_{i,\mathrm{Nature}} + x_{i,\mathrm{Soccer}} \leq 1

Finally, we need to relate the x_{ij} and y_j variables by stating that unless y_j=1 , no x_{ij} can be equal to 1 . So we write the following constraint for all values of i and j :

x_{ij} \leq y_j

And that’s the end of our model. Here’s a representation of this mathematical model in AMPL in case you want to play with it yourself. This is the model I used to obtain the numerical results reported above. Enjoy!

5 Comments

Filed under Applications, INFORMS Monthly Blog Challenge, Integer Programming, Mathematical Programming, Modeling, Motivation, Promoting OR, Summer camp

Political Districting and OR

When I think about how Operations Research can contribute to the world of politics, one of the first things that come to mind is the redistricting process which, of course, has been receiving a lot of attention after the 2010 census. Here’s an excerpt from the census.gov web site:

Another major use for decennial census data is for geographically defining state legislative districts, a “redistricting” process that begins in 2011. The census data allow state officials to realign congressional and state legislative districts in their states, taking into account population shifts since the last census and assuring equal representation for their constituents in compliance with the “one-person, one-vote” principle of the 1965 Voting Rights Act.

One of my favorite articles on the topic is “An Optimzation-Based Heuristic for Political Districting” by Anuj Mehrotra, Ellis Johnson and George Nemhauser, which appeared in Management Science in 1998 (henceforth referred to as the MJN paper). For a districting plan to be acceptable, it has to be fair/unbiased (to avoid gerrymandering), where fairness can be defined in terms of criteria that districts have to satisfy, such as population equality, contiguity, and compactness. Contiguity means that the district cannot have holes in it. Compactness is a bit more difficult to define, but it essentially means that districts should look more like a circle or square than like a snake or a tree branch. For example, would you say that this state senate district in Florida is compact?

Another important issue is that the methodology used to create the districts be transparent. This is a strong point in favor of using mathematical techniques such as those of Operations Research. The algorithm and software code used to create the districts can be made publicly available for anyone to inspect. Specific questions about the methodology can be answered precisely and unambiguously. There’s no beating around the bush. I applaud efforts such as the Public Mapping Project, which attempts to make the districting process more transparent and increase public participation. Nevertheless, given how difficult the problem is (in both practical and mathematical terms), I believe that it would also be important to consider plans created with the help of OR. As no mathematical model is perfect, those plans will probably need to be fine-tuned by a human being using tools such as this district builder. Of crucial importance, however, is the need for clear rules to guide such modifications. When is it OK to move part of a district’s border? by how much? etc. Given that current OR software tools are capable of producing multiple high-quality solutions at the end of a run, perhaps an even better alternative is to present the decision makers with a myriad of OR-generated and man-made plans, and let them pick the most adequate one. This practice is common in sports scheduling, where the leagues typically ask schedulers to provide them with many alternative solutions to choose from.

At this point, I’d like to switch gears and explore the districting problem in more detail from an OR point of view (using some ideas from the MJN paper). Districts are made up of smaller geographical units, for example counties (or pieces of counties). For illustration purposes, if a state like Ohio, which has 88 counties, is creating 18 congressional districts and those districts are to be made up of counties only, the potential number of distinct districts is

{88 \choose 18} = 2,\!418,\!561,\!960,\!739,\!869,\!780

That number is greater than the number of miles I’d cover if I could travel at the speed of light for 400,000 years! (Without a bathroom break!) Obviously, many of those districts are not legitimate because they violate contiguity and/or compactness constraints. However, even if those invalid districts are eliminated, the remaining number of possible districts is still very large. Ideally, however, we’d like to use district building blocks that are much smaller than counties, such as zip codes, but that is still very challenging (and would increase the above number even more!). In the MJN paper, the authors create an initial plan using counties as building blocks and then fine tune the districts’ borders to equalize populations. Here are the “before” and “after” districting plans for South Carolina:

Pretty good improvement, huh? Their basic methodology is to associate one decision with each potential district indicating whether or not it is part of the final plan (a binary variable). The optimization model then selects a subset of the districts that covers each county exactly once (a set partitioning problem). Because the number of variables is huge (as we’ve seen above), they resort to a technique called branch-and-price. In short, branch-and-price solves an optimization problem by considering only a subset of its variables at a time and by adding in promising variables little-by-little as the search progresses. It’s possible to define mathematically what a missing “good” variable is, and to construct one (i.e. a new district) on-the-fly. During this process, many set partitioning problems have to be solved. Because these intermediate problems are not always solved to optimality, the final solution obtained in MJN, albeit very good, isn’t technically an optimal solution, and that’s why they use the word “heuristic” in the title of their paper.

Acknowledgment: I’d like to thank Brady Hunsaker for providing a nice collection of links in the comments section of this blog post by Michael Trick.

Update: Here is another post I came across this week that also talks about using OR for political districting.

9 Comments

Filed under Applications, INFORMS Monthly Blog Challenge, Integer Programming, Modeling, Motivation, Politics, Promoting OR

The Joy of Baking (Optimally)

‘Tis the season of baking all kinds of things: cookies, cakes, breads, brownies, pies, and my favorite Brazilian dessert “pudim de leite moça“, which is depicted below. Click here for the step-by-step recipe.

Many OR bloggers, such as Laura McLay and Anna Nagurney, actually enjoy baking, and they both have written posts on the subject (e.g. here and here). I happen to include myself in this group and, yes, I made the pudim shown above (using  my mom’s recipe).

My goal today is to approach the art of baking from an optimization point of view. Let’s say you have a long list of items to bake. Perhaps you’re hosting a mega party at your house, or you’re helping your local church or favorite charity with their holiday cooking. You have an oven that can only fit so much at a time (think of area, or volume). Each item to be baked occupies some space in the oven and needs to bake for a specific amount of time. In what order should you bake your items so that you finish as soon as possible? (Side note: it may not be obvious at first sight, but this is the same problem faced by a container port that needs to decide the order in which to unload cargo ships.)

In the OR world, this is a job scheduling problem with a cumulative machine. The jobs are the tasks to be performed (items to bake), the machine (or resource) is the oven. We say the oven is cumulative, as opposed to disjunctive, because it can deal with (bake) multiple items at a time. The unknowns in this optimization problem are the start times of each job (when to begin baking each item). The objective is to minimize the makespan, which is defined as the finish time of the last job (the time at which it’s OK to turn off the oven). Finally, this is a non-preemptive problem because, typically, once you start baking something, it stays in the oven until it’s done.

This problem occurs so often in practice that the Constraint Programming (CP) community created a global constraint to represent it. It’s called the cumulative constraint (what a surprise!). Here’s a reference. For example, let’s say that we have a 10-cubic-foot (cf) oven and we need to bake five items. The baking times (in minutes) are 20, 25, 40, 30, and 30. The space requirements in cf are, respectively, 6, 4, 5, 6, 4. If the time at which we begin baking item i is denoted by the variable s_i, we can write the following in a CP model:

\mathrm{cumulative}([s_1,s_2,s_3,s_4,s_5],[20,25,40,30,30],[6,4,5,6,4],10)

The above constraint makes sure that the start times s_i are such that the capacity of the oven is never exceeded. To minimize the makespan, we have to minimize the maximum among s_1+6, s_2+4, s_3+5, s_4+6, and s_5+4.

It’s easy to incorporate some real-life details into this model. For example:

  • Not every item will be ready to go into the oven at time zero. After all, you’re making them as you go. To take care of this, add a ready-time r_i (i.e. a lower bound) to the appropriate variable: r_i \leq s_i.
  • If a given item does not occupy the entire oven, but you still prefer to bake it alone, just artificially increase its space requirement c_i to be equal to the oven’s capacity C.
  • If you’re baking both savory and sweet things, you probably don’t want to mix them up in the oven. In that case, simply solve the problem twice.
  • If, for some reason, item i must be finished before item j starts baking (e.g. they need different temperatures), just include the constraint s_i + p_i \leq s_j, where p_i is the baking time of item i.

We could, of course, have approached this problem from an Integer Programming point of view. In that case, we’d have binary variables x_{it} that are equal to 1 if you start baking item i at time t, and equal to zero otherwise. For more details on this formulation, including model files and some pretty tough instances, take a look at the CuSPLIB web page.

In the spirit of holiday baking, I will close with some pictures of past baking jobs ran on my house’s machine (a.k.a. oven). Enjoy! :-)

Key Lime Pie

Carrot Oatmeal Cookies (recipe here)

Sparkling Ginger Chip Cookies (recipe here)

Irish Soda Bread

Six-Seed Soda Bread (recipe here)


11 Comments

Filed under Applications, Constraint Programming, CuSPLIB, Food, Holidays, INFORMS Monthly Blog Challenge, Integer Programming, Modeling, People

Using Airline Miles to Buy Magazines: a Hidden Deeper Lesson

Last month, I received a letter from American Airlines’ Rewards Processing Center saying that I have 4735 miles that are about to expire and asking whether I’d be interested in redeeming them for some magazine subscriptions. These were the choices:

Magazine Issues Miles Needed
Afar 6 600
Architectural Digest 12 800
Barron’s 26 1700
Business Week 50 1600
Cat Fancy 12 500
Cigar Aficionado 6 400
Daily Variety 252 5500
Diabetes Forecast 12 500
ESPN the Magazine 26 500
Ebony and Jet 38 800
Elle 12 400
Entertainment Weekly 55 1300
Essence 12 500
Essence 2yrs 24 800
Fortune 25 1400
Golf Digest 12 700
Golf Magazine 12 700
Golfweek 45 1300
Martha Stewart Living 24 1400
Money 12 800
New York Magazine 46 700
Sports Illustrated 56 1400
SI and Golf Mag 68 1500
Sports Illustrated KIDS 12 1000
The Economist 51 3200
The Wall Street Journal 190 2800
US News & World Report 12 700
Variety 50 5500
Wine Spectator 15 900
Wired 12 400

Wow! 30 different magazines! How could I possibly decide…oh, wait! Maybe I can use some analytical techniques to help me make a better decision…what’s that thing called again…O.R.!!!

First, what do I want to accomplish?

a) Get as many issues of whatever magazine as possible: this is what a dentist’s office or any place with a waiting room might want to do. Note that the WSJ subscription provides 190 issues, but let’s say I only want to consider actual magazines. One possible answer is to subscribe to ESPN the Magazine, Ebony and Jet, Entertainment Weekly, New York Magazine, and Sports Illustrated. That will buy me 221 issues and use 4700 of my 4735 miles. Just out of curiosity, I could have gotten 286 issues if I hadn’t excluded the WSJ.

b) Get as many different subscriptions as possible: This one is easy. Just pick the magazines that require the least number of miles. One solution is Afar, Cat Fancy, Cigar Aficionado, Diabetes Forecast, ESPN the Magazine, Elle, Essence, US News & World Report, and Wired. That’s a total of 9 subscriptions, using 4500 of my 4735 miles (a waste of 235 miles).

c) Get the “best” 9 magazines you can afford: Since I know I can get 9 subscriptions, what are the 9 that make me use the most out of my 4735 miles? Maybe they’ll constitute a better set of 9 than those I found in item (b) above (positive correlation between quality and miles needed?). In this case the answer is to substitute New York Magazine for Essence in the set of 9 found in item (b). This alternative totals 144 issues and wastes only 35 miles.

d) Get the 9 magazines that provide the largest total number of issues: In that case I’d want to go with Cat Fancy, Cigar Aficionado, Diabetes Forecast, ESPN the Magazine, Ebony and Jet, Elle, Essence, New York Magazine and Wired. That’s a total of 176 issues and a waste of 35 miles as well.

For those of you who are thinking “who cares?”, don’t let appearances fool you. Say that I am United Way and 4735 is my budget (in thousands of US$). There are 30 charities (“magazines”) asking me for money (the “miles needed”) and telling me that they will help a certain number of people (the “issues”, in thousands). If I fund the charities in rows 9, 10, 12, 21 and 22 of the above table, I’ll spend US$ 4,700,000 and help 221 thousand people. That’s the best I can do! (if I am not allowed to fund the charity in row 26).

Lots of other problems can be cast into this framework. Another example: NASA is sending a spaceship to Mars. My available miles are the spaceship’s cargo capacity, the magazines are scientific experiments to be conducted in space (each requiring equipment to be loaded into the spaceship (the “miles needed”) and promising a certain (quantifiable) scientific benefit (the “issues”). This a generic resource allocation problem and the mathematical model we have used here is called The 0-1 Knapsack Problem.

Technical Details:

Want to solve your own budget allocation problem? Here’s how you can do it. We have n projects that require funding and a budget B . For each project i = 1,\ldots,n , let m_i be how much money it needs and let p_i be its payoff (e.g. lives saved). Let the binary variable x_i be equal to 1 if we decide to allocate money to project i , and equal to zero otherwise.

The objective is to maximize the payoff \sum_{i=1}^n p_i x_i .

The constraint is to respect the budget: \sum_{i=1}^n m_i x_i \leq B .

This model also allows you to include some useful side constraints. For example: “if I fund project i , then I must also fund project j ” would be written as x_i \leq x_j . And “if I fund project i , then I cannot fund project j “, would become x_i + x_j \leq 1 . I’ve talked about these kinds of logical conditions in another post.

Here’s the AMPL file I used to solve the magazine problem and its variations. Enjoy!

P.S.: Being an Economics major and an amazing cook, my wife requested The Economist and Martha Stewart Living; that’s what her objective function told her to do, I guess.

4 Comments

Filed under Applications, Integer Programming, Knapsack, Modeling

Let’s Join O.R. Forces to Crack the 17×17 Challenge

I recently came across this post by bit-player, which refers to this post by William Gasarch, on a very interesting feasibility problem: given an m \times n grid, assign one of c colors to each position on the grid so that no rectangle ends up with the same color in all of its vertices (see the original post for applications of this problem). Many results are known for different values of m , n , and c , but the challenge (which comes with a reward of US$289) is to decide whether a feasible solution exists when m=n=17 and c = 4 .

Apparently, some attempts have been made to use Integer Programming (IP) to solve this problem:

SAT-solvers and IP-programs have been used but have not worked— however, I don’t think they were tried that seriously.

By writing this post, I hope to get enough O.R. people excited to brainstorm together in search of a solution. Given this is a feasibility problem, another method of choice here would be Constraint Programming (more on that later).

I decided to begin with the first IP formulation that came to mind. I didn’t expect it to close the deal, but I wanted to see how far it would take me. Here it is: let the binary variable x_{ijt} equal 1 if the cell in row i and column j receives color t . There are two groups of constraints.

Every cell must have a color:

\sum_{t=1}^c x_{ijt} = 1, \enspace \forall \; i,j

Every rectangle can have at most 3 vertices with the same color:

x_{ijt} + x_{i+a,jt} + x_{i,j+b,t} + x_{i+a,j+b,t} \leq 3, \enspace \forall \; i,j,a,b,t

where

1 \leq i \leq m-1 , 1 \leq j \leq n-1 , 1 \leq a \leq m-i , and 1 \leq b \leq n-j .

The objective function does not matter (in theory), but my limited experiments indicate that it’s better to minimize the sum of all x_{ijt} than it is to maximize it. Gasarch also provides this diagram with a rectangle-free subset of size 74 for a 17 x 17 grid. I then added constraints of the form x_{ij1}=1 for every cell (i,j) with an “R” in the diagram. This may be too restrictive, since it’s not clear to me whether a feasible solution to the 17 x 17 grid must contain that diagram as a subset. If that’s true, the latter constraints help with symmetry breaking and also substantially reduce the problem size. To reduce some of the color symmetry in the problem, I also arbitrarily chose cell (1,1) to contain color 2, i.e. x_{112}=1 . Note that this still allows colors 3 and 4 to be swapped. I could have set x_{123}=1 , but that would mean taking a risk.

Finally, for the 17 x 17 grid, it is suspected that each line and each row will have three colors used four times and one color used five times, hence, I also included the following constraints:

4 \leq \sum_{j=1}^n x_{ijt} \leq 5, \enspace \forall \; i,t

4 \leq \sum_{i=1}^m x_{ijt} \leq 5, \enspace \forall \; j,t

Once again, if my understanding is correct, there’s no formal proof that the above constraints are valid.

Here’s a ZIMPL model that can be used to  generate the .LP files and here’s the 17 x 17 LP. I set CPLEX’s MIP emphasis to 4 (look for hidden feasible solutions) and first ran a 14 x 14 as a warm-up. CPLEX 12.1 finds a feasible solution in under 5 seconds (I substituted 1 for 4 in the color usage constraints above). I stopped the 15 x 15 problem after 12 fruitless hours of search, so 14 x 14 is apparently the largest of the easy instances.

I started the 17 x 17 run last Thursday. The initial IP has 1156 variables and 74,620 constraints. Pre-processing reduces that to 642 variables and 15,775 constraints. After approximately 122 hours and 77,000 nodes on a dual-core machine (3.79 GHz), no solution was found.

Now it’s time for smarter ideas. Here are a few candidates:

1) Try a column-generation approach. It’s easy to find solutions for 8×8 and 9×9 grids, which will appear as sub-grids of a 17×17 solution. So it may be possible to write a set partitioning formulation (with side constraints) that has the 8×8 and 9×9 solutions as variables.

2) Try a meta-heuristic approach (e.g. simulated annealing). This problem is probably one of those for which an almost-feasible solution (like this one) can be very far from a feasible solution.

3) Try constraint programming. I think the main difficulty here will be finding a good variable and value selection heuristic. I started building a Comet model for the problem using the global constraint cardinality; here it is. It still does not include the constraints that fix the color of the rectangle free subset provided by Gasarch, but those are very easy to add (see the code that is commented out).

Let me know what your thoughts and hunches are!

6 Comments

Filed under Challenge, Constraint Programming, Integer Programming, Modeling

Facebook’s Farmville: What’s the Fastest Way to Get Rich?

One can’t help but notice that a lot of people have been playing Farmville on Facebook lately. Some of my relatives, friends, students and co-workers have taken the plunge. I don’t have time to play Farmville, but I thought it would be interesting to study it. It’s a simple and fun game, easy to explain, and apparently addictive.

You begin the game with a 12 by 12 piece of land. That gives you 144 one by one squares in which to plant crops. To simplify the explanation, I’m going to (1) restrict my attention to the four crops initially available (strawberries, eggplant, wheat, and soybeans); (2) assume crops will be harvested as soon as they are ripe (that is, ignore wilting); (3) disregard the random bonuses and gifts that show up every now and then. Blogger Mark Newheiser has a nice post about Farmville in which he includes a useful table. The cost column includes the cost of plowing the land (15 coins) plus the cost of purchasing the seeds. Because the harvest times of all four crops are multiples of four hours, we can think in terms of 4-hour periods.

Question: What’s the largest amount of cash one can obtain in 6 days? (need to rest on the 7th day).

You initially have 200 coins, one ripe and one half-grown square of strawberries, one ripe and one half-grown square of eggplant. By harvesting the two ripe squares, your fortune grows to 323 coins. I’ll also go ahead and clean (delete) the two half-grown squares so that we can begin with a clean slate. It turns out that the answer to the above question is to plant strawberries only, and as much as you can. Here’s the planting schedule:

Period Strawberry Squares
0 12
1 17
2 24
3 34
4 47
5 66
6 92
7 129

At period 8 (after 32 hours), you’ll have enough money to fill all your 144 squares with strawberries and you just continue to plow, plant and harvest them all every 4 hours. After 6 days (36 four-hour periods), you’ll have 44,913 coins (adjusted to take into account that 4 squares were already plowed at time zero). This solution is not very exciting and, moreover, easy to predict. Taking a second look at the cost table, we can see that the profit per hour of strawberries, eggplant, wheat, and soybeans are 2.5, 1, 0.903, and 1.375, respectively. This fact, combined with the fact that strawberries are the fastest to grow, yields the above result. As one progresses through the game, other crops become available and strawberries’ dominance goes away. However, we can show that things can get very complicated by changing the numbers a little bit. If we increase the cost of strawberries to 31 (including plowing), their profit per hour changes to 1 (worse than soybeans). Here’s the new optimal planting schedule for 6 days, which yields a profit of 15,725 coins:

Period Strawberry Squares Soybean Squares
0 2 8
1 3
6 1 15
7 7 1
8 6 2
9 7
12 5 26
13 8
14 13
15 29
16 25 8
17 2 27
18 57
22 16
23 75
24 69
29 16 59
30 85
35 59

Note that the best course of action now is to mix strawberries and soybeans and, at least to me, the proper way of doing it isn’t so obvious (hence the value of using Operations Research). If you’re now wondering whether there would be a situation in which the best idea is to use three different crops, the answer is yes! If we decrease the cost of eggplant to 19 (including plowing), for instance, the optimal planting schedule results in a profit of 18,157 coins and looks like this:

Period Strawberry Squares Soybean Squares Eggplant Squares
0 17
12 1 37 18
13 1
14 1 1
15 1
16 1
17 2
18 123
24 18
26 1
27 1
28 1
29 3
30 123
35 3

In conclusion, I believe that Farmville is a rich decision-making environment that can be used to illustrate some of the analytical techniques from the field of Operations Research. This simple example shows how complicated things can become, even with only four crops!

One possibility of extending this analysis would be to consider that one isn’t willing to log in every 4 hours or so in order to harvest crops, so here’s another question:

Question: Given a schedule of off-line hours (i.e. not available for harvesting), how much money can you make in 6 days?

Finally, I haven’t actually answered the question that appears in the title of this blog post: what’s the fastest way to obtain X coins? I’ll leave these last two questions as an exercise.

Technical Details:

This section of the post is dedicated to those who want to know more about how this analysis was conducted. I used an integer programming model written in AMPL (I think dynamic programming would work too). My variables are: P_{it} = number of squares planted with crop i in period t , H_{it} = number of squares of crop i harvested in period t , A_t = area in use in period t , M_t = money available in period t . Periods go from 0 to T (in our example, T=36 ). So we are interested in maximizing M_T .

The constraints update the values of A_t and M_t based on what you do in period t . Here they are (I’m omitting the details of boundary conditions):

\displaystyle A_t = A_{t-1} + \sum_{i} P_{it} - \sum_{i} H_{it} \enspace \forall \; t

\displaystyle M_t = M_{t-1} - \sum_{i} c_i P_{it} + \sum_{i} r_i H_{it} \enspace \forall \; t

where c_i and r_i are, respectively, the cost and revenue of crop i . Because we don’t consider wilting, H_{it}=P_{i(t-g_i)} , where g_i is the time it takes crop i to grow and ripen. Therefore, the H_{it} variables can be eliminated. Finally, we require M_t \geq 0 and 0 \leq A_t \leq 144 . Here’s the mixed-integer programming (MIP) model in AMPL in case anyone wants to play with it. When it is optimal to use more than one crop, the MIP can get pretty hard to solve, considering its size.

12 Comments

Filed under Applications, Exam Fun, Facebook, Farmville, Integer Programming, Modeling, Teaching

Introducing CuSPLIB

A couple of colleagues and I are doing research on single-machine cumulative scheduling problems (CuSP). As part of this effort, we’ll have to create some benchmark instances on which to test our algorithms. Some time ago, I searched around for problem instances and could not find any. People seem to be more interested in the Resource Constrained Project Scheduling Problem (RCPSP), of which the CuSP is a special case/subproblem. One of the experts in the area told me that he was unaware of any standard CuSP benchmarks and that difficult instances were hard to generate. Therefore, I decided to make our instances public, hoping that (i) this could be helpful/useful to someone else out there, and (ii) this could attract more attention to this problem. As a result, CuSPLIB is born! It includes a few pieces of code (instance generator, MIP and CP models), an initial set of 10 instances, and some discussion about integer programming models for the problem. I intend to talk about other (alternative) models and include some references in the near future. The preliminary computational results are interesting and make me believe that it’s not that difficult to find challenging instances. Let me know what you think, and feel free to contribute to CuSPLIB! I’ll be updating it little by little as our research progresses.

2 Comments

Filed under Constraint Programming, CuSPLIB, Integer Programming, Modeling, Research

Modeling Logical Conditions

circuitBinary variables are extensively used in integer programming (IP) models to represent yes/no decisions (“Should we open a new restaurant in Coconut Grove?”), and logical conditions (“If it rains tomorrow, I will not go to the beach.”). Most people are familiar with the project selection problem in which each variable Xi is equal to 1 if project “i” is selected, and it’s equal to zero otherwise. To model the condition “if project 1 is selected, then project 2 must also be selected” one can write “X1 ≤ X2″, and “if project 2 is selected, project 3 must not be selected” becomes “X2 + X3 ≤ 1″. Even students who are learning about integer programming for the first time are usually capable of coming up with those two constraints after a few minutes (or hours). A little bit of trial-and-error typically gets you there.

What if the logical condition is a tad more complicated? For instance:

If it rains tomorrow or the Heat defeat the Spurs and grandma doesn’t come to visit on the weekend, then I’ll either go to South Beach and not drink a mojito or I’ll fly to Vegas.

I’m sure it only took you about 7 seconds to figure out that the corresponding IP model is

w + q – x ≥ 0

z + w + q ≥ y

1 + q ≥ x + p

y + p – q – z ≤ 1

This assumes that the binary variables x, y, z, w, p, q are assigned the following meanings: x = whether it rains; y = whether Heat defeat Spurs; z = whether grandma comes to visit, etc.

My students always ask me if there’s a more systematic way to come up with the linear constraints other than trial-and-error or memorization of a few simple cases. This led me to write a 4-page handout on the topic. More recently, I searched around for a while (OK, I confess that my search wasn’t very extensive) and found two references: an article on INFORMS Transactions on Education and H. P. Williams’s book. It turns out that Williams (section 9.2) does an excellent job explaining the process (if only I had known that two years ago…).

I hope this handout can still be helpful to someone out there (instead of just sitting in my hard drive). As always, your comments and feedback are welcome.

3 Comments

Filed under Integer Programming, Modeling, Teaching