# Category Archives: Integer Programming

## Adjusting Microwave Cook Times with OR: Inspired By An xkcd Comic

I’m a big fan of xkcd.com, a webcomic written by Randall Munroe. Last Monday’s comic, entitled “Nine”, became the inspiration for this post. Here it is:

> FYI: If you get curious and start trying to calculate the time adjustment function that minimizes the gap between the most-used and least-used digit (for a representative sample of common cook times) without altering any time by more than 10%, and someone asks you what you’re doing, it’s easier to just lie.

It seems Randall is trying to find (or has already found) a closed-form function to accomplish this task. I don’t endorse number discrimination either, unlike my wife, who insists on adjusting restaurant tips so that the cents portion of the total amount is either zero or 50. But I digress… I’m not sure exactly how to find these adjusted cook times with a closed-form function, but I can certainly do it with OR, more specifically with an integer program. So here we go. For simplicity, I’ll restrict myself to cook times under 10 minutes.

Let’s begin with an example. As I think about the microwave usage in my house, I end up with the following cook times and how often each one is used:

$\begin{tabular}{c|c} {\bf Cook Time} & {\bf Usage Frequency (\%)} \\ \hline :30 & 20 \\ 1:00 & 30 \\ 1:30 & 10 \\ 2:00 & 30 \\ 4:30 & 10 \end{tabular}$

So let’s first calculate the usage frequency of each digit from zero to nine. The above table can be interpreted in the following way. For every 100 times I use the microwave, 20 of those times I type a 3 followed by a zero (to input a 30-second cook time), 30 of those times I type a 1 followed by two zeroes, etc. Therefore, during these 100 uses of the microwave, I type a total of 280 digits. Out of those 280 digits, 160 are zeroes, 40 are 1’s, 30 are 2’s, and so on. Hence, the usage frequency of zero—the most-used digit—is $\frac{160}{280} \approx 57.1\%$. (Usage frequencies for the remaining digits can be calculated in a similar way.) Digits 5 through 9 are apparently never used in my house, so the current difference between the most-used and least-used digit in my house is 57.1% − 0% = 57.1%.
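These tallies are easy to reproduce programmatically. Here’s a quick pure-Python sketch, with each cook time written as the digit string actually typed on the keypad:

```python
from collections import Counter

# Cook times as the keypad digit strings they require, with usage
# frequencies per 100 microwave uses (from the table above).
usage = {"30": 20, "100": 30, "130": 10, "200": 30, "430": 10}

counts = Counter()
for keys, freq in usage.items():
    for digit in keys:
        counts[digit] += freq

total_digits = sum(counts.values())                          # 280 digits per 100 uses
most_used = max(counts.values())                             # 160 zeroes
least_used = min(counts.get(str(d), 0) for d in range(10))   # digits 5-9: 0

gap = 100 * (most_used - least_used) / total_digits          # about 57.1%
```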

If, as Randall suggests, I’m allowed to adjust cook times by no more than 10% (up or down) and I want to minimize the difference in usage between the most-used and least-used digit, here’s one possible adjustment table:

$\begin{tabular}{c|c} {\bf Original Cook Time} & {\bf Adjusted Cook Time} \\ \hline :30 & :31 \\ 1:00 & 0:58 \\ 1:30 & 1:36 \\ 2:00 & 2:09 \\ 4:30 & 4:47 \end{tabular}$

Now let’s compare the usage frequency of each digit before and after the adjustment:

$\begin{tabular}{c|cc} & \multicolumn{2}{c}{\bf Usage Frequency (\%)}\\ {\bf Digit} & {\bf Before Adjustment} & {\bf After Adjustment} \\ \hline 0 & 57.1 & 12 \\ 1 & 14.3 & 12 \\ 2 & 10.7 & 12 \\ 3 & 14.3 & 12 \\ 4 & 3.6 & 8 \\ 5 & 0 & 12 \\ 6 & 0 & 4 \\ 7 & 0 & 4 \\ 8 & 0 & 12 \\ 9 & 0 & 12 \end{tabular}$

After the adjustment, the most frequently used digits are 0, 1, 2, 3, 5, 8, and 9 (12% of the time), whereas the least frequently used digits are 6 and 7 (4% of the time). The difference now is 12-4=8%, which is significantly less than 57.1%. In my household, there’s absolutely no way to do better than that. Guaranteed! (Note: this doesn’t mean there aren’t other adjustment tables that achieve the same 8% difference. In fact, there are many other ways to achieve the 8% difference.)
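The same bookkeeping verifies the adjusted table, including the 10% rule. (The `keypad` helper below is just for illustration; it converts seconds back into the digits you’d type.)

```python
from collections import Counter

# (original seconds, adjusted seconds, usage frequency) for each cook time
adjustments = [(30, 31, 20), (60, 58, 30), (90, 96, 10),
               (120, 129, 30), (270, 287, 10)]

def keypad(seconds):
    """Digit string typed on the keypad, e.g. 96 s -> '136' (1:36)."""
    m, s = divmod(seconds, 60)
    return f"{m}{s:02d}" if m else str(s)

counts = Counter()
for old, new, freq in adjustments:
    assert abs(new - old) <= 0.10 * old   # each change is within the 10% limit
    for digit in keypad(new):
        counts[digit] += freq

total = sum(counts.values())              # 250 digits per 100 uses
freqs = [100 * counts[str(d)] / total for d in range(10)]
gap = max(freqs) - min(freqs)             # 12% - 4% = 8%
```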

If you’re curious about how I computed the time-adjustment table, read on. I’ll explain the optimization model that was run behind the scenes and I’ll even provide you with an Excel spreadsheet that allows you to compute your own adjusted cook times. Let the fun begin!

Let $T$ be the set of typical cook times for the household in question. In my case, $T=\{\text{:30, 1:00, 1:30, 2:00, 4:30}\}$. For each $i \in T$, let $R(i)$ be the set of cook times that fall within 10%—or any other range you want—of $i$. For example, $R(\text{:30})=\{\text{:27, :28, :29, :30, :31, :32, :33}\}$. In addition, for each $i \in T$, let $f(i)$ be the usage frequency of cook time $i$. In my example, $f(\text{:30}) = 20$, $f(\text{1:00})=30$, and so on.

For each $i \in T$ and $j \in R(i)$, create a binary variable $y_{ij}$ that is equal to 1 if cook time $i$ is to be adjusted to cook time $j$, and equal to zero otherwise. There are $\sum_{i \in T} |R(i)|$ such variables. In my example, 119 of them. Because each original cook time has to be adjusted to (or mapped to) exactly one (possibly different) cook time, the first constraints of our optimization model are

$\displaystyle \sum_{j \in R(i)} y_{ij} = 1, \enspace \text{for all} \; i \in T$

To be able to calculate the difference between the most-used and least-used digit (in order to minimize it), we need to know how many times each digit is used. Let this quantity be represented by variable $z_d$, for all $d \in \{0,1,\ldots,9\}$. In my house, before the adjustment, $z_0=160$ and $z_1=40$. We now need to relate variables $z_d$ and $y_{ij}$.

For each $d \in \{0,\ldots,9\}$ and $j \in \bigcup_{i \in T} R(i)$, let $c_d(j)$ equal the number of times digit $d$ appears in cook time $j$. For example, $c_0(\text{1:00})=2$, $c_3(\text{1:30})=1$, and $c_2(\text{4:30})=0$. We are now ready to write the following constraint

$\displaystyle z_d = \sum_{i \in T} \sum_{j \in R(i)} f(i) c_d(j) y_{ij}, \enspace \text{for all} \; d \in \{0,\ldots,9\}$

Once the adjusted cook times are chosen by setting the appropriate $y_{ij}$ variables to 1, the above constraint will count the total number of times each digit $d$ is used, storing that value in $z_d$.

Our goal is to minimize the maximum difference, in absolute value, between all distinct pairs of $z_d$ variables. Because the absolute value function is not linear, and we want to preserve linearity in our optimization model (why?), I’m going to use a standard modeling trick. Let $w$ be a new variable representing the maximum difference between any distinct pair of $z_d$ variables. The objective function is simple: $\text{minimize} \; w$. To create a connection between $z_d$ and $w$, we include the following constraints in the model

$\displaystyle z_{d_1} - z_{d_2} \leq w, \enspace \text{for all} \; d_1 \neq d_2 \in \{0,\ldots,9\}$

With 10 digits, we end up with 90 such constraints, for a grand total of 105 constraints, plus 130 variables. This model is small enough to be solved with the student version of Excel Solver. I would, however, recommend using OpenSolver, if you can.
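If you’d like to double-check those counts, they follow directly from the sizes of the $R(i)$ sets. A quick sketch, assuming candidate cook times are whole seconds (integer arithmetic avoids floating-point surprises at the range endpoints):

```python
times = [30, 60, 90, 120, 270]            # typical cook times, in seconds

# R(i): integer cook times within 10% of i, e.g. R(30) = 27..33
R = {i: list(range(-(-9 * i // 10), 11 * i // 10 + 1)) for i in times}

num_y = sum(len(R[i]) for i in times)     # y_ij variables: 119
num_vars = num_y + 10 + 1                 # plus z_0..z_9 and w: 130
num_cons = len(times) + 10 + 10 * 9       # assignment + z-definition + pairwise: 105
```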

Here’s my Excel sheet that implements the model described above. It shouldn’t be too hard to understand, but feel free to ask me questions about it in the comments below. Variable cells are painted gray. Variables $y_{ij}$ are in column E, variables $z_d$ are in the range K3:T3, and variable $w$ is in cell K96. The $c_d(j)$ values and the formulas relating $z_d$ with $y_{ij}$ are calculated with the help of SUMIF functions in the range K3:T3. The differences between all pairs of $z_d$ variables are in column U. All Solver parameters are already typed in. The final values assigned to $z_d$ variables (range K3:T3) represent absolute counts. To obtain the percentages I list above, divide those values by the sum of all $z_d$’s. (The same applies to the value of $w$ in cell K96.)

Feel free to modify this spreadsheet with your own favorite cook times and help end number discrimination in the microwaving world! Enjoy!

Filed under Applications, Integer Programming, Modeling

## Improving a Homework Problem from Ragsdale’s Textbook

When I teach the OR class to MBA students, I adopt Cliff Ragsdale’s textbook entitled “Spreadsheet Modeling and Decision Analysis”, which is now in its sixth edition. I like this book and I’m used to teaching with it. In addition, it has a large and diverse collection of interesting exercises/problems that I use both as homework problems and as inspiration for exam questions.

One of my favorite problems to assign as homework is problem number 30 in the Integer Linear Programming chapter (Chapter 6). (This number refers to the 6th edition of the book; in the 5th edition it’s problem number 29, and in the 4th edition it’s problem number 26.) Here’s the statement:

The emergency services coordinator of Clarke County is interested in locating the county’s two ambulances to maximize the number of residents that can be reached within four minutes in emergency situations. The county is divided into five regions, and the average times required to travel from one region to the next are summarized in the following table:

The population in regions 1, 2, 3, 4, and 5 are estimated as 45,000,  65,000,  28,000,  52,000, and 43,000, respectively. In which two regions should the ambulances be placed?

I love this problem. It exercises important concepts and unearths many misconceptions. It’s challenging, but not impossible, and it forces students to think about connecting distinct—albeit related—sets of variables; a common omission in models created by novice modelers. BUT, in its present form, in my humble opinion, it falls short of the masterpiece it can be. There are two main issues with the current version of this problem (think about it for a while and you’ll see what I mean):

1. It’s easy for students to eyeball an optimal solution. So they come back to my office and say: “I don’t know what the point of this problem is; the answer is obviously equal to …” Many of them don’t even try to create a math model.
2. Even if you model it incorrectly, that is, by choosing the wrong variables which will end up double-counting the number of people covered by the ambulances, the solution that you get is still equal to the correct solution. So when I take points off for the incorrect model, the students come back and say “But I got the right answer!”

After a few years of facing these issues, I decided I had had enough. So I changed the problem data to achieve the following (“evil”) goals:

1. It’s not as easy to eyeball an optimal solution as it was before.
2. If you write a model assuming every region has to be covered (which is not a requirement to begin with), you’ll get an infeasible model. In the original case, this doesn’t happen. I didn’t like that because this isn’t an explicit assumption and many students would add it in.
3. If you pick the wrong set of variables and double-count the number of people covered, you’ll end up with an incorrect (sub-optimal) solution.

These improvements are obtained by adding a sixth region, changing the table of distances, and changing the population numbers as follows:

The new population numbers (in thousands) for regions 1 through 6 are, respectively, 21, 35, 15, 60, 20, and 37.
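To see why goal 3 matters, here’s a toy illustration with made-up data (these are not the book’s numbers): the correct model counts each covered resident once, via the union of the two coverage sets, while the double-counting objective can steer you to a worse pair.

```python
from itertools import combinations

# Hypothetical data (NOT the textbook's): cover[r] is the set of regions
# reachable within four minutes from an ambulance placed in region r.
cover = {1: {1, 2}, 2: {1, 2, 3}, 3: {3, 4}, 4: {4}}
pop = {1: 40, 2: 30, 3: 20, 4: 10}   # population, in thousands

def covered_pop(pair):
    """Correct objective: each covered region counted once (union)."""
    regions = cover[pair[0]] | cover[pair[1]]
    return sum(pop[r] for r in regions)

def double_counted(pair):
    """Incorrect objective: sums coverage per ambulance, double-counting."""
    return sum(pop[r] for a in pair for r in cover[a])

best = max(combinations(cover, 2), key=covered_pop)      # truly optimal pair
naive = max(combinations(cover, 2), key=double_counted)  # what the wrong model picks
```

Here the naive model picks the pair `(1, 2)` because it "covers" 160 (region 1 and 2 residents counted twice), yet its true coverage is only 90, while the correct model finds a pair covering 100.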

I am now much happier with this problem and my students are getting a lot more out of it (I think). At least I can tell you one thing: they’re spending a lot more time thinking about it and asking me intelligent questions. Isn’t that the whole purpose of homework? Maybe they hate me a bit more now, but I don’t mind practicing some tough love.

Feel free to use my modification if you wish. I’d love to see it included in the 7th edition of Cliff’s book.

Note to instructors: if you want to have the solution to the new version of the problem, including the Excel model, just drop me a line: tallys at miami dot edu.

Note to students: to preserve the usefulness of this problem, I cannot provide you with the solution, but if you become an MBA student at the University of Miami, I’ll give you some hints.

Filed under Books, Integer Programming, Modeling, Teaching

## The Joy of Baking (Optimally)

‘Tis the season of baking all kinds of things: cookies, cakes, breads, brownies, pies, and my favorite Brazilian dessert “pudim de leite moça“, which is depicted below. Click here for the step-by-step recipe.

Many OR bloggers, such as Laura McLay and Anna Nagurney, actually enjoy baking, and they both have written posts on the subject (e.g. here and here). I happen to include myself in this group and, yes, I made the pudim shown above (using my mom’s recipe).

My goal today is to approach the art of baking from an optimization point of view. Let’s say you have a long list of items to bake. Perhaps you’re hosting a mega party at your house, or you’re helping your local church or favorite charity with their holiday cooking. You have an oven that can only fit so much at a time (think of area, or volume). Each item to be baked occupies some space in the oven and needs to bake for a specific amount of time. In what order should you bake your items so that you finish as soon as possible? (Side note: it may not be obvious at first sight, but this is the same problem faced by a container port that needs to decide the order in which to unload cargo ships.)

In the OR world, this is a job scheduling problem with a cumulative machine. The jobs are the tasks to be performed (items to bake); the machine (or resource) is the oven. We say the oven is cumulative, as opposed to disjunctive, because it can deal with (bake) multiple items at a time. The unknowns in this optimization problem are the start times of each job (when to begin baking each item). The objective is to minimize the makespan, which is defined as the finish time of the last job (the time at which it’s OK to turn off the oven). Finally, this is a non-preemptive problem because, typically, once you start baking something, it stays in the oven until it’s done.

This problem occurs so often in practice that the Constraint Programming (CP) community created a global constraint to represent it. It’s called the cumulative constraint (what a surprise!). Here’s a reference. For example, let’s say that we have a $10$-cubic-foot (cf) oven and we need to bake five items. The baking times (in minutes) are $20$, $25$, $40$, $30$, and $30$. The space requirements in cf are, respectively, $6$, $4$, $5$, $6$, $4$. If the time at which we begin baking item $i$ is denoted by the variable $s_i$, we can write the following in a CP model:

$\mathrm{cumulative}([s_1,s_2,s_3,s_4,s_5],[20,25,40,30,30],[6,4,5,6,4],10)$

The above constraint makes sure that the start times $s_i$ are such that the capacity of the oven is never exceeded. To minimize the makespan, we have to minimize the maximum among $s_1+20$, $s_2+25$, $s_3+40$, $s_4+30$, and $s_5+30$ (each start time plus its baking time).
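To make the cumulative requirement concrete, here’s a small pure-Python sketch that checks a candidate schedule against the oven’s capacity. The schedule below is one feasible choice for the five-item example (a 90-minute makespan), not necessarily the only optimal one:

```python
durations = [20, 25, 40, 30, 30]   # baking times in minutes
sizes     = [6, 4, 5, 6, 4]        # space requirements in cubic feet
capacity  = 10                     # oven capacity in cubic feet

def is_feasible(starts):
    """True if total oven load never exceeds capacity (the cumulative check).
    The load only changes at start/finish events, so checking those suffices."""
    events = sorted({t for s, d in zip(starts, durations) for t in (s, s + d)})
    for t in events:
        load = sum(c for s, d, c in zip(starts, durations, sizes)
                   if s <= t < s + d)
        if load > capacity:
            return False
    return True

def makespan(starts):
    return max(s + d for s, d in zip(starts, durations))

starts = [40, 0, 0, 60, 25]        # one feasible schedule (makespan 90)
```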

It’s easy to incorporate some real-life details into this model. For example:

• Not every item will be ready to go into the oven at time zero. After all, you’re making them as you go. To take care of this, add a ready-time $r_i$ (i.e. a lower bound) to the appropriate variable: $r_i \leq s_i$.
• If a given item does not occupy the entire oven, but you still prefer to bake it alone, just artificially increase its space requirement $c_i$ to be equal to the oven’s capacity $C$.
• If you’re baking both savory and sweet things, you probably don’t want to mix them up in the oven. In that case, simply solve the problem twice.
• If, for some reason, item $i$ must be finished before item $j$ starts baking (e.g. they need different temperatures), just include the constraint $s_i + p_i \leq s_j$, where $p_i$ is the baking time of item $i$.

We could, of course, have approached this problem from an Integer Programming point of view. In that case, we’d have binary variables $x_{it}$ that are equal to 1 if you start baking item $i$ at time $t$, and equal to zero otherwise. For more details on this formulation, including model files and some pretty tough instances, take a look at the CuSPLIB web page.

In the spirit of holiday baking, I will close with some pictures of past baking jobs run on my house’s machine (a.k.a. oven). Enjoy! :-)

Key Lime Pie

Sparkling Ginger Chip Cookies (recipe here)

## Using Airline Miles to Buy Magazines: a Hidden Deeper Lesson

Last month, I received a letter from American Airlines’ Rewards Processing Center saying that I have 4735 miles that are about to expire and asking whether I’d be interested in redeeming them for some magazine subscriptions. These were the choices:

| Magazine | Issues | Miles Needed |
| --- | --- | --- |
| Afar | 6 | 600 |
| Architectural Digest | 12 | 800 |
| Barron’s | 26 | 1700 |
| Business Week | 50 | 1600 |
| Cat Fancy | 12 | 500 |
| Cigar Aficionado | 6 | 400 |
| Daily Variety | 252 | 5500 |
| Diabetes Forecast | 12 | 500 |
| ESPN the Magazine | 26 | 500 |
| Ebony and Jet | 38 | 800 |
| Elle | 12 | 400 |
| Entertainment Weekly | 55 | 1300 |
| Essence | 12 | 500 |
| Essence 2yrs | 24 | 800 |
| Fortune | 25 | 1400 |
| Golf Digest | 12 | 700 |
| Golf Magazine | 12 | 700 |
| Golfweek | 45 | 1300 |
| Martha Stewart Living | 24 | 1400 |
| Money | 12 | 800 |
| New York Magazine | 46 | 700 |
| Sports Illustrated | 56 | 1400 |
| SI and Golf Mag | 68 | 1500 |
| Sports Illustrated KIDS | 12 | 1000 |
| The Economist | 51 | 3200 |
| The Wall Street Journal | 190 | 2800 |
| US News & World Report | 12 | 700 |
| Variety | 50 | 5500 |
| Wine Spectator | 15 | 900 |
| Wired | 12 | 400 |

Wow! 30 different magazines! How could I possibly decide…oh, wait! Maybe I can use some analytical techniques to help me make a better decision…what’s that thing called again…O.R.!!!

First, what do I want to accomplish?

a) Get as many issues of whatever magazine as possible: this is what a dentist’s office or any place with a waiting room might want to do. Note that the WSJ subscription provides 190 issues, but let’s say I only want to consider actual magazines. One possible answer is to subscribe to ESPN the Magazine, Ebony and Jet, Entertainment Weekly, New York Magazine, and Sports Illustrated. That will buy me 221 issues and use 4700 of my 4735 miles. Just out of curiosity, I could have gotten 286 issues if I hadn’t excluded the WSJ.

b) Get as many different subscriptions as possible: This one is easy. Just pick the magazines that require the least number of miles. One solution is Afar, Cat Fancy, Cigar Aficionado, Diabetes Forecast, ESPN the Magazine, Elle, Essence, US News & World Report, and Wired. That’s a total of 9 subscriptions, using 4500 of my 4735 miles (a waste of 235 miles).

c) Get the “best” 9 magazines you can afford: Since I know I can get 9 subscriptions, what are the 9 that make me use the most out of my 4735 miles? Maybe they’ll constitute a better set of 9 than those I found in item (b) above (positive correlation between quality and miles needed?). In this case the answer is to substitute New York Magazine for Essence in the set of 9 found in item (b). This alternative totals 144 issues and wastes only 35 miles.

d) Get the 9 magazines that provide the largest total number of issues: In that case I’d want to go with Cat Fancy, Cigar Aficionado, Diabetes Forecast, ESPN the Magazine, Ebony and Jet, Elle, Essence, New York Magazine and Wired. That’s a total of 176 issues and a waste of 35 miles as well.

For those of you who are thinking “who cares?”, don’t let appearances fool you. Say that I am United Way and 4735 is my budget (in thousands of US$). There are 30 charities (“magazines”) asking me for money (the “miles needed”) and telling me that they will help a certain number of people (the “issues”, in thousands). If I fund the charities in rows 9, 10, 12, 21 and 22 of the above table, I’ll spend US$ 4,700,000 and help 221 thousand people. That’s the best I can do! (if I am not allowed to fund the charity in row 26).

Lots of other problems can be cast into this framework. Another example: NASA is sending a spaceship to Mars. My available miles are the spaceship’s cargo capacity, and the magazines are scientific experiments to be conducted in space, each requiring equipment to be loaded into the spaceship (the “miles needed”) and promising a certain (quantifiable) scientific benefit (the “issues”). This is a generic resource allocation problem and the mathematical model we have used here is called The 0-1 Knapsack Problem.

Technical Details:

Want to solve your own budget allocation problem? Here’s how you can do it. We have $n$ projects that require funding and a budget $B$. For each project $i = 1,\ldots,n$, let $m_i$ be how much money it needs and let $p_i$ be its payoff (e.g. lives saved). Let the binary variable $x_i$ be equal to 1 if we decide to allocate money to project $i$, and equal to zero otherwise.

The objective is to maximize the payoff $\sum_{i=1}^n p_i x_i$.

The constraint is to respect the budget: $\sum_{i=1}^n m_i x_i \leq B$.
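At this size you don’t even need an IP solver: a textbook 0-1 knapsack dynamic program over the miles budget reproduces scenario (a). This is a pure-Python sketch, with the data transcribed from the table above:

```python
# (magazine, issues, miles) in the order of the table above
mags = [
    ("Afar", 6, 600), ("Architectural Digest", 12, 800), ("Barron's", 26, 1700),
    ("Business Week", 50, 1600), ("Cat Fancy", 12, 500), ("Cigar Aficionado", 6, 400),
    ("Daily Variety", 252, 5500), ("Diabetes Forecast", 12, 500),
    ("ESPN the Magazine", 26, 500), ("Ebony and Jet", 38, 800), ("Elle", 12, 400),
    ("Entertainment Weekly", 55, 1300), ("Essence", 12, 500), ("Essence 2yrs", 24, 800),
    ("Fortune", 25, 1400), ("Golf Digest", 12, 700), ("Golf Magazine", 12, 700),
    ("Golfweek", 45, 1300), ("Martha Stewart Living", 24, 1400), ("Money", 12, 800),
    ("New York Magazine", 46, 700), ("Sports Illustrated", 56, 1400),
    ("SI and Golf Mag", 68, 1500), ("Sports Illustrated KIDS", 12, 1000),
    ("The Economist", 51, 3200), ("The Wall Street Journal", 190, 2800),
    ("US News & World Report", 12, 700), ("Variety", 50, 5500),
    ("Wine Spectator", 15, 900), ("Wired", 12, 400),
]

def best_issues(items, budget):
    """Classic 0-1 knapsack DP: maximize total issues within the miles budget."""
    dp = [0] * (budget + 1)
    for _, issues, miles in items:
        for b in range(budget, miles - 1, -1):
            dp[b] = max(dp[b], dp[b - miles] + issues)
    return dp[budget]

no_wsj = [m for m in mags if m[0] != "The Wall Street Journal"]
```

Running `best_issues(no_wsj, 4735)` gives the 221 issues of scenario (a), and `best_issues(mags, 4735)` gives the 286 issues mentioned there once the WSJ is allowed back in.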

This model also allows you to include some useful side constraints. For example: “if I fund project $i$, then I must also fund project $j$” would be written as $x_i \leq x_j$. And “if I fund project $i$, then I cannot fund project $j$” would become $x_i + x_j \leq 1$. I’ve talked about these kinds of logical conditions in another post.

Here’s the AMPL file I used to solve the magazine problem and its variations. Enjoy!

P.S.: Being an Economics major and an amazing cook, my wife requested The Economist and Martha Stewart Living; that’s what her objective function told her to do, I guess.

Filed under Applications, Integer Programming, Knapsack, Modeling

## Let’s Join O.R. Forces to Crack the 17×17 Challenge

I recently came across this post by bit-player, which refers to this post by William Gasarch, on a very interesting feasibility problem: given an $m \times n$ grid, assign one of $c$ colors to each position on the grid so that no rectangle ends up with the same color in all of its vertices (see the original post for applications of this problem). Many results are known for different values of $m$, $n$, and $c$, but the challenge (which comes with a reward of US\$289) is to decide whether a feasible solution exists when $m=n=17$ and $c = 4$.

Apparently, some attempts have been made to use Integer Programming (IP) to solve this problem:

> SAT-solvers and IP-programs have been used but have not worked— however, I don’t think they were tried that seriously.

By writing this post, I hope to get enough O.R. people excited to brainstorm together in search of a solution. Given this is a feasibility problem, another method of choice here would be Constraint Programming (more on that later).

I decided to begin with the first IP formulation that came to mind. I didn’t expect it to close the deal, but I wanted to see how far it would take me. Here it is: let the binary variable $x_{ijt}$ equal 1 if the cell in row $i$ and column $j$ receives color $t$. There are two groups of constraints.

Every cell must have a color:

$\sum_{t=1}^c x_{ijt} = 1, \enspace \forall \; i,j$

Every rectangle can have at most 3 vertices with the same color:

$x_{ijt} + x_{i+a,j,t} + x_{i,j+b,t} + x_{i+a,j+b,t} \leq 3, \enspace \forall \; i,j,a,b,t$

where

$1 \leq i \leq m-1$, $1 \leq j \leq n-1$, $1 \leq a \leq m-i$, and $1 \leq b \leq n-j$.

The objective function does not matter (in theory), but my limited experiments indicate that it’s better to minimize the sum of all $x_{ijt}$ than it is to maximize it. Gasarch also provides this diagram with a rectangle-free subset of size 74 for a 17 x 17 grid. I then added constraints of the form $x_{ij1}=1$ for every cell $(i,j)$ with an “R” in the diagram. This may be too restrictive, since it’s not clear to me whether a feasible solution to the 17 x 17 grid must contain that diagram as a subset. If that’s true, the latter constraints help with symmetry breaking and also substantially reduce the problem size. To reduce some of the color symmetry in the problem, I also arbitrarily chose cell (1,1) to contain color 2, i.e. $x_{112}=1$. Note that this still allows colors 3 and 4 to be swapped. I could have set $x_{123}=1$, but that would mean taking a risk.
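Whichever method you try, you’ll want a fast way to verify candidate grids. A brute-force check over all pairs of rows and pairs of columns is plenty at this size; here’s a sketch:

```python
from itertools import combinations

def monochromatic_rectangle(grid):
    """Return True if some axis-aligned rectangle has all four corners the
    same color; a valid grid for the challenge must return False."""
    for r1, r2 in combinations(range(len(grid)), 2):
        for c1, c2 in combinations(range(len(grid[0])), 2):
            if grid[r1][c1] == grid[r1][c2] == grid[r2][c1] == grid[r2][c2]:
                return True
    return False
```

For example, `monochromatic_rectangle([[1, 2], [2, 1]])` is `False`, while a grid of all 1’s fails immediately.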

Finally, for the 17 x 17 grid, it is suspected that each row and each column will have three colors used four times and one color used five times; hence, I also included the following constraints:

$4 \leq \sum_{j=1}^n x_{ijt} \leq 5, \enspace \forall \; i,t$

$4 \leq \sum_{i=1}^m x_{ijt} \leq 5, \enspace \forall \; j,t$

Once again, if my understanding is correct, there’s no formal proof that the above constraints are valid.

Here’s a ZIMPL model that can be used to generate the .LP files and here’s the 17 x 17 LP. I set CPLEX’s MIP emphasis to 4 (look for hidden feasible solutions) and first ran a 14 x 14 as a warm-up. CPLEX 12.1 finds a feasible solution in under 5 seconds (I substituted 1 for 4 in the color usage constraints above). I stopped the 15 x 15 problem after 12 fruitless hours of search, so 14 x 14 is apparently the largest of the easy instances.

I started the 17 x 17 run last Thursday. The initial IP has 1156 variables and 74,620 constraints. Pre-processing reduces that to 642 variables and 15,775 constraints. After approximately 122 hours and 77,000 nodes on a dual-core machine (3.79 GHz), no solution was found.

Now it’s time for smarter ideas. Here are a few candidates:

1) Try a column-generation approach. It’s easy to find solutions for 8×8 and 9×9 grids, which will appear as sub-grids of a 17×17 solution. So it may be possible to write a set partitioning formulation (with side constraints) that has the 8×8 and 9×9 solutions as variables.

2) Try a meta-heuristic approach (e.g. simulated annealing). This problem is probably one of those for which an almost-feasible solution (like this one) can be very far from a feasible solution.

3) Try constraint programming. I think the main difficulty here will be finding a good variable and value selection heuristic. I started building a Comet model for the problem using the global constraint cardinality; here it is. It still does not include the constraints that fix the color of the rectangle free subset provided by Gasarch, but those are very easy to add (see the code that is commented out).

Let me know what your thoughts and hunches are!

## Facebook’s Farmville: What’s the Fastest Way to Get Rich?

One can’t help but notice that a lot of people have been playing Farmville on Facebook lately. Some of my relatives, friends, students and co-workers have taken the plunge. I don’t have time to play Farmville, but I thought it would be interesting to study it. It’s a simple and fun game, easy to explain, and apparently addictive.

You begin the game with a 12-by-12 piece of land. That gives you 144 one-by-one squares in which to plant crops. To simplify the explanation, I’m going to (1) restrict my attention to the four crops initially available (strawberries, eggplant, wheat, and soybeans); (2) assume crops will be harvested as soon as they are ripe (that is, ignore wilting); (3) disregard the random bonuses and gifts that show up every now and then. Blogger Mark Newheiser has a nice post about Farmville in which he includes a useful table. The cost column includes the cost of plowing the land (15 coins) plus the cost of purchasing the seeds. Because the harvest times of all four crops are multiples of four hours, we can think in terms of 4-hour periods.

Question: What’s the largest amount of cash one can obtain in 6 days? (need to rest on the 7th day).

You initially have 200 coins, one ripe and one half-grown square of strawberries, one ripe and one half-grown square of eggplant. By harvesting the two ripe squares, your fortune grows to 323 coins. I’ll also go ahead and clean (delete) the two half-grown squares so that we can begin with a clean slate. It turns out that the answer to the above question is to plant strawberries only, and as much as you can. Here’s the planting schedule:

| Period | Strawberry Squares |
| --- | --- |
| 0 | 12 |
| 1 | 17 |
| 2 | 24 |
| 3 | 34 |
| 4 | 47 |
| 5 | 66 |
| 6 | 92 |
| 7 | 129 |

At period 8 (after 32 hours), you’ll have enough money to fill all your 144 squares with strawberries and you just continue to plow, plant and harvest them all every 4 hours. After 6 days (36 four-hour periods), you’ll have 44,913 coins (adjusted to take into account that 4 squares were already plowed at time zero). This solution is not very exciting and, moreover, easy to predict. Taking a second look at the cost table, we can see that the profit per hour of strawberries, eggplant, wheat, and soybeans are 2.5, 1, 0.903, and 1.375, respectively. This fact, combined with the fact that strawberries are the fastest to grow, yields the above result. As one progresses through the game, other crops become available and strawberries’ dominance goes away. However, we can show that things can get very complicated by changing the numbers a little bit. If we increase the cost of strawberries to 31 (including plowing), their profit per hour changes to 1 (worse than soybeans). Here’s the new optimal planting schedule for 6 days, which yields a profit of 15,725 coins:

 Period Strawberry Squares Soybean Squares 0 2 8 1 3 6 1 15 7 7 1 8 6 2 9 7 12 5 26 13 8 14 13 15 29 16 25 8 17 2 27 18 57 22 16 23 75 24 69 29 16 59 30 85 35 59

Note that the best course of action now is to mix strawberries and soybeans and, at least to me, the proper way of doing it isn’t so obvious (hence the value of using Operations Research). If you’re now wondering whether there would be a situation in which the best idea is to use three different crops, the answer is yes! If we decrease the cost of eggplant to 19 (including plowing), for instance, the optimal planting schedule results in a profit of 18,157 coins and looks like this:

 Period Strawberry Squares Soybean Squares Eggplant Squares 0 17 12 1 37 18 13 1 14 1 1 15 1 16 1 17 2 18 123 24 18 26 1 27 1 28 1 29 3 30 123 35 3

In conclusion, I believe that Farmville is a rich decision-making environment that can be used to illustrate some of the analytical techniques from the field of Operations Research. This simple example shows how complicated things can become, even with only four crops!

One possibility of extending this analysis would be to consider that one isn’t willing to log in every 4 hours or so in order to harvest crops, so here’s another question:

Question: Given a schedule of off-line hours (i.e. not available for harvesting), how much money can you make in 6 days?

Finally, I haven’t actually answered the question that appears in the title of this blog post: what’s the fastest way to obtain X coins? I’ll leave these last two questions as an exercise.

Technical Details:

This section of the post is dedicated to those who want to know more about how this analysis was conducted. I used an integer programming model written in AMPL (I think dynamic programming would work too). My variables are: $P_{it}$ = number of squares planted with crop $i$ in period $t$, $H_{it}$ = number of squares of crop $i$ harvested in period $t$, $A_t$ = area in use in period $t$, $M_t$ = money available in period $t$. Periods go from $0$ to $T$ (in our example, $T=36$). So we are interested in maximizing $M_T$.

The constraints update the values of $A_t$ and $M_t$ based on what you do in period $t$. Here they are (I’m omitting the details of boundary conditions):

$\displaystyle A_t = A_{t-1} + \sum_{i} P_{it} - \sum_{i} H_{it} \enspace \forall \; t$

$\displaystyle M_t = M_{t-1} - \sum_{i} c_i P_{it} + \sum_{i} r_i H_{it} \enspace \forall \; t$

where $c_i$ and $r_i$ are, respectively, the cost and revenue of crop $i$. Because we don’t consider wilting, $H_{it}=P_{i(t-g_i)}$, where $g_i$ is the time it takes crop $i$ to grow and ripen. Therefore, the $H_{it}$ variables can be eliminated. Finally, we require $M_t \geq 0$ and $0 \leq A_t \leq 144$. Here’s the mixed-integer programming (MIP) model in AMPL in case anyone wants to play with it. When it is optimal to use more than one crop, the MIP can get pretty hard to solve, considering its size.
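As a sanity check on the strawberries-only answer, no solver is needed: simulating the greedy policy (every 4-hour period, plant as many strawberries as cash allows) reproduces both the planting schedule and the 44,913-coin total. This sketch assumes strawberries cost 25 coins per square (including plowing) and sell for 35, consistent with the 2.5 coins-per-hour profit quoted above; the 60 coins added at the end account for the 4 squares that came pre-plowed (15 coins each):

```python
COST, REVENUE = 25, 35        # per strawberry square; cost includes plowing
AREA, PERIODS = 144, 36       # 144 squares, 36 four-hour periods in 6 days

money, planted, schedule = 323, 0, []   # 323 coins after the initial harvest
for t in range(PERIODS):
    money += planted * REVENUE          # harvest everything that ripened
    planted = min(AREA, money // COST)  # replant as much as cash allows
    money -= planted * COST
    schedule.append(planted)

money += planted * REVENUE              # final harvest at the end of day 6
money += 4 * 15                         # credit for the 4 pre-plowed squares
```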

## Introducing CuSPLIB

A couple of colleagues and I are doing research on single-machine cumulative scheduling problems (CuSP). As part of this effort, we’ll have to create some benchmark instances on which to test our algorithms. Some time ago, I searched around for problem instances and could not find any. People seem to be more interested in the Resource Constrained Project Scheduling Problem (RCPSP), of which the CuSP is a special case/subproblem. One of the experts in the area told me that he was unaware of any standard CuSP benchmarks and that difficult instances were hard to generate. Therefore, I decided to make our instances public, hoping that (i) this could be helpful/useful to someone else out there, and (ii) this could attract more attention to this problem. As a result, CuSPLIB is born! It includes a few pieces of code (instance generator, MIP and CP models), an initial set of 10 instances, and some discussion about integer programming models for the problem. I intend to talk about other (alternative) models and include some references in the near future. The preliminary computational results are interesting and make me believe that it’s not that difficult to find challenging instances. Let me know what you think, and feel free to contribute to CuSPLIB! I’ll be updating it little by little as our research progresses.

## Modeling Logical Conditions

Binary variables are extensively used in integer programming (IP) models to represent yes/no decisions (“Should we open a new restaurant in Coconut Grove?”), and logical conditions (“If it rains tomorrow, I will not go to the beach.”). Most people are familiar with the project selection problem in which each variable Xi is equal to 1 if project “i” is selected, and it’s equal to zero otherwise. To model the condition “if project 1 is selected, then project 2 must also be selected” one can write “X1 ≤ X2”, and “if project 2 is selected, project 3 must not be selected” becomes “X2 + X3 ≤ 1”. Even students who are learning about integer programming for the first time are usually capable of coming up with those two constraints after a few minutes (or hours). A little bit of trial-and-error typically gets you there.

What if the logical condition is a tad more complicated? For instance:

If it rains tomorrow or the Heat defeat the Spurs and grandma doesn’t come to visit on the weekend, then I’ll either go to South Beach and not drink a mojito or I’ll fly to Vegas.

I’m sure it only took you about 7 seconds to figure out that the corresponding IP model is

w + q – x ≥ 0

z + w + q ≥ y

1 + q ≥ x + p

y + p – q – z ≤ 1

This assumes that the binary variables x, y, z, w, p, q are assigned the following meanings: x = whether it rains; y = whether the Heat defeat the Spurs; z = whether grandma comes to visit; w = whether I go to South Beach; p = whether I drink a mojito; q = whether I fly to Vegas.
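One way to convince yourself that these four inequalities are exactly right is brute force: enumerate all 64 assignments and check that the constraints accept precisely the assignments that satisfy the original sentence (parsed with “and” binding tighter than “or”). Here’s a sketch:

```python
from itertools import product

def condition(x, y, z, w, p, q):
    """The sentence: if it rains (x), or the Heat win (y) and grandma
    doesn't visit (not z), then either South Beach without a mojito
    (w and not p) or Vegas (q)."""
    return bool((not (x or (y and not z))) or ((w and not p) or q))

def constraints(x, y, z, w, p, q):
    """The four linear inequalities above, on 0/1 values."""
    return (w + q - x >= 0 and
            z + w + q >= y and
            1 + q >= x + p and
            y + p - q - z <= 1)

# The inequalities accept exactly the 0/1 points satisfying the sentence.
ok = all(condition(*v) == constraints(*v) for v in product((0, 1), repeat=6))
```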

My students always ask me if there’s a more systematic way to come up with the linear constraints other than trial-and-error or memorization of a few simple cases. This led me to write a 4-page handout on the topic. More recently, I searched around for a while (OK, I confess that my search wasn’t very extensive) and found two references: an article in INFORMS Transactions on Education and H. P. Williams’s book. It turns out that Williams (section 9.2) does an excellent job explaining the process (if only I had known that two years ago…).

I hope this handout can still be helpful to someone out there (instead of just sitting in my hard drive). As always, your comments and feedback are welcome.