Tag Archives: modeling

Theory versus Practice, Intangibles, Intuition, and the Usefulness of Imperfection

I felt like putting in writing a few thoughts I often find myself telling my students, and hence this post. You can download a (nicer) PDF version of it here.

Theory versus Practice: A Challenge

It was a rainy Saturday afternoon…

“Ready? Go!”

As she reached for her pencil and began scribbling, I opened an empty Excel spreadsheet and started typing in the data and thinking out loud.

“Eight integer variables, three constraints, plus lower and upper bounds. How is it going over there?”

“I’m doing OK. Let me think.”

A minute later, I click “Solve” and declare victory.

“There you go: $2.80. Can’t beat that!”

“Five buns? Do you expect people to eat a burger with five buns?”

“Hmm…you’re right. One more constraint… voilà $2.62. The best burger on the market! What did you get?”

“Mine costs $2.61.”

“I win! Ha, ha!”

“But look at your burger! Nobody would ever want to eat this crap. Mine is much more appetizing.”

As I reluctantly examined the two solutions, there was no denying the obvious.

That was me and my wife solving the Good Burger puzzle. The challenge is to create the most expensive burger out of a given set of ingredients such as beef patties, cheese, lettuce, tomato, etc., while keeping sodium, fat, and calorie counts in check. Her hand-calculated solution was suboptimal by one cent. It was technically worse than mine, but what does optimality really mean in practice?

Optimality or Bust?

Once upon a time, at the board meeting of a for-profit company…

“Today, I’m excited to report the results from the cost-cutting initiative that my team of analytics experts developed over the last six months. We managed to produce a solution to our problem that improves profits by 6%.”

“But is this solution optimal?”

Said no one ever after hearing “improves profits by 6%.”

Intangibles and Intuition

The two anecdotes above illustrate an important idea that I emphatically stress to my students every spring semester: models aren’t perfect, and that’s perfectly OK. There’s a reason why business analytics is known as “the science of better” rather than “the science of provably optimal.” More often than not, it is impossible to capture all nuances of a real-life problem into a mathematical model. Therefore, solutions produced by such a model are to be taken with a grain of salt and cautious optimism. Do these numbers still make sense after the omitted intangibles are brought back into the picture? If so, great. If not, can we slightly modify the solution? Do we need to revise the model? When building my burger, I ignored flavor considerations and overall gastronomic appeal, whereas my wife didn’t.

Another matter with which non-experts struggle is keeping their intuition from negatively affecting their modeling choices. Or, as I like to say in class, “don’t try to solve the problem; focus on representing the problem.” A mathematical model is a translation, say, from English to formulas, of the story that warrants investigation. Letting your understanding of the story bias you into thinking the solution should/shouldn’t look a certain way, and adding that assumption into your model, can be detrimental. As humans, we tend to think intuitively, which in turn limits the universe of possibilities we look at. Good solutions to complex problems can be, at least at first sight, counterintuitive. Make sure your model has the freedom to consider “weird” courses of action as well.

The Usefulness of Imperfection

Imperfect models create imperfect solutions that can still be useful. Having a solution in the ballpark of good answers is better than looking at a blank slate and not knowing where to begin. Solving a model many times with a range of input values improves your understanding of how certain numbers relate to others. As your understanding of the problem improves, that (initially) counterintuitive solution starts to make sense. Improved understanding leads to revising your initial assumptions and even realizing that what you thought mattered is secondary. What really matters is this other number to which you didn’t pay much attention to begin with. In extreme cases, this first model may lead you to conclude that you need to create a completely different model to solve a completely different problem, and that’s OK too! You are learning about your problem in the process of trying to solve your problem. Finally, it may very well be that your problem, as originally stated, has no solution. This would have been difficult to detect by hand because complex problems have several moving parts with intricate relationships of cause and effect. Having a model whose output says “infeasible” can be a valuable tool in a meeting. It is concrete proof that the original question and/or assumptions are inconsistent in some way. (Assuming there’s no mistake in the model, of course.) From there, it will be a much shorter path to convincing people to revise the question/assumptions than it would have been otherwise. Who would have thought? Modeling also makes your meetings more efficient!

Advertisements

1 Comment

Filed under Analytics, Modeling, Promoting OR, Teaching, Tips and Tricks

Semantic Typing: When Is It Not Enough To Say That X Is Integer?

Andre Cire, John Hooker, and I recently finished a paper on an interesting, and somewhat controversial, topic that relates to high-level modeling of optimization problems. The paper is entitled “Modeling with Metaconstraints and Semantic Typing of Variables“, and its current version can be downloaded from here.

Here’s the abstract:

Recent research in the area of hybrid optimization shows that the right combination of different technologies, which exploits their complementary strengths, simplifies modeling and speeds up computation significantly. A substantial share of these computational gains comes from better communicating problem structure to solvers. Metaconstraints, which can be simple (e.g. linear) or complex (e.g. global) constraints endowed with extra behavioral parameters, allow for such richer representation of problem structure. They do, nevertheless, come with their own share of complicating issues, one of which is the identification of relationships between auxiliary variables of distinct constraint relaxations. We propose the use of additional semantic information in the declaration of decision variables as a generic solution to this issue. We present a series of examples to illustrate our ideas over a wide variety of applications.

Optimization models typically declare a variable by giving it a name and a canonical type, such as real, integer, binary, or string. However, stating that variable x is integer does not indicate whether that integer is the ID of a machine, the start time of an operation, or a production quantity. In other words, variable declarations say little about what the variable means. In the paper, we argue that giving a more specific meaning to variables through semantic typing can be beneficial for a number of reasons. For example, let’s say you need an integer variable x_j to represent the machine assigned to job j. Instead of writing something like this in your modeling language (e.g. AMPL):

var x{j in jobs} integer;

it would be beneficial to have a language that allows you to write something like this

x[j] is which machine assign(job j);

To see why, take a look at the paper ;-)

Leave a comment

Filed under Modeling, Research

Improving a Homework Problem from Ragsdale’s Textbook

UPDATE (1/15/2013): Cliff Ragsdale was kind enough to include the modification I describe below in the 7th edition of his book (it’s now problem 32 in Chapter 6). He even named a character after me! Thanks, Cliff!

When I teach the OR class to MBA students, I adopt Cliff Ragsdale’s textbook entitled “Spreadsheet Modeling and Decision Analysis“, which is now in its sixth edition. I like this book and I’m used to teaching with it. In addition, it has a large and diverse collection of interesting exercises/problems that I use both as homework problems and as inspiration for exam questions.

One of my favorite problems to assign as homework is problem number 30 in the Integer Linear Programming chapter (Chapter 6). (This number refers to the 6th edition of the book; in the 5th edition it’s problem number 29, and in the 4th edition it’s problem number 26.) Here’s the statement:

The emergency services coordinator of Clarke County is interested in locating the county’s two ambulances to maximize the number of residents that can be reached within four minutes in emergency situations. The county is divided into five regions, and the average times required to travel from one region to the next are summarized in the following table:

The population in regions 1, 2, 3, 4, and 5 are estimated as 45,000,  65,000,  28,000,  52,000, and 43,000, respectively. In which two regions should the ambulances be placed?

I love this problem. It exercises important concepts and unearths many misconceptions. It’s challenging, but not impossible, and it forces students to think about connecting distinct—albeit related—sets of variables; a common omission in models created by novice modelers. BUT, in its present form, in my humble opinion, it falls short of the masterpiece it can be. There are two main issues with the current version of this problem (think about it for a while and you’ll see what I mean):

  1. It’s easy for students to eyeball an optimal solution. So they come back to my office and say: “I don’t know what the point of this problem is; the answer is obviously equal to …” Many of them don’t even try to create a math model.
  2. Even if you model it incorrectly, that is, by choosing the wrong variables which will end up double-counting the number of people covered by the ambulances, the solution that you get is still equal to the correct solution. So when I take points off for the incorrect model, the students come back and say “But I got the right answer!”

After a few years of facing these issues, I decided I had had enough. So I changed the problem data to achieve the following (“evil”) goals:

  1. It’s not as easy to eyeball an optimal solution as it was before.
  2. If you write a model assuming every region has to be covered (which is not a requirement to begin with), you’ll get an infeasible model. In the original case, this doesn’t happen. I didn’t like that because this isn’t an explicit assumption and many students would add it in.
  3. If you pick the wrong set of variables and double-count the number of people covered, you’ll end up with an incorrect (sub-optimal) solution.

These improvements are obtained by adding a sixth region, changing the table of distances, and changing the population numbers as follows:

The new population numbers (in 1000’s) for regions 1 through 6 are, respectively, 21, 35, 15, 60, 20, and 37.

I am now much happier with this problem and my students are getting a lot more out of it (I think). At least I can tell you one thing: they’re spending a lot more time thinking about it and asking me intelligent questions. Isn’t that the whole purpose of homework? Maybe they hate me a bit more now, but I don’t mind practicing some tough love.

Feel free to use my modification if you wish. I’d love to see it included in the 7th edition of Cliff’s book.

Note to instructors: if you want to have the solution to the new version of the problem, including the Excel model, just drop me a line: tallys at miami dot edu.

Note to students: to preserve the usefulness of this problem, I cannot provide you with the solution, but if you become an MBA student at the University of Miami, I’ll give you some hints.

7 Comments

Filed under Books, Integer Programming, Modeling, Teaching

Choosing Summer Camps for Your Kids

Today I’m going to write about a decision that’s made by many American families each year: how to pick summer camps for our kids. There are several issues to take into account, such as cost, benefit, hours, and kids’ preferences. I’ll introduce an optimization model for summer camp selection through a numerical example. The example portrays a large family, but the same ideas apply if a few smaller families want to get together and solve this problem. This way they can take advantage of the discounts and take turns driving the kids around.

The Joneses have six kids: Amy, Beth, Cathy, David, Earl and Fred (yes, their first names are alphabetically sorted, matching their increasing order of age; Mr. and Mrs. Jones always knew they’d have six kids and hence named their firstborn with an ‘F’ name). This year, they’ve narrowed down their list of potential summer camps to the following ten: Math, Chess, Nature, Crafts, Cooking, Gymnastics, Soccer, Tennis, Diving, and Fishing. The Nature camp takes kids on a hike through the woods with the guidance of a biologist; they make frequent stops upon encountering specific plants and animals, during which a mini science lecture is delivered (pretty cool!). The Cooking camp involves cooking chemistry instruction, à la Alton Brown (also pretty cool).

The following table contains some data related to each camp:

The Cost column indicates the cost per child. The Discount column indicates the percentage discount that each child enrolled after the first would receive on the cost of each camp. For instance, if three children are enrolled in Math camp, the first would cost $1100, and the second and third would cost $770 each (30% less). The Hours column is self-explanatory and the last two columns indicate whether or not that particular camp develops mental and physical abilities, respectively (a value of one = yes, zero=no).

The next table shows some of the child-specific requirements:

For example, the Joneses want Fred to attend at least 3 camps that develop mental abilities, and at least 1 camp that develops physical abilities. The last two columns in the above table indicate the minimum and maximum number of camp hours for each child over the 9-week summer break.

The next thing parents need to take into account are their children’s preferences. So here they are:

The smaller the number in the above table, the more desirable a particular camp is. For example, Amy is a bit of a math nerd, and if we were to flip David’s preference scores for Math and Tennis, he could be classified as a bit of a jock. Some conflicts exist, in the sense that not all camps are compatible with each other in terms of time schedules. In this particular case, let’s assume that no child can attend both the Soccer and Tennis camps, or both the Nature and Soccer camps. Here’s how we are going to use this preference table to create a sense of fairness among the children: whenever a child that prefers camp X to camp Y goes to camp Y and doesn’t go to camp X, nobody else gets to go to camp X either. For example, if Amy goes to Nature camp and isn’t sent to either Math or Chess camp, none of her siblings are allowed to go to Math or Chess either. Conversely, if the Joneses decide to send Earl to Chess camp and Fred to Tennis camp, they must also send Earl to Tennis camp (because Earl prefers Tennis to Chess, and “Fred is going! Why can’t I go too!”). Clearly, there are other ways to use/interpret this table, such as trying to send everyone to at least one of their top N choices, but we won’t consider those alternatives here.

After taking all of the above issues and conditions into account, here’s a solution that satisfies all the requirements while resulting in the minimum cost of $22,180.00 (You guessed it…the Joneses are probably *not* among the 99%):

Amy goes to Math, Crafts, Cooking, and Tennis; Beth goes to Math, Cooking, Tennis, and Fishing; Cathy goes to Math, Crafts, Tennis, and Fishing; David goes to Math, Cooking, and Fishing; Earl goes to Math, Tennis, and Fishing; and Fred goes to Math, Crafts, Cooking, and Tennis. Mmm…interestingly, everyone goes to Math camp. I think the Joneses are on to something…

Depending on your own requirements, preferences, and costs your solution may differ, of course. But this should give you an idea of how this simple problem can easily become very complicated to solve. No need to fear, though! Operations Research is here!

Food for Thought: Here’s an interesting question that helps illustrate how high-quality solutions can be counterintuitive: by looking at the preferences table, we see that everyone prefers Soccer to Tennis. In addition, Soccer camp is less expensive than Tennis camp. So how come we send almost everyone to Tennis camp? Isn’t that strange? Let me know what you think in the comments below! That’s one of the advantages of using an analytical approach to decision making: it helps us find solutions we wouldn’t even consider otherwise because they don’t seem to make sense (at least not at first).

If you’re curious about how I managed to find the optimal solution, read on!

Details of the Analysis:

To find the minimum-cost solution, we can create a mathematical representation of the problem, a.k.a. a model, and then solve this model with the help of a computer. Let’s see how.

The first obvious decision to make is who goes where. So let the binary variable x_{ij} equal 1 when child i goes to camp j, and equal to 0 otherwise. We’ll also need another binary variable y_j that is equal to 1 when at least one child goes to camp j and equal to 0 when none of the children go to camp j. We are now ready to write our objective function and constraints. I’ll refer to the problem data using the column headings of the tables above. The subscript i will always refer to a child, and the subscript j will always refer to a camp.

To minimize the total cost, we write the following objective function:

\displaystyle \min \sum_i \sum_j (1-\mathrm{Discount}_j)\mathrm{Cost}_j x_{ij} + \sum_j \mathrm{Discount}_j \mathrm{Cost}_j y_j

Note how we are using the y_j variable to handle the discount for sending more than one child to camp j: we charge every child the discounted price in the double summation and add the discount back in only once if y_j=1.

Now we have to deal with the four requirements: minimum and maximum hours, mental activity, and physical activity. For every child i, we have to write the following four constraints:

\displaystyle \sum_j \mathrm{Hours}_j x_{ij} \geq \mathrm{MinTimeReq}_i

\displaystyle \sum_j \mathrm{Hours}_j x_{ij} \leq \mathrm{MaxTimeReq}_i

\displaystyle \sum_j \mathrm{IsMental}_j x_{ij} \geq \mathrm{MentalReq}_i

\displaystyle \sum_j \mathrm{IsPhysical}_j x_{ij} \geq \mathrm{PhysicalReq}_i

Next, we enforce the preference rules. Let’s recall the example involving Earl and Fred: if Earl goes to Chess camp and someone else (it doesn’t matter who) goes to Tennis camp, then Earl has to go to Tennis camp as well. Here’s what this constraint would look like:

x_{\mathrm{Earl},\mathrm{Chess}} + y_{\mathrm{Tennis}} - x_{\mathrm{Earl},\mathrm{Tennis}} \leq 1

Of course, we have to repeat this constraint for every child i and every pair of camps j_1 and j_2 such that child i prefers j_1 to j_2 in the following way:

x_{ij_2} + y_{j_1} - x_{ij_1} \leq 1

The camp compatibility constraints say that no child i can attend both Soccer and Tennis, or both Nature and Soccer, therefore:

x_{i,\mathrm{Soccer}} + x_{i,\mathrm{Tennis}} \leq 1

x_{i,\mathrm{Nature}} + x_{i,\mathrm{Soccer}} \leq 1

Finally, we need to relate the x_{ij} and y_j variables by stating that unless y_j=1 , no x_{ij} can be equal to 1 . So we write the following constraint for all values of i and j :

x_{ij} \leq y_j

And that’s the end of our model. Here’s a representation of this mathematical model in AMPL in case you want to play with it yourself. This is the model I used to obtain the numerical results reported above. Enjoy!

5 Comments

Filed under Applications, INFORMS Monthly Blog Challenge, Integer Programming, Mathematical Programming, Modeling, Motivation, Promoting OR, Summer camp

Big Bang Theory Party Planning

Penny is organizing a party at her apartment, but she is on a tight budget. Having a working knowledge of all of the important things in the universe, Sheldon knows everything about linear programming and offered to help her. He postulates that it’s ideal to have two kinds of mixed nuts: a plain party mix, and a luxury mix (for those with a distinct taste like himself). Based on the expected number of guests, Howard quickly calculates that they’ll need a total of at least 10 pounds of snacks, but no more than 6 pounds of each kind of mix. On his white board, Sheldon has already come up with the following table:

Raj wants to dip the hazelnuts into liquor, but that’s not in the budget, so he gives up. Leonard reminds everyone that, because of their allergies, it’s important to keep the average allergenicity level per pound in both mixes to no more than 3. Write an optimization model to help Penny prepare the two kinds of snacks at minimum cost. But be careful: Sheldon will check it later for correctness!

This post is part of my series “Having Fun with Exam Questions”. Previous questions dealt with Farmville, vampires, and (potentially) Valentine’s day.

1 Comment

Filed under Big Bang Theory, Exam Fun, Linear Programming, Modeling, Teaching

Find Out What Happens to Mr. Lovr

I’ve had a number of people tell me that they like my Valentine’s Day post, but many of them did not solve the Excel Spreadsheet. That’s the most important part of the post! You have to see what happens to Mr. Lovr! There’s no set-up necessary; just open Excel Solver (it’s an add-in you have to enable in Windows and a separate program in the Mac), and click on the “Solve” button. You’ll be glad you did.

Leave a comment

Filed under INFORMS Monthly Blog Challenge, Love, Valentine's Day

Transporting Flowers with Love

Mr. Lovr, a lonely gentleman, does not want to spend Valentine’s day alone in 2011. As one of his New Year’s resolutions, he intends to send roses to nine of his lady friends. Being an avid procrastinator, however, he waits until the last minute and finds out that only eight flower shops in his city still have roses available. For lack of better names, let’s call those flower shops F1, F2, F3, …, F8. The number of bouquets of roses available at each shop are, respectively, 4, 4, 3, 2, 2, 2, 2, and 1. Based on how well he knows each of his potential valentines, whom we’re going to call V1, V2, V3, …, V9, he calculates how many bouquets he needs to send to each of them to increase his chances of going on at least one date. The numbers are, respectively, 3, 2, 2, 2, 2, 2, 2, 2, and 3. At this point, a light bulb goes off in Mr. Lovr’s head, and he remembers from his Operations Research class that this is a transportation problem. But there’s something missing…Ah! The costs! He calls each of the eight flower shops and asks how much it would cost to ship one bouquet of roses to the addresses of each of his nine lady friends. He then compiles the following table of costs (in dollars):

He also remembers that because the total supply is equal to the total demand, he can write all of the constraints in this problem as equalities. Essentially, he has to say that, for each flower shop, the number of bouquets that it ships has to be equal to the number of bouquets that it has. Similarly, he needs one constraint for each valentine saying that the number of bouquets that they receive has to be equal to the number that they want (according to his estimates above). To avoid suspicion, he also decides that it’s better for each flower shop to send no more than one bouquet to the same person. So far, so good, but he needs a specific shipment plan because he’s running out of time.

He opens up an Excel spreadsheet and creates the following layout of cells (he chose pink to make it more romantic):

Somewhere else in his spreadsheet he also typed the table of costs shown above. To his surprise, he even remembered to use the SUMPRODUCT function to calculate the total cost expression. He clicks “Solve” and finds out that the cheapest way to send all 20 bouquets will cost him $38. Not bad…but wait a second…something amazing happened! He cannot believe his eyes! The optimal solution exhibits a very curious pattern! Could it be a Valentine’s Day miracle? Could it be the power of love? If you want to see for yourself, download Mr. Lovr’s spreadsheet, open Excel Solver, and solve the model (no setup necessary, just click the “Solve” button). (NOTE: this spreadsheet has been tested with Excel 2007 under Windows XP, and with Excel 2008 under Mac OS X Snow Leopard. I’m not sure the “trick” will work in earlier Excel versions. For a free download of Excel Solver for Mac OS X, go here.)

Thanks to my love for giving me the idea for this blog post.

12 Comments

Filed under Exam Fun, INFORMS Monthly Blog Challenge, Love, Valentine's Day