Category Archives: Applications

Can We Use Social Networks to Identify Poor Decision Making?

While suffering through the usual air travel woes recently, I felt compelled to tweet my feelings:

I was pleasantly surprised to see that Paul Rubin and Matthew Saltzman followed up with an interesting exchange:

Although I tend to agree with Paul that an outsider probably does not have enough information to decide whether or not the actions he/she sees are reasonably good given the situation (especially when it comes to the incredibly complex world or airline operations), I like Matthew’s general idea of an experiment to identify whether or not the outcome of a black-box decision making process is “good”.

Can that be done? Can we observe a black-box situation (or process) long enough to be able to tell whether the analytic machine inside the box could do better?

A large number of people carry smart phones these days, with constant connectivity to the internet. Twitter, Facebook, and FourSquare (to name a few) can determine our location, and there are other Apps that tell us which of our friends (and even non-friends) are close by. With all of this connectivity and location awareness, we can think of human beings as sophisticated sensors that collect and share information. We see a fire, a car accident, a traffic jam, an arrest, a fight, and immediately share that information with our network. In addition, human sensors are much better than electronic sensors because they can detect and interpret many other things, such as: the mood in a room (after the airline changes your gate for the third time), the meaning of an image, and so on.

Consider a hypothetical situation in which a crowded venue has to be evacuated for whatever reason. Perhaps some exits will be blocked and people will be directed to go certain places, or act a certain way. Human observers may notice a problem with the way security is handling the situation from multiple locations inside the venue, and from multiple points of view. The collection of such impressions (be they tweets, Facebook status updates, or something else) may contain clues to what’s wrong with the black-box evacuation procedure devised for that venue. For example, “avoid using the south exit because people exiting through there bump into those coming down the stairs from the second floor and everyone has to slow down quite a bit.”

In a world where Analytics and OR specialists struggle to convince companies to try new ideas, could this kind of evidence/data be used to foster collaboration? “I noticed that you did X when Y happened. It turns out that if you had done Z, you’d have achieved a better outcome, and here’s why…”

Is the airline example really too complicated to be amenable to this kind of analysis? I’m not sure. But even if it is, there may be other situations in which a social network of human sensors can collect enough information to motivate someone to open that black box and tinker with its inner workings a little bit. Those of you working in the area of social networks might be aware of something along the lines of what I (with inspiration from Matthew) have proposed above. If that’s the case, I’d love to read more about it. Please let me know in the comments.

1 Comment

Filed under Applications, INFORMS Monthly Blog Challenge, People, Social Networks, Travel

Rescue Mission, Part 3 of 3

Make sure to read part 1 and part 2 first!






Notes:

The Edelman Award presentations are available here.

The INFORMS Public Information Committee (PIC) web page is here.

A long list of real-life applications of Analytics and O.R. is available here.

The topographic maps above are actual maps of Mars.

2 Comments

Filed under Applications, Heuristics, INFORMS Monthly Blog Challenge, Promoting OR

Rescue Mission, Part 2 of 3

Make sure to read part 1 first!





TO BE CONTINUED… Read part 3 here!

Notes:

The paper by Trick et al. is available here.

The Science of Better podcasts are available here.


Leave a Comment

Filed under Applications, Heuristics, INFORMS Monthly Blog Challenge, Promoting OR

Rescue Mission, Part 1 of 3







TO BE CONTINUED…  Read part 2 here!

Note: The topographic map above is an actual map of Mars.


1 Comment

Filed under Applications, Heuristics, INFORMS Monthly Blog Challenge, Promoting OR

Should You Hire Security When Tenting Your House?

Last week I had my house tented because of termites. For those of you who don’t know what “tenting” is (I didn’t until about a year ago), it amounts to wrapping an entire house inside a huge tent and filling the tent with a poisonous gas that kills everything inside (and by everything I really do mean everything). Those who have been through this experience know what a hassle it is. We received a to-do list of pre-tenting tasks, which included:

  • Remove or discard all food that isn’t canned or packaged in tightly-sealed, never-opened containers
  • Turn off all A/C units and open one window in each room of the house
  • Open all closet and cabinet doors
  • Turn off all internal and external lights (including those operating on a timer)
  • Prune/move all outdoor plants away from the house to have a clearance of at least 18 inches
  • Soak the soil around the house (up to a foot away from the structure) on the day of the tenting
  • Warn your neighbors about the tenting (so that they can keep their pets away from the house)
  • etc.

We had to sleep two nights in a hotel, with two dogs, one of which had just had knee surgery. What an adventure!

The main point of concern was that the house would stay vulnerable (open windows) and unattended during the process. On top of that, one of our neighbors told us that he knew of a house that had been robbed during tenting a couple of months ago. So we started to consider hiring a security guard to sit outside the house for 48 hours. Would that be a good idea? Let’s think about this.

Our insurance’s deductible is $2500. I assume that if thieves are willing to risk their lives (wearing gas masks; oh yeah! they do that!) to enter a tented house, they’d steal more than $2500 worth of stuff. Therefore, being robbed would cost us $2500. This doesn’t take into account that one might have irreplaceable items in the house. However, most of the time those can be taken with you (unless they are too big or inconvenient to carry). In my case, I took the external hard drive to which I back up my data, and the mechanical pencil I’ve owned and used since 1991 (yes, you guessed right, the eraser at the end doesn’t exist any more). The security company we called would charge $15 per hour for an unarmed guard to be outside our house. Multiplying that by 48 hours brings the cost of hiring security to $720.

Let’s say that the likelihood (a.k.a. probability) of being robbed while your house is tented without a security guard is p_1 (in percentage terms; for example, p_1 for the White House is pretty close to 0%), and when a security guard is on duty that likelihood is p_2. Unless p_1 > p_2, there’s no point in having this entire discussion, so I’ll assume that is true. Here’s a pretty neat rule of thumb that you can use: divide the cost of hiring security by your deductible to get a number n between zero and one (of course, if hiring a guard costs more than your deductible, don’t do it!). Unless the presence of the guard reduces your chance of being robbed (p_1) by more than n, you should not hire security! (Later on, I’ll explain where this rule comes from.) For example, in my case 720/2500 is approximately equal to 29%. If the chance of being robbed without security is 30%, unless hiring a guard brings that chance down to 1% or less, it’s better not to do it. If the value of p_1 is less than or equal to 29% to begin with (I live in a reasonably safe neighborhood), the answer is also not to hire security (probabilities cannot be negative). This rule works regardless of the value of p_1; what matters is how great the improvement to p_1 is.

In addition to looking at the numbers, we also took into account the following clause from the security company’s contract:

…the Agency makes no warranty or guarantee, including any implied warranty of merchantability or fitness, that the service supplied will avert or prevent occurrences or the losses there from which the service is designed to detect or avert.

In other words, if you hire us (the security company) and still get robbed, we have nothing to lose!

So what did we do? We chose not to hire security and, fortunately, our house was not robbed. However, even though the tenting instructions  say that you don’t have to wash your glasses and plates after returning home, we decided to do so anyway (as they say in Brazil: “seguro morreu de velho”).

Disclaimer: The advice contained herein does not guarantee that your house will not be robbed. Use it at your own risk!

Details of the Analysis

So where does that rule of thumb come from? We can look at this problem from the point of view of a decision tree, as pictured below.

In node 0, we make one of two decisions: hire a security guard (payoff = -$720, i.e. a cost), or not (payoff = -$0). For each of those decisions (branches), we create event nodes (1 and 2) to take into account the possibility of being robbed. At the top branch of the tree (node 2), the house will be robbed with probability p_2, in which case we incur an additional cost of $2500, and the house will be safe with probability (1-p_2), in which case we incur no additional expense. Therefore, the expected monetary value of hiring security, which we call EMV_2, is to spend $720+$2500 with probability p_2, and to spend $720 with probability (1-p_2). Hence

EMV_2 = - 3220p_2 - 720(1-p_2) = - 2500p_2 - 720

Through a similar analysis of the bottom branch (node 1), we conclude that the expected monetary value of not hiring security, which we call EMV_1, is to spend $2500 with probability p_1 and to spend $0 with probability (1-p_1). Therefore

EMV_1 = -2500p_1 - 0(1-p_1) = - 2500p_1

Hiring security will be the best choice when it has greater expected monetary value than not hiring security, that is when EMV_2 > EMV_1, which yields

-2500p_2 - 720 > -2500p_1

\Downarrow

2500(p_1 - p_2) > 720

\Downarrow

p_1 - p_2 > \frac{720}{2500}

which is the result we talked about earlier (recall that p_1 > p_2).

How Does Analytics Fit In?

The Analytics process is composed of three main phases: descriptive (what does the data tell you about what has happened?), predictive (what does the data tell you about what’s likely to happen?), and prescriptive (what should you do given what you learned from the data?). In this problem we can identify a descriptive phase in which we try to obtain probabilities p_1 and p_2. This could be accomplished by looking at police or insurance company records of robberies in your area. It’s not always possible to get a hold of those records, of course, so one might need to get a little creative in estimating those numbers. Having knowledge of the probabilities, the calculation described above could be classified as a prescriptive phase: what’s the course of action? Hire security if (cost of security)/(insurance deductible) < p_1 - p_2. There is no predictive phase here because our analysis does not require the knowledge of any future event (only how likely it is to occur). Operations Research can be used in some or all of these phases. Most of what I do in my research and consulting projects lies in the prescriptive phase (optimization). Recently, however, I’ve decided to broaden my horizons and learn more about the other two phases as well, starting with some self-teaching of data mining.

13 Comments

Filed under Analytics, Applications, Decision Trees, INFORMS Monthly Blog Challenge, Security

Political Districting and OR

When I think about how Operations Research can contribute to the world of politics, one of the first things that come to mind is the redistricting process which, of course, has been receiving a lot of attention after the 2010 census. Here’s an excerpt from the census.gov web site:

Another major use for decennial census data is for geographically defining state legislative districts, a “redistricting” process that begins in 2011. The census data allow state officials to realign congressional and state legislative districts in their states, taking into account population shifts since the last census and assuring equal representation for their constituents in compliance with the “one-person, one-vote” principle of the 1965 Voting Rights Act.

One of my favorite articles on the topic is “An Optimzation-Based Heuristic for Political Districting” by Anuj Mehrotra, Ellis Johnson and George Nemhauser, which appeared in Management Science in 1998 (henceforth referred to as the MJN paper). For a districting plan to be acceptable, it has to be fair/unbiased (to avoid gerrymandering), where fairness can be defined in terms of criteria that districts have to satisfy, such as population equality, contiguity, and compactness. Contiguity means that the district cannot have holes in it. Compactness is a bit more difficult to define, but it essentially means that districts should look more like a circle or square than like a snake or a tree branch. For example, would you say that this state senate district in Florida is compact?

Another important issue is that the methodology used to create the districts be transparent. This is a strong point in favor of using mathematical techniques such as those of Operations Research. The algorithm and software code used to create the districts can be made publicly available for anyone to inspect. Specific questions about the methodology can be answered precisely and unambiguously. There’s no beating around the bush. I applaud efforts such as the Public Mapping Project, which attempts to make the districting process more transparent and increase public participation. Nevertheless, given how difficult the problem is (in both practical and mathematical terms), I believe that it would also be important to consider plans created with the help of OR. As no mathematical model is perfect, those plans will probably need to be fine-tuned by a human being using tools such as this district builder. Of crucial importance, however, is the need for clear rules to guide such modifications. When is it OK to move part of a district’s border? by how much? etc. Given that current OR software tools are capable of producing multiple high-quality solutions at the end of a run, perhaps an even better alternative is to present the decision makers with a myriad of OR-generated and man-made plans, and let them pick the most adequate one. This practice is common in sports scheduling, where the leagues typically ask schedulers to provide them with many alternative solutions to choose from.

At this point, I’d like to switch gears and explore the districting problem in more detail from an OR point of view (using some ideas from the MJN paper). Districts are made up of smaller geographical units, for example counties (or pieces of counties). For illustration purposes, if a state like Ohio, which has 88 counties, is creating 18 congressional districts and those districts are to be made up of counties only, the potential number of distinct districts is

{88 \choose 18} = 2,\!418,\!561,\!960,\!739,\!869,\!780

That number is greater than the number of miles I’d cover if I could travel at the speed of light for 400,000 years! (Without a bathroom break!) Obviously, many of those districts are not legitimate because they violate contiguity and/or compactness constraints. However, even if those invalid districts are eliminated, the remaining number of possible districts is still very large. Ideally, however, we’d like to use district building blocks that are much smaller than counties, such as zip codes, but that is still very challenging (and would increase the above number even more!). In the MJN paper, the authors create an initial plan using counties as building blocks and then fine tune the districts’ borders to equalize populations. Here are the “before” and “after” districting plans for South Carolina:

Pretty good improvement, huh? Their basic methodology is to associate one decision with each potential district indicating whether or not it is part of the final plan (a binary variable). The optimization model then selects a subset of the districts that covers each county exactly once (a set partitioning problem). Because the number of variables is huge (as we’ve seen above), they resort to a technique called branch-and-price. In short, branch-and-price solves an optimization problem by considering only a subset of its variables at a time and by adding in promising variables little-by-little as the search progresses. It’s possible to define mathematically what a missing “good” variable is, and to construct one (i.e. a new district) on-the-fly. During this process, many set partitioning problems have to be solved. Because these intermediate problems are not always solved to optimality, the final solution obtained in MJN, albeit very good, isn’t technically an optimal solution, and that’s why they use the word “heuristic” in the title of their paper.

Acknowledgment: I’d like to thank Brady Hunsaker for providing a nice collection of links in the comments section of this blog post by Michael Trick.

Update: Here is another post I came across this week that also talks about using OR for political districting.

9 Comments

Filed under Applications, INFORMS Monthly Blog Challenge, Integer Programming, Modeling, Motivation, Politics, Promoting OR

The Joy of Baking (Optimally)

‘Tis the season of baking all kinds of things: cookies, cakes, breads, brownies, pies, and my favorite Brazilian dessert “pudim de leite moça“, which is depicted below. Click here for the step-by-step recipe.

Many OR bloggers, such as Laura McLay and Anna Nagurney, actually enjoy baking, and they both have written posts on the subject (e.g. here and here). I happen to include myself in this group and, yes, I made the pudim shown above (using  my mom’s recipe).

My goal today is to approach the art of baking from an optimization point of view. Let’s say you have a long list of items to bake. Perhaps you’re hosting a mega party at your house, or you’re helping your local church or favorite charity with their holiday cooking. You have an oven that can only fit so much at a time (think of area, or volume). Each item to be baked occupies some space in the oven and needs to bake for a specific amount of time. In what order should you bake your items so that you finish as soon as possible? (Side note: it may not be obvious at first sight, but this is the same problem faced by a container port that needs to decide the order in which to unload cargo ships.)

In the OR world, this is a job scheduling problem with a cumulative machine. The jobs are the tasks to be performed (items to bake), the machine (or resource) is the oven. We say the oven is cumulative, as opposed to disjunctive, because it can deal with (bake) multiple items at a time. The unknowns in this optimization problem are the start times of each job (when to begin baking each item). The objective is to minimize the makespan, which is defined as the finish time of the last job (the time at which it’s OK to turn off the oven). Finally, this is a non-preemptive problem because, typically, once you start baking something, it stays in the oven until it’s done.

This problem occurs so often in practice that the Constraint Programming (CP) community created a global constraint to represent it. It’s called the cumulative constraint (what a surprise!). Here’s a reference. For example, let’s say that we have a 10-cubic-foot (cf) oven and we need to bake five items. The baking times (in minutes) are 20, 25, 40, 30, and 30. The space requirements in cf are, respectively, 6, 4, 5, 6, 4. If the time at which we begin baking item i is denoted by the variable s_i, we can write the following in a CP model:

\mathrm{cumulative}([s_1,s_2,s_3,s_4,s_5],[20,25,40,30,30],[6,4,5,6,4],10)

The above constraint makes sure that the start times s_i are such that the capacity of the oven is never exceeded. To minimize the makespan, we have to minimize the maximum among s_1+6, s_2+4, s_3+5, s_4+6, and s_5+4.

It’s easy to incorporate some real-life details into this model. For example:

  • Not every item will be ready to go into the oven at time zero. After all, you’re making them as you go. To take care of this, add a ready-time r_i (i.e. a lower bound) to the appropriate variable: r_i \leq s_i.
  • If a given item does not occupy the entire oven, but you still prefer to bake it alone, just artificially increase its space requirement c_i to be equal to the oven’s capacity C.
  • If you’re baking both savory and sweet things, you probably don’t want to mix them up in the oven. In that case, simply solve the problem twice.
  • If, for some reason, item i must be finished before item j starts baking (e.g. they need different temperatures), just include the constraint s_i + p_i \leq s_j, where p_i is the baking time of item i.

We could, of course, have approached this problem from an Integer Programming point of view. In that case, we’d have binary variables x_{it} that are equal to 1 if you start baking item i at time t, and equal to zero otherwise. For more details on this formulation, including model files and some pretty tough instances, take a look at the CuSPLIB web page.

In the spirit of holiday baking, I will close with some pictures of past baking jobs ran on my house’s machine (a.k.a. oven). Enjoy! :-)

Key Lime Pie

Carrot Oatmeal Cookies (recipe here)

Sparkling Ginger Chip Cookies (recipe here)

Irish Soda Bread

Six-Seed Soda Bread (recipe here)


11 Comments

Filed under Applications, Constraint Programming, CuSPLIB, Food, Holidays, INFORMS Monthly Blog Challenge, Integer Programming, Modeling, People

Ten Freshmen + 46 Slides = 1 Hour of OR Fun

I just finished my presentation to business undergraduate students and, from what I could tell by looking at them, I think it was successful. Of course the real test will be whether someone stops by my office saying “I love OR! Can I work with you?”. I want to thank our vice dean for this opportunity and I am looking forward to doing it again next year.

I closed the presentation with a little “quiz” based on a very nice paper by Brown, Klein, Rosenthal and Washburn entitled Steaming on Convex Hulls. Here’s how it goes (you can open the image on a new window to make it larger):

An aircraft carrier can run with 2 or 4 engines online. The graph below shows gallons of gasoline used per hour versus possible speeds for each engine configuration. How would you run the ship to cover 100 miles in 4 hours?

According to the article, the Navy spends over 1 billion dollars a year on surface combatants alone. An officer who became a ship commander after graduating from the academy was smart enough to solve the above problem the right way. His ship was saving so much fuel that it had to be inspected under the suspicion that it was violating safety regulations. But we all know it wasn’t. It was just a case of using analytical techniques to make better decisions.

8 Comments

Filed under Applications, Linear Programming, Motivation, Promoting OR

Better Traffic Networks Through Vehicle and Signal Coordination

My friend Phil Spadaro pointed me to two interesting articles on traffic management techniques being studied by BMW and Audi here and here. The idea is to allow traffic lights and cars to communicate, which would yield better traffic flow, reducing time spent at red lights and, as a consequence, reducing fuel consumption. From the Audi article:

The results obtained during the first travolution project in 2006 were immediate and dramatic: reduced waiting times at traffic signals cut fuel consumption by 17 percent…The secret of this success: the traffic signals in Ingolstadt are controlled by a new, adaptive computing algorithm that Audi developed in cooperation with partners at colleges of advanced engineering and in business and industry. Audi has now developed travolution still further, by enabling vehicles to communicate directly with traffic light systems, using wireless LAN and UMTS links…The traffic signals transmit data that are processed into graphic form and shown on the car’s driver information display screen. The graphics tell the driver for instance what speed to adopt so that the next traffic light changes to green before the car reaches it. This speed, which keeps the traffic flowing as smoothly as possible, can then be selected at the adaptive cruise control (ACC) – but the driver can also delegate this task to the car’s control system.

The savings are significant:

When the car is part of a network in this way, the driver can reduce the amount of time spent at a standstill and cut fuel consumption by 0.02 of a litre for every traffic-light stop and subsequent acceleration phase that can be avoided. The potential is enormous: if this new technology were applied throughout Germany, exhaust emissions could be lowered by about two million tonnes of CO2 annually, equivalent to a reduction of approximately 15 percent in CO2 from motor vehicles in urban traffic.

I am sure that there are many parts of this whole coordination process that involve some OR. It must be really cool to work on a project like this. On a different, but related, note I also believe that a lot of traffic jams have psychological reasons. People’s curiosity and lack of advance planning can severely influence their driving behavior. One great example of this is the Golden Glades fork here in Miami:

People going north on I-95 (traffic pattern on the right, going upward in the picture) have to decide among one of three directions: taking the Turnpike (letter A in the picture), continuing on I-95 (letter B), or taking the rightmost exit (letter C). It just so happens that most drivers realize that they have to change lanes at the last minute and this fork is constantly congested (even without accidents). There’s a very simple simulation experiment I always wanted to run but never had the time to: simulate the traffic flow when most people decide to change lanes very close to the fork versus the situation when people change lanes at uniformly distributed points way ahead of the fork. I bet you’d see much better flow in the second case. I hope that one day, when computers can drive for us, the driving algorithms will take care of these issues.

Leave a Comment

Filed under Applications, Network Flows, Research, Traffic Management

Pronunciation Mistakes and Border Security

I love the English language. It’s compact, flexible, and easy to learn (at least easier than Latin languages like Spanish, French or Portuguese with all those verb conjugations). Although I’ve been learning English since I was twelve years old, I still make a significant number of pronunciation mistakes. Being married to an American makes my life much easier, and I really appreciate it when my wife corrects my pronunciation (after laughing out loud for 5 minutes, of course).

Yesterday, I started thinking that it may be possible to state that native speakers of language X typically make certain pronunciation mistakes when speaking English. More precisely, I started looking at my own mistakes and the mistakes of my fellow Brazilians. This led me to state the following conjecture:

Conjecture: When asked to say the following sentence out loud, 90% of all Brazilians who can speak English will make at least one pronunciation mistake:

“Sporting summer clothes, Jimmy Buffet was drinking his favorite lager as the road wound before driving over the Potomac river, where the scarce geese population has been massacred, when he saw his friends Graham and Craig from Akron, Ohio on his rear view mirror, wearing their hunting apparel.”

Try it out and let me know what happens! (I’m already counting myself as 1 out of 1.) UPDATE (4/17/10): 3 out of 3 Brazilians have made at least one mistake.

This little exercise made me think of using speech analysis for security purposes. Let’s say you’re screening people and you’re worried that they aren’t really from the country they claim to be from (e.g. fake passport). They could be asked to read a passage of text, which will then be analyzed by a computer program that will look for pronunciation mistakes (even tiny ones) or inconsistencies. Say, for example, that people from country X typically say word Y in a certain way and this person does not. That’s a first warning sign. If enough of these signs are caught, that may mean something (statistically speaking).

Has anyone heard of this kind of screening procedure? I’d love to read something about it. Perhaps OR can be used to help choose the main words that should go into the text passages, or to help with the speech analysis.

2 Comments

Filed under Applications, Security