Category Archives: Analytics

Bringing Research into the Classroom: Can Relevant and Impactful be Easy to Explain?

O.R. researchers and practitioners are constantly churning out papers that tackle a wide variety of important and hard-to-solve practical problems. On the one hand, as a researcher, I understand how difficult these problems can be and how often fancy math and complex algorithms need to be used. On the other hand, as someone who teaches optimization to MBA students who aren’t easily excited by mathematics, I’m always looking for motivational examples that are both interesting and simple enough to be understood in 5 minutes. (That’s the little slot of time I reserve at the beginning of my lectures to go over an application before the lecture itself starts.)

Every now and then, I come across a paper that fits the bill perfectly: it addresses an important problem, produces impactful results, and (here comes the rare part) accomplishes the previous two goals using math that my MBA students can follow 100%. I can also be confident that they themselves could replicate it given what they learned in my course (the optimization models).

The paper to which I’m referring has recently appeared in Operations Research (Articles in Advance, January 2017): The Impact of Linear Optimization on Promotion Planning, by Maxime C. Cohen, Ngai-Hang Zachary Leung, Kiran Panchamgam, Georgia Perakis, and Anthony Smith.

If I had to pick one word to describe this paper, it would be BEAUTIFUL.

I immediately proceeded to put together a 5-minute summary presentation (8 slides) to cover the problem, approach, and results. I’ll be showing this to 100 of my MBA students on this coming Tuesday (Valentine’s Day!). I hope they love it as much as I did. Feel free to show this presentation to your own students if you wish, and let me know how it went down in the comments.

A recent Poets & Quants article explains how business schools with the highest quality teaching strive to bring their faculty’s research into the classroom so that students get to learn the latest and greatest ideas. The O.R. paper above is a perfect example of when this can be done effectively.


Filed under Analytics, Applications, Integer Programming, Linear Programming, Modeling, Motivation, Promoting OR, Research, Teaching

How to Build the Best Fantasy Football Team, Part 2

UPDATE on 10/5/2015: Explained how to model a requirement of baseball leagues (Requirement 4).

UPDATE on 10/8/2015: Explained how to model a different objective function (Requirement 5).


Yesterday, I wrote a post describing an optimization model for picking a set of players for a fantasy football team that maximizes the team’s point projection, while respecting a given budget and team composition constraints. In this post I’ll assume you’re familiar with that model. (If you are not, please spend a few minutes reading that post first.)

Fellow O.R. blogger and Analytics expert Matthew Galati pointed out that my model did not include all of the team-building constraints that appear on popular fantasy football web sites. Therefore, I’m writing this follow-up post to address this issue. (Thanks, Matthew!) My MBA student Kevin Bustillo was kind enough to compile a list of rules from three sites for me. (Thanks, Kevin!) After looking at them, it seems my previous model fails to deal with three kinds of requirements:

  1. Rosters must include players from at least N_1 different NFL teams (N_1=2 for Draft Kings and N_1=3 for both Fan Duel and Yahoo!).
  2. Rosters cannot have more than N_2 players from the same team (N_2=4 for Fan Duel and N_2=6 for Yahoo!; Draft Kings does not seem to have this requirement).
  3. Players in the roster must represent at least N_3 different football games (Only Draft Kings seems to have this requirement, with N_3=2).

Let’s see what the math would look like for each of the three requirements above. (Converting this math into Excel formulas shouldn’t be a problem if you follow the methodology I used in my previous post.) I’ll be using the same variables I had before (recall that binary variable x_i indicates whether or not player i is on the team).

Requirement 1

Last time I checked, the NFL had 32 teams, so let’s index them with the letter j=1,2,\ldots,32 and create 32 new binary variables called y_j, each of which is equal to 1 when at least one player from team j is on our team, and equal to zero otherwise. The requirement that our team must include players from at least N_1 teams can be written as this constraint:

\displaystyle \sum_{j=1}^{32} y_j \geq N_1

The above constraint alone, however, won’t do anything unless the y_j variables are connected with the x_i variables via additional constraints. The behavior that we want to enforce is that a given y_j can be allowed to equal 1 only if at least one of the players from team j has its corresponding x variable equal to 1. To make this happen, we add the constraint below for each team j:

\displaystyle y_j \leq \sum_{\text{all players } i \text{ that belong to team } j} x_i

For example, if the Miami Dolphins are team number 1 and their players are numbered from 1 to 20, this constraint would look like this: y_1 \leq x_1 + x_2 + \cdots + x_{20}
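To see why the linking constraints do their job, here is a minimal Python sketch (not from the original post; the function names and data are hypothetical, and teams are numbered from 0 rather than 1). For a fixed selection x, each y_j is binary and bounded above by the number of selected players from team j, so the largest value it can take is min(1, that count). Requirement 1 then asks whether those best-case values add up to at least N_1; in the real model the solver picks the y_j values itself.

```python
def max_team_indicators(x, team_of, num_teams):
    """Largest feasible binary y_j for each team j, given a fixed selection x.

    x        -- list of 0/1 selections, one per player (the x_i variables)
    team_of  -- team_of[i] is the (0-based) NFL team index of player i
    """
    selected_per_team = [0] * num_teams
    for i, x_i in enumerate(x):
        selected_per_team[team_of[i]] += x_i
    # y_j <= (number of selected players from team j) and y_j is binary.
    return [min(1, count) for count in selected_per_team]


def satisfies_requirement_1(x, team_of, num_teams, n1):
    """True if the selection can represent at least n1 different NFL teams."""
    return sum(max_team_indicators(x, team_of, num_teams)) >= n1


# Hypothetical example: players 0 and 1 play for team 0, player 2 for team 1,
# player 3 for team 2; we pick players 0, 1, and 3.
x = [1, 1, 0, 1]
team_of = [0, 0, 1, 2]
print(max_team_indicators(x, team_of, 3))         # [1, 0, 1]
print(satisfies_requirement_1(x, team_of, 3, 2))  # True
print(satisfies_requirement_1(x, team_of, 3, 3))  # False
```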

Requirement 2

Repeat the following constraint for every team j:

\displaystyle \sum_{\text{all players } i \text{ that belong to team } j} x_i \leq N_2

Assuming again that the first 20 players represent all the players from the Miami Dolphins, this constraint on Fan Duel would look like this: x_1 + x_2 + \cdots + x_{20} \leq 4
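As a quick sanity check, the per-team cap can be evaluated for any candidate roster with a few lines of Python. This is a hypothetical sketch (the function name and data are illustrative, not from the post) that mirrors the constraint above, repeated for every team j.

```python
from collections import Counter


def satisfies_requirement_2(x, team_of, n2):
    """True if no NFL team contributes more than n2 selected players.

    Mirrors the constraint sum_{i in team j} x_i <= N_2, repeated for every team j.
    """
    selected_per_team = Counter(team_of[i] for i, x_i in enumerate(x) if x_i == 1)
    return all(count <= n2 for count in selected_per_team.values())


# Hypothetical example with Fan Duel's N_2 = 4.
print(satisfies_requirement_2([1, 1, 1, 1, 1], [0, 0, 0, 0, 0], 4))  # False: five from team 0
print(satisfies_requirement_2([1, 1, 1, 1, 1], [0, 0, 0, 0, 1], 4))  # True
```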

Requirement 3

My understanding of this requirement is that it applies to short-term leagues that get decided after a given collection of games takes place (it could even be a single-day league). This could be implemented in a way that’s very similar to what I did for requirement 1. Create one binary variable z_g for each game g. It will be equal to 1 if your team includes at least one player who’s participating in game g, and equal to zero otherwise. Then, you need this constraint

\displaystyle \sum_{\text{all games } g} z_g \geq N_3

as well as the constraint below repeated for each game g:

\displaystyle z_g \leq \sum_{\text{all players } i \text{ that participate in game } g} x_i
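The game requirement works exactly like the team requirement, with games in place of teams. A minimal sketch, assuming hypothetical game labels: for a fixed selection, the largest achievable sum of z_g equals the number of distinct games among the selected players, so the check reduces to counting distinct games.

```python
def satisfies_requirement_3(x, game_of, n3):
    """True if the selected players take part in at least n3 different games.

    With z_g <= sum_{i in game g} x_i and z_g binary, the largest achievable
    sum of z_g is the number of distinct games among the selected players.
    """
    games_represented = {game_of[i] for i, x_i in enumerate(x) if x_i == 1}
    return len(games_represented) >= n3


# Hypothetical game labels; Draft Kings uses N_3 = 2.
game_of = ["MIA@NYJ", "MIA@NYJ", "DAL@NYG"]
print(satisfies_requirement_3([1, 1, 0], game_of, 2))  # False: only one game
print(satisfies_requirement_3([1, 0, 1], game_of, 2))  # True
```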

Additional Requirements Submitted by Readers

I earlier claimed that this model can be adapted to fit fantasy leagues other than football. So here’s a question I received from one of my readers:

For fantasy baseball, some players can play multiple positions. E.g. Miguel Cabrera can play 1B or 3B. I currently use OpenSolver for DFS and haven’t found a good way to incorporate this into my model. Any ideas?

Let’s call this…

Requirement 4: What if some players can be added to the team at one of several positions?

Here’s how to take care of this. Given a player i, let the index t=1,2,\ldots,T_i represent the different positions he/she can play. Instead of having a binary variable x_i representing whether or not i is on the team, we have binary variables x_{it} (as many as there are possible values for t) representing whether or not player i is on the team at position t. Because a player can either be left out or be picked to play exactly one position, we need the following constraint for each of these multi-position players:

\displaystyle \sum_{t=1}^{T_i} x_{it} \leq 1

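Because we have replaced x_i with a collection of x_{it}‘s, we need to replace occurrences of x_i in our model with (x_{i1} + x_{i2} + \cdots + x_{iT_i}), except in the position constraints: in the constraint that counts players at position t, player i should contribute only x_{it}. (Otherwise, a player picked at one position would also be counted at his other positions.)

In the Miguel Cabrera example above, let’s say Cabrera’s player ID (the index i) is 3, and that t=1 represents the first-base position, and t=2 represents the third-base position. The constraint above would become

x_{31} + x_{32} \leq 1

And we would replace x_3 with (x_{31} + x_{32}) in the objective, the budget constraint, and the team constraints, while using x_{31} in the first-base constraint and x_{32} in the third-base constraint.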
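The bookkeeping behind the x_{it} variables can be sketched in Python as follows. This is an illustrative, hypothetical helper (not from the post): it assumes each selected (player, position) pair counts only toward its own position, while the player indicator x_i is the sum over t of x_{it}, which is what the objective, budget, and team constraints would use.

```python
def positions_and_players(x_pos):
    """Split the x_{it} variables into position counts and player indicators.

    x_pos maps (player, position) -> 0/1. Each selected pair counts only toward
    its own position; x_i (the sum over t of x_{it}) is what the objective,
    budget, and team constraints use.
    """
    position_count = {}
    on_team = {}
    for (player, position), value in x_pos.items():
        position_count[position] = position_count.get(position, 0) + value
        on_team[player] = on_team.get(player, 0) + value
    # The constraint sum_t x_{it} <= 1: a player fills at most one position.
    if any(total > 1 for total in on_team.values()):
        raise ValueError("a player can be picked for at most one position")
    return position_count, on_team


# Hypothetical example: Cabrera (player 3) picked at third base, player 5 at first.
x_pos = {(3, "1B"): 0, (3, "3B"): 1, (5, "1B"): 1}
print(positions_and_players(x_pos))  # ({'1B': 1, '3B': 1}, {3: 1, 5: 1})
```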

That’s it!

Reader rs181602 asked me the following question:

I was wondering, is there a way to add an additional constraint that maximizes the minimum rating of the chosen players, if each player has some rating score. I tried to think that out, but can’t seem to get it to be linear.

Let’s call this…

Requirement 5: What if I want to maximize the point projection of the worst player on the team? (In other words, how do I make my worst player as good as possible?)

It’s possible to write a linear model to accomplish this. Technically speaking, we would be changing the objective function from maximizing the total point projection of all players on the team to maximizing the point projection of the worst player on the team. (There’s a way to do both together (sort of). I’ll say a few words about that later on.)

Here we go. Because we don’t know what the projection of the worst player is, let’s create a variable to represent it and call it z. The objective then becomes:

\max z

You might have imagined, however, that this isn’t enough. We defined in words what we want z to be, but we still need formulas to make z behave the way we want. Let M be the largest point projection among all players that could potentially be on our team. It should be clear to you that the constraint z\leq M is a valid ceiling on the value of z. In fact, the value of z will be limited above by 9 values/ceilings: the 9 point projections of the players on the team. We want the lowest of these ceilings to be as high as possible.

When a player i is not on the team (x_i=0), his point projection p_i should not interfere with the value of z. When player i is on the team (x_i=1), we would like p_i to become a ceiling for z, by enforcing z\leq p_i. The way to make this happen is to write a constraint that changes its behavior depending on the value of x_i, as follows:

z \leq p_ix_i + M(1-x_i)

We need one of these for each player. To see why the constraint above works, consider the two possibilities for x_i. When x_i=0 (player not on the team), the constraint reduces to z\leq M (the obvious ceiling), and when x_i=1 (player on the team), the constraint reduces to z\leq p_i (the ceiling we want to push up).
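The two-case argument can also be checked numerically. The sketch below is hypothetical (illustrative projections, not from the post): for a fixed selection x it computes the largest z permitted by all of the big-M constraints at once, with M set to the largest projection, and that value comes out as the projection of the worst selected player.

```python
def best_z(x, p):
    """Largest z allowed by z <= p_i*x_i + M*(1 - x_i) for every player i."""
    M = max(p)  # the largest point projection among all candidate players
    return min(p_i * x_i + M * (1 - x_i) for p_i, x_i in zip(p, x))


# Hypothetical projections; players 0, 1, and 3 are on the team.
p = [12, 20, 10, 15]
print(best_z([1, 1, 0, 1], p))  # 12, the worst projection on the team
```

Unselected players contribute the harmless ceiling M, so only the selected players’ projections can bind.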

BONUS: What if I want, among all possible teams that have the maximum total point projection, the one team whose worst player is as good as possible? To do this, you solve two optimization problems. First, solve the original model maximizing the total point projection. Then switch to this \max z model and include a constraint saying that the total point projection of your team (the objective formula of the first model) should equal the maximum total value you found earlier.
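For a tiny instance, the two-stage idea can be illustrated by brute force. In this hypothetical sketch, exhaustive enumeration stands in for the two optimization solves (it is only practical for a handful of players), and the data are made up so that two teams tie on total projection but differ in their worst player.

```python
from itertools import combinations


def lexicographic_best(p, c, budget, team_size):
    """Brute-force sketch of the two-stage BONUS approach (tiny instances only)."""
    feasible = [t for t in combinations(range(len(p)), team_size)
                if sum(c[i] for i in t) <= budget]
    # Stage 1: the maximum total point projection.
    best_total = max(sum(p[i] for i in t) for t in feasible)
    # Stage 2: among teams hitting that total, maximize the worst projection z.
    tied = [t for t in feasible if sum(p[i] for i in t) == best_total]
    return max(tied, key=lambda t: min(p[i] for i in t))


# Hypothetical data: teams (0, 1) and (2, 3) both total 12 points within budget,
# but the worst player of (2, 3) projects 6 points instead of 3.
print(lexicographic_best([9, 3, 6, 6], [10, 2, 6, 6], 12, 2))  # (2, 3)
```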

That’s it!

And that does it, folks!

Does your league have other requirements I have not addressed here? If so, let me know in the comments. I’m sure most (if not all) of them can be incorporated.


Filed under Analytics, Applications, Integer Programming, Modeling, Motivation, Sports

How to Build the Best Fantasy Football Team

Note 1: This is Part 1 of a two-part post on building fantasy league teams. Read this first and then read Part 2 here.

Note 2: Although the title says “Fantasy Football”, the model I describe below can, in principle, be modified to fit any fantasy league for any sport.

I’ve recently been approached by several people (some students, some friends) regarding the creation of optimal teams for fantasy football leagues. With the recent surge of betting sites like Fan Duel and Draft Kings, this has become a multi-million (or should I say, billion?) dollar industry. So I figured I’d write down a simple recipe to help everybody out. We’re about to use Prescriptive Analytics to bet on sports. Are you ready? Let’s do this! I’ll start with the math model and then show you how to make it all work using a spreadsheet.

The Rules

The fantasy football team rules state that a team must consist of:

  • 1 quarterback (QB)
  • 2 running backs (RB)
  • 3 wide receivers (WR)
  • 1 tight end (TE)
  • 1 kicker
  • 1 defense

Some leagues also have what’s called a “flex player”, which could be an RB, a WR, or a TE. I’ll explain how to handle the flex player below. In addition, players have a cost and the person creating the team has a budget, call it B, to abide by (usually B is $50,000 or $60,000).

The Data

For each player i, we are given the cost mentioned above, call it c_i, and a point projection p_i. The latter is an estimate of how many points we expect that player to score in a given week or game. When it comes to the defense, although it doesn’t always score, there’s also a way to calculate points for it (e.g. points prevented). How do these point projections get calculated, you may ask? This is where Predictive Analytics comes into play. It’s essentially forecasting. You look at past/recent performance, you look at the upcoming opponent, you look at players’ health, etc. There are web sites that provide you with these projections, or you can calculate your own. The more accurate you are at these predictions, the more likely you are to cash in on the bets. Here, we’ll take these numbers as given.

The Optimization Model

The main decisions to be made are simple: which players should be on our team? This can be modeled as a yes/no decision variable for each player. So let’s create a binary variable called x_i which can only take two values: it’s equal to the value 1 when player i is on our team, and it’s equal to the value zero when player i is not on our team. The value of i (the player ID) ranges from 1 to the total number of players available to us.

Our objective is to create a team with the largest possible aggregate value of projected points. That is, we want to maximize the sum of point projections of all players we include on the team. This formula looks like this:

\max \displaystyle \sum_{\text{all } i} p_i x_i

The formula above works because when a player is on the team (x_i=1), its p_i gets multiplied by one and is added to the sum, and when a player isn’t on the team (x_i=0) its p_i gets multiplied by zero and doesn’t get added to the final sum. The mechanism I just described is the main idea behind what makes all formulas in this model work. For example, if the point predictions for the first 3 players are 12, 20, and 10, the objective function starts as: \max 12x_1 + 20x_2 + 10x_3 + \cdots

The budget constraint can be written by saying that the sum of the costs of all players on our team has to be less than or equal to our budget B, like this:

\displaystyle \sum_{\text{all }i} c_i x_i \leq B

For example, if the first 3 players cost 9000, 8500, and 11000, and our budget is 60,000, the above formula would look like this: 9000x_1 + 8500x_2 + 11000x_3 + \cdots \leq 60000.
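Both formulas are plain sums of products, so they are easy to evaluate for any candidate selection. Here is a minimal Python sketch (function names are hypothetical) using the projections and costs of the first three players from the examples above, picking players 1 and 3.

```python
def total_points(x, p):
    """Objective: sum of p_i * x_i over all players."""
    return sum(p_i * x_i for p_i, x_i in zip(p, x))


def within_budget(x, c, budget):
    """Budget constraint: sum of c_i * x_i over all players must not exceed B."""
    return sum(c_i * x_i for c_i, x_i in zip(c, x)) <= budget


# The first three players from the examples above, picking players 1 and 3.
p = [12, 20, 10]
c = [9000, 8500, 11000]
x = [1, 0, 1]
print(total_points(x, p))          # 22
print(within_budget(x, c, 60000))  # True (cost 20000)
```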

To enforce that the team has the right number of players in each position, we do it position by position. For example, to require that the team have one quarterback, we write:

\displaystyle \sum_{\text{all } i \text{ that are quarterbacks}} x_i = 1

To require that the team have two running backs and three wide receivers, we write:

\displaystyle \sum_{\text{all } i \text{ that are running backs}} x_i = 2

\displaystyle \sum_{\text{all } i \text{ that are wide receivers}} x_i = 3

The constraints for the remaining positions would be:

\displaystyle \sum_{\text{all } i \text{ that are tight ends}} x_i = 1

\displaystyle \sum_{\text{all } i \text{ that are kickers}} x_i = 1

\displaystyle \sum_{\text{all } i \text{ that are defenses}} x_i = 1
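The position constraints simply count selected players per position and compare each count with its required value. A hypothetical Python sketch (the position labels and the ten-player pool are illustrative assumptions):

```python
REQUIRED = {"QB": 1, "RB": 2, "WR": 3, "TE": 1, "K": 1, "DEF": 1}


def position_counts(x, position_of):
    """Count selected players per position (the left-hand sides above)."""
    counts = {position: 0 for position in REQUIRED}
    for x_i, position in zip(x, position_of):
        counts[position] += x_i
    return counts


def satisfies_positions(x, position_of):
    """True if every position count equals its required value."""
    return position_counts(x, position_of) == REQUIRED


# Hypothetical pool of ten players; the second quarterback is left out.
position_of = ["QB", "QB", "RB", "RB", "WR", "WR", "WR", "TE", "K", "DEF"]
print(satisfies_positions([1, 0, 1, 1, 1, 1, 1, 1, 1, 1], position_of))  # True
print(satisfies_positions([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], position_of))  # False
```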

The Curious Case of the Flex Player

The flex player adds an interesting twist to this model. It’s a player that, if I understand correctly, takes the place of the kicker (meaning we would not have the kicker constraint above) and can be an RB, a WR, or a TE. Therefore, right away, we have a new decision to make: what kind of player should the flex be? Let’s create three new yes/no variables to represent this decision: f_{\text{RB}}, f_{\text{WR}}, and f_{\text{TE}}. These variables answer, respectively, the questions: Is the flex an RB? Is the flex a WR? Is the flex a TE? To indicate that only one of these things can be true, we write the constraint below:

f_{\text{RB}} + f_{\text{WR}} + f_{\text{TE}} = 1

In addition, having a flex player is equivalent to increasing the right-hand side of the constraints that count the number of RB, WR, and TE by one, but only for a single one of those constraints. We achieve this by changing these constraints from the format they had above to the following:

\displaystyle \sum_{\text{all } i \text{ that are running backs}} x_i = 2 + f_{\text{RB}}

\displaystyle \sum_{\text{all } i \text{ that are wide receivers}} x_i = 3 + f_{\text{WR}}

\displaystyle \sum_{\text{all } i \text{ that are tight ends}} x_i = 1 + f_{\text{TE}}

Note that because only one of the f variables can be equal to 1, only one of the three constraints above will have its right-hand side increased from its original value of 2, 3, or 1.
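The effect of the f variables on the right-hand sides can be tabulated directly. A hypothetical Python sketch that enumerates the three valid flex choices:

```python
BASE = {"RB": 2, "WR": 3, "TE": 1}


def flex_right_hand_sides(f):
    """Right-hand sides 2 + f_RB, 3 + f_WR, 1 + f_TE for a flex choice f."""
    if sum(f.values()) != 1:
        raise ValueError("f_RB + f_WR + f_TE must equal 1")
    return {position: BASE[position] + f[position] for position in BASE}


for flex in BASE:
    f = {position: int(position == flex) for position in BASE}
    print(flex, flex_right_hand_sides(f))
# RB {'RB': 3, 'WR': 3, 'TE': 1}
# WR {'RB': 2, 'WR': 4, 'TE': 1}
# TE {'RB': 2, 'WR': 3, 'TE': 2}
```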

Other Potential Requirements

Due to personal preference, inside information, or other esoteric considerations, one might want to include other requirements in this model. For example, if I want the best team that includes player number 8 and excludes player number 22, I simply have to force the x variable of player 8 to be 1, and the x variable of player 22 to be zero. Another constraint that may come in handy is to say that if player 9 is on the team, then player 10 also has to be on the team. This is achieved by:

x_9 \leq x_{10}

If you wanted the opposite, that is if player 9 is on the team then player 10 is NOT on the team, you’d write:

x_9 + x_{10} \leq 1
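A truth table confirms that these two linear inequalities express the intended logic. The short Python sketch below (function names are hypothetical) lists which 0/1 combinations of x_9 and x_{10} each constraint allows.

```python
from itertools import product


def requires(x_a, x_b):
    """x_a <= x_b: if player a is on the team, player b must be too."""
    return x_a <= x_b


def excludes(x_a, x_b):
    """x_a + x_b <= 1: players a and b are never both on the team."""
    return x_a + x_b <= 1


# Print which 0/1 combinations of (x_9, x_10) each constraint allows.
for x9, x10 in product((0, 1), repeat=2):
    print(x9, x10, requires(x9, x10), excludes(x9, x10))
# 0 0 True True
# 0 1 True True
# 1 0 False True
# 1 1 True False
```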

Other conditions along these lines are also possible.

Putting It All Together

If you were patient enough to stick with me all the way through here, you’re probably eager to put this math to work. Let’s do it using Microsoft Excel. Start by downloading this spreadsheet and opening it on your computer. Here’s what it contains:

  • Column A: list of player names.
  • Column B: yes/no decisions for whether a player is on the team (these are the x variables that Excel Solver will compute for us).
  • Columns C through H: flags indicating whether or not a player is of a given type (0 = no, 1 = yes).
  • Columns I and J: the cost and point projections for each player.

Now scroll down so that you can see rows 144 through 150. The cells in column B are currently empty because we haven’t chosen which players to add to the team yet. But if those choices had been made (that is, if we had filled column B with 0’s and 1’s), multiplying column B with column C in a cell-wise fashion and adding it all up would tell you how many quarterbacks you have. I have included this multiplication in cell C144 using the SUMPRODUCT formula. In a similar fashion, cells D144:H144 calculate how many players of each kind we’d have once the cells in column B receive values. The calculations of total team cost and total projected points for the team are analogous to the previous calculations and also use the SUMPRODUCT formula (see cells I144 and J144). You can try picking some players by hand (putting 1’s in some cells of column B) to see how the values of the cells in row 144 will change.

If you now open the Excel Solver window (under the Data tab, if your Solver add-in is active), you’ll see that I already have the entire model set up for you. If you’ve never used Excel Solver before, the following two-part video will get you started with it: part 1 and part 2.

The objective cell is J144, and that’s what we want to maximize. The variables (a.k.a. changing cells) are the player selections in column B, plus the flex-player type decisions (cells D147:F147). The constraints say that: (1) the actual number of players of each type (C144:H144) are equal to the desired number of each type (C146:H146), (2) the total cost of the team (I144) doesn’t exceed the budget (I146), (3) the three flex-player binary variables add up to 1 (D150 = F150), and, (4) all variables in the problem are binary. (I set the required number of kickers in cell G146 to zero because we are using the flex-player option. If you can have both a flex player and a kicker, just type a 1 in cell G146.) If you click on the “Solve” button, you’ll see that the best answer is a team that costs exactly $50,000 and has a total projected point value of 78.3. Its flex player ended up being an RB.

This model is small enough that I can solve it with the free student version of Excel Solver (which comes by default with any Office installation). If you happen to have more players and your total variable count exceeds 200, the free solver won’t work. But don’t despair! There exists a great Solver add-in for Excel that is also free and has no size limit. It’s called OpenSolver, and it will work with the exact same setup I have here.
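For readers who prefer code to spreadsheets, the whole model (positions, budget, and flex player) can be sketched in Python. This is not the author’s spreadsheet or a call to Excel Solver or OpenSolver: it swaps in exhaustive enumeration, which is only viable for a tiny pool, and the players, costs, and projections below are hypothetical. As in the spreadsheet setup, it assumes the flex player replaces the kicker.

```python
from itertools import combinations, product

# Hypothetical tiny pool: (name, position, cost, projected points).
POOL = [
    ("A", "QB", 10, 20), ("B", "QB", 5, 15),
    ("R1", "RB", 8, 12), ("R2", "RB", 6, 10), ("R3", "RB", 4, 8),
    ("W1", "WR", 7, 11), ("W2", "WR", 6, 9), ("W3", "WR", 5, 8), ("W4", "WR", 3, 5),
    ("T1", "TE", 5, 7), ("T2", "TE", 3, 4),
    ("D1", "DEF", 4, 6),
]

# Base slots with the kicker replaced by the flex player.
SLOTS = {"QB": 1, "RB": 2, "WR": 3, "TE": 1, "DEF": 1}


def best_team(pool, budget):
    """Exhaustively search every roster and flex type; return (points, flex, names)."""
    best = None
    for flex in ("RB", "WR", "TE"):
        need = dict(SLOTS)
        need[flex] += 1  # the flex adds one slot to exactly one of RB/WR/TE
        # All ways to fill each position with the required number of players.
        groups = [list(combinations([pl for pl in pool if pl[1] == pos], k))
                  for pos, k in need.items()]
        for choice in product(*groups):
            team = [pl for group in choice for pl in group]
            cost = sum(pl[2] for pl in team)
            points = sum(pl[3] for pl in team)
            if cost <= budget and (best is None or points > best[0]):
                best = (points, flex, sorted(pl[0] for pl in team))
    return best


print(best_team(POOL, 60))
# (91, 'RB', ['A', 'D1', 'R1', 'R2', 'R3', 'T1', 'W1', 'W2', 'W3'])
```

Enumeration grows combinatorially with the pool size, which is exactly why a real instance should go to an integer-programming solver such as the ones mentioned above.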

That’s it! If you have any questions or remarks, feel free to leave me a note in the comments below.

UPDATE: In a follow-up post, I explain how to model a few additional fantasy-league requirements that are not included in the model above.


Filed under Analytics, Applications, Integer Programming, Modeling, Motivation, Sports

Tenure-Track Position in Big Data Analytics, University of Miami, School of Business

I’m very happy to announce that the School of Business at the University of Miami is hiring in my department! Details below. This is an exciting time to be involved in Business Analytics!

Tenure-Track Faculty Position in Management Science (Big Data Analytics)

The Management Science Department at the University of Miami’s School of Business Administration invites applications for a tenure-track faculty position at the junior or advanced Assistant Professor level to begin in the Fall of 2015. Exceptional candidates at higher ranks will be considered subject to additional approval from the administration. Salaries are extremely competitive and commensurate with background and experience. This is a nine-month appointment but generous summer research support is anticipated from the School of Business.

Applicants with research interests in all areas of Analytics will be considered, although primary consideration will be given to those with expertise in Big Data Analytics and the computational challenges of dealing with large data sets. Expertise in, or experience with, one or more of the following is particularly welcome: MapReduce/Hadoop, Mahout, Cassandra, cloud computing, mobile/wearable technologies, social media analytics, recommendation systems, data mining and machine learning, and text mining. The Management Science Department is a diverse group of faculty with expertise in several areas within Operations Research and Analytics, including statistics and machine learning, optimization, simulation, and quality management. Duties will include research and teaching at the graduate and undergraduate levels.

Applicants should possess, or be close to completing, a PhD in computer science, operations research, statistics, or a related discipline by the start date of employment. Applications should be submitted by e-mail to, and should include the following: a curriculum vitae, up to three representative publications, brief research and teaching statements, an official graduate transcript (for the junior Assistant Professor level), information about teaching experience and performance evaluations, and three letters of recommendation. All applications completed by December 1, 2014 will receive full consideration, but candidates are urged to submit all required material as soon as possible. Applications will be accepted until the position is filled.

The University of Miami offers a comprehensive benefits package including medical and dental benefits, tuition remission, vacation, paid holidays, and much more. The University of Miami is an Equal Opportunity/Affirmative Action Employer.


Filed under Analytics

The First Sentence of the Great Analytics Novel

I’ve written many times before about the importance of promoting O.R. to the general public. One of the ideas that’s been suggested by several people is the possibility of writing a work of fiction whose main character (our hero) is an O.R./Analytics person. I still believe this is a great idea, if executed properly.

Today, my wife brought to my attention The Bulwer-Lytton Fiction Contest, which, according to their web page, consists of the following:

Since 1982 the English Department at San Jose State University has sponsored the Bulwer-Lytton Fiction Contest, a whimsical literary competition that challenges entrants to compose the opening sentence to the worst of all possible novels. The contest (hereafter referred to as the BLFC) was the brainchild (or Rosemary’s baby) of Professor Scott Rice, whose graduate school excavations unearthed the source of the line “It was a dark and stormy night.” Sentenced to write a seminar paper on a minor Victorian novelist, he chose the man with the funny hyphenated name, Edward George Bulwer-Lytton, who was best known for perpetrating The Last Days of Pompeii, Eugene Aram, Rienzi, The Caxtons, The Coming Race, and – not least – Paul Clifford, whose famous opener has been plagiarized repeatedly by the cartoon beagle Snoopy. No less impressively, Lytton coined phrases that have become common parlance in our language: “the pen is mightier than the sword,” “the great unwashed,” and “the almighty dollar” (the latter from The Coming Race, now available from Broadview Press).

Just as an awful first sentence can be a good indicator of a terrible book, the converse can also be true. Take, for example, the first sentence of Stephen King’s The Dark Tower series, which I happen to be reading (and loving) as we speak:

The man in black fled across the desert, and the gunslinger followed.

It’s such a strong, mysterious, and captivating sentence…

…which brings me to the point of this post. If it’s going to be difficult to write The Great Analytics Novel, what if we start by thinking about what would be the perfect, most compelling sentence to start such a novel? Yes, I propose a contest. Let’s use our artistic abilities and suggest starting sentences. Feel free to add them as comments to this post. Who knows? Maybe someone will get inspired and start writing the novel.

Here’s mine:

Upon using the word “mathematical” he knew he had lost the battle for, despite the dramatic cost savings, their logical reasoning was instantly halted, like a snowshoe hare frozen in fear of its chief predator: the Canada lynx.

I can’t wait to read your submissions!


Filed under Analytics, Books, Challenge, INFORMS Public Information Committee, Motivation, Promoting OR

Winter Blues Have You Down? Miami in February is Your Town!

Want the perfect reason to come to Miami in February? What about the 2012 INFORMS Optimization Society Conference? The conference, whose theme is “Optimization and Analytics: New Frontiers in Theory and Practice”, will be hosted by the University of Miami School of Business Administration from Friday, February 24 to Sunday, February 26 on its beautiful campus in Coral Gables, Florida. We are very fortunate to have many of the top researchers in Optimization and Analytics as members of our advisory and program committees. I expect the final conference program to be full of high-quality talks.

This is my first time as a member of an organizing committee and I’m happy to say that, despite all the work, it’s been a lot of fun!

Here’s a link to the call for abstracts and posters (we’re already accepting submissions). For more information, including important dates, registration rates, plenary speakers, and hotel reservations, visit the conference web site at, or send me an e-mail (tallys at miami dot edu). I hope to see you all here!


Filed under Analytics, Conferences and Events, INFORMS

Should You Hire Security When Tenting Your House?

Last week I had my house tented because of termites. For those of you who don’t know what “tenting” is (I didn’t until about a year ago), it amounts to wrapping an entire house inside a huge tent and filling the tent with a poisonous gas that kills everything inside (and by everything I really do mean everything). Those who have been through this experience know what a hassle it is. We received a to-do list of pre-tenting tasks, which included:

  • Remove or discard all food that isn’t canned or packaged in tightly-sealed, never-opened containers
  • Turn off all A/C units and open one window in each room of the house
  • Open all closet and cabinet doors
  • Turn off all internal and external lights (including those operating on a timer)
  • Prune/move all outdoor plants away from the house to have a clearance of at least 18 inches
  • Soak the soil around the house (up to a foot away from the structure) on the day of the tenting
  • Warn your neighbors about the tenting (so that they can keep their pets away from the house)
  • etc.

We had to sleep two nights in a hotel, with two dogs, one of which had just had knee surgery. What an adventure!

The main point of concern was that the house would stay vulnerable (open windows) and unattended during the process. On top of that, one of our neighbors told us that he knew of a house that had been robbed during tenting a couple of months ago. So we started to consider hiring a security guard to sit outside the house for 48 hours. Would that be a good idea? Let’s think about this.

Our insurance’s deductible is $2500. I assume that if thieves are willing to risk their lives (wearing gas masks; oh yeah! they do that!) to enter a tented house, they’d steal more than $2500 worth of stuff. Therefore, being robbed would cost us $2500. This doesn’t take into account that one might have irreplaceable items in the house. However, most of the time those can be taken with you (unless they are too big or inconvenient to carry). In my case, I took the external hard drive to which I back up my data, and the mechanical pencil I’ve owned and used since 1991 (yes, you guessed right, the eraser at the end doesn’t exist any more). The security company we called would charge $15 per hour for an unarmed guard to be outside our house. Multiplying that by 48 hours brings the cost of hiring security to $720.

Let’s say that the likelihood (a.k.a. probability) of being robbed while your house is tented without a security guard is p_1 (in percentage terms; for example, p_1 for the White House is pretty close to 0%), and when a security guard is on duty that likelihood is p_2. Unless p_1 > p_2, there’s no point in having this entire discussion, so I’ll assume that is true. Here’s a pretty neat rule of thumb that you can use: divide the cost of hiring security by your deductible to get a number n between zero and one (of course, if hiring a guard costs more than your deductible, don’t do it!). Unless the presence of the guard reduces your chance of being robbed (p_1) by more than n, that is, unless p_1 - p_2 > n, you should not hire security! (Later on, I’ll explain where this rule comes from.) For example, in my case 720/2500 = 0.288, or about 29%. If the chance of being robbed without security is 30%, then unless hiring a guard brings that chance down to about 1% (more precisely, below 1.2%), it’s better not to do it. If the value of p_1 is less than or equal to 29% to begin with (I live in a reasonably safe neighborhood), the answer is also not to hire security (probabilities cannot be negative). This rule works regardless of the value of p_1; what matters is how much the guard improves on p_1.
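The rule of thumb fits in a single comparison. A minimal Python sketch (the function name is hypothetical), using the numbers from this example with probabilities written as fractions rather than percentages:

```python
def should_hire_security(guard_cost, deductible, p1, p2):
    """Rule of thumb: hire security only if p1 - p2 exceeds guard_cost / deductible."""
    n = guard_cost / deductible  # 720 / 2500 = 0.288 in the example
    return p1 - p2 > n


print(should_hire_security(720, 2500, 0.30, 0.01))  # True: 0.29 > 0.288
print(should_hire_security(720, 2500, 0.30, 0.02))  # False: 0.28 < 0.288
print(should_hire_security(720, 2500, 0.25, 0.00))  # False: p1 is below n
```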

In addition to looking at the numbers, we also took into account the following clause from the security company’s contract:

…the Agency makes no warranty or guarantee, including any implied warranty of merchantability or fitness, that the service supplied will avert or prevent occurrences or the losses there from which the service is designed to detect or avert.

In other words, if you hire us (the security company) and still get robbed, we have nothing to lose!

So what did we do? We chose not to hire security and, fortunately, our house was not robbed. However, even though the tenting instructions say that you don’t have to wash your glasses and plates after returning home, we decided to do so anyway (as they say in Brazil: “seguro morreu de velho”, literally “the careful man died of old age”, i.e., better safe than sorry).

Disclaimer: The advice contained herein does not guarantee that your house will not be robbed. Use it at your own risk!

Details of the Analysis

So where does that rule of thumb come from? We can look at this problem from the point of view of a decision tree, as pictured below.

In node 0, we make one of two decisions: hire a security guard (payoff = -$720, i.e., a cost), or not (payoff = $0). For each of those decisions (branches), we create event nodes (1 and 2) to take into account the possibility of being robbed. At the top branch of the tree (node 2), the house will be robbed with probability p_2, in which case we incur an additional cost of $2500, and the house will be safe with probability (1-p_2), in which case we incur no additional expense. Therefore, if we hire security, we spend $720+$2500 with probability p_2 and $720 with probability (1-p_2), so the expected monetary value of hiring security, which we call EMV_2, is

EMV_2 = - 3220p_2 - 720(1-p_2) = - 2500p_2 - 720

Through a similar analysis of the bottom branch (node 1), if we do not hire security, we spend $2500 with probability p_1 and $0 with probability (1-p_1), so the expected monetary value of not hiring security, which we call EMV_1, is

EMV_1 = -2500p_1 - 0(1-p_1) = - 2500p_1

Hiring security will be the best choice when it has greater expected monetary value than not hiring security, that is when EMV_2 > EMV_1, which yields

-2500p_2 - 720 > -2500p_1


2500(p_1 - p_2) > 720


p_1 - p_2 > \frac{720}{2500}

which is the result we talked about earlier (recall that p_1 > p_2).
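The two expected monetary values can be computed directly from the branch payoffs, which also confirms the simplification of EMV_2. A hypothetical Python sketch with the example’s $720 guard cost and $2500 deductible as defaults, and probabilities written as fractions:

```python
def emv_hire(p2, guard_cost=720, loss=2500):
    """EMV_2: pay guard_cost + loss with probability p2, guard_cost otherwise."""
    return -(guard_cost + loss) * p2 - guard_cost * (1 - p2)


def emv_no_hire(p1, loss=2500):
    """EMV_1: pay loss with probability p1, nothing otherwise."""
    return -loss * p1


# p1 = 30% without a guard, p2 = 1% with one.
print(round(emv_hire(0.01), 2))     # -745.0  (= -2500*0.01 - 720)
print(round(emv_no_hire(0.30), 2))  # -750.0
```

Since -745 > -750, hiring wins in this case, matching the rule p_1 - p_2 = 0.29 > 0.288.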

How Does Analytics Fit In?

The Analytics process is composed of three main phases: descriptive (what does the data tell you about what has happened?), predictive (what does the data tell you about what’s likely to happen?), and prescriptive (what should you do given what you learned from the data?). In this problem we can identify a descriptive phase in which we try to obtain probabilities p_1 and p_2. This could be accomplished by looking at police or insurance company records of robberies in your area. It’s not always possible to get a hold of those records, of course, so one might need to get a little creative in estimating those numbers. Having knowledge of the probabilities, the calculation described above could be classified as a prescriptive phase: what’s the course of action? Hire security if (cost of security)/(insurance deductible) < p_1 - p_2. There is no predictive phase here because our analysis does not require the knowledge of any future event (only how likely it is to occur). Operations Research can be used in some or all of these phases. Most of what I do in my research and consulting projects lies in the prescriptive phase (optimization). Recently, however, I’ve decided to broaden my horizons and learn more about the other two phases as well, starting with some self-teaching of data mining.


Filed under Analytics, Applications, Decision Trees, INFORMS Monthly Blog Challenge, Security