Semantic Typing: When Is It Not Enough To Say That X Is Integer?

Andre Cire, John Hooker, and I recently finished a paper on an interesting, and somewhat controversial, topic that relates to high-level modeling of optimization problems. The paper is entitled “Modeling with Metaconstraints and Semantic Typing of Variables“, and its current version can be downloaded from here.

Here’s the abstract:

Recent research in the area of hybrid optimization shows that the right combination of different technologies, which exploits their complementary strengths, simplifies modeling and speeds up computation significantly. A substantial share of these computational gains comes from better communicating problem structure to solvers. Metaconstraints, which can be simple (e.g. linear) or complex (e.g. global) constraints endowed with extra behavioral parameters, allow for such richer representation of problem structure. They do, nevertheless, come with their own share of complicating issues, one of which is the identification of relationships between auxiliary variables of distinct constraint relaxations. We propose the use of additional semantic information in the declaration of decision variables as a generic solution to this issue. We present a series of examples to illustrate our ideas over a wide variety of applications.

Optimization models typically declare a variable by giving it a name and a canonical type, such as real, integer, binary, or string. However, stating that variable $x$ is integer does not indicate whether that integer is the ID of a machine, the start time of an operation, or a production quantity. In other words, variable declarations say little about what the variable means. In the paper, we argue that giving a more specific meaning to variables through semantic typing can be beneficial for a number of reasons. For example, let’s say you need an integer variable $x_j$ to represent the machine assigned to job $j$. Instead of writing something like this in your modeling language (e.g. AMPL):

var x{j in jobs} integer;

it would be beneficial to have a language that allows you to write something like this

x[j] is which machine assign(job j);

To see why, take a look at the paper ;-)

Filed under Modeling, Research

Optimally Resting NBA Players

To celebrate the start of the 2013-2014 NBA season this past Tuesday, I decided to write a post on basketball. More specifically, on the important issue of how to give players some much needed rest in an “optimal” way. My inspiration came from an article by Michael Wallace published on ESPN.com on October 19. Here are some relevant excerpts:

After playing in the Miami Heat’s first five preseason games, LeBron James sat out Saturday night’s 121-96 victory over the San Antonio Spurs to rest…James said the decision to sit was part of the team’s “maintenance” process. Heat teammate Dwyane Wade played Saturday and scored 25 points in 26 minutes, but previously skipped three preseason games…”No, no injuries — just not suiting up,” James said. “It’s OK for LeBron to take one off.”

The key term here is maintenance process. You may also recall that, back in November 2012, the Spurs were fined \$250,000 by the league after coach Popovich sent Duncan, Parker, Ginobili, and Green home right before a game against the Miami Heat.

So we want to rest our players to keep them healthy, but this cannot come at the expense of losing games. There are many factors to be taken into account here, such as players’ current physical condition, strength and tightness of schedule, and match-ups (how well a team stacks up against another team), to name a few. This is definitely not an easy problem. However, some insight is better than no insight at all. Therefore, let’s see what we can do with a simple O.R. model, and then we can talk about the strengths and weaknesses of our initial approach. (Here’s where you, dear reader, are supposed to chime in!)

Let’s begin with two simple assumptions: (i) when it comes to resting, we have to take players’ individual needs into account, i.e., we’ll use player-specific data; and (ii) when it comes to the likelihood of beating an opposing team, it’s better to think in terms of full lineups, rather than in terms of individual players, i.e., we’ll use lineup-specific data. The data in assumption (i) comes from doctors, players’ medical records, and coaches’ strategies. In essence, it boils down to one number: how many minutes, at most, should each player play in each game, under ideal circumstances. A useful measure of the strength of a lineup is its adjusted plus-minus score (see, for example, the work of Wayne Winston and his book Mathletics). In summary, it’s a number that tells you how many points a given lineup plays above (or below) an average lineup in the league over 48 minutes (or over 100 possessions, or another metric of reference).

For the sake of explanation, I’ll pretend to be in charge of resting Miami Heat players (surprise!). I’ll refer to a generic lineup by the letter $i$ ($i=1,\ldots,8$), to a generic player by the letter $j$ ($j=$ LeBron, D-Wade, …, Andersen (Bird Man)), and to a generic game by the letter $k$.

What are the decisions to be made? Let’s consider a planning horizon that consists of the next 7 games (or pick your favorite number). So $k=1,\ldots,7$. For the Heat, the first 7 games of the 2013-2014 season are against the following teams: Bulls, 76ers, Nets, Wizards, Raptors, Clippers, and Celtics. For each one of my potential lineups $i$ and each game $k$, I want to figure out the number of minutes I should use lineup $i$ during game $k$. Because this is an unknown number right now, it’s a variable in the model. Let’s call it $x_{ik}$. Note it’s also OK to think of $x_{ik}$ as a percentage, rather than minutes. I’ll adopt the latter interpretation.

What are the constraints in this problem? There are three main constraints to worry about: (a) make sure to pick enough lineups to play each game in its entirety; (b) make sure your lineups are good enough to hopefully beat your opponents in each game; (c) keep track of players’ minutes, and don’t let them get out of hand. The next step is to represent each constraint mathematically.

Constraint (a): Pick enough lineups to completely cover each game. For every game $k$, we want to impose the following constraint:

$\displaystyle \sum_{i=1}^{10} x_{ik}=1$

This means that if we sum the percentage of time each lineup is used during game $k$, we reach 100%.

Constraint (b): Choose your lineups so that you expect to score enough points in every game to beat your opponents. In this example, I’ll focus on plus-minus scores, but as a coach you could focus on any metric that matters to you. Given a lineup $i$, let $p_i$ be its adjusted plus-minus score. For example, the lineup of LeBron, Wade, Bosh, Chalmers, and Allen in the 2012-2013 season had the amazing $p_i$ score of +36.9 (you can obtain these numbers, and many other neat statistics, from the web site stats.nba.com). Now let’s say you have the plus-minus score of your opponent in game $k$, which we’ll call $P_k$. One way to increase your chances of victory is by requiring that the expected plus-minus score of your lineup combination in game $k$ exceed $P_k$ by a certain amount. Therefore, for every game $k$, we write the following constraint:

$\displaystyle \sum_{i=1}^{10} p_i x_{ik} \geq P_k + 0.5$

I want to emphasize two things. First, $p_i$ can be any measure of goodness of your lineup, and it can take into account the specific opponent in game $k$. Likewise, $P_k$ can be any measure of goodness of team $k$, as long as it’s consistent with $p_i$. Second, you’re not restricted to having only one of these constraints. If many measures of goodness matter to you, add them all in. For example, if you’re playing a team that’s particularly good at rebounding and you believe that rebounding is the key to beating them (e.g. Heat vs. Pacers), then either replace the constraint above with the analogous rebounding version, or include the rebounding version in addition to the constraint above. Finally, note that I picked 0.5 as a fixed amount by which to exceed $P_k$, but it could be any number you wish, of course. It can even be a number that varies depending on the opponent.

Constraint (c): Keep track of how many minutes your players are playing above and beyond what you’d like them to play. For any given player $j$ and any given game $k$, let $m_{jk}$ be $j$‘s ideal number of playing minutes in game $k$ (make it zero if you want the player to sit out). When it’s not possible to match $m_{jk}$ exactly, we need to know how many minutes player $j$ played under or over $m_{jk}$. Let’s call these two unknown numbers (variables) $u_{jk}$ and $o_{jk}$, respectively. So, for every player $j$ and game $k$, we write the following constraint:

$\displaystyle 48\left(\sum_{i \text{ that includes } j} x_{ik}\right) + u_{jk} - o_{jk}=m_{jk}$

The expression “$i$ that includes $j$” under the summation means that we’re summing variables $x_{ik}$ for all lineups of which $j$ is a member. We’re multiplying the summation by 48 minutes because $x_{ik}$ is in percentage points and $m_{jk}$ is in minutes.

What is our goal? (a.k.a. objective function) It’s simple: we don’t want players to play too many minutes above $m_{jk}$. Because this overage amount is captured by variable $o_{jk}$, we can write our goal as:

$\displaystyle \text{minimize } \sum_{j=1}^{9} \sum_{k=1}^{7} o_{jk}$

This minimizes the total overage in playing minutes. For a more balanced solution, it’s also possible to minimize the maximum overage over all players, or add weights in front of the $o_{jk}$ variables to give preference to some players.

Now what? Well, the next step would be to solve this model and see what happens. I created a Microsoft Excel spreadsheet that can be solved with Excel Solver or OpenSolver. You can download it from here. Feel free to adapt it to your own needs and play around with it (this is the fun part!). Because my model was limited in size (I can’t use OpenSolver on my Mac at home), the solution isn’t very good (too many overage minutes). However, by adding more players and more lineups, the quality will certainly improve (use OpenSolver to break free from limits on model size). Here are some notes to help you understand the spreadsheet:

• Variables $x_{ik}$ are in the range B18:H25.
• Variables $u_{jk}$ and $o_{jk}$ are in ranges B56:J62 and B65:J71, respectively.
• Constraints (a) are implemented in rows 27, 28, 29.
• Constraints (b) are implemented in rows 33, 34, 35.
• The left-hand side of constraints (c) are in the range B74:J80. This range is required to be equal to the range B47:J53 (where the $m_{jk}$ are) inside the Solver window.
• The objective function whose formula appears above is in cell J21.

What are the pros and cons of this model? Can you make it better? No model is perfect. There are always real-life details that get omitted. The art of modeling is creating a model that is detailed enough to provide useful answers, but not too detailed to the point of requiring an unreasonable amount of time to solve. The definitions of “detailed enough” and “unreasonable amount of time” are mostly client-specific. (What would please Erik Spoelstra and his coaching staff?) What do you think are the main strengths and weaknesses in the model I describe above? What would you change? Good data is a big issue in this particular case. If you don’t like my data, can you propose alternative sources that are practical? I believe there’s plenty to talk about in this context, and I’m looking forward to receiving your feedback. Maybe we can converge to a model that is good enough for me to go knocking on the Miami Heat’s door! (Don’t worry. In the unlikely event they open the door, I’ll share the consulting fees.)

Filed under Applications, Linear Programming, Modeling, Motivation, Sports

INFORMS 2013: Sunday Nuggets

I just published my Sunday blog post at this year’s INFORMS Annual Conference, which is taking place in Minneapolis, MN. Here’s a link to it. Enjoy!

Filed under INFORMS

Improving Traveling Umpire Solutions the Miami Heat Way: Not one, not two, not three…

Those who know me are aware of my strong passion for basketball, so I had to find a way to relate this post to my favorite sport. Fans of basketball in general, and of the Miami Heat in particular, might be familiar with this video clip in which LeBron James makes a bold prediction. Back in 2010, when asked how many titles the Heat’s big three would win together, he replies “Not one, not two, not three, not four, not five, not six, not seven, …” While I’d love to see them win 8 titles, it sounds a bit (a lot) unlikely. But I can’t complain about their record so far. Winning 2 titles in 3 finals’ appearances isn’t bad at all. But what does this have to do with baseball umpires? Let’s get back to OR for a moment.

A couple of years ago, I wrote a post about scheduling baseball umpires. In that same article I co-authored with Hakan Yildiz and Michael Trick, we talked about a problem called the Traveling Umpire Problem (TUP), which doesn’t include all the details from the real problem faced by MLB but captures the most important features that make the problem difficult. Here’s a short description (detailed description here):

Given a double round-robin tournament with 2N teams, the traveling umpire problem consists of determining which games will be handled by each one of N umpire crews during the tournament. The objective is to minimize the total distance traveled by the umpires, while respecting constraints that include visiting every team at home, and not seeing a team or venue too often.

And when I say difficult, let me tell you something, it’s really hard to solve. For example, there are 16-team instances (only 8 umpires) for which no feasible solution is known.

Two of my Brazilian colleagues, Lucas de Oliveira and Cid de Souza, got interested in the TUP and asked me to join them in an effort to try to improve the quality of some of the best-known solutions in the TUP benchmark. There are 25 instances in the benchmark for which we know a feasible solution (upper bound) and a lower bound, but not the optimal value. Today, we’re very happy to report that we managed to improve the quality of many of those feasible solutions. How many, you ask? I’ll let LeBron James himself answer that question:

“Not one, not two, not three, … not ten, … not eighteen, … not twenty-three, but 24 out of 25.”

OK, LeBron got a bit carried away there. And he forgot to say we improved 25 out of the 25 best-known lower bounds too. This means those pesky optimal solutions are now sandwiched between numbers much closer to each other.

Here’s the approach we took. First, we strengthened a known optimization model for the TUP, making it capable of producing better bounds and better solutions in less time. Then, we used this stronger model to implement a relax-and-fix heuristic. It works as follows. Waiting for the optimization model to find the optimal solution would take forever because there are too many binary decision variables (they tell you which venues each umpire visits in each round of the tournament). At first, we require that only the decisions in round 1 of the tournament be binary (i.e. which games the umpires will be assigned to in round 1) and solve the problem. This solves pretty fast, but allows for umpires to be figuratively cut into pieces and spread over multiple venues in later rounds. Not a problem. That’s the beauty of math models: we test crazy ideas on a computer and don’t slice people in real life. We fix those round-1 decisions, require that only round-2 variables be binary, and solve again. This process gets repeated until the last round. In the end, we are not guaranteed to find the very best solution, but we typically find a pretty good one.

Some possible variations of the above would be to work with two (or more) rounds of binary variables at a time, start from the middle or from the end of the tournament, etc. If you’re interested in more details, our paper can be downloaded here. Our best solutions and lower bounds appear in Table 10 on page 22.

We had a lot of fun working on the TUP, and we hope these new results can help get more people excited about working on this very challenging problem.

Snapple Fact 804: How Many Ways to Make Change for a Dollar?

I went camping this past weekend (my first time) and my neighbor (who’s actually my neighbor in real life and was my neighbor at the camp site) was drinking a Snapple and read the following fact back to me:

Snapple Real Fact #804: There are 293 ways to make change for a dollar.

My first reaction was “Mmm…interesting”, but I couldn’t help wondering whether the Snapple folks did their math correctly. So after I got home and unpacked the car, I wrote a little Constraint Programming code in Comet to check this fact. It turns out that the number is indeed 293 if the following two things are allowed: (i) returning a 1-dollar coin in exchange for a dollar bill, and (ii) using half-dollar coins which, in my opinion, are rare these days. Here’s a list of the 292 ways that do not include using a 1-dollar coin which, in my opinion, isn’t really “giving change”.

If you’re wondering how many ways there are when you’re not allowed to use 1-dollar or half-dollar coins, the answer is 242. Here’s a list of all such possible ways.

Update: A friend asked me what the number would be if we considered the quarters from each of the 50 states as a different coin. In that case the number of possible ways increases to 515,184 (including the 1-dollar coin).

1 Comment

Filed under Constraint Programming, Modeling

What Do Locker Rooms, Airplanes, and Urinals Have in Common?

If you go (or have ever gone) to a reasonably large gym, you might have encountered the typical locker room changing area. It looks like this:

If you go to the gym regularly (or have been a regular gym goer at some point in your life), you might have noticed an issue that arises with some frequency: the locker you’re trying to access is very close to one in front of which someone else is standing (because their locker is right next to yours). I’ll refer to this phenomenon as interference. Interference is annoying because it creates that awkward situation in which you stand there trying to be polite and wait for the other person to finish, while at the same time getting upset because you’re wasting your precious time: “Man, I was hoping to be finished with my workout in 45 minutes. I gotta go back to the office and work on that integer programming model. Why does this guy take so long to tie his shoes?”

Interference occurs in other places, of course; hence the title of this post. When boarding planes, airlines try to be as efficient as possible, that is, they try to get everyone in their seats and ready to go in the shortest possible time. What is interference during the boarding of a plane? It’s when passengers that are standing in the aisle (e.g., because they’re still trying to put their carry-on in the overhead bin) block the passage of other passengers whose seats are further down the aisle. You might think that the obvious solution is to board everyone starting from the back of the plane towards the front, right? Well, maybe. Back-to-front boarding is intuitively good, but there are other issues at play: some passengers have priority, not everyone is there when boarding starts, etc. Another strategy that seems to work well is a hybrid of back-to-front with window-to-aisle. As you might have guessed, people have used optimization and simulation to try and come up with good boarding strategies. One of these studies was published in the journal Interfaces in 2005: “America West Airlines Develops Efficient Boarding Strategies”. This is an interesting read, and I recommend it.

Where else does interference occur? This XKCD blog post talks about the International Choice of Urinal Protocol:

…the basic premise is that the first guy picks an end urinal, and every subsequent guy chooses the urinal which puts him furthest from anyone else peeing.  At least one buffer urinal is required between any two guys or Awkwardness ensues.

Randall then proceeds to analyze this protocol and concludes that it suffers from a problem of underutilization of the available urinals, depending on how many of them there are. However, if guys are smart when picking urinals, they can achieve the optimal utilization (50%).

Now back to the locker room interference problem (which is the one that bothers me most lately). Let’s try to figure out the source of the problem and propose a solution to it. When you arrive at the University of Miami gym (known as the Wellness Center), you hand in your ID to an attendant who, in return, hands you a key that’s taken from a set of drawers that look like this (men’s lockers on the left, women’s on the right):

A key comes out of the drawer and your ID goes in. Interference is created because the attendants do not (and cannot) remember which keys they have handed out recently and what the layout of the locker room looks like. (By the way, locker numbers are not in perfect sequence in the Wellness Center; numbers jump around and you frequently see people who are lost looking for their lockers.) Ideally, what we’d like to happen is for keys to be handed out in such a way that they send the next person to a locker that is far away from the last few lockers that were given away. There are other complicating issues, of course, such as the fact that you cannot control the people who are returning from their workouts, but at least you can reduce interference among new arrivals.

We don’t need to write a mathematical model for this (or do we?). Why not pre-calculate an optimal sequence of locker hand-outs (based on the locker room layout), sort the drawers in that sequence (left to right, top to bottom), and have the attendants hand out keys in this order, cycling back to the top after they reach the last drawer? It won’t be perfect, but it sure will be better than the current system.