**Tenure-Track Faculty Position in Management Science (Big Data Analytics)**

The Management Science Department at the University of Miami’s School of Business Administration invites applications for a tenure-track faculty position at the junior or advanced Assistant Professor level to begin in the Fall of 2015. Exceptional candidates at higher ranks will be considered subject to additional approval from the administration. Salaries are extremely competitive and commensurate with background and experience. This is a nine-month appointment but generous summer research support is anticipated from the School of Business.

Applicants with research interests in all areas of Analytics will be considered, although primary consideration will be given to those with expertise in Big Data Analytics and the computational challenges of dealing with large data sets. Expertise in, or experience with, one or more of the following is particularly welcome: MapReduce/Hadoop, Mahout, Cassandra, cloud computing, mobile/wearable technologies, social media analytics, recommendation systems, data mining and machine learning, and text mining. The Management Science Department is a diverse group of faculty with expertise in several areas within Operations Research and Analytics, including statistics and machine learning, optimization, simulation, and quality management. Duties will include research and teaching at the graduate and undergraduate levels.

Applicants should possess, or be close to completing, a PhD in computer science, operations research, statistics, or a related discipline by the start date of employment. Applications should be submitted by e-mail to facultyaffairs@bus.miami.edu, and should include the following: a curriculum vitae, up to three representative publications, brief research and teaching statements, an official graduate transcript (for the junior Assistant Professor level), information about teaching experience and performance evaluations, and three letters of recommendation. All applications completed by December 1, 2014 will receive full consideration, but candidates are urged to submit all required material as soon as possible. Applications will be accepted until the position is filled.

The University of Miami offers a comprehensive benefits package including medical and dental benefits, tuition remission, vacation, paid holidays, and much more. The University of Miami is an Equal Opportunity/Affirmative Action Employer.

]]>

htlatex file.tex "xhtml,charset=utf-8" " -cunihtf -utf8"

Overall, I’m very happy with the results produced by htlatex. Nevertheless, as I loaded file.html on my iPhone, I noticed that mobile Safari does not render all ligatures properly. For example, it has no problem with the ‘fi’ ligature, but it displays a hollow square in place of the characters for ‘ff’ and ‘ffi’ ligatures. I have not tested other mobile browsers, so I’m not sure if this is only an issue with mobile Safari. Safari on my desktop computer does not exhibit this problem.

To be safe, I thought I’d be better off removing all ligatures from the HTML file, which led me to search around for their UTF-8 codes and to write a little command-shell script that uses Perl to perform the task. Since this might turn out to be useful to someone else out there, I decided to post my shell script here. Use it at your own risk and enjoy!

perl -pi -e 's/\xef\xac\x80/ff/g' file.html

perl -pi -e 's/\xef\xac\x81/fi/g' file.html

perl -pi -e 's/\xef\xac\x82/fl/g' file.html

perl -pi -e 's/\xef\xac\x83/ffi/g' file.html

perl -pi -e 's/\xef\xac\x84/ffl/g' file.html

perl -pi -e 's/\xc5\x92/OE/g' file.html

perl -pi -e 's/\xc5\x93/oe/g' file.html

perl -pi -e 's/\xc3\x86/AE/g' file.html

perl -pi -e 's/\xc3\xa6/ae/g' file.html

perl -pi -e 's/\xef\xac\x86/st/g' file.html

perl -pi -e 's/\xc4\xb2/IJ/g' file.html

perl -pi -e 's/\xc4\xb3/ij/g' file.html
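For readers who prefer a single pass over the file, here is a rough Python sketch of the same substitutions. It is not part of my original script: the names `LIGATURES`, `strip_ligatures`, and `strip_file` are made up for illustration, and it assumes the HTML file is encoded in UTF-8 (as requested from htlatex above).

```python
# Map each ligature code point to its plain-letter spelling.
# The code points correspond to the UTF-8 byte sequences used in the
# Perl one-liners (e.g. U+FB00 is encoded as \xef\xac\x80).
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb06": "st",
    "\u0152": "OE",
    "\u0153": "oe",
    "\u00c6": "AE",
    "\u00e6": "ae",
    "\u0132": "IJ",
    "\u0133": "ij",
}
_TABLE = str.maketrans(LIGATURES)


def strip_ligatures(text):
    """Return text with every ligature in LIGATURES spelled out."""
    return text.translate(_TABLE)


def strip_file(path):
    """Rewrite the file at `path` in place (assumes UTF-8 encoding)."""
    with open(path, encoding="utf-8") as handle:
        content = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(strip_ligatures(content))
```

Calling `strip_file("file.html")` would then play the role of the twelve commands above. As with the shell version, use it at your own risk.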

By the way, I’m only concerned with Latin ligatures, but you can find UTF-8 codes for other ligatures on this page. Bonus: here’s another useful article related to this topic: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

]]>

Today, my wife brought to my attention The Bulwer-Lytton Fiction Contest, which, according to their web page, consists of the following:

Since 1982 the English Department at San Jose State University has sponsored the Bulwer-Lytton Fiction Contest, a whimsical literary competition that challenges entrants to compose the opening sentence to the worst of all possible novels. The contest (hereafter referred to as the BLFC) was the brainchild (or Rosemary’s baby) of Professor Scott Rice, whose graduate school excavations unearthed the source of the line “It was a dark and stormy night.” Sentenced to write a seminar paper on a minor Victorian novelist, he chose the man with the funny hyphenated name, Edward George Bulwer-Lytton, who was best known for perpetrating The Last Days of Pompeii, Eugene Aram, Rienzi, The Caxtons, The Coming Race, and – not least – Paul Clifford, whose famous opener has been plagiarized repeatedly by the cartoon beagle Snoopy. No less impressively, Lytton coined phrases that have become common parlance in our language: “the pen is mightier than the sword,” “the great unwashed,” and “the almighty dollar” (the latter from The Coming Race, now available from Broadview Press).

Just as an awful first sentence can be a good indicator of a terrible book, a great first sentence can be a good indicator of a great one. Take, for example, the first sentence of Stephen King’s The Dark Tower series, which I happen to be reading (and loving) as we speak:

The man in black fled across the desert, and the gunslinger followed.

It’s such a strong, mysterious, and captivating sentence…

…which brings me to the point of this post. If it’s going to be difficult to write *The Great Analytics Novel*, what if we start by thinking about what would be the perfect, most compelling sentence to start such a novel? Yes, I propose a contest. Let’s use our artistic abilities and suggest starting sentences. Feel free to add them as comments to this post. Who knows? Maybe someone will get inspired and start writing the novel.

Here’s mine:

Upon using the word “mathematical” he knew he had lost the battle for, despite the dramatic cost savings, their logical reasoning was instantly halted, like a snowshoe hare frozen in fear of its chief predator: the Canada lynx.

I can’t wait to read your submissions!

]]>

Here’s the abstract:

Recent research in the area of hybrid optimization shows that the right combination of different technologies, which exploits their complementary strengths, simplifies modeling and speeds up computation significantly. A substantial share of these computational gains comes from better communicating problem structure to solvers. Metaconstraints, which can be simple (e.g. linear) or complex (e.g. global) constraints endowed with extra behavioral parameters, allow for such richer representation of problem structure. They do, nevertheless, come with their own share of complicating issues, one of which is the identification of relationships between auxiliary variables of distinct constraint relaxations. We propose the use of additional semantic information in the declaration of decision variables as a generic solution to this issue. We present a series of examples to illustrate our ideas over a wide variety of applications.

Optimization models typically declare a variable by giving it a name and a canonical type, such as real, integer, binary, or string. However, stating that a variable is integer does not indicate whether that integer is the ID of a machine, the start time of an operation, or a production quantity. In other words, variable declarations say little about what the variable means. In the paper, we argue that giving a more specific meaning to variables through semantic typing can be beneficial for a number of reasons. For example, let’s say you need an integer variable to represent the machine assigned to job $j$. Instead of writing something like this in your modeling language (e.g. AMPL):

var x{j in jobs} integer;

it would be beneficial to have a language that allows you to write something like this

x[j] is which machine assign(job j);

To see why, take a look at the paper ;-)
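As a teaser, here is a toy Python sketch of one benefit hinted at in the abstract: when declarations carry meaning, two constraints that each ask for "the machine assigned to job 3" can be handed the same variable instead of two unrelated auxiliary ones. This is only an illustration under my own assumptions. The class `VariableRegistry`, its `declare` method, and the generated names are hypothetical and are not the mechanism described in the paper.

```python
class VariableRegistry:
    """Hands out one shared variable name per semantic description."""

    def __init__(self):
        # (predicate, keys) -> variable name
        self._by_meaning = {}

    def declare(self, predicate, *keys):
        """Return the variable that means `predicate(keys)`.

        The first request creates a fresh name; later requests with the
        same meaning reuse it, so separate constraints end up linked.
        """
        meaning = (predicate, keys)
        if meaning not in self._by_meaning:
            self._by_meaning[meaning] = "v{}".format(len(self._by_meaning))
        return self._by_meaning[meaning]


registry = VariableRegistry()
# Two different constraints ask for "which machine is assigned to job 3".
first = registry.declare("assign", "job3")
second = registry.declare("assign", "job3")
# A variable with a different meaning stays separate.
start = registry.declare("start_time", "job3")
```

With a plain `integer` declaration, nothing would tell the system that `first` and `second` describe the same decision.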

]]>

After playing in the Miami Heat’s first five preseason games, LeBron James sat out Saturday night’s 121-96 victory over the San Antonio Spurs to rest…James said the decision to sit was part of the team’s “maintenance” process. Heat teammate Dwyane Wade played Saturday and scored 25 points in 26 minutes, but previously skipped three preseason games…”No, no injuries — just not suiting up,” James said. “It’s OK for LeBron to take one off.”

The key term here is *maintenance process*. You may also recall that, back in November 2012, the Spurs were fined $250,000 by the league after coach Popovich sent Duncan, Parker, Ginobili, and Green home right before a game against the Miami Heat.

So we want to rest our players to keep them healthy, but this cannot come at the expense of losing games. There are many factors to be taken into account here, such as players’ current physical condition, strength and tightness of schedule, and match-ups (how well a team stacks up against another team), to name a few. This is definitely not an easy problem. However, some insight is better than no insight at all. Therefore, let’s see what we can do with a simple O.R. model, and then we can talk about the strengths and weaknesses of our initial approach. (Here’s where *you*, dear reader, are supposed to chime in!)

Let’s begin with two simple assumptions: (i) when it comes to resting, we have to take players’ individual needs into account, i.e., we’ll use player-specific data; and (ii) when it comes to the likelihood of beating an opposing team, it’s better to think in terms of full lineups, rather than in terms of individual players, i.e., we’ll use lineup-specific data. The data in assumption (i) comes from doctors, players’ medical records, and coaches’ strategies. In essence, it boils down to one number: how many minutes, at most, should each player play in each game, under ideal circumstances. A useful measure of the strength of a lineup is its adjusted plus-minus score (see, for example, the work of Wayne Winston and his book Mathletics). In summary, it’s a number that tells you how many points a given lineup plays above (or below) an average lineup in the league over 48 minutes (or over 100 possessions, or another metric of reference).

For the sake of explanation, I’ll pretend to be in charge of resting Miami Heat players (surprise!). I’ll refer to a generic lineup by the letter $i$ ($i = 1, \dots, n$), to a generic player by the letter $p$ ($p =$ LeBron, D-Wade, …, Andersen (Bird Man)), and to a generic game by the letter $g$.

We’re now ready to begin. Fasten your seat belts!

**What are the decisions to be made?** Let’s consider a planning horizon that consists of the next 7 games (or pick your favorite number). So $g = 1, \dots, 7$. For the Heat, the first 7 games of the 2013-2014 season are against the following teams: Bulls, 76ers, Nets, Wizards, Raptors, Clippers, and Celtics. For each one of my potential lineups $i$ and each game $g$, I want to figure out the number of minutes I should use lineup $i$ during game $g$. Because this is an unknown number right now, it’s a variable in the model. Let’s call it $x_{ig}$. Note it’s also OK to think of $x_{ig}$ as a percentage, rather than minutes. I’ll adopt the latter interpretation.

**What are the constraints in this problem?** There are three main constraints to worry about: (a) make sure to pick enough lineups to play each game in its entirety; (b) make sure your lineups are good enough to hopefully beat your opponents in each game; (c) keep track of players’ minutes, and don’t let them get out of hand. The next step is to represent each constraint mathematically.

**Constraint (a):** Pick enough lineups to completely cover each game. For every game $g$, we want to impose the following constraint:

$$\sum_{i} x_{ig} = 1$$

This means that if we sum the percentage of time each lineup $i$ is used during game $g$, we reach 100%.

**Constraint (b):** Choose your lineups so that you expect to score enough points in every game to beat your opponents. In this example, I’ll focus on plus-minus scores, but as a coach you could focus on any metric that matters to you. Given a lineup $i$, let $s_i$ be its adjusted plus-minus score. For example, the lineup of LeBron, Wade, Bosh, Chalmers, and Allen in the 2012-2013 season had the amazing score of +36.9 (you can obtain these numbers, and many other neat statistics, from the web site stats.nba.com). Now let’s say you have the plus-minus score of your opponent in game $g$, which we’ll call $t_g$. One way to increase your chances of victory is by requiring that the expected plus-minus score of your lineup combination in game $g$ exceed $t_g$ by a certain amount. Therefore, for every game $g$, we write the following constraint:

$$\sum_{i} s_i \, x_{ig} \ge t_g + 0.5$$

I want to emphasize two things. First, $s_i$ can be *any* measure of goodness of your lineup, and it can take into account the specific opponent in game $g$. Likewise, $t_g$ can be any measure of goodness of team $g$, as long as it’s consistent with $s_i$. Second, you’re not restricted to having only one of these constraints. If many measures of goodness matter to you, add them all in. For example, if you’re playing a team that’s particularly good at rebounding and you believe that rebounding is the key to beating them (e.g. Heat vs. Pacers), then either replace the constraint above with the analogous rebounding version, or include the rebounding version in addition to the constraint above. Finally, note that I picked 0.5 as a fixed amount by which to exceed $t_g$, but it could be any number you wish, of course. It can even be a number that varies depending on the opponent.

**Constraint (c):** Keep track of how many minutes your players are playing above and beyond what you’d like them to play. For any given player $p$ and any given game $g$, let $m_{pg}$ be $p$’s ideal number of playing minutes in game $g$ (make it zero if you want the player to sit out). When it’s not possible to match $m_{pg}$ exactly, we need to know how many minutes player $p$ played under or over $m_{pg}$. Let’s call these two unknown numbers (variables) $u_{pg}$ and $o_{pg}$, respectively. So, for every player $p$ and game $g$, we write the following constraint:

$$48 \sum_{i \text{ that includes } p} x_{ig} + u_{pg} - o_{pg} = m_{pg}$$

The expression “$i$ that includes $p$” under the summation means that we’re summing variables $x_{ig}$ for all lineups $i$ of which $p$ is a member. We’re multiplying the summation by 48 minutes because $x_{ig}$ is in percentage points and $m_{pg}$ is in minutes.

**What is our goal? (a.k.a. objective function)** It’s simple: we don’t want players to play too many minutes above $m_{pg}$. Because this overage amount is captured by variable $o_{pg}$, we can write our goal as:

$$\min \sum_{p} \sum_{g} o_{pg}$$

This minimizes the total overage in playing minutes. For a more balanced solution, it’s also possible to minimize the maximum overage over all players, or add weights in front of the $o_{pg}$ variables to give preference to some players.
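Here is one way the two alternatives just mentioned could be written down. This is a sketch under my own notational assumptions: $o_{pg}$ stands for the overage minutes of player $p$ in game $g$, $z$ is a new auxiliary variable, $w_p$ is a nonnegative weight chosen by the coach for player $p$, and "maximum overage" is read as the largest total overage of any single player over the planning horizon. Other readings are possible (for example, the largest overage in any single game).

$$\min \; z \quad \text{subject to} \quad z \ge \sum_{g} o_{pg} \;\; \text{for every player } p$$

$$\min \; \sum_{p} w_p \sum_{g} o_{pg}$$

In both cases constraints (a), (b), and (c) stay exactly as they are. The first version stays linear because the maximum is replaced by one inequality per player; the second simply makes overage minutes more expensive for the players you care most about resting.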

**Now what?** Well, the next step would be to solve this model and see what happens. I created a Microsoft Excel spreadsheet that can be solved with Excel Solver or OpenSolver. You can download it from here. Feel free to adapt it to your own needs and play around with it (this is the fun part!). Because my model was limited in size (I can’t use OpenSolver on my Mac at home), the solution isn’t very good (too many overage minutes). However, by adding more players and more lineups, the quality will certainly improve (use OpenSolver to break free from limits on model size). Here are some notes to help you understand the spreadsheet:

- Variables $x_{ig}$ are in the range B18:H25.
- Variables $u_{pg}$ and $o_{pg}$ are in ranges B56:J62 and B65:J71, respectively.
- Constraints (a) are implemented in rows 27, 28, 29.
- Constraints (b) are implemented in rows 33, 34, 35.
- The left-hand sides of constraints (c) are in the range B74:J80. This range is required to be equal to the range B47:J53 (where the $m_{pg}$ are) inside the Solver window.
- The objective function whose formula appears above is in cell J21.
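If you'd rather tinker outside of Excel, the following Python sketch checks a candidate plan against constraints (a) and (b) and computes the under and over minutes from constraint (c). It does not solve the model; it only evaluates a plan you give it. The function name `evaluate_plan`, its arguments, and all the numbers in the example are made up for illustration and have nothing to do with the data in my spreadsheet.

```python
def evaluate_plan(x, lineups, score, opp_score, ideal, margin=0.5, game_minutes=48):
    """Evaluate a lineup plan.

    x[i][g]      fraction of game g played by lineup i
    lineups[i]   set of players in lineup i
    score[i]     plus-minus score of lineup i
    opp_score[g] plus-minus score of the opponent in game g
    ideal[p][g]  ideal minutes for player p in game g
    Returns (feasible, under, over, total_over).
    """
    n_lineups, n_games = len(lineups), len(opp_score)
    feasible = True
    for g in range(n_games):
        covered = sum(x[i][g] for i in range(n_lineups))              # constraint (a)
        strength = sum(score[i] * x[i][g] for i in range(n_lineups))  # constraint (b)
        if abs(covered - 1.0) > 1e-9 or strength < opp_score[g] + margin:
            feasible = False
    under, over = {}, {}
    for p in ideal:
        for g in range(n_games):
            # Minutes actually played: 48 times the share of lineups containing p.
            played = game_minutes * sum(
                x[i][g] for i, members in enumerate(lineups) if p in members
            )
            diff = played - ideal[p][g]                                # constraint (c)
            over[p, g] = max(diff, 0.0)
            under[p, g] = max(-diff, 0.0)
    return feasible, under, over, sum(over.values())


# Hypothetical one-game example with six players and two lineups.
lineups = [{"A", "B", "C", "D", "E"}, {"A", "B", "C", "D", "F"}]
x = [[0.75], [0.25]]
score = [8.0, 2.0]
opp_score = [3.0]
ideal = {"A": [36], "B": [48], "C": [48], "D": [48], "E": [36], "F": [12]}
feasible, under, over, total_over = evaluate_plan(x, lineups, score, opp_score, ideal)
```

In this made-up example the plan covers the whole game and clears the opponent's score by more than 0.5, and player A is the only one who plays more than the ideal number of minutes.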

**What are the pros and cons of this model?** Can you make it better? No model is perfect. There are always real-life details that get omitted. The art of modeling is creating a model that is detailed enough to provide useful answers, but not too detailed to the point of requiring an unreasonable amount of time to solve. The definitions of “detailed enough” and “unreasonable amount of time” are mostly client-specific. (What would please Erik Spoelstra and his coaching staff?) What do you think are the main strengths and weaknesses in the model I describe above? What would you change? Good data is a big issue in this particular case. If you don’t like my data, can you propose alternative sources that are practical? I believe there’s plenty to talk about in this context, and I’m looking forward to receiving your feedback. Maybe we can converge to a model that is good enough for me to go knocking on the Miami Heat’s door! (Don’t worry. In the unlikely event they open the door, I’ll share the consulting fees.)

]]>

]]>

A couple of years ago, I wrote a post about scheduling baseball umpires. In the article behind that post, which I co-authored with Hakan Yildiz and Michael Trick, we talked about a problem called the Traveling Umpire Problem (TUP), which doesn’t include all the details from the real problem faced by MLB but captures the most important features that make the problem difficult. Here’s a short description (detailed description here):

Given a double round-robin tournament with 2N teams, the traveling umpire problem consists of determining which games will be handled by each one of N umpire crews during the tournament. The objective is to minimize the total distance traveled by the umpires, while respecting constraints that include visiting every team at home, and not seeing a team or venue too often.

And when I say difficult, let me tell you something, it’s *really* hard to solve. For example, there are 16-team instances (only 8 umpires) for which no feasible solution is known.

Two of my Brazilian colleagues, Lucas de Oliveira and Cid de Souza, got interested in the TUP and asked me to join them in an effort to try to improve the quality of some of the best-known solutions in the TUP benchmark. There are 25 instances in the benchmark for which we know a feasible solution (upper bound) and a lower bound, but not the optimal value. Today, we’re very happy to report that we managed to improve the quality of many of those feasible solutions. How many, you ask? I’ll let LeBron James himself answer that question:

“Not one, not two, not three, … not ten, … not eighteen, … not twenty-three, but 24 out of 25.”

OK, LeBron got a bit carried away there. And he forgot to say we improved 25 out of the 25 best-known lower bounds too. This means those pesky optimal solutions are now sandwiched between numbers much closer to each other.

Here’s the approach we took. First, we strengthened a known optimization model for the TUP, making it capable of producing better bounds and better solutions in less time. Then, we used this stronger model to implement a relax-and-fix heuristic. It works as follows. Waiting for the optimization model to find the optimal solution would take forever because there are too many binary decision variables (they tell you which venues each umpire visits in each round of the tournament). At first, we require that only the decisions in round 1 of the tournament be binary (i.e. which games the umpires will be assigned to in round 1) and solve the problem. This solves pretty fast, but allows for umpires to be figuratively cut into pieces and spread over multiple venues in later rounds. Not a problem. That’s the beauty of math models: we test crazy ideas on a computer and don’t slice people in real life. We fix those round-1 decisions, require that only round-2 variables be binary, and solve again. This process gets repeated until the last round. In the end, we are not guaranteed to find the very best solution, but we typically find a pretty good one.
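To make the loop above concrete, here is a minimal Python sketch of relax-and-fix on a toy knapsack problem rather than on the TUP model (which needs a real optimization solver). Groups of items play the role of tournament rounds. The function `solve_mixed` is a stand-in for the solver: it enumerates the 0/1 choices for the current group and fills the still-relaxed items fractionally, which is the optimal way to handle the continuous part of a knapsack. All names and data are illustrative assumptions, not taken from our paper.

```python
from itertools import product


def solve_mixed(values, weights, capacity, fixed, binary_idx):
    """Maximize value with `fixed` items pinned, `binary_idx` items 0/1,
    and every remaining item allowed to be taken fractionally.
    Returns (best_value, {item: 0 or 1 for the binary items}) or None."""
    free = [j for j in range(len(values)) if j not in fixed and j not in binary_idx]
    # Best value per unit of weight first: optimal for the fractional part.
    free.sort(key=lambda j: values[j] / weights[j], reverse=True)
    base_w = sum(weights[j] * fixed[j] for j in fixed)
    base_v = sum(values[j] * fixed[j] for j in fixed)
    best = None
    for choice in product((0, 1), repeat=len(binary_idx)):
        w = base_w + sum(weights[j] * c for j, c in zip(binary_idx, choice))
        if w > capacity:
            continue  # this 0/1 choice does not fit
        v = base_v + sum(values[j] * c for j, c in zip(binary_idx, choice))
        room = capacity - w
        for j in free:  # relaxed items may be "cut into pieces"
            take = min(1.0, room / weights[j])
            v += values[j] * take
            room -= weights[j] * take
        if best is None or v > best[0]:
            best = (v, dict(zip(binary_idx, choice)))
    return best


def relax_and_fix(values, weights, capacity, groups):
    """Fix the groups one at a time, keeping later groups relaxed."""
    fixed = {}
    for group in groups:
        result = solve_mixed(values, weights, capacity, fixed, group)
        if result is None:
            return None  # earlier fixings left no feasible choice
        fixed.update(result[1])  # freeze this group's decisions
    return fixed


plan = relax_and_fix([10, 7, 6, 4], [5, 4, 3, 2], 7, [[0, 1], [2, 3]])
```

In the first pass, items 2 and 3 may be fractional, just like the umpires who get sliced across venues in later rounds. Once items 0 and 1 are frozen, the second pass makes the remaining decisions binary. As in the TUP, nothing guarantees that the final answer is the best one in general.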

Some possible variations of the above would be to work with two (or more) rounds of binary variables at a time, start from the middle or from the end of the tournament, etc. If you’re interested in more details, our paper can be downloaded here. Our best solutions and lower bounds appear in Table 10 on page 22.

We had a lot of fun working on the TUP, and we hope these new results can help get more people excited about working on this very challenging problem.

]]>

Snapple Real Fact #804: There are 293 ways to make change for a dollar.

My first reaction was “Mmm…interesting”, but I couldn’t help wondering whether the Snapple folks did their math correctly. So after I got home and unpacked the car, I wrote a little Constraint Programming code in Comet to check this fact. It turns out that the number is indeed 293 if the following two things are allowed: (i) returning a 1-dollar coin in exchange for a dollar bill, and (ii) using half-dollar coins which, in my opinion, are rare these days. Here’s a list of the 292 ways that do not include using a 1-dollar coin which, in my opinion, isn’t really “giving change”.

If you’re wondering how many ways there are when you’re not allowed to use 1-dollar or half-dollar coins, the answer is 242. Here’s a list of all such possible ways.

**Update:** A friend asked me what the number would be if we considered the quarters from each of the 50 states as a different coin. In that case the number of possible ways increases to 515,184 (including the 1-dollar coin).
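If you don't have Comet at hand, the counts above can be reproduced with a short dynamic program in Python. This is a different technique from the Constraint Programming model I used, and the function name `count_change` is made up for this sketch. One assumption deserves a loud label: the 515,184 figure comes out when each state's quarter is treated as a separate coin that can be used at most once; letting the same state's quarter repeat gives a larger count.

```python
def count_change(target, unlimited, single_use=()):
    """Count the ways to pay `target` cents.

    unlimited   coin values that may be used any number of times
    single_use  coin values where each listed coin is a distinct
                object that may be used at most once
    """
    ways = [1] + [0] * target  # ways[a] = number of ways to pay a cents
    for coin in unlimited:
        # Ascending amounts let the same coin be reused.
        for amount in range(coin, target + 1):
            ways[amount] += ways[amount - coin]
    for coin in single_use:
        # Descending amounts ensure each distinct coin is used at most once.
        for amount in range(target, coin - 1, -1):
            ways[amount] += ways[amount - coin]
    return ways[target]


with_dollar_coin = count_change(100, [1, 5, 10, 25, 50, 100])
without_dollar_coin = count_change(100, [1, 5, 10, 25, 50])
without_half_dollar = count_change(100, [1, 5, 10, 25])
# Assumption: 50 distinct state quarters, each usable at most once.
state_quarters = count_change(100, [1, 5, 10, 50, 100], [25] * 50)
```

The four results are 293, 292, 242, and 515,184, respectively.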

]]>

If you go to the gym regularly (or have been a regular gym goer at some point in your life), you might have noticed an issue that arises with some frequency: the locker you’re trying to access is very close to one in front of which someone else is standing (because their locker is right next to yours). I’ll refer to this phenomenon as *interference*. Interference is annoying because it creates that awkward situation in which you stand there trying to be polite and wait for the other person to finish, while at the same time getting upset because you’re wasting your precious time: “Man, I was hoping to be finished with my workout in 45 minutes. I gotta go back to the office and work on that integer programming model. Why does this guy take so long to tie his shoes?”

Interference occurs in other places, of course; hence the title of this post. When boarding planes, airlines try to be as efficient as possible, that is, they try to get everyone in their seats and ready to go in the shortest possible time. What is interference during the boarding of a plane? It’s when passengers that are standing in the aisle (e.g., because they’re still trying to put their carry-on in the overhead bin) block the passage of other passengers whose seats are further down the aisle. You might think that the obvious solution is to board everyone starting from the back of the plane towards the front, right? Well, maybe. Back-to-front boarding is intuitively good, but there are other issues at play: some passengers have priority, not everyone is there when boarding starts, etc. Another strategy that seems to work well is a hybrid of back-to-front with window-to-aisle. As you might have guessed, people have used optimization and simulation to try and come up with good boarding strategies. One of these studies was published in the journal Interfaces in 2005: “America West Airlines Develops Efficient Boarding Strategies”. This is an interesting read, and I recommend it.

Where else does interference occur? This XKCD blog post talks about the International Choice of Urinal Protocol:

…the basic premise is that the first guy picks an end urinal, and every subsequent guy chooses the urinal which puts him furthest from anyone else peeing. At least one buffer urinal is required between any two guys or Awkwardness ensues.

Randall then proceeds to analyze this protocol and concludes that it suffers from a problem of underutilization of the available urinals, depending on how many of them there are. However, if guys are smart when picking urinals, they can achieve the optimal utilization (50%).

Now back to the locker room interference problem (which is the one that bothers me most lately). Let’s try to figure out the source of the problem and propose a solution to it. When you arrive at the University of Miami gym (known as the Wellness Center), you hand in your ID to an attendant who, in return, hands you a key that’s taken from a set of drawers that look like this (men’s lockers on the left, women’s on the right):

A key comes out of the drawer and your ID goes in. Interference is created because the attendants do not (and cannot) remember which keys they have handed out recently and what the layout of the locker room looks like. (By the way, locker numbers are not in perfect sequence in the Wellness Center; numbers jump around and you frequently see people who are lost looking for their lockers.) Ideally, what we’d like to happen is for keys to be handed out in such a way that they send the next person to a locker that is far away from the last few lockers that were given away. There are other complicating issues, of course, such as the fact that you cannot control the people who are returning from their workouts, but at least you can reduce interference among new arrivals.

We don’t need to write a mathematical model for this (or do we?). Why not pre-calculate an optimal sequence of locker hand-outs (based on the locker room layout), sort the drawers in that sequence (left to right, top to bottom), and have the attendants hand out keys in this order, cycling back to the top after they reach the last drawer? It won’t be perfect, but it sure will be better than the current system.
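Here is a rough Python sketch of how such a sequence could be pre-calculated. It is a greedy heuristic, not an optimal method: each new locker is the one farthest from the last few lockers handed out. It also assumes the lockers sit in a single row, so each one has a single coordinate. The function name `handout_sequence`, the `memory` parameter, and the locker numbers are invented for illustration.

```python
def handout_sequence(positions, memory=2):
    """Order lockers so each one is far from the last `memory` handed out.

    positions  dict mapping locker number -> coordinate along the row
    memory     how many of the most recent hand-outs to stay away from
    """
    remaining = sorted(positions, key=positions.get)
    order = [remaining.pop(0)]  # start at one end of the row
    while remaining:
        recent = order[-memory:]

        def gap(locker):
            # Distance to the closest recently assigned locker.
            return min(abs(positions[locker] - positions[r]) for r in recent)

        best = max(remaining, key=gap)  # farthest from recent hand-outs
        remaining.remove(best)
        order.append(best)
    return order


# Hypothetical row of six lockers, one unit apart.
row = {101: 0, 102: 1, 103: 2, 104: 3, 105: 4, 106: 5}
sequence = handout_sequence(row, memory=2)
```

For this tiny row the sequence is 101, 106, 103, 102, 105, 104. Note that 102 follows its neighbor 103, which shows why a greedy rule is only a starting point: with few lockers left, some interference is unavoidable, and a proper optimization model could spread it out better.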

]]>

Here’s an excerpt:

4,329 films were submitted to the 2012 Cannes Film Festival. This blog had 56,000 views in 2012. If each view were a film, this blog would power 13 Film Festivals

Click here to see the complete report.

]]>