Monte Carlo Simulations in Tennis

I’m a huge sports fan, but I’ve never followed professional tennis. All I know about the sport is that Federer is really, really good. In fact, here are the win percentages of some of the best players (basically, all the players I know) over the past 15 years:

Player            1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013
Serena Williams    85%   82%   84%   92%   93%   81%   75%   75%   78%   85%   81%   86%   88%   94%   95%
Venus Williams     82%   91%   90%   87%   84%   79%   79%   68%   83%   78%   70%   84%   73%   73%   75%
Roger Federer      55%   70%   73%   82%   93%   95%   95%   88%   81%   84%   83%   84%   86%   78%   81%
Rafael Nadal         -     -     -   50%   56%   64%   89%   83%   82%   88%   83%   88%   82%   88%   95%
Novak Djokovic       -     -     -     -     -   40%   50%   69%   78%   79%   80%   77%   92%   86%   88%

Clearly, it’s not unusual to see tennis players having years with win percentages in the 90s. This insane success rate isn’t common in every sport:

The New England Patriots did go a perfect 16-0 in the 2007 regular season, but an NFL regular season is only 16 games, a much smaller sample than a season in most other sports. Regardless, win percentages in the 90s seem to be far more common in tennis than anywhere else.

It seems as though in tennis, the better player usually wins. This is obviously true for every sport, but upsets aren’t as frequent in tennis. The top players in tennis win 60+ consecutive matches, and when an upset happens, people go crazy. The longest win streak is just 33 games in the NBA and 26 games in the MLB. So why do the same top tennis players always win?

I approached this by simplifying the sport; the goal in tennis is to win points. If you have a 50% chance of winning each point, you have a 50% chance of winning the match. But what’s your chance of winning the match if you have a 75% chance of winning each point? Calculating this mathematically is difficult, since there are all kinds of rules to factor in (deuce, tiebreakers, advantage sets vs. regular sets, etc.), so I instead used Monte Carlo methods and simulated thousands of tennis matches.
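For a single game, the calculation is still manageable by hand, and it gives a useful check on any simulation. The sketch below assumes every point is independent and won with the same probability p (with q = 1 - p); the symbols are my own notation, not something from the simulation itself. The first term covers winning the game before deuce (4-0, 4-1, or 4-2 in points), and the second covers reaching deuce at 3-3 and then winning two points in a row before the opponent does.

```latex
\[
P(\text{game}) \;=\; p^4\left(1 + 4q + 10q^2\right) \;+\; 20\,p^3 q^3 \cdot \frac{p^2}{1 - 2pq},
\qquad q = 1 - p .
\]
\[
p = 0.6:\quad
0.1296 \times 4.2 \;+\; 0.27648 \times \frac{0.36}{0.52}
\;\approx\; 0.5443 + 0.1914 \;\approx\; 0.736 .
\]
```

Stacking the same reasoning through tiebreakers, sets, and a best-of-five match is where the bookkeeping gets unwieldy, which is the motivation for simulating instead.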

First, I wanted to see how your probability of winning a point impacts your probability of winning a game, tiebreaker, or set. When displaying statistics, I like to give confidence intervals rather than just point estimates. So here are the resulting confidence intervals (at a 95% level) after 100,000 simulations of each event:

Winning a Point 10% 20% 30% 40% 50% 60% 70% 80% 90%
Winning a Game 0.12%
Winning a Tiebreaker 0.00%
Winning a Set 0.00%
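Intervals like the ones in this table could be produced along the following lines. This is a minimal sketch, not the code behind the post: the function names (`simulate_point`, `simulate_game`, `estimate_with_ci`) are hypothetical, and it assumes the normal-approximation interval for a proportion, which the post does not specify.

```python
import math
import random


def simulate_point(p, rng):
    """Return 1 if the player wins the point, else 0."""
    return 1 if rng.random() < p else 0


def simulate_game(p, rng):
    """Play points until one side has at least 4 and leads by 2 (this covers deuce)."""
    won, lost = 0, 0
    while True:
        if simulate_point(p, rng):
            won += 1
        else:
            lost += 1
        if won >= 4 and won - lost >= 2:
            return 1
        if lost >= 4 and lost - won >= 2:
            return 0


def estimate_with_ci(event, p, n=100_000, z=1.96, seed=0):
    """Estimate a win probability with a normal-approximation 95% interval.

    `event` is any simulator taking (p, rng) and returning 1 or 0.
    The interval type is an assumption, not taken from the post.
    """
    rng = random.Random(seed)
    wins = sum(event(p, rng) for _ in range(n))
    p_hat = wins / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)
```

Calling `estimate_with_ci(simulate_game, 0.60)` returns the point estimate together with the lower and upper ends of the interval; a tiebreaker or set simulator with the same `(p, rng)` signature can be passed in the same way.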

Some of these numbers are pretty insane; just a 60% chance of winning a point gives you a 96% chance of winning the set. But we’re concerned with the probability of winning a match. So I simulated that as well (using Wimbledon’s Men’s Singles rules).

I decided to calculate the probabilities of winning a match when the probabilities of winning a point range from 35% to 65%. Also, I only used 10,000 simulations this time since matches take much longer to simulate. Here are the results in a graph (I plotted the estimates and not the confidence intervals, but the max margin of error was only 0.98%, and the average margin of error was only 0.31%):

[Figure: probability of winning a match as a function of the probability of winning a point]

The lines intersect at (53.75%, 90%) — a 53.75% chance of winning each point gives about a 90% chance of winning the match. Here are some more amazing results from the simulation:

  • Given a 56.5% chance of winning each point, about 99% of simulated matches were won.
  • Given a 60% chance of winning each point, about 99.99% of simulated matches were won.
  • Given a 61% chance of winning each point, all 10,000 simulated matches were won.
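The margins of error quoted for the match simulations are consistent with the usual normal-approximation interval for a proportion. That interval type is an assumption on my part, since the post does not name it, but the arithmetic lines up: with n = 10,000 simulated matches, the margin is largest when the estimated win probability is 50%.

```latex
\[
\text{margin} \;=\; 1.96\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}},
\qquad
\max_{\hat{p}}\ \text{margin}
\;=\; 1.96\sqrt{\frac{0.5 \times 0.5}{10{,}000}}
\;=\; 1.96 \times 0.005
\;=\; 0.0098 \;=\; 0.98\% .
\]
```

The margin shrinks as the estimate moves toward 0% or 100%, which would explain why the average margin is so much smaller than the maximum. At the extreme, when all 10,000 simulated matches are won, this formula gives a margin of exactly zero, so that result is better read as "very close to 100%" than as a certainty.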

So it looks like in tennis you only have to be just barely better than your opponent, and you’re almost guaranteed to win. Likewise, if you’re just barely worse than your opponent, you’re almost guaranteed to lose. That’s why the only players I’ll ever hear about are the very top ones, and that’s what lets Federer get such high win percentages and such long win streaks.


9 thoughts on “Monte Carlo Simulations in Tennis”

  1. Sebastian K.

    Very interesting analysis, thank you very much for sharing!

    When asked about “Where did Djokovic improve this year compared to last year? What is the characteristic of his game that you couldn’t deal with?”, in 2011, Nadal replied “He didn’t greatly improve in one aspect of his game – he improved by only a bit but in a lot of things”.

    Goes in line with your findings and conclusion.

  2. Igor

    This is a really interesting result. Kudos!

    I am fairly new to Monte Carlo methods and was wondering about the details of your simulation. Did you simulate a game by randomly drawing each point with some win probability, doing so according to the structure of each set/game? In other words, a game is: while neither player has won, draw a point. And a set is: while neither player has won 6 games, simulate a game? And so on for the match?

  3. Chirag Post author


    I created functions to simulate a point, a game, a tiebreaker, a regular set, an advantage set, and a match. They all return either a 1 or a 0 (representing a win or a loss):
    — The point function randomly draws a number and compares it to the inputted win probability.
    — The game function repeatedly calls the point function until the game is over (accounting for deuce).
    — The tiebreaker function repeatedly calls the point function until the tiebreaker is over.
    — The regular set function repeatedly calls the game function (or the tiebreaker function if the score is 6-6) until the set is over.
    — The advantage set function repeatedly calls the game function (with no tiebreakers) until the set is over.
    — The match function repeatedly calls the regular set function (or the advantage set function for the final set) until the match is over.

    So what I did is almost exactly like what you said!
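The functions described in this reply could be sketched roughly as follows. This is an illustration under stated assumptions, not the original code: the helper `race` and all function names are hypothetical, the match is best of five with a tiebreaker at 6-6 in every set except a deciding fifth (my reading of the Wimbledon men's singles rules the post mentions), and every point uses the same constant win probability.

```python
import random


def point(p, rng):
    """Return 1 if the player wins the point, else 0."""
    return 1 if rng.random() < p else 0


def race(p, rng, target, unit):
    """Hypothetical helper: call `unit` until one side has at least
    `target` wins and leads by 2. Returns 1 for a win, 0 for a loss."""
    won = lost = 0
    while True:
        if unit(p, rng):
            won += 1
        else:
            lost += 1
        if max(won, lost) >= target and abs(won - lost) >= 2:
            return 1 if won > lost else 0


def game(p, rng):
    return race(p, rng, 4, point)        # first to 4 points, win by 2 (deuce)


def tiebreaker(p, rng):
    return race(p, rng, 7, point)        # first to 7 points, win by 2


def advantage_set(p, rng):
    return race(p, rng, 6, game)         # first to 6 games, win by 2, no tiebreaker


def regular_set(p, rng):
    """First to 6 games, win by 2, with a tiebreaker at 6-6."""
    won = lost = 0
    while True:
        if won == 6 and lost == 6:
            return tiebreaker(p, rng)
        if game(p, rng):
            won += 1
        else:
            lost += 1
        if max(won, lost) >= 6 and abs(won - lost) >= 2:
            return 1 if won > lost else 0


def match(p, rng, sets_to_win=3):
    """Best of five by default; the deciding set is an advantage set."""
    won = lost = 0
    while won < sets_to_win and lost < sets_to_win:
        deciding = won == sets_to_win - 1 and lost == sets_to_win - 1
        result = advantage_set(p, rng) if deciding else regular_set(p, rng)
        if result:
            won += 1
        else:
            lost += 1
    return 1 if won == sets_to_win else 0


def match_win_rate(p, n=10_000, seed=0):
    """Fraction of n simulated matches won at point-win probability p."""
    rng = random.Random(seed)
    return sum(match(p, rng) for _ in range(n)) / n
```

Calling `match_win_rate(0.55)` gives one simulated estimate of the match win probability at a 55% point-win probability. Because games, tiebreakers, and advantage sets all share the "reach a target, win by two" shape, one helper covers all three; only the regular set needs its own loop for the 6-6 tiebreaker.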

  4. Vikram

    Nice use and explanation of Monte Carlo analysis.

    However, I do see one major flaw in your calculations. In each game of tennis, one player serves every point. When the returner wins a game against the server, it is considered a break of serve, because the server generally has the advantage. Your analysis does not seem to take that into account, as you call the same point function for every game. I would suggest you look at the blog and this article in particular to learn about the server advantage (it starts at 65%). It seems that Monte Carlo analysis would be even better suited for these chained probabilities.

  5. Chirag Post author


    Great catch. You’re right, I kept the model very simple for now and did not take the server advantage into consideration. However, I wouldn’t necessarily call this a flaw in the calculations; the results are correct IF win probability is held to a constant percentage. In real life, as you mention, win probability isn’t a constant percentage. So this just means the analysis can be improved to more accurately model real life. That’s actually a great idea for a future post; I can try simulating matches between professional tennis players, using historical data for their win probabilities on serves and on returns. Good observation, and thanks for giving me an idea!

