Let’s say you work for a fast food chain, and you’re working out where to open a new location. For the sake of simplicity, let’s say that suitable units abound, and you have almost complete freedom of locale. Here’s the problem: you’re not the only chain around, and your new location will have to compete with surrounding businesses. I can imagine two possible plans of action in this case:

1. Open your new location as far as possible from any competition and hope that business will be driven by convenience. I’ll call this the even spread strategy.
2. Open your new location right next to the competition, under the assumption that they chose that place for a reason, and that you’ll be able to steal some of their customers. I’ll refer to this as clustering.

If either of these strategies dominates, we should be able to detect that in real-world data.

To get some real-world data, I headed over to OpenStreetMap, and made use of one of their bulk download tools. It turns out that all of Greater London plus some surrounding countryside is only 1.4 GB, which is surprisingly manageable.

1.4 GB of London, with cycle paths highlighted, for some reason. Fast food chains are shown in blue and coffee shops in orange. There’s some nice clustering along the arterial routes, possibly lending some credence to the clustering strategy.

Since we’re only interested in fast food and coffee, I wrote a quick script to scan through the .xml files and extract anything tagged fast_food or coffee_shop. After a little deduplication (apparently Costa, Domino’s, and Itsu are inconsistent with their branding, or OSM users are), we’re left with a list of London’s most common eateries:

(Chart: London’s most common fast food chains and coffee shops, by number of locations.)

The leaders here will probably not surprise anyone who’s been to London.
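For anyone wanting to reproduce the extraction step, here is a minimal sketch in Python. It is not the original script: it assumes the download is plain OSM XML, that takeaways carry amenity=fast_food and coffee shops cuisine=coffee_shop, and it only looks at nodes (shops mapped as building outlines are skipped). The `normalise_name` helper is a hypothetical guess at what “a little deduplication” might involve.

```python
import io
import unicodedata
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed OSM tag filters as (key, value) pairs; adjust to taste.
WANTED = {("amenity", "fast_food"), ("cuisine", "coffee_shop")}


def normalise_name(name):
    """Hypothetical deduplication key: drop accents, case and punctuation,
    so that e.g. "Domino's" and "Dominos" collapse to the same chain."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def extract_eateries(source):
    """Yield (chain_key, lat, lon) for every matching <node>.

    `source` is a filename or a binary file object containing OSM XML.
    iterparse streams the file, so the whole extract needn't be loaded at once.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.iter("tag")}
            name = tags.get("name")
            if name and any(tags.get(k) == v for k, v in WANTED):
                yield normalise_name(name), float(elem.get("lat")), float(elem.get("lon"))
        if elem.tag in ("node", "way", "relation"):
            elem.clear()  # discard children we no longer need


# Usage on a tiny hand-written extract (made-up coordinates).
sample = """<osm>
  <node id="1" lat="51.50" lon="-0.12"><tag k="amenity" v="fast_food"/><tag k="name" v="Domino's"/></node>
  <node id="2" lat="51.60" lon="-0.10"><tag k="amenity" v="fast_food"/><tag k="name" v="Dominos"/></node>
  <node id="3" lat="51.40" lon="-0.20"><tag k="amenity" v="cafe"/><tag k="cuisine" v="coffee_shop"/><tag k="name" v="Caffè Nero"/></node>
</osm>""".encode("utf-8")
counts = Counter(key for key, _, _ in extract_eateries(io.BytesIO(sample)))
print(counts.most_common())  # [('dominos', 2), ('caffenero', 1)]
```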

To save us from having to deal with a 14×14 matrix, I’m going to restrict the list to chains with 30 or more franchises, leaving nine (marked with a ‘★’ above). Now, to see whether their locations are clustered or evenly spread, I’ll consider each pairing of chains, $$(A, B)$$. Then, for each individual location of chain $$A$$, I’ll find the closest location of chain $$B$$ (when $$A$$ and $$B$$ are the same chain, a location doesn’t count as its own nearest neighbour). The result is a 9×9 array in which each entry is a list of distances from each $$A$$ to the closest $$B$$.
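A brute-force sketch of this step, assuming each chain’s locations are stored as (latitude, longitude) pairs in a dict and using the haversine great-circle formula for distance. The names `haversine_m` and `nearest_distances` are illustrative, not the original code. With a few hundred locations per chain the quadratic loop is fast enough; for much larger datasets a spatial index (for example SciPy’s `cKDTree` on projected coordinates) would be the usual choice.

```python
import math
from itertools import product

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_distances(locations):
    """Map each ordered pair (A, B) to the distances from every A to its closest B.

    `locations` maps chain name -> list of (lat, lon). When A == B, a location
    is not allowed to be its own nearest neighbour.
    """
    result = {}
    for a, b in product(locations, repeat=2):
        dists = []
        for i, p in enumerate(locations[a]):
            candidates = [q for j, q in enumerate(locations[b]) if a != b or i != j]
            if candidates:  # a chain with one location has no same-chain neighbour
                dists.append(min(haversine_m(p, q) for q in candidates))
        result[(a, b)] = dists
    return result
```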

Here’s an example:

When it comes to the distance from each KFC to the nearest Caffè Nero (Caffès Nero?), we see a range of distances between 100 metres and 5 kilometres. The vertical dotted line is the mean distance, in this case about 700 m; all in all, it’s a pretty good log-normal distribution. Whether or not there is clustering will be evident in the distribution at short distances: clustered locations will show a surplus there, while chains using the even spread strategy should have very few. The distribution seen here is pretty close to what we’d expect if the locations were placed at random, which is unsurprising given the (presumably?) low level of competition between KFC and Caffè Nero.
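To make “what we’d expect at random” concrete, one standard baseline (my assumption; I haven’t said which baseline the histograms are compared against) is complete spatial randomness. If chain $$B$$’s locations were scattered as a homogeneous Poisson process with density $$\lambda$$, the distance $$D$$ from any point to the nearest $$B$$ would satisfy $$P(D \le r) = 1 - e^{-\lambda \pi r^2}$$, with mean $$1/(2\sqrt{\lambda})$$. London’s density is far from uniform, so treat this as a rough reference only. A minimal sketch, with hypothetical function names:

```python
import math


def csr_nn_cdf(r_m, density_per_km2):
    """P(nearest B within r metres) if B's locations were a homogeneous
    Poisson process (complete spatial randomness) with the given density."""
    lam = density_per_km2 / 1e6  # convert to points per square metre
    return 1 - math.exp(-lam * math.pi * r_m ** 2)


def csr_nn_mean(density_per_km2):
    """Expected nearest-neighbour distance in metres under the same model."""
    lam = density_per_km2 / 1e6
    return 1 / (2 * math.sqrt(lam))


# A hypothetical chain with 0.1 locations per square kilometre:
print(round(csr_nn_mean(0.1)))  # 1581
print(csr_nn_cdf(100, 0.1))     # roughly 0.003
```

So for such a chain, only about 0.3% of points would have a location within 100 m purely by chance; a histogram with a clear excess at short distances would point towards clustering.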

You might expect these histograms to be symmetric, i.e. the histogram of distances from each Caffè Nero to the nearest KFC would be identical to the one above. This isn’t the case, however, as a simple example shows:

Here, every B is near an A, but most As are not near a B.
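The same effect is easy to reproduce numerically. In this one-dimensional toy layout (made-up positions, chosen only to mirror the example), the single B sits next to an A, but three of the four As are far from it:

```python
def mean_nearest(src, dst):
    """Mean distance from each point in src to its nearest point in dst (1-D)."""
    return sum(min(abs(s - d) for d in dst) for s in src) / len(src)


A = [0.0, 10.0, 20.0, 30.0]  # four As spread along a line
B = [1.0]                    # one B, right next to the first A

print(mean_nearest(B, A))  # 1.0: every B is near an A
print(mean_nearest(A, B))  # 14.5: most As are far from a B
```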

With that out of the way, let’s move on to the giant matrix of histograms:

From this distogram matrix, I can find the most and least ‘attractive’ pairings, based on their mean distance (note that these means are taken in log space, i.e. they are geometric means, which gives slightly different results from a plain arithmetic mean of all the distances):

Most attractive:
1. Starbucks → Pret (127 m)
2. Pret → Caffè Nero (166 m)
3. Costa → Pret (176 m)

Least attractive:
1. Domino’s → Domino’s (2.53 km)
2. Pret → Domino’s (2.52 km)
3. Burger King → Domino’s (2.51 km)
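A sketch of how such a ranking could be computed, assuming “mean in log space” means the geometric mean (average the logarithms, then exponentiate back to metres) and that the distogram is a dict mapping each ordered pair (A, B) to a list of distances. Dropping zero distances before taking logs is my own assumption, since a zero would break the logarithm; `log_space_mean` and `rank_pairs` are hypothetical names.

```python
import math


def log_space_mean(distances):
    """Geometric mean: average in log space, then transform back to metres."""
    return math.exp(sum(math.log(d) for d in distances) / len(distances))


def rank_pairs(distogram, k=3):
    """Return (most_attractive, least_attractive), each a list of k
    ((A, B), mean_distance) tuples.

    `distogram` maps (A, B) -> list of nearest-neighbour distances in metres.
    Zero distances are dropped before taking logs (an assumption).
    """
    means = {pair: log_space_mean([d for d in ds if d > 0])
             for pair, ds in distogram.items() if any(d > 0 for d in ds)}
    ordered = sorted(means.items(), key=lambda item: item[1])
    return ordered[:k], ordered[::-1][:k]


print(round(log_space_mean([100, 1000])))  # 316, versus an arithmetic mean of 550
```

The geometric mean is less sensitive to the long upper tail of a log-normal distribution, which is why it can differ noticeably from the arithmetic mean.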

In summary, coffee shops are densely clustered, and nobody wants to be near a Domino’s (especially Domino’s). The latter is exactly the kind of thing I was expecting to find: given how heavily Domino’s relies on delivery, it makes little sense to have two franchises within one delivery radius of each other. I’m now wondering whether the large Pret → Domino’s distance reflects an urban/suburban divide.

Coffee shops, on the other hand, show the exact opposite trend (as well as providing a nice real-world demo of the asymmetry I mentioned above). It seems that public demand for coffee is utterly insatiable, or that people are particularly loyal to specific chains.

For each chain, using the distograms, I can work out which other chain has the lowest and highest mean distance. From the data above, we already know that the closest chain to both Starbucks and Costa is Pret. This exercise isn’t a great way of showing trends, though, since the chains have very different numbers of locations; Pret is a common nearby cafe simply because there are an awful lot of them.
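Picking out each chain’s nearest and farthest neighbouring chain is a short loop over the table of mean distances. This is a sketch, assuming the means are stored in a dict keyed by ordered pair (A, B); same-chain pairs are skipped because the question is about other chains, and `extremes_per_chain` is a hypothetical name.

```python
def extremes_per_chain(means):
    """For each chain A, find the other chain B with the lowest and the highest
    mean distance A -> B.

    `means` maps (A, B) -> mean distance in metres. Returns
    {A: ((closest_B, distance), (farthest_B, distance))}.
    """
    result = {}
    for (a, b), m in means.items():
        if a == b:
            continue  # only compare against other chains
        lo, hi = result.get(a, ((b, m), (b, m)))
        if m < lo[1]:
            lo = (b, m)
        if m > hi[1]:
            hi = (b, m)
        result[a] = (lo, hi)
    return result
```

Note that this inherits the caveat above: a chain with many locations will tend to come out as everyone’s “closest” neighbour simply because of its numbers.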

We now have enough information to answer the original question: competing chains tend to be attractive, rather than repulsive.