Solving a facility location problem in near-linear time

Jun 12, 2023 · Kam Chuen (Alex) Tung · 9 min read

0. Problem Description

Given a list of household locations $0 \le A[1] \le \cdots \le A[n]$ and candidate facility locations $0 \le B[1] \le \cdots \le B[m]$, your task is to place $k$ hospitals, i.e. choose a subset $S \subseteq \{1, 2, \dots, m\}$ with $|S| = k$, such that $\sum_{1 \le i \le n} \min_{j \in S} |A[i] - B[j]|$ is minimized. In other words, if we add up each household's distance to its nearest chosen hospital, the sum should be the minimum possible.
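To make the objective concrete, here is a minimal brute-force reference in Python. The toy instance, the function name, and the use of `itertools.combinations` are illustrative choices of mine, not part of the problem statement; trying every size-$k$ subset is of course only feasible for tiny inputs.

```python
from itertools import combinations

def facility_cost_bruteforce(A, B, k):
    """Try every subset of exactly k candidate locations and return the
    minimum total household-to-nearest-hospital distance."""
    best = float("inf")
    for S in combinations(B, k):
        total = sum(min(abs(a - b) for b in S) for a in A)
        best = min(best, total)
    return best

# Hypothetical toy instance: 4 households, 3 candidate sites, k = 2.
# Choosing S = {2, 10} gives 1 + 0 + 1 + 0 = 2.
print(facility_cost_bruteforce([1, 2, 9, 10], [0, 2, 10], 2))  # -> 2
```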

I. Basic solution

One key observation is that people going to the same hospital must live in neighbouring locations. The proof is as follows:

If someone on your left and someone on your right both choose the same hospital $x$, you should go to $x$ as well. If you instead go to some hospital $y$ to the left of $x$, it would be better for the person on your left to go to $y$ as well; similarly if you go to a hospital to the right of $x$.

Therefore, we can formulate this problem as partitioning the people into (at most) $k$ contiguous sections, then assigning a hospital to each section, so that the total distance is minimized.

With this in mind, we define the dp states as follows:

Definition 1 (DP states):
Let $cost(l, r)$ be the minimum total distance for people $l$ to $r$, if we assign them to the same section (same hospital).
Let $dp[i][j]$ be the minimum total distance for people $1$ to $j$, if we assign them to $i$ sections.

The decision for $dp[i][j]$ is where to start the $i$-th section (we know that it ends at $j$). Thus we arrive at the recurrence

$$dp[i][j] = \min_{l < j} \big(dp[i-1][l] + cost(l+1, j)\big).$$

If we implement this recurrence straightforwardly, the runtime depends on the time needed to compute the $cost(\cdot, \cdot)$ function. The baseline is $O(nm)$ per query: try all $m$ hospital locations and, for each, compute the total distance in $O(n)$ time. This leads to a solution with overall runtime $O(kn^3m)$.

Below we describe a few optimizations:

  1. Precomputation: We can precompute $cost(l, r)$ for all $l$ and $r$, to avoid repeated computations in the dp recurrence.
  2. Prefix sum/partial sum: We can compute the total distance to a specific hospital (say at $x$) for people $l$ to $r$ in $O(1)$ time, using prefix sums. The idea is that there is a break-off point $p^* \in [l-1, r]$ so that people $l$ to $p^*$ are to the left of $x$ and people $p^*+1$ to $r$ are to the right of $x$. Then the sum becomes $\sum_{i=l}^{p^*} (x - A[i]) + \sum_{i=p^*+1}^{r} (A[i] - x)$, which can be computed in $O(1)$ with a prefix sum on the $A[i]$ array, provided $p^*$ can be found in $O(1)$ (how?).
  3. Median: For a contiguous section of people $l \dots r$, if we could build a hospital anywhere, then choosing a median of the locations $A[l], \dots, A[r]$ would be optimal (proof left as exercise). By extension, we should choose the candidate location immediately to the left of the median, or the one immediately to the right; we don't need to consider all $m$ candidate locations.

Using any one of these observations, the overall runtime can be reduced to quartic, e.g. $O(kn^2m)$ using (2). By combining at least two of the three observations, you can get the overall runtime down to cubic or nearly cubic, e.g. $O(kn^2 + n^2m)$ by combining (1) and (2).
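As a sketch of the cubic variant, here is a Python implementation combining precomputation (1) with prefix sums (2). The names and the 0-based indexing are my own choices; for simplicity the break-off point is found by binary search rather than in $O(1)$, which only adds a log factor to the precomputation.

```python
from bisect import bisect_right

def solve_cubic(A, B, k):
    """DP combining observations (1) and (2): precompute cost(l, r) for every
    segment, then run dp[i][j] = min_l dp[i-1][l] + cost(l+1, j).
    A (households) and B (candidate sites) must be sorted; 0-based, inclusive."""
    n = len(A)
    P = [0] * (n + 1)                      # P[i] = A[0] + ... + A[i-1]
    for i, a in enumerate(A):
        P[i + 1] = P[i] + a

    def seg_cost_at(l, r, x):
        # Total distance from households A[l..r] to one hospital at x.
        p = bisect_right(A, x, l, r + 1)   # break-off point: A[l..p-1] <= x
        return x * (p - l) - (P[p] - P[l]) + (P[r + 1] - P[p]) - x * (r + 1 - p)

    # Observation (1): precompute cost(l, r) for every segment.
    cost = [[0] * n for _ in range(n)]
    for l in range(n):
        for r in range(l, n):
            cost[l][r] = min(seg_cost_at(l, r, b) for b in B)

    INF = float("inf")
    dp = [0] + [INF] * n                   # best cost for people 0..j-1 so far
    for _ in range(k):
        ndp = [0] + [INF] * n
        for j in range(1, n + 1):
            # Extra hospitals never hurt, so carrying dp[j] keeps "at most i sections".
            ndp[j] = min([dp[j]] + [dp[l] + cost[l][j - 1]
                                    for l in range(j) if dp[l] < INF])
        dp = ndp
    return dp[n]
```

On the hypothetical instance from before, `solve_cubic([1, 2, 9, 10], [0, 2, 10], 2)` returns 2.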

II. Divide and conquer…?

It turns out that using an advanced optimization technique, called the β€œDivide and Conquer optimization” in the competitive programming community, we can obtain a solution with runtime $O(kn \log n + n(n+m))$, i.e. nearly quadratic!

First, precompute $cost(l, r)$ for all $l, r$ in $O(n(n+m))$ time. This can be done by combining all three optimizations.

Now, on to the DP part. We will fill the DP table row by row. The goal is to compute $dp[i][0..n]$ given $dp[i-1][0..n]$ in $O(n \log n)$ time. Since we need to do this $k$ times, the overall runtime (taking into account the precomputation) will be $O(kn \log n + n(n+m))$.

The idea here is to exploit an additional piece of solution structure: monotonicity of the optimal subproblem. Let me first state it formally:

Proposition 2 (monotonicity of optimal subproblem):
Let $opt[i][j]$ be the smallest index $l$ such that $dp[i-1][l] + cost(l+1, j)$ is minimized. Then $opt[i][1] \le opt[i][2] \le \cdots \le opt[i][n]$.

In other words, if it is optimal to group people $l..j$ in an $i$-group arrangement, and if it is optimal to group people $l'..j{+}1$ in an $i$-group arrangement, then $l \le l'$. The key element of the proof is that the cost function satisfies the quadrangle inequality: for $a \le b \le c \le d$, $cost(b, c) + cost(a, d) \ge cost(b, d) + cost(a, c)$. The proof is not very easy. If you want to check your proof or know how it's done, please reach out to me.
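While the proof is omitted here, we can at least sanity-check the quadrangle inequality numerically. The following Python script (instance and names are hypothetical) evaluates $cost(\cdot, \cdot)$ with prefix sums and verifies the inequality over all quadruples $a \le b \le c \le d$ of a small random instance:

```python
from bisect import bisect_right
import random

def make_cost(A, B):
    """Return cost(l, r): cheapest single hospital in B for sorted households
    A[l..r] (0-based, inclusive), computed with a prefix-sum array over A."""
    P = [0]
    for a in A:
        P.append(P[-1] + a)

    def cost(l, r):
        def at(x):  # total distance from A[l..r] to one hospital at x
            p = bisect_right(A, x, l, r + 1)
            return x * (p - l) - (P[p] - P[l]) + (P[r + 1] - P[p]) - x * (r + 1 - p)
        return min(at(b) for b in B)
    return cost

random.seed(1)
A = sorted(random.randrange(100) for _ in range(12))
B = sorted(random.randrange(100) for _ in range(5))
cost = make_cost(A, B)
n = len(A)

# Quadrangle inequality: cost(b, c) + cost(a, d) >= cost(b, d) + cost(a, c).
ok = all(cost(b, c) + cost(a, d) >= cost(b, d) + cost(a, c)
         for a in range(n) for b in range(a, n)
         for c in range(b, n) for d in range(c, n))
print(ok)
```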

With this, we can prove the monotonicity of the optimal subproblem. Suppose on the contrary that $opt[i][j] > opt[i][j+1]$ for some $j$. By definition of $opt$ we have:

  1. $dp[i-1][opt[i][j]] + cost(opt[i][j]+1, j) < dp[i-1][opt[i][j+1]] + cost(opt[i][j+1]+1, j)$

  2. $dp[i-1][opt[i][j+1]] + cost(opt[i][j+1]+1, j+1) \le dp[i-1][opt[i][j]] + cost(opt[i][j]+1, j+1)$

Adding up the two, and cancelling common terms on both sides of the inequality, we get $cost(opt[i][j]+1, j) + cost(opt[i][j+1]+1, j+1) < cost(opt[i][j+1]+1, j) + cost(opt[i][j]+1, j+1)$. Taking $a = opt[i][j+1]+1$, $b = opt[i][j]+1$, $c = j$, $d = j+1$, this becomes $cost(b, c) + cost(a, d) < cost(b, d) + cost(a, c)$, which is a violation of the quadrangle inequality.

Let me give you an idea of why this property would help. Short answer: to reduce search space.

Without this monotonicity property, one would need to go through subproblems $dp[i-1][0..j]$ to calculate $dp[i][j+1]$. But suppose you know that $opt[i][j] = t$. Then we only need to go through $dp[i-1][t..j]$ to calculate $dp[i][j+1]$.

There is one small issue: if $t$ turns out to be really small, then it appears to be ineffective in narrowing the search range. However, notice that we can use it to narrow the search range for $dp[i][j-1]$: we only need to go through $dp[i-1][0..t]$ to calculate $dp[i][j-1]$!

To calculate $dp[i][l..r]$, let $mid := \lfloor (l+r)/2 \rfloor$, and let $t := opt[i][mid]$. Then, to calculate $dp[i][l..(mid-1)]$ we only need to look at subproblems $dp[i-1][0..t]$, and to calculate $dp[i][(mid+1)..r]$ we only need to look at subproblems $dp[i-1][t..r]$. We do this recursively to further narrow the search range: divide and conquer :)
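The recursion above can be sketched in Python as follows. The identifiers are mine, and the cost table is filled naively (in $O(n^2 m \log n)$) for brevity rather than with the $O(n(n+m))$ precomputation; only the DP part demonstrates the divide-and-conquer optimization.

```python
from bisect import bisect_right

def solve_dnc(A, B, k):
    """Divide-and-conquer optimization: each DP row is filled with only
    O(n log n) cost lookups. A and B must be sorted; 0-based indexing."""
    n = len(A)
    P = [0]
    for a in A:
        P.append(P[-1] + a)

    def cost_at(l, r, x):
        p = bisect_right(A, x, l, r + 1)
        return x * (p - l) - (P[p] - P[l]) + (P[r + 1] - P[p]) - x * (r + 1 - p)

    # Naive cost precomputation (the post's version does this in O(n(n+m))).
    cost = [[0] * n for _ in range(n)]
    for l in range(n):
        for r in range(l, n):
            cost[l][r] = min(cost_at(l, r, b) for b in B)

    INF = float("inf")
    prev = [0] + [INF] * n                 # previous DP row

    def fill(cur, lo, hi, opt_lo, opt_hi):
        # Compute cur[lo..hi], knowing each optimal split lies in [opt_lo, opt_hi].
        if lo > hi:
            return
        mid = (lo + hi) // 2
        best, best_l = INF, opt_lo
        for l in range(opt_lo, min(mid, opt_hi + 1)):
            if prev[l] < INF:
                v = prev[l] + cost[l][mid - 1]   # last section: people l..mid-1
                if v < best:
                    best, best_l = v, l          # smallest optimal split wins ties
        cur[mid] = best
        fill(cur, lo, mid - 1, opt_lo, best_l)   # monotonicity: left half uses <= best_l
        fill(cur, mid + 1, hi, best_l, opt_hi)   # right half uses >= best_l

    for _ in range(k):
        cur = [0] + [INF] * n
        fill(cur, 1, n, 0, n - 1)
        prev = [min(a, b) for a, b in zip(prev, cur)]  # allow fewer sections
    return prev[n]
```

Note that the monotonicity proof only uses the quadrangle inequality of $cost$, not any property of the previous row, so the recursion stays valid even with the "fewer sections allowed" row merge.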

For the implementation, you can refer to Slides 67-72 of this presentation (external link), which I made a few years back. The slides use $C[i][j]$ instead of $opt[i][j]$.

III. Faster than quadratic

This is not the end of the story. In fact, we can solve the problem in time $O(n \log n \log \mathrm{RANGE})$!! If you turn a blind eye to those logs, there is only one lonely $n$ term left… (Why is this surprising? Well, even the number of states of the DP is quadratic, so how can the runtime be less than that?)

This magic trick is called the β€œAliens trick” in the competitive programming community. It is useful for eliminating one of your DP state dimensions. In the context of this problem, the idea is as follows:

  • Instead of keeping track of the number of sections (= number of hospitals) so far, assign a penalty $\lambda$ to each section (i.e. it costs $\lambda$ to build a hospital), and hope that in the end the number of sections will be equal to $k$.

So, let $dp_{\lambda}[i]$ be the minimum total cost for people $1$ to $i$, if we assign them into any number of sections, but each section increases the cost by $\lambda$. Then,

Claim 1: We can compute $dp_{\lambda}$ in $O(n \log n)$ time.

Claim 2: We can find a suitable $\lambda$ by computing $dp_{\lambda}$ for $O(\log \mathrm{RANGE})$ different values of $\lambda$, where $\mathrm{RANGE}$ is the range of the input numbers.

For Claim 1, let me refer you to this tutorial (external link). Again, the algorithm works because the cost function satisfies the quadrangle inequality. The quadrangle inequality means that, once subproblem $l$ is dominated by some later subproblem (i.e. $opt[i][j] > l$ for some $j$), it will never be optimal again (i.e. $opt[i][j'] > l$ for all $j' \ge j$). The method in the tutorial essentially maintains the set of β€œactive” subproblems.

We need to evaluate the cost function $O(n \log n)$ times, each evaluation taking $O(1)$ time (combining optimizations (2) and (3); note that it is too slow to precompute all cost values!). The runtime is therefore $O(n \log n)$.

For Claim 2, refer to the diagram below, which plots the optimal cost $dp[i][n]$ against the number of hospitals $i$:

Varying $\lambda$ is akin to varying the slope of the red line, and the point that is β€œclosest” to the line corresponds to the optimal number of hospitals $i$. Let's consider extremal examples: when $\lambda$ is very big, the cost of building a hospital is high, so it should be optimal to have $i = 1$. When $\lambda$ is zero, there is no cost to build a hospital, so it should be optimal to have $i = m$ (build hospitals at all candidate locations).

If the function $i \mapsto dp[i][n]$ is convex (i.e. the graph above is convex), then by varying $\lambda$ we can make every value of $i$ optimal (or one of the optima), and we can binary search for the $\lambda$ at which building $i = k$ hospitals is optimal. Binary search takes $O(\log \mathrm{RANGE})$ iterations (note that we only need to check integer values of $\lambda$), and in each iteration we compute $dp_\lambda$ for some $\lambda$.
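Here is a hedged Python sketch of this binary search. The naming is mine; for clarity $dp_\lambda$ is computed with a quadratic loop instead of the $O(n \log n)$ method from the tutorial, and ties are broken toward fewer sections so the section count is a well-defined non-increasing function of the integer penalty $\lambda$.

```python
from bisect import bisect_right

def solve_aliens(A, B, k):
    """Aliens trick: binary search on the per-hospital penalty lambda.
    dp_lambda is computed naively here; A and B must be sorted, 0-based."""
    n = len(A)
    P = [0]
    for a in A:
        P.append(P[-1] + a)

    def cost_at(l, r, x):
        p = bisect_right(A, x, l, r + 1)
        return x * (p - l) - (P[p] - P[l]) + (P[r + 1] - P[p]) - x * (r + 1 - p)

    def seg_cost(l, r):
        return min(cost_at(l, r, b) for b in B)

    def dp_lambda(lam):
        # dp[j]: best penalized cost for people 0..j-1; cnt[j]: fewest sections
        # among the optima (the tie-break keeps the count well-defined).
        INF = float("inf")
        dp, cnt = [0] + [INF] * n, [0] * (n + 1)
        for j in range(1, n + 1):
            for l in range(j):
                v = dp[l] + seg_cost(l, j - 1) + lam
                if v < dp[j] or (v == dp[j] and cnt[l] + 1 < cnt[j]):
                    dp[j], cnt[j] = v, cnt[l] + 1
        return dp[n], cnt[n]

    # Smallest integer lambda whose optimum uses at most k sections; with
    # integer costs and a convex i -> dp[i][n], k sections is optimal there.
    lo, hi = 0, n * max(A[-1], B[-1]) + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if dp_lambda(mid)[1] <= k:
            hi = mid
        else:
            lo = mid + 1
    return dp_lambda(lo)[0] - lo * k     # subtract the k paid penalties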

To check convexity of $i \mapsto dp[i][n]$, check that $dp[i-1][n] + dp[i+1][n] \ge 2\,dp[i][n]$. The idea is that, given an optimal partition $C_{i-1}$ into $i-1$ groups and an optimal partition $C_{i+1}$ into $i+1$ groups, we can construct two partitions $C_i$ and $C_i'$, both into $i$ groups, so that their total cost is at most the total cost of $C_{i-1}$ and $C_{i+1}$ combined (this step uses the quadrangle inequality).

Therefore, we have an $O(n \log n \log \mathrm{RANGE})$ solution to the facility location problem.

IV. Afterword

Dynamic programming is a powerful algorithmic paradigm. This course has shown you the basic examples and ideas that help demystify DP and turn it from magic into craft, something that you can reliably use to tackle problems. Hopefully, this writeup has given you a glimpse of the beautiful ideas and techniques that DP incorporates, and shown that DP is not just a craft, but also an art.