Monday, November 4, 2013

Mid-Atlantic Regional ACM Programming Contest 2013, problem F

In the Mid-Atlantic Regional ACM Programming Contest this year, one of the more interesting problems was the following:

Suppose there are \( N \leq 10^5 \) stars such that star \( i \) has coordinates \( x_i, y_i, z_i \), where \( |x_i|, |y_i|, |z_i| \leq 10^9 \). Given \( K \leq 10^9 \), count the number of pairs of stars which are within distance \( K \) from each other. The cluster of stars is sparse -- that is, there are at most \( M = 10^5 \) pairs of stars within \( K \) of each other.

One simple solution is to think about partitioning the space up into cubes of side length \( K \). Then, for each star, it suffices to check the number of stars in the same cube and adjacent cubes and see whether they're within \( K \). Surprisingly enough, this actually runs in \( O(M) \) operations.

First we consider the case where space is one-dimensional, and our cubes are just intervals of length \( K \). Let the intervals contain \( a_1, a_2, \ldots, a_n \) elements. Note that two stars within the same interval are guaranteed to be within \( K \) of each other. Thus, \[\sum_{i=1}^n a_i^2 \leq M\] What is the number of operations that this will take? Well, for each star in interval \(i\), we examine all stars in intervals \(i - 1\), \(i\), and \(i + 1\). This is just \[\sum_{i=1}^n a_i (a_{i-1} + a_i + a_{i+1})\] Note that the \(a_i^2\) terms are \(\leq M\); furthermore, the \(a_ia_{i-1}\) terms and the \(a_ia_{i+1}\) terms are symmetric. Thus, it suffices to show that \[\sum_{i=1}^n a_i a_{i-1} \leq M\] Here we apply something known as the rearrangement inequality*, which roughly says that if you have two sequences of numbers \(x_i\) and \(y_i\), and you want to rearrange the \(x_i\) and \(y_i\) to maximise \(\sum_i x_i y_i\), then it's best to rearrange them in ascending order. In this case, our sequences are \(a_i\) and \(a_{i-1}\), and the sequences in ascending order is just \(\sum_{i=1}^n a_i^2\), which we already showed was \(\leq M\). This means that the whole algorithm takes \[\sum_{i=1}^n a_i (a_{i-1} + a_i + a_{i+1}) \leq 3 M\]

The same argument can be extended to cubes -- however, in order to guarantee that \( \sum_{i=1}^n a_i^2 \leq M \), we need to make our cubes be of size \(K / \sqrt3\) so that all points within them are guaranteed to be at most \(K\) from each other. This implies that if we instead use cubes of size \(K\), then \(\sum_{i=1}^n a_i^2 \leq 3^3 M\). We can then apply the rearrangement argument as before, except with \(27\) adjacent cubes instead of \(3\). This gives us an algorithm which runs in \(729 M\).

In general, for dimension \(D\) we'll need \((3D)^DM\) operations for the same algorithm, so in high dimensions we'd likely need to exploit other tricks, like the fact that the volume of an \(n\)-ball disappears as \(n\) increases. We could then (maybe?) approximate the sphere with small cubes of size \(\epsilon K\) instead of simply summing over \(3^D\) adjacent, larger cubes of size \(K\).

* -- You could also prove this using Cauchy-Schwarz or some other famous inequality. Rearrangement just happened to be the first one I thought of.

Friday, October 25, 2013

TopCoder SRM 595 Editorial

Overview

Fewer people showed up for this SRM than usual, possibly because it was between 12AM and 5AM in most of Europe. I haven't yet looked at the Div 2 problems, but the Div 1 medium and hard problems were easier than usual.

LittleElephantAndIntervalsDiv1

Whenever a ball is painted, it is completely repainted; therefore, its final colour depends only on the final colour chosen to paint it. Define \(s_i\) as the last stage which paints ball \(i\), or \(0\) if no stage ever paints ball \(i\). Then our solution is just \(2^N\), where \(N = |\{s_1, s_2, \ldots, s_M\} \setminus \{0\}|\).

LittleElephantAndRGB

This is one of those problems where there is an obvious but inefficient way to compute the solution (simply enumerating all tuples and checking in \(O(N^5)\) time, where \(N\) is the length of the string); thus the challenge is simply to optimise the solution so that it runs in time. In this case \(N \leq 2500\), so an \(O(N^2)\) solution should work.

Since we are looking for count pairs of two disjoint intervals and all intervals can be enumerated in \(O(N^2)\), then a natural solution to consider is to break the problem into two parts. The first part counts the number of first intervals in \(O(N^2)\), and the second part counts the number of second intervals in \(O(N^2)\) as well. As it turns out, this decomposition is sufficient to solve the problem.

For each nice \((a, b, c, d)\) tuple, either

  • \(S[a\ldots b]\) is nice
  • \(S[c\ldots d]\) is nice
  • or finally, there is some sequence of \(m\) Gs which spans both intervals (where \(m\) is minGreen)
Now suppose we want to iterate over all choices of \((c, d)\). (This is the second part of the decomposition.) If \(S[c\ldots d]\) is nice, then we're done. Otherwise, there must be some sequence of Gs which starts at the end of \(S[a\ldots b]\) and ends at the beginning of \(S[c\ldots d]\). In particular, if \(S[c\ldots d]\) has a prefix of \(g\) Gs, then \(S[a\ldots b]\) must have a suffix of at least \(m - g\) Gs. (If \(S[c\ldots d]\) is nice, we adopt the convention that \(g = m\).)

This suggests that the first part of the decomposition is computing \(count(b, g)\), the number of subsequences which end before \(b\) and has a suffix of at least \(g\) Gs (or is nice). Then our answer is just \[\sum_{c, d} count(b, m - prefixLength(c, d))\] where \(prefixLength(c, d)\) is the number of Gs at the beginning of \(S[c\ldots d]\), or \(need\) if \(S[c\ldots d]\) is nice.

In this case, it suffices to compute the number of subsequences which end at exactly \(b\) and have a suffix of exactly \(g\) Gs. Call this \(count'(b, g)\); then we can generate \(count\) from \(count'\) by just doing partial sums. To compute \(count'\), we can simply enumerate over all possible \((a, b)\) in \(O(N^2)\) time.

Note that there are some additional bookkeeping details (i.e. how to keep track of prefix and suffix lengths), but I'll leave those up to the reader.

Constellation

As with many problems involving expected value, one should immediately jump to thinking about linearity of expectation. This means that we should try to decompose the expected value of the area as a sum of some other variables; one possible way is as follows: \[A = \sum_{r\in R} I_r A_r\] where \(R\) is the set of all subregions of the polygon, \(A_r\) is the area of subregion, \(I_r\) is a binary variable indicating whether \(r\) is present or not. Then the expected area is just \[\mathbb{E}(A) = \mathbb{E}\left[\sum_{r\in R} I_r A_r\right] = \sum_{r\in R} P_r A_r\] where \(P_r\) is the expected value of \(I_r\), which is the probability of \(r\) being present. Note that I haven't quite defined what a subregion is -- that turns out to be the hard part of this problem.

It's tempting to try to decompose the area into triangles whose vertices are stars in the constellation, and then sum up the expected area of the triangles. Unfortunately, this doesn't quite work. Consider the second example -- there are 4 separate triangles, but each triangle intersects 2 other triangles. As a result, we overcount the area when all the triangles are present. Thus, the expected value of the area is too large.

One way to see the solution is to have a good understanding of the polygon area algorithm, AKA shoelace formula. This formula computes the area of a polygon in a clever way: first, we take each pair of adjacent points on the convex hull \((i, j)\) and extend them to a triangle by including the origin, \(o\). We then sum up the signed areas of the \((i, j, o)\)s -- which is positive or negative depending on whether \((i,j,o)\) are in clockwise or counterclockwise order. This can be done elegantly with cross products -- the area of triangle \((i, j, o)\) is just \((x_i y_j - x_j y_i)/2\). The expected signed area of the triangle is then its area multiplied by the probability that edge \((i, j)\) will be on the convex hull of the polygon.

\((i, j)\) is on the convex hull if and only if there is at least one visible star on the right side of \((i, j)\), and no visible stars on the left side. Again, we can use cross products to determine whether a point is on the left or right of \((i, j)\) -- if \((j - i) \times (k - i)\) is positive, then \(k\) is on the right; if it's negative, it's on the left side. Thus, the probability that \((i, j)\) is on the convex hull is just \[p_{i,j} = \left[\prod_{k \in L} (1 - p_k)\right] \left[1 - \prod_{k \in R} (1 - p_k)\right]\] Lastly, there is one additional case -- if there is some point \(k\) such that \(j\) is between \(i\) and \(k\), then \((i, k, o)\) contains \((i, j, o)\). In this case we'll overcount triangles again. The way to solve this is to consider \(k\) to be on the 'left' side; thus, whenever \(k\) appears, we won't add \((i, j, o)\) to our total. This gives us the following definitions of \(L\) and \(R\): \[L = \{k \mid (j - i) \times (k - i) < 0 \text{ or } k \text{ outside and colinear to } i, j\}\] \[R = \{k \mid (j - i) \times (k - i) > 0\}\] Then we can use \(p_{i,j}\) as before, along with the signed area, to compute the expected area of the polygon.

Pseudocode for the solution is then as follows:

  1. For all points \(i\) and \(j\):
    1. Let \(L = \{k \mid (j - i) \times (k - i) < 0 \text{ or } k \text{ outside and colinear to } i, j\}\)
    2. Let \(R = \{k \mid (j - i) \times (k - i) > 0\}\)
    3. \(ans \leftarrow ans + p_i \cdot p_j \cdot \left[\prod_{k \in L} (1 - p_k)\right] \cdot \left[1 - \prod_{k \in R} (1 - p_k)\right] \cdot (i \times j) / 2\)
  2. Return \(ans\)

Sunday, October 13, 2013

TopCoder SRM 591 Editorial: PyramidSequences

PyramidSequences

First, fix some \( n \) and \( m \). We begin by looking at an easier problem: If instead of looking at sequences of the form \( 1, 2, \ldots, n - 1, n, n - 1, \ldots, 3, 2 \), what if all the sequence elements were distinct (e.g. if the sequence was of the form \( 1, 2, \ldots, 2 n - 2 \))? The Chinese remainder theorem guarantees that a number \( x \) from the \( n \)-sequence will be paired with a number \( y \) from the \( m \)-sequence if and only if \( x \equiv y \pmod{2g} \), where \( g = \gcd( n - 1, m - 1 ) \). We can partition the \( n \)- and \( m \)-sequences into equivalence classes modulo \( 2g \); each class contains \( (n - 1) / g \) or \( (m - 1) / g \) elements, so there are \( (n - 1) (m - 1) / g^2 \) pairings per class, or \( (n - 1) (m - 1) / g^2 \), or \( 2 (n - 1) (m - 1) / g \) unique pairings.

Unfortunately, the pairings are not unique. Each \( k \) in the \( n \)-sequence has two possible indices modulo \( 2 g \): \( \{ k \% (2 g) \) and \( (2 n - k) \% (2 g) \} \). Let us first consider the two indices are equal, i.e. \( k \equiv (2 n - k) \pmod{2g} \). Simplifying, this gives us \( k \equiv n \pmod g \). Since \( g \) is a multiple of \( n - 1 \), then \( n \equiv 1 \pmod g \), so our final expression is \( k \equiv 1 \pmod g \); i.e. \( k \equiv 1 \text{ or } g + 1 \pmod{2g} \). Note that this condition is independent of \( n \) or \( m \), so it suffices to simply find the number of \( k \) equivalent to \( 1 \) or \( g + 1 \) modulo \( 2g \) in each sequence and multiply the counts accordingly.

Now consider the case in which \( k \) has two distinct indices, \( k \% (2 g) \) and \( (2 n - k) \% (2 g) \). Again, since \( n \equiv 1 \pmod g \), the second index can be written \( (2 - k) \% (2 g) \). If \( k \% (2 g) \in [2, g] \), then \( (2 - k) \% (2 g) \in [g + 2, 2g] \) and vice versa. Thus, each \( k \) appears exactly once in the range \( [2, g] \) and exactly once in \( [g + 2, 2g] \). It suffices, then, to compute the result for one range. This is simply \( (g - 1) (n - 1) (m - 1) / g^2 \).

Putting the two cases (\( k \equiv 1 \pmod g \) or not) together, we have the following Python code. Some care must be taken to account for the special cases \( 1 \), \( n \), and \( m \).

from fractions import *

class PyramidSequences(object):
 def distinctPairs(self, n, m):
  g = gcd(n - 1, m - 1)
  an, am = [(x - 1) / g for x in n, m]
  bn, bm = [x / 2 + 1 for x in [an, am]]
  cn, cm = [x / 2 + (x % 2) for x in [an, am]]
  return (g - 1) * an * am + bn * bm + cn * cm