Monday, June 25, 2012

InterviewStreet: Going Office (AKA The Most Vital Arc Problem)

Overview

Going Office is definitely one of the tougher problems on InterviewStreet, but the main idea* behind the problem is simple: given a weighted undirected graph \( G = ( V, E ) \) and two nodes \( s, t \in V \), we wish to answer \( Q \) queries of the following form:

What is the shortest path from \( s \) to \( t \) given that edge \( e \in E \) is removed from the graph?
(This type of problem is also known as the "most vital arc" problem -- see the first footnote below.)

Let \( N = | V | \) and \( M = | E | \) as usual. In this case, we are guaranteed that \( N, M, Q \leq 200000 \).

Prerequisites

You should definitely know Dijkstra's algorithm before attempting to solve this problem. Also, knowing about segment trees with lazy propagation would be good, but is not required to understand the solution.

Building Bridges

First of all, it should be clear that naively running Dijkstra's isn't going to run in time, because it requires \( O( Q N \log N ) \) operations. In fact, the size of the input suggests that we should look for at worst an \( O( N \log N ) \) solution. What can we do then? We may first try to solve an easier problem: can we determine whether an edge \( e \), when removed, will increase the shortest path length between \( s \) and \( t \)?

Definitions:
  1. An edge is optimal if it is on a shortest path from \( s \) to \( t \).
  2. An edge is a bridge* if it is on all shortest paths from \( s \) to \( t \). (Here we refer only to shortest paths in \( G \), without having removed any edges.)

Let \( d_s( u ) \) (resp. \( d_t( u ) \)) be the shortest-path distance from \( s \) (resp. \( t \)) to \( u \), and let \( OPT \) be the shortest-path distance from \( s \) to \( t \). Then we can show the following properties:

Lemmata:
  1. \( e = ( u, v ) \) is optimal if and only \( d_s( u ) + length( e ) + d_t( v ) = OPT \).
  2. \( e = ( u, v ) \) is a bridge if and only if it is optimal and for all other optimal \( e' = ( u', v' ) \), we have either \( d_s( v' ) \leq d_s( u ) \) or \( d_s( v ) \leq d_s( u' ) \). In other words, when we travel along an optimal path from \( s \) to \( t \), then between lengths \( d_s( u ) \) and \( d_s( v ) \), we must be traveling along \( e \). By convention we denote a bridge with the letter \( b \).
  3. As a corollary of the above statement, \( e \) is a bridge if and only if, when removed from \( G \), the shortest path distance from \( s \) to \( t \) increases.
(Proofs are left to the reader.) These properties should give you some idea of how to find all the bridges in \( G \). If you still need some hints, an algorithm for computing bridges is included at the end of this post.

Islands

Now we have a way of identifying which edges, when removed, affect the shortest path length. This still leaves one major question unanswered: given a bridge \( b \), how exactly do we determine the shortest path length from \( s \) to \( t \) once \( b \) is removed? Running Dijkstra's repeatedly is too slow, since we can easily construct examples in which every edge is a bridge. (Consider a graph consisting of only a path from \( s \) to \( t \).)

Let \( K \) denote the number of bridges. Note that Lemma 2 above implies that we can uniquely order the bridges \( b_0, b_1, \ldots, b_{K-1} \) based on their relative distances from \( s \). Could we use this to help us?

Definition: Let the \( i \)-th island be defined as the set of all vertices \( v \), such that there exists a shortest path from \( s \) to \( v \) using no more than \( i \) bridges. (If \( v \) is not connected to either \( s \) or \( t \), then we can ignore it -- it lies on some strange continent somewhere far away.)
The intuition behind islands and bridges is this: bridges are the fastest way to "travel" between the islands when going from \( s \) to \( t \). If we remove an edge that is not a bridge, our shortest path is not affected. However, when we take out a bridge, we are forced to travel along another (possibly much longer) path.
Proposition: Consider the bridge \( b_i \), connecting the \( i \)-th and \( i + 1 \)-st islands. Let \( E_i \) be the set of all edges \( e \) connecting an island with index \( \leq i \) to an island with index \( > i \). Then the shortest path from \( s \) to \( t \) that bypasses \( b_i \) must have length \[ \min_{ e = ( u, v ) \in E_i } \{ d_s( u ) + length( e ) + d_t( v ) \} \]
Proof: Any path \( P \) from \( s \) to \( t \) bypassing \( b_i \) must include some edge in \( E_i \). Now consider some edge \( e = ( u, v ) \in E_i \) -- the shortest path from \( s \) to \( t \) that goes through \( e \) must have length \( d_s( u ) + length( e ) + d_t( v ) \). Lastly, by the way we have constructed the islands, we know that \( b_i \) is not on the shortest paths from \( s \) to \( u \) or from \( v \) to \( t \).

A Data Structure for Bypass Length

Let \( bypass( i ) \) denote the length of the shortest path that bypasses \( b_i \). The above observations suggest that we can run the following algorithm to compute \( bypass( i ) \):

For each non-bridge edge \( e = ( u, v ) \) connecting islands \( i \) and \( j \), where \( i < j \):
For each \( k \in \{ i, i + 1, \ldots, j - 1 \} \):
\( bypass( k ) \leftarrow \min \{ bypass( k ), d_s( u ) + length( e ) + d_t( v ) \} \)
However, there is a slight problem with this algorithm: it's too slow. Through some tricky analysis, you can show that in the worst case, we can construct a graph that will cause the above subroutine to take \( O( M^{3/2} ) \) operations.

What we need is a data structure that supports the following two operations efficiently:

  1. \( update( i, j, val ) \), which runs \( bypass( k ) \leftarrow \min \{ bypass( k ), val \} \) for all \( k \in \{ i, i+1, \ldots, j-1 \} \)
  2. \( query( i ) \), which returns \( bypass( i ) \)

One possible data structure that can support both operations in \( O( \log K ) \) time is called a segment tree with lazy propagation or a segment tree with range update. There are several tutorials online about this type of tree, and it's a good data structure to be familiar with. (The main problem right now is that the structure is rather complicated and the online tutorials are not very good. I may do an article on it some time, if there is sufficient demand.)

However, if you are lazy, there is another data structure called the Kempe tree, which is a bit simpler and can support the same sort of queries, also using \( O( \log K ) \) operations. Check the link for more information.

Putting It All Together

The previous observations suggest the following algorithm for solving the problem:

  1. Compute \( d_s \) and \( d_t \) using Dijkstra's algorithm.
  2. Find the bridges:
    1. First find all the optimal edges by iterating over all edges \( e = ( u, v ) \), and storing the edges such that \( d_s( u ) + length( e ) + d_t( v ) = OPT \).
    2. Sort all the optimal edges by \( d_s( u ) \) and \( d_s( v ) \). Use the criterion from above to find the bridges.
  3. Find the islands.
    1. One way (the hard way) is to run a modified version of Dijkstra.
    2. A smarter and shorter alternative is to just run multiple depth-first searches from \( s \) and from the far end of each bridge. It's up to the reader to figure out how to do so.
  4. Find the bypassing paths.
    1. Initialise your favourite range-update data structure.
    2. Iterate over all the non-bridge edges. If \( e = ( u, v ) \) is an edge crossing from island \( i \) to island \( j > i \), then range-update \( bypass(i), bypass(i+1), \ldots, bypass(j-1) \) such that they are no more than \( d_s( u ) + length( e ) + d_t( v ) \).
  5. Process all the queries. If the edge in question is not a bridge, then we simply return the optimal length. Otherwise, we query the range-update data structure to find the best bypass length.
All of the above steps run in require \( O( R \log R ) \) operations where \( R = max\{ N, M, Q \} \), so the algorithm has complexity \( O( R \log R ) \).

The solution took me around 150 lines to implement with a decent amount of spacing and a relatively clean coding style. I'm somewhat suspicious, though, that I've overcomplicated the solution, and that a nicer solution exists. Message me or comment here if you have any other alternatives.

Footnotes

  1. See this paper for an example of previous work on the subject.
  2. See also this page.

6 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Re finding the islands, I have the Dijkstra algorithm also return an array of previous nodes, from which it is trivial to build the shortest path tree for S. Then I assign the island numbers to known nodes - S and far ends of all bridges - and mark the remaining nodes with -1. Finally I do a breadth-first walk on SPT(S), replacing island numbers on nodes added to the queue iff the current node belongs to the island with a bigger number.

    Does this make sense to you?

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. This comment has been removed by the author.

      Delete
  3. I can also confirm that you ain't need no fancy range-update data structures to satisfy the timing requirements. My solution has passed all tests in no more than 4 seconds, and that's Python.

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete