A Divide-and-conquer Algorithm for Identifying Strongly Connected

A Divide-and-conquer Algorithm for

Identifying Strongly Connected

Components

Don Coppersmith

and Lisa Fleischer

a,2

and

Bruce Hendrickson

and Ali Pınar

c,1

T. J. Watson Research Center, IBM

Discrete Algorithms & Math Dept., S andia National Laboratories

Computational Research Division. Lawrence Berkeley National Laboratory

Abstract

The strongly connected components of a directed graph can be found in an optimal

linear time, by algorithms based on depth ﬁrst search. Unfortunately, depth ﬁrst

search is diﬃcult to parallelize. We describe two divide-and-conquer algorithms for

this problem that have signiﬁcantly greater potential for parallelization. We s how

the expected serial runtime of our simpler algorithm to be O(m log n), for a graph

with n vertices and m edges. We then show that the second algorithm has O(m log n)

worst–case complexity.

Key words: Design and analysis of algorithms, graph algorithms, parallel

algorithms, strongly connected components, divide-and-conquer, discrete ordinates

method.

1 Introduction

The decomposition of a directed graph into strongly connected components is

a fundamental tool in graph theory with applications in compiler analysis, data

mining, scientiﬁc computing, and other areas. Tarjan [18] devised an optimal

This work was funded by the Applied Mathematical Sciences program, U.S. Depart-

ment of Energy, Oﬃce of Energy Research and performed at Sandia, a multiprogram

laboratory operated by Sandia Corporation, a Lockheed-Martin Company, for the

U.S. DOE under contract number DE-AC-94AL85000.

A preliminary version of the current work appeared in [8].

Email addresses: [email protected] (Lisa Fleischer), [email protected] (Bruce

Hendrickson), [email protected] (Ali Pınar).

Supported by the Director, Oﬃce of Science, Division of Mathematical, Informa-

tion, and Computational Sciences of U.S. Department of Energy under contract DE-

AC03-76SF00098.

Partially supp orted by NSF through CAREER grant CCR-0049071.

Preprint submitted to Information Processing Letters 18 January 2006

O(m + n) algorithm for identifying s trongly connected components for a graph

with n vertices and m edges, which is based on depth ﬁrst search (DFS) of the

graph. Sharir [17]’s algorithm for identifying strongly connected components is

also built on DFS and is widely used in textbooks as an example of the power of

DFS [6]. Recently, Gabow [9] described an algorithm that avoids vertex labeling,

which is again based on DFS.

For large problems, a parallel algorithm for identifying strongly connected com-

ponents would be useful. One application of particular interest to us is discuss ed

below. Unfortunately, DFS is diﬃcult to parallelize. Reif showed that the lexi-

cographical DFS problem is P -Complete [16]. However, Aggarwal and Ander-

son [1] and Aggarwal et al. [2] describe randomized NC algorithms for ﬁnding

a DFS of undirected and directed graphs, respectively. The expected runtime

of this latter algorithm is O(log

n) and it requires an impractical O(n

2.376

)

processors. To our knowledge, the deterministic parallel complexity of DFS for

general, directed graphs is an open problem. Chaudhuri and Hagerup studied

the problem for acyclic [5] and planar graphs [11], respectively. More practically,

DFS is a diﬃcult operation to parallelize and we are aware of no algorithms

or implementations that perform well on large numbers of processors. Conse-

quently, the utility of Tarjan’s algorithm in parallel is questionable.

Alternatively, there exist several parallel algorithms for the strongly connected

components problem (SCC) that avoid the use of depth ﬁrst search. Gazit and

Miller devised an NC algorithm, which is based upon matrix–matrix multipli-

cation [10]. This algorithm requires O(n

2.376

) processors and O(log

n) time.

Kao developed a more complicated NC algorithm for planar graphs that re-

quires O(log

n) time and n/ log n processors [12]. More recently, Bader has

an eﬃcient parallel implementation of SCC for planar graphs [3], which uses

a packed–interval representation of the boundary of a planar graph. When n

is much larger than p, the number of processors, Bader’s approach has been

observed to have an O(n/p) performance in practice [3]. But Bader’s approach

does not apply to general graphs.

Our interest in the SCC problem is motivated by the discrete ordinates method

for modeling radiation transport. In this method, the object to be studied is

modeled as a union of polyhedral ﬁnite elements. Each element is a vertex in

our graph and an edge connects any pair of elements that share a face. The

radiation equations are approximated by an angular discretization. For each

angle in the discretization, the edges in the graph are directed to align with the

angle. The computations associated with an element can be performed if all its

predecessors have been completed. Thus, for each angle, the set of computations

are sequenced as a topological sort of the directed graph. A problem arises if the

topological sort cannot be completed, i.e., the graph has a cycle. If cycles exist,

the numerical calculations need to be modiﬁed, typically by using old informa-

tion along one of the edges in each cycle, thereby removing the dependency. So

identifying strongly connected components quickly is essential. Since radiation

transport calculations are computation- and memory-intensive, parallel imple-

mentations are necessary for large problems. Also, since the geometry of the

grid can change after each time-step for some applications, the SCC problem

must be solved in parallel. Eﬃcient parallel implementations of the topological

sort step of the radiation transport problem have been developed for structured

grids, oriented grids that have no cycles [4,7]. Some initial attempts to gener-

alize these techniques to unstructured grids are showing promise [14,15]. It is

these latter eﬀorts that motivated our interest in the SCC problem.

In this paper, we describe simple divide-and-conquer algorithms for ﬁnding

strongly connected components. We show one algorithm has expected serial

complexity O(m log n), and a modiﬁcation has worst case serial complexity

O(m log n). Our approach has good potential for parallelism for two reasons.

First, the divide-and-conquer paradigm generates smaller problems that can be

solved independently. Second, the basic step in our algorithm is a reachabil-

ity analysis, which is similar to topological sort in its parallelizability. In fact,

McLendon, et al. [13] have implemented a parallel version of our algorithm and

they report very encouraging results.

2 Preliminaries

Let G = (V, E) be a directed graph with vertex set V and directed-edge set E.

Let n = |V | and m = |E|. An edge (i, j) ∈ E is directed from vertex i to vertex

j. A vertex v is reachable from a vertex u if there is a sequence of directed edges

(u, x

), (x

, x

), . . . , (x

, v) ∈ E. A vertex is reachable from itself. A strongly

connected component of G is a maximal subset S of V such that each vertex

in S is reachable from every other vertex in S. The set of strongly connected

components of G is denoted SCC(G). Note that SCC(G) is a partition of V .

The unique element of SCC(G) that contains v is denoted SCC(G, v). V \ X

denotes the subset of vertices in V which are not in a subset X. The size of

vertex set X is denoted by |X|. Given a vertex v ∈ V , the descendants of v,

Desc(G, v), is the subset of vertices in G that are reachable from v. Similarly,

the predecessors of v, P red(G, v), is the subset of vertices from which v is

reachable. The set of vertices that is neither reachable from v nor reach v is

called the remainder, denoted by Rem(G, v) = V \ (Desc(G, v) ∪ P red(G, v)).

Given a graph G = (V, E) and a subset of vertices V

⊆ V , the induced subgraph

= (V

, E

) contains all edges of G connecting vertices of V

, i.e. E

= {(u, v) ∈

E : u, v ∈ V

}. We will use hV

i = G

= (V

, E

) to denote the subgraph of G

induced by vertex set V

Lemma 2.1 Let G be a directed graph with vertex v. Then SCC(G, v) =

Desc(G, v)∩P red(G, v); and every other strongly connected component in SCC(G)

is a strongly connected component of exactly one of hDesc(G, v) \SCC(G, v)i,

hP red(G, v) \ SCC(G, v)i, or hRem(G, v)i.

DCSC(G)

If G has no edges then

forall v ∈ V Output {v}.

Else

Select a random vertex v from V

SCC ← P red(G, v) ∩ Desc(G, v)

Output SCC

DCSC(hP red(G, v) \ SCCi)

DCSC(hDesc(G, v) \ SCCi)

DCSC(hR em(G, v)i)

Fig. 1. A divide-and-conquer algorithm that outputs all of the strongly connected

components of a directed graph.

Proof: The equality SCC(G, v) = Desc(G, v) ∩ P red(G, v) follows immedi-

ately from the deﬁnitions. Let u and w be two vertices of the same strongly

connected component in G. By deﬁnition, u and w are reachable from each

other. The proof involves establishing u ∈ Desc(G, v) ⇐⇒ w ∈ Desc(G, v) and

u ∈ P red(G, v) ⇐⇒ w ∈ P red(G, v), which then implies u ∈ Rem(G, v) ⇐⇒

w ∈ Rem(G, v). Since the proofs of these two statements are symmetric, we

give just the ﬁrst: If u ∈ Desc(G, v) then u must be reachable from v. But then

w must also be reachable from v, so w ∈ Desc(G, v).

3 A Divide-and-conquer Algorithm

In this section, we describe a divide-and-conquer algorithm that ﬁnds all of

the strongly connected components of a directed graph and show that it has

expected run time O(m log n). The algorithm, called DCSC (for Divide-and-

conquer Strong Components), is sketched in Figure 1. The basic idea is to select

an arbitrary vertex v, which we call a pivot vertex, and ﬁnd its descendant and

predecessor sets. The intersection of the predecessor and descendant sets is

SCC(G, v) by Lemma 2.1. After outputting SCC(G, v), the vertices in G \

SCC(G, v) are divided into three sets: Desc(G, v), P red(G, v), and Rem(G, v).

By Lemma 2.1, any additional strongly connected component must be entirely

contained within one of these three sets, so we can divide the problem and

recurse.

The run time of the algorithm depends on the choice of the pivot vertex. For

the analysis in the next section, we select the pivot vertex uniformly at random.

As mentioned above, this algorithm is amenable to practical parallelization on

two levels. First, the recursive invocations are completely independent and so

can execute independently. Second, the searches for predecessors and descen-

dants allow for much more parallelism than does a depth ﬁrst search.

3.1 Complexity of Algorithm DCSC

The cost of algorithm DCSC consists of time spent in predecessor and descen-

dant searches, plus the time to rep ort strongly connected components. The last

term takes linear time in the number of vertices over the course of the algorithm.

The remaining terms can be bounded as follows.

The performance of the algorithm depends on the sizes of the recursive calls,

and hence on the choice of the pivot vertex. The worst case occurs when the

predecessor or descendant set is very large, which leads to Θ(mn) runtime.

The best case occurs when the remainder set is very large, which leads to

Θ(m) runtime. Below we show that by choosing v to be chosen uniformly at

random from V , the expected complexity of the algorithm can be bounded as

Θ(m lg n + n).

Lemma 3.1 For a directed graph G, there is a numbering π

of the vertices

from 1 to n for which the following is true. All vertices v

in hP red(G, v) \

SCC(G, v)i satisfy π

) < π

(v); and all vertices v

in hDesc(G, v)\SCC(G, v)i

satisfy π

) > π

(v).

Proof: If G is acyclic, then a topological sort provides a numbering with this

property. If G has cycles, then each strongly connected component can be con-

tracted into a single vertex, and the resulting acyclic graph can be numbered

via topological sort. An ordering of the vertices is obtained by replacing each

strongly connected component by a list of the vertices it contains.

It is important to note that we do not need to construct an ordering with this

property; we just need to know that it exists. In the remainder of this section,

we refer to a vertex ordering according to this lemma as a consistent ordering

of the vertices.

Theorem 3.2 Algorithm DCSC has expected time complexity O(m lg n + n).

Proof: We will show that the expected number of times an edge e is explored

in a descendant search is O(log n). This then holds symmetrically for predecessor

searches and proves the theorem.

Let f(G, e) be the average number of times edge e gets explored. Let d(r) be

the maximum of f(G, e) over all edges e and all graphs G with r vertices. The

proof of the theorem is established by showing that

d(r) ≤ H

= 1 + 1/2 + 1/3 + ... + 1/r = Θ(lg r). (1)

How many times does an edge e = (u, w) get explored in descendant searches?

It is last explored when the strongly connected component containing u is dis-

covered. All other visits occur from searches originating on a vertex v that is

before u in the consistent ordering. There are at most π

(u) − 1 such vertices.

Consider the subgraph G

identiﬁed in such a search hDesc(G, v) \SCC(G, v)i.

This subgraph has a consistent ordering in which π

(u) is at most π

(u) −

(v). This leads to the following recurrence for d(n).

d(n) ≤1 +

i=1

d(n − i)

= 1 +

n−1

i=0

d(i) (2)

We claim that H

= 1 + 1/2 + ... + 1/n satisﬁes this recursion. It follows that

d(r) ≤ H(r) = Θ(lg n) and the theorem follows.

To show that the quantity d

= H

satisﬁes the recursion multiply (2) by r,

and subtract the similar equation with r − 1 replacing r:

r ∗ d(r) = r + d(0) + d(1) + ... + d(r −2) + d(r −1)

− [ (r − 1) ∗ d(r − 1) = (r − 1) + d(0) + d(1) + ... + d(r − 2) ].

This yields r ∗d(r) −(r −1) ∗d(r −1) = r + d(r −1) −(r −1), or equivalently,

d(r) = d(r − 1) + 1/r.

3.2 A Tight Example

The following example shows the bound O(m log n) is tight. Start with a di-

rected path of length n. Add edges from the last vertex in the path to a set of r

new ve rtices. And then add edges from each of these r vertices to a set of r ad-

ditional vertices to form a complete bipartite subgraph K

r,r

with

√

n ≤ r ≤ n

for α < 1.

On this graph, the algorithm will choose a decreasing sequence of vertices on

the path, each exercising the whole bipartite graph, before picking some vertex

in the bipartite graph (and breaking it). For simplicity of notation, let x = 2r.

Let g(n, x) be the expected length of this sequence when the directed path has

length n. We show by induction on n that g(n, 2r) = H

n+2r

− H

, which is

Θ(log n).

We may write g(n, x) as

g(n, x) =

i=1

n + x

[1 + g(i − 1, x)]. (3)

Note that g(1, x) =

x+1

. Use induction to replace g(i−1, x) in (3) with H

i−1+x

−

to get

g(n, x) =

n + x

n−1

i=1

n − i

x + i

n + x

n−1

i=1

[

n + x

n − i

(x + i)(n + x)

]

n + x

n−1

i=1

x + i

= H

n+x

− H

4 A Worst Case O(m log n)-time Algorithm

In this section, we modify the algorithm from §3 to achieve a worst case

O(m log n) time complexity. The basic observation here is that we don’t need to

ﬁnd both the predecessor and descendant sets, but rather just the intersection of

these two sets, which gives SCC(G, v). This intersection can be found by ﬁrst

ﬁnding the predecessor (or the descendant) set completely, and then search-

ing for descendants within the predecessor set. Once we have P red(G, v) (or

Desc(G, v)) and SCC(G, v), the algorithm can recurse on hP red(v) \SCC(v)i

and hG \ P red(v)i. Choosing the smaller of the predecessor or the descendant

sets will be more eﬃcient, since it decomposes the problem with less work.

However, we don’t know which set is smaller, so instead, we will simultaneously

search for both and abandon the second search when the ﬁrst one ﬁnishes. In

this modiﬁed algorithm, the work for each decomposition is proportional to the

smaller of the number of edges in the predecessor set and the number of edges in

the descendant set, whereas it is proportional to the sum of the number of edges

in these two sets in the original algorithm. The resulting algorithm W DCSC

is sketched in Figure 2. The algorithm is somewhat more complex than that

of §3, and its parallelization would be more diﬃcult, but it has a deterministic

performance guarantee.

Theorem 4.1 Algorithm WDCSC runs in worst case O(m log n)-time.

Proof: Let T (m) be the run time of WDCSC on a graph with m edges. Let

s to de note the total number of edges in the subgraph induced by the vertex

set SCC(G, v). Let p denote the total number of edges in the subgraph induced

by the vertex set P red(G, v), minus s. Symmetrically, let d denote the total

number of edges in the subgraph induced by the vertex set Desc(G, v), minus

s. By symmetry, we can assume p ≤ d. Then T (m) can be expressed as

T (m) ≤ T (p) + T (m − p − s) + c

(p + s), (4)

W DCSC(G)

If G is not empty then

Select v uniformly at random from V

Simultaneously search for P red(G, v) and Desc(G, v)

If P red(G, v) ﬁnishes ﬁrst

Search for Desc(G, v) only in P red(G, v).

SCC ← Desc(G, v) ∩ P red(G, v).

W DCSC(hP red(G, v) \ SCCi)

W DCSC(hV \P red(G, v)i)

Else If Desc(G, v) ﬁnishes ﬁrst

Search for P red(G, v) only in Desc(G, v).

SCC ← Desc(G, v) ∩ P red(G, v).

W DCSC(hDesc(G, v) \ SCCi)

W DCSC(hV \Desc(G, v)i)

Output P red(G, v) ∩ Desc(G, v)

Fig. 2. A worst case O(m log n) time algorithm to ﬁnd strongly connected com ponents

where c

is a constant ≥ 0, since the amount of work in one iteration is propor-

tional to the minimum of the number of edges in the predecessor set and the

number of edges in the descendant set. That is, the number of steps is twice

p + s until the algorithm completes one search, and then it is at most s. Note

that by hypothesis p ≤ m/2. We use induction to show T (m) = c

m log m for

some constant c

≥ 0.

T (m) ≤T (p) + T (m − p − s) + c

(p + s)

≤c

p lg p + c

(m − p − s) lg(m −p − s) + c

(p + s)

≤c

p lg p + c

(m − p − s) lg m + c

(p + s)

= c

m lg m + [c

p lg p − c

(p + s) lg m + c

(p + s)]

= c

m lg m + [p(c

lg(p/m) + c

) + s(c

− c

lg m)]

To ﬁnish the proof, it suﬃces to show that the term in brackets p(c

lg(p/m) +

) + s(c

−c

lg m) is not positive. As it turns out, we can choose c

such that

each of the two terms inside the brackets is negative: Recall that p < m/2; thus

lg(p/m) ≤ −1, and the ﬁrst term is negative. For c

≥ 2c

, the second term will

also be negative.

Acknowledgments

We beneﬁted from general discussions about algorithms for parallel strongly

connected components with Steve Plimpton, Will McLendon and David Bader.

We are also grateful for helpful comments from Cindy Phillips.

References

[1] A. Aggarwal and R. J. An derson, A random NC algorithm for depth ﬁrst

search, Combinatorica, 8 (1988), pp. 1–12.

[2] A. Aggarwal, R. J. Anderson, and M.-Y. Kao, Parallel depth-ﬁrst search

in genera l directed graphs, SIAM J. Comput., 19 (1990), pp. 397–409.

[3] D. A. Bader, A practical parallel algorithm for cycle d etection in partitioned

digraphs, Tech. Rep. Technical Report AHPCC-TR-99-013, Electrical &

Computer Engng. Dept., Univ. New Mexico, Albuquerque, NM, 1999.

[4] R. S. Baker and K. R. Koch, An S

algorithm for the massively parallel

CM-200 computer, Nuclear Science and Engineering, 128 (1998), pp. 312–320.

[5] P. Chaudhuri, Finding and updating depth-ﬁrst spanning trees of acyclic

digraphs in parallel, The Computer Journal, 33 (1990), pp. 247–251.

[6] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to

Algorithms, MIT Press and McGraw-Hill, Cambridge, MA, 1990.

[7] M. R. Dorr and C. H. Still, Concurrent source iteration in the solution of 3-

dimensional, multigroup discrete ordinates neutron-transport equations, Nuclear

Science and Engineering, 122 (1996), pp. 287–308.

[8] L. K. Fleischer, B. Hendrickson and A. Pınar, On Identifying Strongly

Connected Components in Parallel, in Solving Irregularly Structured Problems in

Parallel: 7th International Symposium, Irregular ’00, Lecture Notes in Computer

Science, Vol. 1800, pp. 505–511, Springer-Verlag, 2000.

[9] H. N. Gabow, Path-based depth-ﬁrst search for strong and biconnected

components, Information Processing Letters 74 (2000), pp. 107-114.

[10] H. Gazit and G. L. Miller, An improved parallel algorithm that computes the

BFS numbering of a directed graph, Inform. Process. Lett., 28 (1988), pp. 61–65.

[11] T. Hagerup, Planar depth-ﬁrst search in O(log n) parallel time, SIAM J.

Comput., 19 (1990), pp. 678–704.

[12] M.-Y. Kao, Linear-processor NC algorithms for planar directed graphs I:

Strongly connected components, SIAM J. Comput., 22 (1993), pp. 431–459.

[13] W. McLendon III, B. Hendrickson, S. Plimpton, and L. Rauchwerger. Finding

strongly connected components in distributed graphs. J. Parallel Distrib.

Comput., 65:901–910, 2005.

[14] S. Pautz. An algorithm for parallel S

sweeps on unstructured meshes, Nuclear

Science Engineering, 140 (2002), pp. 111–136.

[15] S. Plimpton, B. Hendrickson, S. Burns and W. McLendon III, Parallel

Algorithms for Radiation Transport on Unstructured Grids, In Proc. SC’00,

Portland, Oregon, 2000.

[16] J. H. Reif, Depth-ﬁrst search is inherently sequential, Inform. Process. Lett., 20

(1985), pp. 229–234.

[17] M. Sharir, A strong-connectivity algorithm and its applications in data-ﬂow

analysis, Computers and Mathematics with Applications, (1981), pp. 67–72.

[18] R. E. Tarjan, Depth ﬁrst search and linear graph algorithms, SIAM J. Comput.,

1 (1972), pp. 146–160.