The Annals of Statistics
2020, Vol. 48, No. 4, 2208–2229
https://doi.org/10.1214/19-AOS1884
© Institute of Mathematical Statistics, 2020
TWO-SAMPLE HYPOTHESIS TESTING FOR INHOMOGENEOUS
RANDOM GRAPHS
BY DEBARGHYA GHOSHDASTIDAR 1,*, MAURILIO GUTZEIT 2, ALEXANDRA CARPENTIER 2 AND ULRIKE VON LUXBURG 1,**

1 Department of Computer Science, University of Tübingen
2 Faculty of Mathematics, Otto von Guericke University Magdeburg
The study of networks leads to a wide range of high-dimensional inference problems. In many practical applications, one needs to draw inference from one or few large sparse networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime, where the goal is to test between two populations of inhomogeneous random graphs defined on the same set of n vertices. The size of each population m is much smaller than n, and can even be a constant as small as 1. The critical question in this context is whether the problem is solvable for small m.
We answer this question from a minimax testing perspective. Let P, Q be the population adjacencies of two sparse inhomogeneous random graph models, and d be a suitably defined distance function. Given a population of m graphs from each model, we derive minimax separation rates for the problem of testing P = Q against d(P,Q) > ρ. We observe that if m is small, then the minimax separation is too large for some popular choices of d, including total variation distance between corresponding distributions. This implies that some models that are widely separated in d cannot be distinguished for small m, and hence, the testing problem is generally not solvable in these cases.
We also show that if m > 1, then the minimax separation is relatively small if d is the Frobenius norm or operator norm distance between P and Q. For m = 1, only the latter distance provides a small minimax separation. Thus, for these distances, the problem is solvable for small m. We also present near-optimal two-sample tests in both cases, where the tests are adaptive with respect to the sparsity level of the graphs.
1. Introduction. Analysis of random graphs has piqued the curiosity of probabilists
since its inception decades ago, but the widespread use of networks in recent times has made
statistical inference from random graphs a topic of immense interest for both theoretical and
applied researchers. This has caused a fruitful interplay between theory and practice lead-
ing to deep understanding of statistical problems that, in turn, has led to advancements in
applied research. Significant progress is clearly visible in problems related to network mod-
elling (Albert and Barabási (2002), Lovász (2012)), community detection (Abbe and Sandon
(2016), Decelle et al. (2011)), network dynamics (Berger et al. (2005)) among others, where
statistically guaranteed methods have emerged as effective practical solutions. Quite surpris-
ingly, the classical problem of hypothesis testing of random graphs is yet to benefit from such
joint efforts from theoretical and applied researchers. It should be noted that the problem itself is actively studied in both communities. Testing between brain or “omics” networks has surfaced as a crucial challenge in the context of both modelling and decision making (Ginestet
et al. (2017), Hyduke, Lewis and Palsson (2013)). On the other hand, phase transitions are
Received June 2018; revised February 2019.
MSC2020 subject classifications. Primary 62H15; secondary 62C20, 05C80, 60B20.
Key words and phrases. Two-sample test, inhomogeneous Erdős–Rényi model, minimax testing.
now known for the problems of detecting high-dimensional geometry or strongly connected
groups in large random graphs (Arias-Castro and Verzelen (2014), Bubeck et al. (2016)).
However, little progress has been made in the design of consistent tests for general models of
random graphs. The present paper takes a step towards addressing this general concern.
While research on testing large random graphs has been limited, hypothesis testing in
large dimension is an integral part of modern statistics literature. In fact, Bai and Saranadasa
(1996) demonstrated the need for studying high-dimensional statistics through a two-sample
testing problem, where one tests between two n-variate normal distributions with different
means by accessing m i.i.d. observations from either distribution, where m ≪ n (we de-
note the dimension by n since, in the context of graphs, the number of vertices n governs
the dimensionality of the problem). More recent works in this direction provide tests and
asymptotic guarantees as m → ∞ and n/m → ∞ (Cai, Liu and Xia (2014), Chen and Qin (2010), Ramdas et al. (2015)). Similar studies also exist in the context of testing whether an n-dimensional covariance matrix is the identity, where only m ≪ n i.i.d. observations of the data are
available (Arias-Castro, Bubeck and Lugosi (2015), Berthet and Rigollet (2013), Ledoit and
Wolf (2002)). It is also known that the computational complexity of this problem is closely
related to the clique detection problem in random graphs (Berthet and Rigollet (2013)). An
extreme version of high-dimensional testing arises in the signal detection literature, where
one observes a high-dimensional (Gaussian) signal, and tests the nullity of its mean. Subse-
quently, asymptotics of the problem are considered for n → ∞ but fixed sample size (m = 1
or a constant), and minimax separation rates between the null and the alternative hypotheses
are derived (Baraud (2002), Ingster and Suslina (2003), Mukherjee, Pillai and Lin (2015),
Verzelen and Arias-Castro (2017)). The signal detection problem has also been extended in
the case of matrices in the context of trace regression (Carpentier and Nickl (2015)) and
sub-matrix detection (Sun and Nobel (2008)) among others, where the latter generalises the
planted clique detection problem.
In practice, the problem of testing random graphs comes in a wide range of flavours. For
instance, while dealing with graphs associated with chemical compounds (Shervashidze et al.
(2011)) or brain networks of several patients collected at multiple laboratories (Ginestet et al.
(2017)), one has access to a large number of graphs (large m). This scenario is more amenable
as one can resort to the vast literature of nonparametric hypothesis testing that can even be ap-
plied to random graphs. A direct approach to this problem is to use kernel based tests (Gretton
et al. (2012)) in conjunction with graph kernels (Kondor and Pan (2016), Vishwanathan et al.
(2010)), which do not require any structural assumptions on the network models. However, known guarantees for such tests depend crucially on the sample size, and one cannot conclude
about the fidelity of such tests for very small m. A more challenging situation arises when
m is small, and unfortunately, this is often the case in network analysis. For example, the graphs may correspond to brain networks collected from patients in a single lab setup (m < 20),
brain networks of one individual obtained from test-retest MRI scans (Landman et al. (2010)),
or molecular interaction networks arising from genomic or proteomic data (Hyduke, Lewis
and Palsson (2013)). Test-retest data of a patient provide only m = 2 networks, while omics
data typically result in one large interaction network, that is, m = 1. Hence, designing two-sample tests for small populations of graphs is a problem of immense significance, and yet,
practical tests with statistical guarantees are rather limited.
From a theoretical perspective, only a handful of results on testing large random graphs
are known. While finding hidden cliques has been a long-standing open problem, Arias-Castro and Verzelen (2014) and Verzelen and Arias-Castro (2015) for the first time provide a characterisation of the more basic problem of detecting a planted clique in an Erdős–Rényi graph. Gao and Lafferty (2017) and Lei (2016) consider generalised variants of this problem
while designing tests to distinguish a stochastic block model from an Erdős–Rényi graph, or
to estimate the number of communities in a stochastic block model, respectively. In a dif-
ferent direction, Bubeck et al. (2016) study the classical problem of testing whether a given
graph corresponds to a neighbourhood graph in a high-dimensional space, or is generated from an Erdős–Rényi model. This result is in fact a specific instance of the generic problem of detecting whether network data has a dependence structure or is unstructured (Bresler and
Nagaraj (2018), Daskalakis, Dikkala and Kamath (2018), Ryabko (2017)). The first study on
two-sample testing of graphs under a relatively broad framework is by Tang et al. (2017a),
where the authors test between a pair of random dot product graphs which are undirected
graphs on a common set of n vertices with mutually independent edges and low-rank popu-
lation adjacency matrices. A test statistic based on the difference in adjacency spectral em-
beddings is shown to be asymptotically consistent as n →∞provided that the rank of the
population adjacencies is fixed and known. Tang et al. (2017b) study a more general problem
of comparing two random dot product graphs defined on different vertex sets, where the ver-
tices have latent Euclidean representations. The latent representation can be recovered from
the adjacency spectral embedding, and kernel two-sample testing for Euclidean data can be
employed to solve the testing problem. Ghoshdastidar et al. (2017) provide a framework to
formulate the graph two-sample problem with minimal structural restrictions, and show that
this general framework can be used to prove minimax optimality of tests based on triangle
counts and spectral properties under special cases.
In this paper, we restrict ourselves to graphs on a common vertex set of size n and sampled
from an inhomogeneous Erdős–Rényi (IER) model (Bollobás, Janson and Riordan (2007)),
that is, we consider undirected and unweighted random graphs where the edges occur in-
dependently, but there is no structural assumption on the population adjacency matrix. We
study the problem of testing between two IER models, where m i.i.d. graphs are observed for
each model. At first glance, allowing m ≥ 1 appears to be only a slight generalisation of the m = 1 case (Tang et al. (2017a)), but we show that in some situations, the testing problem behaves differently in the m = 1 and m > 1 cases. It is also well established that many graph learn-
ing problems have different behaviour in the case of dense and sparse graphs. This is indeed
true in the context of testing for geometric structures (Bubeck et al. (2016)) and community
detection (Verzelen and Arias-Castro (2015)). Bearing this in mind, we study the two-sample
problem at different levels of sparsity of the graph. A formal description of the problem is
presented in Section 2.
Given the above framework, one may resort to a variety of testing procedures. A clas-
sical approach involves viewing the problem as an instance of closeness testing for high-
dimensional discrete distributions (Chan et al. (2014), Daskalakis, Dikkala and Kamath
(2018)) and using variants of the χ²-test or related localisation procedures. On the other hand, exploiting the independence of edges, one may even view the problem as an instance of
multiple testing, and may resort to tests based on higher criticism (Donoho and Jin (2004),
Donoho and Jin (2015)). A more direct approach may be to simply compare the adjacency
spectral embeddings (Tang et al. (2017a)), or other network statistics (Ghoshdastidar et al.
(2017)) or even the raw adjacency matrices. Sections 3–5 present a variety of testing prob-
lems, which differ in terms of the distance d(P,Q), that is, how we quantify the separation
between the two models. We show that some of the above principles are not useful for small
m since the associated testing problems are generally unsolvable in these cases. However, in
some cases, one can construct uniformly consistent tests that work with a small number of
observations, even m = 1. Section 6 discusses the practicality of graph two-sample testing and
also presents minimax separation under special cases of the IER model, such as Erdős–Rényi
or stochastic block models, or under different notions of sparsity in graphs. The detailed
proofs are provided in the Supplementary Material (Ghoshdastidar et al. (2020)).
2. Problem statement. In this section, we formally state the generic two-sample graph
testing problem studied in this paper. We also present the minimax framework that forms the
basis of our theoretical analysis.
We use the notation ≲ and ≳ to denote the standard inequalities ≤ and ≥ up to absolute constants. Further, we use ≍ to denote that two quantities are the same up to a possible difference in constant scaling. We use ∧ and ∨ to denote minimum and maximum, respectively. We also need several standard norms and distances. For two discrete distributions,
we denote the total variation distance by TV(·, ·) and the symmetric Kullback–Leibler (KL)
divergence by SKL(·, ·). The latter is a symmetrised version of KL-divergence (Daskalakis,
Dikkala and Kamath (2018)). We use the following quantities for any matrix:
(i) Frobenius norm, ‖·‖_F, is the root of the sum of squares of all entries,
(ii) max norm, ‖·‖_max, is the largest absolute entry of the matrix,
(iii) zero norm, ‖·‖_0, is the number of nonzero entries,
(iv) operator norm, ‖·‖_op, is the largest singular value of the matrix, and
(v) row sum norm, ‖·‖_row (or the induced ℓ∞-norm), is the maximum absolute row sum of the matrix.
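For concreteness, these matrix norms can be computed directly; the following is a minimal numpy sketch (an illustration only, not part of the original analysis):

```python
import numpy as np

def matrix_norms(M):
    """Illustrative computation of the norms used in this paper (a sketch)."""
    return {
        "frobenius": np.linalg.norm(M, "fro"),       # root of sum of squared entries
        "max":       np.max(np.abs(M)),              # largest absolute entry
        "zero":      np.count_nonzero(M),            # number of nonzero entries
        "operator":  np.linalg.norm(M, 2),           # largest singular value
        "row_sum":   np.max(np.abs(M).sum(axis=1)),  # induced l-infinity norm
    }
```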
2.1. The model and the testing problem. Throughout the paper, V = {1, 2, ..., n} denotes a set of n vertices, and we consider undirected graphs defined on V. Any such graph can be expressed as G = (V, E_G), where E_G is the set of undirected edges. We use the symmetric matrix A_G ∈ {0, 1}^{n×n} to denote the adjacency matrix of G, where (A_G)_{ij} = 1 if (i, j) ∈ E_G, and 0 otherwise. The class of inhomogeneous random graphs, or more precisely inhomogeneous Erdős–Rényi (IER) graphs, on V can be described as follows. Let M_n ⊂ [0, 1]^{n×n} be the set of symmetric matrices with zero diagonal and off-diagonal entries in [0, 1]. For any P ∈ M_n, we say that G is an IER graph with population adjacency P, denoted by G ∼ IER(P), if the adjacency matrix A_G is a symmetric random matrix such that (A_G)_{ij} ∼ Bernoulli(P_{ij}), and ((A_G)_{ij})_{1≤i<j≤n} are independent.
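As an illustration of the model, here is a minimal sketch for sampling a graph from IER(P); the helper name sample_ier is ours and not from the paper:

```python
import numpy as np

def sample_ier(P, rng=None):
    """Draw one adjacency matrix A_G from IER(P).

    P: symmetric (n x n) matrix with zero diagonal and entries in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    n = P.shape[0]
    # sample the upper triangle independently, then symmetrise
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)
```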
Let P, Q ∈ M_n. Given m independent observations from each of IER(P) and IER(Q), we would like to test between the alternatives

(1)    H_0 : P = Q    and    H_1 : d(P,Q) > ρ

for some specified distance function d and a threshold ρ ≥ 0. At this stage, note that the distribution IER(P) is completely characterised by the expected adjacency matrix P. Hence, H_0 in (1) is similar under both the mean difference alternative and the general difference alternative (Ramdas et al. (2015)). Thus, one may assume d to be either a distance between the distributions IER(P) and IER(Q), or a matrix distance between P and Q. Different examples of d are considered in Sections 3–5, which result in specific instances of the testing problems.
We note that the complexity of graph inference problems is often governed by the sparsity of the graphs. To take the effect of sparsity into account, we restrict the problem to models such that ‖P‖_max ∨ ‖Q‖_max ≤ δ for some δ ∈ (0, 1], where δ may decay with m, n. Intuitively, we consider only graphs that are uniformly sparse, that is, any edge can occur with probability at most δ. For instance, if δ ≲ 1/n, we mostly observe sparse graphs with bounded expected degrees. Such a uniform sparsity restriction is along the lines of the scalar sparsity parameter introduced in some graph estimation problems (Klopp, Tsybakov and Verzelen (2017)). More general notions of sparsity may be considered, as discussed later in Section 6.
Based on the above considerations, we formally state the following general framework for
graph two-sample testing:
(2)    H_0 : (P, Q) ∈ Θ_0    vs.    H_1 : (P, Q) ∈ Θ_1,
where

(3)    Θ_0 = {(P, Q) ∈ M_n × M_n : P = Q, ‖P‖_max ≤ δ},
       Θ_1 = {(P, Q) ∈ M_n × M_n : d(P,Q) > ρ, ‖P‖_max ∨ ‖Q‖_max ≤ δ}.
Note that the hypotheses are governed by the distance d, the integers m, n, and the positive
scalars ρ, δ, where the last two terms may depend on m, n.
2.2. Minimax framework. Given the graphs G_1, ..., G_m drawn i.i.d. from IER(P) and H_1, ..., H_m drawn i.i.d. from IER(Q), a test Ψ is a binary function of the 2m adjacency matrices, where Ψ = 0 when the test accepts H_0, and Ψ = 1 otherwise. The maximum or worst-case risk of a test is given by

R(Ψ, n, m, d, ρ, δ) = sup_{θ ∈ Θ_0} P_θ(Ψ = 1) + sup_{θ ∈ Θ_1} P_θ(Ψ = 0),

which is the sum of the maximum possible Type-I and Type-II error rates incurred by the test. Here, we use θ to denote any tuple (P, Q). The minimax risk for the problem in (2)–(3) is defined as

(4)    R*(n, m, d) = inf_Ψ R(Ψ, n, m, d, ρ, δ).
Our aim in this paper is to find the minimax separation ρ* for a given problem, which is the smallest possible ρ such that R*(n, m, d) ≤ η for some prespecified η ∈ (0, 1). In the subsequent sections, we consider testing problems where the separation between P and Q is defined in terms of various distance functions. We provide bounds for ρ* for the different testing problems in terms of the various parameters of the problem (2)–(3). Though our formal bounds are explicit in terms of η, we generally assume η to be a prespecified constant (e.g., η = 0.05) and focus on the dependence of ρ* on n, m and δ. Our aim is to provide upper and lower bounds for ρ* that are the same up to differences in absolute constants and functions of η.
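For intuition, the risk of a fixed test at one particular pair (P, Q) can be estimated by simulation; the sketch below (assuming the hypothetical sample_ier helper above, and a test taking two lists of adjacency matrices) approximates the sum of Type-I and Type-II error rates at a single θ, which lower-bounds the worst-case risk R defined above:

```python
import numpy as np

def empirical_risk(test, P, Q, m, n_trials=500, rng=None):
    """Monte Carlo estimate of P_theta(test=1) under H0 plus P_theta(test=0)
    under H1 for theta = (P, Q); a pointwise proxy for the worst-case risk R."""
    rng = np.random.default_rng() if rng is None else rng
    type_1 = np.mean([test([sample_ier(P, rng) for _ in range(m)],
                           [sample_ier(P, rng) for _ in range(m)])
                      for _ in range(n_trials)])
    type_2 = np.mean([1 - test([sample_ier(P, rng) for _ in range(m)],
                               [sample_ier(Q, rng) for _ in range(m)])
                      for _ in range(n_trials)])
    return type_1 + type_2
```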
3. Challenges of testing with small sample size. The theme of this paper is to test
between two populations of sparse graphs, where the sample size m is much smaller than the
number of vertices n. Our main interest is in cases where m is a small constant, or may grow
very slowly with n. In this section, we show that in case of some popular distance functions,
the testing problem is nearly unsolvable if m is small.
We formalise the notion of unsolvability in the following way. For any instance of the two-
sample testing problem (2)–(3), there is a trivial upper bound for ρ*, which is the maximal possible value that can be attained by d(P,Q) (the diameter with respect to d). As an example, if d(P,Q) = TV(IER(P), IER(Q)), then ρ* ≤ 1 trivially. Similarly, if d(P,Q) = ‖P − Q‖_F, a trivial upper bound is ρ* ≤ nδ. On the other hand, for small m, if there is a lower bound on ρ* that is equal or close to the trivial upper bound, then there exist model pairs such that d(P,Q) is nearly as large as the diameter and yet cannot be distinguished for small m. Hence, we may conclude that the problem (2)–(3) with the specific choice of d is unsolvable for small m under a worst-case (minimax) analysis.
We present the first instance of such an impossibility result for the case of total variation
distance. For any two probability mass functions p and q, defined on the space of undirected
n-vertex graphs,
TV(p, q) = (1/2) Σ_G |p(G) − q(G)|,
where the summation is over all unweighted undirected graphs on n vertices. We present the
following minimax rate in the small sample regime.
PROPOSITION 3.1 (ρ* for total variation distance). Consider the problem in (2)–(3) with d(P,Q) = TV(IER(P), IER(Q)) and δ ∈ (0, 1). Let η ∈ (0, 1) be the allowable risk. For any δ ≥ C (ln n)/n²,

1 − 1/n ≤ ρ* ≤ 1    for m ≤ C′ ln(1 + 4(1 − η)²) · n/ln n,

where C ∈ (0, 1) and C′ > 1 are absolute constants. In particular, for large n and m ≲ n/ln n, we have ρ* ≈ 1.
PROOF SKETCH. The main idea is to find an appropriate choice of (P, Q) such that TV(IER(P), IER(Q)) ≥ 1 − 1/n, and yet the models cannot be distinguished using m ≲ n/ln n samples. For this, we use standard approaches to derive minimax lower bounds (Baraud (2002)). This is described in the present context of two-sample testing of IER graphs in Ghoshdastidar et al. (2020).
In particular, for the present proof, the choice of P, Q is the following. Under H_0, we set P = Q such that the model corresponds to an Erdős–Rényi (ER) model with edge probability δ/2. Under H_1, we keep P as before, whereas each entry of Q is chosen independently and uniformly from {δ/2 − γ, δ/2 + γ}. An appropriate choice of γ ≤ δ/2 leads to the result.
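A small sketch of this lower-bound construction (again using the hypothetical sample_ier helper; γ ≤ δ/2 is a free parameter here):

```python
import numpy as np

def tv_lower_bound_pair(n, delta, gamma, rng=None):
    """Construct the (P, Q) pair from the proof sketch: P is an ER model with
    edge probability delta/2, and each off-diagonal entry of Q is drawn
    uniformly from {delta/2 - gamma, delta/2 + gamma} (requires gamma <= delta/2)."""
    rng = np.random.default_rng() if rng is None else rng
    P = np.full((n, n), delta / 2)
    signs = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)
    Q = P + gamma * (signs + signs.T)   # symmetric random perturbation
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(Q, 0.0)
    return P, Q
```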
The stated lower bound shows that at least m ≳ n/ln n samples are needed to test for separation in total variation distance, which is beyond the small sample regime that we are interested in. In fact, in the case of constant m, one can improve the stated bound as 1 − e^{−C″n} ≤ ρ* ≤ 1, where C″ is a constant that depends on m.
An impossibility result also holds in the case of symmetric KL-divergence, which has
been effectively used for high-dimensional discrete distributions, particularly Ising models
(Daskalakis, Dikkala and Kamath (2018)). For any two probability mass functions p and q,
defined on the space of undirected n-vertex graphs, the symmetric KL-divergence is given by
SKL(p, q) = Σ_G [ p(G) ln(p(G)/q(G)) + q(G) ln(q(G)/p(G)) ],
where the summation is over all unweighted undirected graphs on n vertices. Note that the
above distance is unbounded even for finite n since SKL(p, q) = ∞ if there exists G_0 such that p(G_0) = 0 but q(G_0) ≠ 0. We present the following result that demonstrates the impossibility of testing with respect to the symmetric KL-divergence when the sample size m is
small.
PROPOSITION 3.2 (ρ* for symmetric KL-divergence). Let η ∈ (0, 1) and consider the problem in (2)–(3) with d(P,Q) = SKL(IER(P), IER(Q)) and any δ ∈ (0, 1). Then

ρ* = ∞    for m ≤ (2/δ) ln((1 − η)n).
PROOF SKETCH. The basic technique for the proof is similar to Proposition 3.1, but we choose P and Q such that SKL(IER(P), IER(Q)) = ∞, and yet the models are indistinguishable for small m.
To be precise, under H_0 we set P = Q corresponding to an ER model. Under H_1, we set Q to be the same as P except for a randomly chosen entry, for which Q_{ij} = 0 ≠ P_{ij}. This implies that IER(P) and IER(Q) do not have a common support, which leads to SKL(IER(P), IER(Q)) = ∞. However, if the sample size is small (small m) or the graphs are sparse (small δ), then the models cannot be distinguished.
The above result shows that testing for separation in symmetric KL-divergence is impossible for m ≲ ln n samples even when the graphs are dense (δ ≍ 1). However, the situation is worse in the case of sparse graphs. If δ ≍ 1/n, then at least m ≳ n ln n samples are necessary, which is worse than the condition for total variation distance.
Both Propositions 3.1 and 3.2 suggest that achieving a small sample complexity (small m)
could be difficult under general difference alternatives, that is, if d corresponds to distribu-
tional distances. Hence, subsequent discussions focus only on matrix distances. However,
even in this case, the two-sample problem is not necessarily easily solvable for all distances
or dissimilarities.
PROPOSITION 3.3 (ρ* for zero norm/effect rarity). Let η ∈ (0, 1) and consider the problem in (2)–(3) with d(P,Q) = ‖P − Q‖_0 and any δ ∈ (0, 1). Then

ρ* = n(n − 1)    for all m < ∞.

PROOF SKETCH. The proof is straightforward since the entries of P and Q can be arbitrarily close but still be unequal. Hence, the models may not be distinguishable even though ‖P − Q‖_0 = n(n − 1), which is the trivial upper bound.
Proposition 3.3 may be viewed as a trivial extremity of the rare/weak effect studied in
the context of multiple testing (Donoho and Jin (2015)). To put it simply, here we view the
problem as testing P_{ij} = Q_{ij} versus P_{ij} ≠ Q_{ij} for every i < j, and the edge independence in IER graphs leads to a problem of multiple independent comparisons. Proposition 3.3 states that if min_{P_{ij} ≠ Q_{ij}} |P_{ij} − Q_{ij}| is arbitrarily small, that is, the individual effects are arbitrarily weak, then they cannot be detected even when the effects are dense, ‖P − Q‖_0 = n(n − 1).
A more detailed analysis of the rare/weak effect in the sparse Bernoulli setting may be done by imposing a threshold on min_{P_{ij} ≠ Q_{ij}} |P_{ij} − Q_{ij}| that characterises the weakness of the effect. We do not discuss this effect further, and instead, we proceed to other instances of (2)–(3), where the problem can be solved for small m.
4. Testing for separation in Frobenius norm. The previous section focused on impos-
sibility results, where ρ* is typically large (close to the trivial upper bound) when m is small. In this section and the next one, we study two instances of the problem (2)–(3) where tests can be constructed even for small m. Formally, we show that the lower bound for ρ* can be much smaller than the trivial upper bound, and subsequently, we propose two-sample tests to derive nearly matching upper bounds for ρ*.
We first quantify the separation in terms of the Frobenius norm, that is, d(P,Q) = ‖P − Q‖_F. This is equivalent to viewing the adjacencies as (n choose 2)-dimensional Bernoulli vectors, and using a two-sample test for high-dimensional vectors—a well-studied problem in the Gaussian case (Chen and Qin (2010)). We state the following bounds for the minimax separation ρ*.
THEOREM 4.1 (ρ* for Frobenius norm separation). Consider the two-sample problem (2)–(3) with d(P,Q) = ‖P − Q‖_F, any δ ∈ (0, 1) and any η ∈ (0, 1). There exist absolute constants C_1, C_2 ≥ 1 such that:

1. nδ/4 ≤ ρ* ≤ nδ for m = 1,
2. (1/4 ∧ η²τ_η/(8C_1)) nδ ≤ ρ* ≤ nδ for m > 1 and δ ≤ C_1/(η²mn), and
3. (τ_η/8) √(nδ/m) ≤ ρ* ≤ (C_2/η) √(nδ/m) for m > 1 and δ > C_1/(η²mn),

where τ_η = ln(1 + 4(1 − η)²). Hence, assuming the allowable risk η is fixed, we have ρ* ≍ nδ for m = 1 and ρ* ≍ √(nδ/m) for m ≥ 2.
Theorem 4.1 provides a clear characterisation of the minimax separation ρ* (up to factors of η) when the distance between models is in terms of the Frobenius norm. The second and third statements deal with the case of m > 1. In the ultra-sparse regime, that is, δ ≲ 1/(mn), one observes a total of only O(n) edges from the entire population of 2m graphs generated from either model. This information is insufficient for testing equality of models, and hence, it is not surprising that ρ* ≍ nδ, which is the trivial upper bound. On the other hand, when δ ≳ 1/(mn), we find a nontrivial separation rate indicating that the problem is solvable in this case.
The surprising finding of Theorem 4.1 is that ρ* ≍ nδ for m = 1, which informally means
that the problem is not solvable when one observes only m = 1 sample from each model.
This result is significant since it shows that the problem of testing for separation in Frobenius
norm is unsuitable in the setting of comparing between two large networks, for instance, the
case of testing between two omics networks.
4.1. Proof of Theorem 4.1. We provide an outline of the proof of the above result high-
lighting the key technical lemmas. We sketch their proofs here, and the detailed proofs can
be found in Ghoshdastidar et al. (2020). To prove the lower bounds, we have the following
result.
LEMMA 4.2 (Necessary conditions for detecting Frobenius norm separation). For the testing problem (2)–(3) with d(P,Q) = ‖P − Q‖_F and for any η ∈ (0, 1), the minimax risk (4) is at least η if either of the following conditions holds:

(i) ρ < (nδ/4) ∧ (τ_η/8) √(nδ/m),    or    (ii) m = 1 and ρ < nδ/4.
PROOF SKETCH. The proof follows the basic approach of Proposition 3.1, and also uses the same choice of P, Q. We set P = Q corresponding to an ER model with edge probability δ/2 under H_0. For H_1, we set the same P, but each entry of Q is chosen independently and uniformly from {δ/2 − γ, δ/2 + γ}. One can easily see that Q is chosen uniformly from a set of 2^{n(n−1)/2} matrices, but for each choice of Q, we have ‖P − Q‖_F = γ√(n(n − 1)). Hence, a choice of γ ∈ (ρ/(n − 1), δ/2] implies that the pair (P, Q) for every choice of Q lies in Θ_1.
Subsequently, we use the techniques of Baraud (2002) to show that for m ≥ 2 and δ ≳ 1/(mn), there is an appropriate choice γ < δ/2 for which the random choice of (P, Q) ∈ Θ_1 cannot be distinguished from the null case. If δ ≲ 1/(mn), then the same situation occurs even for the choice γ = δ/2. Finally, for m = 1, we observe that the same proof leads to the conclusion that the random choice of (P, Q) ∈ Θ_1 is indistinguishable from the null case for any choice of γ ≤ δ/2. In particular, γ = δ/2 leads to the claim in (ii).
We elaborate on the distinction between the cases m = 1 and m > 1. Consider the former case of m = 1 under H_1, where we have G_1 ∼ IER(P) = ER(δ/2) and H_1 ∼ IER(Q) with Q being chosen randomly as described above. Due to the uniform choice of Q, one can easily verify that the probability of each edge in H_1 is δ/2, which is the same as that of G_1. Hence, although the two graphs are sampled from different generative models with large ‖P − Q‖_F, they are essentially similar due to the random choice of Q. On the other hand, let m = 2, let G_1, G_2 be i.i.d. from ER(δ/2) and H_1, H_2 be i.i.d. from IER(Q) with Q being random as before. Although H_1, H_2 are independent conditioned on the choice of Q, the two graphs are mutually dependent
without the knowledge of Q and hence, the population {H_1, H_2} does not have the same distribution as {G_1, G_2}.
We continue with the proof of Theorem 4.1. The lower bounds in the second and third statements of Theorem 4.1 follow from condition (i) above by accounting for the conditions on δ and noting C_1 ≥ 1 and τ_η ≤ ln 5. For the upper bounds in the first two statements, note that ρ* ≤ nδ trivially holds since ‖P − Q‖_F ≤ n(‖P‖_max ∨ ‖Q‖_max). To derive the upper bound for the third case, we construct the following two-sample test. Let A_{G_1}, ..., A_{G_m} and A_{H_1}, ..., A_{H_m} be the adjacency matrices of the 2m graphs. We define

(5)    μ̂ = Σ_{1≤i<j≤n} ( Σ_{k≤m/2} ((A_{G_k})_{ij} − (A_{H_k})_{ij}) ) ( Σ_{k>m/2} ((A_{G_k})_{ij} − (A_{H_k})_{ij}) ),

(6)    σ̂ = ( Σ_{1≤i<j≤n} ( Σ_{k≤m/2} ((A_{G_k})_{ij} + (A_{H_k})_{ij}) ) ( Σ_{k>m/2} ((A_{G_k})_{ij} + (A_{H_k})_{ij}) ) )^{1/2},

and consider the test

(7)    Ψ_F = 1{ μ̂/σ̂ > t_1/η } · 1{ σ̂ > t_2/η^{3/2} }

for some positive constants t_1, t_2, where 1{·} is the indicator function. A minimal implementation sketch of Ψ_F is given below; the lemma that follows states its guarantee.
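The following is a small numpy sketch of Ψ_F; the thresholds t_1, t_2 are unspecified absolute constants in (7), so the defaults below are placeholders only:

```python
import numpy as np

def frobenius_test(As_G, As_H, eta, t1=1.0, t2=1.0):
    """Sketch of the test Psi_F in (7); As_G, As_H are lists of m adjacency
    matrices each, and t1, t2 are placeholder constants."""
    m = len(As_G)
    iu = np.triu_indices(As_G[0].shape[0], k=1)
    # entrywise differences and sums, split into the two halves of the sample
    diff = np.array([(A - B)[iu] for A, B in zip(As_G, As_H)])
    summ = np.array([(A + B)[iu] for A, B in zip(As_G, As_H)])
    half = m // 2
    mu_hat = np.sum(diff[:half].sum(axis=0) * diff[half:].sum(axis=0))
    sigma_hat = np.sqrt(np.sum(summ[:half].sum(axis=0) * summ[half:].sum(axis=0)))
    if sigma_hat == 0.0 or sigma_hat <= t2 / eta ** 1.5:
        return 0
    return int(mu_hat / sigma_hat > t1 / eta)
```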
LEMMA 4.3 (Sufficient conditions for detecting Frobenius norm separation). Consider the testing problem with d(P,Q) = ‖P − Q‖_F and the test Ψ_F in (7). There exist absolute constants t_1, t_2, C and C′ such that for any η ∈ (0, 1) and δ ∈ (0, 1), if

(8)    m ≥ 2 and ρ ≥ (C/η) √(nδ/m) ∨ C′/(η^{3/2} √m),

then R(Ψ_F, n, m, d) ≤ η.
PROOF SKETCH. The proof is based on concentration statements for μ̂ and σ̂ derived via Chebyshev's inequality using

μ := E[μ̂] = (m²/8) ‖P − Q‖²_F    and    σ² := E[σ̂²] = (m²/8) ‖P + Q‖²_F,

and bounds for Var[μ̂] and Var[σ̂²] in terms of ‖P − Q‖_F and ‖P + Q‖_F.
For the type-I error, that is, under H_0, these concentration bounds lead to large enough choices for t_1 and t_2 such that at least one of the events

{ μ̂/σ̂ > t_1/η }    and    { σ̂ > t_2/η^{3/2} }

has small probability, which bounds the probability of the event {Ψ_F = 1}. On the other
hand, by construction, in order to control the type-II error rate we want to ensure that both the events

{ μ̂/σ̂ ≤ t_1/η }    and    { σ̂ ≤ t_2/η^{3/2} }

have small probability under H_1. Now, the requirement in (8) on ρ guarantees that μ is sufficiently large relative to σ, which allows us to rewrite these two events as large deviation statements of μ̂ and σ̂² from their means, as required.
To conclude the proof of Theorem 4.1, observe that if δ ≥ C_1/(η²mn) for a suitable constant C_1 determined by C and C′, then the √(nδ/m) term in the above result dominates, and we obtain the upper bound in the third statement of Theorem 4.1. Hence the theorem holds.
4.2. Further remarks on Theorem 4.1. As part of the proof of Theorem 4.1, we propose
a test in (7) which provides the nontrivial upper bound in the third statement of Theorem 4.1. Though this bound matches the corresponding lower bound up to factors of η, the difference is rather large with respect to η. This is an artefact of the proof of Lemma 4.3, which is based on Chebyshev's inequality, and its effect can also be seen in the two thresholds t_1/η and t_2/η^{3/2} defined in (7), which can be very high for small η, limiting the practical usefulness of the test. Below, we show that this can be improved using more refined concentration inequalities, but this provides a sufficient condition that is weaker by a factor of ln n.
PROPOSITION 4.4 (Improving dependence on η). Consider the two-sample problem of Theorem 4.1 and assume m ≥ 2. Define the test

(9)    Ψ̃_F = 1{ μ̂/σ̂ > t_1 ln(2/η) √(ln(n/η)) } · 1{ σ̂ > t_2 ln²(2/η) ln(n/η) },

where μ̂, σ̂ are as in (5)–(6). There exist constants t_1, t_2, C and C′ such that for any η ∈ (0, 1) and δ ∈ (0, 1), the test in (9) has a risk at most η whenever

(10)    ρ ≥ C ln(2/η) √((nδ/m) ln(n/η)) ∨ (C′/√m) ln²(2/η) ln(n/η).
PROOF SKETCH. The proof is very close in spirit to that of Lemma 4.3 above, but it is based on a much more involved concentration statement than Chebyshev's inequality (stated in the Supplementary Material). As a result, the test is based on the same test statistic μ̂/σ̂ and differs from (7) only in the choice of thresholds.
More specifically, due to the fact that μ̂ and σ̂² are sums of products of sums with strong independence properties, we can derive concentration inequalities with logarithmic dependence on η for them by repeated application of Bernstein's inequality. However, this comes with the additional ln n factor which can be seen in equation (10) above.
A key feature of both tests is that they are adaptive, that is, they do not require specification
of the sparsity parameter δ. We highlight the importance of this property in the following
remark.
REMARK 4.5 (Adaptivity of proposed tests). The testing problem in (2)–(3) is defined with respect to the sparsity parameter δ, which in turn governs the minimax separation rate ρ*. It is not hard to convince oneself that for any P, Q, it is impossible to estimate ‖P‖_max ∨ ‖Q‖_max from few observations (small m), and setting δ = 1 is clearly suboptimal for sparse graph models. Hence, it is desirable to construct tests that do not require knowledge of δ, and both tests in (7) and (9) are adaptive in this sense. Adaptivity of these tests is achieved by estimating ‖P + Q‖_F, which is a lower bound for 2nδ.
5. Testing for separation in operator norm. In this section, we study the two-sample testing problem where d(P,Q) = ‖P − Q‖_op, and provide bounds on the minimax separation ρ* for all m ≥ 1. An interesting finding of this section is that one can obtain a nontrivial minimax separation rate even for m = 1, that is, the problem is indeed solvable even with a single observation from each model. Our main result is the following.
THEOREM 5.1 (ρ* for operator norm separation). Consider the two-sample problem (2)–(3) with d(P,Q) = ‖P − Q‖_op, and any m ≥ 1, δ ∈ (0, 1). Let η ∈ (0, 1) and κ_η = (√η/8) ∧ (1/16). There exist constants C, C′ ≥ 1 such that:

1. nδ/4 ≤ ρ* ≤ nδ for δ ≤ κ_η²/(16mn), and
2. κ_η √(nδ/m) ≤ ρ* ≤ √(nδ/m) (C √(ln(n/η)) ∨ (4C′/κ_η) ln(n/η)) otherwise.
The theorem shows that the problem is not solvable in the ultra-sparse regime, that is, δ ≲ 1/(mn). However, beyond this regime there is a nontrivial separation rate, which Theorem 5.1 finds up to a factor of ln n. It is natural to ask whether the additional logarithmic factor is necessary. Later in this section, we refine Theorem 5.1 to remove the ln n term in the upper bound (see Corollary 5.5). This is achieved by using a nonadaptive test, which has prior knowledge of δ (see Proposition 5.4).
5.1. Proof of Theorem 5.1. The lower bounds in the theorem are due to the following
necessary condition.
LEMMA 5.2 (Necessary condition for detecting operator norm separation). For the testing problem (2)–(3) with d(P,Q) = ‖P − Q‖_op, δ ∈ (0, 1) and m ≥ 1, and for any η ∈ (0, 1), the minimax risk (4) is at least η if

ρ < (nδ/4) ∧ κ_η √(nδ/m).
PROOF SKETCH. The proof follows the technique of Baraud (2002) to derive minimax lower bounds, as used in the previous results in this paper. Hence, we mainly focus on the choice of P, Q used in the present proof. We set P = Q corresponding to an ER model with edge probability δ/2 under H_0. For H_1, we set the same P, but Q is randomly chosen in the following way. We partition the vertices randomly into two groups, and set Q_{ij} = δ/2 + γ for i, j belonging to the same group, and Q_{ij} = δ/2 − γ otherwise. The random choice of Q is due to randomly sampling one of 2^{n−1} possible splits of the vertex set.
One can see that for each choice of Q, we have ‖P − Q‖_op = γ(n − 1). Hence, a choice of γ ∈ (ρ/(n − 1), δ/2] implies that the pair (P, Q) for every choice of Q lies in Θ_1. Now, similar to the proof of Lemma 4.2, we show that for any m ≥ 1 and δ ≳ 1/(mn), there is an appropriate choice γ < δ/2 for which the random choice of (P, Q) ∈ Θ_1 cannot be distinguished from the null case. If δ ≲ 1/(mn), then the same situation occurs even for the choice γ = δ/2, which leads to the claim.
We now prove the upper bounds in Theorem 5.1. For this, we note that ‖P − Q‖_op ≤ ‖P − Q‖_row ≤ nδ, which gives the trivial upper bound for ρ*. To prove the nontrivial upper bound, we consider the following test:

(11)    Ψ_op = 1{ ‖S⁻‖_op / √(‖S⁺‖_row) > t_1 √(ln(n/η)) } · 1{ ‖S⁺‖_row > t_2 ln(n/η) },
where

(12)    S⁻ = Σ_{k=1}^m (A_{G_k} − A_{H_k})    and    S⁺ = Σ_{k=1}^m (A_{G_k} + A_{H_k}).

A minimal implementation sketch of Ψ_op is given below; the lemma that follows states a sufficient condition for this two-sample test.
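As before, this is an illustration only: t_1 and t_2 stand for unspecified absolute constants, so the defaults below are placeholders:

```python
import numpy as np

def operator_norm_test(As_G, As_H, eta, t1=1.0, t2=1.0):
    """Sketch of the test Psi_op in (11)-(12); As_G, As_H are lists of m
    adjacency matrices each, and t1, t2 are placeholder constants."""
    n = As_G[0].shape[0]
    S_minus = sum(A - B for A, B in zip(As_G, As_H))
    S_plus = sum(A + B for A, B in zip(As_G, As_H))
    row_norm = np.max(np.abs(S_plus).sum(axis=1))    # ||S+||_row
    if row_norm <= t2 * np.log(n / eta):
        return 0
    op_norm = np.linalg.norm(S_minus, 2)             # ||S-||_op
    return int(op_norm / np.sqrt(row_norm) > t1 * np.sqrt(np.log(n / eta)))
```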
LEMMA 5.3 (Sufficient condition for detecting operator norm separation). Consider the testing problem with d(P,Q) = ‖P − Q‖_op and the test Ψ_op in (11). There exist absolute constants t_1, t_2, C and C′ such that for any η ∈ (0, 1), δ ∈ (0, 1) and m ≥ 1, the risk R(Ψ_op, n, m, d) ≤ η if

ρ ≥ C √((nδ/m) ln(n/η)) ∨ (C′/m) ln(n/η).
PROOF SKETCH. The proof mainly controls the type-I and type-II error rates for Ψ_op. We achieve this through the matrix Bernstein inequality (Oliveira (2008), Tropp (2012)), which in the present case guarantees that

‖S⁻ − m(P − Q)‖_op ≲ √(m ‖P + Q‖_row ln n)

with high probability if ‖P + Q‖_row ≳ (ln n)/m. We also use a Bernstein-type concentration result to ensure that ‖S⁺‖_row ≍ m‖P + Q‖_row for ‖P + Q‖_row ≳ (ln n)/m, and ‖S⁺‖_row ≲ ln n otherwise.
We bound the type-I error by noting that under H_0, that is P = Q, the above concentration results imply that with high probability:
• ‖S⁻‖_op ≲ √(‖S⁺‖_row ln n) for ‖P + Q‖_row ≳ (ln n)/m, and
• ‖S⁺‖_row ≲ ln n for ‖P + Q‖_row ≲ (ln n)/m.
Hence, we have Ψ_op = 0 with high probability for suitable choices of t_1, t_2.
On the other hand, we control the type-II error in the following way. Under H_1, we have ‖P − Q‖_op > ρ with ρ ≥ (C′/m) ln(n/η), which implies that

‖P + Q‖_row ≥ ‖P − Q‖_op ≳ (ln n)/m.

So the second indicator in Ψ_op is guaranteed to be 1 for C′ large enough. This condition also allows us to use the matrix Bernstein inequality which, along with the reverse triangle inequality, gives

‖S⁻‖_op ≥ m‖P − Q‖_op − ‖S⁻ − m(P − Q)‖_op ≳ m‖P − Q‖_op − √(m ‖P + Q‖_row ln n).

We use the stated assumption ρ ≥ C √((nδ/m) ln(n/η)) and the fact ‖S⁺‖_row ≲ m‖P + Q‖_row to prove that the first indicator in Ψ_op is also true with high probability for large enough C. This bounds the type-II error rate.
To conclude the proof of Theorem 5.1, we note that the upper bound in the second statement is obtained by observing that (C′/m) ln(n/η) ≤ (4C′/κ_η) √(nδ/m) ln(n/η) for δ ≥ κ_η²/(16mn). We note that the above analysis of the test Ψ_op is based on the matrix Bernstein inequality (Tropp (2012)), which makes the upper bound worse by a factor of ln n. One may alternatively use other concentration results based
on refinements of the trace method (Bandeira and van Handel (2016), Lu and Peng (2013))
to obtain slightly different sufficient conditions in Lemma 5.3. For instance, use of Bandeira
and van Handel ((2016), Corollary 3.9) does provide a rate of √(nδ/m), but such bounds hold only if δ ≳ polylog(n)/n.
5.2. An optimal nonadaptive test. One of the objectives of this work is to determine
whether it is possible to test between two large graphs, that is, the case of m = 1. Theorem 5.1
provides an affirmative answer to this question, but our proposed test Ψ_op (11) is not optimal since the sufficient condition on ρ is worse than the necessary condition by a factor of ln n. As we note above, this is a consequence of our use of the matrix Bernstein inequality in the proof of Lemma 5.3, and leads to the question whether one can improve the result by using sharper concentration techniques known in the case of random graphs. We show that this is indeed true, at least for m = 1, and can be shown using concentration of trimmed or regularised adjacency matrices (Le, Levina and Vershynin (2017)).
Assume m = 1, and let G ∼ IER(P) and H ∼ IER(Q) be the two random graphs. Also assume that δ is specified. For some constant c ≥ 6, let Ã_G be the adjacency matrix of the graph obtained by deleting all edges in G that are incident on vertices with degree larger than c(nδ ∨ ln(2/η)). Similarly, we obtain Ã_H from H. Define a test as

(13)    Ψ̃_op = 1{ ‖Ã_G − Ã_H‖_op > t √(nδ ∨ ln²(2/η)) }

for some constant t > 0. We have the following guarantee for Ψ̃_op.
PROPOSITION 5.4 (Optimality for m = 1). Consider the testing problem with d(P,Q) = ‖P − Q‖_op, m = 1 and δ > 10/n. There exist absolute constants c, t such that for any η ∈ (0, 1), the maximum risk of Ψ̃_op is at most η if

ρ ≥ 2t √(nδ ∨ ln²(2/η)).

Hence, assuming η is fixed, ρ* ≍ √(nδ) for δ > 10/n, whereas ρ* ≍ nδ for δ ≤ 10/n.
PROOF SKETCH. The proof differs from that of Lemma 5.3 in the use of a different matrix concentration inequality. We rely on the concentration of the trimmed adjacency matrix (Le, Levina and Vershynin (2017)). In the present context, the result states that for δ > 10/n and G ∼ IER(P) with ‖P‖_max ≤ δ,

‖Ã_G − P‖_op ≤ C √(nδ ∨ ln²(2/η))

with probability 1 − η/4 for some absolute constant C > 0.
For P = Q, we have with probability 1 − η/2,

‖Ã_G − Ã_H‖_op ≤ ‖Ã_G − P‖_op + ‖Ã_H − Q‖_op ≤ 2C √(nδ ∨ ln²(2/η)),

and hence, setting t = 2C ensures that the type-I error rate is smaller than η/2. Similarly, it follows that under H_1,

‖Ã_G − Ã_H‖_op ≥ ‖P − Q‖_op − (‖Ã_G − P‖_op + ‖Ã_H − Q‖_op) ≥ ρ − 2C √(nδ ∨ ln²(2/η)),

with probability 1 − η/2. Hence, ‖Ã_G − Ã_H‖_op exceeds the threshold for ρ > 4C √(nδ ∨ ln²(2/η)), and we have a similar bound on the type-II error rate.
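A sketch of this nonadaptive test follows; the trimming constant c and the threshold constant t are unspecified in the statement above, so the defaults are placeholders:

```python
import numpy as np

def trim_high_degree(A, degree_cap):
    """Zero out all edges incident on vertices whose degree exceeds degree_cap."""
    A = A.copy()
    high = A.sum(axis=1) > degree_cap
    A[high, :] = 0
    A[:, high] = 0
    return A

def trimmed_operator_test(A_G, A_H, n, delta, eta, c=6.0, t=1.0):
    """Sketch of the nonadaptive test (13) for m = 1; c, t are placeholders."""
    cap = c * max(n * delta, np.log(2 / eta))
    scale = np.sqrt(max(n * delta, np.log(2 / eta) ** 2))
    diff = trim_high_degree(A_G, cap) - trim_high_degree(A_H, cap)
    return int(np.linalg.norm(diff, 2) > t * scale)
```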
While Ψ̃_op indeed achieves optimality, it comes at the cost of prior knowledge of δ. As we note in Remark 4.5, this is an unreasonable restriction and hence, the suboptimal test Ψ_op (11) may be preferable due to its adaptivity. We do not know if an adaptive optimal operator norm based test can be constructed, since existing concentration bounds for sparse IER graphs are typically in terms of the largest edge probability, which is hard to estimate.
5.3. Improving the bounds in Theorem 5.1. As discussed above, the nonadaptive test Ψ̃_op helps in improving the upper bound on ρ* for m = 1. We have not studied whether Ψ̃_op (13) and Proposition 5.4 extend to the case of m > 1. However, the following corollary shows that the results of Sections 4 and 5 can be combined to remove the undesirable ln n factor in the upper bound of ρ* in Theorem 5.1 for any m ≥ 1. For convenience, let the allowable risk η be a constant, which we ignore in the statement of the result.
COROLLARY 5.5 (Improved bounds on ρ* for operator norm separation). Consider the two-sample problem (2)–(3) with d(P,Q) = ‖P − Q‖_op, and any m ≥ 1, δ ∈ (0, 1). There exists a constant C > 0 such that

ρ* ≍ nδ for δ ≤ C/(mn),    and    ρ* ≍ √(nδ/m) otherwise.
PROOF. The lower bounds on ρ* follow from Theorem 5.1, and hence it remains to prove that

ρ* ≲ √(nδ/m)    for all m ≥ 1.

We claim that this is a direct consequence of the upper bounds in Proposition 5.4 and Theorem 4.1. To see this, consider the following test: Ψ = Ψ̃_op for m = 1, and Ψ = Ψ_F for m ≥ 2.
For m = 1, Proposition 5.4 leads to the above stated upper bound.
For m ≥ 2, note that given δ and ρ, the two-sample problems under Frobenius and operator norm separation have the same null set Θ_0 (3), whereas the sets under the alternative hypotheses are

Θ_1^op = {(P, Q) : ‖P − Q‖_op > ρ, ‖P‖_max ∨ ‖Q‖_max ≤ δ},    and
Θ_1^F = {(P, Q) : ‖P − Q‖_F > ρ, ‖P‖_max ∨ ‖Q‖_max ≤ δ}.

We use the relation ‖P − Q‖_F ≥ ‖P − Q‖_op to observe that Θ_1^op ⊆ Θ_1^F. As a consequence, the maximum risk of Ψ_F under the two different distances can be related as

R(Ψ_F, n, m, ‖·‖_op) ≤ R(Ψ_F, n, m, ‖·‖_F),

that is, if ρ is fixed and Ψ_F is used for the problem of testing difference in operator norm, then it achieves a smaller worst-case risk than what it achieves for the problem of testing difference in Frobenius norm. This implies that the minimax separation for operator norm is at most the minimax separation for Frobenius norm. The claim now follows from the upper bounds in Theorem 4.1 for m ≥ 2.
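A sketch of the combined test used in this proof, reusing the hypothetical frobenius_test and trimmed_operator_test sketches given earlier:

```python
def combined_test(As_G, As_H, n, delta, eta):
    """Sketch of the test in the proof of Corollary 5.5: the nonadaptive
    trimmed test for m = 1, and the Frobenius-norm test for m >= 2."""
    m = len(As_G)
    if m == 1:
        return trimmed_operator_test(As_G[0], As_H[0], n, delta, eta)
    return frobenius_test(As_G, As_H, eta)
```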
6. Discussion. In this section, we discuss related graph two-sample problems, and com-
ment on practical two-sample tests.
6.1. A note on the sparsity condition. The notion of sparsity has no formal definition in the context of random graphs, unlike sparsity in the signal detection literature. The informal definition of a sparse graph is a graph where the number of edges is not arbitrarily large compared to the number of vertices n. The sparsity condition used in (3), that is, ‖P‖_max ≤ δ, is one approach to defining sparse graphs. More precisely, this condition implies that the expected number of edges is at most n²δ, and in particular, for δ ≍ 1/n, we obtain graphs where the number of edges grows linearly with n. However, one can induce sparsity through alternative restrictions on P:
(i) Σ_{ij} P_{ij} ≤ δ_1,    (iii) ‖P‖_row ≤ δ_3,
(ii) ‖P‖_F ≤ δ_2,    (iv) ‖P‖_0 ≤ δ_4,

among others. We note that the δ_i's are of a different order than the δ used in (3). The first condition bounds the expected number of edges, while condition (iii) provides graphs with bounded expected degrees. The last restriction is along the lines of the signal detection literature since it implies that at most δ_4 entries in P are nonzero, and results in random graphs with an absolutely bounded number of edges (not only in expectation).
It is natural to ask whether our results also extend to the case where sparsity is controlled through any one of these conditions. We do not provide a complete characterisation in each case,
but present two corollaries of Theorems 4.1 and 5.1 which show that some of our results easily
extend to alternative notions of sparsity.
COROLLARY 6.1 (ρ* under Frobenius norm sparsity). Consider the problem of testing between the following two hypotheses sets:

Θ_0 = {P = Q, ‖P‖_F ≤ δ_2},    and    Θ_1 = {‖P − Q‖_F > ρ, ‖P‖_F ∨ ‖Q‖_F ≤ δ_2}.

The bounds on minimax separation stated in Theorem 4.1 hold in this case with the substitution δ_2 = nδ.

PROOF SKETCH. The proof of the corollary follows immediately from that of Theorem 4.1. The substitution δ_2 = nδ is a consequence of the relation ‖P‖_F ≤ n‖P‖_max.
We also have a result in the case of graphs with bounded expected degrees, which can
be similarly derived from Theorem 5.1. We do not know if the more precise rates given in
Corollary 5.5 can be extended to this setting.
COROLLARY 6.2 (ρ* under row sum norm sparsity). Consider the problem of testing between the following two hypotheses sets:

Θ_0 = {P = Q, ‖P‖_row ≤ δ_3},    and    Θ_1 = {‖P − Q‖_op > ρ, ‖P‖_row ∨ ‖Q‖_row ≤ δ_3}.

The bounds on minimax separation stated in Theorem 5.1 hold in this case with the substitution δ_3 = nδ.
6.2. On the practicality of proposed tests. The theme of this paper has been to explore different separation criteria for which the graph two-sample testing problem can be solved for a small population size. In the process of addressing this question, we suggest adaptive tests Ψ_F (7), Ψ̃_F (9) and Ψ_op (11) that also turn out to be near-optimal for the problems of detecting separation in Frobenius or operator norms. However, the practical applicability of these tests has not been discussed so far.
We note that in practice, one is more interested in the testing problem P = Q vs. P ≠ Q, and hence, a more basic question that needs to be addressed is—which separation criterion should be used? The findings of this paper suggest that for m = 1, operator norm separation is a possible choice, whereas other distances like total variation distance and Frobenius norm should not be considered. For m > 1 but small, we show that one could detect separation in Frobenius norm using Ψ_F (7) or detect separation in operator norm using Ψ_op (11). We compare the relative merits of both tests in terms of sample complexity in the following way.
REMARK 6.3 (Comparison between Ψ_F and Ψ_op). Consider P and Q such that P ≠ Q and ‖P‖_max ∨ ‖Q‖_max ≤ δ. Ignoring constants and terms involving η, Lemma 4.2 shows that m ≳ nδ/‖P − Q‖²_F samples are necessary to distinguish between the two models. On the other hand, Lemma 5.2 shows that m ≳ nδ/‖P − Q‖²_op samples are needed to distinguish between P, Q. Since ‖P − Q‖_F ≥ ‖P − Q‖_op, testing for Frobenius norm separation is easier than testing for separation in operator norm for m > 1. In other words, one may expect Ψ_F to have a smaller risk than Ψ_op.
However, the tests Ψ_F, Ψ̃_F and Ψ_op require n to be very large or the graphs to be dense to achieve a small risk, and hence have limited applicability in moderate-sized problems. It is known that the practical applicability of concentration based tests can be improved by using bootstrapped variants (Gretton et al. (2012)), which approximate the null distribution by generating bootstrap samples through random mixing of the two populations. Simulations, not included in this paper, show that permutation based bootstrapping provides a reasonable rejection rate for moderate sample sizes (m ≥ 10), but such bootstrapping is not effective for smaller m, for instance m = 2. Furthermore, the relative merit of Ψ_F (or Ψ̃_F) over Ψ_op, as suggested by Remark 6.3, could not be verified in the case of the bootstrapped variants.
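To make the random-mixing idea concrete, here is a hedged sketch of a permutation variant built on the Frobenius-type statistic μ̂/σ̂; this is an illustration only and not the exact procedure used in the simulations mentioned above:

```python
import numpy as np

def frobenius_statistic(As_G, As_H):
    """The statistic mu_hat / sigma_hat from (5)-(6) (sketch)."""
    m = len(As_G)
    iu = np.triu_indices(As_G[0].shape[0], k=1)
    diff = np.array([(A - B)[iu] for A, B in zip(As_G, As_H)])
    summ = np.array([(A + B)[iu] for A, B in zip(As_G, As_H)])
    half = m // 2
    mu = np.sum(diff[:half].sum(axis=0) * diff[half:].sum(axis=0))
    sigma = np.sqrt(np.sum(summ[:half].sum(axis=0) * summ[half:].sum(axis=0)))
    return mu / sigma if sigma > 0 else 0.0

def permutation_test(As_G, As_H, alpha=0.05, n_perm=200, rng=None):
    """Approximate the null distribution by randomly re-mixing the pooled sample."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(As_G)
    observed = frobenius_statistic(As_G, As_H)
    pooled = list(As_G) + list(As_H)
    null_stats = []
    for _ in range(n_perm):
        idx = rng.permutation(2 * m)
        null_stats.append(frobenius_statistic([pooled[i] for i in idx[:m]],
                                              [pooled[i] for i in idx[m:]]))
    p_value = np.mean(np.array(null_stats) >= observed)
    return int(p_value < alpha)
```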
When m = 1 and the population adjacency is of low rank, Tang et al. (2017a) suggest an
alternative bootstrapping principle based on estimation of the population adjacency P and
then drawing bootstrap samples from the estimate of P. Simulations in Ghoshdastidar and von
Luxburg (2018) show that this procedure works to some extent for dense IER graphs but only
when P has a small (known) rank. When the rank is unknown or, more generally, if P does
not have a low rank, such bootstrapped tests are not necessarily reliable.
In the related work (Ghoshdastidar and von Luxburg (2018)), we explore alternative pos-
sibilities for constructing practical tests derived from Frobenius norm or operator norm based
test statistics that work even for m = 1, 2. These tests use statistics similar to the ones studied
in the present work, but are based on asymptotic null distributions that hold approximately
or under stronger assumptions. For instance, Ghoshdastidar and von Luxburg (2018) show that under H_0 and assumptions on the edge density of the graphs, the Frobenius norm based statistic μ̂/σ̂ (see (5)–(6)) is asymptotically dominated by a standard normal random variable as n → ∞. Based on this, we propose an asymptotic distribution based test that is powerful for all m ≥ 2 and moderately sparse graphs, and is reliable even in the case of real networks. For m = 1, we consider the operator norm of a re-scaled version of A_{G_1} − A_{H_1}, which approximately follows the Tracy–Widom law under H_0 as n → ∞. The practical applicability of these tests stems from the fact that they do not explicitly rely on concentration inequalities that lead to large thresholds, as in the present paper, nor do they use bootstrapping strategies, which often require large sample sizes or assumptions on the graph model. Thus, our related work provides more practically useful tests whose statistical guarantees hold either approximately or under additional assumptions.¹
6.3. Minimax separation under structural assumptions. The present paper studies the
two-sample problem for IER graphs, where the population adjacency matrices do not have
any structural restriction. In other words, we study a hypothesis testing problem in a dimen-
sion of (n choose 2) with a sample size of m. Under this broad framework, Theorem 4.1 shows that the minimax separation in Frobenius norm ρ*_F is given by

ρ*_F ≍ nδ for m = 1,    and    ρ*_F ≍ √(nδ/m) for m > 1.

Similarly, Corollary 5.5 shows that the minimax separation in operator norm

ρ*_op ≍ √(nδ/m)    for all m ≥ 1.
It is natural to ask if these rates decrease if we further impose structural assumptions on the
population adjacencies thereby effectively reducing the problem dimension. In this section,
we provide some initial results in this direction. We impose the structural assumptions by
restricting the possible values for the population adjacency matrices. Formally, we define
M̃_n ⊆ M_n as the set of symmetric matrices in [0, 1]^{n×n}, whose diagonal entries are zero
and satisfy additional structural assumptions (specified below). Subsequently, the graph two-
sample problem, restricted to a special graph class, can be stated as the problem of testing
between
H_0 : (P, Q) ∈ Θ_0 ∩ (M̃_n × M̃_n)    vs.    H_1 : (P, Q) ∈ Θ_1 ∩ (M̃_n × M̃_n),

where Θ_0 and Θ_1 are as defined in the original problem (2)–(3).
We begin with the simplest case of Erdős–Rényi (ER) graphs, where the restricted set M̃_n corresponds to the symmetric matrices whose diagonal entries are zero and all off-diagonal entries are identical. Note that each distribution is modelled by a single parameter, and hence, we have a hypothesis testing problem in one dimension. We present the following result on minimax separation for Frobenius and operator norms in this setting. We simplify the statement by ignoring absolute constants, including the allowable risk η, which is assumed to be fixed.
PROPOSITION 6.4 (Minimax separation for testing ER graphs). Consider the graph two-sample problem restricted to ER graphs with specified n, m and δ. The minimax separation rates for Frobenius norm ρ*_F and for operator norm ρ*_op satisfy

ρ*_F ≍ ρ*_op ≍ nδ ∧ √(δ/m)    for all m ≥ 1.

In particular, in the sparsity regime δ ≳ 1/(n²m), the minimax separation is much smaller than the corresponding rates for IER graphs.
¹The implementations for the practical tests proposed in Ghoshdastidar and von Luxburg (2018) as well as bootstrapped variants of tests studied in the present paper are available at: https://github.com/gdebarghya/Network-TwoSampleTesting
PROOF SKETCH. The proof is relatively simple, and borrows ideas from Theorem 4.1. Hence, we only provide a brief sketch of the proof.
We consider the two-sample problem with G_1, ..., G_m i.i.d. from ER(p) and H_1, ..., H_m i.i.d. from ER(q), where p, q ≤ δ denote the edge probabilities in either model. Note that the result can be equivalently stated as: the minimax separation between p and q is δ ∧ (1/n)√(δ/m).
To derive a lower bound, we use the approach stated in the previous results and define p, q as follows. Under H_0, we set p = q = δ/2, whereas under H_1, we let p = δ/2 and q is randomly selected from {p − γ, p + γ} for some γ ≤ δ/2. It turns out that γ ≍ (δ/2) ∧ (1/n)√(δ/m) is an appropriate choice, which corresponds to the claimed lower bound for minimax separation.
The upper bound is obtained by using a simple test that compares the edge densities estimated from the two populations. Define a test

Ψ = 1{ |p̂ − q̂| > t_1 } · 1{ (p̂ + q̂) > t_2 },

where p̂ = (1/(m(n choose 2))) Σ_k Σ_{i<j} (A_{G_k})_{ij} and q̂ = (1/(m(n choose 2))) Σ_k Σ_{i<j} (A_{H_k})_{ij} are the two edge density estimates, and t_1, t_2 are suitably defined thresholds. The proof strategy of Lemma 4.3 combined with a judicious choice of thresholds provides the desired upper bound claimed in the result.
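For illustration, a sketch of this edge-density comparison (the thresholds below are placeholders for the suitably chosen constants in the proof):

```python
import numpy as np

def er_density_test(As_G, As_H, t1, t2):
    """Sketch of the ER two-sample test: compare pooled edge-density estimates.

    t1, t2 are placeholder thresholds standing in for the suitably defined
    constants in the proof sketch above."""
    m = len(As_G)
    n = As_G[0].shape[0]
    pairs = m * n * (n - 1) / 2
    p_hat = sum(np.triu(A, k=1).sum() for A in As_G) / pairs
    q_hat = sum(np.triu(A, k=1).sum() for A in As_H) / pairs
    return int(abs(p_hat - q_hat) > t1 and (p_hat + q_hat) > t2)
```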
We next increase the complexity of the problem by considering the stochastic block model with 2 classes (2-SBM). This class of graphs has been studied in the context of one-sample hypothesis testing, particularly for detecting community structure in graphs and estimating the number of communities in a block model (Bickel and Sarkar (2016), Gao and Lafferty (2017), Lei (2016)). We consider typical 2-SBM graphs, which are characterised by a binary vector denoting the communities of the nodes as well as the within-community and inter-community edge probabilities. Formally, this is represented by the set M̃_n ⊆ M_n of matrices which can be transformed to have a 2-class block structure through row/column permutations. Furthermore, the off-diagonal entries in each matrix can take at most two distinct values. It is easy to observe that each matrix is governed by n + 2 parameters, and the problem has a dimension much smaller than the (n choose 2)-dimensional IER problem. However, the following result shows that the minimax separation does not decrease in this case compared to the general IER setting.
PROPOSITION 6.5 (Minimax separation for testing 2-SBM graphs). Consider the graph two-sample problem restricted to 2-SBM graphs with specified n, m and δ. The minimax separation for operator norm is

ρ*_op ≍ √(nδ/m)    for all m ≥ 1.

For the minimax separation in Frobenius norm ρ*_F, the above rate holds only for m ≥ 2. For m = 1, we have the loose bounds √(nδ) ≲ ρ*_F ≲ nδ.
Hence, the minimax separation for 2-SBM is similar to the corresponding rates for IER graphs (with the possible exception of ρ*_F for m = 1).
PROOF SKETCH. We first note that the testing problem is a restriction of the original IER testing problem, and hence, the upper bounds on ρ*_F and ρ*_op, derived in Theorem 4.1 and Corollary 5.5, respectively, also hold in this case. Hence, we only need to prove the lower bounds

ρ*_op ≳ √(nδ/m)    and    ρ*_F ≳ √(nδ/m)

for all m ≥ 1. The lower bound on ρ*_op follows directly from the proof of Lemma 5.2. Recall that the construction used to prove Lemma 5.2 was essentially a case of distinguishing a 2-SBM from an ER model, where the latter is also a special case of a 2-SBM. Hence, the same proof works in the present case as well, and the claimed lower bound for ρ*_op holds.
The lower bound for ρ*_F also follows from the same construction by computing the Frobenius norm distance between the choice of population adjacency matrices used in the proof of Lemma 5.2.
Proposition 6.5 may have important consequences, since it suggests that for any model that is more complex than the simple 2-SBM, the two-sample problem is as difficult as the general setting of IER graphs. However, a more in-depth study may be required before a strong claim can be made in this direction. For instance, one should take into account the fact that the broad literature on graph clustering and stochastic block models often requires an additional assumption of balanced community sizes. It would be interesting to understand whether the presence of balanced communities also simplifies the detection/testing problems.
In a broader context, one should note that Propositions 6.4 and 6.5 only provide minimax rates for the Frobenius and operator norm distances for these special classes of IER graphs. On the other hand, Propositions 3.1 and 3.2 demonstrate the limitations of distribution-based distances when no restriction is imposed on the IER model. Hence, there is a possibility that, under special models, total variation and KL-divergence based tests are as useful as, or even better than, Frobenius or operator norm based tests. Insights into these questions, along with minimax rates for related models such as the k-SBM and random dot product graphs, would provide a clearer understanding of the problem of testing graphs on a common set of vertices.
6.4. Extensions. Several extensions of the two-sample problem (2)–(3) can be studied. Earlier in this section, we discussed the possibility of considering alternative notions of sparsity. Another interesting, and practically significant, extension is to the case of directed graphs. The problem naturally extends to this framework, and the proposed adaptive tests easily handle this generalisation without any critical modification. For instance, in the case of the Frobenius norm test, one merely needs to define μ and σ as summations over all off-diagonal terms, and the thresholds change only by constant factors. The analysis of such tests, as well as the minimax lower bounds, can be easily derived from our proofs. The same conclusion holds for the case of operator norm separation, particularly when the upper bounds are derived from the operator norm test and the matrix Bernstein inequality.
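As a concrete instance of the directed-graph extension, the sketch below changes only the summation range of the pooled edge-density estimate from the upper triangle to all off-diagonal entries, with the normalisation adjusted accordingly; as noted above, the corresponding thresholds would change only by constant factors. The function name is ours.

```python
import numpy as np

def directed_edge_density(graphs):
    """Pooled edge-density estimate for directed graphs: sum over all
    off-diagonal entries of each adjacency matrix, normalised by n(n-1)."""
    m = len(graphs)
    n = graphs[0].shape[0]
    total = sum(A.sum() - np.trace(A) for A in graphs)
    return total / (m * n * (n - 1))
```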
In this paper, we only consider the problem of identity testing, that is, testing P = Q against d(P,Q) > ρ. One may also study the more general problem of closeness testing, which ignores small differences between the models; that is, one tests between the hypotheses

H_0 : d(P,Q) ≤ ε and P_max ∨ Q_max ≤ δ,   and   H_1 : d(P,Q) > ρ and P_max ∨ Q_max ≤ δ

for some pre-specified ε < ρ. The proposed tests, which are primarily based on the principle of estimating d(P,Q), may be easily adapted to this setting by appropriately modifying the test thresholds. However, it is not clear whether the minimax separation bounds in Theorems 4.1 and 5.1 easily extend to this setting as well.
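For example, in the ER setting of Proposition 6.4, the edge-density test sketched earlier becomes a closeness test by shifting the rejection threshold by the tolerance. In the sketch below the tolerance eps is expressed directly as an edge-probability difference, which is our simplification, and the constant c1 is illustrative.

```python
import numpy as np

def er_closeness_test(graphs_G, graphs_H, delta, eps, c1=1.0):
    """Closeness-testing variant of the edge-density comparison test:
    edge-probability differences up to eps are tolerated under the null,
    so the rejection threshold is simply shifted by eps."""
    m = len(graphs_G)
    n = graphs_G[0].shape[0]
    norm = m * n * (n - 1) / 2.0
    p_hat = sum(np.triu(A, k=1).sum() for A in graphs_G) / norm
    q_hat = sum(np.triu(A, k=1).sum() for A in graphs_H) / norm
    t1 = c1 * np.sqrt(delta / m) / n   # estimation-error allowance
    return int(abs(p_hat - q_hat) > eps + t1)
```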
From a practical perspective, one may face a more general two-sample graph testing problem, where the graphs are not defined on a common set of vertices and may even be of different sizes. This situation is generally hard to study, but tests for this problem are often used in applications, where one typically computes some scalar or vector function from each graph and assesses the difference between the two graph populations based on this function (Stam et al. (2007)). We study this principle in a recent work (Ghoshdastidar et al. (2017)), and propose a formal framework for testing between any two random graphs by means of a network function f : G_n → M that maps the space of graphs on at least n vertices to some metric space M. We argue that if the network function concentrates for some sub-class of random graphs as n → ∞, then one can indeed construct two-sample tests based on the network function. However, such a test cannot distinguish models within an equivalence class, that is, random graph models that behave identically under the mapping f.
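To illustrate the network-function principle for graphs that need not share a vertex set, the sketch below uses triangle density as an example of f and compares the two populations through the sample means of f. The choice of f and of the comparison rule are our illustrative assumptions, not the construction analysed in Ghoshdastidar et al. (2017).

```python
import numpy as np

def triangle_density(A):
    """Example network function f: the triangle density of an undirected
    graph given by its 0/1 adjacency matrix (any number of vertices >= 3)."""
    n = A.shape[0]
    n_triangles = np.trace(A @ A @ A) / 6.0
    return n_triangles / (n * (n - 1) * (n - 2) / 6.0)

def network_function_test(graphs_G, graphs_H, f=triangle_density, t=0.05):
    """Compare two graph populations through a scalar network function f.

    Rejects when the sample means of f over the two populations differ by
    more than a heuristic threshold t. As noted in the text, such a test
    cannot separate models that f maps to the same value."""
    fg = np.array([f(A) for A in graphs_G])
    fh = np.array([f(A) for A in graphs_H])
    return int(abs(fg.mean() - fh.mean()) > t)
```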
Acknowledgements. We are grateful to the Associate Editor and the anonymous review-
ers for their comments and suggestions, which have greatly improved the technical content
and the exposition of the paper.
The first author was supported in part by the Deutsche Forschungsgemeinschaft (DFG,
FOR1735), the Institutional Strategy of the University of Tübingen (DFG, ZUK 63) and by
the Baden-Württemberg Stiftung through the Eliteprogramm for Postdocs.
The second author was supported by the Deutsche Forschungsgemeinschaft (DFG) Emmy
Noether grant MuSyAD (CA 1488/1-1).
The third author was supported in part by the DFG Emmy Noether grant MuSyAD
(CA 1488/1-1), by the DFG-314838170, GRK 2297 MathCoRe, by the DFG GRK 2433
DAEDALUS (384950143/GRK2433), by the DFG CRC 1294 “Data Assimilation”, Project
A03 and by the UFA-DFH through the French–German Doktorandenkolleg CDFA 01-18.
The fourth author was supported in part by the DFG FOR1735, the Institutional Strategy
of the University of Tübingen (DFG, ZUK 63) and DFG Cluster of Excellence “Machine
Learning—New Perspectives for Science” EXC 2064/1, project number 390727645.
SUPPLEMENTARY MATERIAL
Supplement: Appendix to “Two-sample hypothesis testing for inhomogeneous ran-
dom graphs” (DOI: 10.1214/19-AOS1884SUPP; .pdf). The supplementary material
(Ghoshdastidar et al. (2020)) contains the detailed proofs of the lemmas and the proposi-
tions stated in Sections 3, 4 and 5.
REFERENCES
ABBE, E. and SANDON, C. (2016). Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation. In Advances in Neural Information Processing Systems 29.
ALBERT, R. and BARABÁSI, A.-L. (2002). Statistical mechanics of complex networks. Rev. Modern Phys. 74 47–97. MR1895096 https://doi.org/10.1103/RevModPhys.74.47
ARIAS-CASTRO, E., BUBECK, S. and LUGOSI, G. (2015). Detecting positive correlations in a multivariate sample. Bernoulli 21 209–241. MR3322317 https://doi.org/10.3150/13-BEJ565
ARIAS-CASTRO, E. and VERZELEN, N. (2014). Community detection in dense random networks. Ann. Statist. 42 940–969. MR3210992 https://doi.org/10.1214/14-AOS1208
BAI, Z. and SARANADASA, H. (1996). Effect of high dimension: By an example of a two sample problem. Statist. Sinica 6 311–329. MR1399305
BANDEIRA, A. S. and VAN HANDEL, R. (2016). Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Ann. Probab. 44 2479–2506. MR3531673 https://doi.org/10.1214/15-AOP1025
BARAUD, Y. (2002). Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8 577–606. MR1935648
BERGER, N., BORGS, C., CHAYES, J. T. and SABERI, A. (2005). On the spread of viruses on the Internet. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms 301–310. ACM, New York. MR2298278
BERTHET, Q. and RIGOLLET, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815. MR3127849 https://doi.org/10.1214/13-AOS1127
BICKEL, P. J. and SARKAR, P. (2016). Hypothesis testing for automated community detection in networks. J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 253–273. MR3453655 https://doi.org/10.1111/rssb.12117
BOLLOBÁS, B., JANSON, S. and RIORDAN, O. (2007). The phase transition in inhomogeneous random graphs. Random Structures Algorithms 31 3–122. MR2337396 https://doi.org/10.1002/rsa.20168
BRESLER, G. and NAGARAJ, D. (2018). Optimal single sample tests for structured versus unstructured network data. In Conference on Learning Theory. PMLR 75 1657–1690.
BUBECK, S., DING, J., ELDAN, R. and RÁCZ, M. Z. (2016). Testing for high-dimensional geometry in random graphs. Random Structures Algorithms 49 503–532. MR3545825 https://doi.org/10.1002/rsa.20633
CAI, T. T., LIU, W. and XIA, Y. (2014). Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 349–372. MR3164870 https://doi.org/10.1111/rssb.12034
CARPENTIER, A. and NICKL, R. (2015). On signal detection and confidence sets for low rank inference problems. Electron. J. Stat. 9 2675–2688. MR3432430 https://doi.org/10.1214/15-EJS1087
CHAN, S.-O., DIAKONIKOLAS, I., VALIANT, G. and VALIANT, P. (2014). Optimal algorithms for testing closeness of discrete distributions. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms 1193–1203. ACM, New York. MR3376448 https://doi.org/10.1137/1.9781611973402.88
CHEN, S. X. and QIN, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38 808–835. MR2604697 https://doi.org/10.1214/09-AOS716
DASKALAKIS, C., DIKKALA, N. and KAMATH, G. (2018). Testing Ising models. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms 1989–2007. SIAM, Philadelphia, PA. MR3775918 https://doi.org/10.1137/1.9781611975031.130
DECELLE, A., KRZAKALA, F., MOORE, C. and ZDEBOROVÁ, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84 066106.
DONOHO, D. and JIN, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994. MR2065195 https://doi.org/10.1214/009053604000000265
DONOHO, D. and JIN, J. (2015). Higher criticism for large-scale inference, especially for rare and weak effects. Statist. Sci. 30 1–25. MR3317751 https://doi.org/10.1214/14-STS506
GAO, C. and LAFFERTY, J. (2017). Testing network structure using relations between small subgraph probabilities. Preprint. Available at arXiv:1704.06742.
GHOSHDASTIDAR, D. and VON LUXBURG, U. (2018). Practical methods for graph two-sample testing. In Advances in Neural Information Processing Systems 31.
GHOSHDASTIDAR, D., GUTZEIT, M., CARPENTIER, A. and VON LUXBURG, U. (2017). Two-sample tests for large random graphs using network statistics. In Conference on Learning Theory. PMLR 65 954–977.
GHOSHDASTIDAR, D., GUTZEIT, M., CARPENTIER, A. and VON LUXBURG, U. (2020). Supplement to "Two-sample hypothesis testing for inhomogeneous random graphs." https://doi.org/10.1214/19-AOS1884SUPP
GINESTET, C. E., LI, J., BALACHANDRAN, P., ROSENBERG, S. and KOLACZYK, E. D. (2017). Hypothesis testing for network data in functional neuroimaging. Ann. Appl. Stat. 11 725–750. MR3693544 https://doi.org/10.1214/16-AOAS1015
GRETTON, A., BORGWARDT, K. M., RASCH, M. J., SCHÖLKOPF, B. and SMOLA, A. (2012). A kernel two-sample test. J. Mach. Learn. Res. 13 723–773. MR2913716
HYDUKE, D. R., LEWIS, N. E. and PALSSON, B. (2013). Analysis of omics data with genome-scale models of metabolism. Mol. BioSyst. 9 167–174.
INGSTER, YU. I. and SUSLINA, I. A. (2003). Nonparametric Goodness-of-Fit Testing Under Gaussian Models. Lecture Notes in Statistics 169. Springer, New York. MR1991446 https://doi.org/10.1007/978-0-387-21580-8
KLOPP, O., TSYBAKOV, A. B. and VERZELEN, N. (2017). Oracle inequalities for network models and sparse graphon estimation. Ann. Statist. 45 316–354. MR3611494 https://doi.org/10.1214/16-AOS1454
KONDOR, R. and PAN, H. (2016). The multiscale Laplacian graph kernel. In Advances in Neural Information Processing Systems.
LANDMAN, B. A., HUANG, A. J., GIFFORD, A., VIKRAM, D. S., LIM, I. A., FARRELL, J. A., BOGOVIC, J. A., HUA, J., CHEN, M. et al. (2011). Multi-parametric neuroimaging reproducibility: A 3-T resource study. NeuroImage 54 2854–2866.
LE, C. M., LEVINA, E. and VERSHYNIN, R. (2017). Concentration and regularization of random graphs. Random Structures Algorithms 51 538–561. MR3689343 https://doi.org/10.1002/rsa.20713
LEDOIT, O. and WOLF, M. (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann. Statist. 30 1081–1102. MR1926169 https://doi.org/10.1214/aos/1031689018
LEI, J. (2016). A goodness-of-fit test for stochastic block models. Ann. Statist. 44 401–424. MR3449773 https://doi.org/10.1214/15-AOS1370
LOVÁSZ, L. (2012). Large Networks and Graph Limits. American Mathematical Society Colloquium Publications 60. Am. Math. Soc., Providence, RI. MR3012035 https://doi.org/10.1090/coll/060
LU, L. and PENG, X. (2013). Spectra of edge-independent random graphs. Electron. J. Combin. 20 Paper 27. MR3158266
MUKHERJEE, R., PILLAI, N. S. and LIN, X. (2015). Hypothesis testing for high-dimensional sparse binary regression. Ann. Statist. 43 352–381. MR3311863 https://doi.org/10.1214/14-AOS1279
OLIVEIRA, R. I. (2009). Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Preprint. Available at arXiv:0911.0600.
RAMDAS, A., REDDI, S., POCZOS, B., SINGH, A. and WASSERMAN, L. (2015). Adaptivity and computation-statistics tradeoffs for kernel and distance based high dimensional two sample testing. Preprint. Available at arXiv:1508.00655.
RYABKO, D. (2017). Hypotheses testing on infinite random graphs. In International Conference on Algorithmic Learning Theory. PMLR 76 1–12. MR3776703
SHERVASHIDZE, N., SCHWEITZER, P., VAN LEEUWEN, E. J., MEHLHORN, K. and BORGWARDT, K. M. (2011). Weisfeiler–Lehman graph kernels. J. Mach. Learn. Res. 12 2539–2561. MR2845672
STAM, C. J., JONES, B. F., NOLTE, G., BREAKSPEAR, M. and SCHELTENS, P. (2007). Small-world networks and functional connectivity in Alzheimer's disease. Cereb. Cortex 17 92–99.
SUN, X. and NOBEL, A. B. (2008). On the size and recovery of submatrices of ones in a random binary matrix. J. Mach. Learn. Res. 9 2431–2453. MR2460888
TANG, M., ATHREYA, A., SUSSMAN, D. L., LYZINSKI, V., PARK, Y. and PRIEBE, C. E. (2017a). A semiparametric two-sample hypothesis testing problem for random graphs. J. Comput. Graph. Statist. 26 344–354. MR3640191 https://doi.org/10.1080/10618600.2016.1193505
TANG, M., ATHREYA, A., SUSSMAN, D. L., LYZINSKI, V. and PRIEBE, C. E. (2017b). A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli 23 1599–1630. MR3624872 https://doi.org/10.3150/15-BEJ789
TROPP, J. A. (2012). User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12 389–434. MR2946459 https://doi.org/10.1007/s10208-011-9099-z
VERZELEN, N. and ARIAS-CASTRO, E. (2015). Community detection in sparse random networks. Ann. Appl. Probab. 25 3465–3510. MR3404642 https://doi.org/10.1214/14-AAP1080
VERZELEN, N. and ARIAS-CASTRO, E. (2017). Detection and feature selection in sparse mixture models. Ann. Statist. 45 1920–1950. MR3718157 https://doi.org/10.1214/16-AOS1513
VISHWANATHAN, S. V. N., SCHRAUDOLPH, N. N., KONDOR, R. and BORGWARDT, K. M. (2010). Graph kernels. J. Mach. Learn. Res. 11 1201–1242. MR2645450 https://doi.org/10.1093/chemse/bjq147