The Annals of Statistics
2020, Vol. 48, No. 4, 2208–2229
https://doi.org/10.1214/19-AOS1884
© Institute of Mathematical Statistics, 2020
TWO-SAMPLE HYPOTHESIS TESTING FOR INHOMOGENEOUS
RANDOM GRAPHS
BY DEBARGHYA GHOSHDASTIDAR 1,*, MAURILIO GUTZEIT 2, ALEXANDRA CARPENTIER 2 AND ULRIKE VON LUXBURG 1,**

1 Department of Computer Science, University of Tübingen
2 Faculty of Mathematics, Otto von Guericke University Magdeburg
The study of networks leads to a wide range of high-dimensional inference problems. In many practical applications, one needs to draw inference from one or few large sparse networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime, where the goal is to test between two populations of inhomogeneous random graphs defined on the same set of n vertices. The size of each population m is much smaller than n, and can even be a constant as small as 1. The critical question in this context is whether the problem is solvable for small m.
We answer this question from a minimax testing perspective. Let P, Q be the population adjacencies of two sparse inhomogeneous random graph models, and d be a suitably defined distance function. Given a population of m graphs from each model, we derive minimax separation rates for the problem of testing P = Q against d(P,Q) > ρ. We observe that if m is small, then the minimax separation is too large for some popular choices of d, including total variation distance between corresponding distributions. This implies that some models that are widely separated in d cannot be distinguished for small m, and hence, the testing problem is generally not solvable in these cases.
We also show that if m > 1, then the minimax separation is relatively small if d is the Frobenius norm or operator norm distance between P and Q. For m = 1, only the latter distance provides a small minimax separation. Thus, for these distances, the problem is solvable for small m. We also present near-optimal two-sample tests in both cases, where the tests are adaptive with respect to the sparsity level of the graphs.
1. Introduction. Analysis of random graphs has piqued the curiosity of probabilists
since its inception decades ago, but the widespread use of networks in recent times has made
statistical inference from random graphs a topic of immense interest for both theoretical and
applied researchers. This has caused a fruitful interplay between theory and practice lead-
ing to deep understanding of statistical problems that, in turn, has led to advancements in
applied research. Significant progress is clearly visible in problems related to network mod-
elling (Albert and Barabási (2002), Lovász (2012)), community detection (Abbe and Sandon
(2016), Decelle et al. (2011)), network dynamics (Berger et al. (2005)) among others, where
statistically guaranteed methods have emerged as effective practical solutions. Quite surpris-
ingly, the classical problem of hypothesis testing of random graphs is yet to benefit from such
joint efforts from theoretical and applied researchers. It should be noted that the problem itself is actively studied in both communities. Testing between brain or “omics” networks has surfaced as a crucial challenge in the context of both modelling and decision making (Ginestet
et al. (2017), Hyduke, Lewis and Palsson (2013)). On the other hand, phase transitions are
Received June 2018; revised February 2019.
MSC2020 subject classifications. Primary 62H15; secondary 62C20, 05C80, 60B20.
Key words and phrases. Two-sample test, inhomogeneous Erdős–Rényi model, minimax testing.
now known for the problems of detecting high-dimensional geometry or strongly connected
groups in large random graphs (Arias-Castro and Verzelen (2014), Bubeck et al. (2016)).
However, little progress has been made in the design of consistent tests for general models of
random graphs. The present paper takes a step towards addressing this general concern.
While research on testing large random graphs has been limited, hypothesis testing in
large dimension is an integral part of modern statistics literature. In fact, Bai and Saranadasa
(1996) demonstrated the need for studying high-dimensional statistics through a two-sample
testing problem, where one tests between two n-variate normal distributions with different
means by accessing m i.i.d. observations from either distribution, where m ≪ n (we de-
note the dimension by n since, in the context of graphs, the number of vertices n governs
the dimensionality of the problem). More recent works in this direction provide tests and
asymptotic guarantees as m → ∞ and n/m → ∞ (Cai, Liu and Xia (2014), Chen and Qin (2010), Ramdas et al. (2015)). Similar studies also exist in the context of testing whether an n-dimensional covariance matrix is the identity, where only m ≪ n i.i.d. observations of the data are
available (Arias-Castro, Bubeck and Lugosi (2015), Berthet and Rigollet (2013), Ledoit and
Wolf (2002)). It is also known that the computational complexity of this problem is closely
related to the clique detection problem in random graphs (Berthet and Rigollet (2013)). An
extreme version of high-dimensional testing arises in the signal detection literature, where
one observes a high-dimensional (Gaussian) signal, and tests the nullity of its mean. Subse-
quently, asymptotics of the problem are considered for n → ∞ but fixed sample size (m = 1
or a constant), and minimax separation rates between the null and the alternative hypotheses
are derived (Baraud (2002), Ingster and Suslina (2003), Mukherjee, Pillai and Lin (2015),
Verzelen and Arias-Castro (2017)). The signal detection problem has also been extended in
the case of matrices in the context of trace regression (Carpentier and Nickl (2015)) and
sub-matrix detection (Sun and Nobel (2008)) among others, where the latter generalises the
planted clique detection problem.
In practice, the problem of testing random graphs comes in a wide range of flavours. For
instance, while dealing with graphs associated with chemical compounds (Shervashidze et al.
(2011)) or brain networks of several patients collected at multiple laboratories (Ginestet et al.
(2017)), one has access to a large number of graphs (large m). This scenario is more amenable
as one can resort to the vast literature of nonparametric hypothesis testing that can even be ap-
plied to random graphs. A direct approach to this problem is to use kernel based tests (Gretton
et al. (2012)) in conjunction with graph kernels (Kondor and Pan (2016), Vishwanathan et al.
(2010)), which do not require any structural assumptions on the network models. However, known guarantees for such tests depend crucially on the sample size, and one cannot conclude
about the fidelity of such tests for very small m. A more challenging situation arises when
m is small, and unfortunately, this is often the case in network analysis. For example, the graphs may correspond to brain networks collected from patients in a single lab setup (m < 20),
brain networks of one individual obtained from test-retest MRI scans (Landman et al. (2010)),
or molecular interaction networks arising from genomic or proteomic data (Hyduke, Lewis
and Palsson (2013)). Test-retest data of a patient provide only m = 2 networks, while omics
data typically result in one large interaction network, that is, m = 1. Hence, designing two-sample tests for small populations of graphs is a problem of immense significance, and yet,
practical tests with statistical guarantees are rather limited.
From a theoretical perspective, only a handful of results on testing large random graphs
are known. While finding hidden cliques has been a long-standing open problem, Arias-Castro and Verzelen (2014) and Verzelen and Arias-Castro (2015) for the first time provide a characterisation of the more basic problem of detecting a planted clique in an Erdős–Rényi graph. Gao and Lafferty (2017) and Lei (2016) consider generalised variants of this problem
while designing tests to distinguish a stochastic block model from an Erdős–Rényi graph, or
to estimate the number of communities in a stochastic block model, respectively. In a dif-
ferent direction, Bubeck et al. (2016) study the classical problem of testing whether a given
graph corresponds to a neighbourhood graph in a high-dimensional space, or is generated from an Erdős–Rényi model. This result is in fact a specific instance of the generic problem of detecting whether network data has a dependence structure or is unstructured (Bresler and
Nagaraj (2018), Daskalakis, Dikkala and Kamath (2018), Ryabko (2017)). The first study on
two-sample testing of graphs under a relatively broad framework is by Tang et al. (2017a),
where the authors test between a pair of random dot product graphs which are undirected
graphs on a common set of n vertices with mutually independent edges and low-rank popu-
lation adjacency matrices. A test statistic based on the difference in adjacency spectral em-
beddings is shown to be asymptotically consistent as n →∞provided that the rank of the
population adjacencies is fixed and known. Tang et al. (2017b) study a more general problem
of comparing two random dot product graphs defined on different vertex sets, where the ver-
tices have latent Euclidean representations. The latent representation can be recovered from
the adjacency spectral embedding, and kernel two-sample testing for Euclidean data can be
employed to solve the testing problem. Ghoshdastidar et al. (2017) provide a framework to
formulate the graph two-sample problem with minimal structural restrictions, and show that
this general framework can be used to prove minimax optimality of tests based on triangle
counts and spectral properties under special cases.
In this paper, we restrict ourselves to graphs on a common vertex set of size n and sampled
from an inhomogeneous Erdős–Rényi (IER) model (Bollobás, Janson and Riordan (2007)),
that is, we consider undirected and unweighted random graphs where the edges occur in-
dependently, but there is no structural assumption on the population adjacency matrix. We
study the problem of testing between two IER models, where m i.i.d. graphs are observed for
each model. At first glance, allowing m ≥ 1 appears to be only a slight generalisation of the m = 1 case (Tang et al. (2017a)), but we show that in some situations, the testing problem behaves differently in the m = 1 and m > 1 cases. It is also well established that many graph learn-
ing problems have different behaviour in the case of dense and sparse graphs. This is indeed
true in the context of testing for geometric structures (Bubeck et al. (2016)) and community
detection (Verzelen and Arias-Castro (2015)). Bearing this in mind, we study the two-sample
problem at different levels of sparsity of the graph. A formal description of the problem is
presented in Section 2.
Given the above framework, one may resort to a variety of testing procedures. A clas-
sical approach involves viewing the problem as an instance of closeness testing for high-
dimensional discrete distributions (Chan et al. (2014), Daskalakis, Dikkala and Kamath
(2018)) and using variants of the χ²-test or related localisation procedures. On the other hand, exploiting the independence of edges, one may even view the problem as an instance of
multiple testing, and may resort to tests based on higher criticism (Donoho and Jin (2004),
Donoho and Jin (2015)). A more direct approach may be to simply compare the adjacency
spectral embeddings (Tang et al. (2017a)), or other network statistics (Ghoshdastidar et al.
(2017)) or even the raw adjacency matrices. Sections 3–5 present a variety of testing prob-
lems, which differ in terms of the distance d(P,Q), that is, how we quantify the separation
between the two models. We show that some of the above principles are not useful for small
m since the associated testing problems are generally unsolvable in these cases. However, in
some cases, one can construct uniformly consistent tests that work with a small number of
observations, even m = 1. Section 6 discusses the practicality of graph two-sample testing and
also presents minimax separation under special cases of the IER model, such as Erdős–Rényi
or stochastic block models, or under different notions of sparsity in graphs. The detailed
proofs are provided in the Supplementary Material (Ghoshdastidar et al. (2020)).
2. Problem statement. In this section, we formally state the generic two-sample graph
testing problem studied in this paper. We also present the minimax framework that forms the
basis of our theoretical analysis.
We use the notation ≲ and ≳ to denote the standard inequalities ≤ and ≥ up to absolute constants. Further, we use ≍ to denote that two quantities are the same up to a possible difference in constant scaling. We use ∧ and ∨ to denote minimum and maximum, respectively. We also need several standard norms and distances. For two discrete distributions,
we denote the total variation distance by TV(·, ·) and the symmetric Kullback–Leibler (KL)
divergence by SKL(·, ·). The latter is a symmetrised version of KL-divergence (Daskalakis,
Dikkala and Kamath (2018)). We use the following quantities for any matrix:
(i) Frobenius norm, ‖·‖_F, is the root of the sum of squares of all entries,
(ii) max norm, ‖·‖_max, is the largest absolute entry of the matrix,
(iii) zero norm, ‖·‖_0, is the number of nonzero entries,
(iv) operator norm, ‖·‖_op, is the largest singular value of the matrix, and
(v) row sum norm, ‖·‖_row (or the induced ℓ∞-norm), is the maximum absolute row sum of the matrix.
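For concreteness, these matrix norms can be computed directly; the following is a minimal numpy sketch (an illustration only, not part of the original analysis):

```python
import numpy as np

def matrix_norms(M):
    """Illustrative computation of the norms used in this paper (a sketch)."""
    return {
        "frobenius": np.linalg.norm(M, "fro"),       # root of sum of squared entries
        "max":       np.max(np.abs(M)),              # largest absolute entry
        "zero":      np.count_nonzero(M),            # number of nonzero entries
        "operator":  np.linalg.norm(M, 2),           # largest singular value
        "row_sum":   np.max(np.abs(M).sum(axis=1)),  # induced l-infinity norm
    }
```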
2.1. The model and the testing problem. Throughout the paper, V = {1, 2, ..., n} denotes a set of n vertices, and we consider undirected graphs defined on V. Any such graph can be expressed as G = (V, E_G), where E_G is the set of undirected edges. We use the symmetric matrix A_G ∈ {0, 1}^{n×n} to denote the adjacency matrix of G, where (A_G)_{ij} = 1 if (i, j) ∈ E_G, and 0 otherwise. The class of inhomogeneous random graphs, or more precisely inhomogeneous Erdős–Rényi (IER) graphs, on V can be described as follows. Let M_n ⊂ [0, 1]^{n×n} be the set of symmetric matrices with zero diagonal and off-diagonal entries in [0, 1]. For any P ∈ M_n, we say that G is an IER graph with population adjacency P, denoted by G ∼ IER(P), if the adjacency matrix A_G is a symmetric random matrix such that (A_G)_{ij} ∼ Bernoulli(P_{ij}), and ((A_G)_{ij})_{1≤i<j≤n} are independent.
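As an illustration of the model, here is a minimal sketch for sampling a graph from IER(P); the helper name sample_ier is ours and not from the paper:

```python
import numpy as np

def sample_ier(P, rng=None):
    """Draw one adjacency matrix A_G from IER(P).

    P: symmetric (n x n) matrix with zero diagonal and entries in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    n = P.shape[0]
    # sample the upper triangle independently, then symmetrise
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)
```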
Let P, Q ∈ M_n. Given m independent observations from each of IER(P) and IER(Q), we would like to test between the alternatives

(1)    H_0 : P = Q    and    H_1 : d(P,Q) > ρ

for some specified distance function d and a threshold ρ ≥ 0. At this stage, note that the distribution IER(P) is completely characterised by the expected adjacency matrix P. Hence, H_0 in (1) is similar under both the mean difference alternative and the general difference alternative (Ramdas et al. (2015)). Thus, one may assume d to be either a distance between the distributions IER(P) and IER(Q), or a matrix distance between P and Q. Different examples of d are considered in Sections 3–5, which result in specific instances of the testing problems.
We note that the complexity of graph inference problems is often governed by the sparsity of the graphs. To take the effect of sparsity into account, we restrict the problem to models such that ‖P‖_max ∨ ‖Q‖_max ≤ δ for some δ ∈ (0, 1], where δ may decay with m, n. Intuitively, we consider only graphs that are uniformly sparse, that is, any edge can occur with probability at most δ. For instance, if δ ≲ 1/n, we mostly observe sparse graphs with bounded expected degrees. Such a uniform sparsity restriction is along the lines of the scalar sparsity parameter introduced in some graph estimation problems (Klopp, Tsybakov and Verzelen (2017)). More general notions of sparsity may be considered, as discussed later in Section 6.
Based on the above considerations, we formally state the following general framework for
graph two-sample testing:
(2)    H_0 : (P, Q) ∈ Θ_0    vs.    H_1 : (P, Q) ∈ Θ_1,
where

(3)    Θ_0 = {(P, Q) ∈ M_n × M_n : P = Q, ‖P‖_max ≤ δ},
       Θ_1 = {(P, Q) ∈ M_n × M_n : d(P,Q) > ρ, ‖P‖_max ∨ ‖Q‖_max ≤ δ}.
Note that the hypotheses are governed by the distance d, the integers m, n, and the positive
scalars ρ, δ, where the last two terms may depend on m, n.
2.2. Minimax framework. Given the graphs G_1, ..., G_m drawn i.i.d. from IER(P) and H_1, ..., H_m drawn i.i.d. from IER(Q), a test Ψ is a binary function of the 2m adjacency matrices, where Ψ = 0 when the test accepts H_0, and Ψ = 1 otherwise. The maximum or worst-case risk of a test is given by

R(Ψ, n, m, d, ρ, δ) = sup_{θ ∈ Θ_0} P_θ(Ψ = 1) + sup_{θ ∈ Θ_1} P_θ(Ψ = 0),

which is the sum of the maximum possible Type-I and Type-II error rates incurred by the test. Here, we use θ to denote any tuple (P, Q). The minimax risk for the problem in (2)–(3) is defined as

(4)    R*(n, m, d) = inf_Ψ R(Ψ, n, m, d, ρ, δ).
Our aim in this paper is to find the minimax separation ρ* for a given problem, which is the smallest possible ρ such that R*(n, m, d) ≤ η for some prespecified η ∈ (0, 1). In the subsequent sections, we consider testing problems where the separation between P and Q is defined in terms of various distance functions. We provide bounds for ρ* for the different testing problems in terms of the various parameters of the problem (2)–(3). Though our formal bounds are explicit in terms of η, we generally assume η to be a prespecified constant (e.g., η = 0.05) and focus on the dependence of ρ* on n, m and δ. Our aim is to provide upper and lower bounds for ρ* that are the same up to differences in absolute constants and functions of η.
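For intuition, the risk of a fixed test at one particular pair (P, Q) can be estimated by simulation; the sketch below (assuming the hypothetical sample_ier helper above, and a test taking two lists of adjacency matrices) approximates the sum of Type-I and Type-II error rates at a single θ, which lower-bounds the worst-case risk R defined above:

```python
import numpy as np

def empirical_risk(test, P, Q, m, n_trials=500, rng=None):
    """Monte Carlo estimate of P_theta(test=1) under H0 plus P_theta(test=0)
    under H1 for theta = (P, Q); a pointwise proxy for the worst-case risk R."""
    rng = np.random.default_rng() if rng is None else rng
    type_1 = np.mean([test([sample_ier(P, rng) for _ in range(m)],
                           [sample_ier(P, rng) for _ in range(m)])
                      for _ in range(n_trials)])
    type_2 = np.mean([1 - test([sample_ier(P, rng) for _ in range(m)],
                               [sample_ier(Q, rng) for _ in range(m)])
                      for _ in range(n_trials)])
    return type_1 + type_2
```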
3. Challenges of testing with small sample size. The theme of this paper is to test
between two populations of sparse graphs, where the sample size m is much smaller than the
number of vertices n. Our main interest is in cases where m is a small constant, or may grow
very slowly with n. In this section, we show that in case of some popular distance functions,
the testing problem is nearly unsolvable if m is small.
We formalise the notion of unsolvability in the following way. For any instance of the two-
sample testing problem (2)–(3), there is a trivial upper bound for ρ*, which is the maximal possible value that can be attained by d(P,Q) (the diameter with respect to d). As an example, if d(P,Q) = TV(IER(P), IER(Q)), then ρ* ≤ 1 trivially. Similarly, if d(P,Q) = ‖P − Q‖_F, a trivial upper bound is ρ* ≤ nδ. On the other hand, for small m, if there is a lower bound on ρ* that is equal or close to the trivial upper bound, then there exist model pairs such that d(P,Q) is nearly as large as the diameter and yet cannot be distinguished for small m. Hence, we may conclude that the problem (2)–(3) with the specific choice of d is unsolvable for small m under a worst-case (minimax) analysis.
We present the first instance of such an impossibility result for the case of total variation
distance. For any two probability mass functions p and q, defined on the space of undirected
n-vertex graphs,
TV(p, q) = (1/2) Σ_G |p(G) − q(G)|,
where the summation is over all unweighted undirected graphs on n vertices. We present the
following minimax rate in the small sample regime.
PROPOSITION 3.1 (ρ* for total variation distance). Consider the problem in (2)–(3) with d(P,Q) = TV(IER(P), IER(Q)) and δ ∈ (0, 1). Let η ∈ (0, 1) be the allowable risk. For any δ ≥ C (ln n)/n²,

1 − 1/n ≤ ρ* ≤ 1    for m ≤ C′ ln(1 + 4(1 − η)²) · n/ln n,

where C ∈ (0, 1) and C′ > 1 are absolute constants. In particular, for large n and m ≲ n/ln n, we have ρ* ≈ 1.
PROOF SKETCH. The main idea is to find an appropriate choice of (P, Q) such that TV(IER(P), IER(Q)) ≥ 1 − 1/n, and yet the models cannot be distinguished using m ≲ n/ln n samples. For this, we use standard approaches to derive minimax lower bounds (Baraud (2002)). This is described in the present context of two-sample testing of IER graphs in Ghoshdastidar et al. (2020).
In particular, for the present proof, the choice of P, Q is the following. Under H_0, we set P = Q such that the model corresponds to an Erdős–Rényi (ER) model with edge probability δ/2. Under H_1, we keep P as before, whereas each entry of Q is chosen independently and uniformly from {δ/2 − γ, δ/2 + γ}. An appropriate choice of γ ≤ δ/2 leads to the result.
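A small sketch of this lower-bound construction (again using the hypothetical sample_ier helper; γ ≤ δ/2 is a free parameter here):

```python
import numpy as np

def tv_lower_bound_pair(n, delta, gamma, rng=None):
    """Construct the (P, Q) pair from the proof sketch: P is an ER model with
    edge probability delta/2, and each off-diagonal entry of Q is drawn
    uniformly from {delta/2 - gamma, delta/2 + gamma} (requires gamma <= delta/2)."""
    rng = np.random.default_rng() if rng is None else rng
    P = np.full((n, n), delta / 2)
    signs = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)
    Q = P + gamma * (signs + signs.T)   # symmetric random perturbation
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(Q, 0.0)
    return P, Q
```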
The stated lower bound shows that at least m ≳ n/ln n samples are needed to test for separation in total variation distance, which is beyond the small sample regime that we are interested in. In fact, in the case of constant m, one can improve the stated bound as 1 − e^{−C″n} ≤ ρ* ≤ 1, where C″ is a constant that depends on m.
An impossibility result also holds in the case of symmetric KL-divergence, which has
been effectively used for high-dimensional discrete distributions, particularly Ising models
(Daskalakis, Dikkala and Kamath (2018)). For any two probability mass functions p and q,
defined on the space of undirected n-vertex graphs, the symmetric KL-divergence is given by
SKL(p, q) = Σ_G [ p(G) ln(p(G)/q(G)) + q(G) ln(q(G)/p(G)) ],
where the summation is over all unweighted undirected graphs on n vertices. Note that the
above distance is unbounded even for finite n since SKL(p, q) = ∞ if there exists G_0 such that p(G_0) = 0 but q(G_0) ≠ 0. We present the following result that demonstrates the impossibility of testing with respect to the symmetric KL-divergence when the sample size m is
small.
PROPOSITION 3.2 (ρ* for symmetric KL-divergence). Let η ∈ (0, 1) and consider the problem in (2)–(3) with d(P,Q) = SKL(IER(P), IER(Q)) and any δ ∈ (0, 1). Then

ρ* = ∞    for m ≤ (2/δ) ln((1 − η)n).
PROOF SKETCH. The basic technique for the proof is similar to Proposition 3.1, but we choose P and Q such that SKL(IER(P), IER(Q)) = ∞, and yet the models are indistinguishable for small m.
To be precise, under H_0 we set P = Q corresponding to an ER model. Under H_1, we set Q to be the same as P except for a randomly chosen entry, for which Q_{ij} = 0 ≠ P_{ij}. This implies that IER(P) and IER(Q) do not have a common support, which leads to SKL(IER(P), IER(Q)) = ∞. However, if the sample size is small (small m) or the graphs are sparse (small δ), then the models cannot be distinguished.
The above result shows that testing for separation in symmetric KL-divergence is impossible for m ≲ ln n samples even when the graphs are dense (δ ≍ 1). However, the situation is worse in the case of sparse graphs. If δ ≍ 1/n, then at least m ≳ n ln n samples are necessary, which is worse than the condition for total variation distance.
Both Propositions 3.1 and 3.2 suggest that achieving a small sample complexity (small m)
could be difficult under general difference alternatives, that is, if d corresponds to distribu-
tional distances. Hence, subsequent discussions focus only on matrix distances. However,
even in this case, the two-sample problem is not necessarily easily solvable for all distances
or dissimilarities.
PROPOSITION 3.3 (ρ* for zero norm/effect rarity). Let η ∈ (0, 1) and consider the problem in (2)–(3) with d(P,Q) = ‖P − Q‖_0 and any δ ∈ (0, 1). Then

ρ* = n(n − 1)    for all m < ∞.

PROOF SKETCH. The proof is straightforward since the entries of P and Q can be arbitrarily close but still be unequal. Hence, the models may not be distinguishable even though ‖P − Q‖_0 = n(n − 1), which is the trivial upper bound.
Proposition 3.3 may be viewed as a trivial extremity of the rare/weak effect studied in
the context of multiple testing (Donoho and Jin (2015)). To put it simply, here we view the
problem as testing P_{ij} = Q_{ij} versus P_{ij} ≠ Q_{ij} for every i < j, and the edge independence in IER graphs leads to a problem of multiple independent comparisons. Proposition 3.3 states that if min_{P_{ij} ≠ Q_{ij}} |P_{ij} − Q_{ij}| is arbitrarily small, that is, the individual effects are arbitrarily weak, then they cannot be detected even when the effects are dense, ‖P − Q‖_0 = n(n − 1).
A more detailed analysis of the rare/weak effect in the sparse Bernoulli setting may be done by imposing a threshold on min_{P_{ij} ≠ Q_{ij}} |P_{ij} − Q_{ij}| that characterises the weakness of the effect. We do not discuss this effect further, and instead, we proceed to other instances of (2)–(3), where the problem can be solved for small m.
4. Testing for separation in Frobenius norm. The previous section focused on impos-
sibility results, where ρ* is typically large (close to the trivial upper bound) when m is small. In this section and the next one, we study two instances of the problem (2)–(3) where tests can be constructed even for small m. Formally, we show that the lower bound for ρ* can be much smaller than the trivial upper bound, and subsequently, we propose two-sample tests to derive nearly matching upper bounds for ρ*.
We first quantify the separation in terms of the Frobenius norm, that is, d(P,Q) = ‖P − Q‖_F. This is equivalent to viewing the adjacencies as (n choose 2)-dimensional Bernoulli vectors, and using a two-sample test for high-dimensional vectors—a well-studied problem in the Gaussian case (Chen and Qin (2010)). We state the following bounds for the minimax separation ρ*.
THEOREM 4.1 (ρ* for Frobenius norm separation). Consider the two-sample problem (2)–(3) with d(P,Q) = ‖P − Q‖_F, any δ ∈ (0, 1) and any η ∈ (0, 1). There exist absolute constants C_1, C_2 ≥ 1 such that:

1. nδ/4 ≤ ρ* ≤ nδ for m = 1,
2. (1/4 ∧ η²τ_η/(8C_1)) nδ ≤ ρ* ≤ nδ for m > 1 and δ ≤ C_1/(η²mn), and
3. (τ_η/8) √(nδ/m) ≤ ρ* ≤ (C_2/η) √(nδ/m) for m > 1 and δ > C_1/(η²mn),

where τ_η = ln(1 + 4(1 − η)²). Hence, assuming the allowable risk η is fixed, we have ρ* ≍ nδ for m = 1 and ρ* ≍ √(nδ/m) for m ≥ 2.
Theorem 4.1 provides a clear characterisation of the minimax separation ρ* (up to factors of η) when the distance between models is in terms of the Frobenius norm. The second and third statements deal with the case of m > 1. In the ultra-sparse regime, that is, δ ≲ 1/(mn), one observes a total of only O(n) edges from the entire population of 2m graphs generated from either model. This information is insufficient for testing equality of models, and hence, it is not surprising that ρ* ≍ nδ, which is the trivial upper bound. On the other hand, when δ ≳ 1/(mn), we find a nontrivial separation rate indicating that the problem is solvable in this case.
The surprising finding of Theorem 4.1 is that ρ* ≍ nδ for m = 1, which informally means
that the problem is not solvable when one observes only m = 1 sample from each model.
This result is significant since it shows that the problem of testing for separation in Frobenius
norm is unsuitable in the setting of comparing between two large networks, for instance, the
case of testing between two omics networks.
4.1. Proof of Theorem 4.1. We provide an outline of the proof of the above result high-
lighting the key technical lemmas. We sketch their proofs here, and the detailed proofs can
be found in Ghoshdastidar et al. (2020). To prove the lower bounds, we have the following
result.
LEMMA 4.2 (Necessary conditions for detecting Frobenius norm separation). For the testing problem (2)–(3) with d(P,Q) = ‖P − Q‖_F and for any η ∈ (0, 1), the minimax risk (4) is at least η if either of the following conditions holds:

(i) ρ < (nδ/4) ∧ (τ_η/8) √(nδ/m),    or    (ii) m = 1 and ρ < nδ/4.
PROOF SKETCH. The proof follows the basic approach of Proposition 3.1, and also uses the same choice of P, Q. We set P = Q corresponding to an ER model with edge probability δ/2 under H_0. For H_1, we set the same P, but each entry of Q is chosen independently and uniformly from {δ/2 − γ, δ/2 + γ}. One can easily see that Q is chosen uniformly from a set of 2^{n(n−1)/2} matrices, but for each choice of Q, we have ‖P − Q‖_F = γ√(n(n − 1)). Hence, a choice of γ ∈ (ρ/(n − 1), δ/2] implies that the pair (P, Q) for every choice of Q lies in Θ_1.
Subsequently, we use the techniques of Baraud (2002) to show that for m ≥ 2 and δ ≳ 1/(mn), there is an appropriate choice γ < δ/2 for which the random choice of (P, Q) ∈ Θ_1 cannot be distinguished from the null case. If δ ≲ 1/(mn), then the same situation occurs even for the choice γ = δ/2. Finally, for m = 1, we observe that the same proof leads to the conclusion that the random choice of (P, Q) ∈ Θ_1 is indistinguishable from the null case for any choice of γ ≤ δ/2. In particular, γ = δ/2 leads to the claim in (ii).
We elaborate on the distinction between the cases m = 1 and m > 1. Consider the former case of m = 1 under H_1, where we have G_1 ∼ IER(P) = ER(δ/2) and H_1 ∼ IER(Q) with Q being chosen randomly as described above. Due to the uniform choice of Q, one can easily verify that the probability of each edge in H_1 is δ/2, which is the same as that of G_1. Hence, although the two graphs are sampled from different generative models with large ‖P − Q‖_F, they are essentially similar due to the random choice of Q. On the other hand, let m = 2, let G_1, G_2 be i.i.d. from ER(δ/2) and H_1, H_2 be i.i.d. from IER(Q) with Q being random as before. Although H_1, H_2 are independent conditioned on the choice of Q, the two graphs are mutually dependent
without the knowledge of Q and hence, the population {H_1, H_2} does not have the same distribution as {G_1, G_2}.
We continue with the proof of Theorem 4.1. The lower bounds in the second and third statements of Theorem 4.1 follow from condition (i) above by accounting for the conditions on δ and noting C_1 ≥ 1 and τ_η ≤ ln 5. For the upper bounds in the first two statements, note that ρ* ≤ nδ trivially holds since ‖P − Q‖_F ≤ n(‖P‖_max ∨ ‖Q‖_max). To derive the upper bound for the third case, we construct the following two-sample test. Let A_{G_1}, ..., A_{G_m} and A_{H_1}, ..., A_{H_m} be the adjacency matrices of the 2m graphs. We define

(5)    μ̂ = Σ_{1≤i<j≤n} ( Σ_{k≤m/2} ((A_{G_k})_{ij} − (A_{H_k})_{ij}) ) ( Σ_{k>m/2} ((A_{G_k})_{ij} − (A_{H_k})_{ij}) ),

(6)    σ̂ = ( Σ_{1≤i<j≤n} ( Σ_{k≤m/2} ((A_{G_k})_{ij} + (A_{H_k})_{ij}) ) ( Σ_{k>m/2} ((A_{G_k})_{ij} + (A_{H_k})_{ij}) ) )^{1/2},

and consider the test

(7)    Ψ_F = 1{ μ̂/σ̂ > t_1/η } · 1{ σ̂ > t_2/η^{3/2} }

for some positive constants t_1, t_2, where 1{·} is the indicator function. A minimal implementation sketch of Ψ_F is given below; the lemma that follows states its guarantee.
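The following is a small numpy sketch of Ψ_F; the thresholds t_1, t_2 are unspecified absolute constants in (7), so the defaults below are placeholders only:

```python
import numpy as np

def frobenius_test(As_G, As_H, eta, t1=1.0, t2=1.0):
    """Sketch of the test Psi_F in (7); As_G, As_H are lists of m adjacency
    matrices each, and t1, t2 are placeholder constants."""
    m = len(As_G)
    iu = np.triu_indices(As_G[0].shape[0], k=1)
    # entrywise differences and sums, split into the two halves of the sample
    diff = np.array([(A - B)[iu] for A, B in zip(As_G, As_H)])
    summ = np.array([(A + B)[iu] for A, B in zip(As_G, As_H)])
    half = m // 2
    mu_hat = np.sum(diff[:half].sum(axis=0) * diff[half:].sum(axis=0))
    sigma_hat = np.sqrt(np.sum(summ[:half].sum(axis=0) * summ[half:].sum(axis=0)))
    if sigma_hat == 0.0 or sigma_hat <= t2 / eta ** 1.5:
        return 0
    return int(mu_hat / sigma_hat > t1 / eta)
```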
LEMMA 4.3 (Sufficient conditions for detecting Frobenius norm separation). Consider the testing problem with d(P,Q) = ‖P − Q‖_F and the test Ψ_F in (7). There exist absolute constants t_1, t_2, C and C′ such that for any η ∈ (0, 1) and δ ∈ (0, 1), if

(8)    m ≥ 2 and ρ ≥ (C/η) √(nδ/m) ∨ C′/(η^{3/2} √m),

then R(Ψ_F, n, m, d) ≤ η.
PROOF SKETCH. The proof is based on concentration statements for μ̂ and σ̂ derived via Chebyshev's inequality using

μ := E[μ̂] = (m²/8) ‖P − Q‖²_F    and    σ² := E[σ̂²] = (m²/8) ‖P + Q‖²_F,

and bounds for Var[μ̂] and Var[σ̂²] in terms of ‖P − Q‖_F and ‖P + Q‖_F.
For the type-I error, that is, under H_0, these concentration bounds lead to large enough choices for t_1 and t_2 such that at least one of the events

{ μ̂/σ̂ > t_1/η }    and    { σ̂ > t_2/η^{3/2} }

has small probability, which bounds the probability of the event {Ψ_F = 1}. On the other
hand, by construction, in order to control the type-II error rate we want to ensure that both the events

{ μ̂/σ̂ ≤ t_1/η }    and    { σ̂ ≤ t_2/η^{3/2} }

have small probability under H_1. Now, the requirement in (8) on ρ guarantees that μ is sufficiently large relative to σ, which allows us to rewrite these two events as large deviation statements of μ̂ and σ̂² from their means, as required.
To conclude the proof of Theorem 4.1, observe that if δ ≥ C_1/(η²mn) for a suitable constant C_1 determined by C and C′, then the √(nδ/m) term in the above result dominates, and we obtain the upper bound in the third statement of Theorem 4.1. Hence the theorem holds.
4.2. Further remarks on Theorem 4.1. As part of the proof of Theorem 4.1, we propose
a test in (7) which provides the nontrivial upper bound in the third statement of Theorem 4.1. Though this bound matches the corresponding lower bound up to factors of η, the difference is rather large with respect to η. This is an artefact of the proof of Lemma 4.3, which is based on Chebyshev's inequality, and its effect can also be seen in the two thresholds t_1/η and t_2/η^{3/2} defined in (7), which can be very high for small η, limiting the practical usefulness of the test. Below, we show that this can be improved using more refined concentration inequalities, but this provides a sufficient condition that is weaker by a factor of ln n.
PROPOSITION 4.4 (Improving dependence on η). Consider the two-sample problem of Theorem 4.1 and assume m ≥ 2. Define the test

(9)    Ψ̃_F = 1{ μ̂/σ̂ > t_1 ln(2/η) √(ln(n/η)) } · 1{ σ̂ > t_2 ln²(2/η) ln(n/η) },

where μ̂, σ̂ are as in (5)–(6). There exist constants t_1, t_2, C and C′ such that for any η ∈ (0, 1) and δ ∈ (0, 1), the test in (9) has a risk at most η whenever

(10)    ρ ≥ C ln(2/η) √((nδ/m) ln(n/η)) ∨ (C′/√m) ln²(2/η) ln(n/η).
PROOF SKETCH. The proof is very close in spirit to that of Lemma 4.3 above, but it is based on a much more involved concentration statement than Chebyshev's inequality (stated in the Supplementary Material). As a result, the test is based on the same test statistic μ̂/σ̂ and differs from (7) only in the choice of thresholds.
More specifically, due to the fact that μ̂ and σ̂² are sums of products of sums with strong independence properties, we can derive concentration inequalities with logarithmic dependence on η for them by repeated application of Bernstein's inequality. However, this comes with the additional ln n factor which can be seen in equation (10) above.
A key feature of both tests is that they are adaptive, that is, they do not require specification
of the sparsity parameter δ. We highlight the importance of this property in the following
remark.
REMARK 4.5 (Adaptivity of proposed tests). The testing problem in (2)–(3) is defined with respect to the sparsity parameter δ, which in turn governs the minimax separation rate ρ*. It is not hard to convince oneself that for any P, Q, it is impossible to estimate ‖P‖_max ∨ ‖Q‖_max from few observations (small m), and setting δ = 1 is clearly suboptimal for sparse graph models. Hence, it is desirable to construct tests that do not require knowledge of δ, and both tests in (7) and (9) are adaptive in this sense. Adaptivity of these tests is achieved by estimating ‖P + Q‖_F, which is a lower bound for 2nδ.
5. Testing for separation in operator norm. In this section, we study the two-sample testing problem where d(P,Q) = ‖P − Q‖_op, and provide bounds on the minimax separation ρ* for all m ≥ 1. An interesting finding of this section is that one can obtain a nontrivial minimax separation rate even for m = 1, that is, the problem is indeed solvable even with a single observation from each model. Our main result is the following.
THEOREM 5.1 (ρ* for operator norm separation). Consider the two-sample problem (2)–(3) with d(P,Q) = ‖P − Q‖_op, and any m ≥ 1, δ ∈ (0, 1). Let η ∈ (0, 1) and κ_η = (√η/8) ∧ (1/16). There exist constants C, C′ ≥ 1 such that:

1. nδ/4 ≤ ρ* ≤ nδ for δ ≤ κ_η²/(16mn), and
2. κ_η √(nδ/m) ≤ ρ* ≤ √(nδ/m) (C √(ln(n/η)) ∨ (4C′/κ_η) ln(n/η)) otherwise.
The theorem shows that the problem is not solvable in the ultra-sparse regime, that is, δ ≲ 1/(mn). However, beyond this regime there is a nontrivial separation rate, which Theorem 5.1 finds up to a factor of ln n. It is natural to ask whether the additional logarithmic factor is necessary. Later in this section, we refine Theorem 5.1 to remove the ln n term in the upper bound (see Corollary 5.5). This is achieved by using a nonadaptive test, which has prior knowledge of δ (see Proposition 5.4).
5.1. Proof of Theorem 5.1. The lower bounds in the theorem are due to the following
necessary condition.
LEMMA 5.2 (Necessary condition for detecting operator norm separation). For the testing problem (2)–(3) with d(P,Q) = ‖P − Q‖_op, δ ∈ (0, 1) and m ≥ 1, and for any η ∈ (0, 1), the minimax risk (4) is at least η if

ρ < (nδ/4) ∧ κ_η √(nδ/m).
PROOF SKETCH. The proof follows the technique of Baraud (2002) to derive minimax lower bounds, as used in the previous results in this paper. Hence, we mainly focus on the choice of P, Q used in the present proof. We set P = Q corresponding to an ER model with edge probability δ/2 under H_0. For H_1, we set the same P, but Q is randomly chosen in the following way. We partition the vertices randomly into two groups, and set Q_{ij} = δ/2 + γ for i, j belonging to the same group, and Q_{ij} = δ/2 − γ otherwise. The random choice of Q is due to randomly sampling one of 2^{n−1} possible splits of the vertex set.
One can see that for each choice of Q, we have ‖P − Q‖_op = γ(n − 1). Hence, a choice of γ ∈ (ρ/(n − 1), δ/2] implies that the pair (P, Q) for every choice of Q lies in Θ_1. Now, similar to the proof of Lemma 4.2, we show that for any m ≥ 1 and δ ≳ 1/(mn), there is an appropriate choice γ < δ/2 for which the random choice of (P, Q) ∈ Θ_1 cannot be distinguished from the null case. If δ ≲ 1/(mn), then the same situation occurs even for the choice γ = δ/2, which leads to the claim.
We now prove the upper bounds in Theorem 5.1. For this, we note that ‖P − Q‖_op ≤ ‖P − Q‖_row ≤ nδ, which gives the trivial upper bound for ρ*. To prove the nontrivial upper bound, we consider the following test:

(11)    Ψ_op = 1{ ‖S⁻‖_op / √(‖S⁺‖_row) > t_1 √(ln(n/η)) } · 1{ ‖S⁺‖_row > t_2 ln(n/η) },
where

(12)    S⁻ = Σ_{k=1}^m (A_{G_k} − A_{H_k})    and    S⁺ = Σ_{k=1}^m (A_{G_k} + A_{H_k}).

A minimal implementation sketch of Ψ_op is given below; the lemma that follows states a sufficient condition for this two-sample test.
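As before, this is an illustration only: t_1 and t_2 stand for unspecified absolute constants, so the defaults below are placeholders:

```python
import numpy as np

def operator_norm_test(As_G, As_H, eta, t1=1.0, t2=1.0):
    """Sketch of the test Psi_op in (11)-(12); As_G, As_H are lists of m
    adjacency matrices each, and t1, t2 are placeholder constants."""
    n = As_G[0].shape[0]
    S_minus = sum(A - B for A, B in zip(As_G, As_H))
    S_plus = sum(A + B for A, B in zip(As_G, As_H))
    row_norm = np.max(np.abs(S_plus).sum(axis=1))    # ||S+||_row
    if row_norm <= t2 * np.log(n / eta):
        return 0
    op_norm = np.linalg.norm(S_minus, 2)             # ||S-||_op
    return int(op_norm / np.sqrt(row_norm) > t1 * np.sqrt(np.log(n / eta)))
```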
LEMMA 5.3 (Sufficient condition for detecting operator norm separation). Consider the testing problem with d(P,Q) = ‖P − Q‖_op and the test Ψ_op in (11). There exist absolute constants t_1, t_2, C and C′ such that for any η ∈ (0, 1), δ ∈ (0, 1) and m ≥ 1, the risk R(Ψ_op, n, m, d) ≤ η if

ρ ≥ C √((nδ/m) ln(n/η)) ∨ (C′/m) ln(n/η).
PROOF SKETCH. The proof mainly controls the type-I and type-II error rates for Ψ_op. We achieve this through the matrix Bernstein inequality (Oliveira (2008), Tropp (2012)), which in the present case guarantees that

‖S⁻ − m(P − Q)‖_op ≲ √(m ‖P + Q‖_row ln n)

with high probability if ‖P + Q‖_row ≳ (ln n)/m. We also use a Bernstein-type concentration result to ensure that ‖S⁺‖_row ≍ m‖P + Q‖_row for ‖P + Q‖_row ≳ (ln n)/m, and ‖S⁺‖_row ≲ ln n otherwise.
We bound the type-I error by noting that under H_0, that is P = Q, the above concentration results imply that with high probability:
• ‖S⁻‖_op ≲ √(‖S⁺‖_row ln n) for ‖P + Q‖_row ≳ (ln n)/m, and
• ‖S⁺‖_row ≲ ln n for ‖P + Q‖_row ≲ (ln n)/m.
Hence, we have Ψ_op = 0 with high probability for suitable choices of t_1, t_2.
On the other hand, we control the type-II error in the following way. Under H_1, we have ‖P − Q‖_op > ρ with ρ ≥ (C′/m) ln(n/η), which implies that

‖P + Q‖_row ≥ ‖P − Q‖_op ≳ (ln n)/m.

So the second indicator in Ψ_op is guaranteed to be 1 for C′ large enough. This condition also allows us to use the matrix Bernstein inequality which, along with the reverse triangle inequality, gives

‖S⁻‖_op ≥ m‖P − Q‖_op − ‖S⁻ − m(P − Q)‖_op ≳ m‖P − Q‖_op − √(m ‖P + Q‖_row ln n).

We use the stated assumption ρ ≥ C √((nδ/m) ln(n/η)) and the fact ‖S⁺‖_row ≲ m‖P + Q‖_row to prove that the first indicator in Ψ_op is also true with high probability for large enough C. This bounds the type-II error rate.
To conclude the proof of Theorem 5.1, we note that the upper bound in the second statement is obtained by observing that (C′/m) ln(n/η) ≤ (4C′/κ_η) √(nδ/m) ln(n/η) for δ ≥ κ_η²/(16mn). We note that the above analysis of the test Ψ_op is based on the matrix Bernstein inequality (Tropp (2012)), which makes the upper bound worse by a factor of ln n. One may alternatively use other concentration results based
on refinements of the trace method (Bandeira and van Handel (2016), Lu and Peng (2013))
to obtain slightly different sufficient conditions in Lemma 5.3. For instance, use of Bandeira
and van Handel ((2016), Corollary 3.9) does provide a rate of √(nδ/m), but such bounds hold only if δ ≳ polylog(n)/n.
5.2. An optimal nonadaptive test. One of the objectives of this work is to determine
whether it is possible to test between two large graphs, that is, the case of m = 1. Theorem 5.1
provides an affirmative answer to this question, but our proposed test Ψ_op (11) is not optimal since the sufficient condition on ρ is worse than the necessary condition by a factor of ln n. As we note above, this is a consequence of our use of the matrix Bernstein inequality in the proof of Lemma 5.3, and leads to the question whether one can improve the result by using sharper concentration techniques known in the case of random graphs. We show that this is indeed true, at least for m = 1, and can be shown using concentration of trimmed or regularised adjacency matrices (Le, Levina and Vershynin (2017)).
Assume m = 1, and let G ∼ IER(P) and H ∼ IER(Q) be the two random graphs. Also assume that δ is specified. For some constant c ≥ 6, let Ã_G be the adjacency matrix of the graph obtained by deleting all edges in G that are incident on vertices with degree larger than c(nδ ∨ ln(2/η)). Similarly, we obtain Ã_H from H. Define a test as

(13)    Ψ̃_op = 1{ ‖Ã_G − Ã_H‖_op > t √(nδ ∨ ln²(2/η)) }

for some constant t > 0. We have the following guarantee for Ψ̃_op.
PROPOSITION 5.4 (Optimality for m = 1). Consider the testing problem with d(P,Q) = ‖P − Q‖_op, m = 1 and δ > 10/n. There exist absolute constants c, t such that for any η ∈ (0, 1), the maximum risk of Ψ̃_op is at most η if

ρ ≥ 2t √(nδ ∨ ln²(2/η)).

Hence, assuming η is fixed, ρ* ≍ √(nδ) for δ > 10/n, whereas ρ* ≍ nδ for δ ≤ 10/n.
PROOF SKETCH. The proof differs from that of Lemma 5.3 in the use of a different matrix concentration inequality. We rely on the concentration of the trimmed adjacency matrix (Le, Levina and Vershynin (2017)). In the present context, the result states that for δ > 10/n and G ∼ IER(P) with ‖P‖_max ≤ δ,

‖Ã_G − P‖_op ≤ C √(nδ ∨ ln²(2/η))

with probability 1 − η/4 for some absolute constant C > 0.
For P = Q, we have with probability 1 − η/2,

‖Ã_G − Ã_H‖_op ≤ ‖Ã_G − P‖_op + ‖Ã_H − Q‖_op ≤ 2C √(nδ ∨ ln²(2/η)),

and hence, setting t = 2C ensures that the type-I error rate is smaller than η/2. Similarly, it follows that under H_1,

‖Ã_G − Ã_H‖_op ≥ ‖P − Q‖_op − (‖Ã_G − P‖_op + ‖Ã_H − Q‖_op) ≥ ρ − 2C √(nδ ∨ ln²(2/η)),

with probability 1 − η/2. Hence, ‖Ã_G − Ã_H‖_op exceeds the threshold for ρ > 4C √(nδ ∨ ln²(2/η)), and we have a similar bound on the type-II error rate.
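A sketch of this nonadaptive test follows; the trimming constant c and the threshold constant t are unspecified in the statement above, so the defaults are placeholders:

```python
import numpy as np

def trim_high_degree(A, degree_cap):
    """Zero out all edges incident on vertices whose degree exceeds degree_cap."""
    A = A.copy()
    high = A.sum(axis=1) > degree_cap
    A[high, :] = 0
    A[:, high] = 0
    return A

def trimmed_operator_test(A_G, A_H, n, delta, eta, c=6.0, t=1.0):
    """Sketch of the nonadaptive test (13) for m = 1; c, t are placeholders."""
    cap = c * max(n * delta, np.log(2 / eta))
    scale = np.sqrt(max(n * delta, np.log(2 / eta) ** 2))
    diff = trim_high_degree(A_G, cap) - trim_high_degree(A_H, cap)
    return int(np.linalg.norm(diff, 2) > t * scale)
```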
While Ψ̃_op indeed achieves optimality, it comes at the cost of prior knowledge of δ. As we note in Remark 4.5, this is an unreasonable restriction and hence, the suboptimal test Ψ_op (11) may be preferable due to its adaptivity. We do not know if an adaptive optimal operator norm based test can be constructed, since existing concentration bounds for sparse IER graphs are typically in terms of the largest edge probability, which is hard to estimate.
5.3. Improving the bounds in Theorem 5.1. As discussed above, the nonadaptive test Ψ̃_op helps in improving the upper bound on ρ* for m = 1. We have not studied whether Ψ̃_op (13) and Proposition 5.4 extend to the case of m > 1. However, the following corollary shows that the results of Sections 4 and 5 can be combined to remove the undesirable ln n factor in the upper bound of ρ* in Theorem 5.1 for any m ≥ 1. For convenience, let the allowable risk η be a constant, which we ignore in the statement of the result.
COROLLARY 5.5 (Improved bounds on ρ* for operator norm separation). Consider the two-sample problem (2)–(3) with d(P,Q) = ‖P − Q‖_op, and any m ≥ 1, δ ∈ (0, 1). There exists a constant C > 0 such that

ρ* ≍ nδ for δ ≤ C/(mn),    and    ρ* ≍ √(nδ/m) otherwise.
PROOF. The lower bounds on ρ* follow from Theorem 5.1, and hence it remains to prove that

ρ* ≲ √(nδ/m)    for all m ≥ 1.

We claim that this is a direct consequence of the upper bounds in Proposition 5.4 and Theorem 4.1. To see this, consider the following test: Ψ = Ψ̃_op for m = 1, and Ψ = Ψ_F for m ≥ 2.
For m = 1, Proposition 5.4 leads to the above stated upper bound.
For m ≥ 2, note that given δ and ρ, the two-sample problems under Frobenius and operator norm separation have the same null set Θ_0 (3), whereas the sets under the alternative hypotheses are

Θ_1^op = {(P, Q) : ‖P − Q‖_op > ρ, ‖P‖_max ∨ ‖Q‖_max ≤ δ},    and
Θ_1^F = {(P, Q) : ‖P − Q‖_F > ρ, ‖P‖_max ∨ ‖Q‖_max ≤ δ}.

We use the relation ‖P − Q‖_F ≥ ‖P − Q‖_op to observe that Θ_1^op ⊆ Θ_1^F. As a consequence, the maximum risk of Ψ_F under the two different distances can be related as

R(Ψ_F, n, m, ‖·‖_op) ≤ R(Ψ_F, n, m, ‖·‖_F),

that is, if ρ is fixed and Ψ_F is used for the problem of testing difference in operator norm, then it achieves a smaller worst-case risk than what it achieves for the problem of testing difference in Frobenius norm. This implies that the minimax separation for operator norm is at most the minimax separation for Frobenius norm. The claim now follows from the upper bounds in Theorem 4.1 for m ≥ 2.
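A sketch of the combined test used in this proof, reusing the hypothetical frobenius_test and trimmed_operator_test sketches given earlier:

```python
def combined_test(As_G, As_H, n, delta, eta):
    """Sketch of the test in the proof of Corollary 5.5: the nonadaptive
    trimmed test for m = 1, and the Frobenius-norm test for m >= 2."""
    m = len(As_G)
    if m == 1:
        return trimmed_operator_test(As_G[0], As_H[0], n, delta, eta)
    return frobenius_test(As_G, As_H, eta)
```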
6. Discussion. In this section, we discuss related graph two-sample problems, and com-
ment on practical two-sample tests.
6.1. A note on the sparsity condition. The notion of sparsity has no formal definition in the context of random graphs, unlike sparsity in the signal detection literature. The informal definition of a sparse graph is a graph where the number of edges is not arbitrarily large compared to the number of vertices n. The sparsity condition used in (3), that is, ‖P‖_max ≤ δ, is one approach to defining sparse graphs. More precisely, this condition implies that the expected number of edges is at most n²δ, and in particular, for δ ≍ 1/n, we obtain graphs where the number of edges grows linearly with n. However, one can induce sparsity through alternative restrictions on P:
(i) Σ_{ij} P_{ij} ≤ δ_1,    (iii) ‖P‖_row ≤ δ_3,
(ii) ‖P‖_F ≤ δ_2,    (iv) ‖P‖_0 ≤ δ_4,

among others. We note that the δ_i's are of a different order than the δ used in (3). The first condition bounds the expected number of edges, while condition (iii) provides graphs with bounded expected degrees. The last restriction is along the lines of the signal detection literature since it implies that at most δ_4 entries in P are nonzero, and results in random graphs with an absolutely bounded number of edges (not only in expectation).
It is natural to ask whether our results also extend to the case where sparsity is controlled through any one of these conditions. We do not provide a complete characterisation in each case,
but present two corollaries of Theorems 4.1 and 5.1 which show that some of our results easily
extend to alternative notions of sparsity.
COROLLARY 6.1 (ρ* under Frobenius norm sparsity). Consider the problem of testing between the following two hypotheses sets:

Θ_0 = {P = Q, ‖P‖_F ≤ δ_2},    and    Θ_1 = {‖P − Q‖_F > ρ, ‖P‖_F ∨ ‖Q‖_F ≤ δ_2}.

The bounds on minimax separation stated in Theorem 4.1 hold in this case with the substitution δ_2 = nδ.

PROOF SKETCH. The proof of the corollary follows immediately from that of Theorem 4.1. The substitution δ_2 = nδ is a consequence of the relation ‖P‖_F ≤ n‖P‖_max.
We also have a result in the case of graphs with bounded expected degrees, which can
be similarly derived from Theorem 5.1. We do not know if the more precise rates given in
Corollary 5.5 can be extended to this setting.
COROLLARY 6.2 (ρ* under row sum norm sparsity). Consider the problem of testing between the following two hypotheses sets:

Θ_0 = {P = Q, ‖P‖_row ≤ δ_3},    and    Θ_1 = {‖P − Q‖_op > ρ, ‖P‖_row ∨ ‖Q‖_row ≤ δ_3}.

The bounds on minimax separation stated in Theorem 5.1 hold in this case with the substitution δ_3 = nδ.
6.2. On the practicality of proposed tests. The theme of this paper has been to explore different separation criteria for which the graph two-sample testing problem can be solved for a small population size. In the process of addressing this question, we suggest adaptive tests Ψ_F (7), Ψ̃_F (9) and Ψ_op (11) that also turn out to be near-optimal for the problems of detecting separation in Frobenius or operator norms. However, the practical applicability of these tests has not been discussed so far.
We note that in practice, one is more interested in the testing problem P = Q vs. P ≠ Q, and hence, a more basic question that needs to be addressed is—which separation criterion should be used? The findings of this paper suggest that for m = 1, operator norm separation is a possible choice, whereas other distances like total variation distance and Frobenius norm should not be considered. For m > 1 but small, we show that one could detect separation in Frobenius norm using Ψ_F (7) or detect separation in operator norm using Ψ_op (11). We compare the relative merits of both tests in terms of sample complexity in the following way.
REMARK 6.3 (Comparison between Ψ_F and Ψ_op). Consider P and Q such that P ≠ Q and ‖P‖_max ∨ ‖Q‖_max ≤ δ. Ignoring constants and terms involving η, Lemma 4.2 shows that m ≳ nδ/‖P − Q‖²_F samples are necessary to distinguish between the two models. On the other hand, Lemma 5.2 shows that m ≳ nδ/‖P − Q‖²_op samples are needed to distinguish between P, Q. Since ‖P − Q‖_F ≥ ‖P − Q‖_op, testing for Frobenius norm separation is easier than testing for separation in operator norm for m > 1. In other words, one may expect Ψ_F to have a smaller risk than Ψ_op.
However, the tests Ψ_F, Ψ̃_F and Ψ_op require n to be very large or the graphs to be dense to achieve a small risk, and hence have limited applicability in moderate-sized problems. It is known that the practical applicability of concentration based tests can be improved by using bootstrapped variants (Gretton et al. (2012)), which approximate the null distribution by generating bootstrap samples through random mixing of the two populations. Simulations, not included in this paper, show that permutation based bootstrapping provides a reasonable rejection rate for moderate sample sizes (m ≥ 10), but such bootstrapping is not effective for smaller m, for instance m = 2. Furthermore, the relative merit of Ψ_F (or Ψ̃_F) over Ψ_op, as suggested by Remark 6.3, could not be verified in the case of the bootstrapped variants.
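To make the random-mixing idea concrete, here is a hedged sketch of a permutation variant built on the Frobenius-type statistic μ̂/σ̂; this is an illustration only and not the exact procedure used in the simulations mentioned above:

```python
import numpy as np

def frobenius_statistic(As_G, As_H):
    """The statistic mu_hat / sigma_hat from (5)-(6) (sketch)."""
    m = len(As_G)
    iu = np.triu_indices(As_G[0].shape[0], k=1)
    diff = np.array([(A - B)[iu] for A, B in zip(As_G, As_H)])
    summ = np.array([(A + B)[iu] for A, B in zip(As_G, As_H)])
    half = m // 2
    mu = np.sum(diff[:half].sum(axis=0) * diff[half:].sum(axis=0))
    sigma = np.sqrt(np.sum(summ[:half].sum(axis=0) * summ[half:].sum(axis=0)))
    return mu / sigma if sigma > 0 else 0.0

def permutation_test(As_G, As_H, alpha=0.05, n_perm=200, rng=None):
    """Approximate the null distribution by randomly re-mixing the pooled sample."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(As_G)
    observed = frobenius_statistic(As_G, As_H)
    pooled = list(As_G) + list(As_H)
    null_stats = []
    for _ in range(n_perm):
        idx = rng.permutation(2 * m)
        null_stats.append(frobenius_statistic([pooled[i] for i in idx[:m]],
                                              [pooled[i] for i in idx[m:]]))
    p_value = np.mean(np.array(null_stats) >= observed)
    return int(p_value < alpha)
```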
When m = 1 and the population adjacency is of low rank, Tang et al. (2017a) suggest an
alternative bootstrapping principle based on estimation of the population adjacency P and
then drawing bootstrap samples from the estimate of P. Simulations in Ghoshdastidar and von
Luxburg (2018) show that this procedure works to some extent for dense IER graphs but only
when P has a small (known) rank. When the rank is unknown or, more generally, if P does
not have a low rank, such bootstrapped tests are not necessarily reliable.
In the related work (Ghoshdastidar and von Luxburg (2018)), we explore alternative pos-
sibilities for constructing practical tests derived from Frobenius norm or operator norm based
test statistics that work even for m = 1, 2. These tests use statistics similar to the ones studied
in the present work, but are based on asymptotic null distributions that hold approximately
or under stronger assumptions. For instance, Ghoshdastidar and von Luxburg (2018) show that under H_0 and assumptions on the edge density of the graphs, the Frobenius norm based statistic μ̂/σ̂ (see (5)–(6)) is asymptotically dominated by a standard normal random variable as n → ∞. Based on this, we propose an asymptotic distribution based test that is powerful for all m ≥ 2 and moderately sparse graphs, and is reliable even in the case of real networks. For m = 1, we consider the operator norm of a re-scaled version of A_{G_1} − A_{H_1}, which approximately follows the Tracy–Widom law under H_0 as n → ∞. The practical applicability of these tests stems from the fact that they do not explicitly rely on concentration inequalities that lead to large thresholds, as in the present paper, nor do they use bootstrapping strategies, which often require large sample sizes or assumptions on the graph model. Thus, our related work provides more practically useful tests whose statistical guarantees hold either approximately or under additional assumptions.¹
6.3. Minimax separation under structural assumptions. The present paper studies the
two-sample problem for IER graphs, where the population adjacency matrices do not have
any structural restriction. In other words, we study a hypothesis testing problem in a dimen-
sion of (n choose 2) with a sample size of m. Under this broad framework, Theorem 4.1 shows that the minimax separation in Frobenius norm ρ*_F is given by

ρ*_F ≍ nδ for m = 1,    and    ρ*_F ≍ √(nδ/m) for m > 1.

Similarly, Corollary 5.5 shows that the minimax separation in operator norm

ρ*_op ≍ √(nδ/m)    for all m ≥ 1.
It is natural to ask if these rates decrease if we further impose structural assumptions on the
population adjacencies thereby effectively reducing the problem dimension. In this section,
we provide some initial results in this direction. We impose the structural assumptions by
restricting the possible values for the population adjacency matrices. Formally, we define
M̃_n ⊆ M_n as the set of symmetric matrices in [0, 1]^{n×n}, whose diagonal entries are zero
and satisfy additional structural assumptions (specified below). Subsequently, the graph two-
sample problem, restricted to a special graph class, can be stated as the problem of testing
between
H_0 : (P, Q) ∈ Θ_0 ∩ (M̃_n × M̃_n)    vs.    H_1 : (P, Q) ∈ Θ_1 ∩ (M̃_n × M̃_n),

where Θ_0 and Θ_1 are as defined in the original problem (2)–(3).
We begin with the simplest case of Erdős–Rényi (ER) graphs, where the restricted set M̃_n corresponds to the symmetric matrices whose diagonal entries are zero and all off-diagonal entries are identical. Note that each distribution is modelled by a single parameter, and hence, we have a hypothesis testing problem in one dimension. We present the following result on minimax separation for Frobenius and operator norms in this setting. We simplify the statement by ignoring absolute constants, including the allowable risk η, which is assumed to be fixed.
PROPOSITION 6.4 (Minimax separation for testing ER graphs). Consider the graph two-sample problem restricted to ER graphs with specified n, m and δ. The minimax separation rates for Frobenius norm ρ*_F and for operator norm ρ*_op satisfy

ρ*_F ≍ ρ*_op ≍ nδ ∧ √(δ/m)    for all m ≥ 1.

In particular, in the sparsity regime δ ≳ 1/(n²m), the minimax separation is much smaller than the corresponding rates for IER graphs.
¹The implementations for the practical tests proposed in Ghoshdastidar and von Luxburg (2018) as well as bootstrapped variants of tests studied in the present paper are available at: https://github.com/gdebarghya/Network-TwoSampleTesting
PROOF SKETCH. The proof is relatively simple, and borrows ideas from Theorem 4.1. Hence, we only provide a brief sketch of the proof.
We consider the two-sample problem with G_1, ..., G_m i.i.d. from ER(p) and H_1, ..., H_m i.i.d. from ER(q), where p, q ≤ δ denote the edge probabilities in either model. Note that the result can be equivalently stated as: the minimax separation between p and q is δ ∧ (1/n)√(δ/m).
To derive a lower bound, we use the approach stated in the previous results and define p, q as follows. Under H_0, we set p = q = δ/2, whereas under H_1, we let p = δ/2 and q is randomly selected from {p − γ, p + γ} for some γ ≤ δ/2. It turns out that γ ≍ (δ/2) ∧ (1/n)√(δ/m) is an appropriate choice, which corresponds to the claimed lower bound for minimax separation.
The upper bound is obtained by using a simple test that compares the edge densities estimated from the two populations. Define a test

Ψ = 1{ |p̂ − q̂| > t_1 } · 1{ (p̂ + q̂) > t_2 },

where p̂ = (1/(m(n choose 2))) Σ_k Σ_{i<j} (A_{G_k})_{ij} and q̂ = (1/(m(n choose 2))) Σ_k Σ_{i<j} (A_{H_k})_{ij} are the two edge density estimates, and t_1, t_2 are suitably defined thresholds. The proof strategy of Lemma 4.3 combined with a judicious choice of thresholds provides the desired upper bound claimed in the result.
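For illustration, a sketch of this edge-density comparison (the thresholds below are placeholders for the suitably chosen constants in the proof):

```python
import numpy as np

def er_density_test(As_G, As_H, t1, t2):
    """Sketch of the ER two-sample test: compare pooled edge-density estimates.

    t1, t2 are placeholder thresholds standing in for the suitably defined
    constants in the proof sketch above."""
    m = len(As_G)
    n = As_G[0].shape[0]
    pairs = m * n * (n - 1) / 2
    p_hat = sum(np.triu(A, k=1).sum() for A in As_G) / pairs
    q_hat = sum(np.triu(A, k=1).sum() for A in As_H) / pairs
    return int(abs(p_hat - q_hat) > t1 and (p_hat + q_hat) > t2)
```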
We next increase the complexity of the problem by considering the stochastic block model with 2 classes (2-SBM). This class of graphs has been studied in the context of one-sample hypothesis testing, particularly for detecting community structure in graphs and estimating the number of communities in a block model (Bickel and Sarkar (2016), Gao and Lafferty (2017), Lei (2016)). We consider typical 2-SBM graphs, which are characterised by a binary vector denoting the communities of the nodes as well as the within-community and inter-community edge probabilities. Formally, this is represented by the set M̃_n ⊆ M_n of matrices which can be transformed to have a 2-class block structure through row/column permutations. Furthermore, the off-diagonal entries in each matrix can take at most two distinct values. It is easy to observe that each matrix is governed by n + 2 parameters, and the problem has a dimension much smaller than the (n choose 2)-dimensional IER problem. However, the following result shows that the minimax separation does not decrease in this case compared to the general IER setting.
PROPOSITION 6.5 (Minimax separation for testing 2-SBM graphs). Consider the graph two-sample problem restricted to 2-SBM graphs with specified n, m and δ. The minimax separation for operator norm is

ρ*_op ≍ √(nδ/m)    for all m ≥ 1.

For the minimax separation in Frobenius norm ρ*_F, the above rate holds only for m ≥ 2. For m = 1, we have the loose bounds √(nδ) ≲ ρ*_F ≲ nδ.
Hence, the minimax separation for 2-SBM is similar to the corresponding rates for IER graphs (with the possible exception of ρ*_F for m = 1).
PROOF SKETCH. We first note that the testing problem is a restriction of the original IER testing problem, and hence, the upper bounds on ρ*_F and ρ*_op, derived in Theorem 4.1 and Corollary 5.5, respectively, also hold in this case. Hence, we only need to prove the lower bounds

ρ*_op ≳ √(nδ/m)    and    ρ*_F ≳ √(nδ/m)

for all m ≥ 1. The lower bound on ρ*_op follows directly from the proof of Lemma 5.2. Recall that the construction used to prove Lemma 5.2 was essentially a case of distinguishing a 2-SBM from an ER model, where the latter is also a special case of a 2-SBM. Hence, the same proof works in the present case as well, and the claimed lower bound for ρ*_op holds.
The lower bound for ρ*_F also follows from the same construction by computing the Frobenius norm distance between the choice of population adjacency matrices used in the proof of Lemma 5.2.
Proposition 6.5 may have important consequences, since it suggests that for any model that is more complex than the simple 2-SBM, the two-sample problem is as difficult as the general setting of IER graphs. However, a more in-depth study may be required before a strong claim can be made in this direction. For instance, one should take into account the fact that the broad literature on graph clustering and stochastic block models often requires an additional assumption of balanced community sizes. It would be interesting to understand whether the presence of balanced communities also simplifies the detection/testing problems.
In a broader context, one should note that Propositions 6.4 and 6.5 only provide minimax rates for the Frobenius and operator norm distances for these special classes of IER graphs. On the other hand, Propositions 3.1 and 3.2 demonstrate the limitations of distribution-based distances when no restriction is imposed on the IER model. Hence, there is a possibility that, under special models, total variation and KL-divergence based tests are as useful as, or even better than, Frobenius or operator norm based tests. Insights into these questions, along with minimax rates for related models such as the k-SBM and random dot product graphs, would provide a clearer understanding of the problem of testing graphs on a common set of vertices.
6.4. Extensions. Several extensions of the two-sample problem (2)–(3) can be studied. Earlier in this section, we discussed the possibility of considering alternative notions of sparsity. Another interesting, and practically significant, extension is to the case of directed graphs. The problem naturally extends to this framework, and the proposed adaptive tests easily handle this generalisation without any critical modification. For instance, in the case of the Frobenius norm test, one merely needs to define μ and σ as summations over all off-diagonal terms, and the thresholds change only by constant factors. The analysis of such tests, as well as the minimax lower bounds, can be easily derived from our proofs. The same conclusion holds for the case of operator norm separation, particularly when the upper bounds are derived from the operator norm test and the matrix Bernstein inequality.
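As a concrete instance of the directed-graph extension, the sketch below changes only the summation range of the pooled edge-density estimate from the upper triangle to all off-diagonal entries, with the normalisation adjusted accordingly; as noted above, the corresponding thresholds would change only by constant factors. The function name is ours.

```python
import numpy as np

def directed_edge_density(graphs):
    """Pooled edge-density estimate for directed graphs: sum over all
    off-diagonal entries of each adjacency matrix, normalised by n(n-1)."""
    m = len(graphs)
    n = graphs[0].shape[0]
    total = sum(A.sum() - np.trace(A) for A in graphs)
    return total / (m * n * (n - 1))
```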
In this paper, we only consider the problem of identity testing, that is, testing P = Q against d(P,Q) > ρ. One may also study the more general problem of closeness testing, which ignores small differences between the models; that is, one tests between the hypotheses

H_0 : d(P,Q) ≤ ε and P_max ∨ Q_max ≤ δ,   and   H_1 : d(P,Q) > ρ and P_max ∨ Q_max ≤ δ

for some pre-specified ε < ρ. The proposed tests, which are primarily based on the principle of estimating d(P,Q), may be easily adapted to this setting by appropriately modifying the test thresholds. However, it is not clear whether the minimax separation bounds in Theorems 4.1 and 5.1 easily extend to this setting as well.
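For example, in the ER setting of Proposition 6.4, the edge-density test sketched earlier becomes a closeness test by shifting the rejection threshold by the tolerance. In the sketch below the tolerance eps is expressed directly as an edge-probability difference, which is our simplification, and the constant c1 is illustrative.

```python
import numpy as np

def er_closeness_test(graphs_G, graphs_H, delta, eps, c1=1.0):
    """Closeness-testing variant of the edge-density comparison test:
    edge-probability differences up to eps are tolerated under the null,
    so the rejection threshold is simply shifted by eps."""
    m = len(graphs_G)
    n = graphs_G[0].shape[0]
    norm = m * n * (n - 1) / 2.0
    p_hat = sum(np.triu(A, k=1).sum() for A in graphs_G) / norm
    q_hat = sum(np.triu(A, k=1).sum() for A in graphs_H) / norm
    t1 = c1 * np.sqrt(delta / m) / n   # estimation-error allowance
    return int(abs(p_hat - q_hat) > eps + t1)
```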
From a practical perspective, one may face a more general two-sample graph testing problem, where the graphs are not defined on a common set of vertices and may even be of different sizes. This situation is generally hard to study, but tests for this problem are often used in applications, where one typically computes some scalar or vector function from each graph and assesses the difference between the two graph populations based on this function (Stam et al. (2007)). We study this principle in a recent work (Ghoshdastidar et al. (2017)), and propose a formal framework for testing between any two random graphs by means of a network function f : G_n → M that maps the space of graphs on at least n vertices to some metric space M. We argue that if the network function concentrates for some sub-class of random graphs as n → ∞, then one can indeed construct two-sample tests based on the network function. However, such a test cannot distinguish models within an equivalence class, that is, random graph models that behave identically under the mapping f.
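To illustrate the network-function principle for graphs that need not share a vertex set, the sketch below uses triangle density as an example of f and compares the two populations through the sample means of f. The choice of f and of the comparison rule are our illustrative assumptions, not the construction analysed in Ghoshdastidar et al. (2017).

```python
import numpy as np

def triangle_density(A):
    """Example network function f: the triangle density of an undirected
    graph given by its 0/1 adjacency matrix (any number of vertices >= 3)."""
    n = A.shape[0]
    n_triangles = np.trace(A @ A @ A) / 6.0
    return n_triangles / (n * (n - 1) * (n - 2) / 6.0)

def network_function_test(graphs_G, graphs_H, f=triangle_density, t=0.05):
    """Compare two graph populations through a scalar network function f.

    Rejects when the sample means of f over the two populations differ by
    more than a heuristic threshold t. As noted in the text, such a test
    cannot separate models that f maps to the same value."""
    fg = np.array([f(A) for A in graphs_G])
    fh = np.array([f(A) for A in graphs_H])
    return int(abs(fg.mean() - fh.mean()) > t)
```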
Acknowledgements. We are grateful to the Associate Editor and the anonymous review-
ers for their comments and suggestions, which have greatly improved the technical content
and the exposition of the paper.
The first author was supported in part by the Deutsche Forschungsgemeinschaft (DFG,
FOR1735), the Institutional Strategy of the University of Tübingen (DFG, ZUK 63) and by
the Baden-Württemberg Stiftung through the Eliteprogramm for Postdocs.
The second author was supported by the Deutsche Forschungsgemeinschaft (DFG) Emmy
Noether grant MuSyAD (CA 1488/1-1).
The third author was supported in part by the DFG Emmy Noether grant MuSyAD
(CA 1488/1-1), by the DFG-314838170, GRK 2297 MathCoRe, by the DFG GRK 2433
DAEDALUS (384950143/GRK2433), by the DFG CRC 1294 “Data Assimilation”, Project
A03 and by the UFA-DFH through the French–German Doktorandenkolleg CDFA 01-18.
The fourth author was supported in part by the DFG FOR1735, the Institutional Strategy
of the University of Tübingen (DFG, ZUK 63) and DFG Cluster of Excellence “Machine
Learning—New Perspectives for Science” EXC 2064/1, project number 390727645.
SUPPLEMENTARY MATERIAL
Supplement: Appendix to “Two-sample hypothesis testing for inhomogeneous ran-
dom graphs” (DOI: 10.1214/19-AOS1884SUPP; .pdf). The supplementary material
(Ghoshdastidar et al. (2020)) contains the detailed proofs of the lemmas and the proposi-
tions stated in Sections 3, 4 and 5.
REFERENCES
ABBE, E. and SANDON, C. (2016). Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation. In Advances in Neural Information Processing Systems 29.
ALBERT, R. and BARABÁSI, A.-L. (2002). Statistical mechanics of complex networks. Rev. Modern Phys. 74 47–97. MR1895096 https://doi.org/10.1103/RevModPhys.74.47
ARIAS-CASTRO, E., BUBECK, S. and LUGOSI, G. (2015). Detecting positive correlations in a multivariate sample. Bernoulli 21 209–241. MR3322317 https://doi.org/10.3150/13-BEJ565
ARIAS-CASTRO, E. and VERZELEN, N. (2014). Community detection in dense random networks. Ann. Statist. 42 940–969. MR3210992 https://doi.org/10.1214/14-AOS1208
BAI, Z. and SARANADASA, H. (1996). Effect of high dimension: By an example of a two sample problem. Statist. Sinica 6 311–329. MR1399305
BANDEIRA, A. S. and VAN HANDEL, R. (2016). Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Ann. Probab. 44 2479–2506. MR3531673 https://doi.org/10.1214/15-AOP1025
BARAUD, Y. (2002). Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8 577–606. MR1935648
BERGER, N., BORGS, C., CHAYES, J. T. and SABERI, A. (2005). On the spread of viruses on the Internet. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms 301–310. ACM, New York. MR2298278
BERTHET, Q. and RIGOLLET, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815. MR3127849 https://doi.org/10.1214/13-AOS1127
BICKEL, P. J. and SARKAR, P. (2016). Hypothesis testing for automated community detection in networks. J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 253–273. MR3453655 https://doi.org/10.1111/rssb.12117
BOLLOBÁS, B., JANSON, S. and RIORDAN, O. (2007). The phase transition in inhomogeneous random graphs. Random Structures Algorithms 31 3–122. MR2337396 https://doi.org/10.1002/rsa.20168
BRESLER, G. and NAGARAJ, D. (2018). Optimal single sample tests for structured versus unstructured network data. In Conference on Learning Theory. PMLR 75 1657–1690.
BUBECK, S., DING, J., ELDAN, R. and RÁCZ, M. Z. (2016). Testing for high-dimensional geometry in random graphs. Random Structures Algorithms 49 503–532. MR3545825 https://doi.org/10.1002/rsa.20633
CAI, T. T., LIU, W. and XIA, Y. (2014). Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 349–372. MR3164870 https://doi.org/10.1111/rssb.12034
CARPENTIER, A. and NICKL, R. (2015). On signal detection and confidence sets for low rank inference problems. Electron. J. Stat. 9 2675–2688. MR3432430 https://doi.org/10.1214/15-EJS1087
CHAN, S.-O., DIAKONIKOLAS, I., VALIANT, G. and VALIANT, P. (2014). Optimal algorithms for testing closeness of discrete distributions. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms 1193–1203. ACM, New York. MR3376448 https://doi.org/10.1137/1.9781611973402.88
CHEN, S. X. and QIN, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38 808–835. MR2604697 https://doi.org/10.1214/09-AOS716
DASKALAKIS, C., DIKKALA, N. and KAMATH, G. (2018). Testing Ising models. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms 1989–2007. SIAM, Philadelphia, PA. MR3775918 https://doi.org/10.1137/1.9781611975031.130
DECELLE, A., KRZAKALA, F., MOORE, C. and ZDEBOROVÁ, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84 066106.
DONOHO, D. and JIN, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994. MR2065195 https://doi.org/10.1214/009053604000000265
DONOHO, D. and JIN, J. (2015). Higher criticism for large-scale inference, especially for rare and weak effects. Statist. Sci. 30 1–25. MR3317751 https://doi.org/10.1214/14-STS506
GAO, C. and LAFFERTY, J. (2017). Testing network structure using relations between small subgraph probabilities. Preprint. Available at arXiv:1704.06742.
GHOSHDASTIDAR, D. and VON LUXBURG, U. (2018). Practical methods for graph two-sample testing. In Advances in Neural Information Processing Systems 31.
GHOSHDASTIDAR, D., GUTZEIT, M., CARPENTIER, A. and VON LUXBURG, U. (2017). Two-sample tests for large random graphs using network statistics. In Conference on Learning Theory. PMLR 65 954–977.
GHOSHDASTIDAR, D., GUTZEIT, M., CARPENTIER, A. and VON LUXBURG, U. (2020). Supplement to "Two-sample hypothesis testing for inhomogeneous random graphs." https://doi.org/10.1214/19-AOS1884SUPP
GINESTET, C. E., LI, J., BALACHANDRAN, P., ROSENBERG, S. and KOLACZYK, E. D. (2017). Hypothesis testing for network data in functional neuroimaging. Ann. Appl. Stat. 11 725–750. MR3693544 https://doi.org/10.1214/16-AOAS1015
GRETTON, A., BORGWARDT, K. M., RASCH, M. J., SCHÖLKOPF, B. and SMOLA, A. (2012). A kernel two-sample test. J. Mach. Learn. Res. 13 723–773. MR2913716
HYDUKE, D. R., LEWIS, N. E. and PALSSON, B. (2013). Analysis of omics data with genome-scale models of metabolism. Mol. BioSyst. 9 167–174.
INGSTER, YU. I. and SUSLINA, I. A. (2003). Nonparametric Goodness-of-Fit Testing Under Gaussian Models. Lecture Notes in Statistics 169. Springer, New York. MR1991446 https://doi.org/10.1007/978-0-387-21580-8
KLOPP, O., TSYBAKOV, A. B. and VERZELEN, N. (2017). Oracle inequalities for network models and sparse graphon estimation. Ann. Statist. 45 316–354. MR3611494 https://doi.org/10.1214/16-AOS1454
KONDOR, R. and PAN, H. (2016). The multiscale Laplacian graph kernel. In Advances in Neural Information Processing Systems.
LANDMAN, B. A., HUANG, A. J., GIFFORD, A., VIKRAM, D. S., LIM, I. A., FARRELL, J. A., BOGOVIC, J. A., HUA, J., CHEN, M. et al. (2011). Multi-parametric neuroimaging reproducibility: A 3-T resource study. NeuroImage 54 2854–2866.
LE, C. M., LEVINA, E. and VERSHYNIN, R. (2017). Concentration and regularization of random graphs. Random Structures Algorithms 51 538–561. MR3689343 https://doi.org/10.1002/rsa.20713
LEDOIT, O. and WOLF, M. (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann. Statist. 30 1081–1102. MR1926169 https://doi.org/10.1214/aos/1031689018
LEI, J. (2016). A goodness-of-fit test for stochastic block models. Ann. Statist. 44 401–424. MR3449773 https://doi.org/10.1214/15-AOS1370
LOVÁSZ, L. (2012). Large Networks and Graph Limits. American Mathematical Society Colloquium Publications 60. Am. Math. Soc., Providence, RI. MR3012035 https://doi.org/10.1090/coll/060
LU, L. and PENG, X. (2013). Spectra of edge-independent random graphs. Electron. J. Combin. 20 Paper 27. MR3158266
MUKHERJEE, R., PILLAI, N. S. and LIN, X. (2015). Hypothesis testing for high-dimensional sparse binary regression. Ann. Statist. 43 352–381. MR3311863 https://doi.org/10.1214/14-AOS1279
OLIVEIRA, R. I. (2009). Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Preprint. Available at arXiv:0911.0600.
RAMDAS, A., REDDI, S., POCZOS, B., SINGH, A. and WASSERMAN, L. (2015). Adaptivity and computation-statistics tradeoffs for kernel and distance based high dimensional two sample testing. Preprint. Available at arXiv:1508.00655.
RYABKO, D. (2017). Hypotheses testing on infinite random graphs. In International Conference on Algorithmic Learning Theory. PMLR 76 1–12. MR3776703
SHERVASHIDZE, N., SCHWEITZER, P., VAN LEEUWEN, E. J., MEHLHORN, K. and BORGWARDT, K. M. (2011). Weisfeiler–Lehman graph kernels. J. Mach. Learn. Res. 12 2539–2561. MR2845672
STAM, C. J., JONES, B. F., NOLTE, G., BREAKSPEAR, M. and SCHELTENS, P. (2007). Small-world networks and functional connectivity in Alzheimer's disease. Cereb. Cortex 17 92–99.
SUN, X. and NOBEL, A. B. (2008). On the size and recovery of submatrices of ones in a random binary matrix. J. Mach. Learn. Res. 9 2431–2453. MR2460888
TANG, M., ATHREYA, A., SUSSMAN, D. L., LYZINSKI, V., PARK, Y. and PRIEBE, C. E. (2017a). A semiparametric two-sample hypothesis testing problem for random graphs. J. Comput. Graph. Statist. 26 344–354. MR3640191 https://doi.org/10.1080/10618600.2016.1193505
TANG, M., ATHREYA, A., SUSSMAN, D. L., LYZINSKI, V. and PRIEBE, C. E. (2017b). A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli 23 1599–1630. MR3624872 https://doi.org/10.3150/15-BEJ789
TROPP, J. A. (2012). User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12 389–434. MR2946459 https://doi.org/10.1007/s10208-011-9099-z
VERZELEN, N. and ARIAS-CASTRO, E. (2015). Community detection in sparse random networks. Ann. Appl. Probab. 25 3465–3510. MR3404642 https://doi.org/10.1214/14-AAP1080
VERZELEN, N. and ARIAS-CASTRO, E. (2017). Detection and feature selection in sparse mixture models. Ann. Statist. 45 1920–1950. MR3718157 https://doi.org/10.1214/16-AOS1513
VISHWANATHAN, S. V. N., SCHRAUDOLPH, N. N., KONDOR, R. and BORGWARDT, K. M. (2010). Graph kernels. J. Mach. Learn. Res. 11 1201–1242. MR2645450 https://doi.org/10.1093/chemse/bjq147