karateclub¶
Contents
Overlapping community detection¶

class
EgoNetSplitter
(resolution=1.0)[source]¶ An implementation of “EgoSplitting” from the KDD ‘17 paper “EgoSplitting Framework: from NonOverlapping to Overlapping Clusters”. The tool first creates the egonets of nodes. A personagraph is created which is clustered by the Louvain method. The resulting overlapping cluster memberships are stored as a dictionary.
Parameters: resolution (float) – Resolution parameter of Python Louvain. Default 1.0.

class
DANMF
(layers=[32, 8], pre_iterations=100, iterations=100, seed=42, lamb=0.01)[source]¶ An implementation of “DANMF” from the CIKM ‘18 paper “Deep Autoencoderlike Nonnegative Matrix Factorization for Community Detection”. The procedure uses telescopic nonnegative matrix factorization in order to learn a cluster membership distribution over nodes. The method can be used in an overlapping and nonoverlapping way.
Parameters:  layers (list) – Autoencoder layer sizes in a list of integers. Default [32, 8].
 pre_iterations (int) – Number of pretraining epochs. Default 100.
 iterations (int) – Number of training epochs. Default 100.
 seed (int) – Random seed for weight initializations. Default 42.
 lamb (float) – Regularization parameter. Default 0.01.

fit
(graph)[source]¶ Fitting a DANMF clustering model.
 Arg types:
 graph (NetworkX graph)  The graph to be clustered.

class
NNSED
(dimensions=32, iterations=10, seed=42)[source]¶ An implementation of “NNSED” from the CIKM ‘17 paper “A Nonnegative Symmetric EncoderDecoder Approach for Community Detection”. The procedure uses nonnegative matrix factorization in order to learn an unnormalized cluster membership distribution over nodes. The method can be used in an overlapping and nonoverlapping way.
Parameters: 
fit
(graph)[source]¶ Fitting an NNSED clustering model.
 Arg types:
 graph (NetworkX graph)  The graph to be clustered.


class
MNMF
(dimensions=128, clusters=10, lambd=0.2, alpha=0.05, beta=0.05, iterations=200, lower_control=1e15, eta=5.0)[source]¶ An implementation of “MNMF” from the AAAI ‘17 paper “Community Preserving Network Embedding”. The procedure uses joint nonnegative matrix factorization with modularity based regularization in order to learn a cluster membership distribution over nodes. The method can be used in an overlapping and nonoverlapping way.
Parameters:  dimensions (int) – Number of dimensions. Default is 128.
 clusters (int) – Number of clusters. Default is 10.
 lambd (float) – KKT penalty. Default is 0.2
 alpha (float) – Clustering penalty. Default is 0.05.
 beta (float) – Modularity regularization penalty. Default is 0.05.
 iterations (int) – Number of power iterations. Default is 200.
 lower_control (float) – Floating point overflow control. Default is 10**15.
 eta (float) – Similarity mixing parameter. Default is 5.0.

fit
(graph)[source]¶ Fitting an MNMF clustering model.
 Arg types:
 graph (NetworkX graph)  The graph to be clustered.

get_cluster_centers
()[source]¶ Getting the node embedding.
 Return types:
 centers (Numpy array)  The cluster centers.

class
BigClam
(dimensions=8, iterations=50, learning_rate=0.005)[source]¶ An implementation of “BigClam” from the WSDM ‘13 paper “Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach”. The procedure uses gradient ascent to create an embedding which is used for deciding the nodecluster affiliations.
Parameters: 
fit
(graph)[source]¶ Fitting a BigClam clustering model.
 Arg types:
 graph (NetworkX graph)  The graph to be clustered.


class
SymmNMF
(dimensions=32, iterations=200, rho=100.0)[source]¶ An implementation of “SymmNMF” from the SDM‘12 paper “Symmetric Nonnegative Matrix Factorization for Graph Clustering”. The procedure decomposed the second power od the normalized adjacency matrix with an ADMM based nonnegative matrix factorization based technique. This results in a node embedding and each node is associated with an embedding factor in the created latent space.
Parameters: 
fit
(graph)[source]¶ Fitting a SymmNMF clustering model.
 Arg types:
 graph (NetworkX graph)  The graph to be clustered.

Nonoverlapping community detection¶

class
EdMot
(component_count=2, cutoff=50)[source]¶ An implementation of “Edge Motif Clustering” from the KDD ‘19 paper “EdMot: An Edge Enhancement Approach for Motifaware Community Detection”. The tool first creates the graph of higher order motifs. This graph is clustered by the Louvain method. The resulting cluster memberships are stored as a dictionary.
Parameters:

class
SCD
(iterations=25, eps=1e06)[source]¶ An implementation of “SCD” from the WWW ‘14 paper “High Quality, Scalable and Parallel Community Detection for Large Real Graphs”. The procedure greedily optimizes the approximate weighted community clustering metric. First, clusters are built around highly clustered nodes. Second, we refine the initial partition by using the approximate WCC. These refinements happen for the whole vertex set.
Parameters:

class
LabelPropagation
(seed=42, iterations=100)[source]¶ An implementation of “Label Propagation Clustering” from the Physical Review ‘07 paper “Near Linear Time Algorithm to Detect Community Structures in LargeScale Networks”. The tool executes a series of label propagations with unique labels. The final labels are used as cluster memberships.
Parameters:
Neighbourhoodbased node embedding¶

class
Diff2Vec
(diffusion_number=10, diffusion_cover=80, dimensions=128, workers=4, window_size=5, epochs=1, learning_rate=0.05, min_count=1)[source]¶ An implementation of “Diff2Vec” from the CompleNet ‘18 paper “Diff2Vec: Fast Sequence Based Embedding with Diffusion Graphs”. The procedure creates diffusion trees from every source node in the graph. These graphs are linearized by a directed Eulerian walk, the walks are used for running the skipgram algorithm the learn node level neighbourhood based embeddings.
Parameters:  diffusion_number (int) – Number of diffusions. Default is 10.
 diffusion_cover (int) – Number of nodes in diffusion. Default is 80.
 dimensions (int) – Dimensionality of embedding. Default is 128.
 workers (int) – Number of cores. Default is 4.
 window_size (int) – Matrix power order. Default is 5.
 epochs (int) – Number of epochs. Default is 1.
 learning_rate (float) – HogWild! learning rate. Default is 0.05.
 min_count (int) – Minimal count of node occurences. Default is 1.

class
NodeSketch
(dimensions=32, iterations=2, decay=0.01)[source]¶ An implementation of “NodeSketch” from the KDD ‘19 paper “NodeSketch: HighlyEfficient Graph Embeddings via Recursive Sketching”. The procedure starts by sketching the selfloopaugmented adjacency matrix of the graph to output loworder node embeddings, and then recursively generates korder node embeddings based on the selfloopaugmented adjacency matrix and (k1)order node embeddings.
Parameters:

class
NetMF
(dimensions=32, iteration=10, order=2, negative_samples=1, seed=42)[source]¶ An implementation of “NetMF” from the WSDM ‘18 paper “Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and Node2Vec”. The procedure uses sparse truncated SVD to learn embeddings for the pooled powers of the PMI matrix computed from powers of the normalized adjacency matrix.
Parameters:

class
BoostNE
(dimensions=8, iterations=16, order=2, alpha=0.01)[source]¶ An implementation of “BoostNE” from the ASONAM ‘19 paper “MultiLevel Network Embedding with Boosted LowRank Matrix Approximation”. The procedure uses nonnegative matrix factorization iteratively to decompose the residuals obtained by previous factorization models. The base target matrix is a pooled sum of adjacency matrix powers.
Parameters:

class
Walklets
(walk_number=10, walk_length=80, dimensions=32, workers=4, window_size=4, epochs=1, learning_rate=0.05, min_count=1)[source]¶ An implementation of “Walklets” from the ASONAM ‘17 paper “Don’t Walk, Skip! Online Learning of Multiscale Network Embeddings”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by individual normalized adjacency matrix powers. These are all decomposed by an approximate factorization technique and the embeddings are concatenated together.
Parameters:  walk_number (int) – Number of random walks. Default is 10.
 walk_length (int) – Length of random walks. Default is 80.
 dimensions (int) – Dimensionality of embedding. Default is 32.
 workers (int) – Number of cores. Default is 4.
 window_size (int) – Matrix power order. Default is 4.
 epochs (int) – Number of epochs. Default is 1.
 learning_rate (float) – HogWild! learning rate. Default is 0.05.
 min_count (int) – Minimal count of node occurences. Default is 1.

class
GraRep
(dimensions=32, iteration=10, order=5, seed=42)[source]¶ An implementation of “GraRep” from the CIKM ‘15 paper “GraRep: Learning Graph Representations with Global Structural Information”. The procedure uses sparse truncated SVD to learn embeddings for the powers of the PMI matrix computed from powers of the normalized adjacency matrix.
Parameters:

class
DeepWalk
(walk_number=10, walk_length=80, dimensions=128, workers=4, window_size=5, epochs=1, learning_rate=0.05, min_count=1)[source]¶ An implementation of “DeepWalk” from the KDD ‘14 paper “DeepWalk: Online Learning of Social Representations”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by pooling normalized adjacency matrix powers. This matrix is decomposed by an approximate factorization technique.
Parameters:  walk_number (int) – Number of random walks. Default is 10.
 walk_length (int) – Length of random walks. Default is 80.
 dimensions (int) – Dimensionality of embedding. Default is 128.
 workers (int) – Number of cores. Default is 4.
 window_size (int) – Matrix power order. Default is 5.
 epochs (int) – Number of epochs. Default is 1.
 learning_rate (float) – HogWild! learning rate. Default is 0.05.
 min_count (int) – Minimal count of node occurences. Default is 1.

class
NMFADMM
(dimensions=32, iterations=100, rho=1.0)[source]¶ An implementation of “NMFADMM” from the ICASSP ‘14 paper “Alternating Direction Method of Multipliers for NonNegative Matrix Factorization with the BetaDivergence”. The procedure learns an embedding of the normalized adjacency matrix with by using the alternating direction method of multipliers to solve a non negative matrix factorization problem.
Parameters:

class
LaplacianEigenmaps
(dimensions=128)[source]¶ An implementation of “Laplacian Eigenmaps” from the NIPS ‘01 paper “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”. The procedure extracts the eigenvectors corresponding to the largest values of the graph Laplacian. These vectors are used as the node embedding.
Parameters: dimensions (int) – Dimensionality of embedding. Default is 128.
Structural node embedding¶

class
GraphWave
(sample_number=200, step_size=0.1, heat_coefficient=1.0, approximation=100, mechanism='approximate', switch=1000)[source]¶ An implementation of “GraphWave” from the KDD ‘18 paper “Learning Structural Node Embeddings Via Diffusion Wavelets”. The procedure first calculates the graph wavelets using a heat kernel. The wavelets are treated as probability distributions over nodes from a source node. Using these the characteristic function is evaluated at certain gird points to learn structural node embeddings of the vertices.
Parameters:  sample_number (int) – Number of evaluation points. Default is 200.
 step_size (float) – Grid point step size. Default is 0.1.
 heat_coefficient (float) – Heat kernel coefficient. Default is 1.0.
 approximation (int) – Chebyshev polynomial order. Default is 100.
 mechanism (str) – Wavelet calculation method ‘exact’ or ‘approximate’. Default is ‘approximate’.
 switch (int) – Vertex cardinality when the wavelet calculation method switches to approximation. Default is 1000.

class
Role2Vec
(walk_number=10, walk_length=80, dimensions=128, workers=4, window_size=2, epochs=1, learning_rate=0.05, down_sampling=0.0001, min_count=10, wl_iterations=2)[source]¶ An implementation of “Role2vec” from the IJCAI ‘18 paper “Learning Rolebased Graph Embeddings”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by multiplying the pooled adjacency power matrix with a structural feature matrix (in this case WeisfeilerLehman features). This way one gets structural node embeddings.
Parameters:  walk_number (int) – Number of random walks. Default is 10.
 walk_length (int) – Length of random walks. Default is 80.
 dimensions (int) – Dimensionality of embedding. Default is 128.
 workers (int) – Number of cores. Default is 4.
 window_size (int) – Matrix power order. Default is 2.
 epochs (int) – Number of epochs. Default is 1.
 learning_rate (float) – HogWild! learning rate. Default is 0.05.
 down_sampling (float) – Down sampling frequency. Default is 0.0001.
 min_count (int) – Minimal count of feature occurences. Default is 10.
 wl_iterations (int) – Number of WeisfeilerLehman hashing iterations. Default is 2.
Attributed node embedding¶

class
MUSAE
(walk_number=5, walk_length=80, dimensions=32, workers=4, window_size=3, epochs=5, learning_rate=0.05, down_sampling=0.0001, min_count=1)[source]¶ An implementation of “MUSAE” from the Arxiv ‘19 paper “MUSAE: MultiScale Attributed Node Embedding”. The procedure does attributed random walks to approximate the adjacency matrix power node feature matrix products. The matrices are decomposed implicitly by a SkipGram style optimizer. The individual embeddings are concatenated together to form a multiscale attributed node embedding. This way the feature distributions at different scales are separable.
Parameters:  walk_number (int) – Number of random walks. Default is 10.
 walk_length (int) – Length of random walks. Default is 80.
 dimensions (int) – Dimensionality of embedding. Default is 32.
 workers (int) – Number of cores. Default is 4.
 window_size (int) – Matrix power order. Default is 3.
 epochs (int) – Number of epochs. Default is 1.
 learning_rate (float) – HogWild! learning rate. Default is 0.05.
 down_sampling (float) – Down sampling rate in the corpus. Default is 0.0001.
 min_count (int) – Minimal count of node occurences. Default is 1.

class
SINE
(walk_number=10, walk_length=80, dimensions=128, workers=4, window_size=5, epochs=1, learning_rate=0.05, min_count=1)[source]¶ An implementation of “SINE” from the ICDM ‘18 paper “SINE: Scalable Incomplete Network Embedding”. The procedure implicitly factorizes a joint adjacency matrix power and feature matrix. The decomposition happens on truncated random walks and the adjacency matrix powers are pooled together.
Parameters:  walk_number (int) – Number of random walks. Default is 10.
 walk_length (int) – Length of random walks. Default is 80.
 dimensions (int) – Dimensionality of embedding. Default is 128.
 workers (int) – Number of cores. Default is 4.
 window_size (int) – Matrix power order. Default is 5.
 epochs (int) – Number of epochs. Default is 1.
 learning_rate (float) – HogWild! learning rate. Default is 0.05.
 min_count (int) – Minimal count of node occurences. Default is 1.

class
BANE
(dimensions=32, svd_iterations=20, seed=42, alpha=0.3, iterations=100, binarization_iterations=20)[source]¶ An implementation of “BANE” from the ICDM ‘18 paper “Binarized Attributed Network Embedding Class”. The procedure first calculates the truncated SVD of an adjacency  feature matrix product. This matrix is further decomposed by a binary CCD based technique.
Parameters:  dimensions (int) – Number of embedding dimensions. Default is 32.
 svd_iterations (int) – SVD iteration count. Default is 20.
 seed (int) – Random seed. Default is 42.
 alpha (float) – Kernel matrix inversion parameter. Default is 0.3.
 iterations (int) – Matrix decomposition iterations. Default is 100.
 binarization_iterations (int) – Binarization iterations. Default is 20.

class
TENE
(dimensions=32, lower_control=1e15, alpha=0.1, beta=0.1, iterations=200)[source]¶ An implementation of “TENE” from the ICPR ‘18 paper “Enhanced Network Embedding with Text Information”. The procedure jointly factorizes the adjacency and node feature matrices using alternating least squares.
Parameters:  dimensions (int) – Number of embedding dimensions. Default is 32.
 lower_control (float) – Embedding score minimal value. Default is 10**15.
 alpha (float) – Adjacency matrix regularization coefficient. Default is 0.1.
 beta (float) – Feature matrix regularization coefficient. Default is 0.1.
 iterations (int) – ALS iterations. Default is 200.

class
TADW
(dimensions=32, reduction_dimensions=64, svd_iterations=20, seed=42, alpha=0.01, iterations=10, lambd=10.0)[source]¶ An implementation of “TADW” from the IJCAI ‘15 paper “Network Representation Learning with Rich Text Information”. The procedure uses the node attribute matrix with a factorization matrix to reproduce a power of the adjacency matrix to create representations.
Parameters:  dimensions (int) – Number of embedding dimensions. Default is 32.
 reduction_dimensions (int) – SVD reduction dimensions. Default is 64.
 svd_iterations (int) – SVD iteration count. Default is 20.
 seed (int) – Random seed. Default is 42.
 alpha (float) – Learning rate. Default is 0.01.
 iterations (int) – Matrix decomposition iterations. Default is 10.
 lambd (float) – Regularization coefficient. Default is 10.0.

class
FSCNMF
(dimensions=32, lower_control=1e15, iterations=500, alpha_1=1000.0, alpha_2=1.0, alpha_3=1.0, beta_1=1000.0, beta_2=1.0, beta_3=1.0)[source]¶ An implementation of “FCNMF” from the Arxiv ‘18 paper “Fusing Structure and Content via Nonnegative Matrix Factorization for Embedding Information Networks”. The procedure uses a joint matrix factorization technique on the adjacency and feature matrices. The node and feature embeddings are coregularized for alignment of the embedding spaces.
Parameters:  dimensions (int) – Number of embedding dimensions. Default is 32.
 lower_control (float) – Embedding score minimal value. Default is 10**15.
 iterations (int) – Power iterations. Default is 200.
 alpha_1 (float) – Alignment parameter for adjacency matrix. Default is 1000.0.
 alpha_2 (float) – Adjacency basis regularization. Default is 1.0.
 alpha_3 (float) – Adjacency features regularization. Default is 1.0.
 beta_1 (float) – Alignment parameter for feature matrix. Default is 1000.0.
 beta_2 (float) – Attribute basis regularization. Default is 1.0.
Meta node embedding¶

class
NEU
(L1=0.5, L2=0.25, T=1)[source]¶ An implementation of “NEU” from the IJCAI 17 paper “Fast Network Embedding Enhancement via High Order Proximity Approximation”. The procedure uses an arbitrary embedding and augments it by higher order proximities wiht a recursive meta learning algorithm. :param L1: Weight of lower order proximities. Defauls is 0.5 :type L1: float :param L2: Weight of higer order proximities. Default is 0.25. :type L2: float :param T: Number of iterations. Default is 1. :type T: int
Whole graph embedding¶

class
GL2Vec
(wl_iterations=2, dimensions=128, workers=4, down_sampling=0.0001, epochs=10, learning_rate=0.025, min_count=5, seed=42)[source]¶ An implementation of “GL2Vec” from the ICONIP ‘19 paper “GL2vec: Graph Embedding Enriched by Line Graphs with Edge Features”. First, the algorithm creates the line graph of each graph in the graph dataset. The procedure creates WeisfeilerLehman tree features for nodes in graphs. Using these features a document (graph)  feature cooccurence matrix is decomposed in order to generate representations for the graphs.
The procedure assumes that nodes have no string feature present and the WLhashing defaults to the degree centrality. However, if a node feature with the key “feature” is supported for the nodes the feature extraction happens based on the values of this key.
Parameters:  wl_iterations (int) – Number of WeisfeilerLehman iterations. Default is 2.
 dimensions (int) – Dimensionality of embedding. Default is 128.
 workers (int) – Number of cores. Default is 4.
 down_sampling (float) – Down sampling frequency. Default is 0.0001.
 epochs (int) – Number of epochs. Default is 10.
 learning_rate (float) – HogWild! learning rate. Default is 0.025.
 min_count (int) – Minimal count of graph feature occurences. Default is 5.
 seed (int) – Random seed for the model. Default is 42.

class
NetLSD
(scale_min=2.0, scale_max=2.0, scale_steps=250, approximations=200)[source]¶ An implementation of “NetLSD” from the KDD ‘18 paper “NetLSD: Hearing the Shape of a Graph”. The procedure calculate the heat kernel trace of the normalized Laplacian matrix over a vector of time scales. If the matrix is large it switches to an approximation of the eigenvalues.
Parameters:

class
SF
(dimensions=128)[source]¶ An implementation of “SF” from the NeurIPS Relational Representation Learning Workshop ‘18 paper “A Simple Baseline Algorithm for Graph Classification”. The procedure calculates the k lowest egeinvalues of the normalized Laplacian. If the graph has a lower number of eigenvalues than k the representation is padded with zeros.
Parameters: dimensions (int) – Number of lowest eigenvalues. Default is 128.

class
FGSD
(hist_bins=200, hist_range=20)[source]¶ An implementation of “FGSD” from the NeurIPS ‘17 paper “Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs”. The procedure calculates the MoorePenrose spectrum of the normalized Laplacian. Using this spectrum the histogram of the spectral features is used as a whole graph representation.
Parameters:

class
Graph2Vec
(wl_iterations=2, attributed=False, dimensions=128, workers=4, down_sampling=0.0001, epochs=10, learning_rate=0.025, min_count=5, seed=42)[source]¶ An implementation of “Graph2Vec” from the MLGWorkshop ‘17 paper “Graph2Vec: Learning Distributed Representations of Graphs”. The procedure creates WeisfeilerLehman tree features for nodes in graphs. Using these features a document (graph)  feature cooccurence matrix is decomposed in order to generate representations for the graphs.
The procedure assumes that nodes have no string feature present and the WLhashing defaults to the degree centrality. However, if a node feature with the key “feature” is supported for the nodes the feature extraction happens based on the values of this key.
Parameters:  wl_iterations (int) – Number of WeisfeilerLehman iterations. Default is 2.
 attributed (bool) – Presence of graph attributes. Default is False.
 dimensions (int) – Dimensionality of embedding. Default is 128.
 workers (int) – Number of cores. Default is 4.
 down_sampling (float) – Down sampling frequency. Default is 0.0001.
 epochs (int) – Number of epochs. Default is 10.
 learning_rate (float) – HogWild! learning rate. Default is 0.025.
 min_count (int) – Minimal count of graph feature occurences. Default is 5.
 seed (int) – Random seed for the model. Default is 42.