karateclub

Overlapping community detection

class EgoNetSplitter(resolution=1.0)[source]

An implementation of “Ego-Splitting” from the KDD ‘17 paper “Ego-Splitting Framework: from Non-Overlapping to Overlapping Clusters”. The tool first creates the ego-nets of nodes. A persona-graph is created which is clustered by the Louvain method. The resulting overlapping cluster memberships are stored as a dictionary.

Parameters:resolution (float) – Resolution parameter of Python Louvain. Default 1.0.
fit(graph)[source]

Fitting an Ego-Splitter clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_memberships()[source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dictionary of lists) - Cluster memberships.
class DANMF(layers=[32, 8], pre_iterations=100, iterations=100, seed=42, lamb=0.01)[source]

An implementation of “DANMF” from the CIKM ‘18 paper “Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection”. The procedure uses telescopic non-negative matrix factorization in order to learn a cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.

Parameters:
  • layers (list) – Autoencoder layer sizes in a list of integers. Default [32, 8].
  • pre_iterations (int) – Number of pre-training epochs. Default 100.
  • iterations (int) – Number of training epochs. Default 100.
  • seed (int) – Random seed for weight initializations. Default 42.
  • lamb (float) – Regularization parameter. Default 0.01.
fit(graph)[source]

Fitting a DANMF clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_embedding()[source]

Getting the bottleneck layer embedding.

Return types:
  • embedding (Numpy array) - The bottleneck layer embedding of nodes.
get_memberships()[source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict): Node cluster memberships.
class NNSED(dimensions=32, iterations=10, seed=42)[source]

An implementation of “NNSED” from the CIKM ‘17 paper “A Non-negative Symmetric Encoder-Decoder Approach for Community Detection”. The procedure uses non-negative matrix factorization in order to learn an unnormalized cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.

Parameters:
  • layers (int) – Embedding layer size. Default is 32.
  • iterations (int) – Number of training epochs. Default 10.
  • seed (int) – Random seed for weight initializations. Default 42.
fit(graph)[source]

Fitting an NNSED clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_embedding()[source]

Getting the bottleneck layer embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
get_memberships()[source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.
class MNMF(dimensions=128, clusters=10, lambd=0.2, alpha=0.05, beta=0.05, iterations=200, lower_control=1e-15, eta=5.0)[source]

An implementation of “M-NMF” from the AAAI ‘17 paper “Community Preserving Network Embedding”. The procedure uses joint non-negative matrix factorization with modularity based regularization in order to learn a cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.

Parameters:
  • dimensions (int) – Number of dimensions. Default is 128.
  • clusters (int) – Number of clusters. Default is 10.
  • lambd (float) – KKT penalty. Default is 0.2
  • alpha (float) – Clustering penalty. Default is 0.05.
  • beta (float) – Modularity regularization penalty. Default is 0.05.
  • iterations (int) – Number of power iterations. Default is 200.
  • lower_control (float) – Floating point overflow control. Default is 10**-15.
  • eta (float) – Similarity mixing parameter. Default is 5.0.
fit(graph)[source]

Fitting an M-NMF clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_cluster_centers()[source]

Getting the node embedding.

Return types:
  • centers (Numpy array) - The cluster centers.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
get_memberships()[source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.
class BigClam(dimensions=8, iterations=50, learning_rate=0.005)[source]

An implementation of “BigClam” from the WSDM ‘13 paper “Overlapping Community Detection at Scale: A Non-negative Matrix Factorization Approach”. The procedure uses gradient ascent to create an embedding which is used for deciding the node-cluster affiliations.

Parameters:
  • dimensions (int) – Number of embedding dimensions. Default 8.
  • iterations (int) – Number of training iterations. Default 50.
  • learning_rate (float) – Gradient ascent learning rate. Default is 0.005.
fit(graph)[source]

Fitting a BigClam clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
get_memberships()[source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.
class SymmNMF(dimensions=32, iterations=200, rho=100.0)[source]

An implementation of “Symm-NMF” from the SDM‘12 paper “Symmetric Nonnegative Matrix Factorization for Graph Clustering”. The procedure decomposed the second power od the normalized adjacency matrix with an ADMM based non-negative matrix factorization based technique. This results in a node embedding and each node is associated with an embedding factor in the created latent space.

Parameters:
  • dimensions (int) – Number of dimensions. Default is 32.
  • iterations (int) – Number of power iterations. Default is 200.
  • rho (float) – ADMM tuning parameter. Default is 100.0.
fit(graph)[source]

Fitting a Symm-NMF clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
get_memberships()[source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.

Non-overlapping community detection

class EdMot(component_count=2, cutoff=50)[source]

An implementation of “Edge Motif Clustering” from the KDD ‘19 paper “EdMot: An Edge Enhancement Approach for Motif-aware Community Detection”. The tool first creates the graph of higher order motifs. This graph is clustered by the Louvain method. The resulting cluster memberships are stored as a dictionary.

Parameters:
  • component_count (int) – Number of extracted motif hypergraph components. Default is 2.
  • cutoff (int) – Motif edge cut-off value. Default is 50.
fit(graph)[source]

Fitting an Edge Motif clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_memberships()[source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dictionary of ints) - Cluster memberships.
class SCD(iterations=25, eps=1e-06)[source]

An implementation of “SCD” from the WWW ‘14 paper “High Quality, Scalable and Parallel Community Detection for Large Real Graphs”. The procedure greedily optimizes the approximate weighted community clustering metric. First, clusters are built around highly clustered nodes. Second, we refine the initial partition by using the approximate WCC. These refinements happen for the whole vertex set.

Parameters:
  • iterations (int) – Refinemeent iterations. Default is 25.
  • eps (float) – Epsilon score for zero division correction. Default is 10**-6.
fit(graph)[source]

Fitting a Label Propagation clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_memberships()[source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.
class LabelPropagation(seed=42, iterations=100)[source]

An implementation of “Label Propagation Clustering” from the Physical Review ‘07 paper “Near Linear Time Algorithm to Detect Community Structures in Large-Scale Networks”. The tool executes a series of label propagations with unique labels. The final labels are used as cluster memberships.

Parameters:
  • seed (int) – Random seed. Default is 42.
  • iterations (int) – Propagation iterations. Default is 100.
fit(graph)[source]

Fitting a Label Propagation clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_memberships()[source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.

Neighbourhood-based node embedding

class Diff2Vec(diffusion_number=10, diffusion_cover=80, dimensions=128, workers=4, window_size=5, epochs=1, learning_rate=0.05, min_count=1)[source]

An implementation of “Diff2Vec” from the CompleNet ‘18 paper “Diff2Vec: Fast Sequence Based Embedding with Diffusion Graphs”. The procedure creates diffusion trees from every source node in the graph. These graphs are linearized by a directed Eulerian walk, the walks are used for running the skip-gram algorithm the learn node level neighbourhood based embeddings.

Parameters:
  • diffusion_number (int) – Number of diffusions. Default is 10.
  • diffusion_cover (int) – Number of nodes in diffusion. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 5.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • min_count (int) – Minimal count of node occurences. Default is 1.
fit(graph)[source]

Fitting a Diff2Vec model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class NodeSketch(dimensions=32, iterations=2, decay=0.01)[source]

An implementation of “NodeSketch” from the KDD ‘19 paper “NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching”. The procedure starts by sketching the self-loop-augmented adjacency matrix of the graph to output low-order node embeddings, and then recursively generates k-order node embeddings based on the self-loop-augmented adjacency matrix and (k-1)-order node embeddings.

Parameters:
  • dimensions (int) – Embedding dimensions. Default is 32
  • iterations (int) – Number of iterations (sketch order minus one). Default is 2
  • decay (float) – Exponential decay rate. Default is 0.01
fit(graph)[source]

Fitting a NodeSketch model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class NetMF(dimensions=32, iteration=10, order=2, negative_samples=1, seed=42)[source]

An implementation of “NetMF” from the WSDM ‘18 paper “Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and Node2Vec”. The procedure uses sparse truncated SVD to learn embeddings for the pooled powers of the PMI matrix computed from powers of the normalized adjacency matrix.

Parameters:
  • dimensions (int) – Number of embedding dimension. Default is 32.
  • iteration (int) – Number of SVD iterations. Default is 10.
  • order (int) – Number of PMI matrix powers. Default is 2.
  • negative_samples (in) – Number of negative samples. Default is 1.
  • seed (int) – SVD random seed. Default is 42.
fit(graph)[source]

Fitting a NetMF model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class BoostNE(dimensions=8, iterations=16, order=2, alpha=0.01)[source]

An implementation of “BoostNE” from the ASONAM ‘19 paper “Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation”. The procedure uses non-negative matrix factorization iteratively to decompose the residuals obtained by previous factorization models. The base target matrix is a pooled sum of adjacency matrix powers.

Parameters:
  • dimensions (int) – Number of individual embedding dimensions. Default is 8.
  • iterations (int) – Number of boosting iterations. Default is 16.
  • order (int) – Number of adjacency matrix powers. Default is 2.
  • alpha (float) – NMF regularization parameter. Default is 0.01.
fit(graph)[source]

Fitting a BoostNE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class Walklets(walk_number=10, walk_length=80, dimensions=32, workers=4, window_size=4, epochs=1, learning_rate=0.05, min_count=1)[source]

An implementation of “Walklets” from the ASONAM ‘17 paper “Don’t Walk, Skip! Online Learning of Multi-scale Network Embeddings”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by individual normalized adjacency matrix powers. These are all decomposed by an approximate factorization technique and the embeddings are concatenated together.

Parameters:
  • walk_number (int) – Number of random walks. Default is 10.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 32.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 4.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • min_count (int) – Minimal count of node occurences. Default is 1.
fit(graph)[source]

Fitting a Walklets model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class GraRep(dimensions=32, iteration=10, order=5, seed=42)[source]

An implementation of “GraRep” from the CIKM ‘15 paper “GraRep: Learning Graph Representations with Global Structural Information”. The procedure uses sparse truncated SVD to learn embeddings for the powers of the PMI matrix computed from powers of the normalized adjacency matrix.

Parameters:
  • dimensions (int) – Number of individual embedding dimensions. Default is 32.
  • iteration (int) – Number of SVD iterations. Default is 10.
  • order (int) – Number of PMI matrix powers. Default is 5.
  • seed (int) – SVD random seed. Default is 42.
fit(graph)[source]

Fitting a GraRep model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class DeepWalk(walk_number=10, walk_length=80, dimensions=128, workers=4, window_size=5, epochs=1, learning_rate=0.05, min_count=1)[source]

An implementation of “DeepWalk” from the KDD ‘14 paper “DeepWalk: Online Learning of Social Representations”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by pooling normalized adjacency matrix powers. This matrix is decomposed by an approximate factorization technique.

Parameters:
  • walk_number (int) – Number of random walks. Default is 10.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 5.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • min_count (int) – Minimal count of node occurences. Default is 1.
fit(graph)[source]

Fitting a DeepWalk model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class NMFADMM(dimensions=32, iterations=100, rho=1.0)[source]

An implementation of “NMF-ADMM” from the ICASSP ‘14 paper “Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence”. The procedure learns an embedding of the normalized adjacency matrix with by using the alternating direction method of multipliers to solve a non negative matrix factorization problem.

Parameters:
  • dimensions (int) – Number of individual embedding dimensions. Default is 32.
  • iterations (int) – Number of ADMM iterations. Default is 100.
  • rho (float) – ADMM Tuning parameter. Default is 1.0.
fit(graph)[source]

Fitting an NMF model on the normalized adjacency matrix with ADMM.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class LaplacianEigenmaps(dimensions=128)[source]

An implementation of “Laplacian Eigenmaps” from the NIPS ‘01 paper “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”. The procedure extracts the eigenvectors corresponding to the largest values of the graph Laplacian. These vectors are used as the node embedding.

Parameters:dimensions (int) – Dimensionality of embedding. Default is 128.
fit(graph)[source]

Fitting a Laplacian EigenMaps model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.

Structural node embedding

class GraphWave(sample_number=200, step_size=0.1, heat_coefficient=1.0, approximation=100, mechanism='approximate', switch=1000)[source]

An implementation of “GraphWave” from the KDD ‘18 paper “Learning Structural Node Embeddings Via Diffusion Wavelets”. The procedure first calculates the graph wavelets using a heat kernel. The wavelets are treated as probability distributions over nodes from a source node. Using these the characteristic function is evaluated at certain gird points to learn structural node embeddings of the vertices.

Parameters:
  • sample_number (int) – Number of evaluation points. Default is 200.
  • step_size (float) – Grid point step size. Default is 0.1.
  • heat_coefficient (float) – Heat kernel coefficient. Default is 1.0.
  • approximation (int) – Chebyshev polynomial order. Default is 100.
  • mechanism (str) – Wavelet calculation method ‘exact’ or ‘approximate’. Default is ‘approximate’.
  • switch (int) – Vertex cardinality when the wavelet calculation method switches to approximation. Default is 1000.
fit(graph)[source]

Fitting a GraphWave model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class Role2Vec(walk_number=10, walk_length=80, dimensions=128, workers=4, window_size=2, epochs=1, learning_rate=0.05, down_sampling=0.0001, min_count=10, wl_iterations=2)[source]

An implementation of “Role2vec” from the IJCAI ‘18 paper “Learning Role-based Graph Embeddings”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by multiplying the pooled adjacency power matrix with a structural feature matrix (in this case Weisfeiler-Lehman features). This way one gets structural node embeddings.

Parameters:
  • walk_number (int) – Number of random walks. Default is 10.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 2.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • down_sampling (float) – Down sampling frequency. Default is 0.0001.
  • min_count (int) – Minimal count of feature occurences. Default is 10.
  • wl_iterations (int) – Number of Weisfeiler-Lehman hashing iterations. Default is 2.
fit(graph)[source]

Fitting a Role2vec model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.

Attributed node embedding

class MUSAE(walk_number=5, walk_length=80, dimensions=32, workers=4, window_size=3, epochs=5, learning_rate=0.05, down_sampling=0.0001, min_count=1)[source]

An implementation of “MUSAE” from the Arxiv ‘19 paper “MUSAE: Multi-Scale Attributed Node Embedding”. The procedure does attributed random walks to approximate the adjacency matrix power node feature matrix products. The matrices are decomposed implicitly by a Skip-Gram style optimizer. The individual embeddings are concatenated together to form a multi-scale attributed node embedding. This way the feature distributions at different scales are separable.

Parameters:
  • walk_number (int) – Number of random walks. Default is 10.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 32.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 3.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • down_sampling (float) – Down sampling rate in the corpus. Default is 0.0001.
  • min_count (int) – Minimal count of node occurences. Default is 1.
fit(graph, X)[source]

Fitting a MUSAE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO array) - The binary matrix of node features.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class SINE(walk_number=10, walk_length=80, dimensions=128, workers=4, window_size=5, epochs=1, learning_rate=0.05, min_count=1)[source]

An implementation of “SINE” from the ICDM ‘18 paper “SINE: Scalable Incomplete Network Embedding”. The procedure implicitly factorizes a joint adjacency matrix power and feature matrix. The decomposition happens on truncated random walks and the adjacency matrix powers are pooled together.

Parameters:
  • walk_number (int) – Number of random walks. Default is 10.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 5.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • min_count (int) – Minimal count of node occurences. Default is 1.
fit(graph, X)[source]

Fitting a SINE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO array) - The matrix of node features.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class BANE(dimensions=32, svd_iterations=20, seed=42, alpha=0.3, iterations=100, binarization_iterations=20)[source]

An implementation of “BANE” from the ICDM ‘18 paper “Binarized Attributed Network Embedding Class”. The procedure first calculates the truncated SVD of an adjacency - feature matrix product. This matrix is further decomposed by a binary CCD based technique.

Parameters:
  • dimensions (int) – Number of embedding dimensions. Default is 32.
  • svd_iterations (int) – SVD iteration count. Default is 20.
  • seed (int) – Random seed. Default is 42.
  • alpha (float) – Kernel matrix inversion parameter. Default is 0.3.
  • iterations (int) – Matrix decomposition iterations. Default is 100.
  • binarization_iterations (int) – Binarization iterations. Default is 20.
fit(graph, X)[source]

Fitting a BANE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO or Numpy array) - The matrix of node features.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class TENE(dimensions=32, lower_control=1e-15, alpha=0.1, beta=0.1, iterations=200)[source]

An implementation of “TENE” from the ICPR ‘18 paper “Enhanced Network Embedding with Text Information”. The procedure jointly factorizes the adjacency and node feature matrices using alternating least squares.

Parameters:
  • dimensions (int) – Number of embedding dimensions. Default is 32.
  • lower_control (float) – Embedding score minimal value. Default is 10**-15.
  • alpha (float) – Adjacency matrix regularization coefficient. Default is 0.1.
  • beta (float) – Feature matrix regularization coefficient. Default is 0.1.
  • iterations (int) – ALS iterations. Default is 200.
fit(graph, T)[source]

Fitting a TENE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • T (Scipy COO or Numpy array) - The matrix of node features.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class TADW(dimensions=32, reduction_dimensions=64, svd_iterations=20, seed=42, alpha=0.01, iterations=10, lambd=10.0)[source]

An implementation of “TADW” from the IJCAI ‘15 paper “Network Representation Learning with Rich Text Information”. The procedure uses the node attribute matrix with a factorization matrix to reproduce a power of the adjacency matrix to create representations.

Parameters:
  • dimensions (int) – Number of embedding dimensions. Default is 32.
  • reduction_dimensions (int) – SVD reduction dimensions. Default is 64.
  • svd_iterations (int) – SVD iteration count. Default is 20.
  • seed (int) – Random seed. Default is 42.
  • alpha (float) – Learning rate. Default is 0.01.
  • iterations (int) – Matrix decomposition iterations. Default is 10.
  • lambd (float) – Regularization coefficient. Default is 10.0.
fit(graph, X)[source]

Fitting a TADW model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO or Numpy array) - The matrix of node features.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class FSCNMF(dimensions=32, lower_control=1e-15, iterations=500, alpha_1=1000.0, alpha_2=1.0, alpha_3=1.0, beta_1=1000.0, beta_2=1.0, beta_3=1.0)[source]

An implementation of “FCNMF” from the Arxiv ‘18 paper “Fusing Structure and Content via Non-negative Matrix Factorization for Embedding Information Networks”. The procedure uses a joint matrix factorization technique on the adjacency and feature matrices. The node and feature embeddings are co-regularized for alignment of the embedding spaces.

Parameters:
  • dimensions (int) – Number of embedding dimensions. Default is 32.
  • lower_control (float) – Embedding score minimal value. Default is 10**-15.
  • iterations (int) – Power iterations. Default is 200.
  • alpha_1 (float) – Alignment parameter for adjacency matrix. Default is 1000.0.
  • alpha_2 (float) – Adjacency basis regularization. Default is 1.0.
  • alpha_3 (float) – Adjacency features regularization. Default is 1.0.
  • beta_1 (float) – Alignment parameter for feature matrix. Default is 1000.0.
  • beta_2 (float) – Attribute basis regularization. Default is 1.0.
fit(graph, X)[source]

Fitting a TENE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO or Numpy array) - The matrix of node features.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.

Meta node embedding

class NEU(L1=0.5, L2=0.25, T=1)[source]

An implementation of “NEU” from the IJCAI 17 paper “Fast Network Embedding Enhancement via High Order Proximity Approximation”. The procedure uses an arbitrary embedding and augments it by higher order proximities wiht a recursive meta learning algorithm. :param L1: Weight of lower order proximities. Defauls is 0.5 :type L1: float :param L2: Weight of higer order proximities. Default is 0.25. :type L2: float :param T: Number of iterations. Default is 1. :type T: int

fit(graph, model)[source]

Fitting a model and performing NEU.

Parameters:
  • graph * (*) –
  • model * (*) –
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.

Whole graph embedding

class GL2Vec(wl_iterations=2, dimensions=128, workers=4, down_sampling=0.0001, epochs=10, learning_rate=0.025, min_count=5, seed=42)[source]

An implementation of “GL2Vec” from the ICONIP ‘19 paper “GL2vec: Graph Embedding Enriched by Line Graphs with Edge Features”. First, the algorithm creates the line graph of each graph in the graph dataset. The procedure creates Weisfeiler-Lehman tree features for nodes in graphs. Using these features a document (graph) - feature co-occurence matrix is decomposed in order to generate representations for the graphs.

The procedure assumes that nodes have no string feature present and the WL-hashing defaults to the degree centrality. However, if a node feature with the key “feature” is supported for the nodes the feature extraction happens based on the values of this key.

Parameters:
  • wl_iterations (int) – Number of Weisfeiler-Lehman iterations. Default is 2.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • down_sampling (float) – Down sampling frequency. Default is 0.0001.
  • epochs (int) – Number of epochs. Default is 10.
  • learning_rate (float) – HogWild! learning rate. Default is 0.025.
  • min_count (int) – Minimal count of graph feature occurences. Default is 5.
  • seed (int) – Random seed for the model. Default is 42.
fit(graphs)[source]

Fitting a GL2Vec model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding()[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
class NetLSD(scale_min=-2.0, scale_max=2.0, scale_steps=250, approximations=200)[source]

An implementation of “NetLSD” from the KDD ‘18 paper “NetLSD: Hearing the Shape of a Graph”. The procedure calculate the heat kernel trace of the normalized Laplacian matrix over a vector of time scales. If the matrix is large it switches to an approximation of the eigenvalues.

Parameters:
  • scale_min (int) – Time scale interval minimum. Default is -2.0.
  • scale_max (int) – Time scale interval maximum. Default is 2.0.
  • scale_steps (int) – Number of steps in time scale. Default is 250.
  • scale_approximations (int) – Number of eigenvalue approximations. Default is 200.
fit(graphs)[source]

Fitting a NetLSD model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding()[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
class SF(dimensions=128)[source]

An implementation of “SF” from the NeurIPS Relational Representation Learning Workshop ‘18 paper “A Simple Baseline Algorithm for Graph Classification”. The procedure calculates the k lowest egeinvalues of the normalized Laplacian. If the graph has a lower number of eigenvalues than k the representation is padded with zeros.

Parameters:dimensions (int) – Number of lowest eigenvalues. Default is 128.
fit(graphs)[source]

Fitting an SF model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding()[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
class FGSD(hist_bins=200, hist_range=20)[source]

An implementation of “FGSD” from the NeurIPS ‘17 paper “Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs”. The procedure calculates the Moore-Penrose spectrum of the normalized Laplacian. Using this spectrum the histogram of the spectral features is used as a whole graph representation.

Parameters:
  • hist_bins (int) – Number of histogram bins. Default is 200.
  • hist_range (int) – Histogram range considered. Default is 20.
fit(graphs)[source]

Fitting a FGSD model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding()[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
class Graph2Vec(wl_iterations=2, attributed=False, dimensions=128, workers=4, down_sampling=0.0001, epochs=10, learning_rate=0.025, min_count=5, seed=42)[source]

An implementation of “Graph2Vec” from the MLGWorkshop ‘17 paper “Graph2Vec: Learning Distributed Representations of Graphs”. The procedure creates Weisfeiler-Lehman tree features for nodes in graphs. Using these features a document (graph) - feature co-occurence matrix is decomposed in order to generate representations for the graphs.

The procedure assumes that nodes have no string feature present and the WL-hashing defaults to the degree centrality. However, if a node feature with the key “feature” is supported for the nodes the feature extraction happens based on the values of this key.

Parameters:
  • wl_iterations (int) – Number of Weisfeiler-Lehman iterations. Default is 2.
  • attributed (bool) – Presence of graph attributes. Default is False.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • down_sampling (float) – Down sampling frequency. Default is 0.0001.
  • epochs (int) – Number of epochs. Default is 10.
  • learning_rate (float) – HogWild! learning rate. Default is 0.025.
  • min_count (int) – Minimal count of graph feature occurences. Default is 5.
  • seed (int) – Random seed for the model. Default is 42.
fit(graphs)[source]

Fitting a Graph2Vec model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding()[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.