karateclub

Overlapping community detection

class EgoNetSplitter(resolution: float = 1.0, seed: int = 42, weight: Optional[str] = 'weight')[source]

An implementation of “Ego-Splitting” from the KDD ‘17 paper “Ego-Splitting Framework: from Non-Overlapping to Overlapping Clusters”. The tool first creates the ego-nets of nodes. A persona-graph is created which is clustered by the Louvain method. The resulting overlapping cluster memberships are stored as a dictionary.

Parameters:
  • resolution (float) – Resolution parameter of Python Louvain. Default 1.0.
  • seed (int) – Random seed value. Default is 42.
  • weight (str) – The edge attribute key to use as weight. Defaults to ‘weight’. Specify None to force using an unweighted version of the graph.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting an Ego-Splitter clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_memberships() → Dict[int, List[int]][source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dictionary of lists) - Cluster memberships.
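
A minimal usage sketch; the Watts–Strogatz graph is only a stand-in for real data, and the top-level import path is assumed from the library’s convention:

    import networkx as nx
    from karateclub import EgoNetSplitter

    # karateclub models expect nodes indexed 0..n-1, which NetworkX generators provide
    graph = nx.newman_watts_strogatz_graph(100, 5, 0.3)

    model = EgoNetSplitter(resolution=1.0, seed=42)
    model.fit(graph)
    memberships = model.get_memberships()
    print(memberships[0])  # list of overlapping cluster ids for node 0
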
class DANMF(layers: List[int] = [32, 8], pre_iterations: int = 100, iterations: int = 100, seed: int = 42, lamb: float = 0.01)[source]

An implementation of “DANMF” from the CIKM ‘18 paper “Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection”. The procedure uses telescopic non-negative matrix factorization in order to learn a cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.

Parameters:
  • layers (list) – Autoencoder layer sizes in a list of integers. Default [32, 8].
  • pre_iterations (int) – Number of pre-training epochs. Default 100.
  • iterations (int) – Number of training epochs. Default 100.
  • seed (int) – Random seed for weight initializations. Default 42.
  • lamb (float) – Regularization parameter. Default 0.01.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a DANMF clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_embedding() → numpy.array[source]

Getting the bottleneck layer embedding.

Return types:
  • embedding (Numpy array) - The bottleneck layer embedding of nodes.
get_memberships() → Dict[int, int][source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.
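
The same fit-then-inspect pattern, sketched for DANMF (the synthetic graph is again only illustrative):

    import networkx as nx
    from karateclub import DANMF

    graph = nx.newman_watts_strogatz_graph(100, 5, 0.3)

    model = DANMF(layers=[32, 8], pre_iterations=100, iterations=100)
    model.fit(graph)
    embedding = model.get_embedding()      # bottleneck-layer embedding, one row per node
    memberships = model.get_memberships()  # {node: cluster id}
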
class NNSED(dimensions: int = 32, iterations: int = 10, seed: int = 42, noise: float = 1e-06)[source]

An implementation of “NNSED” from the CIKM ‘17 paper “A Non-negative Symmetric Encoder-Decoder Approach for Community Detection”. The procedure uses non-negative matrix factorization in order to learn an unnormalized cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.

Parameters:
  • dimensions (int) – Embedding layer size. Default is 32.
  • iterations (int) – Number of training epochs. Default 10.
  • seed (int) – Random seed for weight initializations. Default 42.
  • noise (float) – Random noise for normalization stability. Default is 10**-6.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting an NNSED clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_embedding() → numpy.array[source]

Getting the bottleneck layer embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
get_memberships() → Dict[int, int][source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.
class MNMF(dimensions: int = 128, clusters: int = 10, lambd: float = 0.2, alpha: float = 0.05, beta: float = 0.05, iterations: int = 200, lower_control: float = 1e-15, eta: float = 5.0, seed: int = 42)[source]

An implementation of “M-NMF” from the AAAI ‘17 paper “Community Preserving Network Embedding”. The procedure uses joint non-negative matrix factorization with modularity based regularization in order to learn a cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.

Parameters:
  • dimensions (int) – Number of dimensions. Default is 128.
  • clusters (int) – Number of clusters. Default is 10.
  • lambd (float) – KKT penalty. Default is 0.2.
  • alpha (float) – Clustering penalty. Default is 0.05.
  • beta (float) – Modularity regularization penalty. Default is 0.05.
  • iterations (int) – Number of power iterations. Default is 200.
  • lower_control (float) – Floating point overflow control. Default is 10**-15.
  • eta (float) – Similarity mixing parameter. Default is 5.0.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting an M-NMF clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_cluster_centers() → numpy.array[source]

Getting the cluster centers.

Return types:
  • centers (Numpy array) - The cluster centers.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
get_memberships() → Dict[int, int][source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.
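
A short sketch showing all three M-NMF accessors (graph and settings are illustrative):

    import networkx as nx
    from karateclub import MNMF

    graph = nx.newman_watts_strogatz_graph(100, 5, 0.3)

    model = MNMF(dimensions=128, clusters=10)
    model.fit(graph)
    centers = model.get_cluster_centers()  # one row per cluster
    embedding = model.get_embedding()      # one row per node
    memberships = model.get_memberships()  # {node: cluster id}
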
class BigClam(dimensions: int = 8, iterations: int = 50, learning_rate: float = 0.005, seed: int = 42)[source]

An implementation of “BigClam” from the WSDM ‘13 paper “Overlapping Community Detection at Scale: A Non-negative Matrix Factorization Approach”. The procedure uses gradient ascent to create an embedding which is used for deciding the node-cluster affiliations.

Parameters:
  • dimensions (int) – Number of embedding dimensions. Default 8.
  • iterations (int) – Number of training iterations. Default 50.
  • learning_rate (float) – Gradient ascent learning rate. Default is 0.005.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a BigClam clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
get_memberships() → Dict[int, int][source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.
class SymmNMF(dimensions: int = 32, iterations: int = 200, rho: float = 100.0, seed: int = 42)[source]

An implementation of “Symm-NMF” from the SDM ‘12 paper “Symmetric Nonnegative Matrix Factorization for Graph Clustering”. The procedure decomposes the second power of the normalized adjacency matrix with an ADMM based non-negative matrix factorization technique. This yields a node embedding in which each node is associated with an embedding factor in the created latent space.

Parameters:
  • dimensions (int) – Number of dimensions. Default is 32.
  • iterations (int) – Number of power iterations. Default is 200.
  • rho (float) – Regularization tuning parameter. Default is 100.0.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a Symm-NMF clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
get_memberships() → Dict[int, int][source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.

Non-overlapping community detection

class GEMSEC(walk_number: int = 5, walk_length: int = 80, dimensions: int = 32, negative_samples: int = 5, window_size: int = 5, learning_rate: float = 0.1, clusters: int = 10, gamma: float = 0.1, seed: int = 42)[source]

An implementation of “GEMSEC” from the ASONAM ‘19 paper “GEMSEC: Graph Embedding with Self Clustering”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by pooling normalized adjacency matrix powers. This matrix is decomposed by an approximate factorization technique which is combined with a k-means like clustering cost. A node embedding and clustering are learned jointly.

Parameters:
  • walk_number (int) – Number of random walks. Default is 5.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 32.
  • negative_samples (int) – Number of negative samples. Default is 5.
  • window_size (int) – Matrix power order. Default is 5.
  • learning_rate (float) – Gradient descent learning rate. Default is 0.1.
  • clusters (int) – Number of cluster centers. Default is 10.
  • gamma (float) – Clustering cost weight coefficient. Default is 0.1.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a GEMSEC model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
get_memberships() → Dict[int, int][source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.
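
A sketch of the joint embedding-and-clustering output (parameter values are illustrative):

    import networkx as nx
    from karateclub import GEMSEC

    graph = nx.newman_watts_strogatz_graph(100, 5, 0.3)

    model = GEMSEC(clusters=10, gamma=0.1)
    model.fit(graph)
    embedding = model.get_embedding()      # jointly learned node embedding
    memberships = model.get_memberships()  # hard cluster assignment per node
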
class EdMot(component_count: int = 2, cutoff: int = 50, seed: int = 42)[source]

An implementation of “Edge Motif Clustering” from the KDD ‘19 paper “EdMot: An Edge Enhancement Approach for Motif-aware Community Detection”. The tool first creates the graph of higher order motifs. This graph is clustered by the Louvain method. The resulting cluster memberships are stored as a dictionary.

Parameters:
  • component_count (int) – Number of extracted motif hypergraph components. Default is 2.
  • cutoff (int) – Motif edge cut-off value. Default is 50.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting an Edge Motif clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_memberships() → Dict[int, int][source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dictionary of ints) - Cluster memberships.
class SCD(iterations: int = 25, eps: float = 1e-06, seed: int = 42)[source]

An implementation of “SCD” from the WWW ‘14 paper “High Quality, Scalable and Parallel Community Detection for Large Real Graphs”. The procedure greedily optimizes the approximate weighted community clustering (WCC) metric. First, clusters are built around highly clustered nodes. Second, the initial partition is refined using the approximate WCC; these refinement passes sweep the whole vertex set.

Parameters:
  • iterations (int) – Refinement iterations. Default is 25.
  • eps (float) – Epsilon score for zero division correction. Default is 10**-6.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting an SCD clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_memberships() → Dict[int, int][source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.
class LabelPropagation(seed: int = 42, iterations: int = 100)[source]

An implementation of “Label Propagation Clustering” from the Physical Review ‘07 paper “Near Linear Time Algorithm to Detect Community Structures in Large-Scale Networks”. The tool executes a series of label propagations with unique labels. The final labels are used as cluster memberships.

Parameters:
  • seed (int) – Random seed. Default is 42.
  • iterations (int) – Propagation iterations. Default is 100.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a Label Propagation clustering model.

Arg types:
  • graph (NetworkX graph) - The graph to be clustered.
get_memberships() → Dict[int, int][source]

Getting the cluster membership of nodes.

Return types:
  • memberships (dict) - Node cluster memberships.
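
A minimal sketch of the non-overlapping clustering interface, using LabelPropagation (synthetic graph for illustration):

    import networkx as nx
    from karateclub import LabelPropagation

    graph = nx.newman_watts_strogatz_graph(100, 5, 0.3)

    model = LabelPropagation(iterations=100)
    model.fit(graph)
    memberships = model.get_memberships()  # {node: final propagated label}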

Neighbourhood-based node embedding

class SocioDim(dimensions: int = 128, seed: int = 42)[source]

An implementation of “SocioDim” from the KDD ‘09 paper “Relational Learning via Latent Social Dimensions”. The procedure extracts the eigenvectors corresponding to the largest eigenvalues of the graph modularity matrix. These vectors are used as the node embedding.

Parameters:
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a Social Dimensions model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class RandNE(dimensions: int = 128, alphas: list = [0.5, 0.5], seed: int = 42)[source]

An implementation of “RandNE” from the ICDM ‘18 paper “Billion-scale Network Embedding with Iterative Random Projection”. The procedure uses normalized adjacency matrix based smoothing on an orthogonalized, randomly generated (Gaussian) base node embedding matrix.

Parameters:
  • dimensions (int) – Number of embedding dimensions. Default is 128.
  • alphas (list) – Smoothing weights for adjacency matrix powers. Default is [0.5, 0.5].
  • seed (int) – Random seed. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a RandNE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class GLEE(dimensions: int = 128, seed: int = 42)[source]

An implementation of “Geometric Laplacian Eigenmaps” from the Journal of Complex Networks ‘20 paper “GLEE: Geometric Laplacian Eigenmap Embedding”. The procedure extracts the eigenvectors corresponding to the largest eigenvalues of the graph Laplacian. These vectors are used as the node embedding.

Parameters:
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a Geometric Laplacian EigenMaps model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class Diff2Vec(diffusion_number: int = 10, diffusion_cover: int = 80, dimensions: int = 128, workers: int = 4, window_size: int = 5, epochs: int = 1, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]

An implementation of “Diff2Vec” from the CompleNet ‘18 paper “Diff2Vec: Fast Sequence Based Embedding with Diffusion Graphs”. The procedure creates diffusion trees from every source node in the graph. These graphs are linearized by a directed Eulerian walk, and the walks are used to run the skip-gram algorithm to learn node-level, neighbourhood-based embeddings.

Parameters:
  • diffusion_number (int) – Number of diffusions. Default is 10.
  • diffusion_cover (int) – Number of nodes in diffusion. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 5.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • min_count (int) – Minimal count of node occurrences. Default is 1.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a Diff2Vec model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class NodeSketch(dimensions: int = 32, iterations: int = 2, decay: float = 0.01, seed: int = 42)[source]

An implementation of “NodeSketch” from the KDD ‘19 paper “NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching”. The procedure starts by sketching the self-loop-augmented adjacency matrix of the graph to output low-order node embeddings, and then recursively generates k-order node embeddings based on the self-loop-augmented adjacency matrix and (k-1)-order node embeddings.

Parameters:
  • dimensions (int) – Embedding dimensions. Default is 32.
  • iterations (int) – Number of iterations (sketch order minus one). Default is 2.
  • decay (float) – Exponential decay rate. Default is 0.01.
  • seed (int) – Random seed value. Default is 42.
fit(graph)[source]

Fitting a NodeSketch model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding()[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class NetMF(dimensions: int = 32, iteration: int = 10, order: int = 2, negative_samples: int = 1, seed: int = 42)[source]

An implementation of “NetMF” from the WSDM ‘18 paper “Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and Node2Vec”. The procedure uses sparse truncated SVD to learn embeddings for the pooled powers of the PMI matrix computed from powers of the normalized adjacency matrix.

Parameters:
  • dimensions (int) – Number of embedding dimension. Default is 32.
  • iteration (int) – Number of SVD iterations. Default is 10.
  • order (int) – Number of PMI matrix powers. Default is 2.
  • negative_samples (int) – Number of negative samples. Default is 1.
  • seed (int) – SVD random seed. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a NetMF model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class BoostNE(dimensions: int = 8, iterations: int = 16, order: int = 2, alpha: float = 0.01, seed: int = 42)[source]

An implementation of “BoostNE” from the ASONAM ‘19 paper “Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation”. The procedure uses non-negative matrix factorization iteratively to decompose the residuals obtained by previous factorization models. The base target matrix is a pooled sum of adjacency matrix powers.

Parameters:
  • dimensions (int) – Number of individual embedding dimensions. Default is 8.
  • iterations (int) – Number of boosting iterations. Default is 16.
  • order (int) – Number of adjacency matrix powers. Default is 2.
  • alpha (float) – NMF regularization parameter. Default is 0.01.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a BoostNE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class Walklets(walk_number: int = 10, walk_length: int = 80, dimensions: int = 32, workers: int = 4, window_size: int = 4, epochs: int = 1, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]

An implementation of “Walklets” from the ASONAM ‘17 paper “Don’t Walk, Skip! Online Learning of Multi-scale Network Embeddings”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by individual normalized adjacency matrix powers. These are all decomposed by an approximate factorization technique and the embeddings are concatenated together.

Parameters:
  • walk_number (int) – Number of random walks. Default is 10.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 32.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 4.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • min_count (int) – Minimal count of node occurrences. Default is 1.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a Walklets model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class GraRep(dimensions: int = 32, iteration: int = 10, order: int = 5, seed: int = 42)[source]

An implementation of “GraRep” from the CIKM ‘15 paper “GraRep: Learning Graph Representations with Global Structural Information”. The procedure uses sparse truncated SVD to learn embeddings for the powers of the PMI matrix computed from powers of the normalized adjacency matrix.

Parameters:
  • dimensions (int) – Number of individual embedding dimensions. Default is 32.
  • iteration (int) – Number of SVD iterations. Default is 10.
  • order (int) – Number of PMI matrix powers. Default is 5.
  • seed (int) – SVD random seed. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a GraRep model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class DeepWalk(walk_number: int = 10, walk_length: int = 80, dimensions: int = 128, workers: int = 4, window_size: int = 5, epochs: int = 1, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]

An implementation of “DeepWalk” from the KDD ‘14 paper “DeepWalk: Online Learning of Social Representations”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by pooling normalized adjacency matrix powers. This matrix is decomposed by an approximate factorization technique.

Parameters:
  • walk_number (int) – Number of random walks. Default is 10.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 5.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • min_count (int) – Minimal count of node occurrences. Default is 1.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a DeepWalk model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
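
A minimal DeepWalk sketch; the graph is synthetic, and the embedding comes back as a dense array whose row order follows the node indices:

    import networkx as nx
    from karateclub import DeepWalk

    graph = nx.newman_watts_strogatz_graph(100, 5, 0.3)

    model = DeepWalk(walk_number=10, walk_length=80, dimensions=128)
    model.fit(graph)
    embedding = model.get_embedding()  # shape (number of nodes, 128); row i embeds node i
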
class Node2Vec(walk_number: int = 10, walk_length: int = 80, p: float = 1.0, q: float = 1.0, dimensions: int = 128, workers: int = 4, window_size: int = 5, epochs: int = 1, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]

An implementation of “Node2Vec” from the KDD ‘16 paper “node2vec: Scalable Feature Learning for Networks”. The procedure uses biased second order random walks to approximate the pointwise mutual information matrix obtained by pooling normalized adjacency matrix powers. This matrix is decomposed by an approximate factorization technique.

Parameters:
  • walk_number (int) – Number of random walks. Default is 10.
  • walk_length (int) – Length of random walks. Default is 80.
  • p (float) – Return parameter (1/p transition probability) to move back towards the previous node. Default is 1.0.
  • q (float) – In-out parameter (1/q transition probability) to move away from the previous node. Default is 1.0.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 5.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • min_count (int) – Minimal count of node occurrences. Default is 1.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a Node2Vec model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
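
The interface matches DeepWalk; only the walk-bias parameters differ. The p and q values below are illustrative:

    import networkx as nx
    from karateclub import Node2Vec

    graph = nx.newman_watts_strogatz_graph(100, 5, 0.3)

    # small p makes returning to the previous node likelier (BFS-like walks);
    # large q keeps walks local by discouraging outward moves
    model = Node2Vec(p=0.5, q=2.0)
    model.fit(graph)
    embedding = model.get_embedding()
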
class NMFADMM(dimensions: int = 32, iterations: int = 100, rho: float = 1.0, seed: int = 42)[source]

An implementation of “NMF-ADMM” from the ICASSP ‘14 paper “Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence”. The procedure learns an embedding of the normalized adjacency matrix by using the alternating direction method of multipliers to solve a non-negative matrix factorization problem.

Parameters:
  • dimensions (int) – Number of individual embedding dimensions. Default is 32.
  • iterations (int) – Number of ADMM iterations. Default is 100.
  • rho (float) – ADMM Tuning parameter. Default is 1.0.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting an NMF model on the normalized adjacency matrix with ADMM.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class LaplacianEigenmaps(dimensions: int = 128, maximum_number_of_iterations: int = 100, seed: int = 42)[source]

An implementation of “Laplacian Eigenmaps” from the NIPS ‘01 paper “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”. The procedure extracts the eigenvectors corresponding to the smallest eigenvalues of the graph Laplacian. These vectors are used as the node embedding.

Parameters:
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • maximum_number_of_iterations (int) – Maximum number of iterations to execute with ARPACK. The value will be multiplied by the number of nodes. Default is 100.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a Laplacian EigenMaps model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.

Structural node embedding

class GraphWave(sample_number: int = 200, step_size: float = 0.1, heat_coefficient: float = 1.0, approximation: int = 100, mechanism: str = 'approximate', switch: int = 1000, seed: int = 42)[source]

An implementation of “GraphWave” from the KDD ‘18 paper “Learning Structural Node Embeddings Via Diffusion Wavelets”. The procedure first calculates the graph wavelets using a heat kernel. The wavelets are treated as probability distributions over nodes from a source node. Using these, the characteristic function is evaluated at certain grid points to learn structural node embeddings of the vertices.

Parameters:
  • sample_number (int) – Number of evaluation points. Default is 200.
  • step_size (float) – Grid point step size. Default is 0.1.
  • heat_coefficient (float) – Heat kernel coefficient. Default is 1.0.
  • approximation (int) – Chebyshev polynomial order. Default is 100.
  • mechanism (str) – Wavelet calculation method one of: ("exact", "approximate"). Default is ‘approximate’.
  • switch (int) – Vertex cardinality when the wavelet calculation method switches to approximation. Default is 1000.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a GraphWave model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
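
A structural-embedding sketch; nodes with similar structural roles should receive similar rows (graph and settings are illustrative):

    import networkx as nx
    from karateclub import GraphWave

    graph = nx.newman_watts_strogatz_graph(100, 5, 0.3)

    model = GraphWave(sample_number=200, step_size=0.1, heat_coefficient=1.0)
    model.fit(graph)
    embedding = model.get_embedding()  # characteristic-function descriptors per node
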
class Role2Vec(walk_number: int = 10, walk_length: int = 80, dimensions: int = 128, workers: int = 4, window_size: int = 2, epochs: int = 1, learning_rate: float = 0.05, down_sampling: float = 0.0001, min_count: int = 10, wl_iterations: int = 2, seed: int = 42, erase_base_features: bool = False)[source]

An implementation of “Role2vec” from the IJCAI ‘18 paper “Learning Role-based Graph Embeddings”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by multiplying the pooled adjacency power matrix with a structural feature matrix (in this case Weisfeiler-Lehman features). This way one gets structural node embeddings.

Parameters:
  • walk_number (int) – Number of random walks. Default is 10.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 2.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • down_sampling (float) – Down sampling frequency. Default is 0.0001.
  • min_count (int) – Minimal count of feature occurrences. Default is 10.
  • wl_iterations (int) – Number of Weisfeiler-Lehman hashing iterations. Default is 2.
  • seed (int) – Random seed value. Default is 42.
  • erase_base_features (bool) – Removing the base features. Default is False.
fit(graph: networkx.classes.graph.Graph)[source]

Fitting a Role2vec model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.

Attributed node embedding

class FeatherNode(reduction_dimensions: int = 64, svd_iterations: int = 20, theta_max: float = 2.5, eval_points: int = 25, order: int = 5, seed: int = 42)[source]

An implementation of “FEATHER-N” from the CIKM ‘20 paper “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models”. The procedure uses characteristic functions of node features with random walk weights to describe node neighborhoods.

Parameters:
  • reduction_dimensions (int) – SVD reduction dimensions. Default is 64.
  • svd_iterations (int) – SVD iteration count. Default is 20.
  • theta_max (float) – Maximal evaluation point. Default is 2.5.
  • eval_points (int) – Number of characteristic function evaluation points. Default is 25.
  • order (int) – Scale - number of adjacency matrix powers. Default is 5.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph, X: Union[numpy.array, scipy.sparse.coo.coo_matrix])[source]

Fitting a FEATHER-N model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO or Numpy array) - The matrix of node features.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
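
A sketch of the attributed interface, where fit() also takes a node feature matrix; the random features are a stand-in for real attributes:

    import networkx as nx
    import numpy as np
    from karateclub import FeatherNode

    graph = nx.newman_watts_strogatz_graph(100, 5, 0.3)
    X = np.random.uniform(0, 1, (100, 16))  # dense node features; row i belongs to node i

    model = FeatherNode(order=5, eval_points=25)
    model.fit(graph, X)
    embedding = model.get_embedding()
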
class AE(walk_number=5, walk_length=80, dimensions=32, workers=4, window_size=3, epochs=5, learning_rate=0.05, down_sampling=0.0001, min_count=1, seed=42)[source]

An implementation of “AE” from the Arxiv ‘19 paper “MUSAE: Multi-Scale Attributed Node Embedding”. The procedure runs attributed random walks to approximate the product of the pooled adjacency matrix powers and the node feature matrix. This matrix is decomposed implicitly by a Skip-Gram style optimization problem.

Parameters:
  • walk_number (int) – Number of random walks. Default is 5.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 32.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 3.
  • epochs (int) – Number of epochs. Default is 5.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • down_sampling (float) – Down sampling rate in the corpus. Default is 0.0001.
  • min_count (int) – Minimal count of node occurrences. Default is 1.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph, X: Union[numpy.array, scipy.sparse.coo.coo_matrix])[source]

Fitting an AE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO array) - The binary matrix of node features.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class MUSAE(walk_number=5, walk_length=80, dimensions=32, workers=4, window_size=3, epochs=5, learning_rate=0.05, down_sampling=0.0001, min_count=1, seed=42)[source]

An implementation of “MUSAE” from the Arxiv ‘19 paper “MUSAE: Multi-Scale Attributed Node Embedding”. The procedure runs attributed random walks to approximate the products of the individual adjacency matrix powers and the node feature matrix. The matrices are decomposed implicitly by a Skip-Gram style optimizer. The individual embeddings are concatenated to form a multi-scale attributed node embedding, so the feature distributions at different scales remain separable.

Parameters:
  • walk_number (int) – Number of random walks. Default is 5.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 32.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 3.
  • epochs (int) – Number of epochs. Default is 5.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • down_sampling (float) – Down sampling rate in the corpus. Default is 0.0001.
  • min_count (int) – Minimal count of node occurrences. Default is 1.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph, X: Union[numpy.array, scipy.sparse.coo.coo_matrix])[source]

Fitting a MUSAE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO array) - The binary matrix of node features.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
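
A MUSAE sketch; per the signature above, the feature matrix must be binary and in SciPy COO format (the random features are illustrative):

    import networkx as nx
    import numpy as np
    from scipy.sparse import coo_matrix
    from karateclub import MUSAE

    graph = nx.newman_watts_strogatz_graph(100, 5, 0.3)
    X = coo_matrix(np.random.randint(0, 2, (100, 32)))  # binary node features in COO form

    model = MUSAE()
    model.fit(graph, X)
    embedding = model.get_embedding()
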
class SINE(walk_number: int = 10, walk_length: int = 80, dimensions: int = 128, workers: int = 4, window_size: int = 5, epochs: int = 1, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]

An implementation of “SINE” from the ICDM ‘18 paper “SINE: Scalable Incomplete Network Embedding”. The procedure implicitly factorizes a joint adjacency matrix power and feature matrix. The decomposition happens on truncated random walks and the adjacency matrix powers are pooled together.

Parameters:
  • walk_number (int) – Number of random walks. Default is 10.
  • walk_length (int) – Length of random walks. Default is 80.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • window_size (int) – Matrix power order. Default is 5.
  • epochs (int) – Number of epochs. Default is 1.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • min_count (int) – Minimal count of node occurrences. Default is 1.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph, X: Union[numpy.array, scipy.sparse.coo.coo_matrix])[source]

Fitting a SINE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO array) - The matrix of node features.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class BANE(dimensions: int = 32, svd_iterations: int = 20, seed: int = 42, alpha: float = 0.3, iterations: int = 100, binarization_iterations: int = 20)[source]

An implementation of “BANE” from the ICDM ‘18 paper “Binarized Attributed Network Embedding”. The procedure first calculates the truncated SVD of an adjacency - feature matrix product. This matrix is further decomposed by a binary CCD based technique.

Parameters:
  • dimensions (int) – Number of embedding dimensions. Default is 32.
  • svd_iterations (int) – SVD iteration count. Default is 20.
  • seed (int) – Random seed. Default is 42.
  • alpha (float) – Kernel matrix inversion parameter. Default is 0.3.
  • iterations (int) – Matrix decomposition iterations. Default is 100.
  • binarization_iterations (int) – Binarization iterations. Default is 20.
fit(graph: networkx.classes.graph.Graph, X: Union[numpy.array, scipy.sparse.coo.coo_matrix])[source]

Fitting a BANE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO or Numpy array) - The matrix of node features.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class TENE(dimensions=32, lower_control=1e-15, alpha=0.1, beta=0.1, iterations=200, seed=42)[source]

An implementation of “TENE” from the ICPR ‘18 paper “Enhanced Network Embedding with Text Information”. The procedure jointly factorizes the adjacency and node feature matrices using alternating least squares.

Parameters:
  • dimensions (int) – Number of embedding dimensions. Default is 32.
  • lower_control (float) – Embedding score minimal value. Default is 10**-15.
  • alpha (float) – Adjacency matrix regularization coefficient. Default is 0.1.
  • beta (float) – Feature matrix regularization coefficient. Default is 0.1.
  • iterations (int) – ALS iterations. Default is 200.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph, T: Union[numpy.array, scipy.sparse.coo.coo_matrix])[source]

Fitting a TENE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • T (Scipy COO or Numpy array) - The matrix of node features.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class TADW(dimensions: int = 32, reduction_dimensions: int = 64, svd_iterations: int = 20, seed: int = 42, alpha: float = 0.01, iterations: int = 10, lambd: float = 10.0)[source]

An implementation of “TADW” from the IJCAI ‘15 paper “Network Representation Learning with Rich Text Information”. The procedure uses the node attribute matrix together with a factorization matrix to reproduce a power of the adjacency matrix, creating the representations.

Parameters:
  • dimensions (int) – Number of embedding dimensions. Default is 32.
  • reduction_dimensions (int) – SVD reduction dimensions. Default is 64.
  • svd_iterations (int) – SVD iteration count. Default is 20.
  • seed (int) – Random seed. Default is 42.
  • alpha (float) – Learning rate. Default is 0.01.
  • iterations (int) – Matrix decomposition iterations. Default is 10.
  • lambd (float) – Regularization coefficient. Default is 10.0.
fit(graph: networkx.classes.graph.Graph, X: Union[numpy.array, scipy.sparse.coo.coo_matrix])[source]

Fitting a TADW model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO or Numpy array) - The matrix of node features.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class FSCNMF(dimensions: int = 32, lower_control: float = 1e-15, iterations: int = 500, alpha_1: float = 1000.0, alpha_2: float = 1.0, alpha_3: float = 1.0, beta_1: float = 1000.0, beta_2: float = 1.0, beta_3: float = 1.0, seed: int = 42)[source]

An implementation of “FSCNMF” from the Arxiv ‘18 paper “Fusing Structure and Content via Non-negative Matrix Factorization for Embedding Information Networks”. The procedure uses a joint matrix factorization technique on the adjacency and feature matrices. The node and feature embeddings are co-regularized for alignment of the embedding spaces.

Parameters:
  • dimensions (int) – Number of embedding dimensions. Default is 32.
  • lower_control (float) – Embedding score minimal value. Default is 10**-15.
  • iterations (int) – Power iterations. Default is 500.
  • alpha_1 (float) – Alignment parameter for adjacency matrix. Default is 1000.0.
  • alpha_2 (float) – Adjacency basis regularization. Default is 1.0.
  • alpha_3 (float) – Adjacency features regularization. Default is 1.0.
  • beta_1 (float) – Alignment parameter for feature matrix. Default is 1000.0.
  • beta_2 (float) – Attribute basis regularization. Default is 1.0.
  • beta_3 (float) – Attribute features regularization. Default is 1.0.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph, X: Union[numpy.array, scipy.sparse.coo.coo_matrix])[source]

Fitting an FSCNMF model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO or Numpy array) - The matrix of node features.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
class ASNE(dimensions: int = 128, workers: int = 4, epochs: int = 100, down_sampling: float = 0.0001, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]

An implementation of “ASNE” from the TKDE ‘18 paper “Attributed Social Network Embedding”. The procedure implicitly factorizes a concatenated adjacency matrix and feature matrix.

Parameters:
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • epochs (int) – Number of epochs. Default is 100.
  • down_sampling (float) – Down sampling frequency. Default is 0.0001.
  • learning_rate (float) – HogWild! learning rate. Default is 0.05.
  • min_count (int) – Minimal count of node occurrences. Default is 1.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph, X: scipy.sparse.coo.coo_matrix)[source]

Fitting an ASNE model.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • X (Scipy COO array) - The matrix of node features.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.

Meta node embedding

class NEU(L1: float = 0.5, L2: float = 0.25, T: int = 1, seed: int = 42)[source]

An implementation of “NEU” from the IJCAI ‘17 paper “Fast Network Embedding Enhancement via High Order Proximity Approximation”. The procedure uses an arbitrary embedding and augments it by higher order proximities with a recursive meta learning algorithm.

Parameters:
  • L1 (float) – Weight of lower order proximities. Default is 0.5.
  • L2 (float) – Weight of higher order proximities. Default is 0.25.
  • T (int) – Number of iterations. Default is 1.
  • seed (int) – Random seed value. Default is 42.
fit(graph: networkx.classes.graph.Graph, model: karateclub.estimator.Estimator)[source]

Fitting a model and performing NEU.

Arg types:
  • graph (NetworkX graph) - The graph to be embedded.
  • model (karateclub Estimator) - The base embedding model to be fitted and then enhanced.
get_embedding() → numpy.array[source]

Getting the node embedding.

Return types:
  • embedding (Numpy array) - The embedding of nodes.
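
A sketch of the meta-learning interface: NEU’s fit() takes both the graph and a base estimator to enhance (DeepWalk here is just one possible choice):

    import networkx as nx
    from karateclub import NEU, DeepWalk

    graph = nx.newman_watts_strogatz_graph(100, 5, 0.3)

    model = NEU(L1=0.5, L2=0.25, T=1)
    model.fit(graph, DeepWalk())  # fits the base model, then applies the NEU enhancement
    embedding = model.get_embedding()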

Whole graph embedding

class WaveletCharacteristic(order: int = 5, eval_points: int = 25, theta_max: float = 2.5, tau: float = 1.0, pooling: str = 'mean')[source]

An implementation of “Wavelet Characteristic” from the CIKM ‘21 paper “Graph Embedding via Diffusion-Wavelets-Based Node Feature Distribution Characterization”. The procedure uses characteristic functions of node features with wavelet function weights to describe node neighborhoods. These node level features are pooled to create graph level statistics.

Parameters:
  • order (int) – Adjacency matrix powers. Default is 5.
  • eval_points (int) – Number of characteristic function evaluations. Default is 25.
  • theta_max (float) – Largest characteristic function time value. Default is 2.5.
  • tau (float) – Heat kernel diffusion time. Default is 1.0.
  • pooling (str) – Pooling function applied to the characteristic functions. Default is “mean”.
fit(graphs: List[networkx.classes.graph.Graph])[source]

Fitting a WaveletCharacteristic model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding() → numpy.array[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
infer(graphs: List[networkx.classes.graph.Graph]) → numpy.array[source]

Infer the graph embeddings.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
Return types:
  • embedding (Numpy array) - The embedding of graphs.

class LDP(bins: int = 32)[source]

An implementation of “LDP” from the ICLR Representation Learning on Graphs and Manifolds Workshop ‘19 paper “A Simple Yet Effective Baseline for Non-Attributed Graph Classification”. The procedure calculates histograms of degree profiles. These concatenated histograms form the graph representations.

Parameters:
  • bins (int) – Number of histogram bins. Default is 32.
fit(graphs)[source]

Fitting an LDP model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding() → numpy.array[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
infer(graphs: List[networkx.classes.graph.Graph])[source]

Infer the embedding of graphs.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
Return types:
  • embedding (Numpy array) - The embedding of graphs.
class FeatherGraph(order: int = 5, eval_points: int = 25, theta_max: float = 2.5, seed: int = 42, pooling: str = 'mean')[source]

An implementation of “FEATHER-G” from the CIKM ‘20 paper “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models”. The procedure uses characteristic functions of node features with random walk weights to describe node neighborhoods. These node level features are pooled by mean pooling to create graph level statistics.

Parameters:
  • order (int) – Adjacency matrix powers. Default is 5.
  • eval_points (int) – Number of evaluation points. Default is 25.
  • theta_max (float) – Maximal evaluation point value. Default is 2.5.
  • seed (int) – Random seed value. Default is 42.
  • pooling (str) – Permutation invariant pooling function, one of: ("mean", "max", "min"). Default is “mean”.
fit(graphs: List[networkx.classes.graph.Graph]) → None[source]

Fitting a graph level FEATHER model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding() → numpy.array[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
infer(graphs: List[networkx.classes.graph.Graph]) → numpy.array[source]

Inferring graph embeddings with a graph level FEATHER model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
Return types:
  • embedding (Numpy array) - The embedding of graphs.
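
A whole-graph sketch showing fit() on a list of graphs and infer() on unseen ones (all graphs here are synthetic stand-ins):

    import networkx as nx
    from karateclub import FeatherGraph

    graphs = [nx.newman_watts_strogatz_graph(50, 5, 0.3) for _ in range(10)]

    model = FeatherGraph(order=5, eval_points=25, pooling="mean")
    model.fit(graphs)
    X = model.get_embedding()    # one row per training graph

    unseen = [nx.newman_watts_strogatz_graph(50, 5, 0.3) for _ in range(3)]
    X_new = model.infer(unseen)  # embed new graphs with the fitted model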

class IGE(feature_embedding_dimensions: List[int] = [3, 5], spectral_embedding_dimensions: List[int] = [10, 20], histogram_bins: List[int] = [10, 20], seed: int = 42)[source]

An implementation of “Invariant Graph Embedding” from the ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Data paper “Invariant Embedding for Graph Classification”. The procedure computes a mixture of spectral and node embedding based features. Specifically, it uses scattering, eigenvalues and pooled node feature embeddings to create graph descriptors.

Parameters:
  • feature_embedding_dimensions (list) – Feature embedding dimensions. Default is [3, 5]
  • spectral_embedding_dimensions (list) – Spectral embedding dimensions. Default is [10, 20].
  • histogram_bins (list) – Number of histogram bins. Default is [10, 20].
  • seed (int) – Random seed value. Default is 42.
fit(graphs: List[networkx.classes.graph.Graph])[source]

Fitting an Invariant Graph Embedding model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding() → numpy.array[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
infer(graphs: List[networkx.classes.graph.Graph]) → numpy.array[source]

Infer the embedding of graphs.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
Return types:
  • embedding (Numpy array) - The embedding of graphs.
class GeoScattering(order: int = 4, moments: int = 4, seed: int = 42)[source]

An implementation of “GeoScattering” from the ICML ‘19 paper “Geometric Scattering for Graph Data Analysis”. The procedure uses scattering with wavelet transforms to create graph spectral descriptors. Moments of the wavelet transformed features are used as graph level features for the embedding.

Parameters:
  • order (int) – Adjacency matrix powers. Default is 4.
  • moments (int) – Unnormalized moments considered. Default is 4.
  • seed (int) – Random seed value. Default is 42.
fit(graphs: List[networkx.classes.graph.Graph])[source]

Fitting a Geometric-Scattering model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding() → numpy.array[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
infer(graphs: List[networkx.classes.graph.Graph]) → numpy.array[source]

Infer the embedding of graphs.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
Return types:
  • embedding (Numpy array) - The embedding of graphs.
class GL2Vec(wl_iterations: int = 2, dimensions: int = 128, workers: int = 4, down_sampling: float = 0.0001, epochs: int = 10, learning_rate: float = 0.025, min_count: int = 5, seed: int = 42, erase_base_features: bool = False)[source]

An implementation of “GL2Vec” from the ICONIP ‘19 paper “GL2vec: Graph Embedding Enriched by Line Graphs with Edge Features”. First, the algorithm creates the line graph of each graph in the graph dataset. The procedure creates Weisfeiler-Lehman tree features for nodes in graphs. Using these features a document (graph) - feature co-occurrence matrix is decomposed in order to generate representations for the graphs.

The procedure assumes that nodes have no string features, in which case the WL hashing defaults to the degree centrality. However, if a node feature with the key “feature” is present, the feature extraction is based on the values of this key.

Parameters:
  • wl_iterations (int) – Number of Weisfeiler-Lehman iterations. Default is 2.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • down_sampling (float) – Down sampling frequency. Default is 0.0001.
  • epochs (int) – Number of epochs. Default is 10.
  • learning_rate (float) – HogWild! learning rate. Default is 0.025.
  • min_count (int) – Minimal count of graph feature occurrences. Default is 5.
  • seed (int) – Random seed for the model. Default is 42.
  • erase_base_features (bool) – Erasing the base features. Default is False.
fit(graphs: List[networkx.classes.graph.Graph])[source]

Fitting a GL2Vec model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding() → numpy.array[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
infer(graphs) → numpy.array[source]

Infer the graph embeddings.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
Return types:
  • embedding (Numpy array) - The embedding of graphs.
class NetLSD(scale_min: float = -2.0, scale_max: float = 2.0, scale_steps: int = 250, approximations: int = 200, seed: int = 42)[source]

An implementation of “NetLSD” from the KDD ‘18 paper “NetLSD: Hearing the Shape of a Graph”. The procedure calculates the heat kernel trace of the normalized Laplacian matrix over a vector of time scales. If the matrix is large, it switches to an approximation of the eigenvalues.

Parameters:
  • scale_min (float) – Time scale interval minimum. Default is -2.0.
  • scale_max (float) – Time scale interval maximum. Default is 2.0.
  • scale_steps (int) – Number of steps in time scale. Default is 250.
  • approximations (int) – Number of eigenvalue approximations. Default is 200.
  • seed (int) – Random seed value. Default is 42.
fit(graphs: List[networkx.classes.graph.Graph])[source]

Fitting a NetLSD model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding() → numpy.array[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
infer(graphs: List[networkx.classes.graph.Graph]) → numpy.array[source]

Inferring the NetLSD embeddings.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
Return types:
  • embedding (Numpy array) - The embedding of graphs.
class SF(dimensions: int = 128, seed: int = 42)[source]

An implementation of “SF” from the NeurIPS Relational Representation Learning Workshop ‘18 paper “A Simple Baseline Algorithm for Graph Classification”. The procedure calculates the k lowest eigenvalues of the normalized Laplacian. If a graph has fewer than k eigenvalues, the representation is padded with zeros.

Parameters:
  • dimensions (int) – Number of lowest eigenvalues. Default is 128.
  • seed (int) – Random seed value. Default is 42.
fit(graphs)[source]

Fitting an SF model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding() → numpy.array[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
infer(graphs) → numpy.array[source]

Inferring the embedding vectors.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
Return types:
  • embedding (Numpy array) - The embedding of graphs.
class FGSD(hist_bins: int = 200, hist_range: int = 20, seed: int = 42)[source]

An implementation of “FGSD” from the NeurIPS ‘17 paper “Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs”. The procedure calculates the Moore-Penrose spectrum of the normalized Laplacian. The histogram of these spectral features is used as a whole-graph representation.

Parameters:
  • hist_bins (int) – Number of histogram bins. Default is 200.
  • hist_range (int) – Histogram range considered. Default is 20.
  • seed (int) – Random seed value. Default is 42.
fit(graphs: List[networkx.classes.graph.Graph])[source]

Fitting an FGSD model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding() → numpy.array[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
infer(graphs: List[networkx.classes.graph.Graph]) → numpy.array[source]

Inferring the embedding for a list of graphs.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
Return types:
  • embedding (Numpy array) - The embedding of graphs.
class Graph2Vec(wl_iterations: int = 2, attributed: bool = False, dimensions: int = 128, workers: int = 4, down_sampling: float = 0.0001, epochs: int = 10, learning_rate: float = 0.025, min_count: int = 5, seed: int = 42, erase_base_features: bool = False)[source]

An implementation of “Graph2Vec” from the MLGWorkshop ‘17 paper “Graph2Vec: Learning Distributed Representations of Graphs”. The procedure creates Weisfeiler-Lehman tree features for nodes in graphs. Using these features a document (graph) - feature co-occurrence matrix is decomposed in order to generate representations for the graphs.

The procedure assumes that nodes have no string features, in which case the WL hashing defaults to the degree centrality. However, if a node feature with the key “feature” is present, the feature extraction is based on the values of this key.

Parameters:
  • wl_iterations (int) – Number of Weisfeiler-Lehman iterations. Default is 2.
  • attributed (bool) – Presence of graph attributes. Default is False.
  • dimensions (int) – Dimensionality of embedding. Default is 128.
  • workers (int) – Number of cores. Default is 4.
  • down_sampling (float) – Down sampling frequency. Default is 0.0001.
  • epochs (int) – Number of epochs. Default is 10.
  • learning_rate (float) – HogWild! learning rate. Default is 0.025.
  • min_count (int) – Minimal count of graph feature occurrences. Default is 5.
  • seed (int) – Random seed for the model. Default is 42.
  • erase_base_features (bool) – Erasing the base features. Default is False.
fit(graphs: List[networkx.classes.graph.Graph])[source]

Fitting a Graph2Vec model.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
get_embedding() → numpy.array[source]

Getting the embedding of graphs.

Return types:
  • embedding (Numpy array) - The embedding of graphs.
infer(graphs) → numpy.array[source]

Infer the graph embeddings.

Arg types:
  • graphs (List of NetworkX graphs) - The graphs to be embedded.
Return types:
  • embedding (Numpy array) - The embedding of graphs.
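
A closing Graph2Vec sketch; the graphs are synthetic, and the comment on the “feature” key restates the note above:

    import networkx as nx
    from karateclub import Graph2Vec

    graphs = [nx.newman_watts_strogatz_graph(50, 5, 0.3) for _ in range(20)]

    model = Graph2Vec(wl_iterations=2, dimensions=128)
    model.fit(graphs)
    embedding = model.get_embedding()  # shape (20, 128); rows follow the input order

    # If nodes carry string features under the "feature" key, pass attributed=True
    # so WL hashing starts from those values instead of node degrees.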