karateclub¶
Contents
Overlapping community detection¶
-
class
EgoNetSplitter
(resolution: float = 1.0, seed: int = 42, weight: Optional[str] = 'weight')[source]¶ An implementation of “Ego-Splitting” from the KDD ‘17 paper “Ego-Splitting Framework: from Non-Overlapping to Overlapping Clusters”. The tool first creates the ego-nets of nodes. A persona-graph is created which is clustered by the Louvain method. The resulting overlapping cluster memberships are stored as a dictionary.
Parameters:
-
class
DANMF
(layers: List[int] = [32, 8], pre_iterations: int = 100, iterations: int = 100, seed: int = 42, lamb: float = 0.01)[source]¶ An implementation of “DANMF” from the CIKM ‘18 paper “Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection”. The procedure uses telescopic non-negative matrix factorization in order to learn a cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.
Parameters: - layers (list) – Autoencoder layer sizes in a list of integers. Default [32, 8].
- pre_iterations (int) – Number of pre-training epochs. Default 100.
- iterations (int) – Number of training epochs. Default 100.
- seed (int) – Random seed for weight initializations. Default 42.
- lamb (float) – Regularization parameter. Default 0.01.
- seed – Random seed value. Default is 42.
-
fit
(graph: networkx.classes.graph.Graph)[source]¶ Fitting a DANMF clustering model.
- Arg types:
- graph (NetworkX graph) - The graph to be clustered.
-
class
NNSED
(dimensions: int = 32, iterations: int = 10, seed: int = 42, noise: float = 1e-06)[source]¶ An implementation of “NNSED” from the CIKM ‘17 paper “A Non-negative Symmetric Encoder-Decoder Approach for Community Detection”. The procedure uses non-negative matrix factorization in order to learn an unnormalized cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.
Parameters: -
fit
(graph: networkx.classes.graph.Graph)[source]¶ Fitting an NNSED clustering model.
- Arg types:
- graph (NetworkX graph) - The graph to be clustered.
-
-
class
MNMF
(dimensions: int = 128, clusters: int = 10, lambd: float = 0.2, alpha: float = 0.05, beta: float = 0.05, iterations: int = 200, lower_control: float = 1e-15, eta: float = 5.0, seed: int = 42)[source]¶ An implementation of “M-NMF” from the AAAI ‘17 paper “Community Preserving Network Embedding”. The procedure uses joint non-negative matrix factorization with modularity based regularization in order to learn a cluster membership distribution over nodes. The method can be used in an overlapping and non-overlapping way.
Parameters: - dimensions (int) – Number of dimensions. Default is 128.
- clusters (int) – Number of clusters. Default is 10.
- lambd (float) – KKT penalty. Default is 0.2
- alpha (float) – Clustering penalty. Default is 0.05.
- beta (float) – Modularity regularization penalty. Default is 0.05.
- iterations (int) – Number of power iterations. Default is 200.
- lower_control (float) – Floating point overflow control. Default is 10**-15.
- eta (float) – Similarity mixing parameter. Default is 5.0.
- seed (int) – Random seed value. Default is 42.
-
fit
(graph: networkx.classes.graph.Graph)[source]¶ Fitting an M-NMF clustering model.
- Arg types:
- graph (NetworkX graph) - The graph to be clustered.
-
get_cluster_centers
() → numpy.array[source]¶ Getting the node embedding.
- Return types:
- centers (Numpy array) - The cluster centers.
-
class
BigClam
(dimensions: int = 8, iterations: int = 50, learning_rate: float = 0.005, seed: int = 42)[source]¶ An implementation of “BigClam” from the WSDM ‘13 paper “Overlapping Community Detection at Scale: A Non-negative Matrix Factorization Approach”. The procedure uses gradient ascent to create an embedding which is used for deciding the node-cluster affiliations.
Parameters: -
fit
(graph: networkx.classes.graph.Graph)[source]¶ Fitting a BigClam clustering model.
- Arg types:
- graph (NetworkX graph) - The graph to be clustered.
-
-
class
SymmNMF
(dimensions: int = 32, iterations: int = 200, rho: float = 100.0, seed: int = 42)[source]¶ An implementation of “Symm-NMF” from the SDM’12 paper “Symmetric Nonnegative Matrix Factorization for Graph Clustering”. The procedure decomposed the second power od the normalized adjacency matrix with an ADMM based non-negative matrix factorization based technique. This results in a node embedding and each node is associated with an embedding factor in the created latent space.
Parameters: -
fit
(graph: networkx.classes.graph.Graph)[source]¶ Fitting a Symm-NMF clustering model.
- Arg types:
- graph (NetworkX graph) - The graph to be clustered.
-
Non-overlapping community detection¶
-
class
GEMSEC
(walk_number: int = 5, walk_length: int = 80, dimensions: int = 32, negative_samples: int = 5, window_size: int = 5, learning_rate: float = 0.1, clusters: int = 10, gamma: float = 0.1, seed: int = 42)[source]¶ An implementation of “GEMSEC” from the ASONAM ‘19 paper “GEMSEC: Graph Embedding with Self Clustering”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by pooling normalized adjacency matrix powers. This matrix is decomposed by an approximate factorization technique which is combined with a k-means like clustering cost. A node embedding and clustering are learned jointly.
Parameters: - walk_number (int) – Number of random walks. Default is 5.
- walk_length (int) – Length of random walks. Default is 80.
- dimensions (int) – Dimensionality of embedding. Default is 32.
- negative_samples (int) – Number of negative samples. Default is 5.
- window_size (int) – Matrix power order. Default is 5.
- learning_rate (float) – Gradient descent learning rate. Default is 0.1.
- clusters (int) – Number of cluster centers. Default is 10.
- gamma (float) – Clustering cost weight coefficient. Default is 0.1.
- seed (int) – Random seed value. Default is 42.
-
fit
(graph: networkx.classes.graph.Graph)[source]¶ Fitting a GEMSEC model.
- Arg types:
- graph (NetworkX graph) - The graph to be embedded.
-
class
EdMot
(component_count: int = 2, cutoff: int = 50, seed: int = 42)[source]¶ An implementation of “Edge Motif Clustering” from the KDD ‘19 paper “EdMot: An Edge Enhancement Approach for Motif-aware Community Detection”. The tool first creates the graph of higher order motifs. This graph is clustered by the Louvain method. The resulting cluster memberships are stored as a dictionary.
Parameters:
-
class
SCD
(iterations: int = 25, eps: float = 1e-06, seed: int = 42)[source]¶ An implementation of “SCD” from the WWW ‘14 paper “High Quality, Scalable and Parallel Community Detection for Large Real Graphs”. The procedure greedily optimizes the approximate weighted community clustering metric. First, clusters are built around highly clustered nodes. Second, we refine the initial partition by using the approximate WCC. These refinements happen for the whole vertex set.
Parameters:
-
class
LabelPropagation
(seed: int = 42, iterations: int = 100)[source]¶ An implementation of “Label Propagation Clustering” from the Physical Review ‘07 paper “Near Linear Time Algorithm to Detect Community Structures in Large-Scale Networks”. The tool executes a series of label propagations with unique labels. The final labels are used as cluster memberships.
Parameters:
Neighbourhood-based node embedding¶
-
class
SocioDim
(dimensions: int = 128, seed: int = 42)[source]¶ An implementation of “SocioDim” from the KDD ‘09 paper “Relational Learning via Latent Social Dimensions”. The procedure extracts the eigenvectors corresponding to the largest eigenvalues of the graph modularity matrix. These vectors are used as the node embedding.
Parameters:
-
class
RandNE
(dimensions: int = 128, alphas: list = [0.5, 0.5], seed: int = 42)[source]¶ An implementation of “RandNE” from the ICDM ‘18 paper “Billion-scale Network Embedding with Iterative Random Projection”. The procedure uses normalized adjacency matrix based smoothing on an orthogonalized random normally generate base node embedding matrix.
Parameters:
-
class
GLEE
(dimensions: int = 128, seed: int = 42)[source]¶ An implementation of “Geometric Laplacian Eigenmaps” from the Journal of Complex Networks ‘20 paper “GLEE: Geometric Laplacian Eigenmap Embedding”. The procedure extracts the eigenvectors corresponding to the largest eigenvalues of the graph Laplacian. These vectors are used as the node embedding.
Parameters:
-
class
Diff2Vec
(diffusion_number: int = 10, diffusion_cover: int = 80, dimensions: int = 128, workers: int = 4, window_size: int = 5, epochs: int = 1, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]¶ An implementation of “Diff2Vec” from the CompleNet ‘18 paper “Diff2Vec: Fast Sequence Based Embedding with Diffusion Graphs”. The procedure creates diffusion trees from every source node in the graph. These graphs are linearized by a directed Eulerian walk, the walks are used for running the skip-gram algorithm the learn node level neighbourhood based embeddings.
Parameters: - diffusion_number (int) – Number of diffusions. Default is 10.
- diffusion_cover (int) – Number of nodes in diffusion. Default is 80.
- dimensions (int) – Dimensionality of embedding. Default is 128.
- workers (int) – Number of cores. Default is 4.
- window_size (int) – Matrix power order. Default is 5.
- epochs (int) – Number of epochs. Default is 1.
- learning_rate (float) – HogWild! learning rate. Default is 0.05.
- min_count (int) – Minimal count of node occurrences. Default is 1.
- seed (int) – Random seed value. Default is 42.
-
class
NodeSketch
(dimensions: int = 32, iterations: int = 2, decay: float = 0.01, seed: int = 42)[source]¶ An implementation of “NodeSketch” from the KDD ‘19 paper “NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching”. The procedure starts by sketching the self-loop-augmented adjacency matrix of the graph to output low-order node embeddings, and then recursively generates k-order node embeddings based on the self-loop-augmented adjacency matrix and (k-1)-order node embeddings.
Parameters:
-
class
NetMF
(dimensions: int = 32, iteration: int = 10, order: int = 2, negative_samples: int = 1, seed: int = 42)[source]¶ An implementation of “NetMF” from the WSDM ‘18 paper “Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and Node2Vec”. The procedure uses sparse truncated SVD to learn embeddings for the pooled powers of the PMI matrix computed from powers of the normalized adjacency matrix.
Parameters:
-
class
BoostNE
(dimensions: int = 8, iterations: int = 16, order: int = 2, alpha: float = 0.01, seed: int = 42)[source]¶ An implementation of “BoostNE” from the ASONAM ‘19 paper “Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation”. The procedure uses non-negative matrix factorization iteratively to decompose the residuals obtained by previous factorization models. The base target matrix is a pooled sum of adjacency matrix powers.
Parameters: - dimensions (int) – Number of individual embedding dimensions. Default is 8.
- iterations (int) – Number of boosting iterations. Default is 16.
- order (int) – Number of adjacency matrix powers. Default is 2.
- alpha (float) – NMF regularization parameter. Default is 0.01.
- seed (int) – Random seed value. Default is 42.
-
class
Walklets
(walk_number: int = 10, walk_length: int = 80, dimensions: int = 32, workers: int = 4, window_size: int = 4, epochs: int = 1, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]¶ An implementation of “Walklets” from the ASONAM ‘17 paper “Don’t Walk, Skip! Online Learning of Multi-scale Network Embeddings”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by individual normalized adjacency matrix powers. These are all decomposed by an approximate factorization technique and the embeddings are concatenated together.
Parameters: - walk_number (int) – Number of random walks. Default is 10.
- walk_length (int) – Length of random walks. Default is 80.
- dimensions (int) – Dimensionality of embedding. Default is 32.
- workers (int) – Number of cores. Default is 4.
- window_size (int) – Matrix power order. Default is 4.
- epochs (int) – Number of epochs. Default is 1.
- learning_rate (float) – HogWild! learning rate. Default is 0.05.
- min_count (int) – Minimal count of node occurrences. Default is 1.
- seed (int) – Random seed value. Default is 42.
-
class
GraRep
(dimensions: int = 32, iteration: int = 10, order: int = 5, seed: int = 42)[source]¶ An implementation of “GraRep” from the CIKM ‘15 paper “GraRep: Learning Graph Representations with Global Structural Information”. The procedure uses sparse truncated SVD to learn embeddings for the powers of the PMI matrix computed from powers of the normalized adjacency matrix.
Parameters:
-
class
DeepWalk
(walk_number: int = 10, walk_length: int = 80, dimensions: int = 128, workers: int = 4, window_size: int = 5, epochs: int = 1, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]¶ An implementation of “DeepWalk” from the KDD ‘14 paper “DeepWalk: Online Learning of Social Representations”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by pooling normalized adjacency matrix powers. This matrix is decomposed by an approximate factorization technique.
Parameters: - walk_number (int) – Number of random walks. Default is 10.
- walk_length (int) – Length of random walks. Default is 80.
- dimensions (int) – Dimensionality of embedding. Default is 128.
- workers (int) – Number of cores. Default is 4.
- window_size (int) – Matrix power order. Default is 5.
- epochs (int) – Number of epochs. Default is 1.
- learning_rate (float) – HogWild! learning rate. Default is 0.05.
- min_count (int) – Minimal count of node occurrences. Default is 1.
- seed (int) – Random seed value. Default is 42.
-
class
Node2Vec
(walk_number: int = 10, walk_length: int = 80, p: float = 1.0, q: float = 1.0, dimensions: int = 128, workers: int = 4, window_size: int = 5, epochs: int = 1, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]¶ An implementation of “Node2Vec” from the KDD ‘16 paper “node2vec: Scalable Feature Learning for Networks”. The procedure uses biased second order random walks to approximate the pointwise mutual information matrix obtained by pooling normalized adjacency matrix powers. This matrix is decomposed by an approximate factorization technique.
Parameters: - walk_number (int) – Number of random walks. Default is 10.
- walk_length (int) – Length of random walks. Default is 80.
- p (float) – Return parameter (1/p transition probability) to move towards from previous node.
- q (float) – In-out parameter (1/q transition probability) to move away from previous node.
- dimensions (int) – Dimensionality of embedding. Default is 128.
- workers (int) – Number of cores. Default is 4.
- window_size (int) – Matrix power order. Default is 5.
- epochs (int) – Number of epochs. Default is 1.
- learning_rate (float) – HogWild! learning rate. Default is 0.05.
- min_count (int) – Minimal count of node occurrences. Default is 1.
- seed (int) – Random seed value. Default is 42.
-
class
NMFADMM
(dimensions: int = 32, iterations: int = 100, rho: float = 1.0, seed: int = 42)[source]¶ An implementation of “NMF-ADMM” from the ICASSP ‘14 paper “Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence”. The procedure learns an embedding of the normalized adjacency matrix with by using the alternating direction method of multipliers to solve a non negative matrix factorization problem.
Parameters:
-
class
LaplacianEigenmaps
(dimensions: int = 128, maximum_number_of_iterations: int = 100, seed: int = 42)[source]¶ An implementation of “Laplacian Eigenmaps” from the NIPS ‘01 paper “Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering”. The procedure extracts the eigenvectors corresponding to the largest eigenvalues of the graph Laplacian. These vectors are used as the node embedding.
Parameters:
Structural node embedding¶
-
class
GraphWave
(sample_number: int = 200, step_size: float = 0.1, heat_coefficient: float = 1.0, approximation: int = 100, mechanism: str = 'approximate', switch: int = 1000, seed: int = 42)[source]¶ An implementation of “GraphWave” from the KDD ‘18 paper “Learning Structural Node Embeddings Via Diffusion Wavelets”. The procedure first calculates the graph wavelets using a heat kernel. The wavelets are treated as probability distributions over nodes from a source node. Using these the characteristic function is evaluated at certain gird points to learn structural node embeddings of the vertices.
Parameters: - sample_number (int) – Number of evaluation points. Default is 200.
- step_size (float) – Grid point step size. Default is 0.1.
- heat_coefficient (float) – Heat kernel coefficient. Default is 1.0.
- approximation (int) – Chebyshev polynomial order. Default is 100.
- mechanism (str) – Wavelet calculation method one of:
(
"exact"
,"approximate"
). Default is ‘approximate’. - switch (int) – Vertex cardinality when the wavelet calculation method switches to approximation. Default is 1000.
- seed (int) – Random seed value. Default is 42.
-
class
Role2Vec
(walk_number: int = 10, walk_length: int = 80, dimensions: int = 128, workers: int = 4, window_size: int = 2, epochs: int = 1, learning_rate: float = 0.05, down_sampling: float = 0.0001, min_count: int = 10, wl_iterations: int = 2, seed: int = 42, erase_base_features: bool = False)[source]¶ An implementation of “Role2vec” from the IJCAI ‘18 paper “Learning Role-based Graph Embeddings”. The procedure uses random walks to approximate the pointwise mutual information matrix obtained by multiplying the pooled adjacency power matrix with a structural feature matrix (in this case Weisfeiler-Lehman features). This way one gets structural node embeddings.
Parameters: - walk_number (int) – Number of random walks. Default is 10.
- walk_length (int) – Length of random walks. Default is 80.
- dimensions (int) – Dimensionality of embedding. Default is 128.
- workers (int) – Number of cores. Default is 4.
- window_size (int) – Matrix power order. Default is 2.
- epochs (int) – Number of epochs. Default is 1.
- learning_rate (float) – HogWild! learning rate. Default is 0.05.
- down_sampling (float) – Down sampling frequency. Default is 0.0001.
- min_count (int) – Minimal count of feature occurrences. Default is 10.
- wl_iterations (int) – Number of Weisfeiler-Lehman hashing iterations. Default is 2.
- seed (int) – Random seed value. Default is 42.
- erase_base_features (bool) – Removing the base features. Default is False.
Attributed node embedding¶
-
class
FeatherNode
(reduction_dimensions: int = 64, svd_iterations: int = 20, theta_max: float = 2.5, eval_points: int = 25, order: int = 5, seed: int = 42)[source]¶ An implementation of “FEATHER-N” from the CIKM ‘20 paper “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models”. The procedure uses characteristic functions of node features with random walk weights to describe node neighborhoods.
Parameters: - reduction_dimensions (int) – SVD reduction dimensions. Default is 64.
- svd_iterations (int) – SVD iteration count. Default is 20.
- theta_max (float) – Maximal evaluation point. Default is 2.5.
- eval_points (int) – Number of characteristic function evaluation points. Default is 25.
- order (int) – Scale - number of adjacency matrix powers. Default is 5.
- seed (int) – Random seed value. Default is 42.
-
class
AE
(walk_number=5, walk_length=80, dimensions=32, workers=4, window_size=3, epochs=5, learning_rate=0.05, down_sampling=0.0001, min_count=1, seed=42)[source]¶ An implementation of “AE” from the Arxiv ‘19 paper “MUSAE: Multi-Scale Attributed Node Embedding”. The procedure does attributed random walks to approximate the pooled adjacency matrix power node feature matrix product. The matrix is decomposed implicitly by a Skip-Gram style optimization problem.
Parameters: - walk_number (int) – Number of random walks. Default is 5.
- walk_length (int) – Length of random walks. Default is 80.
- dimensions (int) – Dimensionality of embedding. Default is 32.
- workers (int) – Number of cores. Default is 4.
- window_size (int) – Matrix power order. Default is 3.
- epochs (int) – Number of epochs. Default is 1.
- learning_rate (float) – HogWild! learning rate. Default is 0.05.
- down_sampling (float) – Down sampling rate in the corpus. Default is 0.0001.
- min_count (int) – Minimal count of node occurrences. Default is 1.
- seed (int) – Random seed value. Default is 42.
-
class
MUSAE
(walk_number=5, walk_length=80, dimensions=32, workers=4, window_size=3, epochs=5, learning_rate=0.05, down_sampling=0.0001, min_count=1, seed=42)[source]¶ An implementation of “MUSAE” from the Arxiv ‘19 paper “MUSAE: Multi-Scale Attributed Node Embedding”. The procedure does attributed random walks to approximate the adjacency matrix power node feature matrix products. The matrices are decomposed implicitly by a Skip-Gram style optimizer. The individual embeddings are concatenated together to form a multi-scale attributed node embedding. This way the feature distributions at different scales are separable.
Parameters: - walk_number (int) – Number of random walks. Default is 5.
- walk_length (int) – Length of random walks. Default is 80.
- dimensions (int) – Dimensionality of embedding. Default is 32.
- workers (int) – Number of cores. Default is 4.
- window_size (int) – Matrix power order. Default is 3.
- epochs (int) – Number of epochs. Default is 1.
- learning_rate (float) – HogWild! learning rate. Default is 0.05.
- down_sampling (float) – Down sampling rate in the corpus. Default is 0.0001.
- min_count (int) – Minimal count of node occurrences. Default is 1.
- seed (int) – Random seed value. Default is 42.
-
class
SINE
(walk_number: int = 10, walk_length: int = 80, dimensions: int = 128, workers: int = 4, window_size: int = 5, epochs: int = 1, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]¶ An implementation of “SINE” from the ICDM ‘18 paper “SINE: Scalable Incomplete Network Embedding”. The procedure implicitly factorizes a joint adjacency matrix power and feature matrix. The decomposition happens on truncated random walks and the adjacency matrix powers are pooled together.
Parameters: - walk_number (int) – Number of random walks. Default is 10.
- walk_length (int) – Length of random walks. Default is 80.
- dimensions (int) – Dimensionality of embedding. Default is 128.
- workers (int) – Number of cores. Default is 4.
- window_size (int) – Matrix power order. Default is 5.
- epochs (int) – Number of epochs. Default is 1.
- learning_rate (float) – HogWild! learning rate. Default is 0.05.
- min_count (int) – Minimal count of node occurrences. Default is 1.
- seed (int) – Random seed value. Default is 42.
-
class
BANE
(dimensions: int = 32, svd_iterations: int = 20, seed: int = 42, alpha: float = 0.3, iterations: int = 100, binarization_iterations: int = 20)[source]¶ An implementation of “BANE” from the ICDM ‘18 paper “Binarized Attributed Network Embedding Class”. The procedure first calculates the truncated SVD of an adjacency - feature matrix product. This matrix is further decomposed by a binary CCD based technique.
Parameters: - dimensions (int) – Number of embedding dimensions. Default is 32.
- svd_iterations (int) – SVD iteration count. Default is 20.
- seed (int) – Random seed. Default is 42.
- alpha (float) – Kernel matrix inversion parameter. Default is 0.3.
- iterations (int) – Matrix decomposition iterations. Default is 100.
- binarization_iterations (int) – Binarization iterations. Default is 20.
- seed – Random seed value. Default is 42.
-
class
TENE
(dimensions=32, lower_control=1e-15, alpha=0.1, beta=0.1, iterations=200, seed=42)[source]¶ An implementation of “TENE” from the ICPR ‘18 paper “Enhanced Network Embedding with Text Information”. The procedure jointly factorizes the adjacency and node feature matrices using alternating least squares.
Parameters: - dimensions (int) – Number of embedding dimensions. Default is 32.
- lower_control (float) – Embedding score minimal value. Default is 10**-15.
- alpha (float) – Adjacency matrix regularization coefficient. Default is 0.1.
- beta (float) – Feature matrix regularization coefficient. Default is 0.1.
- iterations (int) – ALS iterations. Default is 200.
- seed (int) – Random seed value. Default is 42.
-
class
TADW
(dimensions: int = 32, reduction_dimensions: int = 64, svd_iterations: int = 20, seed: int = 42, alpha: float = 0.01, iterations: int = 10, lambd: float = 10.0)[source]¶ An implementation of “TADW” from the IJCAI ‘15 paper “Network Representation Learning with Rich Text Information”. The procedure uses the node attribute matrix with a factorization matrix to reproduce a power of the adjacency matrix to create representations.
Parameters: - dimensions (int) – Number of embedding dimensions. Default is 32.
- reduction_dimensions (int) – SVD reduction dimensions. Default is 64.
- svd_iterations (int) – SVD iteration count. Default is 20.
- seed (int) – Random seed. Default is 42.
- alpha (float) – Learning rate. Default is 0.01.
- iterations (int) – Matrix decomposition iterations. Default is 10.
- lambd (float) – Regularization coefficient. Default is 10.0.
-
class
FSCNMF
(dimensions: int = 32, lower_control: float = 1e-15, iterations: int = 500, alpha_1: float = 1000.0, alpha_2: float = 1.0, alpha_3: float = 1.0, beta_1: float = 1000.0, beta_2: float = 1.0, beta_3: float = 1.0, seed: int = 42)[source]¶ An implementation of “FCNMF” from the Arxiv ‘18 paper “Fusing Structure and Content via Non-negative Matrix Factorization for Embedding Information Networks”. The procedure uses a joint matrix factorization technique on the adjacency and feature matrices. The node and feature embeddings are co-regularized for alignment of the embedding spaces.
Parameters: - dimensions (int) – Number of embedding dimensions. Default is 32.
- lower_control (float) – Embedding score minimal value. Default is 10**-15.
- iterations (int) – Power iterations. Default is 500.
- alpha_1 (float) – Alignment parameter for adjacency matrix. Default is 1000.0.
- alpha_2 (float) – Adjacency basis regularization. Default is 1.0.
- alpha_3 (float) – Adjacency features regularization. Default is 1.0.
- beta_1 (float) – Alignment parameter for feature matrix. Default is 1000.0.
- beta_2 (float) – Attribute basis regularization. Default is 1.0.
- beta_3 (float) – Attribute basis regularization. Default is 1.0.
- seed (int) – Random seed value. Default is 42.
-
class
ASNE
(dimensions: int = 128, workers: int = 4, epochs: int = 100, down_sampling: float = 0.0001, learning_rate: float = 0.05, min_count: int = 1, seed: int = 42)[source]¶ An implementation of “ASNE” from the TKDE ‘18 paper “Attributed Social Network Embedding”. The procedure implicitly factorizes a concatenated adjacency matrix and feature matrix.
Parameters: - dimensions (int) – Dimensionality of embedding. Default is 128.
- workers (int) – Number of cores. Default is 4.
- epochs (int) – Number of epochs. Default is 100.
- down_sampling (float) – Down sampling frequency. Default is 0.0001.
- learning_rate (float) – HogWild! learning rate. Default is 0.05.
- min_count (int) – Minimal count of node occurrences. Default is 1.
- seed (int) – Random seed value. Default is 42.
Meta node embedding¶
-
class
NEU
(L1: float = 0.5, L2: float = 0.25, T: int = 1, seed: int = 42)[source]¶ An implementation of “NEU” from the IJCAI 17 paper “Fast Network Embedding Enhancement via High Order Proximity Approximation”. The procedure uses an arbitrary embedding and augments it by higher order proximities with a recursive meta learning algorithm.
Parameters:
Whole graph embedding¶
-
class
WaveletCharacteristic
(order: int = 5, eval_points: int = 25, theta_max: float = 2.5, tau: float = 1.0, pooling: str = 'mean')[source]¶ An implementation of “WaveCharacteristic” from the CIKM ‘21 paper “Graph Embedding via Diffusion-Wavelets-Based Node Feature Distribution Characterization”. The procedure uses characteristic functions of node features with wavelet function weights to describe node neighborhoods. These node level features are pooled by mean pooling to create graph level statistics.
Parameters: - order (int) – Adjacency matrix powers. Default is 5.
- eval_points (int) – Number of characteristic function evaluations. Default is 5.
- theta_max (float) – Largest characteristic function time value. Default is 2.5.
- tau (float) – Wave function heat - time diffusion. Default is 1.0.
- pooling (str) – Pooling function appliead to the characteristic functions. Default is “mean”.
-
fit
(graphs: List[networkx.classes.graph.Graph])[source]¶ Fitting a Geometric-Scattering model.
- Arg types:
- graphs (List of NetworkX graphs) - The graphs to be embedded.
Local Degree Profile based embedding.
-
class
LDP
(bins: int = 32)[source]¶ An implementation of “LDP” from the ICLR Representation Learning on Graphs and Manifolds Workshop ‘19 paper “A Simple Yet Effective Baseline for Non-Attributed Graph Classification”. The procedure calculates histograms of degree profiles. These concatenated histograms form the graph representations.
Parameters: bins (int) – Number of histogram bins. Default is 32. -
fit
(graphs)[source]¶ Fitting an LDP model.
- Arg types:
- graphs (List of NetworkX graphs) - The graphs to be embedded.
-
-
class
FeatherGraph
(order: int = 5, eval_points: int = 25, theta_max: float = 2.5, seed: int = 42, pooling: str = 'mean')[source]¶ An implementation of “FEATHER-G” from the CIKM ‘20 paper “Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models”. The procedure uses characteristic functions of node features with random walk weights to describe node neighborhoods. These node level features are pooled by mean pooling to create graph level statistics.
Parameters: - order (int) – Adjacency matrix powers. Default is 5.
- eval_points (int) – Number of evaluation points. Default is 25.
- theta_max (int) – Maximal evaluation point value. Default is 2.5.
- seed (int) – Random seed value. Default is 42.
- pooling (str) – Permutation invariant pooling function, one of:
(
"mean"
,"max"
,"min"
). Default is “mean.”
-
fit
(graphs: List[networkx.classes.graph.Graph]) → None[source]¶ Fitting a graph level FEATHER model.
- Arg types:
- graphs (List of NetworkX graphs) - The graphs to be embedded.
Invariant Graph Embedding model class.
-
class
IGE
(feature_embedding_dimensions: List[int] = [3, 5], spectral_embedding_dimensions: List[int] = [10, 20], histogram_bins: List[int] = [10, 20], seed: int = 42)[source]¶ An implementation of “Invariant Graph Embedding” from the ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Data paper “Invariant Embedding for Graph Classification”. The procedure computes a mixture of spectral and node embedding based features. Specifically, it uses scattering, eigenvalues and pooled node feature embeddings to create graph descriptors.
Parameters: -
fit
(graphs: List[networkx.classes.graph.Graph])[source]¶ Fitting an Invariant Graph Embedding model.
- Arg types:
- graphs (List of NetworkX graphs) - The graphs to be embedded.
-
-
class
GeoScattering
(order: int = 4, moments: int = 4, seed: int = 42)[source]¶ An implementation of “GeoScattering” from the ICML ‘19 paper “Geometric Scattering for Graph Data Analysis”. The procedure uses scattering with wavelet transforms to create graph spectral descriptors. Moments of the wavelet transformed features are used as graph level features for the embedding.
Parameters: -
fit
(graphs: List[networkx.classes.graph.Graph])[source]¶ Fitting a Geometric-Scattering model.
- Arg types:
- graphs (List of NetworkX graphs) - The graphs to be embedded.
-
-
class
GL2Vec
(wl_iterations: int = 2, dimensions: int = 128, workers: int = 4, down_sampling: float = 0.0001, epochs: int = 10, learning_rate: float = 0.025, min_count: int = 5, seed: int = 42, erase_base_features: bool = False)[source]¶ An implementation of “GL2Vec” from the ICONIP ‘19 paper “GL2vec: Graph Embedding Enriched by Line Graphs with Edge Features”. First, the algorithm creates the line graph of each graph in the graph dataset. The procedure creates Weisfeiler-Lehman tree features for nodes in graphs. Using these features a document (graph) - feature co-occurrence matrix is decomposed in order to generate representations for the graphs.
The procedure assumes that nodes have no string feature present and the WL-hashing defaults to the degree centrality. However, if a node feature with the key “feature” is supported for the nodes the feature extraction happens based on the values of this key.
Parameters: - wl_iterations (int) – Number of Weisfeiler-Lehman iterations. Default is 2.
- dimensions (int) – Dimensionality of embedding. Default is 128.
- workers (int) – Number of cores. Default is 4.
- down_sampling (float) – Down sampling frequency. Default is 0.0001.
- epochs (int) – Number of epochs. Default is 10.
- learning_rate (float) – HogWild! learning rate. Default is 0.025.
- min_count (int) – Minimal count of graph feature occurrences. Default is 5.
- seed (int) – Random seed for the model. Default is 42.
-
fit
(graphs: List[networkx.classes.graph.Graph])[source]¶ Fitting a GL2Vec model.
- Arg types:
- graphs (List of NetworkX graphs) - The graphs to be embedded.
-
class
NetLSD
(scale_min: float = -2.0, scale_max: float = 2.0, scale_steps: int = 250, approximations: int = 200, seed: int = 42)[source]¶ An implementation of “NetLSD” from the KDD ‘18 paper “NetLSD: Hearing the Shape of a Graph”. The procedure calculate the heat kernel trace of the normalized Laplacian matrix over a vector of time scales. If the matrix is large it switches to an approximation of the eigenvalues.
Parameters: - scale_min (float) – Time scale interval minimum. Default is -2.0.
- scale_max (float) – Time scale interval maximum. Default is 2.0.
- scale_steps (int) – Number of steps in time scale. Default is 250.
- scale_approximations (int) – Number of eigenvalue approximations. Default is 200.
- seed (int) – Random seed value. Default is 42.
-
fit
(graphs: List[networkx.classes.graph.Graph])[source]¶ Fitting a NetLSD model.
- Arg types:
- graphs (List of NetworkX graphs) - The graphs to be embedded.
-
class
SF
(dimensions: int = 128, seed: int = 42)[source]¶ An implementation of “SF” from the NeurIPS Relational Representation Learning Workshop ‘18 paper “A Simple Baseline Algorithm for Graph Classification”. The procedure calculates the k lowest eigenvalues of the normalized Laplacian. If the graph has a lower number of eigenvalues than k the representation is padded with zeros.
Parameters: -
fit
(graphs)[source]¶ Fitting an SF model.
- Arg types:
- graphs (List of NetworkX graphs) - The graphs to be embedded.
-
-
class
FGSD
(hist_bins: int = 200, hist_range: int = 20, seed: int = 42)[source]¶ An implementation of “FGSD” from the NeurIPS ‘17 paper “Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs”. The procedure calculates the Moore-Penrose spectrum of the normalized Laplacian. Using this spectrum the histogram of the spectral features is used as a whole graph representation.
Parameters: -
fit
(graphs: List[networkx.classes.graph.Graph])[source]¶ Fitting a FGSD model.
- Arg types:
- graphs (List of NetworkX graphs) - The graphs to be embedded.
-
-
class
Graph2Vec
(wl_iterations: int = 2, attributed: bool = False, dimensions: int = 128, workers: int = 4, down_sampling: float = 0.0001, epochs: int = 10, learning_rate: float = 0.025, min_count: int = 5, seed: int = 42, erase_base_features: bool = False)[source]¶ An implementation of “Graph2Vec” from the MLGWorkshop ‘17 paper “Graph2Vec: Learning Distributed Representations of Graphs”. The procedure creates Weisfeiler-Lehman tree features for nodes in graphs. Using these features a document (graph) - feature co-occurrence matrix is decomposed in order to generate representations for the graphs.
The procedure assumes that nodes have no string feature present and the WL-hashing defaults to the degree centrality. However, if a node feature with the key “feature” is supported for the nodes the feature extraction happens based on the values of this key.
Parameters: - wl_iterations (int) – Number of Weisfeiler-Lehman iterations. Default is 2.
- attributed (bool) – Presence of graph attributes. Default is False.
- dimensions (int) – Dimensionality of embedding. Default is 128.
- workers (int) – Number of cores. Default is 4.
- down_sampling (float) – Down sampling frequency. Default is 0.0001.
- epochs (int) – Number of epochs. Default is 10.
- learning_rate (float) – HogWild! learning rate. Default is 0.025.
- min_count (int) – Minimal count of graph feature occurrences. Default is 5.
- seed (int) – Random seed for the model. Default is 42.
- erase_base_features (bool) – Erasing the base features. Default is False.
-
fit
(graphs: List[networkx.classes.graph.Graph])[source]¶ Fitting a Graph2Vec model.
- Arg types:
- graphs (List of NetworkX graphs) - The graphs to be embedded.