markovianSBM
BaumWelch module
- class markovianSBM.BaumWelch.BaumWelch[source]
Bases:
objectClass which performs the Baum-Welch algorithm and that uses it to solve the link prediction and the collaborative filtering problems.
- BaumWelch(m, n, delta, nbite, eps=0.01, K=None)[source]
BaumWelch algorithm that iteratively perform the forward, backward and update steps.
- Parameters
m – We observe fully the graph until time m
n – We do not observe reliably the connection involving nodes between time m and n
delta – We observe the connections between nodes n and n+delta
nbite – Number of iterations for the Baum Welch algorithm
eps – Parameter use to initialize the matrix of emission probabilities O
K – Number of communities
Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.
- RES(ini, alpha, beta, O, observed_links, observed_nodes, m, n)[source]
Solve robustly the collaborative filtering problem when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.
- Parameters
ini – initial distribution of the Markov chain given by the Baum-Welch algorithm
alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm
beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm
observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).
observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n
m – We observe fully the graph until time m
n – Node that we want to learn the community
Our implementation do not respect striclty the formula of the paper. We get rid of quantities that do not depend on the cluster k of node n. This allows to avoid underflow issues.
- adjacency_BaumWelch(m, n, delta)[source]
Builds an adjacency matrix from the attribute self.X removing the nodes between m+1 and n-1 (that we consider to be not reliable).
- Parameters
m – We observe fully the graph until time m
n – We do not observe reliably how nodes between m+1 and n-1 are connected
delta – We observe reliably how nodes between n and n+delta are connected
- backward(P, O, m, n, delta, Y, K=None)[source]
Backward step of the BaumWelch algorithm. We observe the connections between nodes 0,…,m,n,…,n+delta. However, we consider that the edges involving nodes between m+1 and n-1 are not reliable and we do not take into account the estimated clusters for the nodes between m+1 and n-1.
- Parameters
P – Transition matrix of the Markov chain
O – matrix of emission probabilities
m – We observe fully the graph until time m
n – We do not observe reliably the connection involving nodes between time m and n
delta – We observe the connections between nodes n and n+delta
Y – vector of length n+delta+1 of the estimated communities
K – Number of communities
Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.
- collaborative_filtering_optimalMAP(alpha, beta, observed_links, observed_nodes, m, n)[source]
Solve robustly the collaborative filtering problem using the optimal approach when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.
- Parameters
alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm
beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm
observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).
observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n
m – We observe fully the graph until time m
n – Node that we want to learn the community
- collaborative_filtering_pluginMAP(alpha, beta, observed_links, observed_nodes, m, n)[source]
Solve the collaborative filtering problem using the plugin approach when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.
- Parameters
alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm
beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm
observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).
observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n
m – We observe fully the graph until time m
n – Node that we want to learn the community
- collaborative_filtering_robustMAP(ini, alpha, beta, O, observed_links, observed_nodes, m, n, Pbaum=None)[source]
Solve robustly the collaborative filtering problem when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.
- Parameters
ini – initial distribution of the Markov chain given by the Baum-Welch algorithm
alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm
beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm
observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).
observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n
m – We observe fully the graph until time m
n – Node that we want to learn the community
Our implementation do not respect striclty the formula of the paper. We get rid of quantities that do not depend on the cluster k of node n. This allows to avoid underflow issues.
- forward(ini, P, O, m, n, delta, Y, K=None)[source]
Forward step of the BaumWelch algorithm. We observe the connections between nodes 0,…,m,n,…,n+delta. However, we consider that the edges involving nodes between m+1 and n-1 are not reliable and we do not take into account the estimated clusters for the nodes between m+1 and n-1.
- Parameters
ini – initial distribution of the Markov chain
P – Transition matrix of the Markov chain
O – matrix of emission probabilities
m – We observe fully the graph until time m
n – We do not observe reliably the connection involving nodes between time m and n
delta – We observe the connections between nodes n and n+delta
Y – vector of length n+delta+1 of the estimated communities
K – Number of communities
Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.
- update(alpha, beta, P, O, m, n, delta, Y, K=None)[source]
Update step of the BaumWelch algorithm: the parameters of the HMM are updated and returned.
- Parameters
alpha – Matrix of size $K imes n+delta$ giving the output of the “forward” method
beta – Matrix of size $K imes n+delta$ giving the output of the “backward” method
P – Transition matrix of the Markov chain
O – matrix of emission probabilities
m – We observe fully the graph until time m
n – We do not observe reliably the connection involving nodes between time m and n
delta – We observe the connections between nodes n and n+delta
Y – vector of length n+delta+1 of the estimated communities
K – Number of communities
Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.
Clustering module
- class markovianSBM.Clustering.Clustering(n, K)[source]
Bases:
objectClass that performs the final rounding step on the rows of the matrix $hat{B}$ which is the optimal solution of the SDP relaxation of the K-means problem.
- Kmedoids(barx, bary, C)[source]
Kmedoid algorithm that performs a rounding step on the rows of $hat{B}$.
- Parameters
barx – Output of the method ‘solve_relaxed_LP’
bary – Output of the method ‘solve_relaxed_LP’
C – Output of the method ‘solve_relaxed_LP’
- build_partition(clust)[source]
Given a list clust that associates to each node its community, this methods builds the associated partition of the noes of the graph.
- Parameters
clust – List of estimated clusters for the n nodes of the graph
- find_permutation(true_partition, approx_partition)[source]
Find the permutation between the names of the true communities and the ones estimated by our algorithm.
- Parameters
true_partition – True partition of the nodes in the graph according to their clusters
approx_partition – Estimated partition of the nodes
RelaxedKmeans module
- class markovianSBM.RelaxedKmeans.RelaxedKmeans[source]
Bases:
objectClass solving the Semi Definite Program which is a relaxed version of the K means problem.
- compute_costs()[source]
Show the value of the objective function of the SDP with the matrix $B^*$ which is the solution of the K means problem and with $hat{B}$ which is the optimal solution of the SDP.
- visualize_B_matrices()[source]
Method providing a vizualisation of the matrices $B^*$ (optimal solution of the K-means problem) and $hat{B}$ optimal solution of the SDP relaxation. This allows to easily see if a final rounding step on the rows of $hat{B}$ could allow to reach a relevant clustering of the nodes of the graph.
Estimation module
SBM module
- class markovianSBM.SBM.SBM(n, K, ini_distribution='uniform', framework='iid', X=None, Q=None, P=None, save_B_matrices=False)[source]
Bases:
RelaxedKmeans,Clustering,Estimation,BaumWelchMain class building the graph and running the algorithm to recover communities.
- adjacency_matrix(X=None)[source]
Builds the adjacency matrix of the graph.
- Parameters
X – Pre-defined adjacency matrix
- edges_matrix(Q)[source]
Builds the connectivity matrix Q.
- Parameters
Q – Connectivity matrix of size $K imes K$
- generate_clusters(P)[source]
Samples the communities of the nodes.
- Parameters
P – Transition matrix of the Markov chain of size $K imes K$
- initial_distribution()[source]
Defines the distribution of the community of the first node of the graph.
..note: You can add distribution by yourself in the method