markovianSBM

BaumWelch module

class markovianSBM.BaumWelch.BaumWelch[source]

Bases: object

Class which performs the Baum-Welch algorithm and that uses it to solve the link prediction and the collaborative filtering problems.

BaumWelch(m, n, delta, nbite, eps=0.01, K=None)[source]

BaumWelch algorithm that iteratively perform the forward, backward and update steps.

Parameters
  • m – We observe fully the graph until time m

  • n – We do not observe reliably the connection involving nodes between time m and n

  • delta – We observe the connections between nodes n and n+delta

  • nbite – Number of iterations for the Baum Welch algorithm

  • eps – Parameter use to initialize the matrix of emission probabilities O

  • K – Number of communities

Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.

RES(ini, alpha, beta, O, observed_links, observed_nodes, m, n)[source]

Solve robustly the collaborative filtering problem when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.

Parameters
  • ini – initial distribution of the Markov chain given by the Baum-Welch algorithm

  • alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).

  • observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n

  • m – We observe fully the graph until time m

  • n – Node that we want to learn the community

Our implementation do not respect striclty the formula of the paper. We get rid of quantities that do not depend on the cluster k of node n. This allows to avoid underflow issues.

adjacency_BaumWelch(m, n, delta)[source]

Builds an adjacency matrix from the attribute self.X removing the nodes between m+1 and n-1 (that we consider to be not reliable).

Parameters
  • m – We observe fully the graph until time m

  • n – We do not observe reliably how nodes between m+1 and n-1 are connected

  • delta – We observe reliably how nodes between n and n+delta are connected

backward(P, O, m, n, delta, Y, K=None)[source]

Backward step of the BaumWelch algorithm. We observe the connections between nodes 0,…,m,n,…,n+delta. However, we consider that the edges involving nodes between m+1 and n-1 are not reliable and we do not take into account the estimated clusters for the nodes between m+1 and n-1.

Parameters
  • P – Transition matrix of the Markov chain

  • O – matrix of emission probabilities

  • m – We observe fully the graph until time m

  • n – We do not observe reliably the connection involving nodes between time m and n

  • delta – We observe the connections between nodes n and n+delta

  • Y – vector of length n+delta+1 of the estimated communities

  • K – Number of communities

Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.

collaborative_filtering_optimalMAP(alpha, beta, observed_links, observed_nodes, m, n)[source]

Solve robustly the collaborative filtering problem using the optimal approach when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.

Parameters
  • alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).

  • observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n

  • m – We observe fully the graph until time m

  • n – Node that we want to learn the community

collaborative_filtering_pluginMAP(alpha, beta, observed_links, observed_nodes, m, n)[source]

Solve the collaborative filtering problem using the plugin approach when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.

Parameters
  • alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).

  • observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n

  • m – We observe fully the graph until time m

  • n – Node that we want to learn the community

collaborative_filtering_robustMAP(ini, alpha, beta, O, observed_links, observed_nodes, m, n, Pbaum=None)[source]

Solve robustly the collaborative filtering problem when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.

Parameters
  • ini – initial distribution of the Markov chain given by the Baum-Welch algorithm

  • alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).

  • observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n

  • m – We observe fully the graph until time m

  • n – Node that we want to learn the community

Our implementation do not respect striclty the formula of the paper. We get rid of quantities that do not depend on the cluster k of node n. This allows to avoid underflow issues.

forward(ini, P, O, m, n, delta, Y, K=None)[source]

Forward step of the BaumWelch algorithm. We observe the connections between nodes 0,…,m,n,…,n+delta. However, we consider that the edges involving nodes between m+1 and n-1 are not reliable and we do not take into account the estimated clusters for the nodes between m+1 and n-1.

Parameters
  • ini – initial distribution of the Markov chain

  • P – Transition matrix of the Markov chain

  • O – matrix of emission probabilities

  • m – We observe fully the graph until time m

  • n – We do not observe reliably the connection involving nodes between time m and n

  • delta – We observe the connections between nodes n and n+delta

  • Y – vector of length n+delta+1 of the estimated communities

  • K – Number of communities

Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.

update(alpha, beta, P, O, m, n, delta, Y, K=None)[source]

Update step of the BaumWelch algorithm: the parameters of the HMM are updated and returned.

Parameters
  • alpha – Matrix of size $K imes n+delta$ giving the output of the “forward” method

  • beta – Matrix of size $K imes n+delta$ giving the output of the “backward” method

  • P – Transition matrix of the Markov chain

  • O – matrix of emission probabilities

  • m – We observe fully the graph until time m

  • n – We do not observe reliably the connection involving nodes between time m and n

  • delta – We observe the connections between nodes n and n+delta

  • Y – vector of length n+delta+1 of the estimated communities

  • K – Number of communities

Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.

Clustering module

class markovianSBM.Clustering.Clustering(n, K)[source]

Bases: object

Class that performs the final rounding step on the rows of the matrix $hat{B}$ which is the optimal solution of the SDP relaxation of the K-means problem.

Kmedoids(barx, bary, C)[source]

Kmedoid algorithm that performs a rounding step on the rows of $hat{B}$.

Parameters
  • barx – Output of the method ‘solve_relaxed_LP’

  • bary – Output of the method ‘solve_relaxed_LP’

  • C – Output of the method ‘solve_relaxed_LP’

build_partition(clust)[source]

Given a list clust that associates to each node its community, this methods builds the associated partition of the noes of the graph.

Parameters

clust – List of estimated clusters for the n nodes of the graph

find_permutation(true_partition, approx_partition)[source]

Find the permutation between the names of the true communities and the ones estimated by our algorithm.

Parameters
  • true_partition – True partition of the nodes in the graph according to their clusters

  • approx_partition – Estimated partition of the nodes

solve_relaxed_LP(M)[source]

Prelimanary to run the K-medoid algorithm.

Parameters

M – M is the output of the relaxed SDP problem.

markovianSBM.Clustering.add_liste(Level, Liste, Node, UP=True)[source]
markovianSBM.Clustering.recu(node, level, graph, liste, node2ind, dejavu, dico_closest, nodes)[source]

RelaxedKmeans module

class markovianSBM.RelaxedKmeans.RelaxedKmeans[source]

Bases: object

Class solving the Semi Definite Program which is a relaxed version of the K means problem.

compute_costs()[source]

Show the value of the objective function of the SDP with the matrix $B^*$ which is the solution of the K means problem and with $hat{B}$ which is the optimal solution of the SDP.

solve_relaxed_SDP()[source]

Solve the Semi Deifnite Program and save the optimal solution.

visualize_B_matrices()[source]

Method providing a vizualisation of the matrices $B^*$ (optimal solution of the K-means problem) and $hat{B}$ optimal solution of the SDP relaxation. This allows to easily see if a final rounding step on the rows of $hat{B}$ could allow to reach a relevant clustering of the nodes of the graph.

Estimation module

class markovianSBM.Estimation.Estimation[source]

Bases: object

Class related to the estimation of the parameters of our model.

estimate_connectivity_matrix()[source]

Estimates the connectivity matrix.

estimate_effectifs()[source]

Computes the sizes of each clusters

estimate_parameters()[source]

Estimates the parameters of the model.

estimate_transition_matrix()[source]

Estimates the transition matrix and the invariant measure of the chain.

SBM module

class markovianSBM.SBM.SBM(n, K, ini_distribution='uniform', framework='iid', X=None, Q=None, P=None, save_B_matrices=False)[source]

Bases: RelaxedKmeans, Clustering, Estimation, BaumWelch

Main class building the graph and running the algorithm to recover communities.

adjacency_matrix(X=None)[source]

Builds the adjacency matrix of the graph.

Parameters

X – Pre-defined adjacency matrix

bernoulli(q)[source]

Sample from the Bernoulli distribution with parameter q.

edges_matrix(Q)[source]

Builds the connectivity matrix Q.

Parameters

Q – Connectivity matrix of size $K imes K$

effectif_clusters()[source]

Computes the sizes of each clusters

estimate_partition()[source]

Runs the algorithm to estimate communities of the nodes.

generate_clusters(P)[source]

Samples the communities of the nodes.

Parameters

P – Transition matrix of the Markov chain of size $K imes K$

initial_distribution()[source]

Defines the distribution of the community of the first node of the graph.

..note: You can add distribution by yourself in the method

next_state(i)[source]

Method used to sample the community of the next code knowing that the community of the previous node was i.

Parameters

i – Integer representing the community of the current node

proportion_error()[source]

Compute the misclassification error of our estimated clustering of the nodes.