BaumWelch

BaumWelch module

class markovianSBM.BaumWelch.BaumWelch[source]

Bases: object

Class which performs the Baum-Welch algorithm and that uses it to solve the link prediction and the collaborative filtering problems.

BaumWelch(m, n, delta, nbite, eps=0.01, K=None)[source]

BaumWelch algorithm that iteratively perform the forward, backward and update steps.

Parameters
  • m – We observe fully the graph until time m

  • n – We do not observe reliably the connection involving nodes between time m and n

  • delta – We observe the connections between nodes n and n+delta

  • nbite – Number of iterations for the Baum Welch algorithm

  • eps – Parameter use to initialize the matrix of emission probabilities O

  • K – Number of communities

Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.

RES(ini, alpha, beta, O, observed_links, observed_nodes, m, n)[source]

Solve robustly the collaborative filtering problem when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.

Parameters
  • ini – initial distribution of the Markov chain given by the Baum-Welch algorithm

  • alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).

  • observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n

  • m – We observe fully the graph until time m

  • n – Node that we want to learn the community

Our implementation do not respect striclty the formula of the paper. We get rid of quantities that do not depend on the cluster k of node n. This allows to avoid underflow issues.

adjacency_BaumWelch(m, n, delta)[source]

Builds an adjacency matrix from the attribute self.X removing the nodes between m+1 and n-1 (that we consider to be not reliable).

Parameters
  • m – We observe fully the graph until time m

  • n – We do not observe reliably how nodes between m+1 and n-1 are connected

  • delta – We observe reliably how nodes between n and n+delta are connected

backward(P, O, m, n, delta, Y, K=None)[source]

Backward step of the BaumWelch algorithm. We observe the connections between nodes 0,…,m,n,…,n+delta. However, we consider that the edges involving nodes between m+1 and n-1 are not reliable and we do not take into account the estimated clusters for the nodes between m+1 and n-1.

Parameters
  • P – Transition matrix of the Markov chain

  • O – matrix of emission probabilities

  • m – We observe fully the graph until time m

  • n – We do not observe reliably the connection involving nodes between time m and n

  • delta – We observe the connections between nodes n and n+delta

  • Y – vector of length n+delta+1 of the estimated communities

  • K – Number of communities

Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.

collaborative_filtering_optimalMAP(alpha, beta, observed_links, observed_nodes, m, n)[source]

Solve robustly the collaborative filtering problem using the optimal approach when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.

Parameters
  • alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).

  • observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n

  • m – We observe fully the graph until time m

  • n – Node that we want to learn the community

collaborative_filtering_pluginMAP(alpha, beta, observed_links, observed_nodes, m, n)[source]

Solve the collaborative filtering problem using the plugin approach when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.

Parameters
  • alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).

  • observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n

  • m – We observe fully the graph until time m

  • n – Node that we want to learn the community

collaborative_filtering_robustMAP(ini, alpha, beta, O, observed_links, observed_nodes, m, n, Pbaum=None)[source]

Solve robustly the collaborative filtering problem when we observe fully the graph at time m and we want to predict the community of node n when we observe only a subset of the edges that connects (or not) n with the nodes 1,…,m.

Parameters
  • ini – initial distribution of the Markov chain given by the Baum-Welch algorithm

  • alpha – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • beta – Matrix of size $K imes n+delta$ given by the Baum-Welch algorithm

  • observed_links – A vector with binary variables of length less than m. For any i, observed_links[i] is 1 if and only if we observe an edge between nodes n and observed_nodes[i] (and 0 otherwise).

  • observed_nodes – A vector with the same length as the vector “observed_links”. It contains the nodes for which we observe the connection (or not) with node n

  • m – We observe fully the graph until time m

  • n – Node that we want to learn the community

Our implementation do not respect striclty the formula of the paper. We get rid of quantities that do not depend on the cluster k of node n. This allows to avoid underflow issues.

forward(ini, P, O, m, n, delta, Y, K=None)[source]

Forward step of the BaumWelch algorithm. We observe the connections between nodes 0,…,m,n,…,n+delta. However, we consider that the edges involving nodes between m+1 and n-1 are not reliable and we do not take into account the estimated clusters for the nodes between m+1 and n-1.

Parameters
  • ini – initial distribution of the Markov chain

  • P – Transition matrix of the Markov chain

  • O – matrix of emission probabilities

  • m – We observe fully the graph until time m

  • n – We do not observe reliably the connection involving nodes between time m and n

  • delta – We observe the connections between nodes n and n+delta

  • Y – vector of length n+delta+1 of the estimated communities

  • K – Number of communities

Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.

update(alpha, beta, P, O, m, n, delta, Y, K=None)[source]

Update step of the BaumWelch algorithm: the parameters of the HMM are updated and returned.

Parameters
  • alpha – Matrix of size $K imes n+delta$ giving the output of the “forward” method

  • beta – Matrix of size $K imes n+delta$ giving the output of the “backward” method

  • P – Transition matrix of the Markov chain

  • O – matrix of emission probabilities

  • m – We observe fully the graph until time m

  • n – We do not observe reliably the connection involving nodes between time m and n

  • delta – We observe the connections between nodes n and n+delta

  • Y – vector of length n+delta+1 of the estimated communities

  • K – Number of communities

Using this function with a specified K can be typically interesting when we try to estimate the number of clusters with the procedure describes in the paper.