iddn.tools

Utility functions of iDDN

Functions

evaluate_metrics(net_est, net_gt)

Calculate the recall, precision, and F1 scores for iDDN estimates given ground truth

get_comm_diff_network(out_iddn)

Find the common and differential networks from iDDN estimates

iddn_basic_pipeline(dat1, dat2[, dep_mat, lambda1, ...])

A convenient pipeline for iDDN

iddn_output_to_csv(out_iddn, node_names)

Convert iDDN results to Pandas data frames

collect_edges(conn_mat, wt_mat, node_names[, group, ...])

Convert the adjacency matrix to a Pandas data frame

Module Contents

iddn.tools.evaluate_metrics(net_est: numpy.ndarray, net_gt: numpy.ndarray)

Calculate the recall, precision, and F1 scores for iDDN estimates given ground truth

Parameters:
  • net_est (array_like) – Estimated network dependency matrix. The weights will be binarized.

  • net_gt (array_like) – Ground truth network dependency matrix. The weights will be binarized.

Return type:

None
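The scores named above can be illustrated with a minimal sketch. This is not the iDDN implementation: the helper name `edge_metrics` is hypothetical, and the sketch assumes that "binarized" means "non-zero entries count as edges" and that diagonal entries are ignored.

```python
import numpy as np


def edge_metrics(net_est, net_gt):
    """Recall, precision, and F1 over the off-diagonal entries of two matrices.

    Hypothetical helper for illustration; not part of iddn.tools.
    """
    # Assumption: any non-zero weight counts as an edge.
    est = np.asarray(net_est) != 0
    gt = np.asarray(net_gt) != 0

    # Assumption: self-loops on the diagonal are not scored.
    off_diag = ~np.eye(est.shape[0], dtype=bool)

    tp = np.sum(est & gt & off_diag)    # edges found and truly present
    fp = np.sum(est & ~gt & off_diag)   # edges found but absent in truth
    fn = np.sum(~est & gt & off_diag)   # true edges that were missed

    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall > 0
        else 0.0
    )
    return float(recall), float(precision), float(f1)


# Ground truth has edges 0-1 and 0-2; the estimate has edges 0-1 and 1-2.
net_gt = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
net_est = np.array([[0, 0.8, 0], [0.8, 0, 0.3], [0, 0.3, 0]])

print(edge_metrics(net_est, net_gt))  # (0.5, 0.5, 0.5)
```

One of the two estimated edges is correct and one of the two true edges is recovered, so all three scores come out at 0.5.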

iddn.tools.get_comm_diff_network(out_iddn)

Find the common and differential networks from iDDN estimates

Parameters:

out_iddn ((2,P,P) array_like) – The raw output of iDDN. P is the number of features.

Returns:

Common network and differential network matrices
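One plausible reading of "common" and "differential" can be sketched as follows. The helper name `split_common_differential` is hypothetical, and the rule used here (an edge is common if it is non-zero under both conditions, differential if it is non-zero under exactly one) is an assumption; the library may threshold or symmetrize differently.

```python
import numpy as np


def split_common_differential(out):
    """Split a (2, P, P) array into common and differential edge masks.

    Hypothetical helper for illustration; not part of iddn.tools.
    """
    out = np.asarray(out)
    g1 = out[0] != 0  # edges under condition 1 (assumed: non-zero = edge)
    g2 = out[1] != 0  # edges under condition 2

    comm = g1 & g2    # present under both conditions
    diff = g1 ^ g2    # present under exactly one condition
    return comm.astype(int), diff.astype(int)


out = np.zeros((2, 3, 3))
out[0, 0, 1] = out[0, 1, 0] = 0.4  # edge 0-1 under condition 1
out[1, 0, 1] = out[1, 1, 0] = 0.5  # edge 0-1 under condition 2 as well
out[0, 1, 2] = out[0, 2, 1] = 0.2  # edge 1-2 under condition 1 only

comm, diff = split_common_differential(out)
print(comm)  # edge 0-1 only
print(diff)  # edge 1-2 only
```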

iddn.tools.iddn_basic_pipeline(dat1, dat2, dep_mat=None, lambda1=0.2, lambda2=0.05)

A convenient pipeline for iDDN

Let P be the number of features (such as genes or other molecules). N1 is the number of samples in condition 1, and N2 is the number of samples in condition 2.

iDDN standardizes the data itself, so users do not need to standardize it beforehand. The two conditions can have different sample sizes, but the number of features must be the same.

Parameters:
  • dat1 ((N1,P) array_like) – The data in condition 1.

  • dat2 ((N2,P) array_like) – The data in condition 2.

  • dep_mat ((P,P) array_like) – Constraints or dependency matrix

  • lambda1 (float) – The penalty for overall sparsity, from 0 to 1.

  • lambda2 (float) – The penalty for the discrepancies between the networks under the two conditions.

Returns:

A dictionary containing the results:

  • comm (P by P) – The estimated common network.

  • diff (P by P) – The estimated differential network.

  • g1 (P by P) – The network under the first condition.

  • g2 (P by P) – The network under the second condition.

  • out_iddn (2 by P by P) – The raw output of iDDN.
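The input requirements for the pipeline (different sample sizes allowed, same number of features, standardization handled internally) can be made concrete with a small sketch. The helper `check_and_standardize` is hypothetical and only illustrates what per-feature standardization means; iDDN's internal procedure may differ in detail, and users do not need to run this step themselves.

```python
import numpy as np


def check_and_standardize(dat1, dat2):
    """Validate shapes and z-score each feature (column) per condition.

    Hypothetical helper for illustration; not part of iddn.tools.
    """
    dat1 = np.asarray(dat1, dtype=float)
    dat2 = np.asarray(dat2, dtype=float)

    # Both inputs must be (samples, features) with the same feature count P.
    if dat1.ndim != 2 or dat2.ndim != 2:
        raise ValueError("dat1 and dat2 must be 2-D (samples by features)")
    if dat1.shape[1] != dat2.shape[1]:
        raise ValueError("dat1 and dat2 must have the same number of features")

    def zscore(x):
        # Center each column to mean 0 and scale to unit standard deviation.
        return (x - x.mean(axis=0)) / x.std(axis=0)

    return zscore(dat1), zscore(dat2)


rng = np.random.default_rng(0)
dat1 = rng.normal(loc=5.0, scale=2.0, size=(40, 6))   # N1 = 40, P = 6
dat2 = rng.normal(loc=-1.0, scale=0.5, size=(60, 6))  # N2 = 60, P = 6

z1, z2 = check_and_standardize(dat1, dat2)
print(z1.shape, z2.shape)  # (40, 6) (60, 6)
```

Arrays shaped like `dat1` and `dat2` here match what the signature above expects for its first two arguments.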

iddn.tools.iddn_output_to_csv(out_iddn, node_names)

Convert iDDN results to Pandas data frames

This is useful for sharing the results, as well as for visualization. Each row of the data frame is one edge. There are five columns: the first node in an edge, the second node in an edge, the condition under which the edge exists, the weight, and the color of that edge. For common networks, the conditions are all set to 0. For differential networks, if an edge only exists in the first condition, the condition is set to 0. If an edge only exists in the second condition, the condition is set to 1.

Let P be the number of features.

Parameters:
  • out_iddn ((2,P,P) array_like) – The raw output of iDDN.

  • node_names – The list of node names to output.

Returns:

  • df_edge_comm (pd.DataFrame) – A Pandas data frame for common network.

  • df_edge_diff (pd.DataFrame) – A Pandas data frame for differential network.

  • nodes_show_comm (array_like) – The list of node names that are present in the estimated common network. In other words, only nodes that have at least one edge to another node are kept.

  • nodes_show_diff (array_like) – The list of node names that are present in the estimated differential network.

iddn.tools.collect_edges(conn_mat, wt_mat, node_names, group=0, color_in='blue')

Convert the adjacency matrix to a Pandas data frame

For a differential network, call this function twice, once for each condition, and then combine the results.

Let P be the number of features.

Parameters:
  • conn_mat ((P,P) array_like) – The adjacency or connectivity matrix

  • wt_mat ((P,P) array_like) – Similar to conn_mat, but with weights

  • node_names – The names of all nodes

  • group (int) – The index of the condition; can be 0 or 1.

  • color_in (str) – The color assigned to the edges in this data frame.

Returns:

  • df_edge (pd.DataFrame) – A Pandas data frame for the network.

  • nodes_show (array_like) – The list of node names that are present in the estimated network.
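The conversion from an adjacency matrix to an edge table, including the "call twice and combine" pattern for differential networks, can be sketched as below. This is an illustration under stated assumptions, not the library's code: the helper `edges_from_matrix` and the column names (`node1`, `node2`, `condition`, `weight`, `color`) are hypothetical, the matrices are assumed symmetric so only the upper triangle is read, and plain dictionaries stand in for data frame rows.

```python
import numpy as np


def edges_from_matrix(conn_mat, wt_mat, node_names, group=0, color="blue"):
    """List one row per edge, plus the names of nodes that have an edge.

    Hypothetical helper for illustration; not part of iddn.tools.
    """
    conn = np.asarray(conn_mat) != 0
    wt = np.asarray(wt_mat)

    # Assumption: the matrix is symmetric, so the upper triangle
    # (excluding the diagonal) lists every edge exactly once.
    rows, cols = np.nonzero(np.triu(conn, k=1))

    edges = [
        {
            "node1": node_names[i],
            "node2": node_names[j],
            "condition": group,
            "weight": float(wt[i, j]),
            "color": color,
        }
        for i, j in zip(rows, cols)
    ]

    # Keep only nodes that take part in at least one edge.
    used = sorted(set(rows.tolist()) | set(cols.tolist()))
    nodes_show = [node_names[k] for k in used]
    return edges, nodes_show


names = ["A", "B", "C", "D"]

# Edge A-B exists only under condition 0, edge B-C only under condition 1.
wt0 = np.zeros((4, 4))
wt0[0, 1] = wt0[1, 0] = 0.4
wt1 = np.zeros((4, 4))
wt1[1, 2] = wt1[2, 1] = 0.7

# One call per condition, then combine the two edge lists.
edges0, nodes0 = edges_from_matrix(wt0 != 0, wt0, names, group=0, color="blue")
edges1, nodes1 = edges_from_matrix(wt1 != 0, wt1, names, group=1, color="red")
diff_edges = edges0 + edges1

for row in diff_edges:
    print(row)
```

A list of dictionaries like `diff_edges` can be passed directly to `pandas.DataFrame` to obtain a table with one edge per row.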