iddn.tools
Utility functions of iDDN
Functions
- evaluate_metrics – Calculate the recall, precision, and F1 scores for iDDN estimates given ground truth
- get_comm_diff_network – Find the common and differential network from iDDN estimates
- iddn_basic_pipeline – A convenient pipeline for iDDN
- iddn_output_to_csv – Convert iDDN results to Pandas data frames
- collect_edges – Convert the adjacency matrix to a Pandas data frame
Module Contents
- iddn.tools.evaluate_metrics(net_est: numpy.ndarray, net_gt: numpy.ndarray)
Calculate the recall, precision, and F1 scores for iDDN estimates given ground truth
- Parameters:
net_est (array_like) – Estimated network dependency matrix. The weights will be binarized.
net_gt (array_like) – Ground truth network dependency matrix. The weights will be binarized.
- Return type:
None
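As a rough illustration of what these scores measure, the following sketch computes them with NumPy. It is not the library's implementation: the helper name `metrics_sketch` is hypothetical, and it assumes that binarizing means treating any nonzero weight as an edge and that each undirected edge is counted once (upper triangle only). Because the documented return type is None, this page does not say how the real function reports the scores; the sketch simply returns them.

```python
import numpy as np

def metrics_sketch(net_est, net_gt):
    """Recall, precision and F1 on binarized symmetric networks (illustrative only)."""
    est = np.asarray(net_est) != 0   # assumption: any nonzero weight is an edge
    gt = np.asarray(net_gt) != 0
    # Look only above the diagonal so each undirected edge is counted once.
    iu = np.triu_indices(est.shape[0], k=1)
    est, gt = est[iu], gt[iu]
    tp = np.sum(est & gt)            # edges found and truly present
    fp = np.sum(est & ~gt)           # edges found but not in the ground truth
    fn = np.sum(~est & gt)           # true edges that were missed
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return float(recall), float(precision), float(f1)

# Ground truth has edges 0-1 and 0-2; the estimate has edges 0-1 and 1-2.
net_gt = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
net_est = np.array([[0, 0.4, 0], [0.4, 0, 0.2], [0, 0.2, 0]])
recall, precision, f1 = metrics_sketch(net_est, net_gt)
```

In this toy case one of the two estimated edges is correct and one of the two true edges is recovered, so all three scores come out to 0.5.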
- iddn.tools.get_comm_diff_network(out_iddn)
Find the common and differential network from iDDN estimates
- Parameters:
out_iddn ((2,P,P) array_like) – The raw output of iDDN. P is the number of features.
- Return type:
Common network and differential network matrices
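One plausible reading of "common" and "differential" can be sketched as follows. This is an assumption rather than the documented behavior: the hypothetical helper `comm_diff_sketch` treats an edge as common when it is nonzero under both conditions and as differential when it is nonzero under exactly one. The real function may define or threshold edges differently.

```python
import numpy as np

def comm_diff_sketch(out_iddn):
    """Split a (2, P, P) array into common and differential adjacency matrices."""
    out = np.asarray(out_iddn)
    g1 = out[0] != 0                 # edges present in condition 1
    g2 = out[1] != 0                 # edges present in condition 2
    comm = (g1 & g2).astype(int)     # present in both conditions
    diff = (g1 ^ g2).astype(int)     # present in exactly one condition
    return comm, diff

out_iddn = np.zeros((2, 3, 3))
out_iddn[0, 0, 1] = out_iddn[0, 1, 0] = 0.5   # edge 0-1 in condition 1
out_iddn[1, 0, 1] = out_iddn[1, 1, 0] = 0.3   # edge 0-1 in condition 2
out_iddn[1, 1, 2] = out_iddn[1, 2, 1] = 0.2   # edge 1-2 in condition 2 only
comm, diff = comm_diff_sketch(out_iddn)
```

Here edge 0-1 lands in the common network and edge 1-2 in the differential network.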
- iddn.tools.iddn_basic_pipeline(dat1, dat2, dep_mat=None, lambda1=0.2, lambda2=0.05)
A convenient pipeline for iDDN
Let P be the number of features (such as genes or other molecules). N1 is the number of samples in condition 1, and N2 is the number of samples in condition 2.
The data will be standardized by iDDN, so users do not need to standardize it themselves. The data from the two conditions can have different sample sizes, but the number of features must be the same.
- Parameters:
dat1 ((N1,P) array_like) – The data in condition 1.
dat2 ((N2,P) array_like) – The data in condition 2.
dep_mat ((P,P) array_like) – Constraints or dependency matrix
lambda1 (float) – The penalty for overall sparsity, from 0 to 1.
lambda2 (float) – The penalty for the discrepancies between the networks under two conditions
- Returns:
A dictionary containing the results:
comm (P by P) is the estimated common network.
diff (P by P) is the estimated differential network.
g1 (P by P) is the network under the first condition.
g2 (P by P) is the network under the second condition.
out_iddn (2 by P by P) is the raw output of iDDN.
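The input requirements described above can be illustrated with synthetic data. The sketch below builds two data sets with different sample sizes and the same number of features, and shows what per-feature standardization looks like, assuming it means z-scoring each column. The helper `standardize_sketch` is hypothetical, and iDDN's own standardization may differ in detail; users do not need to run this step before calling the pipeline.

```python
import numpy as np

def standardize_sketch(dat):
    """Z-score each feature (column): zero mean and unit variance (illustrative)."""
    dat = np.asarray(dat, dtype=float)
    mean = dat.mean(axis=0)
    std = dat.std(axis=0)
    std[std == 0] = 1.0              # leave constant features unscaled
    return (dat - mean) / std

rng = np.random.default_rng(0)
dat1 = rng.normal(loc=5.0, scale=2.0, size=(40, 6))    # N1 = 40 samples, P = 6
dat2 = rng.normal(loc=-1.0, scale=0.5, size=(25, 6))   # N2 = 25 samples, P = 6

# Sample sizes may differ, but the number of features must match.
same_features = dat1.shape[1] == dat2.shape[1]

z1 = standardize_sketch(dat1)
z2 = standardize_sketch(dat2)
```

Arrays shaped like `dat1` and `dat2` are what the dat1 and dat2 parameters expect, with lambda1 and lambda2 left at their defaults or tuned as needed.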
- iddn.tools.iddn_output_to_csv(out_iddn, node_names)
Convert iDDN results to Pandas data frames
This is useful for sharing the results, as well as for visualization. Each row of the data frame is one edge. There are five columns: the first node in an edge, the second node in an edge, the condition in which the edge exists, the weight, and the color of that edge. For common networks, the conditions are all set to 0. For differential networks, if an edge exists only in the first condition, the condition is set to 0. If an edge exists only in the second condition, the condition is set to 1.
Let P be the number of features.
- Parameters:
out_iddn ((2,P,P) array_like) – The raw output of iDDN.
node_names – The list of node names to output.
- Returns:
df_edge_comm (pd.DataFrame) – A Pandas data frame for common network.
df_edge_diff (pd.DataFrame) – A Pandas data frame for differential network.
nodes_show_comm (array_like) – The list of node names that are present in the estimated common network. In other words, only nodes that have at least one edge with another node are kept.
nodes_show_diff (array_like) – The list of node names that are present in the estimated differential network.
- iddn.tools.collect_edges(conn_mat, wt_mat, node_names, group=0, color_in='blue')
Convert the adjacency matrix to a Pandas data frame
For a differential network, call this function twice, once for each condition, and then combine the results.
Let P be the number of features.
- Parameters:
conn_mat ((P,P) array_like) – The adjacency or connectivity matrix
wt_mat ((P,P) array_like) – Similar to conn_mat, but with weights
node_names – The names of all nodes
group (int) – The index of condition, can be 0 or 1
color_in (str) – The color for this data frame.
- Returns:
df_edge (pd.DataFrame) – A Pandas data frame for the network.
nodes_show (array_like) – The list of node names that are present in the estimated network.
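The conversion from an adjacency matrix to an edge list can be sketched as below. This is an illustration and not the library's code: the helper `collect_edges_sketch` and the column names (`node1`, `node2`, `group`, `weight`, `color`) are hypothetical, the matrix is assumed to be symmetric, and the rows are returned as plain dictionaries rather than a data frame.

```python
import numpy as np

def collect_edges_sketch(conn_mat, wt_mat, node_names, group=0, color_in="blue"):
    """Turn a symmetric adjacency matrix into one row per undirected edge."""
    conn = np.asarray(conn_mat) != 0
    wt = np.asarray(wt_mat, dtype=float)
    rows = []
    used = set()                      # indices of nodes with at least one edge
    p = conn.shape[0]
    for i in range(p):
        for j in range(i + 1, p):     # upper triangle: each edge appears once
            if conn[i, j]:
                rows.append({
                    "node1": node_names[i],
                    "node2": node_names[j],
                    "group": group,   # index of the condition, 0 or 1
                    "weight": float(wt[i, j]),
                    "color": color_in,
                })
                used.update((i, j))
    nodes_show = [node_names[k] for k in sorted(used)]
    return rows, nodes_show

names = ["A", "B", "C"]
conn = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
wts = np.array([[0, 0.7, 0], [0.7, 0, 0], [0, 0, 0]])
rows, nodes_show = collect_edges_sketch(conn, wts, names, group=1, color_in="red")
```

Passing `rows` to `pandas.DataFrame` would give a table with one edge per row. For a differential network, the same helper could be called once per condition with different `group` and `color_in` values, and the two row lists concatenated.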