1. Finding co-accessible modules (CCAN)¶
pip show circe-py
Name: circe-py
Version: 0.4.0
Summary: Circe: Package for building co-accessibility networks from ATAC-seq data.
Home-page: https://github.com/cantinilab/circe
Author:
Author-email: Rémi Trimbour <remi.trimbour@pasteur.fr>
License: GPL-3.0-only
Location: /Users/remitrimbour/miniconda3/envs/circe/lib/python3.12/site-packages
Editable project location: /Users/remitrimbour/CIRCE/circe_package
Requires: anndata, attrs, cmake, Cython, dask, distributed, joblib, numba, numpy, pandas, rich, scanpy, scikit-learn
Required-by:
Note: you may need to restart the kernel to use updated packages.
import numpy as np
import circe as ci
import scanpy as sc
import scipy as sp
Importing (or create) data¶
Create fake AnnData¶
This data doesn’t contain strong correlation between fake regions, the score will then be low.
It will still allow us to demonstrate how to use Circe. :)
atac = sc.datasets.blobs(n_centers=8, n_variables=2_000, n_observations=300, random_state=1)
atac.X = np.random.poisson(lam=0.2, size=atac.X.shape)
cell_names = [f"cell_{i}" for i in range(1, atac.shape[0]+1)]
# number of chr_start_end region names
region_names = [[f"chr{i}_{str(j)}_{str(j+399)}"
for j in range(1, 10000*400+1, 10000)]
for i in range(1, 6)]
regions_names = [item for sublist in region_names for item in sublist]
# select a random number of regions per cell, and make it 0
for i in range(atac.shape[0]):
num_regions = np.random.randint(500, 1000)
region_indices = np.random.choice(len(regions_names), num_regions, replace=False)
atac.X[i, region_indices] = 0
atac.var_names = regions_names
atac.obs_names = cell_names
1. Preprocessing the data¶
1.1. Filtering the data¶
sc.pp.filter_genes(atac, min_cells=30)
sc.pp.filter_cells(atac, min_genes=20)
Make sure to all your features (peaks) are expressed in at least 1 of the cells:
sc.pp.filter_genes(atac, min_counts=1)
1.2. Adding region position in AnnData.obs¶
Let’s first run add_region_infos that will add coordinate annotations chr, start, end as columns in atac.var slot
atac = ci.add_region_infos(atac)
atac.var.head(3)
| n_cells | n_counts | chromosome | start | end | |
|---|---|---|---|---|---|
| chr1_20001_20400 | 31 | 32 | chr1 | 20001 | 20400 |
| chr1_30001_30400 | 38 | 45 | chr1 | 30001 | 30400 |
| chr1_40001_40400 | 32 | 32 | chr1 | 40001 | 40400 |
2. Computing pseudocells¶
Computing metacells is an interesting step to reduce sparsity in our data, since scATAC-seq data have usually a lot of dropouts.
metacells = ci.metacells.compute_metacells(atac)
You can also use your own dimensionality reduction space, that would be stored in the atac.obsm slot
atac.obsm["dim_reduction_name"] = atac.X
ci.metacells.compute_metacells(atac, dim_reduction="dim_reduction_name")
Using adata.obsm['dim_reduction_name'] to identify neighboring cells
AnnData object with n_obs × n_vars = 299 × 1644
var: 'n_cells', 'n_counts', 'chromosome', 'start', 'end'
3. Calculating co-accessibility¶
3.A. Human datasets¶
We can finally compute all the cis co-accessibility scores !
The default way is to indicate your organism if among the one known by Circe.
The atac network is stored automatically as a sparse matrix in atac.varp["atac_network"]
ci.compute_atac_network(
metacells,
njobs=3, verbose=0,
organism="human"
)
Calculating co-accessibility scores...
Concatenating results from all chromosomes...
circe_network = ci.extract_atac_links(
metacells)
3.B. Other organisms¶
If your organism is not ”human”, ”mouse” or ”drosophila”, human default values will be used for:¶
s, window_size and distance_constraint.
Distance constraint is usually half of the window size.
If you know their values, you can also specify yourself the following parameters that are organism-specific.
ci.compute_atac_network(
metacells,
njobs=3, verbose=0,
window_size=100000,
unit_distance=1000,
distance_constraint=50000,
s=0.75
)
#ci.compute_atac_network(
# metacells,
# njobs=3, verbose=0,
# organism="drosophila"
#)
3.C. Using sparse matrix¶
Circe can also work with
sparse covariance matrix.
atac.X = sp.sparse.csr_matrix(atac.X)
Connections can also be stored externally from your
AnnDataobject usingsliding_graphical_lasso.
final_score = ci.sliding_graphical_lasso(
metacells,
n_samples=50,
n_samples_maxtry=100,
max_alpha_iteration=500,
verbose=0
)
# You can then add it in a second time
metacells.varp['atac_network'] = final_score
/home/rtrimbou/ATACNet/circe/circe/circe.py:958: UserWarning:
No organism, nor value passed for the parameters: ['window_size', 'distance_constraint', 's'],
using default values.
The default values are defined from human and mouse data,
you might want to change them if you are working with
another organisms.
Default values used:
{'window_size': 500000, 'distance_constraint': 250000, 's': 0.75}
You can check how to define them in https://cole-trapnell-lab.github.io/cicero-release/docs_m3/#important-considerations-for-non-human-data.
warnings.warn(
Calculating co-accessibility scores...
/home/rtrimbou/ATACNet/circe/circe/circe.py:1114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chr_var.loc[:, map_indices] = np.arange(len(chr_var), dtype=np.int64)
/home/rtrimbou/ATACNet/circe/circe/circe.py:1114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chr_var.loc[:, map_indices] = np.arange(len(chr_var), dtype=np.int64)
/home/rtrimbou/ATACNet/circe/circe/circe.py:1114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chr_var.loc[:, map_indices] = np.arange(len(chr_var), dtype=np.int64)
/home/rtrimbou/ATACNet/circe/circe/circe.py:1114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chr_var.loc[:, map_indices] = np.arange(len(chr_var), dtype=np.int64)
/home/rtrimbou/ATACNet/circe/circe/circe.py:1114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chr_var.loc[:, map_indices] = np.arange(len(chr_var), dtype=np.int64)
Concatenating results from all chromosomes...
4. Extracting connections¶
4.A.Getting the whole genome cis-coaccessible network¶
You can extract connections from the atac.varp slot (sparse matrix), as a DataFrame object with extract_atac_links.
circe_network = ci.extract_atac_links(metacells) #atac)
circe_network.head(3)
| Peak1 | Peak2 | score | |
|---|---|---|---|
| 0 | chr1_2770001_2770400 | chr1_3210001_3210400 | 0.325869 |
| 1 | chr1_1010001_1010400 | chr1_1320001_1320400 | 0.304926 |
| 2 | chr4_2580001_2580400 | chr4_2750001_2750400 | 0.300668 |
4.B. Subsetting the AnnData object to a given window¶
If you’re interested in a specific genomic region only, you can also subset your anndata object on this specific window
subset_atac = ci.subset_region(metacells, "chr1", 10_000, 200_000).copy()
circe_network_subset = ci.extract_atac_links(subset_atac)
circe_network_subset.head(3)
| Peak1 | Peak2 | score | |
|---|---|---|---|
| 0 | chr1_110001_110400 | chr1_130001_130400 | 0.049829 |
| 1 | chr1_130001_130400 | chr1_180001_180400 | 0.048972 |
| 2 | chr1_80001_80400 | chr1_150001_150400 | 0.033743 |
5. Plotting co-accessibility scores in a window¶
You can pass either your anndata object or your freshly extracted dataframe into plot_connections to visualize all Circe scores.
Blue edges correspond to positive co-accessibility scores, while yellow ones are for negative scores
ci.plot_connections(
metacells,
chromosome="chr2",
start=10_000,
end=200_000,
sep=("_","_"),
abs_threshold=0.01
)
6. Finding cis-coaccessible connected modules¶
In addition to pairwise co-accessibility scores, Circe can also be used to define cis co-accessibility modules (called CCANs by Cicero), which are modules of sites that are highly co-accessible with one another. We use the Louvain community detection algorithm (Blondel et al., 2008) to find clusters of sites that tended to be co-accessible.
The function
find_ccanstakes as input aconnection data frameor theanndata objectonce again, and outputs a data frame with CCAN assignments for each input peak.You can also use
add_ccansto directly add CCAN assignments to your anndata object asanndata.var['CCAN']. The regions that aren’t assigned to a CCAN will containNone.
ccans = ci.find_ccans(circe_network, seed=0)
ccans.head(3)
Coaccessibility cutoff used: 0.02
Number of CCANs generated: 70
| Peak | CCAN | |
|---|---|---|
| 0 | chr1_2910001_2910400 | 0 |
| 1 | chr1_3360001_3360400 | 0 |
| 2 | chr1_3400001_3400400 | 0 |
metacells = ci.add_ccans(metacells)
metacells.var.head(3)
Coaccessibility cutoff used: 0.03
| n_cells | n_counts | chromosome | start | end | CCAN | |
|---|---|---|---|---|---|---|
| chr1_20001_20400 | 31 | 32 | chr1 | 20001 | 20400 | None |
| chr1_30001_30400 | 38 | 45 | chr1 | 30001 | 30400 | None |
| chr1_40001_40400 | 32 | 32 | chr1 | 40001 | 40400 | None |
metacells.var['CCAN'].value_counts().head(3)
CCAN
35 52
37 42
8 36
Name: count, dtype: int64
With default coaccess_cutoff_override=None, the function will determine and report an appropriate co-accessibility score cutoff value for CCAN generation based on the number of overall CCANs at varying cutoffs. You can also set coaccess_cutoff_override to be a numeric between 0 and 1, to override the cutoff-finding part of the function. This option is useful if you feel that the cutoff found automatically was too strict or loose, or for speed if you are rerunning the code and know what the cutoff will be, since the cutoff finding procedure can be slow.
ccans = ci.find_ccans(circe_network, seed=0, coaccess_cutoff_override=1e-7)
Coaccessibility cutoff used: 1e-07
Number of CCANs generated: 37
6.B. Plotting only a CCAN module¶
If you’re interested in a specific CCAN module, you can use circe.draw.plot_ccan function, precising its CCAN name. Only the connections between regions belonging to this CCAN module will be plotted.
ci.draw.plot_ccan(
metacells,
ccan_module=metacells.var['CCAN'].value_counts().index[1],
sep=('_', '_'),
abs_threshold=0,
figsize=(15,5),
only_positive=False)
This CCAN module is on the chromosome: chr3
/home/rtrimbou/ATACNet/circe/circe/draw.py:244: UserWarning: varp parameter is ignored for DataFrame input.
warnings.warn(
7.Coordinates overlap between co-accessible regions and gene bodies¶
To better understand the role of specific regions and modules, you can additionally plot gene bodies falling into your window of interest.
You need first to load gene coordinates as a dataframe, or to download them through circe.downloads.download_genes. This functions will require to install the pybiomart package Then you can use ci.draw.plot_connections_genes, using gene infos and either your AnnData object or the dataframe of Circe’s results to compare these locations.
pip install pybiomart
Requirement already satisfied: pybiomart in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (0.2.0)
Requirement already satisfied: future in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pybiomart) (1.0.0)
Requirement already satisfied: pandas in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pybiomart) (2.2.3)
Requirement already satisfied: requests in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pybiomart) (2.32.3)
Requirement already satisfied: requests-cache in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pybiomart) (1.2.1)
Requirement already satisfied: numpy>=1.22.4 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pandas->pybiomart) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pandas->pybiomart) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pandas->pybiomart) (2025.1)
Requirement already satisfied: tzdata>=2022.7 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pandas->pybiomart) (2025.1)
Requirement already satisfied: charset_normalizer<4,>=2 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests->pybiomart) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests->pybiomart) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests->pybiomart) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests->pybiomart) (2024.12.14)
Requirement already satisfied: attrs>=21.2 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests-cache->pybiomart) (25.1.0)
Requirement already satisfied: cattrs>=22.2 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests-cache->pybiomart) (24.1.2)
Requirement already satisfied: platformdirs>=2.5 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests-cache->pybiomart) (4.3.6)
Requirement already satisfied: url-normalize>=1.4 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests-cache->pybiomart) (1.4.3)
Requirement already satisfied: exceptiongroup>=1.1.1 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache->pybiomart) (1.2.2)
Requirement already satisfied: typing-extensions!=4.6.3,>=4.1.0 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache->pybiomart) (4.12.2)
Requirement already satisfied: six>=1.5 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas->pybiomart) (1.17.0)
Note: you may need to restart the kernel to use updated packages.
import circe.downloads
genes_df = ci.downloads.download_genes()
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, figsize = (20, 6))
ci.draw.plot_connections_genes(
connections=metacells,
genes=genes_df,
chromosome="chr1",
start=50_000,
end=350_000,
gene_spacing=30_000,
y_lim_top=-0.02,
abs_threshold=0.0,
track_spacing=0.01,
track_width=0.01,
legend=True,
ax=ax
)
7. Work in progress! Happy to get feedbacks :)¶
If you feel any function could be useful for you on others, don’t hesitate to write me at remi.trimbour@pasteur.fr or to submit an issue on github.com/cantinilab/Circe.