1. Finding co-accessible modules (CCAN)

pip show circe-py
Name: circe-py
Version: 0.4.0
Summary: Circe: Package for building co-accessibility networks from ATAC-seq data.
Home-page: https://github.com/cantinilab/circe
Author: 
Author-email: Rémi Trimbour <remi.trimbour@pasteur.fr>
License: GPL-3.0-only
Location: /Users/remitrimbour/miniconda3/envs/circe/lib/python3.12/site-packages
Editable project location: /Users/remitrimbour/CIRCE/circe_package
Requires: anndata, attrs, cmake, Cython, dask, distributed, joblib, numba, numpy, pandas, rich, scanpy, scikit-learn
Required-by: 
Note: you may need to restart the kernel to use updated packages.
import numpy as np
import circe as ci
import scanpy as sc
import scipy as sp

Importing (or create) data

Create fake AnnData

This data doesn’t contain strong correlation between fake regions, the score will then be low.

It will still allow us to demonstrate how to use Circe. :)

atac = sc.datasets.blobs(n_centers=8, n_variables=2_000, n_observations=300, random_state=1)
atac.X = np.random.poisson(lam=0.2, size=atac.X.shape)
cell_names = [f"cell_{i}" for i in range(1, atac.shape[0]+1)]
# number of chr_start_end region names
region_names = [[f"chr{i}_{str(j)}_{str(j+399)}" 
                 for j in range(1, 10000*400+1, 10000)] 
                for i in range(1, 6)]
regions_names = [item for sublist in region_names for item in sublist]
# select a random number of regions per cell, and make it 0
for i in range(atac.shape[0]):
    num_regions = np.random.randint(500, 1000)
    region_indices = np.random.choice(len(regions_names), num_regions, replace=False)
    atac.X[i, region_indices] = 0

atac.var_names = regions_names
atac.obs_names = cell_names

1. Preprocessing the data

1.1. Filtering the data

sc.pp.filter_genes(atac, min_cells=30)
sc.pp.filter_cells(atac, min_genes=20)

Make sure to all your features (peaks) are expressed in at least 1 of the cells:

sc.pp.filter_genes(atac, min_counts=1)

1.2. Adding region position in AnnData.obs

Let’s first run add_region_infos that will add coordinate annotations chr, start, end as columns in atac.var slot

atac = ci.add_region_infos(atac)
atac.var.head(3)
n_cells n_counts chromosome start end
chr1_20001_20400 31 32 chr1 20001 20400
chr1_30001_30400 38 45 chr1 30001 30400
chr1_40001_40400 32 32 chr1 40001 40400

2. Computing pseudocells

Computing metacells is an interesting step to reduce sparsity in our data, since scATAC-seq data have usually a lot of dropouts.

metacells = ci.metacells.compute_metacells(atac)

You can also use your own dimensionality reduction space, that would be stored in the atac.obsm slot

atac.obsm["dim_reduction_name"] = atac.X
ci.metacells.compute_metacells(atac, dim_reduction="dim_reduction_name")
Using adata.obsm['dim_reduction_name'] to identify neighboring cells

AnnData object with n_obs × n_vars = 299 × 1644
    var: 'n_cells', 'n_counts', 'chromosome', 'start', 'end'

3. Calculating co-accessibility

3.A. Human datasets

We can finally compute all the cis co-accessibility scores !
The default way is to indicate your organism if among the one known by Circe.
The atac network is stored automatically as a sparse matrix in atac.varp["atac_network"]

ci.compute_atac_network(
    metacells,
    njobs=3, verbose=0,
    organism="human"
)


Calculating co-accessibility scores...
Concatenating results from all chromosomes...
circe_network = ci.extract_atac_links(
    metacells)

3.B. Other organisms

If your organism is not ”human”, ”mouse” or ”drosophila”, human default values will be used for:

s, window_size and distance_constraint. Distance constraint is usually half of the window size.

If you know their values, you can also specify yourself the following parameters that are organism-specific.

ci.compute_atac_network(
    metacells,
    njobs=3, verbose=0,
    window_size=100000,
    unit_distance=1000,
    distance_constraint=50000,
    s=0.75
)

#ci.compute_atac_network(
#    metacells,
#    njobs=3, verbose=0,
#    organism="drosophila"
#)

3.C. Using sparse matrix

  1. Circe can also work with sparse covariance matrix.

atac.X = sp.sparse.csr_matrix(atac.X)
  1. Connections can also be stored externally from your AnnData object using sliding_graphical_lasso.

final_score = ci.sliding_graphical_lasso(
    metacells,
    n_samples=50,
    n_samples_maxtry=100,
    max_alpha_iteration=500,
    verbose=0
)

# You can then add it in a second time
metacells.varp['atac_network'] = final_score
/home/rtrimbou/ATACNet/circe/circe/circe.py:958: UserWarning: 
                No organism, nor value passed for the parameters: ['window_size', 'distance_constraint', 's'],
                using default values.
                The default values are defined from human and mouse data,
                you might want to change them if you are working with
                another organisms.

                Default values used:
                {'window_size': 500000, 'distance_constraint': 250000, 's': 0.75}

                You can check how to define them in https://cole-trapnell-lab.github.io/cicero-release/docs_m3/#important-considerations-for-non-human-data.
                
  warnings.warn(


Calculating co-accessibility scores...
/home/rtrimbou/ATACNet/circe/circe/circe.py:1114: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  chr_var.loc[:, map_indices] = np.arange(len(chr_var), dtype=np.int64)
/home/rtrimbou/ATACNet/circe/circe/circe.py:1114: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  chr_var.loc[:, map_indices] = np.arange(len(chr_var), dtype=np.int64)
/home/rtrimbou/ATACNet/circe/circe/circe.py:1114: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  chr_var.loc[:, map_indices] = np.arange(len(chr_var), dtype=np.int64)
/home/rtrimbou/ATACNet/circe/circe/circe.py:1114: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  chr_var.loc[:, map_indices] = np.arange(len(chr_var), dtype=np.int64)
/home/rtrimbou/ATACNet/circe/circe/circe.py:1114: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  chr_var.loc[:, map_indices] = np.arange(len(chr_var), dtype=np.int64)
Concatenating results from all chromosomes...

4. Extracting connections

4.A.Getting the whole genome cis-coaccessible network

You can extract connections from the atac.varp slot (sparse matrix), as a DataFrame object with extract_atac_links.

circe_network = ci.extract_atac_links(metacells) #atac)
circe_network.head(3)
Peak1 Peak2 score
0 chr1_2770001_2770400 chr1_3210001_3210400 0.325869
1 chr1_1010001_1010400 chr1_1320001_1320400 0.304926
2 chr4_2580001_2580400 chr4_2750001_2750400 0.300668

4.B. Subsetting the AnnData object to a given window

If you’re interested in a specific genomic region only, you can also subset your anndata object on this specific window

subset_atac = ci.subset_region(metacells, "chr1", 10_000, 200_000).copy()
circe_network_subset = ci.extract_atac_links(subset_atac)
circe_network_subset.head(3)
Peak1 Peak2 score
0 chr1_110001_110400 chr1_130001_130400 0.049829
1 chr1_130001_130400 chr1_180001_180400 0.048972
2 chr1_80001_80400 chr1_150001_150400 0.033743

5. Plotting co-accessibility scores in a window

You can pass either your anndata object or your freshly extracted dataframe into plot_connections to visualize all Circe scores.
Blue edges correspond to positive co-accessibility scores, while yellow ones are for negative scores

ci.plot_connections(
    metacells,
    chromosome="chr2",
    start=10_000,
    end=200_000,
    sep=("_","_"),
    abs_threshold=0.01
)
../_images/46c32f60fba2b37833d55fdf9f64eeff541a2a4b346a5a71bcb3cb405e9c9d07.png

6. Finding cis-coaccessible connected modules

In addition to pairwise co-accessibility scores, Circe can also be used to define cis co-accessibility modules (called CCANs by Cicero), which are modules of sites that are highly co-accessible with one another. We use the Louvain community detection algorithm (Blondel et al., 2008) to find clusters of sites that tended to be co-accessible.

  • The function find_ccans takes as input a connection data frame or the anndata object once again, and outputs a data frame with CCAN assignments for each input peak.

  • You can also use add_ccans to directly add CCAN assignments to your anndata object as anndata.var['CCAN']. The regions that aren’t assigned to a CCAN will contain None.

ccans = ci.find_ccans(circe_network, seed=0)
ccans.head(3)
Coaccessibility cutoff used: 0.02
Number of CCANs generated: 70
Peak CCAN
0 chr1_2910001_2910400 0
1 chr1_3360001_3360400 0
2 chr1_3400001_3400400 0
metacells = ci.add_ccans(metacells)
metacells.var.head(3)
Coaccessibility cutoff used: 0.03
n_cells n_counts chromosome start end CCAN
chr1_20001_20400 31 32 chr1 20001 20400 None
chr1_30001_30400 38 45 chr1 30001 30400 None
chr1_40001_40400 32 32 chr1 40001 40400 None
metacells.var['CCAN'].value_counts().head(3)
CCAN
35    52
37    42
8     36
Name: count, dtype: int64

With default coaccess_cutoff_override=None, the function will determine and report an appropriate co-accessibility score cutoff value for CCAN generation based on the number of overall CCANs at varying cutoffs. You can also set coaccess_cutoff_override to be a numeric between 0 and 1, to override the cutoff-finding part of the function. This option is useful if you feel that the cutoff found automatically was too strict or loose, or for speed if you are rerunning the code and know what the cutoff will be, since the cutoff finding procedure can be slow.

ccans = ci.find_ccans(circe_network, seed=0, coaccess_cutoff_override=1e-7)
Coaccessibility cutoff used: 1e-07
Number of CCANs generated: 37

6.B. Plotting only a CCAN module

If you’re interested in a specific CCAN module, you can use circe.draw.plot_ccan function, precising its CCAN name. Only the connections between regions belonging to this CCAN module will be plotted.

ci.draw.plot_ccan(
    metacells,
    ccan_module=metacells.var['CCAN'].value_counts().index[1],
    sep=('_', '_'),
    abs_threshold=0,
    figsize=(15,5),
    only_positive=False)
This CCAN module is on the chromosome: chr3
/home/rtrimbou/ATACNet/circe/circe/draw.py:244: UserWarning: varp parameter is ignored for DataFrame input.
  warnings.warn(
../_images/b868aaccc1cee18df05ef3e3963873c79a7e40a9b11f21479bd3dcf6c5b84b96.png

7.Coordinates overlap between co-accessible regions and gene bodies

To better understand the role of specific regions and modules, you can additionally plot gene bodies falling into your window of interest.
You need first to load gene coordinates as a dataframe, or to download them through circe.downloads.download_genes. This functions will require to install the pybiomart package Then you can use ci.draw.plot_connections_genes, using gene infos and either your AnnData object or the dataframe of Circe’s results to compare these locations.

pip install pybiomart
Requirement already satisfied: pybiomart in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (0.2.0)
Requirement already satisfied: future in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pybiomart) (1.0.0)
Requirement already satisfied: pandas in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pybiomart) (2.2.3)
Requirement already satisfied: requests in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pybiomart) (2.32.3)
Requirement already satisfied: requests-cache in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pybiomart) (1.2.1)
Requirement already satisfied: numpy>=1.22.4 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pandas->pybiomart) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pandas->pybiomart) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pandas->pybiomart) (2025.1)
Requirement already satisfied: tzdata>=2022.7 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from pandas->pybiomart) (2025.1)
Requirement already satisfied: charset_normalizer<4,>=2 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests->pybiomart) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests->pybiomart) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests->pybiomart) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests->pybiomart) (2024.12.14)
Requirement already satisfied: attrs>=21.2 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests-cache->pybiomart) (25.1.0)
Requirement already satisfied: cattrs>=22.2 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests-cache->pybiomart) (24.1.2)
Requirement already satisfied: platformdirs>=2.5 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests-cache->pybiomart) (4.3.6)
Requirement already satisfied: url-normalize>=1.4 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from requests-cache->pybiomart) (1.4.3)
Requirement already satisfied: exceptiongroup>=1.1.1 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache->pybiomart) (1.2.2)
Requirement already satisfied: typing-extensions!=4.6.3,>=4.1.0 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from cattrs>=22.2->requests-cache->pybiomart) (4.12.2)
Requirement already satisfied: six>=1.5 in /home/rtrimbou/miniconda3/envs/circe_local/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas->pybiomart) (1.17.0)
Note: you may need to restart the kernel to use updated packages.
import circe.downloads
genes_df = ci.downloads.download_genes()
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, figsize = (20, 6))
ci.draw.plot_connections_genes(
    connections=metacells,
    genes=genes_df,
    chromosome="chr1",
    start=50_000,
    end=350_000,
    gene_spacing=30_000,
    y_lim_top=-0.02,
    abs_threshold=0.0,
    track_spacing=0.01,
    track_width=0.01,
    legend=True,
    ax=ax
)
../_images/7ab4f153282a48390ec2e3207fd6a34b692b76baddefb8a62551f27b13b3c7d5.png

7. Work in progress! Happy to get feedbacks :)

If you feel any function could be useful for you on others, don’t hesitate to write me at remi.trimbour@pasteur.fr or to submit an issue on github.com/cantinilab/Circe.