Epigenetic memory of colitis promotes tumour growth - Nature

Animals and cell lines

Animal work

Mouse (Mus musculus) strains C57BL/6J (strain no. 000664) and Cdx2:CreER^T2;APC^fl/fl;Kras^WT (strain no. 035169)⁵⁰ were obtained from The Jackson Laboratory. Mice were housed at room temperature and ambient humidity in individually ventilated cages at a maximum density of five mice per cage with ad libitum access to food and water in a specific-pathogen-free facility accredited by the Association and Accreditation of Laboratory Animal Committee. Cages contained Anderson’s Bed-o’Cob bedding (The Anderson Inc.), one nestlet (Ancare, 2 × 2-inch² compressed cotton squares) and a red mouse hut (Bioserv). The colony room was kept on a 12 h–12 h light–dark cycle. All animal handling and experiments were conducted in accordance with procedures approved by the Institutional Animal Care and Use Committee at Harvard University (protocol no. 19-10-362). For tumour formation experiments, euthanasia criteria were weight loss of more than 20%, persistent grossly bloody stool for greater than or equal to 3 days and/or excessively lethargic or moribund state, as determined by veterinary care. These criteria were not exceeded in any experiments.

Cell culture

Human embryonic kidney 293T (HEK293T) cells (American Type Culture Collection (ATCC), CRL-3216; authenticated by short tandem repeat profiling and tested for mycoplasma by ATCC) were grown in DMEM (Thermo, 11965-092) with 10% fetal bovine serum (FBS) and 1% penicillin–streptomycin. Cells were incubated at 37 °C in 5% CO₂ and maintained in exponential phase.

Mouse organoid derivation and culture

Colitis organoids were derived from whole colonic tissue 11 days following cessation of the third cycle of DSS. Animals were anaesthetized with 2,2,2-tribromoethanol (Sigma, T48402-25G) and cardiac perfusion was performed with PBS to remove peripheral immune cells. Epithelium was removed by incubating colonic tissue in EDTA solution (section ‘Colon tissue processing and cell sorting’ below) supplemented with 100 μg ml⁻¹ primocin (Invitrogen, ant-pm-05) for 20–30 min and scraping the luminal surface with a glass slide. Epithelial fragments were washed once with Advanced DMEM/F12 (ADMEM) and resuspended in Crypt Basal (ADMEM, 10 mM HEPES, 1× GlutaMax (Thermo, 35050061), 1× Pen-Strep (Thermo, 15140122), 1× N2 Supplement (Thermo, 17502048), 1× B27 Supplement (Thermo, 17504044), 1 mM N-acetylcysteine (Sigma, A9165-5G)) before mixing with an equal volume of Matrigel (Corning, 47743-722). Crypts were plated as roughly 30-l domes in a six-well plate and allowed 10–15 min to polymerize. Colon organoids were grown in WENR media: 50% ENR (Crypt Basal with 50 ng ml⁻¹ epidermal growth factor (EGF) (Thermo, PMG8041), 100 ng ml⁻¹ Noggin (Peprotech, 250-38), 1:100 Rspondin conditioned media) and 50% Wnt conditioned media. Conditioned media was generated in-house from L WNT3A cells (ATCC, CRL 2647) or HA-R-Spondin1-Fc 293T Cells (R&D, 3710-001-01). Organoids were passaged every 7–10 days by mild dissociation in TrypLE for 8–10 min, triturating every 4–5 min and quenching with 10% FBS in ADMEM. When collected for SHARE-seq, organoids were dissociated to near single cell for 15 min in TrypLE and quenched before treating with 1:100 recombinant DNase (Roche, 04716728001) in ADMEM at room temperature for 5 min to reduce dead cell DNA contamination. Cells were then washed and frozen in CryoStor at −80 °C before SHARE-seq.

Human organoid derivation and culture

Human organoid lines were derived from de-identified biopsies from grossly unaffected tissue in patients undergoing endoscopy at Boston Children’s Hospital. Informed consent and developmentally appropriate assent were obtained at Boston Children’s Hospital from the donors’ guardian and the donor, respectively. All methods were approved and carried out in accordance with the Institutional Review Board of Boston Children’s Hospital (Protocol number IRB-P00000529).

Organoids were derived from biopsies as previously described in ref. ⁵⁸. Briefly, intestinal crypts were isolated from frozen tissue and then resuspended and plated in 40-μl Matrigel domes. Once established, human rectal organoids were sustained in specialized growth media that has been previously described⁵⁸. Media changes occurred every 2 days during expansion, with organoids being passaged once every 6–8 days as necessary. To induce differentiation, organoids were grown in growth media for 2 days postpassage to allow for stem cell expansion; after which, the organoids were transitioned to differentiation media. Media was changed every 2 days for the length of the experiment, with organoids being collected for analysis after a total of 10 days.

Experimental procedures

Colitis induction

Male mice aged between 8 weeks and 15 weeks were administered dextran sulfate sodium (VWR, IC16011080) in drinking water at 1–1.5% final concentration to induce chronic colitis. Animals were weighed every day during DSS administration and every 2–3 days during rest periods. On the fourth day of each DSS administration, stool was tested for occult blood (VWR, 10012-002) to ensure successful induction of colitis. DSS concentrations were reduced if excess disease severity was observed through any of the following metrics: frank blood in the stool at any point, weight of loss of more than 10% before the ninth day of any cycle, failure to recover back to 90% of starting weight before the next cycle or poor body condition.

Acute injury timepoints were collected 3 days after the end of the first DSS cycle (day 11), chronic injury 9–11 days after the third cycle (days 51–53) and recovery 21–22 days after the third cycle (days 63–64).

Colon tissue processing and cell sorting

For in vivo colitis memory SHARE-seq experiments, animals were anaesthetized and perfused as previously described. Entire colons were dissected, lumens were exposed and tissue was transferred to EDTA Dissociation Solution (10% FBS, 4 mM EDTA, 10 mM HEPES in PBS). Following rotation for 20–30 min at room temperature, epithelium was coarsely removed by scraping the luminal surface with a glass slide and remaining muscle and submucosal were crudely chopped with scissors. Both epithelial and tissue fragments were then dissociated to single cells in ADMEM (Fisher, 12-634-028) with 10 mM HEPES, 0.4 mg ml⁻¹ collagenase (Millipore, C9263-25MG), 1.25 U ml⁻¹ dispase (Millipore, D4693-1G), 1 U ml⁻¹ DNase (Worthington Biochemical, LS002004) and 5 μM Y-27632 (R&D, 1254). Cells were washed with 0.1% BSA/PBS, stained with Calcein Red-AM (BioLegend, 425205) then with antibodies for EPCAM 1:100 (Fisher, 501129753), CD45 1:100 (BioLegend, 103116), Ly6g 1:100 (BioLegend, 127605) and SiglecF 1:100 (BioLegend, 155503). 4,6-diamidino-2-phenylindole (DAPI) (Fisher, 62248) dead cell staining was performed before sorting.

Stained cells were sorted on a BD FACSAria for epithelial (EPCAM⁺CD45⁻) and non-granulocyte (EPCAM⁻CD45⁺LY6G⁻SiglecF⁻) populations into ADMEM with 0.2% BSA, 0.1 U μl⁻¹ Enzymatics RNase inhibitor (Qiagen, Y9240L) and 15 μM Y-27632. For extended recovery timepoints (50 days and longer), only EPCAM⁺ cells were sorted. Cells were pelleted, resuspended in CryoStor CS10 (StemCell Technologies, 07959) and stored at −80 °C.

Histology and colitis scoring

Animals were anaesthetized and perfused as described above. After colonic tissue was dissected and the luminal surface was exposed, Swiss rolls or tissue fragments were fixed in 4% PFA and PBS overnight at 4 °C and then placed in 70% ethanol previous dehydration and paraffin embedding. Haematoxylin–eosin (H&E) staining was performed for general histology evaluation.

Colitis scoring was performed as described in ref. ⁵⁹ with researchers blinded to sample identity. Immune infiltration was scored as follows: mucosa, 0, normal; 1, mildly increased immune infiltrate; 2, modest infiltration; and 3, severe infiltration; submucosa, 0, normal; 1, mild to modest immune infiltration; and 2, severe infiltration; and muscularis, 0 normal and 1, modest to severe.

Immunohistochemistry was performed for CD45 (anti-CD45 1:500, Abcam, ab10558; anti-rabbit secondary 1:1,000, Vectastain Elite ABC, PK-6101) and the total CD45⁺ cells were counted in the mucosa and submucosa in all images before normalizing to total tissue area assessed. Researchers were blinded to conditions during imaging and quantification.

Immunofluorescence

Tissue was extracted, fixed with 4% PFA and PBS and subsequently cryogenically protected in 30% sucrose and PBS before optimal cutting temperature (OCT) compound embedding. Following sectioning, tissue was washed with PBS then permeabilized and blocked for 1 h in PBS with 3% normal donkey serum (Jackson Immuno, 017-000-121) and 0.5% Triton X-100. Sections were incubated overnight at 4 °C in Antibody Diluent (PBS with 1% NDS, 0.3% Triton X-100) with primary antibodies for EPCAM 1:500 (Abcam, ab213500), FOS 1:5,000 (Synaptic System, 226308), FOSB 1:800 (Fisher Scientific, PIMA515056), FOSL1 1:500 (Fisher Scientific, PIPA5115252), FOSL2 1:200 (Sigma, HPA004817) and/or CD44 1:100 (BioLegend, 103001). Excess antibody was removed with three PBS washes and secondary antibodies (Jackson Immuno, 712-605-15, 706-586-148, 711-545-152, 715-546-150) were added 1:500 in Antibody Diluent. Following 1–4 h of incubation at room temperature, excess antibody was washed away with two PBS washes. Nuclei were counterstained with DAPI and slides were mounted with Prolong Gold (Thermo, P36934). Imaging was performed on a Andor CR-DFLY-201-40 confocal spinning disc coupled to a Nikon Ti-E microscope. For FOS⁺CD44⁺ costaining analysis, FOS positivity in crypt epithelial CD44⁺ cells was measured and researchers were blinded during both imaging and quantification.

SHARE-seq

SHARE-seq was performed with minor modifications to the protocol described in ref. ²⁴ (https://www.protocols.io/view/share-seq-v1-6qpvrdexpgmk/v1). For sorted cell and organoid experiments, frozen cells in CryoStor were briefly thawed (roughly 2 min) at room temperature before diluting with ice-cold PBS supplemented with 0.04% BSA, 0.1 U μl⁻¹ Enzymatics RNase Inhibitor and 0.05 U μl⁻¹ SUPERase RNase inhibitor (Thermo, AM2696). Cells were pelleted and supernatant was discarded before lysis with hypotonic lysis buffer (HLB), which is H-RSB (10 mM HEPES, 10 mM NaCl, 3 mM MgCl₂) with 0.1% NP-40 (Thermo, 28324), 0.04% BSA, 0.1 U μl⁻¹ Enzymatics RNase Inhibitor and 0.05 U μl⁻¹ SUPERase RNase inhibitor. Following 5 min of incubation on ice, buffer was diluted with HDT-2RI (H-RSB, 0.04% BSA, 0.1% Tween-20, 0.01% digitonin (Thermo, 300410), 0.1 U μl⁻¹ Enzymatics RNase Inhibitor and 0.05 U μl⁻¹ SUPERase RNase inhibitor) and nuclei were pelleted. Supernatant was discarded and nuclei were resuspended in HDT-2RI at a density of 1 M ml⁻¹ and fixed with 0.2% formaldehyde for 5 min at room temperature. Fixation was quenched with 140 mM glycine, 50 mM Tris pH 8.0 and 0.1% BSA on ice for 5 min. Fixed nuclei were washed once with HDT-2RI, once without SUPERase RNase inhibitor and stored at −80 °C until SHARE-seq was performed.

For adenoma tissue, nuclei were isolated from OCT-embedded tissue. Two to four 40-μm sections were collected from each tissue block, then excess peripheral OCT was removed and sections were placed into 1.5-ml tubes on dry ice, not allowing the tissue to thaw. Tubes were allowed to briefly warm before resuspending in 200 μl of H-RSB with 0.1% NP-40, 0.04% BSA, Enzymatics RNase inhibitor and SUPERase RNase inhibitor. Tissue was dissociated by triturating with a P1000 for 20 strokes then a P200 for 80 strokes. Nuclei were diluted with HDT-2RI, pelleted, resuspended in 500 μl of HDT-RI and filtered through a 40-μm filter (Millipore, BAH136800040-50EA) to remove large fragments of undissociated tissue. Nuclei were then fixed as described above.

Fixed nuclei were transposed as previously described in ref. ²⁴ with Protease Inhibitor Cocktail (Sigma, P8340) and 0.1% NP-40. Reverse transcription was performed as described in ref. ²⁴ except here we used 1× Smart-seq3 Buffer (40 mM DTT, 125 mM Tris pH 8.0, 5 mM GTP, 150 mM NaCl, 12.5 mM MgCl₂) in place of Maxima RT Buffer. Washes and split-pool barcoding were performed with 0.1% Tween-20 and 0.01% digitonin instead of NP-40. Sublibrary generation, reverse crosslinking, complementary DNA (cDNA) pulldown and assay for transposase-accessible chromatin (ATAC) library preparation were all performed as previously described. Template switching was performed with 1× Smart-seq3 Buffer in place of Maxima RT Buffer. cDNA amplification and tagmentation were performed as previously described.

Organoid proliferation and AP-1 inhibition

Organoids were passaged once before AP-1 inhibition to remove dead or dying cells from primary plating. For acute AP-1 inhibition, after 3 days following passaging, the media was supplemented with 10 μM T-5224 (MedChemExpress, HY-12270) or an equal volume of dimethylsulfoxide (DMSO). After 24 h, 10 μM 5-ethynyl-2′-deoxyuridine (EdU) was added for 3 h before cell dissociation as described above. A portion of cells were banked for ATAC-seq and footprinting while the rest were fixed with 4% PFA/PBS for 10 min at room temperature, washed with PBS then permeabilized with 0.5% Triton X-100 in PBS. Following two washes with 3% BSA and PBS, EdU staining was performed with Click-iT EdU Assay kit (Life Tech, C10340) and cells were counterstained with DAPI. Percentage EdU was measured on an Attune CytPix cytometer as EdU⁺ cells over total DAPI⁺ cells. For baseline EdU differences between colitis and control organoids, EdU assays were performed at 9 days of culture.

For washout experiments, organoids were expanded for one passage to purify cultures before treatment with 10 μM T-5224 for 5 days, refreshing drug after the first 2 days. Cultures were routinely maintained for an extra 20 days as described above. Organoids were imaged using an EVOS M5000 at ×10 magnification. Organoid size was quantified using CellProfiler in which individual organoids were manually selected in the ‘MeasureObjectSizeShape’ module and the ‘Estimated Diameter Size’ was used. A two-sided Wilcoxon rank-sum test was used across all organoids quantified to compare treatment conditions.

Barcode vector cloning and library construction

The pLARRY empty vector (Addgene no. 140025) was first modified to insert a TruSeq sequencing adaptor (ACACTCTTTCCCTACACGACGCTCTTCCGATCT) upstream of the barcode insertion site to allow for direct amplification. A further sequence, including a mouse U1 hairpin, was introduced downstream of the barcode site to promote nuclear translocation of RNA transcripts and more efficient SHARE-seq capture. For nuclear localization validation, lentivirus was generated (using the method below), HEK293T cells were infected and sorted for a pure green fluorescent protein positive (GFP⁺) culture. Plasmids will be deposited in Addgene on publication.

Fluorescence in situ hybridization was performed using anti-GFP probes (LGC Biosearch, VSMF-1014-5) and imaging was performed on an Andor CR-DFLY-201-40 confocal spinning disc coupled to a Nikon Ti-E microscope.

For constructing barcode libraries, the following oligonucleotides were ordered from IDT:

Forward oligo: CCTATAGTGAGTCGTATTAGAGACATNNNNCTNNNNACNNNNTCNNNNGTNNNNTGNNNNCANNNNATNNNNGCATCATCAAGATCGGAAGAGCGTCGTG

Reverse oligo: CACGACGCTCTTCCGATCTTGATG

The two oligos were annealed with the following program: 95 °C for 5 min; 58 cycles of: 95 °C for 1 min, then −1 °C per cycle; 37 °C hold.

Double-stranded barcode inserts were then generated. Extension was performed by adding 1 U μl⁻¹ Exo-Klenow (NEB, M0212L) and 1 mM dNTPs and incubating at 37 °C for 2 h, followed by enzyme inactivation at 75 °C for 20 min. The resulting annealed barcodes were purified into 20 μl of 10 mM Tris pH 8.0. The empty barcode vector (2 μg) was digested and dephosphorylated with BamHI (NEB, R3136T), XbaI (NEB, R0145S) and FastAP (Thermo, EF0652) for 4–12 h at 37 °C. Following purification, 1 μg of digested plasmid and 60 ng of annealed barcodes were assembled using NEBuilder 2× master mix (NEB, E2621S) for 1–4 h at 50 °C. The resulting product was purified into 10 μl of water, electroporated into Stbl4 ElectroMAX cells (Thermo, 11635018) and plated onto bioassay plates (Sigma, CLS431111-16EA) with carbenicillin. After growth at 30 °C for 20–24 h, colonies were scraped into media and grown for an extra 1–4 h before plasmid purification.

Organoid lentiviral infection

Lentivirus was generated by transfecting LentiX cells with second generation packaging constructs. Viral supernatant was concentrated by overnight incubation with 300 mM NaCl and 8% PEG-6000 (Millipore, 8074911000) and centrifugation at 3,000g for 30 min. Concentrated virus was resuspended in PBS before cell treatment.

Murine organoids were pretreated for 24 h with 5 μM Y-27632 before dissociation to near single cells. Cells were pre-incubated in infection media (WENR, 10 μM Y-27632, 10 μg ml⁻¹ polybrene) for 15 min before addition of concentrated virus at less than or equal to 50% the volume of Infection Media. A rough ratio of 750 μl of concentrated virus to 25,000 cells was used. Spinoculation was performed by centrifuging at 600g for 1 h at 32 °C and the cell–viral mixture was incubated at 37 °C for 3–4 h before organoids were replated. A multiplicity of infection of less than 0.3 was used and verified by GFP expression.

SHARE-TRACE

Nuclear isolation, fixation and transposition were performed as described above. For clonal barcode capture from SHARE-seq, barcode-specific RT primer was spiked into the reverse transcription reaction at 10% general RT primer concentration:

(/5Phos/GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNNNNNNNNN/iBiodT/CTCATTCAGCCACGGTGG)

Split-pool barcoding, ligation and reverse crosslinking were all performed without modification. ATAC-seq libraries were generated as above. During cDNA PCR amplification, a barcode-specific forward primer mix was spiked into the reaction at 2 μM final concentration. The primer mix consisted of an equimolar mixture of:

ACACTCTTTCCCTACACGACGCTCTTCCGATC

ACACTCTTTCCCTACACGACGCTCTTCCGATCTNTAGACAT

ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNTAGACAT

ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNTAGACAT

where N represent a random base mixture, introducing a frameshift in the amplification products and increased sequencing diversity of the barcode library.

Following total cDNA amplification and purification, 0.5 μl of product was removed and further amplified with the above primer mix and P7 primer for 10 cycles and Ampure purified into 10 μl. A barcode-enriched library was then generated by amplifying 5 μl of these products with P7 and a barcode-specific index primer:

AATGATACGGCGACCACCGAGATCTACAC(index5)ACACTCTTTCCCTACACGACGCTCTTCCGATCT

Whole-genome methylation profiling

Genomic DNA was extracted using the Qiagen DNeasy Blood & Tissue Kit (Qiagen, 69504). For organoid profiling, cultures were dissociated per standard protocol described above at 25 days of culture and processed per kit protocol. For whole tissue profiling, animals were perfused as described and whole colons were dissected and freshly embedded in blocks of OCT before freezing. Tissue was cryogenically sectioned and excess OCT was removed before processing per kit protocol.

For each sample, 200 ng of DNA was resuspended in water to 48 μl, mixed with control DNA from the NEBNext Enzymatic Methyl-seq V2 Kit (NEB, E8015S) total and sonicated on a Covaris S220 with settings: 60 s, 140 W, 10% duty and 200 cycles per burst. The resulting distribution of fragments was almost entirely between 200 base pairs (bp) and 600 bp. Sonicated DNA was then processed per kit protocol and libraries were sequenced on a NovaSeq X at roughly 20 times coverage.

In vivo AP-1 inhibition

The compound T-5224 was resuspended in DMSO at 200 mg ml⁻¹ before being mixed at 10:90 with prewarmed corn oil, resulting in a 20 mg ml⁻¹ final suspension. Mice were given 100 mg kg⁻¹ of drug by oral gavage daily for 5 days (19 days post-DSS withdrawal to 23 days postwithdrawal). For concurrent tumorigenesis experiments, tamoxifen was administered intraperitoneally at the third day of T-5224 treatment.

Regional colon motif accessibility

The entirety of the colon (between caecum and rectum) from a healthy control mouse was extracted as previously described and then cut into six equal length fragments (each roughly 1 cm). These fragments were then individually treated with EDTA solution, crudely scraped and dissociated to single cells as previously described. Cells were resuspended in ADMEM, counted and 10,000 cells per segment were collected. Bulk ATAC-seq and library preparation were performed in the same manner as SHARE-seq without crosslinking.

Bulk ATAC-seq

Following acute AP-1 inhibition in organoids, bulk ATAC-seq was performed analogously to that of SHARE-seq without crosslinking. Briefly, cultures were dissociated to single cells and 10,000 cells were used per transposition reaction. Following 30 min of transposition, DNA was purified using a Qiagen MinElute kit and libraries were prepared following the same protocol as SHARE-seq.

FOS CUT&Tag

Epithelium was crudely scraped from control or recovered animal colons as described above and nuclei were isolated by resuspension in HLB supplemented with Protease Inhibitor Cocktail. Following 5 min of incubation on ice, nuclei were diluted with Working NE Buffer—Nuclear Extraction Buffer (20 mM HEPES pH 8, 10 mM KCl, 0.1% Triton X-100, 20% glycerol) supplemented with 0.5 mM spermidine and PIC. Following another 5 min of incubation, nuclei were centrifuged and 500,000 nuclei were resuspended in 100 μl of Working NE Buffer. Concanavalin A beads (Bangs Laboratories, BP531) were activated by washing twice in Bead Activation Buffer (20 mM HEPES pH 8, 10 mM KCl, 1 mM CaCl₂, 1 mM MnCl₂) and 10 μl of beads were added per sample. Nuclei were bound to beads for 10 min at room temperature, magnetized and supernatant was removed. Bound nuclei were resuspended in 50 μl of Digitonin150 Buffer (20 mM HEPES, 150 mM NaCl, 0.5 mM spermidine, 0.01% digitonin, PIC) supplemented with 2 mM EDTA, 1 μg (1:100) of anti-FOS antibody (AE-1059 from the laboratory of M. Greenberg) was added and samples were rotated overnight at 4 °C.

Beads were magnetized and resuspended in 50 μl of Digitonin150 Buffer before being added to anti-rabbit secondary (Novus, NBP1-72763). Samples were rotated for 30 min at room temperature and washed twice with Digitonin150 Buffer. Nuclei were then resuspended in 50 μl of Digitonin300 Buffer (20 mM HEPES pH 8, 300 mM NaCl, 0.5 mM spermidine, PIC, 0.01% digitonin) and 2.5 μl of CUTANA pAG-Tn5 was added (EpiCypher, 15-1017). Samples were rotated for 1 h at room temperature and washed twice with 200 μl of Digitonin300 Buffer. Nuclei were resuspended in tagmentation buffer (20 mM HEPES pH 8, 300 mM NaCl, 0.5 mM spermidine, PIC, 10 mM MgCl₂) and placed on a thermocycler at 37 °C for 1 h. Tagmented nuclei were magnetized, washed with 50 μl of TAPS Buffer (10 mM TAPS, 0.2 mM EDTA) and resuspended in 5 μl of SDS Release Buffer (10 mM TAPS, 0.1% SDS). Following an incubation at 58 °C for 1 h, 15 μl of SDS Quench Buffer (0.67% Triton X-100) was added and libraries were prepared using NEBNext HiFi 2× Master Mix.

In vitro TF binding assay

Genomic regions to investigate were selected by filtering for peaks with a difference in normalized footprint score of at least 0.2 at a FOS or JUN motif between colitis-recovered and control tissue (described below in ‘Footprinting and seq2PRINT’). Regions were then partitioned into those with only an AP-1 or AP-1/FOX composite motif based on motif matching (also described below in the section ‘Footprinting and seq2PRINT’) and confirmed by visual inspection of DNA sequence. The top regions by change in footprinting score were then selected and a roughly 1,000-bp region from each was amplified from mouse genomic DNA. Following purification, amplicons were pooled in equimolar concentrations.

In vitro footprinting was performed as described in ref. ⁴⁶ with the following modifications: briefly, selected sequences (25 ng per reaction) were incubated with various combinations of recombinant JUN (Active Motif, 31116), FOS (OriGene, TP760257), FOXP1 (OriGene, TP313862) and FOXA1 protein (OriGene, TP306045), along with tagmentation buffer (20 mM Tris, 10 mM MgCl₂ and 20% dimethylformamide) and water in a 22.5-μl total volume at room temperature for 1 h. Then, 0.15 μl of preassembled Tn5 (seqWell, Tagify) was combined with 2.35 μl of dilution buffer (50 mM Tris, 100 mM NaCl, 0.1 mM EDTA, 1 mM DTT, 0.1% NP-40 and 50% glycerol), and subsequently added to samples (resulting in final TF concentrations of 300 nM each). Tagmentation was performed for 30 min at 37 °C. A Qiagen MinElute PCR clean-up kit was used to purify tagmented DNA, and samples were then PCR amplified for seven cycles. After pooling, sample libraries were sequenced on a Next-seq 500/550.

Adenoma induction, macroscopic quantification and proliferation

Mice were ordered from The Jackson Laboratory (035169) with genotype wild type for Kras<tm4Tyj>, homozygous for Apc<tm1Tno> and homozygous for Tg(CDX2–cre/ERT2)752Erf. Adenomas were induced at 21–23 days following DSS cessation. For macroscopic tumours, tamoxifen dissolved in corn oil was administered at 50 mg kg⁻¹ by intraperitoneal injection on 3 consecutive days and animals were euthanized 25–28 days following the first injection. The entire length of the colon was removed and imaged. Owing to higher expression of the CDX2–cre driver of this mouse model in the proximal colon⁵⁰, more tumours form in the 1 cm of colon adjacent to the caecum and adenomas were quantified in the distal 5 cm of colon for more accurate counting. Adenoma diameter was measured in ImageJ along the longest axis of each tumour and scaled to millimetres using a ruler placed in the same image. Researchers were blinded during quantification.

To measure proliferation, immunohistochemistry for Ki67 (anti-Ki67 1:100, Abcam, ab15580; secondary anti-rabbit 1:500, Vectastain Elite ABC, PK-6101) was performed and the proliferating fraction was quantified in only adenoma areas, as identified by H&E morphology on adjacent sections. Researchers were blinded during quantification.

Ex vivo adenoma organoid clonogenicity

Organoids were derived from adenoma tissue 23 days following the first tamoxifen administration. Tissue was crudely scraped following EDTA treatment and dissociated to near single cell as previously described. Cells were plated in Crypt Basal with 50 ng ml⁻¹ EGF and 100 ng ml⁻¹ Noggin and allowed to grow for at least 1 passage to purify cultures. Clonogenicity (colony-forming efficiency) was calculated on the secondary organoids by plating 1,000 cells passaged from the primary organoids and assessing organoid formation 7 days after initiation of cultures.

Microscopic tumour initiation and quantification

Adenomas were induced 21 days following DSS cessation with a single dose of tamoxifen at 10 mg kg⁻¹ and euthanized 13–14 days following. For AP-1 inhibition assays, T-5224 was administered 2 days before tamoxifen induction, the day of induction and 2 days following, as described above in the section ‘In vivo AP-1 inhibition’. The entire length of colon was removed, fixed and paraffin embedded as a Swiss roll as previously described. Immunohistochemistry for β-catenin (anti-β-catenin 1:200, BD, 610153; secondary anti-mouse 1:500 Vectastain Elite ABC, PK-7200) was performed. The entire section was first imaged at low magnification (×2) to quantify total tissue area, then microscopic tumours were identified by high nuclear β-catenin staining and imaged under high magnification (×20). Both total tissue and individual tumour areas were quantified using ImageJ, using scale bars as reference, and tumour area summed across all tumours and reported as a percentage of total tissue. Researchers were blinded during imaging and quantification.

Spatial transcriptomics on adenoma tissue

Spatial RNA-seq was performed on fresh frozen Swiss rolls of colons with adenomas induced at 50 mg kg⁻¹ × 3. The Visium HD Kit was used according to standard protocol with the following modifications: the OCT-embedded Swiss rolls were sectioned on a cryostat at 10-μm thickness and mounted on a Fisherbrand Superfrost Plus glass slide. The slides were fixed using 4% PFA and stained using H&E method. During that process the wash buffers were supplemented with either ribonucleoside vanadyl complex or a commercial RNase inhibitor. After imaging of the H&E staining, the samples were destained and permeabilized using 1% SDS and then prechilled 70% methanol. Tissue processed this way was then analysed using a standard 10X Visium HD method described in the GenBank Nuccore User Guide CG000685 (Rev. A).

Data processing and analysis

SHARE-seq raw data processing

Raw SHARE-seq data were processed as previously described²⁴ with minor modifications (code available at https://github.com/masai1116/SHARE-seq-alignmentV2/). Briefly, raw fastq files were demultiplexed using custom Python scripts. ATAC-seq reads were aligned to the mm10 or hg38 genome using bowtie2 (v.2.3.3.1)⁶⁰, removing fragments with length longer than 2 kb. RNA-seq reads were aligned using STAR (v.2.5.3a)⁶¹, removing reads with greater than 20 alignments or score less than 0.3. Both library types were further filtered to remove mitochondrial reads and reads with a mapping quality less than 30, and chrY reads were removed from ATAC libraries. Filtered ATAC reads were then deduplicated and further filtered for cell barcodes with at least 100 raw reads. Filtered RNA reads were assigned to genes using featureCounts⁶² using only primary mapping coordinates and unique molecular identifiers (UMIs) were counted using umi_tools (v.1.0.1)⁶³, removing those consisting of only ‘G’s. Libraries were then filtered for RNA cell barcodes with at least 300 UMIs or ATAC cell barcodes with a library size of 500 reads before further processing and filtering.

scRNA-seq processing

All filtered cell barcodes were normalized by the total number of transcripts detected. The top 5,000 most variable genes were selected and principal components analysis was performed on the log₂ + 1 transformed values of these genes. Library sizes were smoothened over the 20 k-NN in this space and clumps of cells were manually identified as those barcodes with extremely high smoothened library sizes. Once clumps were removed, the remaining barcodes were processed with scrublet⁶⁴ to identify doublets and barcode collisions, in which doublet score thresholds were manually selected. The remaining singlet filtered barcodes were again normalized before training scVI⁶⁵ models (20-dimensional latent space; negative-binomial likelihood) with batch as a covariate to learn a shared low-dimensional representation of cells. The resulting latent features were used to build a k-NN graph and to compute uniform manifold approximation and projection (UMAP) embeddings for visualization. Gene expression was visualized after using k = 5 k-NN for smoothening normalized values and capping maximum z score values at 3.

scATAC-seq processing

Peaks were called on a merged set of fragments from all sublibraries for each dataset (in vivo tissue memory, ex vivo organoid culture or adenoma tissue) using MACS2 (v.2.2.9.1)⁶⁶. These peaks were then filtered using a previously described approach⁶⁷ in which summits were padded with 400 bp on either end, overlapping windows were filtered for those with higher significance and finally resized to a set of non-overlapping 301-bp peaks. For adenoma tissue samples, this process was done in conjunction with peaks identified from the in vivo tissue memory dataset. Fragments were counted within peaks for each cell barcode and barcodes with low library size (in vivo tissue memory and ex vivo organoids, 1,000 fragments and adenoma tissue, 2,000 fragments) or fraction of reads in peaks (in vivo tissue memory and adenoma tissue fraction of reads in peaks less than 0.2, ex vivo organoids fraction of reads in peaks less than 0.25) were removed. cisTopic (v.0.3.0)⁶⁸ models were generated on all filtered cell barcodes for 10 × n topics for n = 1 to 9, with 150 iterations and burnin of 120. Library sizes were smoothened over the 20 k-NN in this space and clumps of cells were manually identified as those barcodes with extremely high smoothened library sizes. These clump barcodes, as well as clump and doublet barcodes identified in the matched RNA cell barcodes, were removed to identify singlet cell barcodes. Using the value of n selected on all cell barcodes, cisTopic models were generated again on singlet barcodes using 10 × (n − 1), 10 × n, 10 × (n + 1) topics. UMAP, k = 5 k-NN and Louvain graph-based clustering analysis were performed in the resulting topic space. Gene expression visualization was performed by matching ATAC cell barcodes to corresponding RNA cell barcodes to get smoothened normalized expression values. For genome track visualization, fragments were separated by colitis stage, normalized to 10 million total reads, Tn5 insertion sites were counted and images were made using UCSC Genome Browser.

Cell type identification

For each single-cell ATAC using sequencing (scATAC-seq) Louvain cluster, the fraction of cells expressing each marker gene and the average normalized expression within those cells was computed and plotted. Clusters with high expression of Ptprc or Acta2 were designated as non-epithelial cells. Clusters with high expression of Muc1, Chga/b or Dclk1 were designated as secretory cells and subdivided as goblet, enteroendocrine or tuft cells when possible. In non-neoplastic tissue, the clusters with high expression of Lgr5 and Lrig1 were assigned as stem and progenitor cells and those with high expression of Car4, Car1, Lypd8 or Aqp8 were assigned as differentiated absorptive enterocytes. The remaining clusters with moderate expression of those genes were assigned as intermediate absorptive enterocytes. Analogous assignment was performed for ex vivo organoid clusters. In extended timepoints analysis (50 days, 79 days and 102 days), data from each timepoint was processed as described above to identify singlets. Each individual timepoint was then merged with the larger main dataset to rederive topics (up to 150) and cocluster. Cell type labels from the main dataset were used to identify new clusters and cell types in the new timepoint datasets. Labels from the original identification were retained.

For adenoma experiments, the clusters with high expression of Axin2 and high motif scores for LEF1, in conjunction with high expression of Lgr5, Lrig1 and Mki67, were assigned as adenoma cells. The remaining epithelial non-secretory cells were then partitioned as described for non-neoplastic tissue.

Gene expression change analysis

For differential analysis, raw UMI counts were first pseudobulked by cell type in each animal then tested using DESeq2 (v.1.42.1)⁶⁹. For in vivo memory analysis, genes were filtered for those with a minimum reads per million (RPM) of 10 in at least one pseudobulk, then each disease stage was compared to controls and an FDR of 0.05 was used for significance. Genes with suspected multimapping artefacts were identified as those with excessively large expression values across all cells and blacklisted (14 genes). The log₂[fold change] values were taken from DESeq2 output. For adenoma gene activation, all adenoma pseudobulks were tested against all stem and progenitor pseudobulks, regardless of colitis condition. When plotting individual genes, raw UMI counts were pseudobulked across stem and progenitor cells within each animal and RPM values were calculated using total assigned UMIs. Z scores were calculated across all animals for each gene and change was calculated by subtracting average value across control pseudobulks. Gene ontology analysis⁷⁰ was done with all differential genes.

k-NN enrichment analysis

For the in vivo memory dataset, cells were subsetted to stem and progenitors only and 100 k-NN were determined for each cell using cells from all other biological replicates. The cisTopic space defined above was used for scATAC-seq and scVI was used for dimensional reduction for scRNA-seq. To obtain the batch-corrected latent representation of scRNA-seq data, count matrices are normalized by total counts per single cell and log-transformed. In total 10,000 highly variable genes were selected while accounting for batch effects using scanpy (highly_variable_genes with ‘flavour’ set as ‘seurat_v3’ and ‘batch_key’ as batch id.) We then used k-NN from scVI trained models, as described above. For each condition (control, acute injury, chronic injury, recovery), the expected value of k-NN was calculated for random assignment to the 100 k-NN and therefore was the fraction of cells in each condition. Enrichment for each condition was then calculated as the (observed percentage of k-NN) − (expected percentage of k-NN). The analogous procedure was used for human organoids except using all cells.

Motif accessibility analysis

Peaks were annotated as containing a motif using motifmatchr (v.1.12.0) for known cisBP⁷¹ motifs. For de novo derived motifs, peak annotation was performed as described in the ‘Footprinting and seq2PRINT’ method section. Single-cell motif deviations and accessibility scores were calculated with chromVAR (v.1.24.0)²⁸ using 250 background peaks across all single cells within each dataset (in vivo memory, ex vivo organoids or adenoma-induced tissue). A motif similarity matrix was calculated on all known and de novo motifs using Tomtom (memesuite v.5.5.7)⁷² and a q value cut-off of 0.05 was used to group motifs into families based on sequence similarity. This step ensures reliable motif–motif comparisons for downstream analysis. During the bagging process, motifs are sorted based on their variation across cells and those with highest variation were retained as ‘leaders’, whereas other motifs with high similarity scores to these representatives are merged into their respective ‘families’, effectively consolidating similar motifs into unified groups. These motif families were created using the in vivo tissue memory dataset and held constant across all other murine tissue and organoid experiments. Motif families were derived analogously in the human organoid dataset.

For motif accessibility testing, the mean value was computed across single-cell scores of all cells of a given type within each animal or all cells in an organoid line. For in vivo memory, P values were computed for the 50 most variable families by t-test and adjusted using the Benjamini–Hochberg method. Motif accessibility change was defined as the difference between the score for each replicate and the mean across control animals or organoid lines. When visualizing the in vivo change per mouse as a heatmap, samples with fewer than 200 stem cells were excluded. For extended recovery analysis (50 days, 79 days and 102 days), all cells were rescored together with the new data and AP-1 motif accessibility change was recalculated using controls from all timepoints.

Heterogeneity in single-cell motif accessibility was evaluated by randomly downsampling each condition to 500 stem and progenitor cells and computing P values for selected motifs with a two-sided Wilcoxon rank-sum test. Activated cells were defined as stem and progenitor cells with a score greater than 1.5 and the fraction of activated cells was calculated for each animal using all stem and progenitor cells.

Stem and progenitor subcluster analysis

For reclustering to stem cells, all cells were subsetted to those identified as stem and progenitor (section ‘Cell type identification’) and cisTopics scores generated on the overall dataset were used. UMAP embeddings and clusters (k = 5) were generated as was done on the all cell dataset. Groups were assigned manually. In assessing stemness genes, expression was averaged across all cells within a group from a given animal and the mean across animals was then plotted.

The AP-1 high subpopulation was identified as stem cells with AP-1 motif score greater than 1.5. Differential testing was performed using a two-sided Wilcoxon rank-sum test between these AP-1 high cells and all other stem cells. For motifs, this was done across the top 50 most variable motif families, as identified in the overall dataset previously. For genes, the top 1,000 most variable genes across stem cells were tested using expression values normalized to reads per cell.

SHARE-TRACE clone assignment

Demultiplexed reads were processed with custom Python scripts to search for common barcode vector sequence TAGACAT, allowing at most one mismatch. This sequence and all preceding base pairs were trimmed, and any reads with UMIs consisting of only Gs were removed. The remaining 48 bp of barcode sequences were validated by checking for staggered invariant sequences every 4 bases (CT,AC,TC,GT,TG,CA,AT,GC), removing off-target PCR products. To account for sequencing base call errors, the number of reads for each cell–UMI–barcode triple were counted and those with fewer than five reads were removed.

To identify clonal barcode sequences, a Levenshtein distance matrix was calculated across all remaining barcode sequences. For each barcode sequence, all other sequences within a distance of four were found and the barcode was assigned to the most abundant of those sequences. Distance between members of this set of most abundant sequences was computed once more and any sequences within a distance of two were collapsed to the more abundant sequence, generating a consensus set of clonal barcodes. The remaining reads were used to assign each cell–UMI pair to a clone.

To account for transcript mixing that occurs during SHARE-seq split-pooling, we leveraged the fact that each clone should be unique to organoids generated from a single mouse. Cell-clone assignments were matched to validated ATAC or RNA singlets and each clone was assigned to the animal from which the most cells were present (typically more than 95% of barcode reads). Only clones with at least five cells were used for subsequent analysis.

SHARE-TRACE clonal variance calculation

For each feature (motif families or gene programs), the standard deviation of the single-cell scores was calculated across all cells in a clone and the observed clonal variance was then defined as the median of these values across clones to the second power. Cell-clone assignments were then randomly permuted and the shuffled clonal variance was computed analogously. This process was repeated 1,000 times and the mean and standard deviation of the resulting distribution of randomized clonal variance was used to compute a P value:

Z = (observed clonal variance − mean shuffled clonal variance)/(s.d. shuffled clonal variance)

P = 2 × pnorm(−abs(Z))

The FDR was then calculated using the Benjamini–Hochberg method of P value adjustment.

For comparisons between colitis and control organoids, the median value of the scores for all cells belonging to each clone was computed and a two-sided t-test was performed on these values. Clones with high AP-1 accessibility were defined as those with a median score greater than 1.25.

Linear mixed model of single-cell variance

A linear model was created to evaluate the contribution to variance for past exposure to colitis and clonal identity. A data table was formed in which each cell barcode had a designation as having originated from a mouse that experienced colitis (‘is_colitis’) and clone assigned (‘clone’). This was represented as:

form ~ (1|is_colitis) + (1|clone)

Variance contributions were then calculated using the variancePartition R package using fitExtractVarPartModel(), providing the matrix of single-cell motif scores across all motif families. This was compared to randomized clonal distribution by permuting clonal labels within colitis conditions. Cells were subsetted to those exposed to colitis and clone labels were permuted within this, before performing the same procedure on cells not exposed to colitis.

Gene expression program derivation and scoring

For gene expression program modelling, the input was a cell-by-gene count matrix. Unlike the cistopic approach for scATAC-seq data⁶⁸, this matrix was not binarized but remains in raw count format. Latent Dirichlet allocation, implemented in the Mallet package, was used to infer: (1) the probability distribution of topics for each cell, and (2) the probability distribution of genes for each topic. Latent Dirichlet allocation models were trained with a range of topic numbers (30–90), and the model with the highest log-likelihood was selected, following a procedure similar to cisTopic.

To score single cells on these programs, we adapted the chromVAR algorithm for RNA topics. The input cell-by-peak accessibility matrix was replaced with the cell-by-gene transcription matrix, and the motif-by-peak matching matrix is substituted with the topic-by-gene probability distribution matrix inferred by latent Dirichlet allocation. The rest of the calculations remain identical to the original chromVAR workflow. Background genes were generated by grouping all genes in 20 bins of equivalent size based on average expression and 250 background genes were chosen for each gene in the annotated set. The resulting cell-by-topic scores represented the activity levels of RNA topics in each cell while controlling for sequencing depth, gene expression level and other biases.

To identify AP-1- and HNF4/PPAR-associated gene programs, the mean motif score and gene program score across all cells within each clone was calculated. These mean values were then correlated across clones to get motif–gene program correlation values. The top programs were selected for each motif family (gene P20 for AP-1, and gene programs 9 and 30 for HNF4/PPAR) and the top 150 genes by weight of contribution to the program were selected for Gene Ontology analysis and subsequent gene program scoring. For plotting scores on UMAP projections, single-cell score values were capped at −3 and 3.

For single gene analysis in organoids, scRNA-seq reads were pseudobulked by organoid clone and normalized to total reads per clone. Fold change was calculated as the ratio between the mean normalized expression value across all colitis clones and all control clones. Values were scaled by gene before plotting as heatmap. Plotting individual gene change in tissue across disease timepoints was done as described in the section ‘Gene expression change analysis’. Differential testing comparison between chronic injury and 50-day recovery was done with DESeq2 as previously described in ‘Gene expression change analysis’ and then one-sided values were computed as one_sided_P = two_sided_P/2 for genes with positive log₂[fold change] and 1 − two_sided_P/2 for negative log₂[fold change].

EM-seq methylation data processing

EM-seq data were processed and aligned to the mm10 reference genome using the nf-core/methylseq pipeline (v.4.0.0)⁷³ with GPU-enabled bwa-meth, and methylation calls were obtained with MethylDackel (v.0.6.1). We derived two types of feature: (1) ATAC-seq peak-anchored methylation, defined as the fraction of methylated cytosines within ±500 bp of ATAC-seq peak summits, and (2) per-CpG methylation, calculated by combining strand-specific counts at each CpG site, and further derived the fraction of methylated cytosines.

Methylation fraction quantification

For testing change in methylation, peaks were first filtered for those with at least 10% methylation in at least one sample and a standard deviation in fraction methylation of at least 0.05. A two-sided t-test was then performed between colitis-derived organoids and controls across 58,454 regions resulting from this filtering and FDR was calculated with Benjamini–Hochberg adjustment. These methylation change values were compared to ATAC-seq signal by pseudobulking scATAC-seq counts by organoid line, normalizing to 1 million reads per sample and calculating fold change per peak as the average RPM for all colitis organoid lines over all control organoid lines. The methylation at individual CpGs was then visualized by creating a per-base heatmap in a given genomic interval coloured by percentage methylation, with the values for each CpG being extended half way to the next CpG to cover non-CpG bases.

For analysis of AP-1 sites, motif annotations from seq2PRINT (section ‘Footprinting and seq2PRINT’) were used to identify relevant peaks. Variance in methylation fraction across these peaks was calculated and the top 500 peaks were selected to visualize. Average fraction methylation was calculated across all animals in a given condition and the difference between these values was plotted. For colitis recovered versus control comparison, this was all DMSO-treated colitis-recovered animals and DMSO-treated control animals. For T-5224 comparison, this was all T-5224-treated colitis-recovered animals and DMSO-treated colitis-recovered animals. The matched background set of peaks (100 per AP-1 peak selected before) was generated with chromVAR across single cells, controlling for average accessibility and GC-content. P values were then computed using two-sided Wilcoxon rank-sum test between AP-1 peaks and matched background peaks.

Footprinting and seq2PRINT

Footprint scores were calculated using the scPrinter package (v.1.2.0)⁴⁶. The data used were as follows: in vivo memory, all epithelial cells; AP-1 inhibition, all bulk ATAC-seq reads; human organoids, all cells and mouse adenomas and only adenoma cells. Briefly, the seq2PRINT model was trained to take DNA sequences spanning a candidate cis-regulatory element (cCRE) of interest and its surrounding regions (±920 bp) as input and predict multiscale footprints derived using the PRINT method. After training, DeepLIFT is used to extract sequence attribution scores, which represent the contribution of each base pair in the input sequence to the predicted footprints. These sequence attribution scores enable highly accurate TF binding predictions through a neural network model trained on chromatin immunoprecipitation sequencing data⁴⁶ and are referred to as seq2PRINT footprint scores. In this study, raw seq2PRINT footprint scores were binned at a 10-bp resolution for all cCREs. Bins with maximum seq2PRINT footprint scores below 0.2 across all conditions are excluded from further analysis. To facilitate comparisons across different conditions, the scores were quantile-transformed for each condition using the quantile_transform function in scikit-learn (n_quantiles=100000, target_distribution=‘uniform’; v.1.5.2).

Sequence attribution scores from seq2PRINT were further analysed to identify de novo motifs. TF-MoDISco (v.2.2.1) is used to align and cluster seqlets (local subsequences with high sequence attribution scores) into groups of de novo motifs. To assign these motifs to cCREs, the software finemo (v.0.40) was used, which takes the output de novo motifs from TF-MoDISco and the sequence attribution scores as input for motif matching.

Motif accessibility change in de novo motifs was calculated as described above, in which single-cell scores were averaged across all stem and progenitor cells in a given animal, P values computed by t-test compared with control animals and change was defined as the difference from the control animal average.

To visualize how the seq2PRINT model learns the association between input DNA sequences and output multiscale footprints, we generated marginalized predictions (referred to as delta effects) for given motifs using the delta_effects_seq2print function from the scprinter package. Briefly, we randomly selected 10,000 CRE sequences, inserted the consensus sequence of a specific motif at the centre and averaged the difference in model predictions with and without the inserted motif across the 10,000 CREs.

In vitro binding score analysis

Sequencing data were processed using the bulk ATAC-seq pipeline described previously in ref. ⁴⁶. All conditions were downsampled to an equivalent number of insertions in selected regions. Multiscale footprinting was performed using a one-sided binomial test, testing for depletion of insertions in centre versus flanking windows of varying radius lengths (2–100). To test for depletion and account for intrinsic Tn5 sequence preferences, expected insertions in centre and flanking windows were calculated using observed frequencies in a naked (no TF) control condition. As in PRINT⁴⁶, binding scores represent −log[P] of depletion, relative binding score per locus was then calculated by dividing the binding score for each TF combination by the binding score for FOS and/or JUN at the same region. P values were calculated for each TF combination by a one-sample t-test against a μ value of 1, separately for all AP-1 only loci and all AP-1:FOX composite loci.

AlphaFold3-based structure prediction

To nominate FOX factors to investigate, scRNA-seq reads of stem cells from each mouse were pseudobulked and normalized to RPM. The full amino acid sequences and features of murine FOS (P01101), JUN (P05627), FOXP1 (P58462), FOXA1 (P35582), FOXN2 (E9Q7L6) and FOXJ2 (Q9ES18) were downloaded from UniProt. Structures were predicted using AlphaFold3 (r2024.05.23)⁷⁴ by means of the AlphaFold Server by combining nucleotide sequences and amino acid sequences. Each structure included the motif nucleotide sequence, its reverse complement, and the amino acid sequences of FOS, JUN and FOX proteins. PyMOL (v.3.1.3) was used to select for interacting residues and structure visualization. To reduce the amount of disordered regions in each structure and to focus on protein-to-protein and protein-to-DNA interactions, the beginning and end of the protein sequences were truncated to only include regions containing residues within 3.5 Å of neighbouring protein and DNA structures. AlphaFold3 was once again used to predict the truncated structures containing the complete DNA sequence, truncated FOS chain, truncated JUN chain and truncated FOX chain. FOX residues within 3.5 Å of either FOS or JUN were highlighted in the truncated structures as an estimate of interaction between proteins. The location of the interaction residues were compared to known features for each FOX protein. For visualization purposes, structures were modified to only include regions with a B-factor greater than 50.

CUT&Tag processing and analysis

FOS CUT&Tag reads were aligned to the mm10 genome annotation and processed analogously to SHARE-seq (ignoring the single-cell barcode considerations). FOS binding signal analysis was performed by counting CUT&Tag reads within ATAC-seq peaks called from SHARE-seq data and summing reads across replicates for control or recovered samples. RPM was calculated at each peak and variable peaks were identified as the top 10,000 by standard deviation across RPM values after filtering for peaks with a minimum of 20 raw reads in at least 1 sample.

For footprinting prediction performance, peaks were first called on FOS CUT&Tag data using MACS2. Reads were counted within peaks, normalized to 10 million total reads and a t-test was performed between recovered and control samples. Differential peaks were defined as having a P value of less than 0.05. Normalized footprint scores were then condensed by motif, in which overlapping AP-1 motifs were collapsed to one site and change in footprint score was calculated between recovered and control. Performance was then evaluated by calculating the sensitivity and specificity of identifying differential peaks on the basis of varying thresholds of footprint score difference.

Cobinding analysis

Change in footprint score was calculated as the difference between acute, chronic or recovered scores and their batch matched controls. Memory footprints were defined as those showing a minimum change of 0.2. Using all pairs across the selected families, peaks were labelled as containing a memory motif for the first family only, second family only, both families or neither family, and an odds ratio of the resulting contingency table was calculated. Cobinding scores were then calculated by log₂-transforming these odds ratios, performing 10th to 90th quantile normalization across all pairs on positive and negative ratios separately, and linearly scaling positive values between 0 and 1 and negative values between 0 and −1. For primary tissue memory, this normalization and scaling was performed on values across all timepoints. For human IBD organoids and mouse adenomas, footprint differences were calculated as (IBD − Control) and (Recovered adenoma − Control adenoma), respectively. For AP-1 cobinding loss, the analysis was performed analogously using a decrease in footprint score of 0.2 (T-5224 − DMSO). Distance between TFs was calculated by finding the midpoints of all memory sites and calculating all distances between sites within the same peak. Negative footprint scores were set to 0 before plotting.

Motif families were identified as above under ‘Motif accessibility analysis’ with the following modifications:

ETS: ETS family with SPI1, SPIB and SPIC added.

RUNX: RUNX1, RUNX2 and RUNX3.

SNAI/MESP: SNAI2, SNAI3, MESP1, MESP2, TCF3 and TCF4.

ESRR: ESRRA, ESRRB and ESRRG.

Retinoid: RXRA, RXRB, RXRG, RARA and RARB.

Visium HD data processing and visualization

Raw BCL files were converted to fastqs using spaceranger (v.3.0.1) mkfastq and gene expression values were computed using spaceranger count using the Visium_Mouse_Transcriptome_Probe_Set_v2.0 and mm10 genome reference data. Bins of a size of 16 μm were used. For plotting of individual genes, beads were first filtered for a minimum of 300 reads, and then normalized to average read depth across all remaining beads. Principal components analysis was performed on the log₂ + 1 transformation of these values and the top 20 principal components were used to find k-NN (k = 20) for smoothening before plotting of Z-scored expression values (capped at −3 and 3).

Spatial transcriptomic adenoma identification and gene scoring

To identify tumour cells, the principal component smoothened Axin2 expression values were Z scored and adenoma cells were defined as beads with a minimum value of 1. Tumours were then called by finding k = 5 k-NN using spatial coordinates of adenomas cells only and performing Louvain graph-based clustering.

For AP-1-associated genes, the top 150 genes from P20 were overlapped with probes present in the Visium output and then this set of genes was scored analogously to single cells as above, treating bins as cells. In single bin spatial plots, scores were plotted without smoothening and capped at −2 and 2. For whole tumour analysis, raw expression values were pseudobulked by tumour call and these pseudobulks were scored analogously as cells. Tumours with high AP-1-associated gene expression were identified as those with scores greater than 1.5. For analysis of single AP-1 related genes, RPM values were computed on single bins. Visualization of these genes in adenomas only was performed by subsetting bins to only tumour cell calls, renormalizing and resmoothening expression values in only these cells, smoothening once more across the k = 20 x–y coordinate k-NN and capping Z scores at −3 and 3 before plotting.

Adenoma heterogeneity analysis

Expression values from scRNA-seq were pseudobulked by tumour and normalized as described. For unbiased analyses, the top 1000 most variable genes across tumour pseudobulks were selected. For P20, genes were filtered to be expressed in at least 20% of tumours. Fold change was calculated as the ratio between average expression across colitis tumours and across control tumours. Expression values were scaled by gene for heatmap visualization. Differential testing between high P20 tumours and others, as well as colitis tumours and controls, was performed using tumour pseudobulks and DESeq2 as described in ‘Gene expression change analysis’.

Statistical analysis

Except where noted in figures, legends or methods, all experiments in this study were repeated at least three times. All sample numbers (n) of biological replicates and definitions of centre and dispersion are defined in the figure legends. No statistical methods were used to predetermine sample size. Light microscopy, immunofluorescence, immunohistochemistry and gross morphology images each represent one of six or more biological replicates unless otherwise stated. All values are shown as mean ± standard error of the mean (s.e.m.) unless otherwise specified. No animals were excluded from statistical comparisons. Age-matched mice were randomly assigned to treatment groups. Blinding was not performed except for imaging experiments, as noted in the Methods.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link

Epigenetic memory of colitis promotes tumour growth – Nature