Welcome to proChIPdb, the PROkaryotic Chromatin ImmunoPrecipitation DataBase! This tool enables microbiologists to easily browse 271 ChIP-seq and ChIP-exo profiles for various transcription factors (TFs) across 13 organisms. Currently, most of our profiles are for Escherichia coli. We provide curated tables of binding sites, interactive plots and genome viewers, as well as comparisons against literature binding sites.
Begin by selecting your organism of interest from the splash page. Next, you can browse the table of transcription factors from our dataset page. proChIPdb covers well-studied transcription factors, as well as a wide scope of E. coli y-TFs (relatively poorly characterized transcription factors).
You can also search for transcription factors or genes using the search button in the upper left corner, or download all site data to do your own custom analysis.
Once you’ve reached a transcription factor dashboard, you can browse its binding sites and target genes in the curated table. Interested in a specific binding site? Click on it to see the peak in our genome viewer. You can also click through the tabs in the upper right panel to see global characterizations of the transcription factor’s binding, such as its binding motif, peak width, peak location relative to its target genes, and the concordance between the data in proChIPdb and other databases of transcription factor binding. For more details on the dashboard components, see the sections below.
It is our hope that proChIPdb will enrich your research by allowing easy access to this compendium of binding intensity data, curated binding sites, and summary statistics. If you are trying to understand how genes are transcriptionally regulated (e.g. differential expression result interpretation, gene module identification, relative binding strength comparison, binding motif analysis), then proChIPdb can provide detailed information relevant to you.
Each dashboard features a specific transcription factor (for a specific organism and strain) and its ChIP results for at least one condition. Most transcription factors were characterized under M9 media conditions, and some were also characterized under other relevant conditions. In the following section, we include examples of each page element from the E. coli K-12 Fur dashboard.
Located in the left-side column of the page, this panel contains basic details about the organism, strain, media used, and supplement(s), if applicable. Hovering over the button adjacent to the media name will reveal additional details about the composition of the media. Supplements are listed in detailed, nested dictionaries that give their concentrations in the units used by the original publications. The accession number from GEO or SRA is also displayed, along with the PMID and DOI of the associated publication.
Links to the transcription factor's page on external databases are also included here, featuring EcoCyc, RegulonDB, UniProt, the Protein Data Bank, Pseudomonas Genome DB, and AureoWiki. Use these to access up-to-date information about the transcription factor, including relevant journal articles and protein structures.
Below the main links, there may be additional links under the heading "iModulons". iModulons are machine-learning derived gene groupings from analysis of transcriptomes, which can be associated with transcription factors. If the page's transcription factor is predicted to regulate any iModulons, then those iModulons will be listed as links here. Since iModulons capture independent signals in a dataset, they may combine the effects of several transcription factors or approximate nonlinear responses as multiple iModulons, which means that some cases (including Fur) will have more than one associated iModulon. Links go to iModulonDB, which has an about page at which you can learn more about this approach. It may be interesting to compare the genes from the iModulon with the binding sites from proChIPdb, as well as to use iModulonDB to learn more about the transcription factor's activity over a wide range of conditions.
The example on the left shows the metadata for Fur. Fur is the ferric uptake regulator, meaning it controls genes relevant to iron transport. It was tested under two conditions: iron-replete (Fe) and iron-starved (DPD). In the presence of iron, Fur is expected to bind DNA and repress its target genes. In the DPD condition, DPD binds any free iron to create iron starvation, leading to decreased Fur binding (more details available here). More information about Fur is available at the provided links. Fur is a major cellular regulator, so it takes part in the regulation of several iModulons.
This table lists all identified binding locations for the transcription factor. Tabs across the top correspond to each binding condition. Each row represents a curated binding peak. Clicking on a row will update the Genome Viewer to display a zoomed-in plot of the corresponding peak. The rows are initially sorted by genome location; to find the tallest peaks, sort by descending "Peak Intensity" (click the column header twice). In most cases, peaks were identified by processing the sequence read data with MACE. The columns are as follows:
In the above table, we can learn a great deal about Fur binding. The DPD tab shows relatively low peak intensities, because DPD induces iron starvation that suppresses Fur binding. Switching to the Fe tab, we can sort by peak intensity to see the strongest binding events when Fur binding is stimulated. For example, the binding site Fur-11 has a very high peak intensity and corresponds to the entCEBAH operon, which produces the iron chelator enterobactin. Note that searching on our search page for any of the target genes in this table would return this page as a result.
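The sort described above can also be reproduced on downloaded site data. The following is a minimal sketch, assuming each row is a dictionary with a "Peak Intensity" key (the column name used in the table); the site names and intensity values are placeholders, not real proChIPdb data, and the function name is hypothetical.

```python
def strongest_peaks(rows, top_n=3):
    """Return the top_n rows with the highest "Peak Intensity" value."""
    ranked = sorted(rows, key=lambda row: row["Peak Intensity"], reverse=True)
    return ranked[:top_n]

# Placeholder rows for illustration only
example_rows = [
    {"Binding Site": "site-1", "Peak Intensity": 4.2},
    {"Binding Site": "site-2", "Peak Intensity": 17.9},
    {"Binding Site": "site-3", "Peak Intensity": 9.5},
]
top_two = strongest_peaks(example_rows, top_n=2)
top_names = [row["Binding Site"] for row in top_two]
```

Here `top_names` lists the two strongest placeholder sites in descending order of intensity, mirroring what two clicks on the column header do in the dashboard.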
proChIPdb’s genome viewer provides access to a complete, genome-wide view of the transcription factor’s activity under the given conditions. The genome viewer was made using igv.js, and it visualizes bigWig ChIP read files generated from BAM files using deepTools. From top to bottom, the features and tracks of the viewer are:
In the above genome viewer, the bottom four rows show the ChIP data for Fur binding (2 conditions with 2 replicates each). Note that in the DPD conditions, the y axis does not reach a very high value because none of the binding events are particularly strong; this means that noise dominates our view in those conditions. On the other hand, the Fe condition creates several strong peaks. If you'd like, you can scroll back up to the table for the Fe condition and select a binding site like Fur-11. This will zoom the genome viewer into that binding site to see its specific shape. The two peaks for Fur-11 align with the annotated "Fur" binding sites in the "Published TFBS" row. The target genes (starting with entC) can be seen in the "Genes" row. In our less studied transcription factors, this view represents a powerful opportunity for discovery.
The tabs within this panel provide additional characterization of the data. In the upper right hand corner of each tab, the menu button enables PNG, SVG, and data download.
This tab shows a histogram of the binding peak widths from the binding site table. Hover over each bar to see a count of peaks that fall within its corresponding bin.
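As a rough illustration of how such a histogram can be built from the binding site table, the sketch below bins peak widths computed from start and end coordinates. The coordinates, the bin size of 10 bp, the inclusive-coordinate convention, and the function name are all assumptions for the example; the binning used on the dashboards may differ.

```python
from collections import Counter

def peak_width_histogram(peaks, bin_size=10):
    """Count peaks per width bin; each key is the lower edge of a bin (bp)."""
    counts = Counter()
    for start, end in peaks:
        width = end - start + 1  # assumes inclusive start/end coordinates
        counts[(width // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))

# Hypothetical (start, end) coordinates, not real proChIPdb peaks
example_peaks = [(100, 124), (500, 531), (900, 927), (1500, 1549)]
histogram = peak_width_histogram(example_peaks)
# histogram == {20: 2, 30: 1, 50: 1}
```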
In this tab, proChIPdb compares the binding locations relative to each target gene. For each gene (as listed in the Closest Gene column of the Binding Site Table), the distance from the gene start site to the binding site is measured in base pairs and normalized to the length of the gene. Points are then plotted with this value on the x axis and the peak intensity (S/N) on the y axis. Clustering on the x axis indicates the distance at which the transcription factor usually exerts its influence on gene expression. Hover over a point to view more details, such as the gene name.
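The normalization described above can be sketched as follows. This is only one plausible reading: the use of the peak center, the strand handling, and the sign convention (negative values are upstream of the gene start) are assumptions made for the example, and the function name is hypothetical.

```python
def normalized_position(peak_center, gene_left, gene_right, strand):
    """Distance from gene start to peak center, as a fraction of gene length.

    gene_left and gene_right are the lower and higher genome coordinates of
    the gene. Negative results are upstream of the start (assumed convention).
    """
    gene_length = gene_right - gene_left + 1
    if strand == "+":
        offset = peak_center - gene_left   # start is the left end
    elif strand == "-":
        offset = gene_right - peak_center  # start is the right end
    else:
        raise ValueError("strand must be '+' or '-'")
    return offset / gene_length

# A peak 100 bp upstream of a 1000 bp gene on either strand
forward = normalized_position(900, 1000, 1999, "+")   # -0.1
reverse = normalized_position(2099, 1000, 1999, "-")  # -0.1
```

Under this convention, a cluster of points just below zero on the x axis would correspond to binding in the promoter region immediately upstream of the target genes.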
If this tab exists, then it contains a sequence logo of a significantly enriched motif and its corresponding E-value (E < 0.001). Motifs provide valuable insight about which sequences will bind the TF. Sub-tabs across the bottom of the panel allow you to select from each of the conditions in the dashboard. The menu button in the upper right hand corner of each tab allows download of both the image itself and the position weight matrix (PWM) of the motif.
These motifs were generated using MEME-ChIP (parameters: -meme-minw 5, -meme-maxw 45, -meme-nmotifs 4, -filter-thresh 0.001), run on the binding peaks extended with a 20 bp margin.
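The 20 bp extension applied before motif search can be sketched as below. This is an illustrative assumption about how the margin is applied: coordinates are treated as 1-based and simply clamped at the ends of the sequence, which ignores the wrap-around of circular chromosomes. The function name and example coordinates are hypothetical.

```python
def extend_peak(start, end, margin=20, genome_length=None):
    """Widen a peak by `margin` bp on each side, clamped to the genome."""
    new_start = max(1, start - margin)  # assumes 1-based coordinates
    new_end = end + margin
    if genome_length is not None:
        new_end = min(genome_length, new_end)
    return new_start, new_end

extended = extend_peak(1000, 1030)                       # (980, 1050)
near_origin = extend_peak(10, 25)                        # (1, 45)
near_end = extend_peak(1000, 1030, genome_length=1040)   # (980, 1040)
```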
In addition, a final tab named with a PMID may be available; this contains the sequence logo as it appears in the publication from which the data were taken. You are encouraged to refer to the original publications for more details about those sequence logos.
The Venn diagram compares the target genes obtained in proChIPdb to other target gene sets in the literature. Hover over a section of the Venn diagram to see the genes it contains. Areas of agreement indicate strong evidence of direct regulation, and areas of disagreement represent opportunities to improve literature annotation or elucidate condition-specific differences in binding. The specific literature sources are mentioned below each diagram. For E. coli, EcoCyc’s transcriptional regulatory network was used.
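The three regions of a two-set Venn diagram correspond to simple set operations, which can be useful when repeating the comparison on downloaded data. The sketch below uses placeholder gene names rather than real target genes, and the function and key names are hypothetical.

```python
def venn_regions(prochipdb_genes, literature_genes):
    """Split two target gene lists into the three regions of a Venn diagram."""
    ours, theirs = set(prochipdb_genes), set(literature_genes)
    return {
        "shared": sorted(ours & theirs),           # agreement
        "prochipdb_only": sorted(ours - theirs),   # seen only in ChIP data
        "literature_only": sorted(theirs - ours),  # seen only in literature
    }

# Placeholder gene names for illustration only
regions = venn_regions(["geneA", "geneB", "geneC"], ["geneB", "geneC", "geneD"])
```

In this example, `regions["shared"]` holds geneB and geneC, while geneA and geneD each fall in one of the disagreement regions.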
The search page can be reached from any page by clicking "Search" in the upper right hand corner. You have the option to search transcription factors by name (e.g. "AtoC"), by PMID (e.g. "25222563") or by accession number (e.g. "GSE54901"). You can also search genes by name (e.g. "thrA"; includes common synonyms) or locus tag (e.g. "b0002"). Leave both options selected to return all relevant results. Search terms are case-insensitive. Each result that appears below the search bar will be a link to a transcription factor dashboard. The portion of the result that matches your search term will appear bolded and underlined.
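The case-insensitive matching and highlighting described above can be approximated as follows. This is a sketch of the general technique, not the site's actual search code: the substring matching rule, the HTML tags used for bold and underline, and the function names are assumptions.

```python
import re

def highlight_match(text, term):
    """Wrap each case-insensitive occurrence of term in bold/underline tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: f"<b><u>{m.group(0)}</u></b>", text)

def search_names(names, term):
    """Return highlighted names that contain term, ignoring case."""
    if not term:
        return []  # an empty query matches nothing in this sketch
    term_lower = term.lower()
    return [highlight_match(n, term) for n in names if term_lower in n.lower()]

results = search_names(["AtoC", "Fur", "thrA"], "atoc")
# results == ["<b><u>AtoC</u></b>"]
```

The original capitalization of the matched text is preserved in the output, so a query of "atoc" still displays the result as "AtoC".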
Transcription factor results will simply list the name, organism, strain, PMID, and accession number of the matching page. Gene results are associated with specific binding peaks upstream of the gene of interest, so they include additional details:
As described throughout this page, any specific content on a transcription factor dashboard may be downloaded using the buttons in the panels. If you would prefer to download all of the proChIPdb data, you can do so by following the link in the lower left corner of the splash page (or here). The folder is organized as follows:
proChIPdb manuscript coming soon!
To ask questions, provide feedback, report an issue, or collaborate with us, please email us at [email protected].