Draft:Clustergrammer: Difference between revisions
Saideepa0501 (talk | contribs) No edit summary |
Saketrohit24 (talk | contribs) No edit summary |
Revision as of 17:35, 19 November 2024
Clustergrammer
Last edited by Saketrohit24 (talk | contribs) 9 days ago. (Update) |
Clustergrammer is a web-based interactive tool designed for visualizing and analyzing high-dimensional data through heatmaps. It was developed by the Ma'ayan Laboratory at the Icahn School of Medicine at Mount Sinai. The tool addresses the limitations of static heatmaps by integrating interactive features, facilitating the analysis of complex biological datasets, including genomics and proteomics data.
Introduction
Clustergrammer is a visualization tool specifically designed for high-dimensional data commonly encountered in computational biology and data science [1]. Unlike traditional static heatmaps, it enables users to explore data interactively by zooming, panning, clustering, and reordering rows and columns. The tool is applicable across various domains, including gene expression analysis, protein interaction networks, and single-cell data visualization. By leveraging web-based technologies, Clustergrammer provides accessible and shareable visualizations that simplify the interpretation of complex datasets [2].
Features
Interactive Heatmaps
Clustergrammer enables users to create interactive heatmaps that allow for dynamic exploration of data. Features include [3]:
- Zooming and Panning: Users can navigate large datasets efficiently.
- Filtering and Reordering: Rows and columns can be reordered by hierarchical clustering, sum, variance, or labels.
- Search and Highlighting: Specific rows or columns can be located quickly using search functions.
Interactive Dimensionality Reduction
Dimensionality reduction techniques like Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) simplify high-dimensional data for visualization. Clustergrammer enhances this process by allowing users to filter rows based on sum or variance, focusing on the most informative data points. This interactive filtering helps identify how specific dimensions affect clustering patterns. For smaller datasets, it uses animations to show the impact of these changes, aiding in data interpretation.
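The row filtering described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not Clustergrammer's own code; the function name `top_rows`, the `rank_by` option names, and the toy gene values are assumptions made for this example.

```python
from statistics import pvariance

def top_rows(matrix, n, rank_by="var"):
    """Keep the n rows with the largest sum or variance.

    matrix: dict mapping a row label to a list of numeric values.
    rank_by: "sum" or "var" (hypothetical option names for this sketch).
    """
    if rank_by == "sum":
        score = sum
    elif rank_by == "var":
        score = pvariance
    else:
        raise ValueError("rank_by must be 'sum' or 'var'")
    # Sort labels by descending score; ties keep their original order.
    ranked = sorted(matrix, key=lambda label: score(matrix[label]), reverse=True)
    return {label: matrix[label] for label in ranked[:n]}

# Toy data: three rows measured across four columns.
expression = {
    "gene_a": [1.0, 1.0, 1.0, 1.0],  # flat row, variance 0
    "gene_b": [0.0, 5.0, 0.0, 5.0],  # strongly variable row
    "gene_c": [2.0, 3.0, 2.0, 3.0],  # mildly variable row
}
filtered = top_rows(expression, n=2, rank_by="var")
print(list(filtered))
```

Dropping the flat row leaves only the rows that can influence how columns group together, which is the reason for filtering before a projection or clustering step.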
Clustering Algorithms
Clustergrammer employs hierarchical clustering algorithms, with support for additional methods such as K-means clustering. Users can visualize dendrograms, toggle between clustering levels, and extract enriched clusters.
Interactive Dendrograms: Clustergrammer employs interactive dendrograms to represent hierarchical clustering of data rows and columns. Instead of displaying the entire tree, it shows one slice at a time using gray trapezoids. Users can adjust the dendrogram slider to explore different clustering levels, revealing larger or smaller clusters. Interacting with these trapezoids highlights specific clusters, provides detailed information, and allows exporting of row or column names. For gene-level data, users can send clustered genes to Enrichr for enrichment analysis, facilitating deeper biological insights.
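To illustrate what moving the dendrogram slider does conceptually, the sketch below builds clusters bottom-up and stops at a requested number of groups, so that a lower number corresponds to a higher slice of the tree with larger clusters. It is a pure-Python illustration under stated assumptions (Euclidean distance, average linkage, a hypothetical `cut_tree` helper), not the clustering code that Clustergrammer runs.

```python
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def average_linkage(points, c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def cut_tree(points, n_clusters):
    """Merge the closest clusters until n_clusters remain; return sorted index lists."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(points, clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(sorted(c) for c in clusters)

# Toy rows: two tight pairs and one outlier.
rows = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [20.0, 20.0]]
for level in (4, 3, 2):
    print(level, cut_tree(rows, level))
```

Each printed level corresponds to one slice of the tree: at three clusters the two tight pairs are separate groups, and at two clusters they merge while the outlier stays alone.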
Customization Options
The tool provides various customization features:
- Users can adjust the opacity, highlight categories, and crop data subsets for detailed exploration.
- Integrations with external APIs, such as Enrichr, allow for enrichment analysis directly within the visualization.
Applications
1. High-Dimensional Data Visualization
Clustergrammer is a powerful tool for analyzing large and complex datasets by creating interactive heatmaps. These visualizations enable researchers to examine high-dimensional data intuitively, even when datasets contain thousands of rows and columns. This makes it particularly useful for summarizing, filtering, and interpreting large-scale experiments or studies.
2. Gene Expression Analysis
Widely used in genomics, Clustergrammer aids in analyzing gene expression data, including single-cell RNA sequencing (scRNA-seq) [4]. By visualizing relationships among genes or samples, the tool helps researchers identify meaningful patterns, clusters, and correlations, offering insights into underlying biological processes or gene functions.
3. Biological Network Visualization
The tool is applied to represent biological networks such as protein-protein interactions, metabolic pathways, or gene regulatory networks. Clustergrammer’s clustering capabilities help pinpoint highly interconnected nodes or significant components, which are often critical in understanding the system's overall function or discovering key biomarkers.
4. Hierarchical Clustering
Clustergrammer supports hierarchical clustering, a method for organizing data into groups based on similarity. This is essential for categorizing features like genes, conditions, or samples into clusters, revealing relationships and structures within the data. Such clustering is especially valuable in understanding biological datasets, where interconnectedness is common.
5. Single-Cell Data Analysis
In single-cell studies, Clustergrammer is instrumental in exploring datasets derived from technologies like 10X Genomics. It allows researchers to classify cells based on gene expression signatures, visualize population structures, and assess how cells relate to one another, helping to uncover novel cell types or states.
6. Comparative Data Analysis
Clustergrammer facilitates the comparison of multiple datasets or experimental conditions. By visualizing and contrasting data in heatmaps, researchers can quickly identify similarities or differences between groups, aiding in hypothesis generation or validation.
Technical details
Architecture
Clustergrammer operates on a modular architecture comprising:
- Backend: Built using Python, with key libraries such as NumPy and SciPy for data processing.
- Frontend: Employs JavaScript and D3.js for rendering interactive visualizations.
- Integration: The tool supports integration with Jupyter Notebooks and REST APIs, enabling seamless workflow incorporation.
The core libraries are Clustergrammer-PY and Clustergrammer-JS.
Clustergrammer2
Clustergrammer2 is a specialized Jupyter widget that enables interactive visualization of high-dimensional biological data. Developed using widget-ts-cookiecutter [5] and the regl WebGL library [6], it focuses on analyzing single-cell datasets, particularly RNA sequencing data. The tool supports exploration of large-scale data, such as gene expression patterns across thousands of cells [7]. For example, researchers have used it to examine 2,700 PBMCs and identify cell types based on gene expression signatures.
Clustergrammer-JS
Clustergrammer-JS is a JavaScript visualization library that generates interactive heatmaps in web browsers. Built on D3.js and SVG technology, it renders complex data in an explorable format with features like:
- Data filtering options in three main categories: value-based filtering (threshold-based row and column selection, numerical criteria, and removal of sparse data points), category-based filtering (grouping by metadata, toggling the visibility of specific groups, and filtering on clustering outcomes), and interactive selection (manual row and column control, subset visualization, and dynamic reordering)
- Customizable information displays on hover
- Seamless web application integration
The library works with JSON data produced by Clustergrammer-PY and provides developers the tools to embed dynamic visualizations in their web projects. Its source code and installation details are available in its GitHub repository [8].
Clustergrammer-PY
Clustergrammer-PY is a Python package that enables users to create dynamic heatmap visualizations through automated data analysis. The tool processes input data to generate JSON files that power interactive web-based displays via Clustergrammer-JS.
Key features include:
- Data preprocessing capabilities like hierarchical clustering and multiple normalization options
- Support for both file-based and DataFrame inputs
- Integration with major scientific Python libraries: NumPy for matrix operations, Pandas for DataFrame processing, SciPy for statistical analysis, and scikit-learn for machine learning
- Cross-version compatibility: the package supports both Python 2.7 and Python 3.x
The package handles data transformation and prepares structured JSON output suitable for visualization. Users can access it through the source code repository [9].
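As an illustration of the kind of normalization step mentioned in the feature list above, the sketch below z-scores each row of a small labeled matrix. It uses only the Python standard library; the helper name `zscore_rows` and the dict-of-lists layout are assumptions for this example, and the text does not state which normalization options the package actually offers.

```python
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Return a copy of matrix with each row centered to mean 0 and scaled to unit spread.

    Rows with zero spread are mapped to all zeros to avoid dividing by zero.
    """
    normalized = {}
    for label, values in matrix.items():
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation
        if sigma == 0:
            normalized[label] = [0.0 for _ in values]
        else:
            normalized[label] = [(v - mu) / sigma for v in values]
    return normalized

# Toy data: one variable row and one constant row.
raw = {"gene_a": [2.0, 4.0, 6.0], "gene_b": [7.0, 7.0, 7.0]}
scaled = zscore_rows(raw)
print(scaled)
```

Normalizing rows this way puts features with very different magnitudes on a comparable scale before distances are computed for clustering.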
Implementation Guide
Clustergrammer is accessible through multiple platforms, including its web-based interface, Python API, and Jupyter Notebook integration. Below is a step-by-step guide to implementing Clustergrammer in various scenarios:
1. Using the Web Interface
The easiest way to use Clustergrammer is through its web interface:
- Visit the Clustergrammer Web Tool.[10]
- Upload a CSV or TSV file containing your high-dimensional data.
- Use the interactive heatmap to explore, filter, and cluster your data dynamically.
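For the upload step above, a labeled matrix first has to be written out as a CSV or TSV file. The sketch below serializes a small matrix as tab-separated text with the standard `csv` module. The layout (column labels in the first line, a row label at the start of each later line) and the helper name `matrix_to_tsv` are assumptions for illustration; the Clustergrammer documentation should be consulted for the exact format the web tool accepts.

```python
import csv
import io

def matrix_to_tsv(row_labels, col_labels, values):
    """Serialize a labeled matrix as tab-separated text.

    The first line holds an empty corner cell followed by the column labels;
    each later line holds a row label followed by that row's values.
    """
    if len(values) != len(row_labels) or any(len(r) != len(col_labels) for r in values):
        raise ValueError("matrix shape does not match the labels")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + list(col_labels))
    for label, row in zip(row_labels, values):
        writer.writerow([label] + list(row))
    return buffer.getvalue()

# Toy matrix: two genes measured in two cells.
text = matrix_to_tsv(["gene_a", "gene_b"], ["cell_1", "cell_2"], [[1.5, 0.0], [3.0, 2.5]])
print(text)
```

The returned string can be written to a file with a `.tsv` extension before uploading.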
2. Python API: Clustergrammer-PY
The Python API provides advanced users with full control over preprocessing and visualization. Follow these steps to use the API:
Step 1: Installation
Install the Clustergrammer-PY library using pip:
pip install clustergrammer
Step 2: Import the Library
Start by importing the Clustergrammer-PY module:
from clustergrammer import Network
Step 3: Load and Preprocess Data
Initialize the Network object and load the data (here, data is a pandas DataFrame with labeled rows and columns):
net = Network()
net.load_df(data)
Step 4: Apply Clustering
Use the built-in clustering algorithms:
net.cluster()
Step 5: Save and Visualize Results
Save the clustered data as a JSON file for visualization:
net.write_json_to_file('viz', 'clustergrammer_output.json')
3. Jupyter Notebook Integration
To visualize Clustergrammer heatmaps directly within Jupyter Notebooks, use the Clustergrammer2 widget:
1. Install the clustergrammer2 package:
pip install clustergrammer2
2. Import and use the widget in a Jupyter Notebook:
# The package exposes a ready-made Network instance named net
from clustergrammer2 import net
# Load a pandas DataFrame (data) into the network
net.load_df(data)
# Display the interactive heatmap
net.widget()
This integration allows for seamless interaction with heatmaps during data exploration.
4. Integration with REST APIs
Clustergrammer supports REST API endpoints for automation:
- Prepare a JSON-formatted data file as described in the Clustergrammer documentation.
- Use tools like curl or Python's requests library to send POST requests to the API:
import requests
# Define API endpoint and data payload (the URL below is a placeholder, not a real endpoint)
url = "https://clustergrammer_api_url"
payload = {"data": data.to_json()}
# Send POST request
response = requests.post(url, json=payload)
# Retrieve clustered data
clustered_data = response.json()
Case studies
1) Visium Spatial Transcriptomics Data from 10X Genomics
This case study examines a tool for analyzing high-dimensional spatial transcriptomics data using Clustergrammer2, bqplot, and voila, focusing on the V1_Mouse_Brain_Sagittal_Anterior Visium dataset from 10x Genomics. It integrates spatial tissue data with high-dimensional gene expression analysis, offering researchers an approach to studying the mouse brain's cellular and molecular organization. By associating spatial patterns with gene expression variability, this case study is relevant for research in neuroscience and genomics.
The study combines spatial and high-dimensional data through interactive panels. The left panel displays spatial tissue data and includes a UMAP-based clustering view to organize spots by gene expression similarity. The right panel shows the top 250 variable genes across ~2,500 spots, excluding ribosomal and mitochondrial genes. Hierarchical clustering, supported by Clustergrammer2, enables the identification of co-expressed genes, visualization of tissue-specific expression, and interaction with heatmaps to explore relationships between genes, cell clusters, and spatial locations.
For enhanced interpretation, this case study incorporates single-cell RNA-seq data (~14,000 cortical cells) from the Allen Institute as a reference, facilitating cell type annotation of the Visium data. This integration aligns spatial gene expression patterns with expected cell type distributions, aiding the identification of functional cell populations and regulatory networks. By integrating spatial and high-dimensional data, this case study highlights the utility of interactive visualization tools in biological and medical research.
2) CODEX Single Cell Multiplexed Imaging Dashboard
This case study examines the application of Clustergrammer2 and CODEX, a highly multiplexed cytometric approach developed by Goltsev et al., to analyze spatially resolved single-cell data from mouse spleens. The dataset includes ~5,000 single cells derived from a segmented spleen image, where ~30 surface markers were measured. This combination of spatial resolution and high-dimensional data allows for a detailed examination of the cellular composition and organization within the spleen.
Clustergrammer2 was used to hierarchically cluster cells based on their marker profiles, identifying patterns of co-expression and cellular heterogeneity. Spatial context was incorporated using the Jupyter Widget bqplot, which visualized single-cell locations through Voronoi plots. The heatmap generated by Clustergrammer2 was linked to the spatial map via a dashboard built with voila, converting Jupyter notebooks into interactive web-based dashboards. This linkage enabled users to interact dynamically with the heatmap and highlight corresponding cells in the spatially resolved map, facilitating the exploration of relationships between cellular phenotypes and their spatial distribution.
This case study underscores the value of linked views for analyzing spatially resolved, high-dimensional single-cell data. The integration of clustering and visualization tools allows researchers to uncover meaningful biological patterns and spatial relationships. The dashboard, hosted on MyBinder, offers a replicable and accessible platform for data exploration, showcasing the potential of interactive visualization tools in advancing spatial multi-omics research.
3) scRNA-seq Gene Expression 2,700 PBMC
This case study examines the application of single-cell RNA sequencing (scRNA-seq) to analyze gene expression across thousands of individual cells, offering insights into cellular heterogeneity. The dataset, consisting of 2,700 peripheral blood mononuclear cells (PBMCs) obtained from 10X Genomics, includes thousands of gene expression measurements per cell, facilitating high-dimensional analysis.
Clustergrammer2 was utilized to explore the dataset interactively. Bulk gene expression signatures from CIBERSORT were used to assign tentative cell type labels to each cell. This approach enabled the clustering of cells based on gene expression profiles and the identification of patterns of co-expression among genes, providing insights into the diversity and functionality of immune cell populations within the PBMC dataset.
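The idea of assigning tentative labels from reference signatures can be sketched as a nearest-signature lookup. The example below is a minimal illustration and not the method used in the case study: cosine similarity is an assumed choice of metric, the helper names are hypothetical, and the numbers are toy values rather than CIBERSORT signatures or PBMC data.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def label_cells(cells, signatures):
    """Give each cell the name of its most similar reference signature."""
    return {
        cell_id: max(signatures, key=lambda name: cosine(profile, signatures[name]))
        for cell_id, profile in cells.items()
    }

# Toy values over three genes; not taken from CIBERSORT or the PBMC dataset.
signatures = {"T cell": [9.0, 1.0, 0.0], "B cell": [0.0, 2.0, 8.0]}
cells = {
    "cell_1": [4.0, 1.0, 0.0],
    "cell_2": [0.0, 1.0, 5.0],
    "cell_3": [3.0, 0.0, 1.0],
}
labels = label_cells(cells, signatures)
print(labels)
```

Labels produced this way are tentative, as the text notes: a cell is always matched to its closest signature, even when no signature fits well.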
The study highlights the value of Clustergrammer2's dynamic visualization capabilities in combination with scRNA-seq data for uncovering biologically relevant patterns and relationships. The data and analysis workflow are accessible on GitHub through clustergrammer2-notebooks, allowing researchers to replicate and expand upon the analysis, underscoring its utility as a resource for studying immune system dynamics.
References
- ^ Clustergrammer documentation: https://clustergrammer.readthedocs.io/
- ^ Fernandez, Nicolas F.; Gundersen, Gregory W.; Rahman, Adeeb; Grimes, Mark L.; Rikova, Klarisa; Hornbeck, Peter; Ma’ayan, Avi (2017). "Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data". Scientific Reports. 7. doi:10.1038/s41598-017-01819-3.
- ^ "Clustergrammer Documentation". Read the Docs. Retrieved 2024-11-19.
- ^ "single cell RNA".
- ^ "widget-ts-cookiecutter".
- ^ "regl".
- ^ "Clustergrammer2 GitHub Repository". GitHub. Icahn School of Medicine at Mount Sinai. Retrieved 2024-11-19.
- ^ "Clustergrammer-JS GitHub Repository". GitHub. Retrieved 2024-11-19.
- ^ "Clustergrammer-PY GitHub Repository". GitHub. MaayanLab. Retrieved 2024-11-19.
- ^ "ClusterGrammer Webtool".