Draft:Clustergrammer: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 17:35, 19 November 2024

Clustergrammer

Clustergrammer is a web-based interactive tool designed for visualizing and analyzing high-dimensional data through heatmaps. It was developed by the Ma'ayan Laboratory at the Icahn School of Medicine at Mount Sinai. The tool addresses the limitations of static heatmaps by integrating interactive features, facilitating the analysis of complex biological datasets, including genomics and proteomics data.

Introduction

File:CLUSTERGRAMMERINTRO.png

Clustergrammer is a visualization tool specifically designed for high-dimensional data commonly encountered in computational biology and data science [1]. Unlike traditional static heatmaps, it enables users to explore data interactively by zooming, panning, clustering, and reordering rows and columns. The tool is applicable across various domains, including gene expression analysis, protein interaction networks, and single-cell data visualization. By leveraging web-based technologies, Clustergrammer provides accessible and shareable visualizations that simplify the interpretation of complex datasets [2].

Features

Interactive Heatmaps

Clustergrammer enables users to create interactive heatmaps that allow for dynamic exploration of data [3].

The interactive heatmap displayed was generated using Clustergrammer to visualize gene expression data from the Cancer Cell Line Encyclopedia (CCLE).

Features include:

  • Zooming and Panning: Users can navigate large datasets efficiently.
  • Filtering and Reordering: Rows and columns can be reordered by hierarchical clustering, sum, variance, or labels.
  • Search and Highlighting: Specific rows or columns can be located quickly using search functions.
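
The reordering options listed above can be illustrated with a minimal sketch in plain Python. It is not Clustergrammer's implementation: the function name order_rows and the toy matrix are hypothetical, and the sketch simply assumes that rows are ranked from the largest to the smallest sum or variance.

```python
from statistics import pvariance

def order_rows(matrix, by="sum"):
    """Return row names ordered from highest to lowest sum or variance.

    `matrix` maps a row name to its list of values (hypothetical layout).
    """
    if by == "sum":
        score = sum
    elif by == "var":
        score = pvariance
    else:
        raise ValueError("by must be 'sum' or 'var'")
    return sorted(matrix, key=lambda name: score(matrix[name]), reverse=True)

# Toy expression matrix: three genes measured in four samples.
toy = {
    "GENE_A": [5, 5, 5, 5],
    "GENE_B": [0, 9, 0, 9],
    "GENE_C": [1, 2, 1, 2],
}

by_sum = order_rows(toy, by="sum")
by_var = order_rows(toy, by="var")
```

Ordering by sum brings the most strongly expressed rows to the top, whereas ordering by variance favors rows that change the most across columns, which is why the two orderings differ for the toy data.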

Interactive Dimensionality Reduction

File:DIMENSIONALITY REDUCTION.png
Interactive heatmap generated with Clustergrammer after PCA was applied to the CCLE data.

Dimensionality reduction techniques like Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) simplify high-dimensional data for visualization. Clustergrammer enhances this process by allowing users to filter rows based on sum or variance, focusing on the most informative data points. This interactive filtering helps identify how specific dimensions affect clustering patterns. For smaller datasets, it uses animations to show the impact of these changes, aiding in data interpretation.
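
The effect of variance filtering before dimensionality reduction can be sketched with NumPy. This is an illustration only and not the code Clustergrammer runs: the helper names top_variance_rows and pca_columns, the synthetic data, and the choice of an SVD-based PCA are all assumptions.

```python
import numpy as np

def top_variance_rows(data, n_keep):
    """Keep the n_keep rows with the largest variance across columns."""
    variances = data.var(axis=1)
    keep = np.argsort(variances)[::-1][:n_keep]
    return data[np.sort(keep)]  # preserve the original row order

def pca_columns(data, n_components=2):
    """Project the columns (samples) onto the leading principal components."""
    samples = data.T                       # samples as rows, features as columns
    centered = samples - samples.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
# Synthetic matrix: 50 rows x 6 columns of low-level noise.
data = rng.normal(0.0, 0.1, size=(50, 6))
# Only the first 5 rows distinguish columns 3-5 from columns 0-2.
data[:5, 3:] += 5.0

filtered = top_variance_rows(data, n_keep=5)
coords = pca_columns(filtered)
```

In this synthetic case the filter retains exactly the five informative rows, and the first principal component separates the two groups of columns.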

Clustering Algorithms

Interactive heatmap generated with Clustergrammer after clustering was applied to the CCLE data.

Clustergrammer employs hierarchical clustering algorithms, with support for additional methods such as K-means clustering. Users can visualize dendrograms, toggle between clustering levels, and extract enriched clusters.

Interactive Dendrograms: Clustergrammer employs interactive dendrograms to represent hierarchical clustering of data rows and columns. Instead of displaying the entire tree, it shows one slice at a time using gray trapezoids. Users can adjust the dendrogram slider to explore different clustering levels, revealing larger or smaller clusters. Interacting with these trapezoids highlights specific clusters, provides detailed information, and allows exporting of row or column names. For gene-level data, users can send clustered genes to Enrichr for enrichment analysis, facilitating deeper biological insights.
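
The idea of viewing one slice of the hierarchical tree at a time can be sketched in plain Python. The sketch below is a simplified stand-in, not Clustergrammer's algorithm: it assumes average linkage with Euclidean distance, and the function name slice_clusters and the toy points are hypothetical. Asking for fewer clusters corresponds to moving the slider toward larger groups.

```python
import math
from itertools import combinations

def slice_clusters(points, n_clusters):
    """Average-linkage agglomerative clustering, stopped at n_clusters groups.

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def avg_dist(c1, c2):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (a < b) and merge b into a.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy rows: two tight pairs and one distant outlier.
rows = [[0, 0], [0, 1], [10, 10], [10, 11], [50, 50]]

fine_slice = slice_clusters(rows, 3)
coarse_slice = slice_clusters(rows, 2)
```

Each returned group plays the role of one trapezoid in the dendrogram view: its member indices are what a user would export or send on for enrichment analysis.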

Customization Options

The tool provides various customization features:

  • Users can adjust the opacity, highlight categories, and crop data subsets for detailed exploration.
  • Integrations with external APIs, such as Enrichr, allow for enrichment analysis directly within the visualization.

Applications

1. High-Dimensional Data Visualization

Clustergrammer is a powerful tool for analyzing large and complex datasets by creating interactive heatmaps. These visualizations enable researchers to examine high-dimensional data intuitively, even when datasets contain thousands of rows and columns. This makes it particularly useful for summarizing, filtering, and interpreting large-scale experiments or studies.

2. Gene Expression Analysis

Widely used in genomics, Clustergrammer aids in analyzing gene expression data, including single-cell RNA sequencing (scRNA-seq) [4]. By visualizing relationships among genes or samples, the tool helps researchers identify meaningful patterns, clusters, and correlations, offering insights into underlying biological processes or gene functions.

3. Biological Network Visualization

The tool is applied to represent biological networks such as protein-protein interactions, metabolic pathways, or gene regulatory networks. Clustergrammer’s clustering capabilities help pinpoint highly interconnected nodes or significant components, which are often critical in understanding the system's overall function or discovering key biomarkers.

4. Hierarchical Clustering

Clustergrammer supports hierarchical clustering, a method for organizing data into groups based on similarity. This is essential for categorizing features like genes, conditions, or samples into clusters, revealing relationships and structures within the data. Such clustering is especially valuable in understanding biological datasets, where interconnectedness is common.

5. Single-Cell Data Analysis

In single-cell studies, Clustergrammer is instrumental in exploring datasets derived from technologies like 10X Genomics. It allows researchers to classify cells based on gene expression signatures, visualize population structures, and assess how cells relate to one another, helping to uncover novel cell types or states.

6. Comparative Data Analysis

Clustergrammer facilitates the comparison of multiple datasets or experimental conditions. By visualizing and contrasting data in heatmaps, researchers can quickly identify similarities or differences between groups, aiding in hypothesis generation or validation.

Technical details

Architecture

Clustergrammer operates on a modular architecture comprising:

  • Backend: Built using Python, with key libraries such as NumPy and SciPy for data processing.
  • Frontend: Employs JavaScript and D3.js for rendering interactive visualizations.
  • Integration: The tool supports integration with Jupyter Notebooks and REST APIs, enabling seamless workflow incorporation.

The core libraries are Clustergrammer-PY and Clustergrammer-JS.

Clustergrammer2

Clustergrammer2 is a specialized Jupyter widget that enables interactive visualization of high-dimensional biological data. Developed using widget-ts-cookiecutter [5] and the regl WebGL library [6], it focuses on analyzing single-cell datasets, particularly RNA sequencing data. The tool supports exploration of large-scale data, such as the analysis of gene expression patterns across thousands of cells [7]. For example, researchers have used it to examine 2,700 PBMCs and identify cell types based on gene expression signatures.

Clustergrammer-JS

Clustergrammer-JS is a JavaScript visualization library that generates interactive heatmaps in web browsers. Built on D3.js and SVG technology, it renders complex data in an explorable format with features like:

  • Data filtering options in three categories. Value filters allow threshold-based row/column manipulation, handling of numerical criteria, and removal of sparse data points. Category-based filters enable grouping by metadata, visibility toggling of specific groups, and filtering based on clustering outcomes. Interactive selections provide manual row/column control, subset data visualization, and dynamic content reordering. Together these let users explore datasets through both preprocessing and real-time filtering.
  • Customizable information displays on hover
  • Seamless web application integration

The library works with JSON data produced by Clustergrammer-PY and provides developers the tools to embed dynamic visualizations in their web projects. Its source code and installation details are available on GitHub [8].

Clustergrammer-PY

Clustergrammer-PY is a Python package that enables users to create dynamic heatmap visualizations through automated data analysis. The tool processes input data to generate JSON files that power interactive web-based displays via Clustergrammer-JS.

Key features include:

  • Data preprocessing capabilities like hierarchical clustering and multiple normalization options
  • Support for both file-based and DataFrame inputs
  • Integration with major scientific Python libraries: NumPy for matrix operations, Pandas for DataFrame processing, SciPy for statistical analysis, and scikit-learn for machine learning capabilities
  • Cross-version compatibility: the package supports both Python 2.7 and Python 3.x

The package handles data transformation and prepares structured JSON output suitable for visualization. Users can access it through the source code repository [9].
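
One common normalization applied before clustering is a row-wise z-score. The sketch below shows the arithmetic in plain Python under stated assumptions: it is not Clustergrammer-PY's code, the function name zscore_rows is hypothetical, and it uses the population standard deviation, which may differ from the convention the package uses.

```python
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Z-score each row: subtract the row mean, divide by the row std.

    `matrix` maps a row name to its list of values (hypothetical layout).
    Rows with zero spread are mapped to all zeros to avoid division by zero.
    """
    normalized = {}
    for name, values in matrix.items():
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation (assumption)
        if sigma == 0:
            normalized[name] = [0.0 for _ in values]
        else:
            normalized[name] = [(v - mu) / sigma for v in values]
    return normalized

toy = {"gene_1": [1, 2, 3], "gene_2": [4, 4, 4]}
scaled = zscore_rows(toy)
```

After this step every non-constant row has mean 0 and unit spread, so rows measured on different scales contribute comparably to the distance calculations used in clustering.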

Implementation Guide

Clustergrammer is accessible through multiple platforms, including its web-based interface, Python API, and Jupyter Notebook integration. Below is a step-by-step guide to implementing Clustergrammer in various scenarios:

1. Using the Web Interface

The easiest way to use Clustergrammer is through its web interface:

  1. Visit the Clustergrammer Web Tool.[10]
  2. Upload a CSV or TSV file containing your high-dimensional data.
  3. Use the interactive heatmap to explore, filter, and cluster your data dynamically.
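
A file for upload can be prepared with Python's standard csv module. The sketch below assumes a simple matrix layout (column names in the first line after an empty leading cell, and a row name at the start of each following line); the helper name matrix_to_tsv and the toy values are hypothetical, and the Clustergrammer documentation should be consulted for the exact formats the web tool accepts.

```python
import csv
import io

def matrix_to_tsv(row_names, col_names, values):
    """Serialize a labeled matrix as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Header line: empty corner cell followed by the column names.
    writer.writerow([""] + list(col_names))
    # One line per row: the row name followed by its values.
    for name, row in zip(row_names, values):
        writer.writerow([name] + list(row))
    return buffer.getvalue()

tsv_text = matrix_to_tsv(
    ["GENE_A", "GENE_B"],
    ["sample_1", "sample_2"],
    [[1.5, 0.0], [2.0, 3.25]],
)
```

Writing tsv_text to a file with a .tsv or .txt extension yields a small matrix that can be used to try the upload step.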

2. Python API: Clustergrammer-PY

The Python API provides advanced users with full control over preprocessing and visualization. Follow these steps to use the API:

Step 1: Installation

Install the Clustergrammer-PY library using pip (the package is published on PyPI under the name clustergrammer):

pip install clustergrammer

Step 2: Import the Library

Import the Network class from the clustergrammer module:

from clustergrammer import Network

Step 3: Load and Preprocess Data

Initialize the Network object and load the data, where data is a pandas DataFrame:

net = Network()
net.load_df(data)

Step 4: Apply Clustering

Use the built-in clustering algorithms:

net.cluster()

Step 5: Save and Visualize Results

Save the clustered data as a JSON file for visualization with Clustergrammer-JS:

net.write_json_to_file('viz', 'clustergrammer_output.json')

3. Jupyter Notebook Integration

To visualize Clustergrammer heatmaps directly within Jupyter Notebooks, use the Clustergrammer2 widget:

1. Install the clustergrammer2 package:

pip install clustergrammer2

2. Import and use the widget in a Jupyter Notebook:

from clustergrammer2 import Network, CGM2

# Initialize a Network object backed by the Clustergrammer2 widget
net = Network(CGM2)

# Load a pandas DataFrame into the widget
net.load_df(data)

# Display the interactive heatmap
net.widget()

This integration allows for seamless interaction with heatmaps during data exploration.

4. Integration with REST APIs

Clustergrammer supports REST API endpoints for automation:

  1. Prepare a JSON-formatted data file as described in the Clustergrammer documentation.
  2. Use tools like curl or Python’s requests library to send POST requests to the API:

import requests

# Define the API endpoint and data payload.
# The URL is a placeholder; data is assumed to be a pandas DataFrame.
url = "https://clustergrammer_api_url"
payload = {"data": data.to_json()}

# Send POST request
response = requests.post(url, json=payload)

# Retrieve clustered data
clustered_data = response.json()

Case studies

1) Visium Spatial Transcriptomics Data from 10X Genomics

This case study examines a tool for analyzing high-dimensional spatial transcriptomics data using Clustergrammer2, bqplot, and voila, focusing on the V1_Mouse_Brain_Sagittal_Anterior Visium dataset from 10x Genomics. It integrates spatial tissue data with high-dimensional gene expression analysis, offering researchers an approach to studying the mouse brain's cellular and molecular organization. By associating spatial patterns with gene expression variability, this case study is relevant for research in neuroscience and genomics.

The study combines spatial and high-dimensional data through interactive panels. The left panel displays spatial tissue data and includes a UMAP-based clustering view to organize spots by gene expression similarity. The right panel shows the top 250 variable genes across ~2,500 spots, excluding ribosomal and mitochondrial genes. Hierarchical clustering, supported by Clustergrammer2, enables the identification of co-expressed genes, visualization of tissue-specific expression, and interaction with heatmaps to explore relationships between genes, cell clusters, and spatial locations.
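
The gene selection step described above (dropping ribosomal and mitochondrial genes, then keeping the most variable ones) can be sketched in plain Python. This is an illustration rather than the notebook's code: the function name top_variable_genes, the toy values, and the name prefixes used to recognize mouse mitochondrial ("mt-") and ribosomal ("Rps", "Rpl") genes are assumptions.

```python
from statistics import pvariance

# Assumed mouse gene-name prefixes for mitochondrial and ribosomal genes.
EXCLUDED_PREFIXES = ("mt-", "Rps", "Rpl")

def top_variable_genes(expression, n_top, excluded_prefixes=EXCLUDED_PREFIXES):
    """Return the n_top most variable gene names after excluding prefixes.

    `expression` maps a gene name to its values across spots (hypothetical).
    """
    kept = {
        gene: values
        for gene, values in expression.items()
        if not gene.startswith(excluded_prefixes)
    }
    ranked = sorted(kept, key=lambda gene: pvariance(kept[gene]), reverse=True)
    return ranked[:n_top]

# Toy data: five genes measured across four spots.
toy = {
    "mt-Co1": [0, 100, 0, 100],
    "Rpl13": [10, 90, 10, 90],
    "Snap25": [0, 8, 0, 8],
    "Gfap": [1, 3, 1, 3],
    "Actb": [5, 5, 5, 5],
}

selected = top_variable_genes(toy, n_top=2)
```

In the case study the same idea is applied with n_top set to 250 across roughly 2,500 spots; here the two highly variable but excluded genes are skipped in favor of the next most variable ones.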

For enhanced interpretation, this case study incorporates single-cell RNA-seq data (~14,000 cortical cells) from the Allen Institute as a reference, facilitating cell type annotation of the Visium data. This integration aligns spatial gene expression patterns with expected cell type distributions, aiding the identification of functional cell populations and regulatory networks. By integrating spatial and high-dimensional data, this case study highlights the utility of interactive visualization tools in biological and medical research.

2) CODEX Single Cell Multiplexed Imaging Dashboard

This case study examines the application of Clustergrammer2 and CODEX, a highly multiplexed cytometric approach developed by Goltsev et al., to analyze spatially resolved single-cell data from mouse spleens. The dataset includes ~5,000 single cells derived from a segmented spleen image, where ~30 surface markers were measured. This combination of spatial resolution and high-dimensional data allows for a detailed examination of the cellular composition and organization within the spleen.

Clustergrammer2 was used to hierarchically cluster cells based on their marker profiles, identifying patterns of co-expression and cellular heterogeneity. Spatial context was incorporated using the Jupyter Widget bqplot, which visualized single-cell locations through Voronoi plots. The heatmap generated by Clustergrammer2 was linked to the spatial map via a dashboard built with voila, converting Jupyter notebooks into interactive web-based dashboards. This linkage enabled users to interact dynamically with the heatmap and highlight corresponding cells in the spatially resolved map, facilitating the exploration of relationships between cellular phenotypes and their spatial distribution.

This case study underscores the value of linked views for analyzing spatially resolved, high-dimensional single-cell data. The integration of clustering and visualization tools allows researchers to uncover meaningful biological patterns and spatial relationships. The dashboard, hosted on MyBinder, offers a replicable and accessible platform for data exploration, showcasing the potential of interactive visualization tools in advancing spatial multi-omics research.

3) scRNA-seq Gene Expression 2,700 PBMC

This case study examines the application of single-cell RNA sequencing (scRNA-seq) to analyze gene expression across thousands of individual cells, offering insights into cellular heterogeneity. The dataset, consisting of 2,700 peripheral blood mononuclear cells (PBMCs) obtained from 10X Genomics, includes thousands of gene expression measurements per cell, facilitating high-dimensional analysis.

Clustergrammer2 was utilized to explore the dataset interactively. Bulk gene expression signatures from CIBERSORT were used to assign tentative cell type labels to each cell. This approach enabled the clustering of cells based on gene expression profiles and the identification of patterns of co-expression among genes, providing insights into the diversity and functionality of immune cell populations within the PBMC dataset.
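
One simple way to assign tentative labels from bulk signatures is to give each cell the label of the signature it correlates with most strongly. The sketch below illustrates that idea only: the source does not state how the labels were computed, so the Pearson-correlation rule, the function names, and the toy values are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)

def label_cells(cells, signatures):
    """Assign each cell the name of its best-correlated signature."""
    return {
        cell: max(signatures, key=lambda name: pearson(profile, signatures[name]))
        for cell, profile in cells.items()
    }

# Toy values over the same four genes, in the same order for every vector.
signatures = {"T cell": [9, 8, 0, 1], "B cell": [0, 1, 9, 8]}
cells = {"cell_1": [5, 4, 0, 0], "cell_2": [0, 1, 6, 7]}

labels = label_cells(cells, signatures)
```

Labels obtained this way are only tentative, which matches the wording above: they serve as categories for exploring the clustered heatmap rather than as definitive annotations.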

The study highlights the value of Clustergrammer2's dynamic visualization capabilities in combination with scRNA-seq data for uncovering biologically relevant patterns and relationships. The data and analysis workflow are accessible on GitHub through clustergrammer2-notebooks, allowing researchers to replicate and expand upon the analysis, underscoring its utility as a resource for studying immune system dynamics.

References

  1. ^ Clustergrammer documentation: https://clustergrammer.readthedocs.io/
  2. ^ Fernandez, Nicolas F.; Gundersen, Gregory W.; Rahman, Adeeb; Grimes, Mark L.; Rikova, Klarisa; Hornbeck, Peter; Ma’ayan, Avi (2017). "Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data". Scientific Reports. 7. doi:10.1038/s41598-017-01819-3.
  3. ^ "Clustergrammer Documentation". Read the Docs. Retrieved 2024-11-19.
  4. ^ "single cell RNA".
  5. ^ "widget-ts-cookiecutter".
  6. ^ "regl".
  7. ^ "Clustergrammer2 GitHub Repository". GitHub. Icahn School of Medicine at Mount Sinai. Retrieved 2024-11-19.
  8. ^ "Clustergrammer-JS GitHub Repository". GitHub. Retrieved 2024-11-19.
  9. ^ "Clustergrammer-PY GitHub Repository". GitHub. MaayanLab. Retrieved 2024-11-19.
  10. ^ "ClusterGrammer Webtool".