Rapid advancements in technology have produced exponential growth in biomedical data, creating new research opportunities and discoveries that have enriched the understanding of human health and improved patient care. However, as data continue to accumulate, researchers must navigate the challenging process of accessing and analyzing big data, even though many of them lack formal training in programming and database management.
Aiming to reduce the barriers to data-driven research, Daniel Blankenberg, PhD, and colleagues in 2005 developed a web-based scientific analysis platform centered on three key goals: (1) making data analyses accessible to all researchers and tool developers, (2) ensuring all analyses are reproducible regardless of the particular platform and (3) allowing transparent communication of analyses to support their reuse and extension across all types of research. Known as the Galaxy Project, this platform has enabled thousands of scientists across the globe to analyze large biomedical datasets, including those from genomics, proteomics, metabolomics and imaging.
Galaxy has experienced significant growth in recent years as Dr. Blankenberg and the Galaxy community continue to expand Galaxy’s framework and tools to meet user needs. In a recent correspondence piece published in Nature Methods, they presented the Galaxy External Display Application (GEDA) framework, which facilitates interoperability between user data in Galaxy and the growing number of independent web services that offer visualization and analysis capabilities, such as genome browsers, analysis pipelines or locally running desktop applications.
These web services allow users to upload their own datasets from a computer or a URL, but both methods have significant disadvantages. Uploading a dataset directly from a computer is a lengthy process: the dataset must be downloaded from one platform, saved on a computer and then uploaded to another platform, and every step can be hindered by connectivity and transfer speed issues. Supplying a URL, while optimal for large datasets, typically requires that the user be able to access and operate web-hosting services.
The GEDA framework bypasses these challenges by enabling Galaxy data to interact directly with external resources, provided that those resources accept URL parameter values. External resources that are valid for a particular dataset appear as labeled links in the expanded preview window on the Galaxy platform. A user need only click on a link to be sent to an external resource along with a URL for the dataset content, which the resource then uses to load the data. As a result, users can discover and efficiently access external resources with minimal effort, which facilitates the integration of disparate datasets and tools that remains essential to modern scientific investigation and analysis.
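The link mechanism described above can be illustrated with a minimal Python sketch. It is not Galaxy's implementation: the function names, the `data_url` parameter name and the example.org addresses are all hypothetical, chosen only to show how a dataset URL might be passed to an external resource as a URL parameter and read back on the receiving side.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_display_link(resource_base_url, dataset_url, param_name="data_url"):
    """Build a link to an external resource that carries the dataset URL.

    Hypothetical helper: 'param_name' stands in for whatever URL parameter
    a given external resource expects.
    """
    # urlencode percent-escapes the dataset URL so it is safe inside a query string.
    query = urlencode({param_name: dataset_url})
    # Append with '&' if the resource URL already has a query string, else '?'.
    separator = "&" if urlsplit(resource_base_url).query else "?"
    return f"{resource_base_url}{separator}{query}"


def extract_dataset_url(link, param_name="data_url"):
    """Simulate the external resource reading the dataset URL from the link."""
    params = parse_qs(urlsplit(link).query)
    return params[param_name][0]


# Placeholder addresses, not real Galaxy or genome browser endpoints.
dataset_url = "https://galaxy.example.org/datasets/abc123/display"
link = build_display_link("https://browser.example.org/view?db=hg38", dataset_url)
print(link)
print(extract_dataset_url(link))
```

In this sketch the user's click corresponds to following `link`; the external resource would then fetch the dataset from the URL it extracts, so nothing has to be downloaded to or re-uploaded from the user's computer.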
Dr. Blankenberg is Assistant Staff in the Genomic Medicine Institute. Information about the vibrant communities that use, support and extend Galaxy can be accessed here.
Figure: Anatomy of a basic Galaxy External Display Application