What Are the Primary Sources of Specialized Patient Data?

Patient data has always been central to medical research. Only by collecting data can any given test or experiment hope to generate usable insights.

Today, the world is awash in data. Yet data on some topics, such as rare diseases, remains challenging to find or use. Understanding the traditional, emerging and digital data sources available for specialized patient data is essential to better data analysis — and better research.

Traditional and Emerging Data Sources

Rare disease registries continue to play an essential role in providing researchers with specialized patient data.

In an article in Frontiers in Endocrinology, Stefan Kölker and fellow researchers identify three key characteristics that place rare disease registries at the forefront of patient data sources:

  • The ability to pool data. Any single rare disease affects few people in a given location, though rare diseases collectively affect millions of people worldwide. Rare disease registries overcome this geographic scatter by pooling patients’ data into a single source, regardless of where patients live.
  • The potential to achieve sufficient sample sizes. Finding a sufficiently large population of rare disease patients to achieve a meaningful sample size can be difficult due to geographic constraints. Rare disease registries remove this hurdle by aggregating data, making it more likely researchers can create a meaningful sample set.
  • The opportunity to foster research by overcoming knowledge gaps. Individual patients and research teams may feel isolated, but rare disease registries connect their data. By filling in knowledge gaps, registries let research teams focus on the work and patients in front of them.
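
The pooling idea in the list above can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption made for illustration: the site names, the record fields, the label "disease_x" and the minimum sample size of 4. It does not describe how any real registry stores or shares data.

```python
# Hypothetical records held at three separate sites (illustrative data only).
site_records = {
    "site_a": [
        {"patient_id": "a-1", "disease": "disease_x"},
        {"patient_id": "a-2", "disease": "disease_x"},
    ],
    "site_b": [
        {"patient_id": "b-1", "disease": "disease_x"},
        {"patient_id": "b-2", "disease": "disease_y"},
    ],
    "site_c": [
        {"patient_id": "c-1", "disease": "disease_x"},
        {"patient_id": "c-2", "disease": "disease_x"},
    ],
}


def pool_records(records_by_site):
    """Combine per-site records into one list, tagging each with its site."""
    pooled = []
    for site, records in records_by_site.items():
        for record in records:
            pooled.append({**record, "site": site})
    return pooled


def meets_sample_size(records, disease, minimum):
    """Return (count, ok): how many records match and whether that is enough."""
    count = sum(1 for record in records if record["disease"] == disease)
    return count, count >= minimum


MINIMUM = 4  # assumed threshold, chosen only for this example

# No single site reaches the threshold on its own...
per_site = {
    site: meets_sample_size(records, "disease_x", MINIMUM)
    for site, records in site_records.items()
}

# ...but the pooled registry does.
pooled = pool_records(site_records)
pooled_result = meets_sample_size(pooled, "disease_x", MINIMUM)
print(per_site)
print(pooled_result)
```

In this toy example, each site alone falls short of the assumed threshold, while the pooled set clears it. That is the sample-size argument in miniature.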

Kölker et al. identify a number of uses for rare disease registries, such as tracking the natural history and phenotypes of diseases, improving information on signs and symptoms, and tracking the progress and therapeutic effects of treatment.

Rare disease registries play a key role in collecting and disseminating specialized patient data, but they are not the only source of such data. The University of Pittsburgh Health Sciences Library System, for example, identifies dozens of publicly available data sets that can be used for a wide range of research purposes. These data sets include data that has long been collected by various government agencies, as well as information from health-specific sources.

Using publicly available data sets, however, can pose challenges. Sorting through this data to find meaningful information can be difficult and, at times, impractical. Digitization helps reduce the burden, but working with digital data still takes considerable effort.

Incorporating Digital Data Sources

Digital data sources provide a number of opportunities for rare disease researchers seeking specialized patient data. Yet they also create challenges. These challenges include incompleteness, difficulties moving between quantitative and qualitative representations of data, and data interoperability — or the ability to move data between various machines and algorithms.

In a 2022 article in AJHG, Cong Liu and fellow researchers presented one potential solution to these hurdles in the form of open annotation for rare diseases (OARD). The researchers describe OARD as “a real-world-data-derived resource with annotation for rare-disease-related phenotypes.”

The researchers note that while diagnosis for rare diseases often depends on phenotype-driven methods, annotations that describe these phenotypes are typically curated manually — leaving them susceptible to incompleteness and errors. While electronic health records (EHRs) provide an opportunity to aggregate annotations for greater completeness, this opportunity remains largely unexplored. OARD seeks to use EHR data to create more accurate and complete annotations for various rare disease phenotypes.

“The unique advantage of OARD is that it is data-driven. We use a large amount of EHR data from a large, diverse patient population. This data-driven approach is more generalizable and scalable than expert-driven approach,” says Chunhua Weng, professor of biomedical informatics at Columbia University and one of the authors of the study.

Tools like OARD help address some of the top challenges of using digital data sources for specialized patient information. OARD seeks to make phenotype annotations more complete, to bridge the gap between qualitative and quantitative representations, and to encourage interoperability by integrating information across EHR datasets.

Other digital platforms offer opportunities to improve access to specialized patient data in similar ways. When data is collected from a number of sources and standardized within a single tool or platform, it becomes easier to use and to audit for errors or incompleteness. Researchers gain access to more complete information to drive research.
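
As a rough illustration of standardizing and auditing, the Python sketch below maps records from two hypothetical sources onto one shared set of fields, then flags records with missing values. The source names, field names and sample records are all assumptions invented for this example. They are not drawn from OARD or from any specific platform.

```python
# Shared schema that every standardized record should follow (assumed).
COMMON_FIELDS = ("patient_id", "diagnosis", "age_years")

# Hypothetical mapping from each source's field names to the shared schema.
FIELD_MAPS = {
    "registry": {"id": "patient_id", "dx": "diagnosis", "age": "age_years"},
    "ehr_export": {
        "patient": "patient_id",
        "diagnosis_name": "diagnosis",
        "age_years": "age_years",
    },
}


def standardize(record, source):
    """Rename a source record's fields to the shared schema.

    Fields the source did not supply are left as None so the audit can find them.
    """
    mapping = FIELD_MAPS[source]
    standardized = {field: None for field in COMMON_FIELDS}
    for source_key, value in record.items():
        if source_key in mapping:
            standardized[mapping[source_key]] = value
    standardized["source"] = source
    return standardized


def audit(records):
    """Return {record index: [missing fields]} for records with gaps."""
    problems = {}
    for index, record in enumerate(records):
        missing = [f for f in COMMON_FIELDS if record[f] in (None, "")]
        if missing:
            problems[index] = missing
    return problems


# Illustrative raw records: one complete, one missing age, one with an empty diagnosis.
raw = [
    ("registry", {"id": "r-1", "dx": "disease_x", "age": 34}),
    ("ehr_export", {"patient": "e-7", "diagnosis_name": "disease_x"}),
    ("registry", {"id": "r-2", "dx": "", "age": 51}),
]

standardized_records = [standardize(record, source) for source, record in raw]
audit_report = audit(standardized_records)
print(audit_report)
```

Once every record shares the same field names, a single audit pass can check all sources at once. That is the practical benefit the paragraph above describes.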

Data interoperability remains a major challenge for rare disease researchers. Digital tools that focus on interoperability can help break down the remaining barriers among specialized patient data sources, unlocking new opportunities to understand, diagnose and treat rare diseases.

Images by: f8studios/©, prathanchorruangsak/©