The Phenotype Day is an initiative developed jointly with the Bio-Ontologies Special Interest Group.

The systematic description of phenotype variation has gained increasing importance since the discovery of the causal relationship between a genotype placed in a certain environment and a phenotype. It plays a role not only in accessing and mining medical records but also in the analysis of model organism data, genome sequence analysis and the translation of knowledge across species. Accurate phenotyping has the potential to be the bridge between studies that aim to advance the science of medicine (such as a better understanding of the genomic basis of diseases) and studies that aim to advance the practice of medicine (such as phase IV surveillance of approved drugs).

Various research activities attempt to understand the underlying domain knowledge, but they are applied rather narrowly and are not well synchronized. With this Phenotype Day we propose to trigger a comprehensive and coherent approach to studying (and ultimately facilitating) the process of knowledge acquisition and support for Deep Phenotyping by bringing together researchers and practitioners from fields that include, but are not limited to, the following:

  • biology as well as computational biology
  • genomics, clinical genetics, pharmacogenomics, healthcare
  • text/data mining and knowledge discovery
  • knowledge representation and ontology engineering



Example topics include but are not limited to:

  • Representation of phenotypes
    • Controlled vocabularies
    • Ontologies (pre- and post-composed)
    • Data standards
  • Acquisition of phenotype descriptions
    • NLP annotation tools and pipelines
    • Tools and methods to support data curation for phenotypes
    • Integration of textual data and controlled vocabularies/ontologies
    • Phenotype discovery
    • Collaborative development and peer-review
    • Guidelines for phenotype data curation
    • Quality control and evaluation
  • Application of phenotypes to real world problems
    • Methods for phenotype alignment and interoperability
    • Drug repurposing / development
    • Genotype-environment/phenotype-genotype/phenotype-disorder relation discovery
    • Personalised medicine

Accepted papers


Draft program


(This is a tentative schedule and may still change.)

Draft program for download: HERE

Invited speakers


Dr Wendy W Chapman

University of Utah, US

Title: What do you mean when you say you want to find patients with a cough? Knowledge representation to support phenotyping from text

Abstract: Leveraging clinical narratives to classify patients based on phenotype requires layers of annotations. Representation of the knowledge described in the reports is critical to accurate extraction of that information. In this talk, Dr. Chapman will describe application ontologies her lab has developed for modeling annotations of information described in clinical reports. She will illustrate the usefulness of the formalism with several use cases and describe a vision of how the ontologies can potentially support collaborative knowledge authoring and NLP customization.

Bio: Dr. Chapman earned her Bachelor's degree in Linguistics and her PhD in Medical Informatics from the University of Utah in 2000. From 2000 to 2010 she was a National Library of Medicine postdoctoral fellow and then a faculty member at the University of Pittsburgh. She joined the Division of Biomedical Informatics at the University of California, San Diego in 2010. In 2013, Dr. Chapman became the chair of the Department of Biomedical Informatics at the University of Utah.
Dr. Chapman’s research focuses on developing and disseminating resources for modeling and understanding information described in narrative clinical reports. She is interested not only in better algorithms for extracting information out of clinical text through natural language processing (NLP) but also in generating resources for improving the NLP development process (such as shareable annotations and open source toolkits) and in developing user applications to help non-NLP experts apply NLP in informatics-based tasks like clinical research and decision support.

A/Prof Melissa Haendel

Oregon Health and Science University, US

Title: Deep phenotyping for everyone

Abstract: The Human Phenotype Ontology (HPO) was developed to describe phenotypic abnormalities, aka “deep phenotyping”, whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured. The HPO has been utilized to great success for assisting computational phenotype comparison against known diseases, other patients, and model organisms to support diagnosis of rare disease patients. Clinicians and geneticists create phenotypic profiles based on clinical evaluation, but this is time consuming and can miss important phenotypic features. Patients are sometimes the best source of information about their symptoms that might otherwise be missed in a clinical encounter. However, the HPO primarily uses medical terminology, which can be difficult for patients and their families to understand. To make the HPO accessible to patients, we systematically added non-expert terminology (i.e., layperson terms) as synonyms. Using semantic similarity, patient-recorded phenotypic profiles can be evaluated against those created clinically for undiagnosed patients to determine the improvement gained from the patient-driven phenotyping, as well as how much the patient phenotyping narrows the diagnosis. This patient-centric HPO can be utilized by all: in patient-centered rare disease websites, in patient community platforms and registries, or even to post one’s hard-to-diagnose phenotypic profile on the Web.

Bio: Dr. Haendel is an Associate Professor in the Library and the Department of Medical Informatics & Clinical Epidemiology at the Oregon Health & Science University (OHSU), where she directs the Ontology Development Group. She is the principal investigator of the Monarch Initiative and is an active researcher in ontologies and data standards. Melissa is known for her work on biomedical resource discovery, open science and reproducibility, and for her work on anatomy, cell, and phenotype ontologies such as Uberon and the Human Phenotype Ontology. She holds a Ph.D. in Neuroscience from the University of Wisconsin and completed postdoctoral training at the University of Oregon and Oregon State University.

Dr Zhiyong Lu

National Center for Biotechnology Information, US

Title: Large-scale Text Mining Genotype-Phenotype Associations for Precision Medicine

Abstract: The explosion of biomedical big data and information in the past decade or so has created new opportunities for discoveries to improve the treatment and prevention of human diseases. But the large body of knowledge, which mostly exists as free text in journal articles for humans to read, presents a grand new challenge: individual scientists around the world are increasingly finding themselves overwhelmed by the sheer volume of research literature and are struggling to keep up to date and to make sense of this wealth of textual information. Our research aims to break down this barrier and to empower scientists towards accelerated knowledge discovery. We will discuss our work on developing large-scale text-mining tools as well as their uses in real-world applications such as extracting genotype-phenotype associations from PubMed articles for precision medicine and computer-assisted database curation.

Bio: Dr. Lu is an Earl Stadtman Investigator at NCBI, part of the National Library of Medicine/National Institutes of Health, where he directs text mining research and oversees the development and operation of PubMed search to enhance information access to the biomedical literature. Dr. Lu is an Associate Editor for BMC Bioinformatics and serves on the editorial board of the journal Database. He is an organizer of the international BioCreative challenge and has authored over 120 publications.



Nigel Collier

Nigel Collier, University of Cambridge and the European Bioinformatics Institute (EMBL-EBI). Nigel is Principal Research Associate at the University of Cambridge and Visiting Scientist at the European Bioinformatics Institute. He has been active in many projects related to natural language processing for biomedical knowledge acquisition and data integration. He developed the BioCaster system for early alerting of infectious diseases from Web and social media data, which has been widely used by international human and animal health agencies. In 2012 he was awarded an EC Marie Curie Fellowship to conduct research into the acquisition and linking of phenotypes in scientific and clinical texts, and in 2014 he was awarded an EPSRC fellowship to conduct research into the Semantic Interpretation of Personal Health messages (SIPHS).

Anika Oellrich

Anika Oellrich, Senior Post-Doctoral Researcher at King's College London, UK. Anika conducted PhD studies in Bioinformatics at the University of Cambridge under supervision at the European Bioinformatics Institute, Rebholz group. Previously, she worked as a Senior Bioinformatician in the Mouse Genome Informatics group at the Wellcome Trust Sanger Institute, Hinxton, together with Damian Smedley. Her research focuses on aspects of phenotype mining, in large data sets as well as in the scientific literature. Having investigated the different representations of phenotypes, she applies this knowledge to data integration and mental illnesses, with the aim of improving the understanding of the pathology of these diseases and potentially improving existing treatments or discovering novel ones.

Tudor Groza

Tudor Groza is Phenomics Team Leader in the Kinghorn Centre for Clinical Genomics, at the Garvan Institute of Medical Research, Sydney, Australia. Previously, he was a Research Fellow in the e-Research Group of the School of ITEE, at The University of Queensland. Tudor received his PhD in Computer Science from the Digital Enterprise Research Institute (DERI) Galway, National University of Ireland, Galway in 2010. In 2012 he was awarded an ARC Discovery Early Career Researcher Award to investigate novel ways of extracting, consolidating and linking scientific artefacts present in biomedical publications, with a focus on evidence-based medicine. His current research covers the entire phenotype analytics stack, from representation to acquisition (from publications or clinical reports) and from cross-species integration to decision making (including disorder prediction, patient matchmaking or variant prioritisation).

Karin Verspoor

Karin Verspoor is Associate Professor in the Department of Computing and Information Systems at the University of Melbourne. She was formerly the Scientific Director of Health and Life Sciences at NICTA Victoria Research Laboratory, Principal Researcher and leader of the NICTA Biomedical Informatics team. Her research addresses the development of knowledge-based methods to support biological discovery and clinical decision making, with recent work in protein function prediction and genetic variant interpretation, in addition to projects investigating the role of structured vocabularies for information retrieval in the clinical context. Karin has also been active in efforts to develop text annotation standards, both in terms of software architectures and data representations, to facilitate interoperability and reuse of tools and resources.

Nigam H. Shah

Dr. Nigam Shah is associate professor of Medicine (Biomedical Informatics) at Stanford University, Assistant Director of the Center for Biomedical Informatics Research, and a core member of the Biomedical Informatics Graduate Program. Dr. Shah's research focuses on combining machine learning and prior knowledge in medical ontologies to enable use cases of the learning health system. Dr. Shah received the AMIA New Investigator Award for 2013 and the Stanford Biosciences Faculty Teaching Award for outstanding teaching in his graduate class on “Data driven medicine” (Biomedin 215). Dr. Shah was elected into the American College of Medical Informatics (ACMI) in 2015. He holds an MBBS from Baroda Medical College, India, a PhD from Penn State University and completed postdoctoral training at Stanford University.

Program Committee

  • Olivier Bodenreider, U.S. National Library of Medicine, US
  • Hong-Jie Dai, Taipei Medical University, Taiwan
  • Melissa Haendel, Oregon Health & Science University, US
  • Simon Jupp, European Bioinformatics Institute, UK
  • Jung-Jae Kim, Nanyang Technological University, Singapore
  • Jin-Dong Kim, Database Center for Life Science, Japan
  • Martin Krallinger, Spanish National Cancer Research Centre, Spain
  • Sebastian Koehler, Charite Medical University Berlin, Germany
  • Hilmar Lapp, Duke University, US
  • Suzanna E. Lewis, Lawrence Berkeley National Lab, US
  • Zhiyong Lu, National Institutes of Health, US
  • Chris Mungall, Berkeley Lab, US
  • Dietrich Rebholz-Schuhmann, National University of Ireland, Galway, Ireland
  • Peter N. Robinson, Charite Medical University Berlin, Germany
  • Paul N. Schofield, University of Cambridge, UK
  • Lynn Schriml, University of Maryland, US
  • Hagit Shatkay, University of Delaware, US
  • Damian Smedley, Queen Mary University London, UK
  • Nicole Washington, Helix, San Francisco, CA, US
  • Antonio Jimeno-Yepes, IBM, Australia
  • Andreas Zankl, University of Sydney, Australia