Acquiring Crucial Medical Information Using Language Technology

Start date: 2016-01-01
End date: 2019-12-31
This project is part of the SBO Program of the IWT (IWT-SBO-Nr. 150056).


ACCUMULATE will automatically recognize crucial information in the free text of clinical reports written in English and Dutch. To that end, the project designs, develops and evaluates advanced language technology for deep semantic processing of these texts, which are often not morpho-syntactically well-formed. An extra focus is on easy portability of the technology across domains and languages.

Work Packages

The research goals of ACCUMULATE will be realized in seven work packages (WPs), described below. WP1 covers the management of the project and the dissemination of its results. The research in WP2, WP3 and WP4 together produces a common semantic representation layer of the content recognized in the texts. WP5 focuses on the visualization aspects of the project, including both methodologies for introducing visual analytics in the data mining phase and methods for presenting the results of the analyses to the end user. The results of the language technology and visualization are demonstrated in several use cases that are relevant to both the pharmaceutical industry and the care organizations (WP6). WP7 prepares the valorization of the results by industry.

WP1: Management

Management of the internal affairs of the project; organization of dissemination events (publication of scientific papers is part of the technical WPs).

WP2: Terminology extraction, alignment, named entity recognition and coding

This WP develops the necessary preparatory steps for successful medical event and relation extraction. The main focus of this work package is on the extraction, normalization and semantic classification of single- and multi-term units in English and Dutch patient records. The scientific challenge is mastering the domain-specific (sometimes physician-specific) terminology and mapping it to terminological standards.

WP3: Medical event extraction

This WP focuses on the recognition of medical events and their attributes, including the processing of extra-propositional aspects of meaning such as negation, modality and quantification.

WP4: Extraction of temporal, spatial and causal information

This WP deals with the recognition of: (1) temporal and causal relationships between events; (2) spatial relationships between entities; and (3) attributes of these relationships. This recognition is considered a more advanced form of knowledge extraction from text and poses two challenges. First, only a limited number of manually annotated training examples is available, because annotation is a costly process. Second, the models must be scalable and suitable for fast processing. An additional aim of this WP is to integrate knowledge obtained from external sources to improve the quality of the recognition and to recognize content that is implicit in the texts of the clinical reports.

WP5: Textual data visualization

This work package focuses on the visualization aspects of the machine reading. The specific goals of this work package are: (1) to develop novel exploratory data visualization methodologies and applications for raw machine reading data, allowing the investigation of the parameter space of these algorithms on well-structured texts; (2) to enhance these methods to handle unstructured text, including spelling and syntactic variants; (3) to develop novel approaches for finding and representing local structures in patient record data; and (4) to develop new methods for incorporating textual information in visualizations of electronic health and patient records.

WP6: Proof-of-concept applications and demonstrators

This WP demonstrates the language and visualization technologies developed in WP2, WP3, WP4 and WP5, applies them to clinical reports of UZA and UZ Leuven, and evaluates their use in a realistic hospital setting with professional users from the healthcare organizations and the pharmaceutical industry.

WP7: Valorization and business models

The aim is to define the valorization potential of the ACCUMULATE technologies and design corresponding business models taking into account privacy constraints and estimations of the economic value of the technologies.