Disease-gene associations mined from literature

The DISEASES resource is available for download:

Text mining channel: full, filtered
Knowledge channel: full, filtered
Experiments channel: full, filtered
Integrated channel (experimental): full

All files start with the same four columns: gene identifier, gene name, disease identifier, and disease name. The knowledge files then contain the source database, the evidence type, and the confidence score. The experiments files instead contain the source database, the source score, and the confidence score. Finally, the text-mining files contain the z-score, the confidence score, and a URL to a viewer of the underlying abstracts.
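A minimal Python sketch of reading one of these files, assuming each file is tab-separated, has no header row, and has exactly the columns listed above. The channel keys, the column names (such as `z_score` and `abstracts_url`), and the helper names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List

# Hypothetical names for the channel-specific columns that follow the
# four shared columns, in the order given in the description above.
EXTRA_COLUMNS = {
    "knowledge": ("source_database", "evidence_type", "confidence"),
    "experiments": ("source_database", "source_score", "confidence"),
    "textmining": ("z_score", "confidence", "abstracts_url"),
}


@dataclass
class Association:
    gene_id: str
    gene_name: str
    disease_id: str
    disease_name: str
    extra: Dict[str, str]  # channel-specific columns, keyed by the names above

    @property
    def confidence(self) -> float:
        # Every channel described above carries a confidence score column.
        return float(self.extra["confidence"])


def parse_associations(lines: Iterable[str], channel: str) -> Iterator[Association]:
    """Parse tab-separated lines for one channel (assumed: no header row)."""
    extra_names = EXTRA_COLUMNS[channel]
    expected = 4 + len(extra_names)
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != expected:
            raise ValueError(f"line {line_no}: expected {expected} columns, got {len(fields)}")
        gene_id, gene_name, disease_id, disease_name = fields[:4]
        yield Association(gene_id, gene_name, disease_id, disease_name,
                          dict(zip(extra_names, fields[4:])))


def filter_by_confidence(assocs: Iterable[Association], min_confidence: float) -> List[Association]:
    """Keep associations whose confidence score is at least min_confidence."""
    return [a for a in assocs if a.confidence >= min_confidence]


if __name__ == "__main__":
    # Made-up placeholder rows, not real DISEASES data.
    sample = [
        "GENE_ID_1\tGENE1\tDISEASE_ID_1\tdisease one\t5.2\t2.6\thttps://example.org/viewer/1\n",
        "GENE_ID_2\tGENE2\tDISEASE_ID_1\tdisease one\t1.1\t0.6\thttps://example.org/viewer/2\n",
    ]
    rows = list(parse_associations(sample, "textmining"))
    for a in filter_by_confidence(rows, 2.0):
        print(a.gene_name, a.disease_name, a.confidence)
```

For a downloaded file, pass the open file handle instead of the sample list, e.g. `parse_associations(open(path, encoding="utf-8"), "textmining")`, where `path` points to the text-mining download. Splitting on tabs directly, rather than using a CSV reader, avoids misinterpreting any quote characters that might appear in disease names.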

The full files contain all links in the DISEASES database. The filtered files contain only the non-redundant associations that are shown within the web interface when querying for a gene.

We also provide the original benchmark file used in the DISEASES publication (using STRING v9.1 identifiers).

Download files for earlier versions are archived on figshare.

The DISEASES tagger and the latest dictionary of human gene and disease names can also be downloaded for local installation on Unix platforms. We also provide a list of PubMed IDs for publications that were excluded because they originate from research paper mills.

Creative Commons License