The disease associations are derived via automatic text mining of the biomedical literature and have not been manually verified. The confidence of each association is signified by stars, where ★★★★☆ is the highest confidence and ★☆☆☆☆ is the lowest.
Each disease–gene association is based on a text-mining score, which is proportional to (1) the absolute number of co-mentions and (2) the ratio of observed to expected co-mentions (i.e. the enrichment). These scores are normalized to z-scores by comparing them to a random background distribution. Each z-score is then shown as stars, with each star corresponding to two standard deviations above the mean of the background distribution.
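As an illustration of the z-score-to-stars conversion described above, here is a minimal Python sketch. It is not the resource's actual implementation: the function names, the use of a population standard deviation, the cap of four stars (taken from the stated confidence range), and the five-symbol display are all assumptions made for the example.

```python
import statistics


def z_score(score, background):
    """Normalize a raw text-mining score against a background sample.

    Assumption: the background is a plain list of raw scores and the
    population standard deviation is used.
    """
    mean = statistics.fmean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        raise ValueError("background scores have zero variance")
    return (score - mean) / sd


def stars_from_z(z, per_star=2.0, max_stars=4):
    """Convert a z-score to stars: one star per two standard deviations.

    Assumption: values are clamped to the range [0, max_stars].
    """
    return max(0.0, min(z / per_star, float(max_stars)))


def render_stars(stars, slots=5):
    """Render whole stars as filled symbols, padding with empty ones.

    Assumption: fractional stars are rounded down for display.
    """
    filled = int(stars)
    return "★" * filled + "☆" * (slots - filled)
```

For example, under these assumptions a z-score of 6 gives `stars_from_z(6.0) == 3.0`, which `render_stars` displays as three filled stars followed by two empty ones.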
Developed by Sune Frankild and Lars Juhl Jensen from the Novo Nordisk Foundation Center for Protein Research.
