UniRef

The UniProt Reference Clusters (UniRef) provide clustered sets of sequences from the UniProt Knowledgebase (including isoforms) and selected UniParc records in order to obtain complete coverage of the sequence space at several resolutions while hiding redundant sequences (but not their descriptions) from view.

Unlike in UniParc, sequence fragments are merged in UniRef. The UniRef100 database combines identical sequences and sub-fragments with 11 or more residues from any organism into a single UniRef entry, displaying the sequence of a representative protein, the accession numbers of all the merged entries and links to the corresponding UniProtKB and UniParc records.

UniRef90 is built by clustering UniRef100 sequences with 11 or more residues using the CD-HIT algorithm (Li W. and Godzik A., Bioinformatics, 22: 1658-1659, 2006) such that each cluster is composed of sequences that have at least 90% sequence identity to and 80% overlap with the longest sequence (a.k.a. seed sequence) of the cluster. Similarly, UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to and 80% overlap with the longest sequence in the cluster. Prior to 2013 there was no overlap threshold, so clusters were more heterogeneous in length. UniRef90 and UniRef50 yield a database size reduction of approximately 58% and 79%, respectively, providing for significantly faster sequence similarity searches.

The seed sequences are the longest members of the cluster. However, the longest sequence is not always the most informative. There is often more biologically relevant information (name, function, cross-references) available on other cluster members. All the proteins in a cluster are therefore ranked as follows to facilitate the selection of a biologically relevant representative for the cluster:

  1. quality of the entry: manually reviewed entries (from the UniProtKB/Swiss-Prot section) are preferred
  2. annotation score: entries that have higher UniProtKB annotation scores are preferred. This also means that UniProtKB entries will always take precedence over entries that are in UniParc but not in UniProtKB (annotation score is undefined in UniParc, which does not contain any annotations).
  3. organism: entries from reference proteomes and model organisms are preferred
  4. length of the sequence: longest sequence is preferred
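The ranking above can be read as a lexicographic sort key. The following is a minimal sketch under stated assumptions, not UniProt's actual implementation: the `Member` class, its field names and the placeholder identifiers are hypothetical, and a missing annotation score (UniParc-only records) is assumed to rank below every defined score.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Member:
    """Hypothetical record for one cluster member (field names are assumptions)."""
    identifier: str
    reviewed: bool                    # True for UniProtKB/Swiss-Prot entries
    annotation_score: Optional[int]   # None for UniParc-only records
    preferred_organism: bool          # reference proteome or model organism
    length: int                       # sequence length in residues


def rank_key(member: Member) -> Tuple[bool, int, bool, int]:
    """Build a sort key that mirrors the four criteria, in priority order."""
    # Assumption: an undefined annotation score sorts below any real score.
    score = member.annotation_score if member.annotation_score is not None else -1
    return (member.reviewed, score, member.preferred_organism, member.length)


def pick_representative(members: List[Member]) -> Member:
    """Return the highest-ranked member of a cluster."""
    return max(members, key=rank_key)


# Placeholder identifiers, for illustration only.
cluster = [
    Member("uniparc_only", False, None, False, 500),
    Member("unreviewed_kb", False, 3, True, 450),
    Member("reviewed_kb", True, 5, False, 300),
]
representative = pick_representative(cluster)
```

Because tuples compare element by element, the reviewed entry is selected here even though it is the shortest sequence; length only breaks ties among members that are equal on the first three criteria.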

UniRef100

UniRef100 contains all UniProt Knowledgebase records plus selected UniParc records (see below). In UniRef100, all identical sequences and subfragments with 11 or more residues are placed into a single record. UniRef50 and UniRef90 are built based on UniRef100.

The UniRef100 identifier is generated by placing a “UniRef100_” prefix before the UniProtKB accession or UniParc identifier of the representative UniProtKB or UniParc entry, e.g. “UniRef100_P99999” or “UniRef100_UPI0000027233”.
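As a small illustration of this naming scheme, the sketch below builds a UniRef100 identifier and splits any UniRef identifier back into its prefix and representative identifier. The function names are hypothetical, and the code does not validate that the representative identifier is a real UniProtKB accession or UniParc identifier.

```python
from typing import Tuple

# Cluster-level prefixes named in this document (without the underscore).
KNOWN_LEVELS = {"UniRef100", "UniRef90", "UniRef50"}


def make_uniref100_id(representative_id: str) -> str:
    """Place the "UniRef100_" prefix before the representative's identifier."""
    return "UniRef100_" + representative_id


def split_uniref_id(cluster_id: str) -> Tuple[str, str]:
    """Split a UniRef identifier into (level, representative identifier)."""
    level, _, representative_id = cluster_id.partition("_")
    if level not in KNOWN_LEVELS or not representative_id:
        raise ValueError("not a UniRef identifier: " + repr(cluster_id))
    return level, representative_id


kb_based = make_uniref100_id("P99999")
uniparc_based = make_uniref100_id("UPI0000027233")
```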

In addition to UniProtKB records, UniRef100 also includes the UniParc entries that are not covered by UniProtKB and contains cross-references to the RefSeq and PDB databases.

UniRef90

UniRef90 is generated by clustering UniRef100 seed sequences.

UniRef100 sequences shorter than 11 residues are excluded from UniRef90 clusters. Each UniRef90 cluster has one representative sequence from the UniRef100 database.
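The membership criteria can be expressed as a simple predicate. This is only a sketch of the thresholds, not of CD-HIT itself: it assumes the sequence identity and the aligned length against the seed have already been computed by some alignment step, and it assumes that "overlap" means the aligned length divided by the seed length. The function and parameter names are hypothetical.

```python
MIN_LENGTH = 11      # sequences shorter than this are not clustered
MIN_OVERLAP = 0.80   # required overlap with the seed sequence


def meets_thresholds(identity: float, aligned_length: int, seed_length: int,
                     member_length: int, min_identity: float) -> bool:
    """Check one candidate against the identity, overlap and length thresholds.

    identity is a fraction in [0, 1]; min_identity is 0.90 for UniRef90
    and 0.50 for UniRef50.
    """
    if seed_length <= 0:
        raise ValueError("seed_length must be positive")
    if member_length < MIN_LENGTH:
        return False
    # Assumption: overlap is measured relative to the seed (longest) sequence.
    overlap = aligned_length / seed_length
    return identity >= min_identity and overlap >= MIN_OVERLAP


in_uniref90 = meets_thresholds(0.93, 85, 100, 88, min_identity=0.90)
too_little_overlap = meets_thresholds(0.93, 70, 100, 72, min_identity=0.90)
in_uniref50 = meets_thresholds(0.55, 90, 100, 95, min_identity=0.50)
```

In the second call the identity is high enough, but only 70% of the seed is covered, so the candidate is rejected.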

UniRef90 cluster titles and identifiers are derived from the representative UniRef100 entry. The UniRef90 identifier is generated by replacing the “UniRef100_” prefix of the representative with “UniRef90_”, e.g. “UniRef90_P99999”.

UniRef50

UniRef50 is generated by clustering UniRef90 seed sequences.

UniRef50 cluster titles and identifiers are derived from the representative UniRef90 entry. The UniRef50 identifier is generated by replacing the “UniRef90_” prefix of the representative with “UniRef50_”, e.g. “UniRef50_P99999”.
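The prefix replacement used for UniRef90 and UniRef50 identifiers can be sketched as follows. The helper name `swap_prefix` is hypothetical, and the example assumes that the UniRef50 step starts from the identifier of the representative UniRef90 entry, since that is the entry the cluster is derived from.

```python
def swap_prefix(cluster_id: str, old_prefix: str, new_prefix: str) -> str:
    """Replace the leading cluster-level prefix of a UniRef identifier."""
    if not cluster_id.startswith(old_prefix):
        raise ValueError(repr(cluster_id) + " does not start with " + repr(old_prefix))
    return new_prefix + cluster_id[len(old_prefix):]


# UniRef100 representative -> UniRef90 cluster -> UniRef50 cluster
uniref90_id = swap_prefix("UniRef100_P99999", "UniRef100_", "UniRef90_")
uniref50_id = swap_prefix(uniref90_id, "UniRef90_", "UniRef50_")
```

Only the leading prefix is touched, so the representative's accession or UniParc identifier is carried through unchanged at every level.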


UniProt is an ELIXIR core data resource
Main funding by: National Institutes of Health
