TY - JOUR
T1 - Curation of the genome annotation of Pichia pastoris (Komagataella phaffii) CBS7435 from gene level to protein function
AU - Valli, Minoska
AU - Tatto, Nadine E
AU - Peymann, Armin
AU - Gruber, Clemens
AU - Landes, Nils
AU - Ekker, Heinz
AU - Thallinger, Gerhard G
AU - Mattanovich, Diethard
AU - Gasser, Brigitte
AU - Graf, Alexandra B
N1 - © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
PY - 2016/9
Y1 - 2016/9
AB - As manually curated and non-automated BLAST analysis of the published Pichia pastoris genome sequences revealed many differences between the gene annotations of the strains GS115 and CBS7435, RNA-Seq analysis, supported by proteomics, was performed to improve the genome annotation. Detailed analyses of sequence alignments and protein domain predictions were performed to extend the functional genome annotation to all P. pastoris sequences. This allowed the identification of 492 new ORFs and 4916 hypothetical UTRs, and the correction of 341 incorrect ORF predictions, which were mainly due to the presence of an upstream ATG or erroneous intron predictions. Moreover, 175 previously erroneously annotated ORFs need to be removed from the annotation. In total, we have annotated 5325 ORFs. Regarding the functionality of those genes, we improved all gene and protein descriptions. Thereby, the percentage of ORFs with functional annotation was increased from 48% to 73%. Furthermore, we defined functional groups, covering 25 biological cellular processes of interest, by grouping all genes that are part of the defined process. All data are presented in the newly launched genome browser and database available at www.pichiagenome.org. In summary, we present a wide spectrum of curation of the P. pastoris genome annotation from gene level to protein function.
KW - Journal Article
U2 - 10.1093/femsyr/fow051
DO - 10.1093/femsyr/fow051
M3 - Article
C2 - 27388471
SN - 1567-1356
VL - 16
JO - FEMS Yeast Research
JF - FEMS Yeast Research
IS - 6
M1 - fow051
ER -