On splice site prediction using weight array models: A comparison of smoothing techniques

Leila Taher*, Peter Meinicke, Burkhard Morgenstern

*Corresponding author for this work

Publication: Contribution to journal > Article > Peer-reviewed

Abstract

In most eukaryotic genes, protein-coding exons are separated by non-coding introns which are removed from the primary transcript by a process called "splicing". The positions where introns are cut and exons are spliced together are called "splice sites". Thus, computational prediction of splice sites is crucial for gene finding in eukaryotes. Weight array models are a powerful probabilistic approach to splice site detection. Parameters for these models are usually derived from m-tuple frequencies in trusted training data and subsequently smoothed to avoid zero probabilities. In this study we compare three different ways of parameter estimation for m-tuple frequencies, namely (a) non-smoothed probability estimation, (b) standard pseudo counts and (c) a Gaussian smoothing procedure that we recently developed.
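To make the difference between estimation schemes (a) and (b) concrete, the following is a minimal sketch, not the authors' implementation. It assumes a first-order weight array model (each position conditioned on the preceding nucleotide, i.e. m = 2), a hypothetical set of aligned toy sites, and illustrative function names `train_wam` and `log_score`. The Gaussian smoothing procedure (c) is not sketched because the abstract does not describe it.

```python
import math

ALPHABET = "ACGT"

# Hypothetical aligned training sites (toy data, not from the paper).
TOY_SITES = ["AGGT", "AGGT", "CGGT", "AGGA"]


def train_wam(sites, pseudocount=0.0):
    """Estimate a first-order weight array model from aligned sites.

    pseudocount=0.0 gives non-smoothed relative frequencies;
    pseudocount>0 adds that constant to every count before normalizing.
    Returns (first, cond), where first[a] = P(x_0 = a) and
    cond[i][(a, b)] = P(x_i = b | x_{i-1} = a) for i >= 1.
    """
    length = len(sites[0])

    # Position 0 has no predecessor, so it uses single-nucleotide counts.
    first_counts = {a: pseudocount for a in ALPHABET}
    for s in sites:
        first_counts[s[0]] += 1
    total = sum(first_counts.values())
    first = {a: c / total for a, c in first_counts.items()}

    cond = [None]  # placeholder so that cond[i] lines up with position i
    for i in range(1, length):
        counts = {(a, b): pseudocount for a in ALPHABET for b in ALPHABET}
        for s in sites:
            counts[(s[i - 1], s[i])] += 1
        probs = {}
        for a in ALPHABET:
            row_total = sum(counts[(a, b)] for b in ALPHABET)
            for b in ALPHABET:
                # An unseen context without smoothing yields probability 0.
                probs[(a, b)] = counts[(a, b)] / row_total if row_total > 0 else 0.0
        cond.append(probs)
    return first, cond


def log_score(seq, model):
    """Log-probability of seq under the model; -inf if any factor is zero."""
    first, cond = model
    factors = [first[seq[0]]]
    factors += [cond[i][(seq[i - 1], seq[i])] for i in range(1, len(seq))]
    if any(p == 0.0 for p in factors):
        return float("-inf")
    return sum(math.log(p) for p in factors)


unsmoothed = train_wam(TOY_SITES)                  # scheme (a)
smoothed = train_wam(TOY_SITES, pseudocount=1.0)   # scheme (b)
print(log_score("TGGT", unsmoothed), log_score("TGGT", smoothed))
```

In this toy setting the candidate `TGGT` starts with a nucleotide never observed at the first position, so the non-smoothed model assigns it zero probability, while the pseudo-count model assigns it a small but non-zero one. That is the zero-probability problem the abstract refers to.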

Original language: English
Article number: 012004
Journal: Journal of Physics: Conference Series
Volume: 90
Issue number: 1
Publication status: Published - 1 Nov 2007
Externally published: Yes

ASJC Scopus subject areas

  • General Physics and Astronomy
