Recent publications

A selection of recent publications is included here. My Google Scholar indexed publications can be found here.

Pseudogenes and intrinsic error in genomic data

Modern genomic annotation pipelines like PGAP and EGAP provide features annotated as pseudogenes. What these annotations realistically represent depends in part on the technologies used to generate the source sequence data and the strategies used to assemble the genomic data being annotated. From the perspective of the annotations, a pseudogenization can therefore represent either a true recent mutation causing loss or modulation of function in a clearly identifiable feature, or a sequencing or assembly error that has been incorporated into the assembled genomic data.

Functional classification of protein coding sequences

Accurate classification of protein coding sequences is difficult. A functional classifier based on the existing taxonomic classifier IDTAXA was developed with an emphasis on specificity. Benchmarking classification strategies by their misclassification and overclassification rates provides a clear picture of the strengths and weaknesses of different methods.