Multiple sequence alignment with arbitrary gap costs: Computing an optimal solution using polyhedral combinatorics

Abstract

Multiple sequence alignment is one of the dominant problems in computational molecular biology. Numerous scoring functions and methods have been proposed, most of which result in NP-hard problems. In this paper we propose for the first time a general formulation for multiple alignment with arbitrary gap costs based on an integer linear program (ILP). In addition, we describe a branch-and-cut algorithm to effectively solve the ILP to optimality. We evaluate the performance of our approach in terms of running time and quality of the alignments using the BAliBase database of reference alignments. The results show that our implementation ranks amongst the best programs developed so far.

Citation

[02+ACL] Althaus, E., Caprara, A., Lenhof, H.-P., Reinert, K. Multiple sequence alignment with arbitrary gap costs: Computing an optimal solution using polyhedral combinatorics. Bioinformatics, Vol. 18: Suppl. 2, S4-S16, 2002.