Abstract
Mass spectrometry is the major analytical tool for the identification and quantification of proteins
in biological samples. In so-called top-down proteomics, separation and mass spectrometric
analysis are performed at the level of intact proteins, without preparatory digestion steps. It
has been shown that the tandem mass tag (TMT) labeling technology, which is often used
for quantification based on digested proteins (bottom-up studies), can be applied in top-down
proteomics as well. This, however, leads to a complex interpretation problem, where we need to
annotate each measured peak with its generating protein, its number of charges, and
the a priori unknown number of TMT groups attached to that protein. In this work, we give an
algorithm for the efficient enumeration of all valid annotations that fulfill available experimental
constraints. Applying the algorithm to real-world data, we show that the annotation problem can
indeed be efficiently solved. However, our experiments also demonstrate that reliable annotation
in complex mixtures requires at least partial sequence information, as well as high mass accuracy
and resolution, to go beyond the proof-of-concept stage.
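
To make the annotation problem concrete, the following is a minimal brute-force sketch, not the enumeration algorithm proposed in this work. It assumes that each candidate protein is given by a known neutral mass, that every TMT group adds a fixed mass shift (the constant used here is an assumed monoisotopic value for a six-plex reagent), and that a peak is matched within a relative tolerance in ppm. The names `enumerate_annotations`, `max_charge`, `max_tmt`, and `tol_ppm` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, NamedTuple

PROTON_MASS = 1.007276466    # Da, mass of one charge-carrying proton
TMT_MASS_SHIFT = 229.162932  # Da, ASSUMED mass added per attached TMT group


class Annotation(NamedTuple):
    protein: str
    charge: int
    tmt_groups: int
    theoretical_mz: float


def theoretical_mz(mass: float, charge: int, tmt_groups: int) -> float:
    """m/z of a protein of neutral mass `mass` carrying `tmt_groups` labels
    and `charge` protons."""
    return (mass + tmt_groups * TMT_MASS_SHIFT + charge * PROTON_MASS) / charge


def enumerate_annotations(
    observed_mz: float,
    proteins: Dict[str, float],  # hypothetical input: protein name -> neutral mass (Da)
    max_charge: int,
    max_tmt: int,
    tol_ppm: float,
) -> List[Annotation]:
    """Return every (protein, charge, TMT count) triple whose theoretical m/z
    lies within `tol_ppm` of the observed peak (exhaustive search)."""
    tolerance = observed_mz * tol_ppm * 1e-6
    hits = []
    for name, mass in proteins.items():
        for z in range(1, max_charge + 1):
            for k in range(0, max_tmt + 1):
                mz = theoretical_mz(mass, z, k)
                if abs(mz - observed_mz) <= tolerance:
                    hits.append(Annotation(name, z, k, mz))
    return hits


if __name__ == "__main__":
    candidates = {"A": 10000.0, "B": 12345.6}  # made-up masses for illustration
    peak = theoretical_mz(10000.0, 10, 3)
    for hit in enumerate_annotations(peak, candidates, max_charge=20, max_tmt=5, tol_ppm=10.0):
        print(hit)
```

Under these assumptions, a single peak can admit several annotations once the tolerance is loosened or the candidate list grows, which illustrates why high mass accuracy and additional constraints matter for reliable annotation.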