Abstract
NMR chemical shift prediction plays an important role in various applications in computational biology. Among others, structure determination, structure optimization, and the scoring of docking results can profit from efficient and accurate chemical shift estimation from a three-dimensional model. A variety of NMR chemical shift prediction approaches have been presented in the past, but nearly all of them rely on laborious manual data set preparation, and the training itself is not automated. As a result, retraining a model (e.g., when new data becomes available) or testing new models is a time-consuming manual chore. In this work, we present a framework that enables automated data set generation as well as model training and evaluation for protein NMR chemical shift prediction. To this end, we extend the classical ansatz of hybrid protein chemical shift prediction by training a statistical model on semi-classical as well as various structural features. We present three main results: (a) the NightShift framework itself, (b) the resulting automatically generated data set, and (c) a random forest model called Spinster that was built using the pipeline. We demonstrate that the automatically generated random forest-based model outperforms previous approaches for several atom types. The framework can be downloaded from https://bitbucket.org/akdehof/ballaxy-tools/ and requires version 1.4.1 of the open source Biochemical Algorithms Library (BALL). It is available under the terms of the GNU Lesser General Public License (LGPL). We additionally offer a browser-based user interface to our NightShift instance, built on the Galaxy framework, at https://ballaxy.bioinf.uni-sb.de/.