
Peptide–protein interactions are highly abundant in living
cells and are important for many biological processes1. It is
estimated that up to 40% of interactions in cells are mediated
by peptide–protein interactions, or peptide-like interactions2:
short segments, isolated or embedded within unstructured regions
that mediate binding to a partner3. In addition, peptides are often
used in biotechnological applications, drug delivery, and imaging, as
therapeutic agents, and in other applications4,5, where they bind
proteins and mediate or block interactions.
Determining the 3-dimensional structure of these peptide–protein
complexes is an important step for their further study.
Such structures can provide the basis to identify hotspot residues that are
crucial for binding6–8, and by mutating these hotspots, the
functional importance of a given interaction can be uncovered9.
They could help to better understand disease-causing mutations
and also serve as a starting point for the design of strong and
stable peptidomimetics10,11.
However, peptide-mediated interactions pose significant challenges
for both their experimental and their computational
characterization. These interactions are in many cases weak,
transient, and considerably influenced by their context, which often
results in noisy experiments. Widely used structure determination
methods (e.g., X-ray crystallography) are not applicable to many
of these interactions. Computational modeling, and particularly
blind peptide–protein docking12, is hindered by the lack of a
known structure for the peptide, in contrast to classical
domain-domain docking, where the structures of the free individual
domains are usually known. In order to succeed in the study
and design of peptide–protein interactions, we must gain a better
understanding of the peptide conformational preferences.
One way to approach this challenge is based on the observation
that a peptide's bound conformation is often present in solved
monomer structures13. Based on this finding, we developed the
high-resolution blind peptide docking protocol, PIPER-
FlexPepDock (PFPD)13. First, a representative ensemble of
fragments is extracted from monomer structures using the
Rosetta Fragment Picker14, which takes into account both
sequence and (predicted) secondary structure similarity. Then
this ensemble is rigid-body docked onto the receptor with the
PIPER protocol15, followed by short local refinement by Rosetta
FlexPepDock16, which simultaneously optimizes internal peptide
and rigid-body degrees of freedom. Numerous other peptide
docking approaches have since been developed12,17, many
focusing on efficient low-resolution docking18,19, others lever-
aging information about protein interfaces to find matches for
similar interface patches20–22.
Another way to approach the global peptide docking challenge
is to view the binding of a peptide to its partner as the final step of
protein folding, complementing the receptor surface with a
missing piece23. Indeed, functional proteins can be reconstituted
experimentally from short fragments of the original sequence,
indicating that covalent linkage is not necessarily a prerequisite for
monomer folding24,25. We and others have successfully modeled
peptide–protein interactions using this principle, by finding frag-
ments in monomer structures and on protein-protein interfaces
that could complement structural patches derived from the surface
of a given receptor20–22,26. These concepts lay the groundwork for
novel approaches in peptide–protein docking, where the vast
information inherently stored in folded monomer structures is
efficiently integrated in the search space for peptide docking.
The advances in the field of protein structure prediction in
recent years open up exciting opportunities to fully leverage such
information. The development and application of deep learning
(DL) neural network (NN) architectures to predict monomeric
protein structures provided us with highly accurate computa-
tional models as particularly showcased by the last CASP14
experiment27. AlphaFold2 (AF2), developed by Google DeepMind,
was able to generate models of exceptional accuracy, approaching
the resolution of crystallography experiments28. Significantly
improved modeling was also reported for RoseTTAFold, devel-
oped by RosettaCommons, that followed ideas from AF2 and also
implemented fully continuous crosstalk between 1D, 2D and 3D
information29. Most importantly, AF2, as well as RoseTTAFold,
are now freely available to the scientific community30,31, opening
up powerful avenues for protocol development and applications
to many biological systems that were not amenable to structural
characterization in the past. These are truly exciting times!
Can such NNs also model peptide–protein interactions, and not
only monomers? If peptide–protein interfaces are indeed abundant
in monomer structures, and if indeed peptide–protein interactions
can be captured as protein folding as stated above, RoseTTAFold
and AF2 should, in principle, also allow for the modeling of
peptide–protein complex structures. Moreover, they could alleviate
the lack of data impairing the ability to fully employ DL for
peptide–protein interactions. We note that both RoseTTAFold and
AF2 NNs were trained on single chain protein structural data, and
both use Multiple Sequence Alignments (MSA) as a critical step in
structure prediction. Prediction of protein-protein complexes was
shown to be possible given an informative MSA27,29,32, and it has
also been explored whether it is indeed necessary to provide paired
sequences for successful extraction of interface information33,34. As
both methods rely heavily on a good-quality MSA, the main challenge
would be to accurately predict the peptide conformation.
Mainly due to their short length, creating an effective MSA for
these regions is challenging.
Here we present a global peptide–protein docking approach
that incorporates the biological concept of peptide–protein
interactions mimicking protein folding and harnesses NNs
trained to predict monomeric protein structures. We show that
by connecting the peptide to the receptor (e.g., by a poly-glycine
linker), monomer folding NNs generate accurate peptide–protein
complex structures (a similar idea was proposed in parallel by
others35). This is possible thanks to the ability of AF2 to (1)
accurately identify unstructured regions36 and model these as
extended linkers, and (2) predict peptide-receptor complexes
without a multiple sequence alignment for the peptide partner,
as we demonstrate in this study. Best performance is obtained
by combining our linker-based strategy with modeling of
peptide–protein complexes by presenting two separate chains to
AF2. The latter has been implemented for the modeling of homo-
and hetero-multimers in several recent studies on AF236,37.
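
The linker-based input described above can be illustrated with a minimal sketch. It only shows how a receptor and a peptide sequence might be joined into a single chain through a poly-glycine linker, with the peptide placed after the receptor C-terminus. The function name `link_peptide`, the default linker length of 30 residues, and the example sequences are assumptions made for illustration and are not taken from this section; running the structure prediction network itself is outside the scope of the sketch.

```python
from typing import Tuple

# The 20 standard amino acids in one-letter code.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")


def link_peptide(receptor: str, peptide: str, linker_len: int = 30) -> Tuple[str, slice]:
    """Build a single-chain sequence: receptor + poly-glycine linker + peptide.

    Returns the linked sequence and a slice locating the peptide within it,
    so that the linker can be trimmed from a predicted model afterwards.
    The default linker length is a hypothetical placeholder.
    """
    receptor, peptide = receptor.upper(), peptide.upper()
    for name, seq in (("receptor", receptor), ("peptide", peptide)):
        unknown = set(seq) - VALID_AA
        if not seq or unknown:
            raise ValueError(f"{name} sequence is empty or has invalid residues: {sorted(unknown)}")
    if linker_len < 1:
        raise ValueError("linker_len must be at least 1")

    linker = "G" * linker_len
    start = len(receptor) + linker_len  # 0-based index of the first peptide residue
    return receptor + linker + peptide, slice(start, start + len(peptide))


if __name__ == "__main__":
    # Toy sequences, chosen only to keep the output readable.
    linked, pep = link_peptide("MKTAYIAK", "PPLP", linker_len=5)
    print(linked)       # receptor, then linker, then peptide
    print(linked[pep])  # the peptide segment recovered from the linked chain
```

Returning the peptide position alongside the sequence keeps later steps simple: the same slice can be used to select the peptide residues in the predicted model and to discard the linker before the model is evaluated.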
We perform a short calibration on a small, representative,
previously well-studied set of peptide–protein interactions, consisting
of peptides with and without known binding motifs13. We
then provide a detailed comparison to the currently top-performing
global peptide docking protocol, PFPD13. Next, we
assess the protocol on an extensive, non-redundant set of curated
peptide–protein complexes consisting of 96 interactions, each
involving a distinct fold. Finally, we explore specific types of
interactions of special interest, including examples in which
peptide binding induces a large conformational change in the
receptor. The latter are very challenging to model
using docking, but easily amenable to AF2 which models the
complex as a whole. Beyond presenting an approach to dock
peptides, this study provides another view on what AF2 may have
learned beyond memorization.
Results
Adapting NN-based structure prediction to peptide docking.
By adding the peptide sequence via a poly-glycine linker to the
C-terminus of the receptor monomer sequence, we mimicked
ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-021-27838-9
NATURE COMMUNICATIONS | (2022) 13:176 | https://doi.org/10.1038/s41467-021-27838-9 | www.nature.com/naturecommunications