Inverse folding aims to decode protein sequences from structural information supplied as input to a machine learning model. The fidelity of a network's sequence recovery depends on several factors. For ProteinMPNN, three notable contributions are the way structural information is encoded, the update scheme for message passing, and order-agnostic decoding.
Beyond fixed forward decoding, ProteinMPNN uses a randomized decoding order. To create it, Gaussian noise is added to a binary mask representing the chain connectivity of the protein's residues, where 1 corresponds to a connected residue pair and 0 corresponds to disconnected or missing regions. The noised mask is then sorted to produce a tensor that describes the order in which sequence positions are decoded. This departs from the canonical left-to-right autoregressive scheme and allows an arbitrary decoding order during inference.
Example of adding noise and sorting to construct the decoding order:
Before noise: [1, 1, 0, 0]
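
The noise-and-sort step described above can be sketched in a few lines of plain Python. This is a minimal illustration of the idea as stated in the text, not the ProteinMPNN reference implementation: the function name `decoding_order`, the `noise_scale` parameter, the seeding, and the choice of an ascending sort are all assumptions made for the example.

```python
import random


def decoding_order(mask, noise_scale=0.1, seed=None):
    """Return position indices sorted by their noised mask value.

    Hypothetical helper for illustration only; `noise_scale` and the
    ascending sort direction are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    # Add independent Gaussian noise to every mask entry.
    noised = [m + noise_scale * rng.gauss(0.0, 1.0) for m in mask]
    # Argsort: the position with the smallest noised value is decoded first.
    return sorted(range(len(mask)), key=lambda i: noised[i])


mask = [1, 1, 0, 0]
order = decoding_order(mask, seed=0)
print(order)  # a permutation of [0, 1, 2, 3]; the exact order depends on the noise
```

Because the noise is drawn fresh for each call (unless a seed is fixed), repeated calls yield different permutations of the same positions. With noise that is small relative to the gap between 0 and 1, positions sharing a mask value tend to stay grouped together while their relative order within the group is random.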
Dauparas, Justas, et al. "Robust deep learning–based protein sequence design using ProteinMPNN." Science 378.6615 (2022): 49-56.