P3P - Initial considerations for protein expression
Every protein is different, and no universal recipe can ensure that a given protein will be expressed and purified while retaining its activity and structural integrity. However, a number of considerations at the start of a project can increase the chance of success. Here are the most important:
-1) Get as much information as possible: knowing the protein you plan to express is important and can avoid many pitfalls.
Minimum information on the protein:
- Protein sequence
- cDNA sequence
- MW
- Calculated pI
- Extinction coefficient
- Origin
- Subcellular localisation
- Biological function, enzymatic activity
- Co-factors if known
- Whether or not it forms a stable complex, and what the known partners are
- Previous history of production (with as much detail as possible about those attempts)
- Successful production of orthologues
- Presence and description of structural/functional domains
- Presence of intrinsically disordered regions (IDRs)
- Post-translational modification required?
Objective of the production:
- Usage?
- Minimum amount required (mg and concentration)?
- Minimum quality required (purity, contaminant to avoid, which ones?)
- Bioactivity required (way to monitor this activity?)
- Removal of the tag required, indifferent, or tag necessary for downstream application?
Material available: cDNA, expression plasmid, cloning strategy?
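Several of the items listed above (MW, extinction coefficient) can be estimated directly from the protein sequence. The following is a minimal sketch, not a P3P tool: it assumes average residue masses and the commonly used 280 nm approximation (5500 per Trp, 1490 per Tyr, 125 per disulfide bond, in M^-1 cm^-1). The function names are hypothetical, and the values do not account for tags cleaved later, post-translational modifications or non-standard residues.

```python
# Average residue masses in Da (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(seq):
    """Average molecular weight (Da) of an unmodified polypeptide."""
    seq = seq.upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"Unsupported residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def extinction_coefficient_280(seq, cysteines_oxidized=False):
    """Approximate molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Assumption: if cysteines_oxidized is True, all cysteines are paired
    in disulfide bonds (one bond per two cysteines).
    """
    seq = seq.upper()
    eps = 5500 * seq.count("W") + 1490 * seq.count("Y")
    if cysteines_oxidized:
        eps += 125 * (seq.count("C") // 2)
    return eps


def a280_at_1_mg_per_ml(seq, cysteines_oxidized=False):
    """Expected A280 of a 1 mg/mL solution in a 1 cm path length."""
    return extinction_coefficient_280(seq, cysteines_oxidized) / molecular_weight(seq)
```

For example, `extinction_coefficient_280("WYCC", cysteines_oxidized=True)` adds one Trp, one Tyr and one disulfide bond. The calculated pI is left out here because it requires a pKa set and an iterative charge calculation; dedicated sequence analysis tools are a better fit for that.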
-2) Selection of the expression system
The choice of the expression system depends on several parameters:
- Origin of the protein: proteins of eukaryotic origin present a greater challenge for expression in prokaryotes.
- Post-translational modifications: if they are important for activity or stability, they will determine the choice of the expression system.
- Cost and ease of use: bacterial expression is the easiest, cheapest and fastest to set up (with a few exceptions, such as in vitro expression), and it gives very good yields for well-behaved proteins.
P3P provides vectors and strains for expression in the following systems (for more details, see this page):
- E. coli
- Insect cells/baculovirus
- Pichia pastoris (in development)
- In vitro expression (commercial BY-2 lysate ALiCE)
-3) Design of the construct
Tags: they facilitate purification of the target protein by affinity chromatography. The standard, minimal tag for E. coli expression is 6xHis (at either the N- or C-terminus). Other commonly available affinity tags include MBP, GST and StrepII. Tags can also dramatically improve target solubility and prevent aggregation; MBP, SUMO and HLT tags are the first choice for this purpose. Finally, they can provide an optimized translation initiation context and favor efficient translation.
Domain definition: eukaryotic proteins are often composed of several structurally and functionally distinct domains. Depending on the application, it can be better to omit certain domains of the protein. Bioinformatic sequence analysis and structure prediction can help delimit the sequence to express.
Sequence optimization: differences in codon usage can lead to poor expression of eukaryotic sequences in E. coli (and vice versa). The relatively low cost of gene synthesis makes sequence optimization worthwhile, and it also makes domestication of Golden Gate restriction sites easy. However, if expression in several systems is being considered, optimization for a single host might be counter-productive. For gene optimization, GenSmart, provided free of charge (for now) by GenScript, can be used.
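Before ordering a synthetic sequence, it can be useful to check whether it still contains internal Type IIS sites that would need to be domesticated. The sketch below is illustrative only: it assumes the enzymes of interest are BsaI (GGTCTC) and BsmBI (CGTCTC), which should be checked against the cloning protocol actually in use, and the function names are hypothetical.

```python
# Assumed Type IIS recognition sites; adjust to the enzymes used in your protocol.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_type_iis_sites(seq, sites=TYPE_IIS_SITES):
    """Return (enzyme, strand, 0-based start) for every site on either strand."""
    seq = seq.upper()
    hits = []
    for enzyme, site in sites.items():
        # A site on the reverse strand appears as its reverse complement
        # when reading the forward strand.
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(motif)
            while start != -1:
                hits.append((enzyme, strand, start))
                start = seq.find(motif, start + 1)  # allow overlapping matches
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    example = "ATGGGTCTCAAAGAGACGTAA"  # made-up sequence for illustration
    for enzyme, strand, start in find_type_iis_sites(example):
        print(f"{enzyme} site on {strand} strand at position {start}")
```

An empty result means none of the listed sites is present on either strand. Any hit inside the coding sequence would typically be removed with a synonymous codon change during optimization.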
In light of the above considerations, and if there are no serious contraindications, E. coli expression is generally chosen as the first approach given its simplicity and low cost.
We maintain GoldenBraid cloning-adapted vectors for all expression systems available at P3P; we therefore strongly recommend cloning the sequence of interest as a B4 GoldenBraid module (refer to the cloning protocol for details). This allows almost seamless cloning, interoperability between systems and flexibility.
Specific information, protocols and resources can be found under the Expression systems section.