What does the study of protein expression involve?
The study of protein expression examines how proteins are synthesized, folded, localized, and regulated in response to the functional demands of the body.
Protein expression represents the dynamic output of the genome, where specific proteins are produced to meet cellular and physiological needs. This process, central to biological research and biotechnology, is tightly regulated, and achieving efficient protein expression often poses significant challenges. It is influenced by multiple factors, including chromatin state, transcriptional activity, mRNA translation, post-translational modifications (PTMs), and protein degradation pathways.
Protein expression studies involve the analysis of:
- Protein structure, properties, and function
- Protein activity
- Protein roles in various cellular processes
- Protein interactions
- Protein production
Fundamentals of protein expression
The fundamentals of protein expression involve the process of transcribing DNA into RNA and translating it into functional proteins, with regulation at various stages, such as transcription, translation, and PTM.
The central dogma of molecular biology states that genetic information flows from DNA to RNA to protein. It is fundamental to understanding genetics and molecular biology, which underpin modern research across several fields.
Transcription and translation processes
Protein synthesis underpins the structural and functional aspects of life. The process includes:
- Transcription: DNA is transcribed into a complementary messenger RNA (mRNA) strand within the nucleus by RNA polymerase.
- Post-transcriptional modifications:
  - Splicing: Introns (non-coding regions) are removed, and exons (coding regions) are joined to form a mature mRNA.
  - Capping: A 5' cap is added to protect the mRNA and assist in ribosome binding during translation.
  - Polyadenylation: A poly-A tail is added to the 3' end to stabilize the mRNA and regulate its lifespan.
- Translation: In the cytoplasm, ribosomes read the mRNA sequence to synthesize a polypeptide chain, using tRNAs to bring the corresponding amino acids.
- PTMs: The synthesized polypeptide folds and undergoes modifications, such as phosphorylation, glycosylation, or cleavage, to become a functional protein.
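The flow from DNA to mRNA to polypeptide described above can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: the function names, the toy gene, and the exon coordinates are hypothetical, and the sketch assumes the input is the coding (sense) strand and uses the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code keyed by mRNA codon (T replaced with U); '*' marks a stop codon.
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the pre-mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals (0-based, end-exclusive), dropping the introns between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy gene (hypothetical): exon 1, a 6-nt intron, then exon 2 ending in a stop codon.
gene = "ATGGCC" + "GTAAGT" + "AAATAA"
mature_mrna = splice(transcribe(gene), exons=[(0, 6), (12, 18)])
print(mature_mrna, translate(mature_mrna))
```

Capping, polyadenylation, and PTMs are not modeled here; they affect mRNA stability and protein function rather than the sequence mapping itself.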
Endogenous vs. recombinant protein expression
Endogenous protein expression occurs within organisms under normal biological conditions, while recombinant protein expression involves producing proteins using genetically engineered systems.
Both are used in scientific applications: endogenous expression aids the study of native protein functions, while recombinant expression is used for large-scale protein production for research and therapeutic purposes.
Applications of protein expression systems
Protein expression systems have a range of applications across industries. In biopharmaceuticals, they are used to produce monoclonal antibodies, recombinant hormones, vaccines, and therapeutic enzymes. Industrial uses include manufacturing enzymes for detergents, biofuels, food processing, and bioplastics.
In research, these systems facilitate studies on protein structure, synthetic biology, and gene editing tools like CRISPR. Agricultural applications include genetically modified crops and veterinary vaccines. Diagnostic uses range from producing reagents for tests to developing biosensors. They also enable protein engineering, biomaterial production, bioremediation, and carbon capture, driving innovation in environmental sustainability.
Types of protein expression systems
Protein expression systems can be categorized into prokaryotic, eukaryotic, and cell-free systems, each offering distinct advantages for producing recombinant proteins based on the required yield, complexity, and PTMs.
Overview of protein expression systems
There are various types of protein expression systems used in research and industry to produce proteins for a range of applications, such as drug development, enzyme production, and structural and functional studies. The choice of system depends on the specific production requirements.
For rapid, high-yield production of relatively simple proteins, prokaryotic expression systems are typically used. When producing proteins that require complex folding and PTMs, eukaryotic systems are preferred. For prototyping and specialized production, cell-free protein synthesis (CFPS) systems offer a flexible and rapid alternative.
Prokaryotic protein expression systems
Prokaryotic protein expression involves using bacteria to produce recombinant proteins by introducing a plasmid carrying the gene of interest. The gene is then transcribed and translated within the bacterial host.
Bacterial protein expression
Bacterial expression systems are foundational tools in recombinant DNA technology. They use inducible promoters to produce proteins in bacterial hosts. Systems are tailored for specific applications, such as industrial-scale production or controlled metabolic studies. This technique is frequently used to generate enzymes, antibodies, and other proteins for research, therapeutic proteins such as insulin and growth hormones, and recombinant enzymes for industrial uses. It is generally suitable for manufacturing simple proteins when PTMs do not need to be considered.
Escherichia coli (E. coli) is a highly efficient and cost-effective bacterial expression system for large-scale protein production. This bacterium is easy to manipulate genetically, has a rapid doubling time, and produces high protein yields quickly. E. coli is used to express a wide variety of proteins for research, diagnostics, industrial, and therapeutic use. However, E. coli expression systems may not be suitable for proteins that undergo complex PTMs or are toxic to the host cells.
Other bacteria, such as Lactococcus lactis (L. lactis) and Pseudomonas spp., offer specialized advantages. L. lactis secretes recombinant proteins (such as interleukin-6) into the extracellular medium, avoiding the formation of inclusion bodies and thus simplifying purification. Unlike E. coli, it lacks endotoxins and tolerates recombinant proteins that are toxic to E. coli, enabling the production of antimicrobial peptides that would otherwise inhibit host growth. Classified as 'generally recognized as safe' (GRAS), L. lactis is suitable for food-grade applications, including oral delivery of vaccines and therapeutic proteins, enhancing its versatility in medical and industrial use.
Pseudomonas fluorescens efficiently produces proteins with disulfide bonds (eg, alkaline phosphatase and lipases used in detergents and biocatalysis applications) via natural oxidative folding mechanisms, avoiding the misfolding and inclusion bodies seen in E. coli. Like L. lactis, Pseudomonas spp. also secrete proteins into the extracellular medium and efficiently produce large, complex proteins. This bacterium can tolerate toxic proteins, enabling the expression of antimicrobial peptides.
Other bacterial hosts include Bacillus subtilis, classified as GRAS, for efficient protein secretion; Streptomyces species, frequently used for producing complex secondary metabolites; and Corynebacterium glutamicum, which is widely used for large-scale industrial production of amino acids like glutamic acid.
Prokaryotic protein expression systems can produce large quantities of protein quickly because of their short doubling time, with low cultivation costs and relatively simple scale-up. Additionally, these systems are genetically well characterized and easily adapted for genetic manipulation. However, they cannot perform complex PTMs such as glycosylation, which is essential for the activity of many eukaryotic proteins. Expressed proteins may also misfold and aggregate into inclusion bodies, which then require refolding.
Eukaryotic protein expression systems
Eukaryotic protein expression systems use eukaryotic cells, such as yeast, insect cells, or mammalian cells, to produce complex proteins that require post-translational modifications like glycosylation, which are often necessary for proper protein function.
Yeast protein expression system
Yeast protein expression systems, including hosts like Saccharomyces cerevisiae and Pichia pastoris, are optimized for producing recombinant proteins. Optimization strategies focus on enhancing protein expression, stability, and secretion in yeast expression systems. These include:
- Promoter engineering: Using robust, inducible promoters such as GAL1 in S. cerevisiae or AOX1 in P. pastoris to regulate gene expression.
- Gene copy number enhancement: Incorporating multiple gene copies into the yeast genome to boost protein production levels.
- Post-translational modifications: Ensuring proper protein function through modifications like phosphorylation, acetylation, or disulfide bond formation.
- Optimized secretion pathways: Utilizing signal peptides (eg, the α-factor signal in S. cerevisiae) to direct proteins into the medium, simplifying purification.
- Engineered yeast strains: Improving protein folding, reducing proteolytic degradation, and enhancing glycosylation patterns for better functionality.
- Fermentation condition optimization: Refining growth conditions to maximize yields, particularly in P. pastoris, which efficiently utilizes methanol as a carbon source.
Further advances in humanized glycosylation improve the validity and versatility of these protein expression systems for potential medical and therapeutic applications. Humanization of glycosylation involves altering native yeast glycosylation pathways and genetically engineering yeast to express glycosylation enzymes found in human cells. This ensures that the glycosylation patterns of a protein generated in a non-human system closely resemble those of proteins naturally expressed in human cells, which is vital for correct protein stability and function.
However, there remain challenges when using yeast expression systems to express recombinant proteins, such as hyperglycosylation, which can affect protein function, as well as the limited capability of these systems for other complex mammalian PTMs.
Mammalian protein expression systems
A mammalian protein expression system uses mammalian cell lines and lysates, such as Chinese hamster ovary (CHO) or human embryonic kidney (HEK)293 cells, to produce recombinant proteins. This system is preferred for producing proteins with complex PTMs and human-like molecular structures, making it ideal for clinical applications and biopharmaceutical production.
In this system, PTMs, folding, and functionality are closer to those of the native protein, making it suitable for producing therapeutic proteins and monoclonal antibodies. Mammalian expression systems are more expensive to maintain than bacterial systems and typically yield lower protein quantities. Nevertheless, they are used for producing proteins such as monoclonal antibodies and erythropoietin.
Insect cell expression systems
Insect cell expression systems are used to produce proteins or other recombinant products by using insect cells. For example, the baculovirus-insect cell expression system is widely used to produce viral antigens, vaccines, and recombinant proteins. It supports complex PTMs like glycosylation and disulfide bond formation, enables stable expression of genes, and offers safety, versatility across cell lines, and scalable protein production.
The gene of interest is cloned into a baculovirus vector under a strong promoter (such as polyhedrin or p10). Through homologous recombination, recombinant baculovirus is produced when the transfer vector and wild-type baculovirus DNA are co-transfected into insect cells. Cultured insect cell lines (such as Sf9 cells) are infected by amplified recombinant viruses, and the protein is expressed with appropriate protein folding, glycosylation, and the production of disulfide bonds.
Suspension cell culture and the growth and maintenance of insect cell cultures are relatively easy, facilitating large-scale production in bioreactors. Furthermore, baculoviruses do not typically infect mammals and are thus safe to work with. However, glycosylation patterns of proteins expressed in insect cells may differ from those of the native protein and potentially affect protein functionality. The entire process is also more complex and time-consuming than with bacterial systems.
This system is important for industrial-scale protein production, including the development of human papillomavirus (HPV) vaccines and virus-like particles.
Cell-free protein expression systems
Cell-free protein synthesis (CFPS) systems are increasingly being used in research, including applications like pharmaceutical protein production, protein evolution, and structural genomics.
CFPS can be classified as:
Extract-based system
This system uses crude cell extracts that contain essential components for transcription and translation, including ribosomes, tRNAs, amino acids, ATP, and cofactors. Cells undergoing active translation are lysed to prepare the extracts.
- E. coli extracts are used for expressing prokaryotic and simple eukaryotic proteins.
- Wheat germ extracts are used for expressing eukaryotic enzymes and membrane proteins.
- Insect or mammalian extracts are employed for proteins requiring complex PTMs, including glycosylation.
- Rabbit reticulocyte lysate is another example of an extract-based system.
The components in the extract directly translate the target gene (provided as DNA or mRNA) into protein. These systems are ideal for producing hazardous proteins that cannot be expressed in living cells and offer high yields of soluble, functional proteins.
Enzyme-based system
The enzyme-based system uses specific enzymes to catalyze biochemical reactions outside of living cells. By removing cellular content, the reaction environment can be precisely controlled, minimizing background activity and enabling high-purity protein production.
Key advantages
- High purity and homogeneity - ideal for producing proteins with exceptional purity and uniformity.
- Optimized for complex proteins - facilitates the synthesis of multidomain or membrane proteins that are challenging to produce in cell-based systems.
- Protein engineering - enables the incorporation of non-canonical amino acids for advanced protein design and engineering.
Applications
Reconstituted E. coli-based CFPS is utilized in synthetic biology and functional assays. The E. coli-based CFPS system is flexible, cost-effective, and scalable, enabling in vitro protein production as it contains essential transcription and translation machinery.
E. coli-based CFPS is used in synthetic biology to engineer new proteins, biosensors, and metabolic pathways. It provides a controlled environment for the rapid design of genetic circuits and protein modifications, allowing quick prototyping and optimization of synthetic biological systems. E. coli CFPS is also used in functional assays for drug screening and to study protein-protein interactions and enzyme activity. For example, researchers use CFPS to express membrane proteins, which are otherwise difficult to produce in live cells, facilitating drug-target interaction studies.
It is also used to develop vaccines and therapeutic proteins: it enables the rapid synthesis of viral antigens for vaccine research, such as for influenza and COVID-19, and the generation of enzymes and monoclonal antibodies for medical applications.
One report demonstrated rapid and successful production of antibody fragments using a crude E. coli-based CFPS system, highlighting the feasibility of this approach for rapid antibody production, with potential applications in drug development and personalized medicine.
CFPS offers significant advantages over cell-based systems. It can produce eukaryotic multidomain proteins in a folded state; overcomes limitations related to cellular toxicity, protein quality, and quantity; supports high-throughput production for functional and structural biology studies; and reduces the time and cost of production by eliminating cell culture and transformation steps.
However, the limitations of CFPS include limited protein yields, complicated reaction optimization, difficulty in maintaining protein stability, and the limited energy supply available to the reaction.
Factors affecting protein expression
Factors such as the choice of expression system, temperature, pH, promoter strength, and the presence of specific chaperones can significantly influence the yield and quality of recombinant proteins.
Choosing the right protein expression system
Selecting the appropriate expression system depends on several factors:
- Complexity of the desired protein
  - Prokaryotic systems (eg, E. coli): for simple proteins without PTMs; most appropriate for producing enzymes or structural proteins in high quantities.
  - Yeast systems (eg, P. pastoris): for proteins requiring simple PTMs, such as glycosylation.
  - Mammalian systems (eg, CHO cells): most suitable for complex proteins requiring human-like PTMs, such as monoclonal antibodies or therapeutic proteins.
  - Insect cell systems (eg, baculovirus-insect cell): provide PTMs similar to those of mammalian cells, with simple cell growth requirements.
- PTMs
  - Proteins requiring disulfide bond formation can be expressed in insect cell systems.
  - For therapeutic proteins like erythropoietin, mammalian systems are preferred due to their ability to perform complex PTMs.
- The scalability of the system for production
  - Bacterial systems: highly scalable for industrial enzyme production.
  - Yeast systems: cost-effective and scalable for vaccine production.
  - Mammalian systems: expensive, but scalable for biopharmaceuticals like monoclonal antibodies.
Choosing the right expression system, whether prokaryotic like E. coli or eukaryotic like P. pastoris, is essential for achieving high yields and functional proteins.
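The selection criteria above can be expressed as a simple rule-based helper. The sketch below is a deliberately oversimplified illustration: the function name, its parameters, and the order in which the rules are applied are assumptions made for this example, and real projects weigh yield, cost, and timelines together.

```python
def suggest_expression_system(
    needs_ptms: bool = False,
    needs_human_like_ptms: bool = False,
    rapid_prototyping: bool = False,
) -> str:
    """Suggest a starting expression system from a few coarse requirements.

    The rules mirror the guidance in the text and are checked from the most
    specific requirement to the least specific one.
    """
    if rapid_prototyping:
        # Cell-free systems are described as a flexible, rapid alternative.
        return "cell-free (CFPS)"
    if needs_human_like_ptms:
        # eg, monoclonal antibodies or erythropoietin.
        return "mammalian (eg, CHO cells)"
    if needs_ptms:
        # Simple PTMs such as glycosylation.
        return "yeast (eg, P. pastoris)"
    # Simple proteins without PTMs, produced in high quantities.
    return "prokaryotic (eg, E. coli)"


print(suggest_expression_system())                            # simple enzyme
print(suggest_expression_system(needs_ptms=True))             # simple glycoprotein
print(suggest_expression_system(needs_human_like_ptms=True))  # therapeutic antibody
```

Insect cell systems are left out of this sketch for brevity; they would sit between the yeast and mammalian branches for proteins that need mammalian-like PTMs with simpler growth requirements.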
Codon optimization and codon usage
Codon optimization adjusts gene sequences to match the host organism’s preferred codons, improving translation efficiency, protein yield, and functional properties of proteins, such as folding and activity.
Codon usage plays an important role in regulating gene expression and protein production, which is essential for producing high-quality proteins.
Codon optimization is essential for expressing eukaryotic genes in prokaryotic hosts like E. coli to prevent misfolding or premature translation termination. It is equally important in yeast or insect cells for producing complex proteins that require proper folding. By selecting codons that match the host’s tRNA pool, codon optimization enhances protein expression and prevents slow or stalled translation caused by rare codons. This process ensures gene sequences align with the host’s preferences, improving overall efficiency and yield.
For example, optimization of the human insulin gene for expression in E. coli involves replacing rare codons like AGG (arginine) with codons like CGT, which are more frequently used in E. coli [2].
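The codon replacement described for E. coli can be sketched as a synonymous codon swap. This is a minimal illustration under stated assumptions: only the AGG to CGT substitution comes from the text, the other entries in the map are illustrative additions, and the function name is hypothetical. Dedicated optimization tools also consider factors such as GC content and mRNA secondary structure, which are ignored here.

```python
# Codons that are rare in E. coli mapped to more frequently used synonymous
# codons. AGG -> CGT is the example from the text; the remaining entries are
# assumptions added for illustration.
RARE_TO_PREFERRED = {
    "AGG": "CGT",  # arginine
    "AGA": "CGT",  # arginine
    "CTA": "CTG",  # leucine
    "ATA": "ATT",  # isoleucine
}


def optimize_codons(cds: str, swaps: dict[str, str] = RARE_TO_PREFERRED) -> str:
    """Replace rare codons in an in-frame coding sequence with preferred synonyms."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Codons not listed in the map are left unchanged.
    return "".join(swaps.get(codon, codon) for codon in codons)


original = "ATGAGGCTAATATAA"  # Met-Arg-Leu-Ile-stop, written with rare codons
print(optimize_codons(original))
```

Because every swap is synonymous, the encoded amino acid sequence is unchanged; only the codons presented to the host's tRNA pool differ.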
Codon optimization also benefits expression in yeast, whose PTMs differ from those observed in humans. For example, the production of hepatitis B surface antigen (HBsAg) in P. pastoris is enhanced by optimizing the gene sequence to match the yeast's codon bias, significantly improving yield.
Although mammalian expression systems have a codon preference similar to humans, codon optimization can further enhance expression. For example, the production of monoclonal antibodies in CHO cells involves optimizing the codons of the heavy and light chain genes to align with CHO cell preferences, improving both expression levels and stability.
Plant-based systems require codon optimization due to plant-specific codon usage biases. For example, codon optimization of the Zaire ebola virus glycoprotein gene for Nicotiana benthamiana (N. benthamiana) significantly improved its expression, enabling the production of the antibody cocktail against the virus.
Codon optimization for insect cell systems is also effective for expressing proteins with PTMs. For example, the production of influenza hemagglutinin proteins is optimized for insect cells by replacing rare codons with codons frequently used by insect cells, increasing protein yields and immunogenicity.
In cell-free systems, codon optimization must match the source of the extract to maximize efficiency. Protein yield increases severalfold when a codon-optimized gene is combined with an extract prepared from a host strain that overexpresses rare tRNAs.
Regulatory elements in expression vectors
Regulatory elements like promoters, enhancers, and terminators are key to controlling gene expression levels. Promoter strength and ribosome binding sites significantly impact protein production. However, their choice must balance high expression with the host's capacity to avoid toxicity and metabolic strain.
- Strong promoters: For example, the T7 promoter allows protein expression at very high levels in E. coli when paired with T7 RNA polymerase. T7 RNA polymerase is highly specific and transcribes only DNA sequences downstream of this promoter. The system is widely used in molecular biology to produce large quantities of recombinant proteins in E. coli.
- Inducible promoters: These are DNA sequences that can be switched on by the addition of a specific molecule called an inducer, such as the methanol-inducible alcohol oxidase 1 (AOX1) promoter in P. pastoris. In E. coli, the lac operon promoter is inducible by lactose (or IPTG, a synthetic inducer), allowing transcription to occur.
- Mammalian promoters: The cytomegalovirus (CMV) promoter is a viral promoter sequence derived from human cytomegalovirus that, when inserted into expression vectors, can be used to drive gene expression in mammalian cell lines.
Environmental conditions and their influence
Environmental factors like temperature, pH, nutrient availability, and growth media composition can impact the growth rate of expression hosts. This can significantly influence protein expression, impacting stability, yields, and translation efficiency under conditions of stress.
Post-translational modifications
Therapeutic proteins rely on accurate PTMs for their functionality, making the selection and engineering of suitable expression systems essential for optimizing their therapeutic properties. Some examples include:
- Monoclonal antibodies such as rituximab [3] (for non-Hodgkin's lymphoma and autoimmune diseases) and atezolizumab [4] (a PD-L1 inhibitor used in cancer immunotherapy) require glycosylation.
- Erythropoietin [5] (EPO; used clinically to treat anemia) requires heavy glycosylation.
- Insulin analogs [6] require disulfide bonds for correct folding and function.
- Coagulation factors, such as factor VIII [7] (for the management of hemophilia A; undergoes glycosylation and sulfation) and factor IX [8] (for the management of hemophilia B; requires γ-carboxylation).
- Hormones, such as somatropin [9], require disulfide bonds for activity.
The choice of an expression system must align with the PTMs required for the proper structure and function of the target protein.
For example, mammalian cells like CHO or HEK293 are preferred for producing human-like glycosylation, essential for therapeutic proteins such as antibodies or erythropoietin. Alternatively, the yeast P. pastoris also achieves high levels of glycosylation, predominantly with mannose-rich glycans. Phosphorylation, an important PTM in signaling proteins such as kinases, occurs most effectively in mammalian cells due to their precise regulatory mechanisms.
For complex disulfide bond formation in the expressed protein, mammalian cells are ideal, while simpler systems, such as P. pastoris yeast or insect cells, suffice for less complex proteins. Mammalian and insect cells are suitable for proteolytic cleavage to process precursor proteins, for instance, to convert proinsulin to insulin. Mammalian cells are also preferred for other PTMs, such as acetylation and methylation of expressed recombinant proteins as seen in histones and transcription factors or epigenetic regulators.
For modifications like ubiquitination, lipidation, sulfation, and hydroxylation, mammalian systems are preferred because they more accurately mimic human cellular pathways. Examples of proteins carrying these PTMs include p53 in degradation pathways (ubiquitination), GTPases (lipidation), hormones like gastrin (sulfation), and structural proteins such as collagen (hydroxylation).
Purified proteins can also be modified in vitro by mixing the purified protein with enzymes, substrates, and other factors in a test tube. The resulting modification can be analyzed by western blotting.
Abcam offers a broad range of kits to detect and quantify post-translational modifications in a variety of proteins.
Moreover, tRNA modifications influence global protein expression by impacting translation fidelity and efficiency. One of the most common modifications of tRNA is 5-methylcytosine, often found near the anticodon loop. Other modifications include N1-methyladenosine, pseudouridine (Ψ), wybutosine, thiolation, oxidation, acetylation, and queuosine modifications [10].
Recombinant protein expression and production
Recombinant protein expression and production involves inserting a gene of interest into a host organism, inducing protein synthesis, and purifying the resulting protein for further use.
Steps in recombinant protein production
Recombinant protein production involves key steps such as cloning and vector design, host cell transformation, and induction and protein expression, followed by protein purification and analysis.
Cloning and vector design
Cloning and vector design involve defining the gene segment to be cloned, constructing multiple vectors with tags for detection and purification, and employing various cloning methods.
A recombinant expression vector requires a multiple cloning site, a selectable marker, an origin of replication, the appropriate regulatory sequences for host-specific gene expression, like a transcription termination sequence or a ribosome binding site (in prokaryotes), and a strong promoter. However, the specific needs should be based on the host system.
Beyond these general specifications, host-specific requirements include:
- Expression in bacterial systems (eg, E. coli) requires vectors with a strong bacterial promoter, like the lac promoter (inducible by IPTG) or the T7 promoter, and a Shine-Dalgarno sequence, which is essential for efficient ribosome binding and translation initiation.
- Yeast (eg, S. cerevisiae) vectors need a yeast promoter, such as the ADH1 or GPD promoter, and a polyadenylation signal for proper mRNA stability.
- Vectors used in insect cell systems (using baculovirus) require a strong promoter, such as the polyhedrin promoter, for high protein expression in insect cells, and signal peptide coding regions to direct protein secretion if required.
- Recombinant protein expression in mammalian systems (eg, CHO cells) requires a eukaryotic promoter, like the CMV promoter, and a Kozak sequence for efficient translation initiation in eukaryotes. In some cases, intron sequences may be added to enhance gene expression. Additionally, cold-shock promoters can be employed in vectors for mammalian cells to stabilize mRNA, favor translation, and improve protein solubility.
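The vector requirements listed above lend themselves to a simple checklist. The following sketch encodes them as sets and reports which elements a planned construct is still missing. The element labels, dictionary layout, and function name are hypothetical conventions chosen for this example, not part of any cloning software.

```python
# Elements the text lists for any recombinant expression vector.
COMMON_ELEMENTS = {
    "multiple cloning site",
    "selectable marker",
    "origin of replication",
    "promoter",
    "terminator",
}

# Additional host-specific elements named in the text.
HOST_SPECIFIC_ELEMENTS = {
    "bacterial": {"Shine-Dalgarno sequence"},
    "yeast": {"polyadenylation signal"},
    # For insect cells the text names only a strong promoter (eg, polyhedrin)
    # plus an optional signal peptide, so nothing extra is mandatory here.
    "insect": set(),
    "mammalian": {"Kozak sequence"},
}


def missing_elements(host: str, elements: set[str]) -> list[str]:
    """Return the required elements absent from a planned vector, sorted by name."""
    if host not in HOST_SPECIFIC_ELEMENTS:
        raise ValueError(f"unknown host type: {host}")
    required = COMMON_ELEMENTS | HOST_SPECIFIC_ELEMENTS[host]
    return sorted(required - elements)


planned_vector = {
    "multiple cloning site",
    "selectable marker",
    "origin of replication",
    "promoter",
    "terminator",
}
print(missing_elements("bacterial", planned_vector))
print(missing_elements("mammalian", planned_vector))
```

A check like this only confirms that each element is present; whether a given promoter or marker actually suits the host still has to be decided case by case.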
The need for a high-expression system may depend on the goals of the study. Purification tags, like His-tags, facilitate isolation, downstream processing, and analysis, whereas inducible promoters provide controlled protein expression.
The use of advanced molecular cloning techniques, like restriction enzymes, PCR, and genome editing, to isolate, amplify, and transfer DNA fragments into vectors enables heterologous gene expression and routine synthesis of specific DNA sequences.
Protein tags
Protein (or affinity) tags are short peptides or protein sequences attached to recombinant proteins to facilitate purification, detection, or localization. For example:
- His-tags: Consisting of 6-10 histidine residues, His-tags are small and exert minimal impact on protein function. They are widely used in E. coli and yeast systems and facilitate purification via nickel or cobalt affinity chromatography. Our His-tag protein expression check kit (ab270048) allows users to verify and monitor the successful expression of His-tagged recombinant proteins.
- GST: Glutathione S-transferase tags (26 kDa) bind to glutathione resin, enhancing solubility and enabling purification in E. coli. We provide anti-GST antibody [S-tag-05] (ab36415) to detect and determine the production of GST-tagged proteins.
- FLAG tags: Small and hydrophilic, FLAG tags are used in mammalian systems for immunoprecipitation, detection, and single-step purification while improving solubility. Our anti-DDDDK tag antibody (ab1162) is a rabbit polyclonal antibody that binds to the FLAG® tag sequence.
- MBP: Maltose binding protein tags (42 kDa) promote enhanced solubility and expression, primarily in bacterial systems. The anti-maltose binding protein antibody from Abcam recognizes native as well as denatured, reduced forms of purified MBP or MBP fusion proteins.
To attach a tag to a protein for purification, the gene encoding the protein of interest is engineered to include a sequence encoding a short stretch of amino acids (the "tag") at either the N-terminus or C-terminus, which allows the protein to bind specifically to a resin during purification. The tag can then be cleaved off after purification using a specific protease that recognizes a cleavage site engineered between the tag and the protein of interest, releasing the pure protein.
Cleavage tags, such as the TEV (tobacco etch virus) protease site, facilitate the precise removal of tags post-purification, ensuring flexibility in downstream applications.
The choice of protease depends on the specific tag and the desired cleavage site sequence.
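Tag attachment and removal can be illustrated at the sequence level. The sketch below builds an N-terminal His-tag fusion with a TEV protease site and then simulates cleavage. It assumes the commonly used TEV recognition sequence ENLYFQG, cut between Q and G, and a six-histidine tag; the function names and the toy protein sequence are hypothetical.

```python
HIS_TAG = "HHHHHH"    # six histidines, within the 6-10 residue range given in the text
TEV_SITE = "ENLYFQG"  # assumed TEV recognition sequence; cleavage occurs between Q and G


def build_fusion(protein: str) -> str:
    """Return Met + His-tag + TEV site + protein as a one-letter amino acid string."""
    return "M" + HIS_TAG + TEV_SITE + protein


def tev_cleave(fusion: str) -> tuple[str, str]:
    """Split a fusion at the first TEV site into (tag fragment, released protein)."""
    site_start = fusion.find(TEV_SITE)
    if site_start == -1:
        raise ValueError("no TEV site found in the fusion sequence")
    # The cut falls before the final G of the site, so that G stays on the protein.
    cut = site_start + len(TEV_SITE) - 1
    return fusion[:cut], fusion[cut:]


fusion = build_fusion("KTAY")  # toy protein sequence
tag_fragment, released = tev_cleave(fusion)
print(fusion)
print(tag_fragment, released)
```

In this model the released protein keeps one extra N-terminal glycine from the cleavage site, which is worth bearing in mind when the exact N-terminus matters for downstream applications.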
Other tags, such as fluorescent tags, including green fluorescent protein, are used to visualize and track protein expression, localization, or folding across bacterial, yeast, and mammalian systems.
Kits such as the His-tag protein expression check kit (ab270048) or the GST-tag protein expression check kit (ab270052) are commonly used to track the presence of tagged proteins in experimental systems.
Host cell transformation
Host cell transformation involves introducing constructed vectors into host cells like E. coli or mammalian cells for protein production, followed by small-scale expression screening to identify the optimal construct-host system.
The expression system being employed determines how vectors are introduced into host cells. In bacterial systems like E. coli, transformation involves the introduction of the vector into the host cells by chemical treatment (eg, calcium chloride) or electroporation, which causes the formation of transient pores in the cell membrane to permit DNA entry. The colonies positive for the plasmid constructs are identified by growing the transformed cells on selective media.
For stable and effective gene integration, mammalian cell transfection frequently employs chemical reagents (eg, lipofection with liposomes or calcium phosphate), electroporation (less commonly used), or viral-mediated delivery. Yeast transformation (eg, Pichia, Saccharomyces) is performed by electroporation or lithium acetate treatment, whereas insect cells (eg, employing baculovirus vectors) are infected with recombinant viruses expressing the target gene.
Upon the introduction of the vector, its expression is examined, and optimal conditions are identified for protein production, ensuring scalability and reproducibility for larger-scale applications. The choice of methods depends on factors like the host's compatibility, the size of the vector, and the desired protein yield.
Induction and protein expression
Induction involves promoting protein expression in host cells using specific inducers, followed by scale-up in batch cultures like shake flasks or bioreactors to produce larger protein quantities.
In prokaryotic systems, such as E. coli, IPTG is commonly used for induction. IPTG binds and inactivates the LacI repressor, which relieves repression and drives transcription from the T7 promoter. The auto-induction method in E. coli relies on diauxic growth dynamics controlled by the lac operon regulatory system11. Glucose is the principal carbon source in an auto-induction medium and also acts as a catabolite repressor of expression from the T7 promoter (PT7) during early cell growth.
Once glucose is depleted, glycerol serves as the carbon source, and lactose induces protein production. This two-stage method suppresses heterologous gene expression during initial growth, limiting leaky expression and growth inhibition, and results in effective protein production upon lactose induction. Auto-induction has been successfully applied to proteins such as luciferase12.
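The two-stage behavior of auto-induction can be illustrated with a toy simulation. This is a sketch under strong assumptions, not a validated growth model: all parameter values (growth rate, yield coefficient, starting concentrations) are hypothetical, growth is purely exponential, and lactose is treated only as an inducer that switches expression on once glucose is exhausted.

```python
def simulate_auto_induction(glucose=0.5, glycerol=5.0, lactose=2.0,
                            biomass=0.01, mu=0.8, yield_coeff=0.5,
                            dt=0.05, hours=12.0):
    """Toy diauxic culture: glucose is consumed first (expression repressed),
    then glycerol supports growth while lactose induces expression.

    Concentrations are in g/L, mu in 1/h; every value is illustrative.
    Returns (induction_time_h or None, trajectory).
    """
    induction_time = None
    trajectory = []
    for step in range(int(hours / dt)):
        t = step * dt
        # Induction requires glucose exhaustion and the presence of lactose.
        induced = glucose <= 0.0 and lactose > 0.0
        if induced and induction_time is None:
            induction_time = t
        growth = mu * biomass * dt  # explicit Euler step
        if glucose > 0.0:
            glucose = max(0.0, glucose - growth / yield_coeff)
        elif glycerol > 0.0:
            glycerol = max(0.0, glycerol - growth / yield_coeff)
        else:
            growth = 0.0  # no carbon source left
        biomass += growth
        trajectory.append((t, biomass, glucose, glycerol, induced))
    return induction_time, trajectory


t_low, _ = simulate_auto_induction(glucose=0.5)
t_high, _ = simulate_auto_induction(glucose=1.0)
print(f"induction starts at ~{t_low:.2f} h with 0.5 g/L glucose")
print(f"induction starts at ~{t_high:.2f} h with 1.0 g/L glucose")
```

In this sketch, raising the initial glucose concentration delays induction, which mirrors how medium composition sets the timing of expression in auto-induction.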
In mammalian cell lines, induction strategies involve optimizing the expression vector with a strong promoter and using chemical inducers to stimulate protein production. Sometimes, the cell line itself is engineered to enhance expression levels.
The methanol-inducible AOX1 promoter is widely used in P. pastoris; it ensures tight regulation, minimizing leaky expression and enabling high yield. Likewise, the GAL1 promoter in S. cerevisiae is activated by galactose.
In plant systems, infiltration of Agrobacterium tumefaciens harboring the expression construct introduces desired genes into plant cells. Modified viral vectors are also used to deliver genetic material into cells. Tobacco mosaic virus infects a wide range of plants, including tobacco, and is often used as a viral vector along with agroinfiltration. In addition, inducible promoters, such as heat-shock or glucocorticoid-responsive promoters, enable controlled expression of the recombinant protein.
In insect cells, such as Sf9, Sf21, or Hi-5, natural induction occurs when the insect cells are infected with recombinant baculoviruses, activating strong polyhedrin or p10 promoters.
Unlike whole-cell systems, where inducers such as IPTG are added to the culture medium, cell-free systems typically start expression when the DNA or mRNA template is added to a reaction mixture containing the transcriptional and translational components. Cell-free protein synthesis (CFPS) systems, like those derived from E. coli or wheat germ, commonly use T7 promoters for efficient expression.
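Because the template itself triggers expression in T7-driven systems, a quick sequence check before setting up a reaction can be useful. The sketch below looks for the minimal T7 promoter consensus (TAATACGACTCACTATAG) and reports the offset of the first ATG downstream of it. The function name and the example template are hypothetical, and the first ATG is only assumed to be the intended start codon.

```python
# Minimal T7 promoter consensus; transcription starts at the final G.
T7_PROMOTER = "TAATACGACTCACTATAG"


def check_template(dna: str) -> dict:
    """Report whether a DNA template carries a T7 promoter and where the
    first ATG lies downstream of it (offset in nt, or None if absent)."""
    seq = "".join(dna.upper().split())  # drop whitespace and line breaks
    pos = seq.find(T7_PROMOTER)
    if pos == -1:
        return {"has_t7": False, "start_codon_offset": None}
    downstream = seq[pos + len(T7_PROMOTER):]
    atg = downstream.find("ATG")
    return {"has_t7": True,
            "start_codon_offset": atg if atg != -1 else None}


# Hypothetical template: restriction site + T7 promoter + leader + ORF start.
template = "GGATCC" + T7_PROMOTER + "GGAGAAGGAGATATACAT" + "ATGAAA"
print(check_template(template))
```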
Techniques in protein purification and analysis
Once the host expresses the recombinant protein, purification techniques help to efficiently isolate recombinant proteins for further analysis or application.
Protein purification and characterization
Protein purification and analysis use affinity chromatography for initial purification with tags like poly-histidine. Additionally, ion exchange chromatography and sometimes reverse-phase HPLC can be used. The final proteins are assessed for purity and activity using SDS-PAGE and activity assays such as Abcam’s global protein synthesis assay kits (ab273286).
Several purification methods can effectively isolate proteins and enhance purification efficiency by selectively removing impurities from complex mixtures. Some of the purification methods are:
- Affinity chromatography is a widely used method for antibody purification due to its high selectivity and rapidity, with the choice of support matrices, ligands, and optimization of purification protocols being vital for meeting clinical and performance requirements13. A biological agent that binds the target is used as the stationary phase (for example, antigen-antibody). Biological binding agents include biotin-avidin, antibodies, enzymes, and nucleic acids. Non-biological agents, such as dyes, boronates, and aptamers, are also used.
- Size exclusion chromatography separates proteins based on their size. It can improve refolding yields of recombinant proteins by effectively removing impurities, leading to higher protein yield and specific activity recovery.
- Ion exchange chromatography, when combined with modern techniques like immunoaffinity or immobilized-metal affinity chromatography, is effective for large-scale recovery and purification of recombinant proteins expressed in bacterial cells14. It is based on the charge-charge interactions between the charged molecules immobilized in the chromatography column and the proteins of interest. Hence, based on the charge, there are cation exchange and anion exchange columns.
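The charge-based choice between cation and anion exchange can be sketched by estimating a protein's net charge at the working pH with the Henderson-Hasselbalch relation. The pKa values below are approximate, commonly tabulated side-chain values and should be treated as assumptions. Real proteins deviate because of local environment and folding, so this is a rough guide only.

```python
# Approximate side-chain and terminal pKa values (assumed, textbook-style).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6


def net_charge(seq: str, ph: float) -> float:
    """Estimate the net charge of a peptide or protein at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # deprotonated C-terminus
    for aa in seq.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def suggest_exchanger(seq: str, ph: float) -> str:
    """Positively charged proteins bind cation exchangers; negatively
    charged proteins bind anion exchangers."""
    q = net_charge(seq, ph)
    if q > 0:
        return "cation exchange"
    if q < 0:
        return "anion exchange"
    return "near isoelectric point"


# Hypothetical peptides used only to show the two outcomes.
print(suggest_exchanger("KKKRRGA", ph=7.0))
print(suggest_exchanger("DDEEGA", ph=7.0))
```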
Characterization is essential for analyzing recombinant proteins and ensuring their functionality and quality.
- SDS-PAGE is used to confirm the molecular weight of a recombinant protein. It provides a visual representation of the protein's size and purity, which helps when optimizing expression and makes downstream study of structural properties easier. Read more in our comprehensive guide to SDS-PAGE.
- Western blotting is an important technique for detecting specific proteins, identifying post-translational modifications, quantifying protein levels, and ensuring reproducibility in protein characterization. Learn more about the western blot protocol.
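Confirming molecular weight by SDS-PAGE usually means comparing a band's migration with a marker ladder. A common way to do this is to fit log10(molecular weight) against relative migration (Rf) for the ladder bands and interpolate the unknown. The sketch below uses a plain least-squares line, and the ladder values are hypothetical rather than taken from any specific marker product.

```python
import math


def fit_ladder(ladder):
    """Least-squares fit of log10(MW in kDa) against relative migration (Rf).

    ladder: list of (rf, mw_kda) pairs from the marker lane.
    Returns (slope, intercept) of the fitted line.
    """
    xs = [rf for rf, _ in ladder]
    ys = [math.log10(mw) for _, mw in ladder]
    n = len(ladder)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(rf, slope, intercept):
    """Interpolate the apparent molecular weight (kDa) of a band."""
    return 10 ** (slope * rf + intercept)


# Hypothetical marker lane: (Rf, kDa).
ladder = [(0.10, 100.0), (0.25, 70.0), (0.40, 50.0),
          (0.55, 35.0), (0.70, 25.0), (0.85, 15.0)]
slope, intercept = fit_ladder(ladder)
print(f"band at Rf 0.48 is ~{estimate_mw(0.48, slope, intercept):.0f} kDa")
```

The estimate is an apparent molecular weight; tags, glycosylation, and unusual amino acid composition can shift migration away from the calculated mass.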
Challenges in protein expression
Despite its many advantages, recombinant protein expression faces several challenges, and expression conditions often need to be optimized for each protein.
Common challenges in protein expression
Protein expression faces challenges such as improper folding, instability, toxicity, solubility issues, and difficulties with disulfide bonds, particularly during large-scale production. Additional hurdles include:
- Intra-RNA interactions
- Secondary structures affecting translation
- Inclusion body formation
- Varying protein-protein interactions across environments
Overcoming expression challenges
Strategies to overcome expression challenges include managing protein toxicity, optimizing mRNA structure and codon bias, and utilizing advanced techniques such as codon optimization, big data, and machine learning to enhance recombinant protein production.
- Intra-RNA interactions: Folding within an mRNA molecule can affect how efficiently it is translated into protein. Pairing of complementary bases forms stems, loops, and other structures that can block the ribosome from binding or reading the coding sequence. Ribosome binding and translation initiation are more difficult when the start codon lies inside a complex secondary structure, which lowers the amount of protein produced. Intra-RNA interactions in the untranslated regions (UTRs) at the 5' and 3' ends can also affect mRNA stability and translation efficiency.
- Secondary structures affecting translation: Secondary structures in RNA can affect recombinant protein translation. These structures are formed when non-adjacent nucleotides interact and include hairpins, R-loops, G-quadruplexes, and long-range interactions. These structures can impact the translation of mRNA, as well as other processes like splicing and synthesis.
- Inclusion body formation: Inclusion bodies are dense aggregates of misfolded proteins that lack bioactivity. To achieve a native conformation, the protein must be extracted with denaturants and then refolded. The efficiency of renaturation is low, which can reduce the yield of the final product. Improper disulfide bond formation can cause the protein to fold incorrectly and form inclusion bodies.
Inclusion bodies can simplify protein purification, but the protein must still be extracted and refolded. Approaches to minimize inclusion body formation include:
- Lowering the rate of synthesis.
- Introducing foldases or chaperones.
- Facilitating the expression of osmolytes or chaperones.
- Fusing the target with tags or a soluble protein.
- Removing the structural elements contributing to inclusion body formation.
- Varying protein-protein interactions across environments: The interactions between different proteins within a cell can be strongly affected by the surrounding environment; this potentially affects recombinant protein production, resulting in issues such as protein aggregation, misfolding, or lower expression.
Further, chaperone proteins aid in protein folding by engaging with nascent polypeptide chains and directing them to the optimal shape. The availability of chaperones varies among host cells (eg, bacteria, yeast, or mammalian cells), impacting folding.
Overexpressed recombinant proteins can form inclusion bodies due to incorrect folding caused by protein-protein interactions. Additionally, post-translational changes (eg, glycosylation and phosphorylation) impact protein stability, functionality, and interactions with other proteins.
High temperatures denature proteins, disrupting associations, whereas low temperatures limit the folding process. Variations in pH affect surface charge distribution, which influences binding, while salt concentrations modify electrostatic forces between proteins. Environmental factors can affect chaperone activity, resulting in protein misfolding and aggregation.
Strategies to enhance recombinant protein expression include selecting host cells with appropriate chaperone systems, adjusting expression levels to prevent aggregation, protein sequence modifications to improve stability, and adding genes expressing chaperones to facilitate folding.
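The point that a start codon buried in mRNA secondary structure hinders initiation can be illustrated with a crude stem-loop scan. This sketch only looks for perfect Watson-Crick inverted repeats (no G-U wobble, no energy model), assumes the first AUG is the start codon, and uses hypothetical default stem and loop lengths. Dedicated RNA folding software would be needed for real predictions.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    return rna.translate(COMPLEMENT)[::-1]


def find_stems(rna: str, stem_len: int = 6, min_loop: int = 3, max_loop: int = 12):
    """Return (i, j) pairs where rna[i:i+stem_len] can base-pair with
    rna[j:j+stem_len], separated by a loop of min_loop..max_loop nt."""
    rna = rna.upper().replace("T", "U")
    hits = []
    for i in range(len(rna) - 2 * stem_len - min_loop + 1):
        target = reverse_complement(rna[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop
            if rna[j:j + stem_len] == target:
                hits.append((i, j))
    return hits


def start_codon_occluded(rna: str, stem_len: int = 6) -> bool:
    """True if the first AUG overlaps either arm of a predicted stem."""
    rna = rna.upper().replace("T", "U")
    start = rna.find("AUG")
    if start == -1:
        return False
    for i, j in find_stems(rna, stem_len):
        for arm in (i, j):
            if arm < start + 3 and start < arm + stem_len:
                return True
    return False


# Hypothetical 5' regions: one with the AUG inside a hairpin arm, one open.
print(start_codon_occluded("AAGCCAUGAAAACAUGGCAA"))
print(start_codon_occluded("AAAAAUGAAAAAAAAAAAAAAAAAA"))
```

A flagged construct is a candidate for synonymous changes near the start codon or a redesigned 5' UTR, in line with the mRNA-structure optimization described above.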
Optimization of protein expression
High-throughput methodologies, parallel screening and purification, and customized expression systems help optimize protein expression by improving solubility, yield, and functionality. For instance, a nearly 5-fold increase in the expression of clotting factor VIII in tobacco was achieved through codon optimization15.
Another approach for high-throughput protein production is bicistronic design (BCD)-based transcriptional fusion, in which a gene of interest and a reporter gene are expressed from a single mRNA transcript. The target protein's expression is monitored via the reporter protein's activity, typically a fluorescent signal, which gives a direct readout of protein production. This approach keeps the stoichiometry between the target and reporter proteins consistent and allows real-time monitoring of expression16.
Regarding cultivation platforms, microliter-scale microtiter plate (MTP) systems have attracted much attention because they are low-cost and easy to operate. Further, medium composition can be optimized faster by screening conditions experimentally in MTPs or microbioreactors.
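Screening medium composition in microtiter plates often starts with a full-factorial layout that assigns one condition per well. The sketch below generates such a layout for a 96-well plate. The factor names and concentrations are hypothetical placeholders, and replicates, blanks, and edge-well effects are deliberately ignored.

```python
from itertools import product
import string


def plate_layout(factors: dict, rows: int = 8, cols: int = 12) -> dict:
    """Assign every combination of factor levels to a well (A1, A2, ...).

    factors: mapping of factor name -> list of levels.
    Raises ValueError if the design does not fit on the plate.
    """
    names = list(factors)
    conditions = [dict(zip(names, combo))
                  for combo in product(*(factors[name] for name in names))]
    wells = [f"{string.ascii_uppercase[r]}{c + 1}"
             for r in range(rows) for c in range(cols)]
    if len(conditions) > len(wells):
        raise ValueError("more conditions than wells on the plate")
    return dict(zip(wells, conditions))


# Hypothetical medium screen: 3 x 3 x 4 = 36 conditions.
factors = {
    "glucose_g_per_l": [0.25, 0.5, 1.0],
    "glycerol_g_per_l": [2.5, 5.0, 10.0],
    "lactose_g_per_l": [0.5, 1.0, 2.0, 4.0],
}
layout = plate_layout(factors)
print(len(layout), "wells used")
print("A1:", layout["A1"])
```

A full-factorial design grows quickly with the number of factors, so larger screens typically switch to fractional designs or iterate over a few factors at a time.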
Advances and directions in protein expression
Advances in protein expression have vastly improved the production of recombinant proteins for research, medicine, and industry, with new technologies making it easier to increase yields, solubility, and functionality.
Optimized promoters, enhancers, and regulatory sequences designed for certain host systems are all part of enhanced vector design. Additionally, codon optimization promotes compatibility with the host's tRNA, improving translation efficiency.
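Codon optimization can be sketched as a scan for codons that are rare in the host, followed by synonymous replacement. The table below covers only a handful of codons commonly reported as rare in E. coli (including the arginine codons AGA and AGG) and maps each to a frequently used synonym. It is an illustrative subset, not a complete codon usage table, and real optimization tools also weigh mRNA structure, GC content, and restriction sites.

```python
# Illustrative subset: codons commonly reported as rare in E. coli,
# each mapped to a frequently used synonymous codon.
RARE_TO_PREFERRED = {
    "AGA": "CGT", "AGG": "CGT", "CGA": "CGT",  # arginine
    "ATA": "ATT",                              # isoleucine
    "CTA": "CTG",                              # leucine
    "CCC": "CCG",                              # proline
}


def codons(cds: str) -> list:
    """Split a coding sequence into codons; length must be a multiple of 3."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def rare_positions(cds: str) -> list:
    """Return codon indices (0-based) that hit the rare-codon table."""
    return [i for i, codon in enumerate(codons(cds))
            if codon in RARE_TO_PREFERRED]


def replace_rare(cds: str) -> str:
    """Swap each listed rare codon for its synonymous preferred codon."""
    return "".join(RARE_TO_PREFERRED.get(codon, codon) for codon in codons(cds))


# Hypothetical short ORF: Met-Arg-Lys-Arg-Leu-stop.
cds = "ATGAGAAAAAGGCTATAA"
print(rare_positions(cds))
print(replace_rare(cds))
```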
Engineering the host species, including yeast, insects, mammalian cells, and E. coli, has increased their efficiencies. For example, strains of E. coli are currently designed to reduce protease activity or promote the formation of disulfide bonds17.
Precision-controlled gene expression is made possible by cutting-edge tools such as cell-free protein synthesis, CRISPR/Cas9 genome editing, and synthetic biology tools. Additionally, improved baculovirus systems for insect cells have shown better transfection efficiency and production consistency18.
High-throughput screening, automated bioreactors, perfusion culture systems, and improved media formulations have streamlined the scale-up process. Furthermore, strategies such as machine learning-based sequence optimization, improved fusion tags, and improved secretion signals have increased protein expression and functionality. For example, fusion tags (eg, MBP, GST) and chaperone co-expression systems can enhance solubility, while cold-shock promoters can reduce aggregation during expression19,20.
Emerging techniques in protein expression
Emerging tools like synthetic biology, CRISPR, and genome editing are revolutionizing protein expression by enabling precise modifications to host organisms for optimized production.
Synthetic biology aids in engineering microorganisms or cells to produce proteins with higher yields. Genome editing techniques such as CRISPR/Cas9 and catalytically dead Cas9 (dCas9) enable precise gene manipulation and expression regulation, advancing synthetic biology platforms for enhanced protein production.
High-throughput and AI-driven screening
High-throughput analytical technologies and automation platforms with the use of artificial intelligence (AI) enable the efficient testing of variables to optimize protein expression and purification for research and therapeutic purposes.
The strategies include engineering 5' untranslated regions (5' UTRs) and integrating genome-editing tools with next-generation sequencing. This enhances protein production and enables advancements in gene therapy and crop improvement.
Sustainability
Advancements in high-throughput and automated technologies in protein expression indirectly contribute to sustainability by reducing resource consumption and waste through more efficient optimization processes.
Future perspectives in personalized medicine
Advances in screening technologies are enabling personalized medicine by facilitating precise genetic modifications and optimizing protein expression for individualized therapies. These may eventually become a potential tool for treating genetic diseases, with advancements in AI and CRISPR integration aiming to enhance precision and efficacy in patient-specific treatments.
FAQs
What are the main differences between bacterial and yeast protein expression systems?
Bacterial protein expression systems are fast, cost-effective, and suitable for producing simple proteins but often lack proper post-translational modifications like glycosylation. Yeast systems provide eukaryotic features for more complex protein folding and modifications, though they can be slower and more expensive than bacterial systems.
How do mammalian cells compare to bacterial cells in terms of protein expression efficiency?
Mammalian cells have lower protein expression efficiency than bacterial cells, with slower growth rates and higher production costs, but they excel in producing complex proteins with accurate post-translational modifications. Bacterial cells are faster and more cost-efficient but may fail to produce correctly folded or modified eukaryotic proteins.
What are the advantages and disadvantages of using insect cells for protein expression?
Insect cells offer the advantage of producing eukaryotic proteins with more accurate folding and post-translational modifications, closely resembling those in higher eukaryotes. They also grow quickly in suspension cultures and can express high yields of recombinant proteins using baculovirus expression systems. However, insect cell expression systems have certain disadvantages.
Maintaining these expression systems can be more expensive, and it may take longer to obtain suitable yields than with yeast and E. coli expression systems. Furthermore, glycosylation in insect cells is limited: they cannot produce proteins with complex N-linked side chains.
How does the baculovirus-mediated expression system work?
The baculovirus-mediated expression system introduces recombinant baculovirus into insect cells, where the virus infects the cells and drives high-level protein expression using a strong promoter. This system allows the insect cells to produce recombinant proteins with proper folding and post-translational modifications.
What factors influence the choice of an expression system for recombinant proteins?
The choice of an expression system for recombinant proteins depends on factors like the complexity of the protein, desired post-translational modifications, expression yield, production cost, and the speed of protein synthesis. Other considerations include scalability, host cell compatibility, and regulatory requirements for therapeutic applications.
References
- Calhoun, K.A., Swartz, J.R. Energy systems for ATP regeneration in cell-free protein synthesis reactions. Methods in molecular biology. 375, 3-17 (2007).
- Aguirre-López, B., Cabrera, N., de Gómez-Puyou, M.T., et al. The importance of arginine codons AGA and AGG for the expression in E. coli of triosephosphate isomerase from seven different species. Biotechnology reports. 13, 42-48 (2017).
- Edwards, E., Livanos, M., Krueger, A., et al. Strategies to control therapeutic antibody glycosylation during bioprocessing: Synthesis and separation. Biotechnology and bioengineering. 119, 1343-1358 (2022).
- Li, M., Zhao, R., Chen, J., et al. Next generation of anti-PD-L1 atezolizumab with enhanced anti-tumor efficacy in vivo. Scientific reports. 11, 5774 (2021).
- O'Flaherty, R., Amez, M.M., Gardner, R.A., et al. Erythropoietin N-glycosylation of therapeutic formulations quantified and characterized: an interlab comparability study of high-throughput methods. Biomolecules. 14, 125 (2024).
- Baeshen, N.A., Baeshen, M.N., Sheikh, A., et al. Cell factories for insulin production. Microbial cell factories. 13, 141 (2014).
- Zacchi, L.F., Roche-Recinos, D., Pegg, C.L., et al. Coagulation factor IX analysis in bioreactor cell culture supernatant predicts quality of the purified product. Communications biology. 4, 390 (2021).
- Zacchi, L.F., Roche-Recinos, D., Pegg, C.L. et al. Coagulation factor IX analysis in bioreactor cell culture supernatant predicts quality of the purified product. Communications biology. 4, 390 (2021).
- Castillo-Corujo, A., Uchida, Y., Saaranen, M.J., et al. Escherichia coli cytoplasmic expression of disulfide-bonded proteins: side-by-side comparison between two competing strategies. Journal of microbiology and biotechnology. 34, 1126-1134 (2024).
- Zhang, M., & Lu, Z. tRNA modifications: greasing the wheels of translation and beyond. RNA biology. 22, 1-25 (2024).
- Sung, K.L., Jay, D.K. Heterologous protein production in Escherichia coli using the propionate-inducible pPro system by conventional and auto-induction methods. Protein expression and purification. 61, 197-203 (2008).
- Blommel, P.G., Becker, K.J., Duvnjak, P., et al. Enhanced bacterial protein expression during auto-induction obtained by alteration of lac repressor dosage and medium composition. Biotechnology progress. 23, 585-598 (2007).
- Rodriguez, E.L., Poddar, S., Iftekhar, S., et al. Affinity chromatography: a review of trends and developments over the past 50 years. Journal of chromatography B, analytical technologies in the biomedical and life sciences. 1157, 122332 (2020).
- Adhikari, S., Manthena, P.V., Sajwan, K., et al. A unified method for purification of basic proteins. Analytical biochemistry. 400, 203-206 (2010).
- Kwon, K.C., Chan, H.T., León, I.R., et al. Codon optimization to enhance expression yields insights into chloroplast translation. Plant physiology. 172, 62-77 (2016).
- Sun, M., Gao, A.X., Li, A., et al. Bicistronic design as recombinant expression enhancer: characteristics, applications, and structural optimization. Applied microbiology and biotechnology. 105, 7709-7720 (2021).
- Anna, D., Charlotte, R., Nick, J.B., et al. A comparative study of the performance of E. coli and K. phaffii for expressing α-cobratoxin. Toxicon. 239, 107613 (2024).
- Yoko, M., Hoang-Oanh, B.N., David S., et al. A robust and flexible baculovirus-insect cell system for AAV vector production with improved yield, capsid ratios and potency. Molecular therapy - methods & clinical development. 32, 101228 (2024).
- Shendge, A.A., D’Souza, J.S. Strategic optimization of conditions for the solubilization of GST-tagged amphipathic helix-containing ciliary proteins overexpressed as inclusion bodies in E. coli. Microbial cell factories. 21, 258 (2022).
- Yaneth, B-A., Cipriano, C-C., Luis, B.F-C., et al. The potential of cold-shock promoters for the expression of recombinant proteins in microbes and mammalian cells. Journal of Genetic Engineering and Biotechnology. 20, 173 (2022).