Method Article
The protocols described allow the construction, characterization and selection (against the target of choice) of a "domainome" library made from any DNA source. This is achieved by a research pipeline that combines different technologies: phage display, a folding reporter and next generation sequencing with a web tool for data analysis.
Folding reporters are proteins with easily identifiable phenotypes, such as antibiotic resistance, whose folding and function are compromised when they are fused to poorly folding proteins or to random open reading frames. We have developed a strategy that uses TEM-1 β-lactamase (the enzyme conferring ampicillin resistance) on a genomic scale to select collections of correctly folded protein domains from the coding portion of the DNA of any intronless genome. The protein fragments obtained by this approach, the so-called "domainome", are well expressed and soluble, making them suitable for structural and functional studies.
By cloning and displaying the "domainome" directly in a phage display system, we have shown that it is possible to select specific protein domains with the desired binding properties (e.g., to other proteins or to antibodies), thus providing essential experimental information for gene annotation or antigen identification.
The most enriched clones in a selected polyclonal population can be identified with next-generation sequencing (NGS) technologies. We therefore introduce deep sequencing analysis of both the library itself and the selection outputs, providing complete information on the diversity, abundance and precise mapping of each selected fragment. The protocols presented here describe the key steps for library construction, characterization, and validation.
Here, we describe a high-throughput method for the construction and selection of libraries of folded and soluble protein domains from any genic or genomic starting source. The approach combines three technologies: phage display, a folding reporter, and next-generation sequencing (NGS) with a dedicated web tool for data analysis. The method can be used in many areas of protein-based research: identifying and annotating new proteins and protein domains, characterizing the structural and functional properties of known proteins, and defining protein-interaction networks.
Many questions remain open in protein-based research, and methods for optimal protein production are needed in several fields of investigation. For example, despite the availability of thousands of prokaryotic and eukaryotic genomes1, a corresponding map of their proteomes, with direct annotation of the encoded proteins and peptides, is still missing for the great majority of organisms. Cataloguing complete proteomes is a challenging goal that requires a huge investment of time and resources. The gold standard for experimental annotation remains the cloning of all the Open Reading Frames (ORFs) of a genome, the so-called "ORFeome". Gene function is usually assigned by homology to related genes of known activity, but this approach is often inaccurate because reference databases contain many incorrect annotations2,3,4,5. Moreover, even proteins that have already been identified and annotated require further study to characterize their abundance, their expression patterns in different contexts, their structural and functional properties, and their interaction networks.
Furthermore, since proteins are composed of different domains, each with specific features and a distinct contribution to protein function, the study and precise definition of these domains can provide a more comprehensive picture at both the single-gene and the whole-genome level. The amount of information required makes protein-based research a wide and challenging field.
In this perspective, unbiased, high-throughput methods for protein production could make an important contribution. However, besides the considerable investment required, the success of such approaches relies on the ability to produce soluble, stable protein constructs. This is a major limiting factor, since it has been estimated that only about 30% of proteins can be expressed and produced at levels sufficient to be experimentally useful6,7,8. One way to overcome this limitation is to use randomly fragmented DNA to produce different polypeptides that together provide an overlapping fragment representation of individual genes. Only a small percentage of randomly generated DNA fragments are functional ORFs; the great majority are either non-functional (because they contain stop codons) or encode unnatural polypeptides with no biological meaning (ORFs read in a frame other than the original one).
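To see why stop codons eliminate most random fragments, consider a deliberately crude model (an assumption made here for illustration, not an analysis from the original work): a reading frame that is not the gene's natural one is treated as uniformly random sequence, so each codon is one of the three stop codons (TAA, TAG, TGA) with probability 3/64, and a frame of n codons is open with probability (61/64)^n. The sketch below evaluates this over the 150-750 bp fragment sizes used for library construction.

```python
# Crude model (assumption): a non-native reading frame behaves like random
# sequence, with each codon drawn uniformly from the 64 codons.
STOP_FRACTION = 3 / 64  # TAA, TAG, TGA


def p_open_frame(length_bp: int) -> float:
    """Probability that a random reading frame of length_bp contains no stop codon."""
    n_codons = length_bp // 3
    return (1 - STOP_FRACTION) ** n_codons


if __name__ == "__main__":
    for length in (150, 300, 750):
        print(f"{length} bp ({length // 3} codons): P(open) = {p_open_frame(length):.2e}")
```

Under this model a random 150 bp frame is open roughly 9% of the time and a 750 bp frame only about 6 times in a million. Real coding DNA is not random sequence, so these numbers are only indicative of the trend.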
To address these issues, our group has developed a high-throughput protein expression and interaction analysis platform that can be used on a genomic scale9,10,11,12. The platform integrates: 1) a method to select collections of correctly folded protein domains from the coding portion of DNA from any organism; 2) phage display technology to select interaction partners; 3) NGS to fully characterize the interactome under study and identify the clones of interest; and 4) a web tool that lets users without bioinformatics or programming skills perform Interactome-Seq analysis in an easy, user-friendly way.
This platform offers important advantages over alternative strategies: above all, the method is completely unbiased, high-throughput, and modular, suiting studies that range from a single gene up to a whole genome. The first step of the pipeline is the creation of a library from randomly fragmented DNA of the source under study, which is then thoroughly characterized by NGS. The library is generated using an engineered vector in which genes or fragments of interest are cloned between a signal sequence for protein secretion into the periplasmic space (i.e., a Sec leader) and the TEM-1 β-lactamase gene. The fusion protein confers ampicillin resistance, and thus survival under ampicillin pressure, only if the cloned fragment is in frame with both elements and the resulting fusion protein is correctly folded10,13,14. All clones rescued after antibiotic selection, the so-called "filtered clones", are ORFs, and the great majority of them (more than 80%) derive from real genes9. Moreover, the power of this strategy lies in the finding that all filtered ORF clones encode correctly folded, soluble proteins or domains15. Because the many clones in the library that map to the same region or domain have different start and end points, the minimum fragments likely to yield soluble products can be identified in a single, unbiased step.
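The in-frame requirement can be expressed as a simple sequence check. The function below is a hypothetical, simplified model of the genetic filter: it assumes that the cloning site places the first base of the insert on a codon boundary and it ignores the junction codons created at the vector-insert boundaries. It captures only the frame and stop-codon criteria; correct folding of the fusion protein, which the β-lactamase reporter also requires, cannot be predicted this way.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def passes_frame_filter(insert: str) -> bool:
    """Hypothetical model of the frame requirement for a leader-insert-bla fusion.

    Assumes the insert's first base starts a codon and ignores junction codons.
    Returns True only if the insert preserves the downstream frame and has no stop.
    """
    seq = insert.upper()
    # A length that is not a multiple of 3 shifts beta-lactamase out of frame.
    if not seq or len(seq) % 3 != 0:
        return False
    # Any in-frame stop codon truncates the fusion before beta-lactamase.
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    print(passes_frame_filter("ATGGCTGCA"))  # in frame, no stop
    print(passes_frame_filter("ATGTAAGCA"))  # TAA stop in frame
    print(passes_frame_filter("ATGGC"))      # frameshift
```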
NGS characterization of the library further improves the technology. Combined with the dedicated web tool for data analysis, it provides unbiased information on the exact nucleotide sequences of the selected ORFs and on their location on the reference DNA under study, without further extensive analysis or experimental effort.
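Once reads have been mapped to the reference, overlapping filtered fragments can be grouped and the region shared by all of them taken as a candidate minimal domain. The sketch below is one possible implementation of that idea, not the algorithm of the Interactome-Seq web tool; `cluster_fragments` and `shared_core` are hypothetical names, and fragments are assumed to be given as 1-based, inclusive (start, end) coordinates on the reference.

```python
from typing import Optional

Interval = tuple[int, int]  # (start, end), 1-based inclusive reference coordinates


def cluster_fragments(fragments: list[Interval]) -> list[list[Interval]]:
    """Group fragments that overlap, directly or through a chain, on the reference."""
    clusters: list[list[Interval]] = []
    cluster_end = 0
    for start, end in sorted(fragments):
        if clusters and start <= cluster_end:
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


def shared_core(cluster: list[Interval]) -> Optional[Interval]:
    """Region covered by every fragment in the cluster, or None if there is none."""
    start = max(s for s, _ in cluster)
    end = min(e for _, e in cluster)
    return (start, end) if start <= end else None


if __name__ == "__main__":
    mapped = [(100, 400), (150, 380), (120, 450), (900, 1200)]
    for group in cluster_fragments(mapped):
        print(group, "->", shared_core(group))
```

A cluster whose fragments only overlap in a chain has no region common to all of them, so `shared_core` returns None; such clusters would need a looser criterion, for example the most frequently covered region.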
Domainome libraries can be transferred into a selection context and used as a universal tool for functional studies. The high-throughput protein expression and interaction analysis platform that we developed, which we named Interactome-Seq, exploits phage display by transferring the filtered ORFs into a phagemid vector to create a phage-ORF library. Once re-cloned into a phage display context, protein domains are displayed on the surface of M13 particles; in this way, domainome libraries can be selected directly for gene fragments encoding domains with specific enzymatic activities or binding properties, enabling interactome network profiling. This approach was first described by Zacchi et al.16 and later used in several other contexts13,17,18.
Compared to other technologies used to study protein-protein interactions (including the yeast two-hybrid system and mass spectrometry19,20), a major advantage is the amplification of binding clones that occurs over the multiple rounds of phage display selection. This increases the sensitivity of selection, allowing the identification of binding domains present at low abundance in the library. The efficiency of selection with the ORF-filtered library is further increased by the absence of non-functional clones. Finally, the technology allows selections to be performed against both protein and non-protein baits21,22,23,24,25.
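The gain in sensitivity from repeated rounds can be illustrated with an idealized two-population model (an assumption made here for illustration): binder phage are retained `selectivity` times more efficiently than background phage in every round, and re-amplification does not change their ratio. A binder starting at frequency f0 then reaches f0·s^k / (f0·s^k + 1 - f0) after k rounds.

```python
def binder_fraction(f0: float, selectivity: float, rounds: int) -> float:
    """Fraction of binder phage after `rounds` selection rounds.

    Idealized two-population model (assumption): each round retains binders
    `selectivity` times more efficiently than background phage, and
    re-amplification preserves the ratio between the two populations.
    """
    binders = f0 * selectivity ** rounds
    background = 1.0 - f0
    return binders / (binders + background)


if __name__ == "__main__":
    for k in range(5):
        print(k, f"{binder_fraction(1e-6, 100, k):.3g}")
```

For example, a clone present at one in a million that is retained 100 times better than the background reaches about half of the population after three rounds in this model. Real selections also involve amplification bias and non-specific binders, which the model ignores.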
Selections with the domainome-phage library can use as bait the antibodies from sera of patients with different pathological conditions, e.g., autoimmune diseases13, cancer, or infectious diseases. This approach yields the so-called "antibody signature" of the disease under study, allowing the large-scale, simultaneous identification and characterization of the antigens and epitopes specifically recognized by the patients' antibodies. Compared to other methods, phage display allows the identification of both linear and conformational antigenic epitopes. Identifying a specific signature could have an important impact on understanding pathogenesis, designing new vaccines, identifying new therapeutic targets, and developing new, specific diagnostic and prognostic tools. Moreover, in studies of infectious diseases, a major advantage is that immunogenic proteins can be discovered without cultivating the pathogen.
Our approach confirms that folding reporters can be used on a genomic scale to select the "domainome": a collection of correctly folded, well-expressed, soluble protein domains from the coding portion of the DNA and/or cDNA of any organism. Once isolated, the protein fragments are useful for many purposes, providing essential experimental information for gene annotation as well as for structural studies, antibody epitope mapping, antigen identification, etc. The completeness of the high-throughput data provided by NGS enables the analysis of highly complex samples, such as phage display libraries, and can circumvent the traditional, laborious picking and testing of individual rescued phage clones.
At the same time, thanks to the features of the filtered library and the high sensitivity and power of NGS analysis, the protein domain responsible for each interaction can be identified directly in the initial screen, without creating additional libraries for each bound protein. NGS provides a comprehensive definition of the whole domainome of any genic or genomic starting source, and the data analysis web tool enables a highly specific qualitative and quantitative characterization of the interactome protein domains.
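A simple quantitative readout is the fold enrichment of each fragment between the unselected library and a selection output, computed from normalized NGS read counts. The sketch below is a generic approach with a pseudocount to avoid division by zero; it is not the scoring used by the Interactome-Seq web tool, and the `gene:start-end` fragment identifiers are hypothetical.

```python
from collections.abc import Mapping


def fold_enrichment(
    input_counts: Mapping[str, int],
    output_counts: Mapping[str, int],
    pseudocount: float = 1.0,
) -> dict[str, float]:
    """Fold enrichment (output frequency / input frequency) for every fragment.

    A pseudocount is added to every fragment so fragments absent from one sample
    do not cause division by zero. Results are sorted from most to least enriched.
    """
    fragments = set(input_counts) | set(output_counts)
    n_in = sum(input_counts.values()) + pseudocount * len(fragments)
    n_out = sum(output_counts.values()) + pseudocount * len(fragments)
    scores = {}
    for frag in fragments:
        f_in = (input_counts.get(frag, 0) + pseudocount) / n_in
        f_out = (output_counts.get(frag, 0) + pseudocount) / n_out
        scores[frag] = f_out / f_in
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    library = {"geneA:10-300": 100, "geneB:5-200": 100, "geneC:1-150": 100}
    selected = {"geneA:10-300": 900, "geneB:5-200": 50, "geneC:1-150": 50}
    for frag, score in fold_enrichment(library, selected).items():
        print(f"{frag}\t{score:.2f}")
```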
1. Construction of the ORF Library (Figure 1)
2. Subcloning of Filtered ORFs in a Phagemid Vector (Figure 2)
3. Phage Library Preparation and Selection Procedure
4. Phage Library Deep Sequencing Platform (Figure 3)
5. Bioinformatic Data Analysis by Using the Interactome-Seq Web Tool
The filtering approach is schematized in Figure 1. Any intronless DNA can be used. Figure 1A shows the first part of the filtering approach: when loaded on an agarose gel or a Bioanalyzer, well-fragmented DNA appears as a smear of fragments with lengths in the desired 150-750 bp range. A representative virtual gel image of the fragmented DNA is shown. Fragmen...
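Fragmentation quality can also be summarized numerically as the fraction of fragments falling inside the target size window. The helper below is a hypothetical sketch that assumes a list of fragment lengths (for example, from a sizing export or from mapped NGS inserts); it counts fragments rather than DNA mass, so it will not exactly match a fluorescence-weighted Bioanalyzer trace.

```python
from collections.abc import Iterable

# 150-750 bp corresponds to 50-250 codons, i.e. polypeptides of roughly 50-250 residues.
TARGET_MIN_BP = 150
TARGET_MAX_BP = 750


def fraction_in_window(
    lengths: Iterable[int], lo: int = TARGET_MIN_BP, hi: int = TARGET_MAX_BP
) -> float:
    """Fraction of fragments (by count, not mass) with lo <= length <= hi."""
    values = list(lengths)
    if not values:
        return 0.0
    return sum(lo <= n <= hi for n in values) / len(values)


if __name__ == "__main__":
    print(fraction_in_window([100, 150, 400, 750, 900]))
```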
Creating a high-quality, highly diverse filtered ORF library is the first critical step of the whole procedure, since it affects all subsequent steps of the pipeline.
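One way to quantify library diversity from NGS data is to count unique fragments and compute the Shannon entropy H of their read frequencies; exp(H) then gives an "effective number" of equally abundant clones. This index is borrowed from ecology as an illustrative assumption; this section does not specify which diversity metric the authors use.

```python
import math
from collections.abc import Mapping


def diversity_summary(counts: Mapping[str, int]) -> dict[str, float]:
    """Unique fragments, Shannon entropy H (nats) and effective clone number exp(H)."""
    total = sum(counts.values())
    if total == 0:
        return {"unique": 0, "shannon": 0.0, "effective": 0.0}
    freqs = [c / total for c in counts.values() if c > 0]
    h = -sum(p * math.log(p) for p in freqs)
    return {"unique": len(freqs), "shannon": h, "effective": math.exp(h)}


if __name__ == "__main__":
    even = {"f1": 25, "f2": 25, "f3": 25, "f4": 25}
    skewed = {"f1": 97, "f2": 1, "f3": 1, "f4": 1}
    print(diversity_summary(even))
    print(diversity_summary(skewed))
```

Both example libraries contain four unique fragments, but the skewed one has an effective diversity close to a single clone, which is the kind of imbalance a simple count of unique sequences would miss.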
An important advantage of our method is that any source of intronless DNA (cDNA, genomic DNA, PCR-derived or synthetic DNA) is suitable for library construction. The first parameter that should be taken into account is that the length of the DNA fragments cloned into the pFILTER vector should provide a ...
The authors have nothing to disclose.
This work was supported by a grant from the Italian Ministry of Education and University (2010P3S8BR_002 to CP).
Name | Company | Catalog Number | Comments |
Sonopuls ultrasonic homogenizer | Bandelin | HD2070 | or equivalent |
GeneRuler 100 bp Plus DNA Ladder | Thermo Scientific | SM0321 | or equivalent |
GeneRuler 1 kb DNA Ladder | Thermo Fisher Scientific | SM0311 | or equivalent |
Molecular Biology Agarose | BioRad | 161-3102 | or equivalent |
Green Gel Plus | Fisher Molecular Biology | FS-GEL01 | or equivalent |
6x DNA Loading Dye | Thermo Fisher Scientific | R0611 | or equivalent |
QIAquick Gel Extraction Kit | Qiagen | 28704 | or equivalent |
Quick Blunting Kit | New England Biolabs | E1201S | |
NanoDrop 2000 UV-Vis Spectrophotometer | Thermo Fisher Scientific | ND-2000 | |
High-Capacity cDNA Reverse Transcription Kit | Thermo Fisher Scientific | 4368813 | |
Streptavidin Magnetic Beads | New England Biolabs | S1420S | or equivalent |
QIAquick PCR purification Kit | Qiagen | 28104 | or equivalent |
EcoRV | New England Biolabs | R0195L | |
Antarctic Phosphatase | New England Biolabs | M0289S | |
T4 DNA Ligase | New England Biolabs | M0202T | |
Sodium acetate 3 M, pH 5.2 | general lab supplier | ||
Ethanol for molecular biology | Sigma-Aldrich | E7023 | or equivalent |
DH5αF' bacteria cells | Thermo Fisher Scientific | ||
0.2 ml tubes | general lab supplier | ||
1.5 ml tubes | general lab supplier | ||
0.1 cm electroporation cuvettes | Biosigma | 4905020 | |
Electroporator 2510 | Eppendorf | ||
2x YT medium | Sigma-Aldrich | Y1003 | |
Ampicillin sodium salt | Sigma-Aldrich | A9518 | |
Chloramphenicol | Sigma-Aldrich | C0378 | |
DreamTaq DNA Polymerase | Thermo Fisher Scientific | EP0702 | |
Deoxynucleotide (dNTP) Solution Mix | New England Biolabs | N0447S | |
96-well thermal cycler (with heated lid) | general lab supplier | ||
150 mm plates | general lab supplier | ||
100 mm plates | general lab supplier | ||
Glycerol | Sigma-Aldrich | G5516 | |
BssHII | New England Biolabs | R0199L | |
NheI | New England Biolabs | R0131L | |
QIAprep Spin Miniprep Kit | Qiagen | 27104 | or equivalent |
M13KO7 Helper Phage | GE Healthcare Life Sciences | 27-1524-01 | |
Kanamycin sulfate from Streptomyces kanamyceticus | Sigma-Aldrich | K1377 | |
Polyethylene glycol (PEG) | Sigma-Aldrich | P5413 | |
Sodium Chloride (NaCl) | Sigma-Aldrich | S3014 | |
PBS | general lab supplier | ||
Dynabeads Protein G for Immunoprecipitation | Thermo Fisher Scientific | 10003D | or equivalent |
MagnaRack Magnetic Separation Rack | Thermo Fisher Scientific | CS15000 | or equivalent |
Tween 20 | Sigma-Aldrich | P1379 | |
Nonfat dried milk powder | EuroClone | EMR180500 | |
KAPA HiFi HotStart ReadyMix | Kapa Biosystems, Fisher Scientific | 7958935001 | |
AMPure XP beads | Agencourt, Beckman Coulter | A63881 | |
Nextera XT dual Index Primers | Illumina | FC-131-2001 or FC-131-2002 or FC-131-2003 or FC-131-2004 | |
MiSeq or HiSeq 2500 | Illumina | ||
Spectrophotometer | NanoDrop | ||
Agilent Bioanalyzer or TapeStation | Agilent | ||
Forward PCR primer | general lab supplier | 5’ TACCTATTGCCTACGGCAGCCGCTGGATTGTTATTACTC 3’ | |
Reverse PCR primer | general lab supplier | 5’ TGGTGATGGTGAGTACTATCCAGGCCCAGCAGTGGGTTTG 3’ | |
Forward primer for NGS | general lab supplier | 5’ TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCAGCAAGCGGCGCGCATGC 3’ | |
Reverse primer for NGS | general lab supplier | 5’ GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGGATTGGTTTGCCGCTAGC 3’ | |
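The NGS primers in the table appear to consist of the standard Illumina adapter overhang sequences used with Nextera XT index PCR, followed by a template-specific part. The sketch below splits each primer accordingly and reports which of the subcloning sites listed in the table (BssHII, GCGCGC; NheI, GCTAGC) occur in the template-specific part. The overhang sequences are Illumina's published amplicon adapter overhangs; the function `split_ngs_primer` is hypothetical.

```python
# Illumina adapter overhangs found at the 5' ends of the NGS primers in the table above.
ILLUMINA_OVERHANGS = {
    "forward": "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG",
    "reverse": "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG",
}
# Both recognition sites are palindromic, so one string matches either strand.
SUBCLONING_SITES = {"BssHII": "GCGCGC", "NheI": "GCTAGC"}


def split_ngs_primer(primer: str, direction: str) -> tuple[str, list[str]]:
    """Return the template-specific part of an NGS primer and the sites it contains."""
    seq = primer.replace(" ", "").upper()
    overhang = ILLUMINA_OVERHANGS[direction]
    if not seq.startswith(overhang):
        raise ValueError(f"{direction} primer does not start with the expected overhang")
    specific = seq[len(overhang):]
    sites = [name for name, site in SUBCLONING_SITES.items() if site in specific]
    return specific, sites


if __name__ == "__main__":
    fwd = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCAGCAAGCGGCGCGCATGC"
    rev = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGGATTGGTTTGCCGCTAGC"
    print(split_ngs_primer(fwd, "forward"))
    print(split_ngs_primer(rev, "reverse"))
```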
Copyright © 2025 MyJoVE Corporation. All rights reserved