The C-type lectin-like domain (CTLD) is a protein domain family which has been identified in a number of proteins isolated from many animal species (reviewed in Drickamer and Taylor (1993) and Drickamer (1999)). Initially, the CTLD was identified as a domain common to the so-called C-type lectins (calcium-dependent carbohydrate binding proteins) and named “Carbohydrate Recognition Domain” (“CRD”). More recently, it has become evident that this domain is shared among many eukaryotic proteins, several of which do not bind sugar moieties, and hence the canonical domain has been named the CTLD.
CTLDs have been reported to bind a wide diversity of compounds, including carbohydrates, lipids, proteins, and even ice [Aspberg et al. (1997), Bettler et al. (1992), Ewart et al. (1998), Graversen et al. (1998), Mizuno et al. (1997), Sano et al. (1998), and Tormo et al. (1999)]. Some proteins contain only one copy of the CTLD, whereas others contain two or more copies of the domain. In the physiologically functional unit, multiplicity in the number of CTLDs is often achieved by assembling single-copy protein protomers into larger structures.
The CTLD consists of approximately 120 amino acid residues and, characteristically, contains two or three intra-chain disulfide bridges. Although the similarity at the amino acid sequence level between CTLDs from different proteins is relatively low, the 3D-structures of a number of CTLDs have been found to be highly conserved, with the structural variability essentially confined to a so-called loop-region, often defined by up to five loops. Several CTLDs contain either one or two binding sites for calcium and most of the side chains which interact with calcium are located in the loop-region.
On the basis of CTLDs for which 3D structural information is available, it has been inferred that the canonical CTLD is structurally characterised by seven main secondary-structure elements (i.e. five β-strands and two α-helices) sequentially appearing in the order β1; α1; α2; β2; β3; β4; and β5 (FIG. 1, and references given therein). In all CTLDs for which 3D structures have been determined, the β-strands are arranged in two anti-parallel β-sheets, one composed of β1 and β5, the other composed of β2, β3 and β4. An additional β-strand, β0, often precedes β1 in the sequence and, where present, forms an additional strand integrating with the β1, β5-sheet. Further, two disulfide bridges, one connecting α1 and β5 (CI-CIV, FIG. 1) and one connecting β3 and the polypeptide segment connecting β4 and β5 (CII-CIII, FIG. 1), are invariantly found in all CTLDs characterised so far. In the CTLD 3D-structure, these conserved secondary structure elements form a compact scaffold for a number of loops, which in the present context are collectively referred to as the “loop-region”, protruding out from the core. In the primary structure of the CTLDs, these loops are organised in two segments, loop segment A (LSA) and loop segment B (LSB). LSA represents the long polypeptide segment connecting β2 and β3, which often lacks regular secondary structure and contains up to four loops. LSB represents the polypeptide segment connecting the β-strands β3 and β4. Residues in LSA, together with single residues in β4, have been shown to specify the Ca2+- and ligand-binding sites of several CTLDs, including that of tetranectin. For example, mutagenesis studies involving substitution of one or a few residues have shown that changes in binding specificity, Ca2+-sensitivity and/or affinity can be accommodated by CTLDs [Weis and Drickamer (1996), Chiba et al. (1999), Graversen et al. (2000)].
As noted above, overall sequence similarities between CTLDs are often limited, as assessed e.g. by aligning a prospective CTLD sequence with the group of structure-characterized CTLDs presented in FIG. 1, using sequence alignment procedures and analysis tools in common use in the field of protein science. In such an alignment, typically 22-30% of the residues of the prospective CTLD will be identical with the corresponding residue in at least one of the structure-characterized CTLDs. The sequence alignment shown in FIG. 1 was strictly elucidated from actual 3D structure data, so the fact that the polypeptide segments of corresponding structural elements of the framework also exhibit strong sequence similarities provides a set of direct sequence-structure signatures, which can readily be inferred from the sequence alignment.
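The identity measure described above (residues of the prospective CTLD that are identical with the corresponding residue in at least one structure-characterized CTLD) can be illustrated with a minimal Python sketch. The function name and the toy alignment are hypothetical and are not taken from FIG. 1; the sketch assumes that all sequences come from one alignment, have equal length, and use "-" for gaps.

```python
def fraction_identical_to_any(query, references):
    """Fraction of non-gap query residues that are identical to the residue
    in the same alignment column of at least one reference sequence."""
    if any(len(ref) != len(query) for ref in references):
        raise ValueError("all sequences must have the same aligned length")
    residues = 0
    matches = 0
    for column, residue in enumerate(query):
        if residue == "-":
            continue  # gaps in the query are not counted as residues
        residues += 1
        if any(ref[column] == residue for ref in references):
            matches += 1
    return matches / residues if residues else 0.0


# Hypothetical toy alignment, for illustration only.
toy_query = "WIGL-DKE"
toy_references = ["WLGLNDMA", "YLSM-DIS"]
toy_fraction = fraction_identical_to_any(toy_query, toy_references)
print(f"{toy_fraction:.1%} of query residues match at least one reference")
```

A real assessment would use a full-length alignment produced by a standard alignment tool; the toy strings only show how the "at least one reference" criterion is counted.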
The implication is that CTLDs for which precise 3D structural information is not yet available can nonetheless also be used as frameworks in the construction of new classes of CTLD libraries. The specific additional steps involved in preparing starting materials for the construction of such a new class of CTLD library on the basis of a CTLD for which no precise 3D structure is available would be the following: (1) Alignment of the sequence of the new CTLD with the sequences shown in FIG. 1; and (2) Assignment of approximate locations of framework structural elements as guided by the sequence alignment, observing any requirement for minor adjustment of the alignment to ensure precise alignment of the four canonical cysteine residues involved in the formation of the two conserved disulfide bridges (CI-CIV and CII-CIII, in FIG. 1). The main objective of these steps would be to identify the sequence location of the loop-region of the new CTLD, as flanked in the sequence by segments corresponding to the β2-, β3-, and β4-strands. To provide further guidance in this respect, the results of an analysis of the sequences of 29 bona fide CTLDs are given in Table 1 below in the form of typical tetrapeptide sequences, and their consensus sequences, found as parts of CTLD β2- and β3-strands, and the precise location of the β4-strand by position and sequence characteristics as elucidated.
TABLE 1
β2 and β3 consensus elements analysis

CTLD      SEQ ID NO  β2 --- LSA --- β3  LSB  β4
IX-A      92         WIGLRW---QGKVKQCNSEWSDGSSVS--YENWIE--------AESKT-----------CLGLEKETDFRKWVNIYC
MGL       93         WIGLTDQ--NGP--WRWVDGTDFEKGFKNWAP--------LQPDNWFGHGLGGGEDCAHITTG--GFWNDDVC
LIT       94         WIGLHDPKKNRR--WHWSSGSLVS--YKSWGI--------GAPSSVNP-----GY-CVSLTSSTGFQKWKDVPC
CHL       95         WIGLTDENQEGE--WQWVDGTDTRSSFTFWKE--------GEPNNRGF-----NEDCAHVWTS--GQWNDVYC
IGE-FCR   96         WIGLRNLDLKGEFIWV--DGSHVD--YSNWAP--------GEPTSRSQ-----GEDCVMMRGS--GRWNDAFC
TCL-1     97         WIGLTDKDSEGT--WKWVDGTPLT--TAFWST--------DEPNDGAVN----GEDCVSLYYHTQPEFKNWNDLAC
KUCR      98         WIGLTDQGTEGN--WRWVDGTPFDYVQSRRFWRK--------GQPDWRHGNGE--REDCVHLQ----RMWNDMAC
CD94      99         WIGLSYSEEHTA--WLWENGSALSQ-YLSFET------------FNTKN-------CIAYNPN--GNALDESC
CPCP      100        WIGLNDRTIEGDFRWS--DGHPMQ--FENWRP--------NQPDNFFAA----GEDCVVMIWHEKGEWNDVPC
PAP       101        WIGLHDPTQGTEPNGEG-WEWSSSDVMN--YFAWER--------N-PSTISSPGH-----CASLSRSTAFLRWKDYNC
NEU       102        WIGLNDRIVEQD--FQWTDNTGLQ--YENWRE--------NQPDNFFAG----GEDCVVLVSHEIGKWNDVPC
ESL       103        WIGIRKVNNV----WVW-VGTQKPLTEEAKNWAP--------GEPNNRQK-----DEDCVEIYIKREKDVGMWNDERC
NKg2A     104        WIGVFRNSSHHP--WVTMNGLAFKHEIKDSDNA--------------------ELNCAVLQV---NRLKSAQC
GP120     105        WMGLSDLNQEGT--WQWVDGSPLLPS-FKQYWNR--------GEPNNVG------EEDCAEFSGN--G-WNDDKC
MMR       106        WIGLFRNV-EGT--WLWINNSPVS--FVNWNT--------GDPSGE-------RNDCVALHASS-GFWSNIHC
TN        107        WLGLNDMAAEGT----WVDMTGARIAYKNWETEIT-----AQPDGGK------TENCAVLSGAANGKWFDKRC
SCGF      108        WLGVHDRRAEGL--YLFENGQRVS--FFAWHRSPRPELGAQPSASPHPLSPDQPNGGT------LENCVAQASDD-GSWWDHDC
PLC       109        WLGASDLNIEGR--WLW-EGQRRMN-YTNWSP--------GQPDNAGG-----IEHCLELRRDLGNYLWNDYQC
H1-ASR    110        WMGLHD--QNGP--WKWVDGTDYETGFKNWRP--------EQPDDWYGHGLGGGEDCAHFTDD--GRWNDDVC
IX-B      111        WMGLSNVWNQCN--WQWSNAAMLR--YKAWAE--------ESY-------------CVYFKSTN-NKWRSRAC
LY49A     112        WVGLSYDNKKKD--WAWIDNRPSKLALNTRKY--------NIRDGG----------CMLLSKT----RLDNGNC
TU14      113        WVGADN-LQDGAYNFNWNDGVSLPTDSDLWSP--------NEPSNPQSWQL-----CVQIWSKY-NLLDDVGC
rSP-A     114        YLGMIEDQTPGD--FHYLDGASVN--YTNWYP--------GEPRGQG------KEKCVEMYTD--GTWNDRGC
BCON      115        YLSMNDISTEGR--FTYPTGEILV--YSNWAD--------GEPNNSDEGQ---PENCVEIFPD--GKWNDVPC
BCL43     116        YLSMNDISKEGK--FTYPTGGSLD--YSNWAP--------GEPNNRAKDEG--PENCLEIYSD--GNWNDIEC
MBP-A     117        FLGITDEVTEGQ--FMYVTGGRLT--YSNWKK--------DEPNDHGS-----GEDCVTIVDN--GLWNDISC
SP-D      118        FLSMTDSKTEGK--FTYPTGESLV--YSNWAP--------GEPNDDGG-----SEDCVEIFTN--GKWNDRAC
CL-L1     119        FIGVNDLEREGQ--YMFTDNTPLQN-YSNWNE--------GEPSDPYG-----HEDCVEMLSS--GRWNDTEC
DCIR      120        FVGLSDP--EGQRHWQWVDQTP----YNESSTFWHP--------REPSDPN-------ERCVVLNFRKSPKRWG-WNDVNC

Notes: LSA, Loop Segment A; LSB, Loop Segment B. Sequences taken from: Berglund and Petersen (1992) [TN, tetranectin]; Bartrand et al. (1996) [LIT, lithostatin]; Mann et al. (2000) [MGL, mouse macrophage galactose lectin, KUCR, Kupffer cell receptor, NEU, chicken neurocan, PLC, perlucin, H1-ASR, asialoglycoprotein receptor]; Mio et al. (1998) [CPCP, cartilage proteoglycan core protein, IGE-FCR, IgE Fc receptor, PAP, pancreatitis-associated protein, MMR, mouse macrophage receptor, NKG2, Natural Killer group, SCGF, stem cell growth factor]; Mizuno et al. (1997) [IX-A and B, factor IX/X binding protein, MBP, mannose binding protein]; Obtani et al. (1999) [BCON, bovine conglutinin, BCL43, bovine CL43, CL-L1, collectin liver 1, SP-A, surfactant protein A, SP-D, surfactant protein D]; Poget et al. (1999) [ESL, e-selectin, TU14, tunicate c-type lectin]; Tormo et al. (1999) [CD94, CD94 NK receptor domain, LY49A, LY49A NK receptor domain]; Zhang et al. (2000) [CHL, chicken hepatic lectin, TCL-1, trout c-type lectin, GP120, HIV gp 120-binding c-type lectin, DCIR, dendritic cell immuno receptor]
Of the 29 β2-strands,
14 were found to conform to the consensus sequence WIGX [SEQ ID NO: 305] (of which 12 were WIGL [SEQ ID NO: 306] sequences, 1 was a WIGI [SEQ ID NO: 307] sequence and 1 was a WIGV [SEQ ID NO: 308] sequence);
3 were found to conform to the consensus sequence WLGX [SEQ ID NO: 309] (of which 1 was a WLGL [SEQ ID NO: 310] sequence, 1 was a WLGV [SEQ ID NO: 311] sequence and 1 was a WLGA [SEQ ID NO: 312] sequence);
3 were found to be WMGL [SEQ ID NO: 313] sequences;
3 were found to conform to the consensus sequence YLXM [SEQ ID NO: 314] (of which 2 were YLSM [SEQ ID NO: 315] sequences and 1 was a YLGM [SEQ ID NO: 316] sequence);
2 were found to conform to the consensus sequence WVGX [SEQ ID NO: 317] (of which 1 was a WVGL [SEQ ID NO: 318] sequence and 1 was a WVGA [SEQ ID NO: 319] sequence); and
the sequences of the remaining 4 β2-strands in the collection were FLGI [SEQ ID NO: 320], FVGL [SEQ ID NO: 321], FIGV [SEQ ID NO: 322] and FLSM [SEQ ID NO: 323] sequences, respectively.
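The tally above can be reproduced with a short Python sketch. The dictionary below holds the 29 β2 tetrapeptides enumerated above, keyed by the CTLD names of Table 1; the pairing of each tetrapeptide with a CTLD name assumes that the β2 tetrapeptide is the first four residues of the corresponding aligned sequence, and the variable names are illustrative only.

```python
from collections import Counter

# β2 tetrapeptides of the 29 CTLDs analysed (assumed to be the first four
# residues of each aligned sequence in Table 1).
BETA2 = {
    "IX-A": "WIGL", "MGL": "WIGL", "LIT": "WIGL", "CHL": "WIGL", "IGE-FCR": "WIGL",
    "TCL-1": "WIGL", "KUCR": "WIGL", "CD94": "WIGL", "CPCP": "WIGL", "PAP": "WIGL",
    "NEU": "WIGL", "ESL": "WIGI", "NKg2A": "WIGV", "GP120": "WMGL", "MMR": "WIGL",
    "TN": "WLGL", "SCGF": "WLGV", "PLC": "WLGA", "H1-ASR": "WMGL", "IX-B": "WMGL",
    "LY49A": "WVGL", "TU14": "WVGA", "rSP-A": "YLGM", "BCON": "YLSM", "BCL43": "YLSM",
    "MBP-A": "FLGI", "SP-D": "FLSM", "CL-L1": "FIGV", "DCIR": "FVGL",
}

# How often each distinct tetrapeptide occurs.
tetrapeptide_counts = Counter(BETA2.values())

# Residue frequencies at each of the four positions of the β2 tetrapeptide.
position_counts = [Counter(motif[i] for motif in BETA2.values()) for i in range(4)]

for motif, count in tetrapeptide_counts.most_common():
    print(motif, count)
for i, counts in enumerate(position_counts, start=1):
    print(f"Residue {i}:", dict(counts.most_common()))
```

The per-position frequencies printed by the second loop are the kind of counts from which a position-wise consensus can be read off.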
Therefore, it is concluded that the four-residue β2 consensus sequence (“β2cseq”) may be specified as follows:
Residue 1: An aromatic residue, most preferably Trp, less preferably Phe and least preferably Tyr.
Residue 2: An aliphatic or non-polar residue, most preferably Ile, less preferably Leu or Met and least preferably Val.
Residue 3: An aliphatic or hydrophilic residue, most preferably Gly and least preferably Ser.
Residue 4: An aliphatic or non-polar residue, most preferably Leu and less preferably Met, Val or Ile.
Accordingly the β2 consensus sequence may be summarized as follows:
β2cseq: (W, Y, F)-(I, L, V, M)-(G, S)-(L, M, V, I),
where the first residue listed at each position (W, I, G and L, respectively) is the most commonly found residue at that sequence position.
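As an illustration of how β2cseq might be used to locate candidate β2 positions in a new CTLD sequence, the consensus can be written as a regular expression with one character class per position. This is a minimal sketch and not a substitute for the alignment procedure described above; the function name is hypothetical, and the example sequence is the tetranectin (TN) row of Table 1 with alignment gaps removed.

```python
import re

# β2cseq with one character class per residue position.
BETA2_CSEQ = "[WYF][ILVM][GS][LMVI]"
# A lookahead is used so that overlapping candidates are reported as well.
_SCAN = re.compile(f"(?=({BETA2_CSEQ}))")


def find_beta2_candidates(sequence):
    """Return (0-based start, tetrapeptide) pairs for every β2cseq match.

    Alignment gaps ("-") are removed first, so the positions refer to the
    gap-free sequence.
    """
    gapless = sequence.replace("-", "")
    return [(m.start(), m.group(1)) for m in _SCAN.finditer(gapless)]


# TN row of Table 1 (aligned form, gaps are stripped by the function).
TN_ALIGNED = "WLGLNDMAAEGT----WVDMTGARIAYKNWETEIT-----AQPDGGK------TENCAVLSGAANGKWFDKRC"
tn_hits = find_beta2_candidates(TN_ALIGNED)
print(tn_hits)
```

Taken literally, the summary does not list Ala at residue 4, so a pattern built this way would not report tetrapeptides such as WLGA or WVGA; a candidate found by pattern matching would in any case need to be confirmed against the alignment.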
All 29 β3-strands analysed are initiated with the CysII residue canonical for all known CTLD sequences, and of the 29 β3-strands,
5 were found to conform to the consensus sequence CVXI [SEQ ID NO: 324] (of which 3 were CVEI [SEQ ID NO: 325] sequences, 1 was a CVTI [SEQ ID NO: 326] sequence and 1 was a CVQI [SEQ ID NO: 327] sequence);
4 were found to conform to the consensus sequence CVXM [SEQ ID NO: 328] (of which 2 were CVEM [SEQ ID NO: 329] sequences, 1 was a CVVM [SEQ ID NO: 330] sequence and 1 was a CVMM [SEQ ID NO: 331] sequence);
6 were found to conform to the consensus sequence CVXL [SEQ ID NO: 332] (of which 2 were CVVL [SEQ ID NO: 333] sequences, 2 were CVSL [SEQ ID NO: 334] sequences, 1 was a CVHL [SEQ ID NO: 335] sequence and 1 was a CVAL [SEQ ID NO: 336] sequence);
3 were found to conform to the consensus sequence CAXL [SEQ ID NO: 337] (of which 2 were CAVL [SEQ ID NO: 338] sequences and 1 was a CASL [SEQ ID NO: 339] sequence);
2 were found to conform to the consensus sequence CAXF [SEQ ID NO: 340] (of which 1 was a CAHF [SEQ ID NO: 341] sequence and 1 was a CAEF [SEQ ID NO: 342] sequence);
2 were found to conform to the consensus sequence CLXL [SEQ ID NO: 343] (of which 1 was a CLEL [SEQ ID NO: 344] sequence and 1 was a CLGL [SEQ ID NO: 345] sequence); and
the sequences of the remaining 7 β3-strands in the collection were CVYF [SEQ ID NO: 346], CVAQ [SEQ ID NO: 347], CAHV [SEQ ID NO: 348], CAHI [SEQ ID NO: 349], CLEI [SEQ ID NO: 350], CIAY [SEQ ID NO: 351], and CMLL [SEQ ID NO: 352] sequences, respectively.
Therefore, it is concluded that the four-residue β3 consensus sequence (“β3cseq”) may be specified as follows:
Residue 1: Cys, being the canonical CysII residue of CTLDs.
Residue 2: An aliphatic or non-polar residue, most preferably Val, less preferably Ala or Leu and least preferably Ile or Met.
Residue 3: Most commonly an aliphatic or charged residue, which most preferably is Glu.
Residue 4: Most commonly an aliphatic, non-polar, or aromatic residue, most preferably Leu or Ile, less preferably Met or Phe and least preferably Tyr or Val.
Accordingly the β3 consensus sequence may be summarized as follows:
β3cseq: (C)-(V, A, L, I, M)-(E, X)-(L, I, M, F, Y, V),
where the first residue listed at each position (C, V, E and L, respectively) is the most commonly found residue at that sequence position.
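In the same way, β3cseq can be sketched as a regular expression and checked against the distinct β3 tetrapeptides enumerated above. The sketch assumes that X at residue 3 stands for any amino acid (written here as any upper-case letter); the names are illustrative only.

```python
import re

# β3cseq: Cys, then one of V/A/L/I/M, then any residue, then one of L/I/M/F/Y/V.
BETA3_CSEQ = re.compile(r"C[VALIM][A-Z][LIMFYV]")

# The 23 distinct β3 tetrapeptides enumerated in the analysis above.
BETA3_TETRAPEPTIDES = [
    "CVEI", "CVTI", "CVQI",
    "CVEM", "CVVM", "CVMM",
    "CVVL", "CVSL", "CVHL", "CVAL",
    "CAVL", "CASL",
    "CAHF", "CAEF",
    "CLEL", "CLGL",
    "CVYF", "CVAQ", "CAHV", "CAHI", "CLEI", "CIAY", "CMLL",
]

matching = [t for t in BETA3_TETRAPEPTIDES if BETA3_CSEQ.fullmatch(t)]
non_matching = [t for t in BETA3_TETRAPEPTIDES if not BETA3_CSEQ.fullmatch(t)]
print(len(matching), "of", len(BETA3_TETRAPEPTIDES), "distinct tetrapeptides match")
print("not matched:", non_matching)
```

Read literally, the summary does not list Gln at residue 4, so CVAQ falls outside this pattern; the consensus is therefore best treated as a guide to typical β3 sequences rather than an exhaustive filter.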
It is observed from the known 3D-structures of CTLDs (FIG. 1) that the β4-strands most often consist of five residues located in the primary structure at positions −6 to −2 relative to the canonical CysIII residue of all known CTLDs, and less often consist of four residues located at positions −5 to −2 relative to CysIII. The residue located at position −3 relative to CysIII is involved in co-ordination of the site 2 calcium ion in CTLDs housing this site. This is reflected in the observation that, of the 29 CTLD sequences analysed in Table 1, 27 have an Asp-residue or an Asn-residue at this position, whereas 2 CTLDs have a Ser at this position. From the known CTLD 3D-structures it is also noted that the residue located at position −5 relative to the CysIII residue is involved in the formation of the hydrophobic core of the CTLD scaffold. This is reflected in the observation that, of the 29 CTLD sequences analysed, 25 have a Trp-residue, 3 have a Leu-residue, and 1 has an Ala-residue at this position. Of the 29 CTLD sequences analysed, 18 have an Asn-residue at position −4. Further, 19 of the 29 β4-strand segments are preceded by a Gly residue.
Of the 29 central three-residue motifs located at positions −5, −4 and −3 relative to the canonical CysIII residue in the β4-strand:
22 were of the sequence WXD (18 were WND, 2 were WKD, 1 was WFD and 1 was WWD),
2 were of the sequence WXN (1 was WVN and 1 was WSN), and
the remaining 5 motifs (WRS, LDD, LDN, LKS and ALD) were each represented once in the analysis.
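The positional definition of the central β4 motif (positions −5, −4 and −3 relative to CysIII) can be made concrete with a small Python sketch. The function names are hypothetical. The example segments are the C-terminal ends of four rows of Table 1, on the assumption that each row of the table ends at the CysIII residue; this assumption is consistent with the motifs listed above.

```python
def beta4_core_motif(sequence, cys3_index):
    """Return the residues at positions -5, -4 and -3 relative to CysIII.

    `sequence` must be gap-free and `cys3_index` is the 0-based index of CysIII.
    """
    if "-" in sequence:
        raise ValueError("remove alignment gaps before indexing")
    if cys3_index < 5 or sequence[cys3_index] != "C":
        raise ValueError("cys3_index must point at a Cys preceded by at least five residues")
    return sequence[cys3_index - 5:cys3_index - 2]


def classify_motif(motif):
    """Assign a three-residue motif to the WXD or WXN class, or 'other'."""
    if motif[0] == "W" and motif[2] == "D":
        return "WXD"
    if motif[0] == "W" and motif[2] == "N":
        return "WXN"
    return "other"


# C-terminal ends of four Table 1 rows, each assumed to end at CysIII.
EXAMPLES = {"TN": "GKWFDKRC", "IX-A": "RKWVNIYC", "CD94": "GNALDESC", "MBP-A": "GLWNDISC"}
results = {}
for name, segment in EXAMPLES.items():
    motif = beta4_core_motif(segment, len(segment) - 1)
    results[name] = (motif, classify_motif(motif))
    print(name, motif, classify_motif(motif))
```

In a full-length sequence, the index of CysIII would first have to be assigned from the alignment, since a CTLD contains several cysteine residues.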
It has now been found that each member of the family of CTLDs represents an attractive opportunity for the construction of new protein libraries from which members with affinity for new ligand targets can be identified and isolated using screening or selection methods. Such libraries may be constructed by combining a CTLD framework structure with one or more randomised polypeptide segments that partially or completely replace the CTLD's loop-region.
One such system, where the protein used as scaffold is tetranectin or the CTLD of tetranectin, is envisaged as a system of particular interest, not least because the stability of the trimeric complex of tetranectin protomers is very high (International Patent Application Publication No. WO 98/56906 A2).
Tetranectin is a trimeric glycoprotein [Holtet et al. (1997), Nielsen et al. (1997)], which has been isolated from human plasma and found to be present in the extracellular matrix in certain tissues. Tetranectin is known to bind calcium, complex polysaccharides, plasminogen, fibrinogen/fibrin, and apolipoprotein (a). The interaction with plasminogen and apolipoprotein (a) is mediated by the so-called kringle 4-protein domain therein. This interaction is known to be sensitive to calcium and to derivatives of the amino acid lysine [Graversen et al. (1998)].
A human tetranectin gene has been characterised, and both human and murine tetranectin cDNA clones have been isolated. Both the human and the murine mature protein comprise 181 amino acid residues (FIG. 2). The 3D-structures of full length recombinant human tetranectin and of the isolated tetranectin CTLD have been determined independently in two separate studies [Nielsen et al. (1997) and Kastrup et al. (1998)]. Tetranectin is a two- or possibly three-domain protein, i.e. the main part of the polypeptide chain comprises the CTLD (amino acid residues Gly53 to Val181), whereas the region Leu26 to Lys52 forms an alpha-helix governing trimerisation of the protein via the formation of a homotrimeric parallel coiled coil. The polypeptide segment Glu1 to Glu25 contains the binding site for complex polysaccharides (Lys6 to Lys15) [Lorentsen et al. (2000)] and appears to contribute to stabilisation of the trimeric structure [Holtet et al. (1997)]. The two amino acid residues Lys148 and Glu150, localised in loop 4, and Asp165 (localised in β4) have been shown to be of critical importance for plasminogen kringle 4 binding, whereas the residues Ile140 (in loop 3) and Lys166 and Arg167 (in β4) have been shown to be of some importance [Graversen et al. (1998)]. Substitution of Thr149 (in loop 4) with an aromatic residue has been shown to significantly increase the affinity of tetranectin for kringle 4 and to increase the affinity for plasminogen kringle 2 to a level comparable to the affinity of wild type tetranectin for kringle 4 [Graversen et al. (2000)].