
Title: Multi-Protein Dynamic Combinatorial Chemistry: A Novel Strategy that Leads to Simultaneous Discovery of Subfamily-Selective Inhibitors for Nucleic Acid Demethylases FTO and ALKBH3

Abstract: Dynamic combinatorial chemistry (DCC) is a powerful supramolecular approach for discovering ligands for biomolecules. To date, most, if not all, biologically templated DCC systems employ only a single biomolecule to direct the self-assembly process. To expand the scope of DCC, we developed a novel multi-protein DCC strategy that combines the discriminatory power of a zwitterionic ‘thermal tag’ with the sensitivity of differential scanning fluorimetry. This strategy is highly sensitive and can differentiate the binding of ligands to structurally similar subfamily members. Through this strategy, we were able to simultaneously identify subfamily-selective probes against two clinically important epigenetic enzymes, FTO (7; IC50 = 2.6 µM) and ALKBH3 (8; IC50 = 3.7 µM). To our knowledge, this is the first report of a subfamily-selective ALKBH3 inhibitor. The developed strategy could, in principle, be adapted to a broad range of proteins and should therefore be of broad scientific interest.

Introduction

Dynamic combinatorial chemistry (DCC) is a powerful supramolecular approach for discovering ligands for biological targets. The idea was first independently conceived and developed by the Sanders and Lehn groups (for excellent reviews, see Refs 1-10).[1-10] In the DCC method, simple building blocks are linked together by reversible covalent chemistry to generate dynamic libraries of structures. Because of the reversible nature of these libraries, they are highly responsive to external influence, such that the introduction of a template triggers rapid structural adaptation of the library members, resulting in the assembly of structures that are highly complementary to the template.

To date, DCC has been successfully applied to a range of biological templates, including enzymes, receptors, transmembrane transporters, nucleotides, and polymer-supported targets.[11-17] However, most, if not all, such biologically templated DCC approaches employ only a single template in directing the self-assembly process; this severely limits the applications of DCC approaches.

To expand the scope and potential of DCC, in this study, we explored the concept of multi-protein DCC, where two or more protein templates were used concurrently in the same dynamic system. We envisaged that such a strategy would enable the discovery of ligands against several proteins of interest simultaneously, thus greatly multiplying the power and efficiency of DCC. Another distinct advantage of the multi-protein DCC approach is that it permits the use of several structurally and/or functionally related protein isoforms in concert. This ensures that only ligands that are highly selective for a particular protein isoform or target will be assembled and identified. Although it is also possible to achieve highly selective inhibition with conventional DCC approaches (for instance, Greaney et al.[18] successfully identified isozyme-selective glutathione S-transferase inhibitors using an acyl hydrazone-based DCC), single-templated DCC generates hits against only one target at a time.

The development of the proposed multi-protein DCC system poses a number of analytical challenges. Not only is there a requirement to analyse several protein-ligand interactions simultaneously, it is also necessary to assign binding of ligands to specific proteins in a complex, multi-component mixture. Furthermore, structures generated by reversible self-assembly systems are constantly exchanging, which further complicates their analysis.

Currently, only a few methods have been reported for the analysis of single-templated DCC.[8,19] These include HPLC analysis,[20,21] NMR studies,[22-24] X-ray crystallography,[25] size-exclusion chromatography (SEC)[26,27] and, more recently, the use of polymer-scaffolded (PS-DCL),[28] DNA-encoded[29-32] or peptide nucleic acid (PNA)-encoded[33,34] DCC libraries. However, these methods only detect amplification of the best binders in the presence of the template and cannot distinguish which template is responsible for the observed amplification; hence they are unsuitable for the proposed multi-protein DCC system. Although it is possible to directly observe different protein-ligand complexes using native protein mass spectrometric (MS) techniques, the results of MS analyses are not always representative of what exists in solution, since different non-covalent complexes survive the transition from solution to gas phase differently.[35] Indeed, our group[14] and that of Poulsen[36] have previously observed fragmentation of protein-ligand complexes under certain MS ionisation conditions. To date, no method exists that allows the analysis of multiple protein-ligand interactions in a dynamic system. This severely limits the potential applications of DCC-based approaches.

Recent studies demonstrated that zwitterionic polymers, in general, are able to confer a wide range of biophysical properties when conjugated to proteins, such as increased water solubility, pH resistance, and antifouling property.[37-40] More recently, Jiang et al. showed that poly(carboxybetaine) polymers and poly(Glu-Lys) polypeptides were able to improve the thermal stability of fusion partner proteins, rendering them more resistant to heat inactivation.[41,42]

Inspired by these interesting observations, herein, we present a novel detection strategy that enables the simultaneous analysis of several protein-ligand complexes in a multi-protein DCC system (Figure 1A). It combines the discriminatory power of a ‘thermal tag’ with the sensitivity of the differential scanning fluorimetry (DSF) technique. In this approach, the proteins of interest are genetically labelled with zwitterionic peptide-based thermal tags, which by design are capable of modifying the thermal stability of the host proteins without disrupting their structural and functional properties. The expectation is that, by appending appropriate thermal tags, one could specifically fine-tune the melting temperatures (Tm) of the proteins such that their individual melting profiles could be simultaneously monitored in a single DSF melting analysis. When used in combination with DCC, formation of protein-ligand complexes can be easily detected by an increase in the Tm of the proteins that are engaged in ligand binding.
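The Tm readout described above can be illustrated with a minimal Python sketch. It assumes an idealised two-state sigmoidal melting curve (the `slope` value and the temperature grid are illustrative assumptions, not parameters from this study) and estimates Tm as the temperature at which the first derivative of the signal is largest. The apo Tm of 52.6 °C and the 3.9 °C ligand-induced shift are the FTO-KE12 values reported later in the text; the function names are hypothetical.

```python
import math

def melt_curve(temps, tm, slope=1.5):
    """Idealised two-state curve: fraction unfolded (0..1) at each temperature."""
    return [1.0 / (1.0 + math.exp((tm - t) / slope)) for t in temps]

def estimate_tm(temps, signal):
    """Return the temperature where dF/dT (central difference) is largest."""
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        d = (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
        if d > best_slope:
            best_t, best_slope = temps[i], d
    return best_t

# Temperature ramp from 25.0 to 80.0 °C in 0.1 °C steps (assumed grid).
temps = [25.0 + 0.1 * i for i in range(551)]

apo = melt_curve(temps, tm=52.6)            # protein alone
bound = melt_curve(temps, tm=52.6 + 3.9)    # protein stabilised by a ligand

tm_apo = estimate_tm(temps, apo)
tm_bound = estimate_tm(temps, bound)
delta_tm = tm_bound - tm_apo

print(f"Tm (apo)   = {tm_apo:.1f} °C")
print(f"Tm (bound) = {tm_bound:.1f} °C")
print(f"Tm shift   = {delta_tm:.1f} °C")
```

Real DSF traces are noisier and often show post-transition fluorescence decay, so in practice the curve would usually be smoothed or fitted before the derivative is taken.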

This approach is conceptually similar to the DSF-based ‘thermal shift assay’, which has been widely used for the detection of protein-ligand interactions with stable ligands.[43] However, prior to this work, it was not known whether DSF could be applied to the analysis of dynamic libraries. We are also not aware of the combined use of a ‘thermal tag’ with DSF to facilitate the study of multiple protein-ligand interactions. As we shall demonstrate, multi-protein DSF is a simple, yet powerful strategy for monitoring protein-templated self-assembly. Through this method, one is able to assess the Tm shift and thermodynamic parameters of an equilibrating system in a single, rapid DSF experiment. We further provide proof of principle that the combined use of the multi-protein DSF strategy and DCC can enable simultaneous templating of several target proteins. This is demonstrated by the concurrent discovery of subfamily-selective inhibitors against two clinically important epigenetic enzymes, FTO and ALKBH3.

To the best of our knowledge, the present study represents not only the first report of a multi-protein DCC method, but also a new strategy for probing dynamic chemical systems, which we hope will further our understanding of protein-directed self-assembly and inspire new applications for DCC-based approaches.

Results and Discussion

Thermal tag design and protein engineering

In this study, we selected three members of the iron- and 2-oxoglutarate (2OG)-dependent AlkB oxygenases, namely FTO, ALKBH3 and ALKBH5, as our protein templates.[44] These AlkB proteins are currently of intense biological and medical interest because of their critical roles in several key cellular processes, such as epigenetic gene regulation and RNA metabolism.[45-47] It is also increasingly clear that dysregulation of these enzymes may underlie the pathogenesis of a range of human diseases, including metabolic diseases, neurodegenerative diseases and cancers.[48-51] However, despite their clinical significance, there have to date been only a few reports of inhibitors that selectively target these enzymes, in part because of their close structural similarity, which renders the development of subfamily-selective inhibitors particularly challenging. In this regard, the selection of AlkB subfamilies for the present study allows us to evaluate 1) the power of the multi-protein DCC strategy to simultaneously discover probes against several clinically important targets, and 2) its potential in directing the self-assembly of probes with subfamily selectivity.

To implement the proposed strategy (Figure 1A), we first designed a series of thermal tags that are capable of conferring different melting temperatures to the AlkB proteins. As potential candidates, we considered short zwitterionic peptides, that are 6, 12 or 18 residues in length, with either alternating Lys/Glu (KE) sequences (abbreviated as KE6-, KE12-, and KE18-tags) or alternating Arg/Asp (RD) sequences (abbreviated as RD6-, RD12-, and RD18-tags) (Figure 1B). Recent studies suggested that zwitterionic polymers such as poly(carboxybetaine) polymers and poly(Glu-Lys) polypeptides can increase the thermal resistance of fusion partner proteins.[41,42] However, we are not aware of the use of zwitterionic peptides to specifically modulate the melting temperature of proteins. The effect of zwitterionic peptides on the melting behaviour of proteins has also not been systematically studied.

To investigate the suitability of zwitterionic peptides as thermal tags, the KE and RD sequences were genetically appended to the C-terminus of FTO, ALKBH3 and ALKBH5 using protein engineering techniques; modelling analysis suggests that fusion of a relatively short peptide to the C-terminus of AlkB subfamilies is likely to be minimally disruptive on the structure and function of the protein (Figure 1C, for details of gene constructs and protein expression, see Supporting Information). Successful expression of all recombinant fusion proteins was confirmed by SDS-PAGE and ESI-MS analyses (Figure S1).

Introduction of a single thermal tag triggered a dramatic increase in the thermal stability and melting temperature of the AlkB subfamilies

We initially investigated the impact of thermal tags on protein melting characteristics. DSF-based melting analyses showed that all KE-tagged and RD-tagged ALKBH5 fusion proteins unfold cooperatively to produce monophasic melting transitions similar to that observed with unmodified ALKBH5, implying that the proteins remained stably folded when conjugated with a KE or RD tag (Figures 2A and S2A). The melting temperatures of the ALKBH5 fusion proteins are, however, significantly higher than that of unmodified ALKBH5 (Tm = 40.3 ± 0.1 °C; Table 1), suggesting a marked improvement in their thermal stability. The extent of thermal stabilisation appears to depend on the length of the thermal tag used, as demonstrated by the observation that the relatively short KE6 tag increases the Tm of ALKBH5 by 2.1 °C, whilst the longer KE12 and KE18 tags triggered more dramatic Tm increases of 4.9 °C and 9.2 °C, respectively. A similar trend (i.e. a longer tag leading to a larger Tm increase) was also observed with RD tags, although correspondingly smaller Tm increases of 1.8 °C, 3.4 °C and 8.2 °C were produced by the RD6, RD12 and RD18 tags, respectively. Thus the RD tags are slightly less effective in improving protein thermal stability than KE tags of the same length.
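The tag-length trend for ALKBH5 can be tabulated with a short Python sketch. It uses only the untagged Tm and the Tm increases quoted in the paragraph above; the dictionary layout and function names are hypothetical, and the computed absolute Tm values are simple sums rather than independently reported measurements.

```python
UNTAGGED_ALKBH5_TM = 40.3  # °C, untagged ALKBH5 (value quoted in the text)

# Tm increases (°C) quoted in the text for each tag on ALKBH5.
TM_INCREASE = {
    "KE6": 2.1, "KE12": 4.9, "KE18": 9.2,
    "RD6": 1.8, "RD12": 3.4, "RD18": 8.2,
}

def tagged_tm(tag):
    """Estimated Tm of tagged ALKBH5: untagged Tm plus the quoted increase."""
    return round(UNTAGGED_ALKBH5_TM + TM_INCREASE[tag], 1)

def ke_minus_rd(length):
    """How much more a KE tag raises Tm than an RD tag of the same length."""
    return round(TM_INCREASE[f"KE{length}"] - TM_INCREASE[f"RD{length}"], 1)

for length in (6, 12, 18):
    ke, rd = f"KE{length}", f"RD{length}"
    print(f"{length:>2} residues: {ke} -> {tagged_tm(ke)} °C, "
          f"{rd} -> {tagged_tm(rd)} °C, KE advantage {ke_minus_rd(length)} °C")
```

Both series increase monotonically with tag length, and the KE tag gives the larger increase at every length, which is the pattern the text describes.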

Interestingly, the thermal stabilisation effect of KE and RD tags also extends to other AlkB subfamilies, such as FTO and ALKBH3, where modification of both proteins with either KE or RD tag again triggered remarkable increase in their Tms (Figures 2 and S2). As observed for ALKBH5, FTO and ALKBH3 containing longer tags exhibited a larger Tm increase than their counterparts with shorter tags (Table 1). However, KE tag of the same length appears to have a greater impact on ALKBH3 than on FTO and ALKBH5. For instance, KE18 tag produced a Tm increase of 12.3 °C in ALKBH3, but only 10.3 °C in FTO, and 9.2 °C in ALKBH5 (Table 1). In contrast, RD tag of the same length triggered a similar Tm increase in all three proteins.

These findings are notable because they suggest that the attachment of a single KE or RD tag as short as six residues in length is sufficient to induce a considerable change in the thermal stability and melting temperature of the fusion protein. They further suggest that one can, in principle, fine-tune the melting temperature of AlkB subfamilies by incorporating a KE or RD tag of appropriate length. The mechanism by which the KE/RD tags enhance the melting temperatures of their partner proteins is unclear at present. Nevertheless, it has been suggested that zwitterionic peptides, being strongly hydrophilic in nature, have the propensity to pull water away from the hydrophobic regions of the protein.[41,42] Presumably this stabilises the folded structure of the protein and thus increases its thermal stability and melting temperature. Alternatively, it could also result from ionic interaction between the protein and the zwitterionic tag or a change in the oligomerisation state of the fusion protein. Further studies are needed to understand the underlying mechanisms.

The thermal tags have little impact on the overall conformation of the AlkB subfamilies

Intriguingly, despite having a profound impact on thermal stability, the thermal tags do not significantly alter the overall secondary structures of the AlkB subfamilies, as clearly demonstrated by far-UV circular dichroism (CD) spectroscopy. In particular, all KE- and RD-tagged ALKBH5 proteins share well-ordered secondary structure elements, with a positive maximum at 194 nm and a negative band at 215 nm, which are similar to those observed for unmodified ALKBH5 (Figures 2D and S2D). Spectral fitting further indicates approximately equal α-helical (24-27%) and β-sheet contents (26-28%) in all ALKBH5 fusion proteins, in agreement with the reported X-ray crystal structure of native ALKBH5 (PDB ID: 4NJ4).[52] Similarly, the CD characteristics and secondary structure contents of the FTO and ALKBH3 fusion proteins are also highly consistent with those of their unmodified equivalents, suggesting that all thermally tagged proteins continued to adopt natively folded conformations (Figures 2E,F and S2; PDB ID: 4CXW[53] and 2IUW[54]).

The thermal tags do not alter the catalytic activity of the AlkB subfamilies

We were also able to demonstrate that the thermal tags have little impact on the catalytic activities of AlkB demethylases. Notably, ALKBH5 and FTO preferentially demethylate N6-methyladenosine (m6A) substrates, whilst ALKBH3 favours N1-methyladenosine (m1A) substrates.[55,56] Consistent with their inherent substrate specificity, detailed kinetic analyses revealed that ALKBH5 and FTO modified with either a KE or RD tag displayed similar affinities (Km) and catalytic efficiencies (kcat/Km) for the m6A substrate as their unmodified proteins (Figures 2G,H and S2), whilst ALKBH3 conjugated with either a KE or RD tag retained > 90% of its canonical demethylase activity against the m1A substrate (Figures 2I and S2I). This finding is interesting because, contrary to our result, a previous study showed that the attachment of larger zwitterionic peptides, such as poly(Glu-Lys) of 10 kDa and 30 kDa, in fact led to a significant increase in the substrate affinities of β-lactamases.[42] Apparently the greater number of charged residues in longer peptides confers stronger hydrophilic properties, which promote hydrophobic interactions between the substrate and its binding site. Our result therefore suggests that the effects of zwitterionic tags on protein activity could be reduced through the use of relatively short 6-18mer peptide sequences. Moreover, the apparent lack of influence on protein conformations and catalytic activities further suggests that both KE and RD tags likely exist as separate entities, structurally and functionally independent from the proteins to which they are attached.

Taken together, our results clearly demonstrated that both KE- and RD-tags are highly versatile thermal tags that are able to exclusively fine-tune the thermal stability and melting temperature of AlkB proteins, without compromising their structural conformation and catalytic activity. To our knowledge, this is the first report demonstrating that zwitterionic peptides could provide a general strategy for bioengineering proteins with the desired melting characteristics.

The thermal tags provide a basis for recognition and analysis of multiple proteins in a mixture

To establish whether thermal tag-induced Tm change could facilitate the discrimination of different fusion proteins in the same mixture, we performed one-pot DSF analysis of a mixture of ALKBH5-KE6 (Tm = 42.4 °C), FTO-KE12 (Tm = 52.6 °C) and ALKBH3-KE18 (Tm = 61.7 °C); these proteins were selected as they have reasonably large differences in Tm between them. We appreciate that most proteins undergo aggregation during thermal unfolding; we were therefore concerned that protein aggregates derived from the melting of one protein might interfere with the melting behaviour of other proteins within the same mixture. Nevertheless, in our multi-protein DSF analysis, three distinct denaturation peaks were observed which resembled the melting profile of ALKBH5-KE6 superimposed with those of FTO-KE12 and ALKBH3-KE18, suggesting that mixing of fusion proteins does not significantly alter their individual melting characteristics (Figures 3A,B). Consistent with our detection strategy, there was also a marked separation of individual melting peaks (ΔTm > 9 °C), which enables any ligand-induced Tm shift to be clearly detected without merging of peaks, as demonstrated by ligand binding experiments with 2,4-pyridine dicarboxylic acid 1 (a known ‘generic’ inhibitor of AlkB subfamilies),[57] and FTO-selective inhibitor 2 (PubChem ID: 126970771; Figure 3C,D).[53] By comparison, DSF analysis of a mixture of untagged ALKBH5, FTO, and ALKBH3 produced an overlapping melting profile which is difficult to deconvolute (Figure 3E). Thus the combined use of thermal tags with DSF enables the simultaneous analysis of different fusion proteins in a mixture, even structurally similar subfamily members which are otherwise challenging to differentiate.

Notably, in our multi-protein DSF strategy, different thermal tags were applied to different AlkB subfamilies. Hence one possible issue is that fusion proteins bearing longer, more stabilising tags might be less sensitive to further stabilisation by ligands than those with shorter tags. However, titration of each fusion protein against increasing concentrations of 1 revealed that proteins with the KE18 tag exhibited a similar concentration-dependent increase in Tm as those with KE12 and KE6 tags (Figure S3). Thus the length of thermal tag used does not significantly impact the magnitude of the ligand-induced Tm shift.
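The one-pot analysis of the three tagged proteins can be sketched in Python as follows. This is a simulation under stated assumptions, not the authors' analysis: each protein is modelled as an idealised two-state sigmoid with an assumed `slope` of 1.0 °C, the mixture signal is taken as the plain sum of the three curves (ignoring aggregation and dye effects), and peaks are located as local maxima of the first derivative above an arbitrary relative threshold. Only the three Tm values come from the text; all function names are hypothetical.

```python
import math

def unfolded_fraction(t, tm, slope=1.0):
    """Idealised two-state unfolding curve for one protein."""
    return 1.0 / (1.0 + math.exp((tm - t) / slope))

def derivative_peaks(temps, signal, rel_threshold=0.2):
    """Temperatures of local maxima in dF/dT above a relative cutoff."""
    d = [(signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
         for i in range(1, len(temps) - 1)]
    cutoff = rel_threshold * max(d)
    peaks = []
    for j in range(1, len(d) - 1):
        if d[j] > cutoff and d[j] > d[j - 1] and d[j] >= d[j + 1]:
            peaks.append(temps[j + 1])  # d[j] is centred on temps[j + 1]
    return peaks

# Tm values of the tagged fusion proteins as quoted in the text.
tagged_tms = {"ALKBH5-KE6": 42.4, "FTO-KE12": 52.6, "ALKBH3-KE18": 61.7}

temps = [30.0 + 0.1 * i for i in range(451)]  # 30.0 to 75.0 °C (assumed ramp)
mixture = [sum(unfolded_fraction(t, tm) for tm in tagged_tms.values())
           for t in temps]

peaks = derivative_peaks(temps, mixture)
gaps = [round(b - a, 1) for a, b in zip(peaks, peaks[1:])]

for name, peak in zip(tagged_tms, peaks):
    print(f"{name}: peak at {peak:.1f} °C")
print("Peak separations (°C):", gaps)
```

With Tm values this far apart, the three derivative peaks stay resolved, so a ligand-induced shift of a few degrees in one protein would move only its own peak. Running the same sketch with closely spaced Tm values would merge the peaks, which mirrors the overlapping profile reported for the untagged mixture.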

Analysis of self-assembly dynamics with multi-protein DSF detection strategy

On the basis of these promising results, we next investigated whether the developed detection strategy could be used in combination with DCC to achieve multi-protein targeting. As our model dynamic reaction, we chose the reversible acyl hydrazone reaction between scaffold ligand 3 and aldehydes (Figure 4A). The acyl hydrazone exchange is one of the most well-established reversible reactions, first introduced by the Sanders group[58] and subsequently used by others[18,59,60] for the preparation of dynamic libraries. By design, scaffold 3 contains a hydrazide moiety that is free to participate in acyl hydrazone exchange with a set of aldehydes. It also contains a pyridyl function which, based on modelling analysis, is expected to bind to the AlkB subfamilies via chelation with the active site Fe(II). Another notable feature is that 3 exhibits similar binding affinity for ALKBH5-KE6 (KD,app = 37.9 ± 4 μM; ΔTm = 1.5 °C), FTO-KE12 (KD,app = 40.4 ± 4 μM; ΔTm = 0.7 °C) and ALKBH3-KE18 (KD,app = 31.0 ± 3 μM; ΔTm = 1.4 °C), thus it does not inherently favour any particular protein template.

[…] substitute for Fe(II)) and scaffold 3 (40 μM), in the presence of ten aldehydes 4a-j, each aldehyde at 20 μM (library A; Figure 4B). The use of a fairly low concentration of aldehydes (10-fold relative to individual protein concentration) was a deliberate strategy to detect only high-affinity ligands. To facilitate rapid acyl hydrazone exchange, the reaction was performed under slightly acidic conditions (pH 6.0) in the presence of aniline (5 mM), which serves as a nucleophilic catalyst.[18,61] The dynamic exchange was monitored using both DSF and HPLC analyses (Figures 5 and S4). Initially at t0, the reaction mixture consisted predominantly of scaffold 3 and free aldehydes, with no apparent hydrazone adduct formation or Tm shift (Figure 5A,E). After a 1-hour incubation, a distinct Tm shift could be detected for FTO-KE12 (ΔTm = 3.9 °C; Figure 5B). This was likely induced by specific binding of 5e to FTO-KE12, as supported by the concomitant amplification of hydrazone adduct 5e in HPLC analysis (Figure 5F). As the reaction progressed, amplification of a second hydrazone adduct 5h became apparent at 2 h, which resulted in a slight Tm shift for ALKBH3-KE18 (ΔTm = 2.7 °C; Figure 5C,G). The Tms of FTO-KE12 and ALKBH3-KE18 increased steadily over time, giving Tm shifts of 6.1 °C and 4.0 °C, respectively, after 5 h incubation (Figure 5D,H).

This was accompanied by an increase in amplification of 5e and 5h to 35% and 32% of total adduct concentration, respectively (Figure 5I). Detailed analysis revealed an excellent linear relationship between Tm shift and adduct concentration, hence the magnitude of the Tm shift provides a direct indication of the extent of adduct amplification (Figure 5J). Although some degree of fluorescence quenching occurred in the presence of the library components, the observed Tm shifts were highly reproducible (standard deviations < 1 °C). To determine if heating of the DCL during DSF analysis could affect DCL equilibration, we performed HPLC analysis of the DCL at various temperatures, ranging from 37 °C to 70 °C (Figure S5). The results revealed no significant change in DCL equilibration even when heated to a temperature of 70 °C, although increasing the DCL temperature did cause a decrease in retention time, which led to the merging of some adduct peaks. The identity of the preferentially binding hydrazones was subsequently confirmed by separate DSF experiments in which the ten aldehydes 4a-j were individually mixed with 3, and then analysed for Tm shift with the fusion proteins. This established adducts 5e and 5h to be the specific binders for FTO-KE12 and ALKBH3-KE18, respectively. Consistent with the DCC result, no Tm shifts were observed for ALKBH5-KE6 for all combinations of 3 and aldehydes.

The combined use of multi-protein DSF detection with DCC enables multi-protein targeting

We appreciate that, to date, DSF analysis has primarily been used to study the binding of stable ligands to proteins. Therefore Tm shift data derived from labile and interchanging protein-ligand complexes may not reflect actual ligand binding affinity. To verify our multi-protein DCC result, we determined the binding affinities of the identified hits 5e and 5h using NMR-based water relaxation […] were determined by monitoring the bulk water relaxation rate, which decreased when water access to paramagnetic Mn(II) in the active site was hindered through binding of ligands to the metal. Our NMR assays support results from DCC, where a substantial increase in the binding affinity of 3 for FTO-KE12 (KD,app = 40.4 ± 4 μM) was observed only in the presence of aldehyde 4e (KD,app (5e) = 0.46 ± 0.1 μM; Figure 5K); the presence of other aldehydes has a negligible effect on the binding affinity of 3 (Figure S6).
As anticipated, the affinity of 3 for ALKBH3-KE18 (KD,app = 31.0 ± 3 μM) was improved only when combined with aldehyde 4h (KD,app (5h) = 1.0 ± 0.2 μM; Figure 5L), and not with other aldehydes (Figure S7). Finally, the binding of 3 to ALKBH5-KE6 (KD,app = 37.9 ± 4 μM) was also not significantly altered by any of the aldehydes investigated (Figure S8). Thus there is qualitative agreement between Tm shift data derived from the DCC screen and ligand binding affinities. Our results therefore clearly demonstrated that the DSF technique can indeed be used to monitor and analyse self-assembly dynamics in a multi-component DCC system. Importantly, the successful identification of adducts 5e and 5h validates the capability of our multi-protein DCC approach to identify highly selective ligands against several targets simultaneously. Notably, adduct 5e differs from 5h by only a single isosteric replacement of a para-CF3 with an ortho-Me group, thus this approach has the sensitivity to detect subtle active site selectivity between structurally related subfamily members.

Comparing the effects of single-templating and multi-templating on ligand selection

We next compared the impact of single-templating and multi-templating on ligand selection. We performed another set of DCC experiments in which the dynamic library was constituted by mixing scaffold 3 and aldehydes 4k-t (library B, Figure 4B). When the dynamic library was screened against each protein individually (single-templating), we observed substantial binding of adduct 5k to ALKBH5-KE6 (ΔTm = 4.8 °C), FTO-KE12 (ΔTm = 3.0 °C), and ALKBH3-KE18 (ΔTm = 3.4 °C; Figure S9A-C). However, when the same dynamic library was screened against all three proteins concurrently (multiple-templating), the binding of 5k to each protein was reduced to near zero (ΔTm = 0.1-0.3 °C; Figure S9D). DSF-based titration experiments further revealed that when all three proteins were present, a considerably higher concentration of 5k (31 µM) was required to produce a significant Tm shift of 1.0 °C in FTO-KE12, compared to just 3 µM when only FTO-KE12 was present (Figure S10). In sharp contrast, the binding of selective ligands to their protein targets was not affected by the presence or absence of other proteins. For instance, the binding of 5e (10 µM) to FTO-KE12 elicits a similar Tm shift of 5 °C whether in the presence of a single or multiple protein templates (Figure S9E,F). Importantly, in a dynamic library containing both selective and non-selective adducts 5e and 5k (set up by adding aldehyde 4e (20 µM) to the above dynamic library), only the specific binding of 5e to FTO-KE12 could be detected (Figure S9G). Taken together, the results suggest that the extent of Tm shifts in a multi-templating DCC experiment is primarily dependent on the affinity of the ligands for the target proteins. Presumably, strong binders, such as 5e, dominate the DCL equilibrium at the expense of weak binders, such as 5k, whilst weak binders dominate the DCL equilibrium over the non-binders.

Multi-protein DCC led to the identification of subfamily-selective probes for FTO and ALKBH3

To establish whether the hits identified from our multi-protein DCC screen were in fact active inhibitors of AlkB demethylases, we synthesised stable analogues of 5c, 5e, 5h and 5k, wherein the relatively labile acyl hydrazone moiety was replaced with a stable sulphonamide moiety (Scheme 1). It should be emphasised that the key result of any DCC experiment is the information about the most effective structural combinations of building blocks. Hence labile groups like hydrazones are often replaced to form stable ligands.[63] The resulting stable analogues, 6-9, were evaluated for activity against a panel of untagged AlkB subfamilies using both an HPLC-based demethylase assay and a Tm shift assay (for assay conditions, see Supporting Information). As shown in Figure 6, the Tm shift and inhibition data are in close agreement with the multi-protein DCC results. In particular, compounds 7 (analogue of 5e) and 8 (analogue of 5h) were indeed potent inhibitors of FTO (IC50 = 2.6 µM, ΔTm = 8.1 °C) and ALKBH3 (IC50 = 3.7 µM, ΔTm = 7.4 °C), respectively, with IC50 values in the low micromolar range, which represents a more than 10-fold improvement in activity compared with scaffold ligand 3. Notably, compound 9 (analogue of 5k), which was identified by the single-templated DCC, was also found to be a relatively potent, non-selective inhibitor, and control compound 6 (analogue of 5c) a poor inhibitor against all AlkB enzymes investigated (IC50 > 25 µM).

The selectivity profiles of compounds 7 and 8 are also strikingly consistent with the multi-protein DCC results (Figure 6). In particular, we observed remarkable selectivity of 7 for FTO, with substantially reduced inhibition and Tm shift against ALKBH3 (IC50 = 69.1 µM, ΔTm = 1.8 °C) and ALKBH5 (IC50 = 201.3 µM, ΔTm = 0.3 °C), whilst 8 exhibited more than 25-fold greater activity for ALKBH3 over FTO (IC50 = 105.6 µM, ΔTm = 0.9 °C) and ALKBH5 (IC50 = 128.2 µM, ΔTm = 0.8 °C). Further profiling studies revealed that 7 and 8 also discriminate against other human 2OG oxygenases, as demonstrated by their poor inhibitory activities against ALKBH2 (IC50s > 40 µM) and JMJD2A (IC50s > 100 µM; a histone demethylase critically involved in chromatin remodelling).[64] Our results therefore validate the multi-protein DCC screen as a reliable method for the simultaneous identification of subfamily-selective probes.
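The selectivity of 7 and 8 can be expressed as fold-selectivity ratios, computed here directly from the IC50 values quoted above (off-target IC50 divided by on-target IC50). The sketch below is illustrative only: the data structure and function name are hypothetical, and the on-target enzyme is simply taken to be the one with the lowest IC50.

```python
# IC50 values in µM for compounds 7 and 8, as quoted in the text.
IC50_UM = {
    "7": {"FTO": 2.6, "ALKBH3": 69.1, "ALKBH5": 201.3},
    "8": {"FTO": 105.6, "ALKBH3": 3.7, "ALKBH5": 128.2},
}

def selectivity(compound):
    """Return (primary target, {off-target: fold selectivity}) for a compound.

    Fold selectivity = off-target IC50 / on-target IC50, rounded to 1 decimal.
    """
    values = IC50_UM[compound]
    target = min(values, key=values.get)  # enzyme with the lowest IC50
    folds = {enzyme: round(ic50 / values[target], 1)
             for enzyme, ic50 in values.items() if enzyme != target}
    return target, folds

for compound in IC50_UM:
    target, folds = selectivity(compound)
    summary = ", ".join(f"{fold}-fold over {enzyme}"
                        for enzyme, fold in folds.items())
    print(f"Compound {compound}: selective for {target} ({summary})")
```

On these numbers, both compounds show at least roughly 25-fold selectivity for their primary target over the other two AlkB enzymes tested. IC50 ratios depend on assay conditions such as substrate concentration, so they are only a rough guide to relative binding affinity.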

To rationalise the selectivity of 7 and 8, a series of mono-, di-, and tri-methyl substituted analogues 10-15 was prepared and evaluated for activity against the AlkB subfamilies. The assay results are summarised in Figure 7. They suggest that the introduction of a methyl substituent at the ortho-position (i.e. 8) favours ALKBH3 selectivity, whereas shifting it to either the meta- (i.e. 10) or para-position (i.e. 11) drives selectivity towards FTO. This change in selectivity was particularly apparent when comparing the activity of 8 with that of analogues 12-14. In all cases, the addition of either an m-methyl or p-methyl group caused a substantial increase in selectivity towards FTO.

A similar trend was also observed for 7, whereby a translocation of its trifluoromethyl substituent from the para- to the ortho-position (i.e. 15) resulted in a 14-fold reduction in potency against FTO, coupled with a 3-fold increase in potency towards ALKBH3 (Figure 7B). Notably, these substitution changes have very little or no effect on activities towards ALKBH2 and ALKBH5.

The structural models of the FTO-7 and ALKBH3-7 complexes, built using the reported crystal structures of FTO (PDB ID 4CXW[53]) and ALKBH3 (PDB ID 2IUW[54]; Figure 8), reveal that compound 7 is able to coordinate with the active site Fe(II) in a bidentate manner and participate in hydrogen-bonding interactions with the side chains of Arg96 (2.4 Å and 2.5 Å). The pyridyl carboxylate group is further stabilised through formation of salt bridges with both N of Arg319 (2.5 Å and 2.7 Å), and hydrogen bonds with the hydroxyl groups of Tyr295 (2.7 Å) and Ser318 (2.6 Å). Modelling studies further suggest that 7 might not be able to fit into the binding pocket of ALKBH3 due to a potential steric clash between its 4-trifluoromethylphenyl group and the side chain of Tyr143. This may contribute to the poor activity of 7 against ALKBH3 (Figure 8B). However, the proposed model is unable to account for the activity of compounds 8 and 15, since a similar steric clash is also possible between these compounds and ALKBH3. Further crystallographic studies are needed to determine the exact structural basis for the selectivity of these inhibitors. Studies are also currently underway to determine the potency and selectivity of 7 and 8 in cells, which will provide insight into their usefulness as functional probes and, possibly, therapeutic leads.

Conclusions

Overall, we used a combination of protein engineering, spectroscopic, thermodynamic, kinetic, biochemical and thermal denaturation studies to investigate the effects of zwitterionic tags on protein structure and function. Our results revealed that the incorporation of a single KE or RD peptide, as short as 6 to 18 residues in length, can trigger a dramatic increase in the thermal stability and melting temperature of the host protein without disrupting its structural conformation or catalytic activity. We further showed that both KE and RD tags are highly versatile thermal tags and are generally applicable to a range of proteins, including AlkB subfamilies such as FTO, ALKBH3, and ALKBH5. To our knowledge, this is the first report showing that zwitterionic peptides can be used to exclusively fine-tune the melting temperature of proteins. Importantly, by exploiting the thermal tag-induced Tm change, we developed a novel DSF-based detection strategy which enables the analysis of several protein-ligand interactions simultaneously. The assay is remarkably sensitive and could effectively distinguish the binding of ligands to different proteins, including structurally-similar subfamily members, which is extremely challenging to achieve. Through the combined use of multi-protein DSF and DCC, we were able to achieve simultaneous discovery of highly selective probes against several target proteins. This was demonstrated by the identification of compounds 7 and 8, which not only exhibit exceptional selectivity for FTO (IC50 = 2.6 µM, ΔTm = 8.1 °C) and ALKBH3 (IC50 = 3.7 µM, ΔTm = 7.4 °C), respectively, over other AlkB subfamilies, but also discriminate against structurally-related human 2OG oxygenases, such as JMJD2A. Notably, to date, there has been no report of a subfamily-selective ALKBH3 inhibitor. In light of the biological and clinical significance of FTO and ALKBH3, we envisage that the identified inhibitors will be of interest as functional probes and, possibly, therapeutic leads.

One limitation of this approach is that it is only applicable to proteins that are amenable to Tm modification by the thermal tags. Conceivably, the use of a larger number of proteins will produce significant Tm signal overlap, which limits the number of protein templates that can be analysed concurrently. Moreover, separate experiments are needed to determine the identity of the best binders. Nevertheless, the method is relatively simple, cheap and amenable to a high-throughput format. In addition, unlike current techniques such as HPLC and NMR analyses, it detects ligand binding rather than amplification of the best binders, which greatly simplifies spectral analysis and thus potentially permits the use of larger, more complex dynamic libraries. Although this study focuses on the development of multi-protein DCC, the method is highly versatile and may, in principle, be adapted to a wide range of studies, such as parallel profiling of compound libraries, high-throughput mapping of protein substrate specificity, and the study of protein-protein and protein-nucleic acid interactions. Thus, the approach outlined here should be of general scientific interest.

To the best of our knowledge, this study represents not only the first example of multi-protein DCC, but also a new strategy for probing dynamic chemical systems, which we hope will further our understanding of protein-directed self-assembly and inspire new applications for DCC-based approaches.

Experimental Section
General procedures for compound synthesis

Starting materials, reagents, and solvents were obtained from commercial sources and used as received. The progress of reactions was monitored by thin layer chromatography, which was performed on precoated aluminum-backed plates (Merck, silica 60 F254). Purification of intermediates and final products was carried out using flash column chromatography on silica gel (230-400 mesh). Melting points were determined using a Gallenkamp melting point apparatus. Infrared spectra were recorded on a PerkinElmer Spectrum 100 FT-IR spectrometer using neat compound. 1H and 13C NMR spectra were recorded using TMS as the internal standard with a Bruker Avance 400 Ultrashield NMR spectrometer at 400 MHz and 100 MHz, respectively. Chemical shifts (δ) are reported in ppm downfield from the internal standard. Signals are quoted as s (singlet), d (doublet), t (triplet), m (multiplet) and br (broad). Coupling constants (J) are given in Hz (± 0.5 Hz). ESI-MS spectra were recorded on a Shimadzu LC-MS2020 mass spectrometer. High resolution mass spectra (HRMS) were recorded using a Bruker MicroTOF-QII mass spectrometer. The purity of synthesised compounds was confirmed to be higher than 95% using analytical reverse-phase HPLC (Ultimate 3000). Compound 1 was commercially available and used without further purification. Compound 2 was synthesised as previously reported.[53] The synthesis and characterisation of selected compounds 7, 8, 17, 21 and 22 are described below. For the synthesis of other compounds investigated in this study, see Supporting Information.

4-(Methoxycarbonyl)picolinic acid (17). To a solution of dimethyl pyridine-2,4-dicarboxylate 16 (1.4 g, 7.3 mmol) in methanol (20 mL), 1.2 mL of aqueous sodium hydroxide (6 M) solution was added. The mixture was then stirred at room temperature for 10 h, after which the solvent was evaporated in vacuo. The resulting residue was washed with ethyl acetate (20 mL), then dissolved in water (40 mL), and the aqueous solution was acidified to pH 2 with conc. HCl at 0 °C. The mixture was then filtered to give 17 (368 mg, 28%) as a white solid. m.p. 230-231 °C; 1H NMR (400 MHz, DMSO-d6): δ = 3.92 (s, 3H; CO2CH3), 8.05 (d, J = 3.6 Hz, 1H), 8.38 (s, 1H), 8.91 (d, J = 4.4 Hz, 1H), 13.95 (s, 1H; CO2H); 13C NMR (100 MHz, DMSO-d6): δ = 52.5 (CO2CH3), 123.4, 126.0, 139.4, 148.4, 150.9, 164.5 (CO2CH3), 165.2 (CO2H); IR (neat) ν/cm-1 = 1707 (CO); MS (ESI-) m/z: calcd. for C8H6NO4 [M-H]-: 180.0; found: 180.3.
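As a quick arithmetic check on the amounts quoted for 17, the isolated yield can be recomputed from the product mass and the quantity of diester 16. The sketch below is illustrative only: the helper names are hypothetical, the atomic masses are standard average values, and the neutral formula C8H7NO4 is inferred from the [M-H]- ion C8H6NO4 reported above.

```python
# Average atomic masses in g/mol (standard values, rounded to three decimals).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(composition):
    """Molar mass in g/mol from an element -> atom count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


def percent_yield(product_mg, product_molar_mass, limiting_mmol):
    """Isolated yield (%) relative to the limiting reagent."""
    product_mmol = product_mg / product_molar_mass  # mg / (g/mol) = mmol
    return 100.0 * product_mmol / limiting_mmol


# Neutral acid 17, C8H7NO4 (assumed from the reported [M-H]- formula C8H6NO4).
acid_17 = {"C": 8, "H": 7, "N": 1, "O": 4}
mw_17 = molar_mass(acid_17)

# 368 mg of 17 isolated from 7.3 mmol of diester 16 (values from the procedure).
yield_17 = percent_yield(368.0, mw_17, 7.3)
print(f"M(17) = {mw_17:.2f} g/mol, yield = {yield_17:.0f}%")
```

With the stated 7.3 mmol of 16, this gives roughly 28%, in line with the reported yield.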

Plasmid and thermal tag construction

DNA fragments encoding full-length human FTO and N-terminally truncated ALKBH5 (residues 66-292) were cloned into pNIC28-Bsa4 to generate N-terminal His6-tagged FTO(1-505) and N-terminal His6-tagged ALKBH5(66-292) constructs, respectively, as previously described.[53,55] DNA encoding full-length human ALKBH3 was cloned into pET28a to generate an N-terminal His6-tagged ALKBH3(1-286) construct, as previously described.[56] The above constructs were then subjected to mutagenesis using the QuikChange mutagenesis kit (Agilent Technologies), whereby DNA encoding the various KE and RD tag sequences (commercially synthesised by GenScript USA) was fused to the C-terminus of the respective genes. The sequences of all fusion protein constructs were confirmed by DNA sequencing.

Protein expression and purification

Full-length human FTO,[47,53,55] human ALKBH3,[56] human ALKBH5(66-292),[53,55] human ALKBH2(56-258)[46,53,56] and human JMJD2A(1-359)[53] were expressed and purified as previously reported, with modifications. In brief, all constructs for the proteins and their KE/RD-tagged fusion proteins were transformed into E. coli BL21 (DE3) Rosetta cells. The transformed cells were grown at 37 °C until an OD600 of 0.6 was reached. Protein expression was then induced with isopropyl β-D-1-thiogalactopyranoside (IPTG, 0.5 mM, Gold Biotechnology). Cell growth was continued at 16 °C for 16 h, after which the cells were harvested by centrifugation and the resulting cell pellet was stored at -80 °C. The frozen cell pellets were then thawed, resuspended in lysis buffer and disrupted by French press. Further purification of the proteins was achieved using Ni affinity chromatography and gel filtration, as described below.

Full-length human FTO was sub-cloned into pNIC28-Bsa4 to generate a His6-tagged FTO(1-505) construct. FTO in lysis buffer (25 mM Tris, pH 7.5, 500 mM NaCl, 40 mM imidazole and 5 mM β-mercaptoethanol (β-ME)) was purified using Ni affinity chromatography (GE Healthcare), followed by gel filtration using a HiLoad Superdex 200 26/60 column (GE Healthcare) into the final buffer (25 mM Tris buffer, pH 7.5, 100 mM NaCl, 5% (v/v) glycerol and 5 mM β-ME).

Full-length ALKBH3 was sub-cloned into pET28a to generate a His6-tagged ALKBH3(1-286) construct. ALKBH3 in lysis buffer (50 mM sodium phosphate, pH 8.0, 300 mM NaCl, 10 mM imidazole and 5 mM β-ME) was purified using Ni affinity chromatography (GE Healthcare), followed by gel filtration using a HiLoad Superdex 75 16/60 column (GE Healthcare) into the final buffer (25 mM sodium phosphate buffer, pH 8.0, 150 mM NaCl, 5% (v/v) glycerol and 5 mM β-ME).

For human ALKBH5, a His6-tagged ALKBH5(66-292) construct in pNIC28-Bsa4 was used. ALKBH5 in lysis buffer (20 mM Tris, pH 8.0, 500 mM NaCl, 40 mM imidazole and 5 mM β-ME) was first purified using Ni affinity chromatography (GE Healthcare), followed by anion-exchange chromatography using a 5 mL HiTrap Q HP column (GE Healthcare) and gel filtration using a HiLoad Superdex 75 16/60 column (GE Healthcare) into the final buffer (20 mM Tris buffer, pH 8.0, 100 mM NaCl and 5 mM β-ME).

For human ALKBH2, a His6-tagged ALKBH2(56-258) construct in pET28b was used. ALKBH2 in lysis buffer (50 mM sodium phosphate buffer, pH 8.0, 300 mM NaCl, 10% (v/v) glycerol, 5 mM β-ME) was purified by Ni affinity chromatography (GE Healthcare), followed by anion-exchange chromatography using a 5 mL HiTrap Q HP column (GE Healthcare) and gel filtration using a HiLoad Superdex 75 26/60 column (GE Healthcare) into a final buffer of 10 mM Tris, pH 8.0, 100 mM NaCl, 5 mM β-ME.

For human JMJD2A, a His6-tagged JMJD2A(1-359) construct in pNIC28-Bsa4 was used. The transformed E. coli cells were grown in Terrific Broth (TB) supplemented with 8 g/L of glycerol and appropriate antibiotics. JMJD2A in lysis buffer (100 mM HEPES, 500 mM NaCl, 10 mM imidazole, 10% glycerol, 0.5 mM TCEP, pH 8.0, Benzonase, Protease Inhibitor Cocktail Set III) was purified using a Ni-NTA Superflow column (Qiagen), followed by a HiLoad 16/60 Superdex 200 column (GE Healthcare), into a final buffer of 20 mM HEPES, pH 7.5, 300 mM NaCl, 10% (v/v) glycerol, 0.5 mM TCEP.

All KE/RD-tagged proteins were expressed using similar methods. For details, see Supporting Information.

Differential Scanning Fluorimetry (DSF) melting analysis

All DSF-based experiments were performed using a MiniOpticon Real-Time PCR Detection System (Bio-Rad), monitoring protein unfolding using SYPRO orange (Invitrogen) according to the reported method.

To characterise the effects of thermal tags on the melting temperature (Tm) of proteins, the reaction mixtures contained protein (2 μM), MnCl2 (50 μM; active site metal) and 5x SYPRO orange (Invitrogen; fluorescent dye) in a final volume of 50 μL. Reagents were prepared in 50 mM HEPES buffer, pH 6.0, except MnCl2, which was dissolved as a 100 mM stock in 20 mM HCl and then further diluted in MilliQ water. Fluorescence was detected on the FAM channel, with readings taken every 0.5 °C in the range 25-95 °C as the temperature was increased linearly at 1 °C/min. The fluorescence intensity data were then fitted to a Boltzmann sigmoidal curve using GraphPad Prism 6.0 to determine the melting temperature (Tm) of the proteins. The Tm shift caused by the thermal tags was determined by subtracting the Tm of the untagged protein from the Tm of the fusion protein. The assay was performed in triplicate for each protein, with standard deviations typically < 1 °C.

For the thermal shift assay with inhibitors, the reaction mixtures contained protein (2 μM), MnCl2 (50 μM), compound (100 μM) and 5x SYPRO orange in a final volume of 50 μL. The inhibitors tested were prepared in 100% DMSO and added such that the final concentration of DMSO was not more than 1% v/v of the assay mix. The Tm was determined as described above. The Tm shift caused by the addition of each inhibitor was determined by subtracting the 'reference' Tm (derived from protein incubated with MnCl2 and 1% v/v DMSO) from the Tm obtained in the presence of the inhibitor. The assay was performed in triplicate for each inhibitor, with standard deviations typically < 1 °C.

Circular dichroism (CD) spectroscopy

The proteins were exchanged into 10 mM sodium phosphate buffer (pH 6.0).
Far-UV CD spectra were recorded at a protein concentration of 0.1 mg/mL at 25 °C on a JASCO J-810 spectropolarimeter over a wavelength range of 190-250 nm with a scan rate of 20 nm/min. The buffer blank was subtracted from all spectra, which were then smoothed using the Savitzky-Golay algorithm (polynomial order 10). All measurements were performed in triplicate.

Steady-state kinetic analyses of nucleic acid demethylation by untagged AlkB proteins and their tagged counterparts

Steady-state kinetic analysis was performed using HPLC-based demethylation assays. The substrates used were 5’-GG(m6A)CU-3’ (for FTO), 5’-AAAGCAG(m1A)ATTCGAA-3’ (for ALKBH3) and 5’-GG(m6A)CU-3’ (for ALKBH5). The Km and kcat values were determined by keeping the enzyme concentration constant at 0.5 µM and varying the substrate concentration (0.5, 1, 2, 3, 5 and 10 μM), as previously reported.[53] The concentration of demethylated product at each substrate concentration was plotted as a function of time. The initial velocity (V0) for each substrate concentration was determined from the slope of the curve at the beginning of the reaction. The Michaelis–Menten curve was fitted using non-linear regression, and the kinetic constants (Vmax, Km) were estimated using GraphPad Prism 6.0. All reactions were performed at 4 °C in triplicate and were adjusted to ensure that less than 20% of the substrate was consumed.

Multi-protein DCC approach

The dynamic library was constituted by mixing ALKBH5-KE6 (2 μM), FTO-KE12 (2 μM), ALKBH3-KE18 (2 μM), scaffold ligand 3 (40 μM), aldehydes 4a-j (library A, each aldehyde at 20 μM), MnCl2 (50 μM) and aniline (5 mM) in HEPES buffer (50 mM, pH 6.0) in a final reaction volume of 50 µL. The reaction was incubated at 25 °C for the specified time points (see Figure 5), after which SYPRO orange (5x; Invitrogen) was added, and the reaction mixture was analysed using a MiniOpticon Real-Time PCR Detection System (Bio-Rad).
The melting curve was obtained by steady heating from 25 °C to 95 °C at a rate of 1 °C/min. Fluorescence was detected on the FAM channel, with readings taken every 0.5 °C. The fluorescence intensity data were then fitted to a Boltzmann sigmoidal curve using GraphPad Prism 6.0 to determine the melting temperature (Tm) of the proteins. The assay was performed in triplicate, with standard deviations typically < 1 °C.

HPLC analysis of multi-protein DCC

The DCL was analysed using a Zorbax C18 column (4.6 mm x 250 mm) at a flow rate of 1 mL/min at 25 °C. The gradient ran from 95% solvent A (water + 0.1% TFA) to 15% solvent B (MeOH) over 20 min, then to 60% solvent B over 30 min. The UV detection wavelength was set at 254 nm. The concentrations of adducts 5e and 5h were calculated from their peak areas, using calibration plots obtained from pure standards. The percentages of adduct formation were calculated relative to the starting concentration of scaffold ligand 3 (40 μM).

NMR water relaxation experiment

The NMR assay was performed as previously reported, with modifications.[14,65] The experiments were conducted at 500 MHz using a Bruker DRX 500 spectrometer equipped with a standard 5 mm z-gradient TXI probe. Unless otherwise stated, all experiments were conducted at 298 K in a conventional 3 mm diameter NMR tube (Norell). A freshly prepared mixture of protein (50 μM), Mn(II) (50 μM) and aldehyde (500 μM), buffered with 50 mM Tris dissolved in 10% H2O and 90% D2O (pH 6.0), was titrated against scaffold ligand 3 at varying concentrations (10, 25, 50, 80, 100 μM). A 1:1 molar ratio of protein to Mn(II) was used to minimise any contribution to relaxation rate changes from binding of small molecules to free Mn(II). Inversion recovery experiments were performed with two scans with a relaxation delay of at least 5 times T1 (the longitudinal relaxation time constant) between transients.
Pulse tip-angle calibration using the single-pulse nutation method was undertaken for each sample, ensuring accurate 90° and 180° pulses. The data were plotted as the fractional change in relaxation rate (1 - R1/R1(0)) against ligand concentration. KD curves were fitted using GraphPad Prism 6.0. The assay was performed in triplicate.
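The KD fitting step can be illustrated with a short sketch. The text does not state which binding model was used in GraphPad Prism, so the Python below assumes a 1:1 binding model with ligand depletion (a common choice when the protein, here 50 μM, is comparable in concentration to the ligand) and a response proportional to the fraction of protein occupied. The function names, search bounds and synthetic response values are illustrative assumptions, not data from this study; only the protein concentration and the titration points for scaffold ligand 3 are taken from the procedure above.

```python
import math


def fraction_bound(ligand_total, protein_total, kd):
    """Fraction of protein occupied for 1:1 binding, allowing for ligand depletion."""
    s = protein_total + ligand_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total


def fit_kd(ligand_concs, responses, protein_total, kd_lo=0.01, kd_hi=10000.0):
    """Least-squares (KD, maximal response) via golden-section search over log10(KD)."""

    def sse_and_amplitude(log_kd):
        kd = 10.0 ** log_kd
        f = [fraction_bound(l, protein_total, kd) for l in ligand_concs]
        # For a fixed KD, the best amplitude is a linear least-squares solution.
        amp = sum(y * fi for y, fi in zip(responses, f)) / sum(fi * fi for fi in f)
        sse = sum((y - amp * fi) ** 2 for y, fi in zip(responses, f))
        return sse, amp

    a, b = math.log10(kd_lo), math.log10(kd_hi)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(100):
        if sse_and_amplitude(c)[0] < sse_and_amplitude(d)[0]:
            b, d = d, c  # minimum lies in [a, d]
            c = b - phi * (b - a)
        else:
            a, c = c, d  # minimum lies in [c, b]
            d = a + phi * (b - a)
    log_kd = (a + b) / 2.0
    return 10.0 ** log_kd, sse_and_amplitude(log_kd)[1]


PROTEIN_UM = 50.0                            # protein concentration from the procedure
LIGAND_UM = [10.0, 25.0, 50.0, 80.0, 100.0]  # titration points for scaffold ligand 3

# Synthetic, noise-free responses generated from a hypothetical KD of 20 uM and a
# hypothetical maximal fractional change of 0.40 (illustration only, not measured data).
synthetic = [0.40 * fraction_bound(l, PROTEIN_UM, 20.0) for l in LIGAND_UM]

kd_fit, max_fit = fit_kd(LIGAND_UM, synthetic, PROTEIN_UM)
print(f"KD = {kd_fit:.1f} uM, maximal fractional change = {max_fit:.2f}")
```

Profiling out the amplitude leaves a one-dimensional search over KD, which keeps the sketch free of third-party dependencies; a general non-linear least-squares routine would serve equally well with real triplicate data.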