Microorganisms are a rich source of bioactives; however, chemical identification is a major bottleneck. Tandem mass spectrometry (MS/MS) networking was used to identify molecular families of the same biosynthetic origin, and the associated pathways were probed using comparative genomics. Most of the discriminating features were related to antibacterial compounds, including the thiomarinols, which are reported from these bacteria here for the first time. By comparative genomics, we identified the biosynthetic cluster responsible for the production of the antibiotic indolmycin, which could not be predicted with standard methods. In conclusion, we present an efficient, integrative strategy for elucidating the chemical richness of a given set of bacteria and linking the chemistry to biosynthetic genes.
IMPORTANCE We here combine chemical analysis and genomics to probe for new bioactive secondary metabolites based on their pattern of distribution within bacterial species. We demonstrate the usefulness of this combined approach in a group of closely related marine Gram-negative bacteria.
Prediction tools (7), such as antiSMASH (8, 9) and NaPDoS (10), are available for secondary metabolite pathway identification. Several studies have explored the general genomic capabilities within a group of related bacteria (11–16), but only a few have explored the overall biosynthetic potential and pathway diversity (17–21). Ziemert et al. (18) compared 75 genomes from three closely related species and predicted 124 distinct biosynthetic pathways, which far exceeds the 13 currently known compound classes from these bacteria. The study underlined the discovery potential in looking at multiple strains within a limited phylogenetic space, as a third of the predicted pathways were found only in a single strain. Considerable potential lies in combining genome mining with the significant advances in analytical methods for compound identification.
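The kind of distribution analysis mentioned above (how many predicted pathways are shared by all strains versus found in only one) can be illustrated with a minimal sketch. The strain names, pathway identifiers, and the function `pathway_distribution` are hypothetical and invented purely for illustration; this is not the procedure used in the cited studies, only a simple presence/absence count under the assumption that each strain has already been assigned a set of predicted pathways.

```python
from collections import Counter

# Hypothetical presence/absence data: strain -> set of predicted pathway IDs.
# All names below are invented for illustration only.
pathways_by_strain = {
    "strain_A": {"pks1", "nrps1", "terpene1"},
    "strain_B": {"pks1", "nrps1", "nrps2"},
    "strain_C": {"pks1", "lanthipeptide1"},
}


def pathway_distribution(pathways_by_strain):
    """Split pathways into core (all strains), shared (some), and unique (one strain)."""
    n_strains = len(pathways_by_strain)
    # Count in how many strains each pathway occurs.
    counts = Counter(
        pathway
        for pathways in pathways_by_strain.values()
        for pathway in pathways
    )
    core = {p for p, c in counts.items() if c == n_strains}
    unique = {p for p, c in counts.items() if c == 1}
    shared = set(counts) - core - unique
    return core, shared, unique


core, shared, unique = pathway_distribution(pathways_by_strain)
total = len(core) + len(shared) + len(unique)
print("core:", sorted(core))
print("shared:", sorted(shared))
print("unique:", sorted(unique))
print(f"fraction unique to a single strain: {len(unique) / total:.2f}")
```

A high fraction of single-strain pathways in such a table is the signal that sampling more strains within a narrow phylogenetic space is likely to keep turning up new chemistry.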
Building on the versatility, accuracy, and high sensitivity that liquid chromatography-mass spectrometry (LC-MS) platforms have achieved, sophisticated algorithms and software suites have been developed for untargeted metabolomics (22–26). The core of these programs is, first, feature detection (or peak picking), i.e., the identification of all signals caused by true ions (27), and, second, peak alignment, i.e., the matching of identical features across a batch of samples. Today, many programs consider not only the parent mass and the retention time (RT) but also the isotopic pattern, ion adducts, charge states, and potential fragments (27), which greatly improves the confidence in these feature detection algorithms (28). These high-quality data can be combined with multivariate analysis tools, a combination that not only aids analysis and interpretation but also forms a sound basis for integration with genomic information. Recently, molecular networking has been introduced as a powerful tool in small-molecule genome mining (21, 29, 30). It builds on an algorithm (31, 32) capable of comparing characteristic fragmentation patterns, thus highlighting molecular families with the same structural features and potentially the same biosynthetic origin. This enables the study and comparison of a large number of samples while aiding dereplication and tentative structural identification or classification (33). Here, we present an integrated diversity mining approach that links genes, pathways, and chemical features at the very first stage of the discovery process using a combination of publicly available prediction tools and machine learning algorithms. We use genomic data to interrogate the chemical data and vice versa to get an overview of the biosynthetic capabilities of a group of related organisms and to identify unique strains and compounds suitable for further chemical characterization.
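The idea behind molecular networking described above, comparing fragmentation patterns and linking similar spectra into families, can be sketched in a few lines. This is a deliberately simplified illustration and not the algorithm of references 31 and 32: it uses a plain cosine score on binned fragment intensities, without the precursor mass-shift matching that published implementations typically add. The spectra, feature names, bin width, and threshold are all assumptions chosen for the example.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=0.5):
    """Sum fragment intensities into fixed-width m/z bins.

    peaks is a list of (mz, intensity) tuples.
    """
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra (dicts of bin -> intensity)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_network(spectra, threshold=0.7):
    """Return edges (name_1, name_2, score) for spectrum pairs scoring >= threshold."""
    binned = {name: bin_spectrum(peaks) for name, peaks in spectra.items()}
    edges = []
    for first, second in combinations(sorted(binned), 2):
        score = cosine_similarity(binned[first], binned[second])
        if score >= threshold:
            edges.append((first, second, round(score, 3)))
    return edges


# Hypothetical MS/MS spectra: feature_1 and feature_2 share fragments,
# feature_3 does not.
spectra = {
    "feature_1": [(100.1, 50.0), (150.2, 100.0), (200.3, 30.0)],
    "feature_2": [(100.1, 45.0), (150.2, 90.0), (200.3, 35.0)],
    "feature_3": [(80.0, 100.0), (120.5, 60.0)],
}

edges = build_network(spectra)
for first, second, score in edges:
    print(f"{first} -- {second} (cosine {score})")
```

Connected components of the resulting graph correspond to candidate molecular families; features left without edges, such as `feature_3` here, are singletons that would need to be dereplicated individually.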
We demonstrate our approach on a unique group of closely related marine bacterial strains, grouped by 16S rRNA gene sequence similarity (34, 35). Previous studies in our lab have shown that this is a highly chemically prolific and diverse species, with strains producing a cocktail of the antibiotics violacein and either pentabromopseudilin or indolmycin (36). We use the integrated approach to evaluate the promise of continued sampling and discovery efforts within this species, as demonstrated by the finding of an additional group of antibiotics, the thiomarinols.
RESULTS Thirteen closely related strains previously identified by gene sequence similarity (36) were analyzed for their genomic potential and ability to produce secondary metabolites. The bacteria were cultivated on a complex medium known to support production of secondary metabolites (37) and extracted sequentially with ethyl acetate and butanol to obtain broad metabolite coverage. To obtain a global, unbiased view of the metabolites produced, molecular features were detected by LC-electrospray ionization (ESI)-high-resolution MS (HRMS) in an untargeted metabolomics experiment. On average, more than ~2,000 molecular features.