<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/rss.css" type="text/css"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
    xmlns:cc="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:extra="http://www.w3.org/1999/xhtml"
    xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel rdf:about="http://www.microbialinformaticsj.com/feeds/latestarticles/journal?quantity=&amp;format=rss&amp;version=">
        <title>Microbial Informatics and Experimentation - Latest Articles</title>
        <link>http://www.microbialinformaticsj.com</link>
        <description>The latest research articles published by Microbial Informatics and Experimentation</description>
        <dc:date>2013-04-10T00:00:00Z</dc:date>
        <items>
            <rdf:Seq>
                                <rdf:li rdf:resource="http://www.microbialinformaticsj.com/content/3/1/2" />
                                <rdf:li rdf:resource="http://www.microbialinformaticsj.com/content/3/1/1" />
                                <rdf:li rdf:resource="http://www.microbialinformaticsj.com/content/2/1/9" />
                                <rdf:li rdf:resource="http://www.microbialinformaticsj.com" />
                                <rdf:li rdf:resource="http://www.microbialinformaticsj.com/content/2/1/7" />
                                <rdf:li rdf:resource="http://www.microbialinformaticsj.com/content/2/1/6" />
                                <rdf:li rdf:resource="http://www.microbialinformaticsj.com/content/2/1/5" />
                                <rdf:li rdf:resource="http://www.microbialinformaticsj.com/content/2/1/4" />
                                <rdf:li rdf:resource="http://www.microbialinformaticsj.com/content/2/1/3" />
                                <rdf:li rdf:resource="http://www.microbialinformaticsj.com/content/2/1/2" />
                            </rdf:Seq>
        </items>
                 <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </channel>
        <item rdf:about="http://www.microbialinformaticsj.com/content/3/1/2">
        <title>Beginner&apos;s guide to comparative bacterial genome analysis using next-generation sequence data</title>
        <description>High-throughput sequencing is now fast and cheap enough to be considered part of the toolbox for investigating bacteria, and there are thousands of bacterial genome sequences available for comparison in the public domain. Bacterial genome analysis is increasingly being performed by diverse groups in research, clinical and public health labs alike, who are interested in a wide array of topics related to bacterial genetics and evolution. Examples include outbreak analysis and the study of pathogenicity and antimicrobial resistance. In this beginner&#8217;s guide, we aim to provide an entry point for individuals with a biology background who want to perform their own bioinformatics analysis of bacterial genome data, to enable them to answer their own research questions. We assume readers will be familiar with genetics and the basic nature of sequence data, but do not assume any computer programming skills. The main topics covered are assembly, ordering of contigs, annotation, genome comparison and extracting common typing information. Each section includes worked examples using publicly available E. coli data and free software tools, all of which can be performed on a desktop computer.</description>
        <link>http://www.microbialinformaticsj.com/content/3/1/2</link>
                <dc:creator>David Edwards</dc:creator>
                <dc:creator>Kathryn Holt</dc:creator>
                <dc:source>Microbial Informatics and Experimentation 2013, 3:2</dc:source>
        <dc:date>2013-04-10T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/2042-5783-3-2</dc:identifier>
                                <prism:require>/content/figures/2042-5783-3-2-toc.gif</prism:require>
                <prism:publicationName>Microbial Informatics and Experimentation</prism:publicationName>
        <prism:issn>2042-5783</prism:issn>
        <prism:volume>3</prism:volume>
        <prism:startingPage>2</prism:startingPage>
        <prism:publicationDate>2013-04-10T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.microbialinformaticsj.com/content/3/1/1">
        <title>An efficient rRNA removal method for RNA sequencing in GC-rich bacteria</title>
        <description>Background:
Next generation sequencing (NGS) technologies have revolutionized gene expression studies and functional genomics analysis. However, further improvement of RNA sequencing protocols is still desirable, in order to reduce NGS costs and to increase its accuracy. In bacteria, a major problem in RNA sequencing is the abundance of ribosomal RNA (rRNA), which accounts for 95-98% of total RNA and can therefore hinder sufficient coverage of mRNA, the main focus of transcriptomic studies. Thus, efficient removal of rRNA is necessary to achieve optimal coverage, good detection sensitivity and reliable results. An additional challenge is presented by microorganisms with GC-rich genomes, in which rRNA removal is less efficient.
Results:
In this work, we tested two commercial kits for rRNA removal, either alone or in combination, on Burkholderia thailandensis. This bacterium, chosen as representative of the important Burkholderia genus, which includes both pathogenic and environmental bacteria, has a rather large (6.72 Mb) and GC-rich (67.7%) genome. Each enriched mRNA sample was sequenced through a paired-end Illumina GAIIx run in duplicate, yielding between 10 and 40 million reads. We show that combined treatment with both kits allows an mRNA enrichment of more than 238-fold, enabling the sequencing of almost all (more than 90%) B. thailandensis transcripts from less than 10 million reads, without introducing any bias in mRNA relative abundance, thus preserving the differential expression profile.
Conclusions:
The mRNA enrichment protocol presented in this work leads to an increase in detection sensitivity of up to 770% compared to total RNA; such increased sensitivity allows for a corresponding reduction in the number of sequencing reads necessary for the complete analysis of whole transcriptome expression profiling. Thus we can conclude that the MICROBExpress/Ovation combined rRNA removal method could be suitable for RNA sequencing of whole transcriptomes of microorganisms with high GC content and complex genomes, while at the same time enabling a substantial reduction in sequencing costs.</description>
        <link>http://www.microbialinformaticsj.com/content/3/1/1</link>
                <dc:creator>Clelia Peano</dc:creator>
                <dc:creator>Alessandro Pietrelli</dc:creator>
                <dc:creator>Clarissa Consolandi</dc:creator>
                <dc:creator>Elio Rossi</dc:creator>
                <dc:creator>Luca Petiti</dc:creator>
                <dc:creator>Letizia Tagliabue</dc:creator>
                <dc:creator>Gianluca De Bellis</dc:creator>
                <dc:creator>Paolo Landini</dc:creator>
                <dc:source>Microbial Informatics and Experimentation 2013, 3:1</dc:source>
        <dc:date>2013-01-07T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/2042-5783-3-1</dc:identifier>
                                <prism:require>/content/figures/2042-5783-3-1-toc.gif</prism:require>
                <prism:publicationName>Microbial Informatics and Experimentation</prism:publicationName>
        <prism:issn>2042-5783</prism:issn>
        <prism:volume>3</prism:volume>
        <prism:startingPage>1</prism:startingPage>
        <prism:publicationDate>2013-01-07T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.microbialinformaticsj.com/content/2/1/9">
        <title>Bioinformatic identification of Mycobacterium tuberculosis proteins likely to target host cell mitochondria: virulence factors?</title>
        <description>Background:
M. tuberculosis infection either induces or inhibits host cell death, depending on the bacterial strain and the cell microenvironment. There is evidence suggesting a role for mitochondria in these processes. On the other hand, it has been shown that several bacterial proteins are able to target mitochondria, playing a critical role in bacterial pathogenesis and modulation of cell death. However, mycobacteria-derived proteins able to target host cell mitochondria are less studied.
Results:
A bioinformatic analysis based on available genomic sequences of the common laboratory virulent reference strain Mycobacterium tuberculosis H37Rv, the avirulent strain H37Ra, the clinical isolate CDC1551, and M. bovis BCG Pasteur strain 1173P2, as well as on suitable bioinformatic tools (MitoProt II, PSORT II, and SignalP) for the in silico search for proteins likely to be secreted by mycobacteria that could target host cell mitochondria, showed that at least 19 M. tuberculosis proteins could possibly target host cell mitochondria. We experimentally tested this bioinformatic prediction on four M. tuberculosis recombinant proteins chosen from this list of 19 proteins (p27, PE_PGRS1, PE_PGRS33, and MT_1866). Confocal microscopy analyses showed that the p27 and PE_PGRS33 proteins colocalize with mitochondria.
Conclusions:
Based on the bioinformatic analysis of whole M. tuberculosis genome sequences, we propose that at least 19 out of 4,246 M. tuberculosis predicted proteins would be able to target host cell mitochondria and, in turn, control mitochondrial physiology. Interestingly, this list of 19 proteins includes five members of a mycobacteria-specific family of proteins (PE/PE_PGRS) thought to be virulence factors, and p27, a well-known virulence factor. The p27 and PE_PGRS33 proteins were shown experimentally to target mitochondria in J774 cells. Our results suggest a link between mitochondrial targeting of M. tuberculosis proteins and virulence.</description>
        <link>http://www.microbialinformaticsj.com/content/2/1/9</link>
                <dc:creator>María Maximina Bertha Moreno-Altamirano</dc:creator>
                <dc:creator>Iris Selene Paredes-González</dc:creator>
                <dc:creator>Clara Espitia</dc:creator>
                <dc:creator>Mauricio Santiago-Maldonado</dc:creator>
                <dc:creator>Rogelio Hernández-Pando</dc:creator>
                <dc:creator>Francisco Javier Sánchez-García</dc:creator>
                <dc:source>Microbial Informatics and Experimentation 2012, 2:9</dc:source>
        <dc:date>2012-12-22T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/2042-5783-2-9</dc:identifier>
                                <prism:require>/content/figures/2042-5783-2-9-toc.gif</prism:require>
                <prism:publicationName>Microbial Informatics and Experimentation</prism:publicationName>
        <prism:issn>2042-5783</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>9</prism:startingPage>
        <prism:publicationDate>2012-12-22T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.microbialinformaticsj.com">
        <title>Analysis of evolutionary patterns of genes in Campylobacter jejuni and C. coli</title>
        <description>Background:
The thermophilic Campylobacter jejuni and Campylobacter coli are considered weakly clonal populations where incongruences between genetic markers are assumed to be due to random horizontal transfer of genomic DNA. In order to investigate the population genetics structure we extracted a set of 1180 core gene families (CGF) from 27 sequenced genomes of C. jejuni and C. coli. We adopted a principal component analysis (PCA) on the normalized evolutionary distances in order to reveal any patterns in the evolutionary signals contained within the various CGFs.
Results:
The analysis indicates that the conserved genes in Campylobacter show at least two, possibly five, distinct patterns of evolutionary signals, seen as clusters in the score-space of our PCA. The dominant underlying factor separating the core genes is the ability to distinguish C. jejuni from C. coli. The genes in the clusters outside the main gene group have a strong tendency of being chromosomal neighbors, which is natural if they share a common evolutionary history. Also, the most distinct cluster outside the main group is enriched with genes under positive selection and displays larger than average recombination rates.
Conclusions:
The Campylobacter genomes investigated here show that subsets of conserved genes differ from each other in a more systematic way than expected by random horizontal transfer, which is consistent with differences in selection pressure acting on different genes. These findings indicate a population of bacteria characterized by genomes with a mixture of evolutionary patterns.</description>
        <link>http://www.microbialinformaticsj.com</link>
                <dc:creator>Lars Snipen</dc:creator>
                <dc:creator>Trudy Wassenaar</dc:creator>
                <dc:creator>Eric Altermann</dc:creator>
                <dc:creator>Jonathan Olson</dc:creator>
                <dc:creator>Sophia Kathariou</dc:creator>
                <dc:creator>Karin Lagesen</dc:creator>
                <dc:creator>Monica Takamiya</dc:creator>
                <dc:creator>Susanne Knøchel</dc:creator>
                <dc:creator>David Ussery</dc:creator>
                <dc:creator>Richard Meinersmann</dc:creator>
                <dc:source>Microbial Informatics and Experimentation 2012, 2:8</dc:source>
        <dc:date>2012-08-28T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/2042-5783-2-8</dc:identifier>
                                <prism:require>/content/figures/2042-5783-2-8-toc.gif</prism:require>
                <prism:publicationName>Microbial Informatics and Experimentation</prism:publicationName>
        <prism:issn>2042-5783</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>8</prism:startingPage>
        <prism:publicationDate>2012-08-28T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.microbialinformaticsj.com/content/2/1/7">
        <title>Computational genomics-proteomics and Phylogeny analysis of twenty one mycobacterial genomes (Tuberculosis &amp; non Tuberculosis strains)</title>
        <description>Background:
The genus Mycobacterium comprises different species, among them the most contagious and infectious bacteria. The members of the Mycobacterium tuberculosis complex are the most virulent microorganisms, which have killed humans and other mammals for millennia. Additionally, with the many different mycobacterial sequences available, there is a crucial need for the visualization and simplification of their data. In the present study, we aim to highlight a comparative genome, proteome and phylogeny analysis of twenty-one mycobacterial (tuberculosis and non-tuberculosis) strains using a set of computational and bioinformatics tools (pan- and core-genome plotting, BLAST matrix and phylogeny analysis).
Results:
The pan- and core-genome plots demonstrated that fewer than 1250 Mycobacterium gene families are conserved across all species, with a total of about 20,000 gene families within the pan-genome of the twenty-one mycobacterial genomes. The BLAST matrix showed high similarity among the species of the Mycobacterium tuberculosis complex and less conservation with other slow-growing pathogenic mycobacteria. Phylogeny analysis based on both protein conservation and rRNA clearly resolves known relationships between slow-growing mycobacteria.
Conclusion:
Mycobacteria include important pathogenic species for humans and animals, and the Mycobacterium tuberculosis complex is a leading cause of death in humans. Comparative genome analysis could provide new insight for better controlling and preventing these diseases.</description>
        <link>http://www.microbialinformaticsj.com/content/2/1/7</link>
                <dc:creator>Fathiah Zakham</dc:creator>
                <dc:creator>Othmane Aouane</dc:creator>
                <dc:creator>David Ussery</dc:creator>
                <dc:creator>Abdelaziz Benjouad</dc:creator>
                <dc:creator>Moulay Mustapha Ennaji</dc:creator>
                <dc:source>Microbial Informatics and Experimentation 2012, 2:7</dc:source>
        <dc:date>2012-08-28T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/2042-5783-2-7</dc:identifier>
                                <prism:require>/content/figures/2042-5783-2-7-toc.gif</prism:require>
                <prism:publicationName>Microbial Informatics and Experimentation</prism:publicationName>
        <prism:issn>2042-5783</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>7</prism:startingPage>
        <prism:publicationDate>2012-08-28T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.microbialinformaticsj.com/content/2/1/6">
        <title>Bacterial phylogenetic tree construction based on genomic translation stop signals</title>
        <description>Background:
The efficiencies of the stop codons TAA, TAG, and TGA in protein synthesis termination are not the same. These variations could allow many genes to be regulated. There are many similar nucleotide trimers found on the second and third reading-frames of a gene. They are called premature stop codons (PSC). Like stop codons, the PSC in bacterial genomes are also highly biased in terms of their quantities and qualities on the genes. Phylogenetically related species often share a similar PSC profile. We want to know whether the selective forces that influence the stop codons and the PSC usage biases in a genome are related. We also wish to know how strongly these trimers in a genome are related to the natural history of the bacterium. Knowing these relations may provide better knowledge of the phylogeny of bacteria.
Results:
A 16S rRNA-alignment tree of 19 well-studied &#945;-, &#946;- and &#947;-Proteobacteria type species is used as the standard reference for bacterial phylogeny. The genomes of sixty-one bacteria, belonging to the &#945;-, &#946;- and &#947;-Proteobacteria subphyla, are used for this study. The stop codons and PSC are collectively termed &#8220;Translation Stop Signals&#8221; (TSS). A gene is represented by nine scalars corresponding to the counts of TAA, TAG, and TGA on each of the three reading-frames of that gene. &#8220;Translation Stop Signals Ratio&#8221; (TSSR) is the ratio between the TSS counts. Four types of TSSR are investigated. The TSSR-1, TSSR-2 and TSSR-3 are each a 3-scalar series corresponding respectively to the average ratio of TAA: TAG: TGA on the first, second, and third reading-frames of all genes in a genome. The Genomic-TSSR is a 9-scalar series representing the ratio of distribution of all TSS on the three reading-frames of all genes in a genome. Results show that bacteria grouped by their similarities based on TSSR-1, TSSR-2, or TSSR-3 values could only partially resolve the phylogeny of the species. However, grouping bacteria based on their Genomic-TSSR values resulted in clusters of bacteria identical to those bacterial clusters of the reference tree. Unlike the 16S rRNA method, the Genomic-TSSR tree is also able to separate closely related species/strains at high resolution. Species and strains separated by the Genomic-TSSR grouping method are often in good agreement with those classified by other taxonomic methods. Correspondence analysis of individual genes shows that most genes in a bacterial genome share a similar TSSR value. However, within a chromosome, the Genic-TSSR values of genes near the replication origin region (Ori) are more similar to each other than those genes near the terminus region (Ter).
Conclusion:
The translation stop signals on the three reading-frames of the genes on a bacterial genome are interrelated, possibly due to frequent off-frame recombination facilitated by translational-associated recombination (TSR). However, TSR may not occur randomly in a bacterial chromosome. Genes near the Ori region are often highly expressed and a bacterium always maintains multiple copies of Ori. Frequent collisions between DNA polymerase and RNA polymerase would create many DNA strand-breaks on the genes, whereas DNA strand-break induced homologous recombination is more likely to take place between genes with similar sequence. Thus, localized recombination could explain why the TSSR of genes near the Ori region are more similar to each other. The quantity and quality of these TSS in a genome strongly reflect the natural history of a bacterium. We propose that the Genomic-TSSR can be used as a subjective biomarker to represent the phyletic status of a bacterium.</description>
        <link>http://www.microbialinformaticsj.com/content/2/1/6</link>
                <dc:creator>Lijing Xu</dc:creator>
                <dc:creator>Jimmy Kuo</dc:creator>
                <dc:creator>Jung-Kang Liu</dc:creator>
                <dc:creator>Tit-Yee Wong</dc:creator>
                <dc:source>Microbial Informatics and Experimentation 2012, 2:6</dc:source>
        <dc:date>2012-05-31T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/2042-5783-2-6</dc:identifier>
                                <prism:require>/content/figures/2042-5783-2-6-toc.gif</prism:require>
                <prism:publicationName>Microbial Informatics and Experimentation</prism:publicationName>
        <prism:issn>2042-5783</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>6</prism:startingPage>
        <prism:publicationDate>2012-05-31T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.microbialinformaticsj.com/content/2/1/5">
        <title>Growth comparison of several Escherichia coli strains exposed to various concentrations of lactoferrin using linear spline regression</title>
        <description>Background:
We wanted to compare growth differences between 13 Escherichia coli strains exposed to various concentrations of the growth inhibitor lactoferrin in two different types of broth (Syncase and Luria-Bertani (LB)). To carry this out, we present a simple statistical procedure that separates microbial growth curves that are due to natural random perturbations from growth curves that are more likely caused by biological differences. Bacterial growth was determined using optical density (OD) data recorded for triplicates at 620&#8201;nm for 18 hours for each strain. Each resulting growth curve was divided into three equally spaced intervals. We propose a procedure using linear spline regression with two knots to compute the slopes of each interval in the bacterial growth curves. These slopes are subsequently used to estimate a 95% confidence interval based on an appropriate statistical distribution. Slopes outside the confidence interval were considered significantly different from slopes within. We also demonstrate the use of related, but more advanced methods known collectively as generalized additive models (GAMs) to model growth. In addition to impressive curve fitting capabilities with corresponding confidence intervals, GAMs allow for the computation of derivatives, i.e. growth rate estimation, with respect to each time point.
Results:
The results from our proposed procedure agreed well with the observed data. The results indicated that there were substantial growth differences between the E. coli strains. Most strains exhibited improved growth in the nutrient-rich LB broth compared to Syncase. The inhibiting effect of lactoferrin varied between the different strains. The atypical enteropathogenic aEPEC-2 grew, on average, faster in both broths than the other strains tested, while the enteroinvasive strains, EIEC-6 and EIEC-7, grew slower. The enterotoxigenic ETEC-5 strain exhibited exceptional growth in Syncase broth, but slower growth in LB broth.
Conclusions:
Our results do not indicate clear growth differences between pathogroups or pathogenic versus non-pathogenic E. coli.</description>
        <link>http://www.microbialinformaticsj.com/content/2/1/5</link>
                <dc:creator>Jon Bohlin</dc:creator>
                <dc:creator>Camilla Sekse</dc:creator>
                <dc:creator>Eystein Skjerve</dc:creator>
                <dc:creator>Gerd Vegarud</dc:creator>
                <dc:source>Microbial Informatics and Experimentation 2012, 2:5</dc:source>
        <dc:date>2012-04-16T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/2042-5783-2-5</dc:identifier>
                                <prism:require>/content/figures/2042-5783-2-5-toc.gif</prism:require>
                <prism:publicationName>Microbial Informatics and Experimentation</prism:publicationName>
        <prism:issn>2042-5783</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>5</prism:startingPage>
        <prism:publicationDate>2012-04-16T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.microbialinformaticsj.com/content/2/1/4">
        <title>Mycobacterium tuberculosis and Clostridium difficile interactomes: demonstration of rapid development of computational system for bacterial interactome prediction</title>
        <description>Background:
Protein-protein interaction (PPI) networks (interactomes) of most organisms, except for some model organisms, are largely unknown. Experimental methods including high-throughput techniques are highly resource intensive. Therefore, computational discovery of PPIs can accelerate biological discovery by presenting &quot;most-promising&quot; pairs of proteins that are likely to interact. For many bacteria, the genome sequence, and thereby the genomic context of proteomes, is readily available; additionally, for some of these proteomes, localization and functional annotations are also available, but interactomes are not. We present here a method for rapid development of a computational system to predict the interactome of bacterial proteomes. While other studies have presented methods to transfer interologs across species, here we propose transfer of computational models to benefit from cross-species annotations, thereby predicting many more novel interactions even in the absence of interologs. Mycobacterium tuberculosis (Mtb) and Clostridium difficile (CD) have been used to demonstrate the work.
Results:
We developed a random forest classifier over features derived from Gene Ontology annotations and genetic context scores provided by STRING database for predicting Mtb and CD interactions independently. The Mtb classifier gave a precision of 94% and a recall of 23% on a held out test set. The Mtb model was then run on all the 8 million protein pairs of the Mtb proteome, resulting in 708 new interactions (at 94% expected precision) or 1,595 new interactions at 80% expected precision. The CD classifier gave a precision of 90% and a recall of 16% on a held out test set. The CD model was run on all the 8 million protein pairs of the CD proteome, resulting in 143 new interactions (at 90% expected precision) or 580 new interactions (at 80% expected precision). We also compared the overlap of predictions of our method with STRING database interactions for CD and Mtb and also with interactions identified recently by a bacterial 2-hybrid system for Mtb. To demonstrate the utility of transfer of computational models, we made use of the developed Mtb model and used it to predict CD protein-pairs. The cross species model thus developed yielded a precision of 88% at a recall of 8%. To demonstrate transfer of features from other organisms in the absence of feature-based and interaction-based information, we transferred missing feature values from Mtb orthologs into the CD data. In transferring this data from orthologs (not interologs), we showed that a large number of interactions can be predicted.
Conclusions:
Rapid discovery of a (partial) bacterial interactome can be made by using the existing set of GO and STRING features associated with the organisms. We can make use of cross-species interactome development when there are not even sufficient known interactions to develop a computational prediction system. Computational models of well-studied organism(s) can be employed to make the initial interactome prediction for the target organism. We have also demonstrated successfully that annotations can be transferred from orthologs in well-studied organisms, enabling accurate predictions for organisms with no annotations. These approaches can serve as building blocks to address the challenges associated with feature coverage and missing interactions, towards rapid interactome discovery for bacterial organisms.
Availability:
The predictions for all Mtb and CD proteins are made available at: http://severus.dbmi.pitt.edu/TB and http://severus.dbmi.pitt.edu/CD respectively for browsing as well as for download.</description>
        <link>http://www.microbialinformaticsj.com/content/2/1/4</link>
                <dc:creator>Seshan Ananthasubramanian</dc:creator>
                <dc:creator>Rahul Metri</dc:creator>
                <dc:creator>Ankur Khetan</dc:creator>
                <dc:creator>Aman Gupta</dc:creator>
                <dc:creator>Adam Handen</dc:creator>
                <dc:creator>Nagasuma Chandra</dc:creator>
                <dc:creator>Madhavi Ganapathiraju</dc:creator>
                <dc:source>Microbial Informatics and Experimentation 2012, 2:4</dc:source>
        <dc:date>2012-03-21T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/2042-5783-2-4</dc:identifier>
                                <prism:require>/content/figures/2042-5783-2-4-toc.gif</prism:require>
                <prism:publicationName>Microbial Informatics and Experimentation</prism:publicationName>
        <prism:issn>2042-5783</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>4</prism:startingPage>
        <prism:publicationDate>2012-03-21T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.microbialinformaticsj.com/content/2/1/3">
        <title>Metagenomics - a guide from sampling to data analysis</title>
        <description>Metagenomics applies a suite of genomic technologies and bioinformatics tools to directly access the genetic content of entire communities of organisms. The field of metagenomics has been responsible for substantial advances in microbial ecology, evolution, and diversity over the past 5 to 10 years, and many research laboratories are actively engaged in it now. With the growing numbers of activities also comes a plethora of methodological knowledge and expertise that should guide future developments in the field. This review summarizes the current opinions in metagenomics, and provides practical guidance and advice on sample processing, sequencing technology, assembly, binning, annotation, experimental design, statistical analysis, data storage, and data sharing. As more metagenomic datasets are generated, the availability of standardized procedures and shared data storage and analysis becomes increasingly important to ensure that output of individual projects can be assessed and compared.</description>
        <link>http://www.microbialinformaticsj.com/content/2/1/3</link>
                <dc:creator>Torsten Thomas</dc:creator>
                <dc:creator>Jack Gilbert</dc:creator>
                <dc:creator>Folker Meyer</dc:creator>
                <dc:source>Microbial Informatics and Experimentation 2012, 2:3</dc:source>
        <dc:date>2012-02-09T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/2042-5783-2-3</dc:identifier>
                                <prism:require>/content/figures/2042-5783-2-3-toc.gif</prism:require>
                <prism:publicationName>Microbial Informatics and Experimentation</prism:publicationName>
        <prism:issn>2042-5783</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>3</prism:startingPage>
        <prism:publicationDate>2012-02-09T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <item rdf:about="http://www.microbialinformaticsj.com/content/2/1/2">
        <title>Linear normalised hash function for clustering gene sequences and identifying reference sequences from multiple sequence alignments</title>
        <description>Background:
Comparative genomics has put additional demands on the assessment of similarity between sequences and on their clustering as a means of classification. However, defining the optimal number of clusters, cluster density, and cluster boundaries for sets of potentially related gene sequences with variable degrees of polymorphism remains a significant challenge. The aim of this study was to develop a method that would identify the cluster centroids and the optimal number of clusters for a given sensitivity level and that could work equally well for different sequence datasets.
Results:
A novel method that combines a linear mapping hash function with multiple sequence alignment (MSA) was developed. The method takes advantage of the fact that sequences in the MSA output are already sorted by similarity, and identifies the optimal number of clusters, the cluster cut-offs, and the cluster centroids that can represent reference gene vouchers for the different species. The linear mapping hash function maps a distance matrix that is already ordered by similarity to indices, revealing gaps in the values around which the optimal cut-offs of the different clusters can be identified. The method was evaluated using sets of closely related (16S rRNA gene sequences of Nocardia species) and highly variable (VP1 genomic region of Enterovirus 71) sequences and outperformed existing unsupervised machine learning clustering methods and dimensionality reduction methods. This method does not require prior knowledge of the number of clusters or the distance between clusters, handles clusters of different sizes and shapes, and scales linearly with the size of the dataset.
Conclusions:
The combination of MSA with the linear mapping hash function is a computationally efficient approach to gene sequence clustering and can be a valuable tool for assessing similarity, clustering different microbial genomes, identifying reference sequences, and studying the evolution of bacteria and viruses.</description>
        <link>http://www.microbialinformaticsj.com/content/2/1/2</link>
                <dc:creator>Manal Helal</dc:creator>
                <dc:creator>Fanrong Kong</dc:creator>
                <dc:creator>Sharon Chen</dc:creator>
                <dc:creator>Fei Zhou</dc:creator>
                <dc:creator>Dominic Dwyer</dc:creator>
                <dc:creator>John Potter</dc:creator>
                <dc:creator>Vitali Sintchenko</dc:creator>
                <dc:source>Microbial Informatics and Experimentation 2012, 2:2</dc:source>
        <dc:date>2012-01-26T00:00:00Z</dc:date>
        <dc:identifier>doi:10.1186/2042-5783-2-2</dc:identifier>
                                <prism:require>/content/figures/2042-5783-2-2-toc.gif</prism:require>
                <prism:publicationName>Microbial Informatics and Experimentation</prism:publicationName>
        <prism:issn>2042-5783</prism:issn>
        <prism:volume>2</prism:volume>
        <prism:startingPage>2</prism:startingPage>
        <prism:publicationDate>2012-01-26T00:00:00Z</prism:publicationDate>
                <prism:versionidentifier>XML</prism:versionidentifier>
                <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
    </item>
        <cc:License rdf:about="http://creativecommons.org/licenses/by/2.0/">
        <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution" />
        <cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks" />
    </cc:License>
</rdf:RDF>
