High performance computing

High performance computing (HPC) has become an integral aspect of modern comparative biology. The dedicated HPC cluster "Ace" and large-memory server "Heket," both housed in the Museu de Zoologia da USP, are essential research tools that enable us to carry out large comparative genomic and phylogenetic analyses. The backup server "Tot," located across the city in the Instituto de Biociências da USP, is used to safely back up the massive amounts of data generated by next-generation sequencing methods.



Ace

Ace is a FAPESP-funded SGI cluster housed in the Museu de Zoologia da USP that entered production in October 2013. It comprises 12 quad-socket compute nodes, each with four 16-core AMD Opteron 6376 CPUs (2.3 GHz, 16 MB cache, 6.4 GT/s), for a total of 768 cores. Eight nodes have 128 GB of DDR3 1600 MHz RAM (16 x 8 GB), two have 256 GB (16 x 16 GB), and two have 512 GB (32 x 16 GB). The nodes are connected by QDR 4x InfiniBand (32 Gb/s) networking.
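The core count quoted above follows directly from the node inventory, and the same inventory gives the aggregate memory. The short sketch below only restates that arithmetic; the variable names are illustrative, and the RAM total is derived here rather than stated in the original description.

```python
# Node inventory as listed above: (number of nodes, RAM per node in GB).
NODE_GROUPS = [(8, 128), (2, 256), (2, 512)]
SOCKETS_PER_NODE = 4   # quad-socket nodes
CORES_PER_SOCKET = 16  # 16-core AMD Opteron 6376

total_nodes = sum(count for count, _ in NODE_GROUPS)
total_cores = total_nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET
total_ram_gb = sum(count * ram_gb for count, ram_gb in NODE_GROUPS)

print(f"nodes: {total_nodes}")      # 12
print(f"cores: {total_cores}")      # 768
print(f"RAM:   {total_ram_gb} GB")  # 2560 GB across the cluster
```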


A new administrator guide for the Ace cluster is available here. Please consider it a beta version.



Ace being delivered to the Museu de Zoologia da USP.



The original Ace.


Heket

Heket is a FAPESP-funded high-memory server housed in the Museu de Zoologia da USP that is used for quality control and assembly of next-generation sequencing reads. Heket has dual Intel Xeon E5-2620 v2 processors (24 cores), 256 GB of DDR3 ECC RAM, 4 x 4 TB HDD (10.8 TB in RAID 5), a 240 GB SSD, and InfiniBand (20 Gb/s).


Tot

Tot is a FAPESP-funded backup server housed in the Instituto de Biociências da USP that is used for storing next-generation sequencing data. Tot has 8 cores (Intel® Xeon® CPU E3-1245 V2 @ 3.40GHz) and 64 TB of disk space (two arrays, each with 8 x 4 TB Seagate SATA HDD) in RAID 5.
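For readers sizing datasets against these servers, the sketch below works through the standard RAID 5 capacity rule, in which one disk's worth of space per array is given over to parity. It assumes that each of Tot's two arrays is an independent RAID 5 set and that the 64 TB figure is raw capacity; neither detail is confirmed above. The function names are illustrative.

```python
def raid5_usable_tb(disks: int, disk_tb: float) -> float:
    """Usable capacity of one RAID 5 array: (n - 1) disks hold data, 1 holds parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * disk_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 1e12 / 2**40


heket_tb = raid5_usable_tb(disks=4, disk_tb=4)    # one array of 4 x 4 TB
tot_tb = 2 * raid5_usable_tb(disks=8, disk_tb=4)  # assumed: two separate arrays

print(f"Heket: {heket_tb} TB usable (~{tb_to_tib(heket_tb):.1f} TiB)")
print(f"Tot:   {tot_tb} TB usable of 64 TB raw (~{tb_to_tib(tot_tb):.1f} TiB)")
```

Under these assumptions Heket's array comes to 12 TB, or roughly 10.9 TiB, which is close to the 10.8 figure quoted for it; the small remainder is presumably file system overhead.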


Publications

Below is a list of studies that used these HPC resources.


Sánchez-Pacheco S.J., Torres-Carvajal O., Aguirre-Peñafiel V., Sales P.M., Verrastro L., Rivas G.A., Rodrigues M.T., Grant T., Murphy R.W. 2017. Phylogeny of Riama (Squamata: Gymnophthalmidae), impact of phenotypic evidence on molecular datasets, and the origin of the Sierra Nevada de Santa Marta endemic fauna. Cladistics 1–32.


Targino M.T. 2016. Análise filogenética de Pristimantis, um gênero megadiverso de anfíbios (Amphibia, Anura, Terrarana). Tese (Doutorado em Ciências, na área de Zoologia) — Universidade de São Paulo.


Burin G., Kissling W.D., Guimarães P.R., Şekercioğlu Ç.H., Quental T.B. 2016. Omnivory in birds is a macroevolutionary sink. Nature Communications 7:11250.


Berneck B.V.M., Haddad C.F.B., Lyra M.L., da Cruz C.A.G., Faivovich J. 2016. The green clade grows: A phylogenetic analysis of Aplastodiscus (Anura: Hylidae). Molecular Phylogenetics and Evolution Early View.


Pereyra M.O., Baldo D., Iglesias P.P., Blotto B.L., Thomé M.T.C., Haddad C.F.B., Barrio-Amorós C.L., Ibáñez R., Faivovich J. 2016. Phylogenetic relationships of toads of the Rhinella granulosa group (Anura: Bufonidae): a molecular perspective with comments on hybridization and introgression. Cladistics 32: 36–53.


Machado D.J., Lyra M.L., Grant T. 2015. Mitogenome assembly from genomic multiplex libraries: comparison of strategies and novel mitogenomes for five species of frogs. Molecular Ecology Resources Early View.


Carvalho A.L.G. 2015. Systematic Revision of the Lizards of the Subfamily Tropidurinae (Tropiduridae), with Special Reference to Tropidurus Wied, 1825. Ph.D. Dissertation, Richard Gilder Graduate School, American Museum of Natural History.


Faivovich J., Nicoli L., Blotto B.L., Pereyra M.O., Baldo D., Barrionuevo J.S., Fabrezi M., Wild E.R., Haddad C.F.B. 2014. Big, bad, and beautiful: Phylogenetic relationships of the horned frogs (Anura: Ceratophryidae). South American Journal of Herpetology 9: 207–227.


de Sá R.O., Grant T., Camargo A., Heyer W.R., Ponssa M.L., Stanley E. 2014. Systematics of the Neotropical genus Leptodactylus Fitzinger, 1826 (Anura: Leptodactylidae): Phylogeny, the relevance of non-molecular evidence, and species accounts. South American Journal of Herpetology 9: S1–S128.


Padial J.M., Grant T., Frost D.R. 2014. Molecular systematics of terraranas (Anura: Brachycephaloidea) with an assessment of the effects of alignment and optimality criteria. Zootaxa 3825: 1–132.



Software

The following programs are currently installed on Ace. Each program section includes text files with installation instructions, example PBS scripts, and additional information that may come in handy for first-time users.
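For first-time users who have not seen a PBS script before, the sketch below assembles a minimal one as text. It is not one of the example scripts distributed with the programs listed here. The `#PBS` directives use Torque-style resource syntax (`nodes=...:ppn=...`), which is an assumption about how Ace's scheduler is configured (PBS Professional, for instance, uses `select=` instead); the function name, job name, command, and walltime are all hypothetical.

```python
def render_pbs_script(job_name: str, command: str, nodes: int = 1,
                      cores_per_node: int = 16,
                      walltime: str = "24:00:00") -> str:
    """Return the text of a minimal PBS job script (Torque-style, assumed)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                          # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={cores_per_node}",  # nodes and cores per node
        f"#PBS -l walltime={walltime}",                 # maximum run time, HH:MM:SS
        "#PBS -j oe",                                   # merge stderr into stdout
        "",
        'cd "$PBS_O_WORKDIR"',                          # directory qsub was run from
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: run an R script on 16 cores of a single node.
script = render_pbs_script("demo_job", "Rscript analysis.R")
print(script)
```

The resulting text would be saved to a file and submitted with `qsub`; the resource requests should be checked against the example script provided for each program before use.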


POY (v4.1.2 to v5.1.1)

POY is a phylogenetic analysis program that supports multiple kinds of data (e.g., morphology, nucleotides, genes and gene regions, chromosomes, and whole genomes). POY is distinctive in that it can perform true sequence optimization and phylogeny inference simultaneously (i.e., input sequences need not be prealigned). Insertions, deletions, and rearrangements can then be included in the overall tree score (under Maximum Parsimony) or in the model (under Maximum Likelihood). A variety of heuristic algorithms have been developed for this purpose and are implemented in POY.

General information Example PBS script Notes


ABySS v1.5.2

ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads.

General information Example PBS script Notes


Bowtie v1.1.0

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).

General information Example PBS script Notes
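To give a sense of what the Burrows-Wheeler transform behind Bowtie's index does, here is a toy sketch that builds the transform by sorting all rotations of a string and then inverts it. This is only the textbook construction for illustration: Bowtie itself relies on far more memory-efficient index structures, and the function names and example sequences below are made up.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (toy, O(n^2 log n))."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel  # the sentinel marks the end of the original string
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Recover the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]  # drop the sentinel


print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("GATTACA")))  # GATTACA
```

The transform groups identical characters that share the same following context, which is what makes the resulting index both compressible and searchable.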


Garli v2.01

GARLI is a program that performs phylogenetic inference using the maximum-likelihood criterion. Several sequence types are supported, including nucleotide, amino acid and codon.

General information Example PBS script Notes

Pseudo-parallel Garli


Mira v4.0

MIRA is a sequence assembler and mapper for whole-genome shotgun and EST/RNA-Seq sequencing data. It can use Sanger, 454, Illumina, and Ion Torrent data. For PacBio, CCS and error-corrected data are usable, but uncorrected data are not yet supported.

General information Example PBS script Notes


MITObim v1.7 (beta)

The MITObim procedure (mitochondrial baiting and iterative mapping) represents a highly efficient approach to assembling novel mitochondrial genomes of non-model organisms directly from total genomic DNA derived NGS reads. Requires Mira (see above).

General information Example PBS script Notes

Modified Perl script


R v3.1.0

R is a free software environment for statistical computing and graphics.

General information Example PBS script Notes


SOAPdenovo v2.04

SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads.

General information Example PBS script Notes


Velvet v1.2.10

Velvet manipulates de Bruijn graphs for genomic sequence assembly. The program represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.

General information Example PBS script Notes
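As a rough illustration of the de Bruijn graph idea that Velvet builds on, the sketch below breaks reads into k-mers and links each k-mer's prefix to its suffix. It is a textbook simplification, not Velvet's implementation: it ignores reverse complements, sequencing errors, and read pairs, and the reads and the function name are made up for the example.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


reads = ["ACGTA", "CGTAC"]  # two overlapping toy reads
graph = de_bruijn_graph(reads, k=3)
for node, successors in sorted(graph.items()):
    print(node, "->", ", ".join(successors))
```

Overlapping reads contribute the same edges, so the overlap is represented once in the graph; an assembly then corresponds to a walk through the edges.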