Coffea arabica genome
Coffea arabica is a polyploid species: it carries four copies of the eleven chromosomes typical of the genus Coffea, 44 chromosomes in total (2n = 4x = 44). More precisely, its genome is allotetraploid. It arose from hybridization between two diploid species (2n = 2x = 22), Coffea canephora and Coffea eugenioides, and because arabica carries both parental chromosome sets, its chromosome number is doubled to 44.
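The chromosome arithmetic above can be written out explicitly. This assumes, as the text implies, that each diploid parent carries two copies of the basic set of x = 11 chromosomes and that arabica retains both complete parental complements:

```latex
x = 11,\qquad
\underbrace{2n = 2x = 22}_{\textit{C. canephora}}
\quad + \quad
\underbrace{2n = 2x = 22}_{\textit{C. eugenioides}}
\;\Longrightarrow\;
\underbrace{2n = 4x = 22 + 22 = 44}_{\textit{C. arabica}}
```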
This genome sequence was derived from a Coffea arabica plant of the Red Bourbon variety.
Sequencing and assembly
The genome was sequenced with Illumina technology at the Istituto di Genomica Applicata in Udine, Italy. Because a tetraploid genome is inherently complex to assemble, it was sequenced using a hierarchical approach rather than the more common whole-genome shotgun approach.
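In a hierarchical approach, the genome is first broken into large cloned fragments, which are grouped into pools that are sequenced and assembled separately. The sketch below shows only the partitioning step, using the figures given in this page (96 pools of 384 clones). It assumes that clones are assigned to pools consecutively, for example one pool per 384-well plate. The page does not describe the actual pooling layout, and `assign_pools` and `locate_clone` are hypothetical helpers.

```python
from typing import List, Tuple

N_POOLS = 96           # number of pools stated on this page
CLONES_PER_POOL = 384  # clones per pool stated on this page


def assign_pools(n_clones: int, pool_size: int) -> List[List[int]]:
    """Split clone indices 0..n_clones-1 into consecutive pools of pool_size.

    Hypothetical layout: the real clone-to-pool assignment is not documented.
    """
    return [
        list(range(start, min(start + pool_size, n_clones)))
        for start in range(0, n_clones, pool_size)
    ]


def locate_clone(clone_id: int, pool_size: int = CLONES_PER_POOL) -> Tuple[int, int]:
    """Return (pool index, position within pool) under consecutive pooling."""
    return divmod(clone_id, pool_size)


# 96 pools x 384 clones gives the 36,864 BAC clones reported below.
pools = assign_pools(N_POOLS * CLONES_PER_POOL, CLONES_PER_POOL)
assert len(pools) == N_POOLS
assert sum(len(pool) for pool in pools) == 36_864
```

Each pool then holds a small, independent slice of the genome, so it can be assembled on its own, as the 96 independent assemblies described below were.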
Key numbers and facts
- 36,864 genomic fragments were cloned into bacterial artificial chromosomes (BACs) and sequenced in 96 pools of 384 clones
- 488 billion base pairs were produced, corresponding to 132 genome equivalents
- The genome size was estimated at 1.3 Gb by k-mer analysis
- 96 independent assemblies were generated with ABySS (assembly) and SSPACE (scaffolding) and then merged into a single multi-FASTA file (available for download below).
- The sequence contains 1.51 billion base pairs, divided into 164,254 scaffold sequences
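Two of the summary figures above can be illustrated with short sketches. A k-mer genome-size estimate divides the total number of k-mers by the depth of the main peak in the k-mer frequency histogram. The merged assembly can be summarized by reading the multi-FASTA file and counting scaffolds, total length and N50. This is a minimal sketch, not the project's pipeline. It assumes the histogram is available as depth-to-count pairs (for example the two-column output of `jellyfish histo`) and that depths below `min_depth` are sequencing errors. The `kmer_genome_size`, `scaffold_lengths` and `n50` helpers are hypothetical. Real polyploid genomes often show several histogram peaks, so picking the single tallest peak is a simplification.

```python
from typing import Dict, Iterable, List


def kmer_genome_size(histogram: Dict[int, int], min_depth: int = 5) -> float:
    """Estimate genome size as total k-mers / depth of the main histogram peak.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are assumed to be sequencing errors and ignored.
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)  # depth holding the most distinct k-mers
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth


def scaffold_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in a multi-FASTA stream."""
    lengths: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)  # header line: start a new scaffold record
        elif lengths:
            lengths[-1] += len(line)  # sequence line: extend the current record
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that scaffolds of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy multi-FASTA records; lengths -> [8, 3, 1], N50 -> 8
toy = [">s1", "ACGT", "ACGT", ">s2", "ACG", ">s3", "A"]
lengths = scaffold_lengths(toy)
print(len(lengths), sum(lengths), n50(lengths))
```

To summarize a downloaded file, pass an open file handle (for example `open("scaffolds.fasta")`, a hypothetical filename) to `scaffold_lengths`. Because the file is read line by line, large assemblies do not need to fit in memory.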
The genome was annotated at the Universities of Padova and Verona. Gene prediction was supported by RNA sequencing of twelve different samples derived from eight organs.
- 78,311 genes were predicted and functionally annotated in Coffea arabica
The coffee (C. arabica) genome, produced by an Italian partnership led by illycaffè and Lavazza, is made available on a non-profit basis to advance research.
To respect the rights of the data producers and contributors, by downloading the genome scaffolds and annotation files below you acknowledge and agree to the following principles:
- Not to redistribute, release, or otherwise provide access to the data to anyone outside your research group
- To contact the project leader, Simone Scalabrin (email: sscalabrin AT igatechnology.com), to discuss any publication plans that may use these data, so as to avoid overlap with planned analyses
- To cite and acknowledge the data accurately and completely as “ARABICA GENOME RESEARCH REALIZED BY A PARTNERSHIP LED BY ILLYCAFFÈ AND LAVAZZA”
- That the data, as accessed, are pre-competitive and not patentable
- To use the data in compliance with all applicable statutes, regulations, and guidelines for scientific research and publication
You also acknowledge that the data providers (Italian partnership led by illycaffè and Lavazza):
- Make no representations, assume no responsibility, and extend no warranties of any kind, either expressed or implied, that the use of the data will not infringe any patent, copyright, trademark, or other proprietary rights
- Assume no responsibility for the correctness, completeness, quality, and reliability of the information and of any results obtained using the data, nor for any damages resulting from the use of the data
- Are not responsible for the handling of the data by third parties, in particular following unauthorized access to World Coffee Research's networks and systems
The Coffea arabica Red Bourbon genome was sequenced by the Italian partnership for Coffea arabica Genome Project. This group is composed of:
- DNA Analytica
- Istituto di Genomica Applicata
- IGA Technology Services
- University of Udine
- University of Trieste
- University of Padova
- University of Verona
Funding was provided by illycaffè, Lavazza and Istituto di Genomica Applicata.
The Coffea arabica Red Bourbon genome is being made available pre-publication via World Coffee Research.
Please refer to it as: "Sequencing, assembly and annotation of Coffea arabica Red Bourbon, by the Italian partnership for Coffea arabica Genome Project."
For more information, see the study published in Scientific Reports, "A single polyploidization event at the origin of the tetraploid genome of Coffea arabica is responsible for the extremely low genetic variation in wild and cultivated germplasm" (13 March 2020). A less technical summary of the study is available in this article.