Coffea Arabica Genome

The first fully open-access genome sequence for Arabica coffee

Coffea arabica is a polyploid species, carrying four copies of the eleven chromosomes typical of the genus Coffea, for a total of 44 (2n = 4x = 44). More precisely, its genome is allotetraploid: it originated from a hybridization between two diploid species, Coffea canephora and Coffea eugenioides, and combines the full chromosome sets of both parents (22 each), giving 44.
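Written out, the chromosome arithmetic is as follows, taking x = 11 as the base chromosome number of the genus and assuming each diploid parent contributes its complete set of 22 chromosomes:

```latex
\begin{aligned}
x &= 11 && \text{(base chromosome number of \textit{Coffea})}\\
\textit{C. canephora}:\; 2n &= 2x = 22\\
\textit{C. eugenioides}:\; 2n &= 2x = 22\\
\textit{C. arabica}:\; 2n &= 22 + 22 = 4x = 44
\end{aligned}
```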

This genome sequence was derived from a Coffea arabica plant of the Red Bourbon variety.

Sequencing and assembly

The genome was sequenced with Illumina technology at the Istituto di Genomica Applicata in Udine, Italy. Because a tetraploid genome is inherently complex, it was sequenced using a hierarchical approach rather than the more common whole-genome shotgun approach.
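In this hierarchical design, clones are grouped into pools, each pool is assembled on its own, and the per-pool results are combined afterwards (the key numbers below list 96 pools and 96 independent assemblies merged into one multi-FASTA file). The Python sketch below illustrates only the merge-and-summarize step on toy sequences. The header scheme `poolNN_scaffoldM` and both function names are hypothetical, chosen for illustration; they are not the project's actual naming or pipeline. N50 is included as a standard summary of scaffold lengths.

```python
from typing import Dict, List


def merge_pool_assemblies(pools: Dict[int, List[str]]) -> str:
    """Merge per-pool scaffolds into one multi-FASTA string.

    Headers are prefixed with the pool number so that scaffold names
    produced independently in different pools cannot collide.
    The 'poolNN_scaffoldM' naming scheme is hypothetical.
    """
    records = []
    for pool_id in sorted(pools):
        for i, seq in enumerate(pools[pool_id], start=1):
            records.append(f">pool{pool_id:02d}_scaffold{i}\n{seq}\n")
    return "".join(records)


def n50(lengths: List[int]) -> int:
    """Return the length L such that scaffolds >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy example: two pools, three scaffolds in total
pools = {1: ["ACGTACGT", "GGCC"], 2: ["TTTTTAAAAA"]}
fasta = merge_pool_assemblies(pools)
lengths = [len(s) for seqs in pools.values() for s in seqs]
print(len(lengths), sum(lengths), n50(lengths))  # 3 22 8
```

Prefixing headers by pool is one simple way to keep names unique when many assemblies are concatenated; real merges also have to deal with redundant scaffolds shared between pools, which this sketch ignores.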

Key numbers and facts

  • 36,864 genomic fragments were cloned into bacterial artificial chromosomes (BACs) and sequenced in 96 pools of 384 clones
  • 488 billion base pairs were produced, corresponding to 132 genome equivalents
  • The genome size was estimated to be 1.3 Gb, based on a k-mers analysis
  • 96 independent assemblies were generated with the software programs ABySS (assembly) and SSPACE (scaffolding) and then merged into a single multi-FASTA file.
  • The sequence contains 1.51 billion base pairs, divided into 164,254 scaffold sequences
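The 1.3 Gb genome-size figure comes from a k-mer analysis. The general idea: count every substring of length k in the reads, build a histogram of how many distinct k-mers occur at each multiplicity, and divide the total number of k-mers by the depth of the main peak. The Python sketch below shows that calculation on toy data; the function names are hypothetical and this is not the project's pipeline. Real analyses use dedicated counters (for example Jellyfish), count canonical k-mers (merging each k-mer with its reverse complement), and fit models (for example GenomeScope) that account for heterozygosity and, in a tetraploid like arabica, multiple peaks. This sketch does none of that.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_histogram(reads: Iterable[str], k: int) -> Dict[int, int]:
    """Return {multiplicity: number of distinct k-mers seen that many times}."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return dict(Counter(counts.values()))


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 2) -> float:
    """Estimate genome size as total k-mers divided by the peak k-mer depth.

    Multiplicities below min_depth are discarded as likely sequencing errors.
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(kept, key=kept.get)  # depth with the most distinct k-mers
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


# Toy histogram: an error tail at depth 1 and a main peak at depth 11
hist = {1: 100, 10: 50, 11: 60, 12: 40}
print(estimate_genome_size(hist))  # 1640 / 11, about 149.1
```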


The genome was annotated at the Universities of Padova and Verona. Gene prediction was supported by RNA sequencing of twelve different samples derived from eight organs.

  • 78,311 genes were predicted and functionally annotated in Coffea arabica

User Acknowledgement

The coffee (C. arabica) genome, realized by an Italian partnership led by illycaffè and Lavazza, is made available for advancing research on a non-profit basis.

To respect the rights of the data producers and contributors, you acknowledge that by downloading the genome scaffolds and annotation files you agree to the following principles:

  • To not redistribute, release, or otherwise provide access to the data to anyone outside of your research group
  • To contact project leader Simone Scalabrin (email: sscalabrin AT) to discuss any publication plans that may use these data, so as to avoid overlap with any planned analyses
  • To cite and acknowledge accurately and completely the data as “ARABICA GENOME RESEARCH REALIZED BY A PARTNERSHIP LED BY ILLYCAFFÈ AND LAVAZZA”
  • That this data as accessed is pre-competitive and is not patentable.
  • To use the data in compliance with all applicable statutes and regulations and with guidelines for scientific research and publication.

You also acknowledge that the data providers (Italian partnership led by illycaffè and Lavazza):

  • Make no representations, assume no responsibility and extend no warranties of any kind, either expressed or implied, that the use of the data will not infringe any patent, copyright, trademark, or other proprietary rights
  • Assume no responsibility for the correctness, completeness, quality and reliability of the information and the results that can be obtained using the data, and assume no responsibility for any damages resulting from the use of data.
  • Are not responsible for the handling of the data by third parties, in particular following unauthorized access to networks and systems of World Coffee Research


The Coffea arabica Red Bourbon genome was sequenced by the Italian partnership for Coffea arabica Genome Project. This group is composed of:

Funding was provided by illycaffè, Lavazza and Istituto di Genomica Applicata.

The Coffea arabica Red Bourbon genome is being made available pre-publication via World Coffee Research.

Please refer to it as: "Sequencing, assembly and annotation of Coffea arabica Red Bourbon, by the Italian partnership for Coffea arabica Genome Project."

For more information, please see the study published in Nature Scientific Reports, "A single polyploidization event at the origin of the tetraploid genome of Coffea arabica is responsible for the extremely low genetic variation in wild and cultivated germplasm" (13 March 2020). The study is summarized in less technical language in this article.
