Figure 6. Contrasting levels of divergence and diversification between the Y chromosomes and the PAR. All analyses are plotted in intervals of kb and a sliding window of 25 kb. (B) Tajima's D-values within the males and hermaphrodites. Values near zero are consistent with a population at equilibrium under neutral evolution.

Finally, we used Tajima's D-test to assess whether the Y chromosomes show evidence for a severe bottleneck, which should lead to negative Tajima's D-values (Tajima).

Discussion

We cannot, of course, identify the exact geographic location of papaya domestication, though our analyses show that the AU9 haplotype, whose origin is currently unknown, is not a potential ancestor of the HSY of the currently cultivated hermaphrodite strains.
We have shown that gene flow occurs between natural papaya populations and that Y chromosomes can migrate, sometimes leading to populations with two different Y haplotypes. Our analyses do, however, strongly suggest that divergence of this haplotype occurred recently from the MSY3 male Y haplotype, based on the remarkable sequence similarity with the HSY of hermaphrodites, despite the gender difference.
Given that no hermaphrodite papayas have been found in wild populations in Central America, this strongly suggests that the HSY resulted from papaya domestication by the Mayans or other indigenous cultures, supporting Storey's hypothesis (Storey). Now that the haplotype from which the HSY evolved has been identified, identification of the sex determination gene controlling carpel abortion that defines the hermaphrodite Yh chromosome should become possible.
The evolution of separate sexes in plants requires two mutations, one resulting in male sterility and a second in female sterility (for review, see Ming et al.).
The appearance of the first mutation results in sexual polymorphism, either androdioecy (males and hermaphrodites) or gynodioecy (females and hermaphrodites), until a second mutation arises at a linked site and converts the hermaphrodites into females or males (or partially female or male phenotypes). Intermediates to complete dioecy are found in many plants, such as spinach, which has mostly males and females but occasionally hermaphrodites (Bemis and Wilson), and strawberry, where males, females, hermaphrodites, and neuters are present because linkage between the factors is incomplete (Spigler et al.).
The hypothesis that two or more genes are involved in plant sex determination predicts that reversion through recombinants in the region could occur. Our results indicate that gynodioecy in domesticated papaya populations is a reversion from dioecy rather than an intermediate state in the evolution of dioecy. The extreme similarity between the HSY and MSY sequences suggests that recombination within a sex-determining region was not involved; had a recombination event occurred, the HSY should include regions of sequence similar to X chromosomes from the ancestral population.
The carpel-suppressing gene in males presumably evolved from a gain-of-function mutation in the Y chromosome. Unlike the hermaphrodites that have been observed in Silene latifolia Lardon et al. Therefore, a small-scale, Y-linked mutation seems likely. Moreover, a Y-linked mutation predicts a strong selective sweep in the Yh, given the brief evolutionary time since the event, so that the Yh chromosome should have almost the same sequence as the ancestral Y chromosome, as is observed.
This is consistent with the population size of the HSY having been reduced to a single haplotype in a domestication event when a rare hermaphrodite was selected that carried a sex reversal mutation in the Y chromosome. If multiple hermaphrodites had been selected from a population with MSY diversity like that we estimate, this would lead to positive Tajima's D-values, because the most common haplotypes would have been selected, leading to a deficiency of rare variants after the event.
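The direction of Tajima's D discussed above can be made concrete with a small calculation. The following is a minimal sketch of the standard Tajima's D statistic for a set of aligned haplotypes; it is not the pipeline used in this study, and the function name and toy sequences are hypothetical. An excess of rare variants gives a negative value, whereas a deficiency of rare variants gives a positive value.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length aligned haplotype strings (illustrative only)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    # Segregating sites: alignment columns with more than one allele.
    seg_sites = [i for i in range(length) if len({h[i] for h in haplotypes}) > 1]
    s = len(seg_sites)
    if s == 0:
        return float("nan")  # D is undefined without variation
    # Mean number of pairwise differences (pi).
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a[i] != b[i] for i in seg_sites)
        for a, b in combinations(haplotypes, 2)
    ) / n_pairs
    # Constants from Tajima's variance formula.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy data (hypothetical): every variant is a singleton, i.e. rare variants in excess.
rare_excess = ["AAAA", "TAAA", "ATAA", "AATA"]
# Toy data (hypothetical): two haplotypes at equal frequency, no rare variants.
rare_deficit = ["AA", "AA", "TT", "TT"]
```

Here `tajimas_d(rare_excess)` is negative and `tajimas_d(rare_deficit)` is positive, matching the qualitative expectations used in the argument.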
Having excluded either recombination with the X chromosome or a deletion of part of the Y chromosome, we therefore conclude that comparison of our multiple HSY and MSY sequence differences should yield candidates for a mutational difference. The gene responsible in papaya can potentially be identified by comparing the sequences of the sets of Y and Yh chromosomes. These chromosomes are extremely similar, and therefore the numbers of candidates are not large; so identification does not rely on being able to generate recombinant Y chromosomes to exclude multiple candidate differences that might be responsible.
Our study clarifies the choice of strain to be used to attempt identification of the gene controlling carpel abortion. Interestingly, none of the fixed SNPs distinguishing the HSY of hermaphrodites from the MSY3 of males are in coding regions. This suggests that hermaphroditism may be controlled by changes in gene regulation, involving upstream or downstream changes in enhancers, trans-acting factors, small RNAs, or epigenetic effects, which would make the identification of the sex determination gene suppressing carpel development a challenging task.
After quality trimming, reads with an average length of bp were assembled into contiguous sequences using gsAssembler (Roche) with the default settings. The assembled contigs (N50 35 kb) were anchored to the HSY reference sequence, and gaps were filled using a reference-guided assembly approach with whole-genome shotgun reads from the AU9 cultivar in the CLC Genomics Workbench version 5.
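For readers unfamiliar with the N50 statistic quoted for the assembled contigs, the following minimal sketch shows how it is conventionally computed from a list of contig lengths. The function name and example lengths are hypothetical and are not taken from the assembly reported here.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downward, first reaches half of the assembly size."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

An N50 of 35 kb therefore means that contigs of at least 35 kb account for at least half of the assembled bases.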
FC according to the manufacturer's instructions (Illumina). Second, we searched for additional transcripts in the MSY. Each predicted transcript was manually annotated and translated in six frames to distinguish the protein-coding genes and pseudogenes. We classified transcripts with premature stop codons, frame-shift mutations, or truncated proteins as pseudogenes. Potential functions for each transcript were predicted using conserved domains and homologous gene functions. Global chromosome similarity alignments were performed using the genome alignment tool Mauve (Darling et al.).
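The six-frame translation and the premature-stop criterion described above can be sketched as follows. This is an illustrative simplification that assumes the standard genetic code and unambiguous A/C/G/T sequence; the annotation in this study was manual, and the function names here are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return translations for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def has_premature_stop(protein):
    """True if a stop codon ('*') occurs before the final residue."""
    return "*" in protein[:-1]


frames = six_frame_translations("ATGGCCTAA")
```

A transcript whose best reading frame returns `True` from `has_premature_stop` would be a pseudogene candidate under the criterion above; frame-shift and truncation checks would need comparison against a homologous protein and are not covered by this sketch.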
Whole-genome resequencing, alignment, and SNP calling

Twenty-four wild male papaya plants from 10 natural populations across Costa Rica (see Results section) and 12 hermaphrodites from gynodioecious cultivars from the USDA tropical plant germplasm collection in Hilo, Hawaii, were sequenced.
The libraries from each of the 36 individuals were sequenced on an Illumina HiSeq, generating 2. The reads were aligned to the SunUp papaya draft genome sequence (Ming et al.).
Indeed, DNA sequence divergence cannot determine with certainty whether this region is fully sex-linked or is within the PAR, because the closely linked boundary region on the PAR side is expected to be slightly differentiated, just as regions very closely linked to any site maintained polymorphic by balancing selection will exhibit sequence variants associated with the functionally different alleles (Charlesworth; Kirkpatrick et al.).
To phase reads in the rest of the presumptively fully sex-linked collinear region, we used strict alignment parameters allowing two mismatches per read and a high mismatch penalty.
The alignments were manually inspected during parameter optimization, using Tablet (Milne et al.). SNPs found in both the X- and Y-linked haplotypes were also removed from the analysis, as these may have represented alignments including both X and Y reads or repetitive regions, and we are interested largely in site differences fixed among all Y sequences and in distinguishing them from homologous X-linked sequences. A raw file of unfiltered SNPs and indels was generated using mpileup under the default parameters.
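The removal of SNPs shared between the X- and Y-linked haplotypes amounts to a set difference. The sketch below assumes, for illustration, that each SNP is keyed by contig, position, and alternate allele; the key choice, function name, and coordinates are hypothetical and do not reproduce the filtering scripts used in this study.

```python
def y_specific_snps(y_snps, x_snps):
    """Keep SNPs seen on Y-linked haplotypes but not on X-linked ones.

    Each SNP is a (contig, position, alt_allele) tuple.
    """
    return sorted(set(y_snps) - set(x_snps))


# Hypothetical calls, for illustration only.
y_calls = [("HSY", 100, "A"), ("HSY", 250, "T"), ("HSY", 400, "G")]
x_calls = [("HSY", 250, "T"), ("HSY", 900, "C")]
kept = y_specific_snps(y_calls, x_calls)
```

In this toy example the call at position 250 is dropped because it appears in both sets, leaving the two Y-only calls.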
Such polymorphisms were called using all individuals simultaneously to provide accuracy for low-frequency or low-coverage variants. The number of clusters K was determined using the methods outlined by Evanno et al. Ten thousand iterations were used to determine the subgroup membership of each wild and cultivated accession.
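The ΔK criterion of Evanno et al. used to choose the number of clusters can be sketched as follows. This version assumes that log-likelihood values from replicate runs are available for each K and, as a simplification, takes the second-order difference on the run means before dividing by the standard deviation at K. The input layout, function name, and toy values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to a list of log-likelihoods from replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the ends of the K range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicate runs are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical replicate log-likelihoods, for illustration only.
toy_runs = {
    1: [-1000, -1002, -998],
    2: [-800, -802, -798],
    3: [-790, -795, -785],
    4: [-788, -780, -796],
}
toy_delta_k = delta_k(toy_runs)
best_k = max(toy_delta_k, key=toy_delta_k.get)
```

The K with the largest ΔK is taken as the most likely number of clusters; in this toy input the likelihood gain levels off after K = 2, so ΔK peaks there.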
The principal component analysis was performed using the PCO software https: A total of , coding sites across the Y chromosome were used. The species tree was constructed using a Yule prior, which assumes that Y lineages split at a constant rate. The tree with the highest clade posterior probabilities was chosen for divergence time estimates using the TreeAnnotator program from BEAST.
A strict molecular clock model was used with a rate suitable for papaya of 0. The closest plant taxon for which a molecular clock has been estimated is the closely related family Brassicaceae.
To take account of the slower molecular evolutionary rate with fewer generations in papaya because of its perennial nature, we reduced the latter value by a factor of 0.

Population genetic analyses

Y-specific SNPs were classified as noncoding, synonymous, or nonsynonymous based on the Y gene annotations reported by Wang et al. Wilcoxon tests were implemented in R (R Core Team).
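Returning to the strict molecular clock used for the divergence time estimates, the relationship between sequence divergence, substitution rate, and time can be illustrated with a back-of-envelope calculation. This is a simplification and not the Bayesian dating performed in BEAST; the function names and all numerical values below are placeholders, not the rate or scaling factor used in this study.

```python
def scaled_rate(reference_rate, factor):
    """Scale a reference substitution rate (per site per year) by a factor,
    e.g. to allow for a longer generation time."""
    return reference_rate * factor


def divergence_time_years(pairwise_divergence, rate_per_site_per_year):
    """Strict-clock estimate: two lineages accumulate differences
    independently, so divergence d = 2 * rate * time."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return pairwise_divergence / (2 * rate_per_site_per_year)


# Placeholder values, for illustration only.
placeholder_rate = 1e-8        # substitutions per site per year
placeholder_divergence = 0.002 # differences per site between two haplotypes
t_unscaled = divergence_time_years(placeholder_divergence, placeholder_rate)
t_scaled = divergence_time_years(
    placeholder_divergence, scaled_rate(placeholder_rate, 0.5)
)
```

Halving the assumed rate doubles the estimated divergence time, which is why the choice of rate scaling for a perennial species matters for the dating.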