COVID-19, SARS AND BATS CORONAVIRUSES GENOMES PECULIAR HOMOLOGOUS RNA SEQUENCES
International Journal of Research - GRANTHAALAYAH

We are facing the worldwide invasion of a new coronavirus, which follows several limited outbreaks of related viruses (SARS, MERS) in various locations in the recent past. Although the main current objective of researchers is to bring efficient therapeutic and preventive solutions to the global population, we also need to better understand the origin of the new coronavirus-induced epidemic in order to avoid future outbreaks. The present molecular appraisal uses a bioinformatic approach to study the facts relating to the virus and its precursors. This article shows how 16 fragments (Env, Pol and Integrase genes) from different strains, both diversified and very recent, of the HIV1, HIV2 and SIV retroviruses have a high percentage of homology with parts of the COVID_19 genome. Moreover, each of these elements is 18 or more nucleotides long and may therefore have a function. We call them Exogenous Informative Elements (EIE). Among these EIE, 12 are concentrated in a very small region of the COVID-19 genome, less than 900 bases long, i.e. less than 3% of the total length of this genome. In addition, these EIE are located in two functional genes of COVID-19: the orf1ab and S (spike) genes. Two main facts support our hypothesis of a partially synthetic genome. First, a contiguous region representing 2.49% of the whole COVID-19 genome is made up, for 40.99% of its length, of 12 diverse fragments originating from various HIV/SIV retrovirus strains. Second, some of these 12 EIE appear concatenated. Notably, the retroviral part of this region, which consists of 8 elements from various strains of HIV1, HIV2 and SIV, covers 275 contiguous bases of COVID-19. The cumulative length of these 8 HIV/SIV elements is 200 bases. Consequently, the HIV/SIV density of this region of COVID-19 is 200/275 = 72.73%.
A major part of these 16 EIE already existed in the first SARS genomes as early as 2003. However, we show how a new region including 4 HIV1/HIV2 Exogenous Informative Elements radically distinguishes all COVID-19 strains from all SARS and bat strains, with the exception of Bat RaTG13. We gather facts about the possible origins of COVID_19, and we have particularly analyzed this small region of 225 bases common to COVID_19 and Bat RaTG13. We have also studied the most recent genetic evolution of the COVID_19 strains involved in the world epidemic, and found a significant occurrence of mutations and deletions in the 225-base area. On a sample of genomes, we show that this 225-base key region, rich in EIE, and the 1770-base SPIKE region evolve much faster than the corresponding whole genome (44 patient genomes from Seattle, WA, the original epicenter in the USA). In the comparative analysis of the SPIKE genes of COVID_19 and Bat RaTG13, we note two abnormal facts: 1) the insertion of 4 contiguous amino acids, PRRA, in the middle of SPIKE (we show that this site was already an optimal cleavage site BEFORE this insertion); 2) an abnormal distribution of synonymous codons in the second half of SPIKE. Finally, we show the insertion, in this 1770-base SPIKE region, of a significant pair of EIEs from Plasmodium yoelii and of a possible HIV1 EIE with a crucial Spike mutation.


INTRODUCTION
We are facing the worldwide invasion of a new coronavirus. This follows several limited outbreaks of related viruses in various locations in the recent past (SARS, MERS) [1], [2]. Human civilization has been very successful over the last centuries in terms of demographic and economic growth. However, in our times, economic power is concentrated in the hands of a few individuals and, consequently, economic interests prevail over the well-being of humanity.
Although the main objective of researchers is to bring efficient therapeutic and preventive solutions to the global population, we also need to better understand the origin of the new coronavirus-induced epidemic in order to avoid future outbreaks. The present molecular appraisal uses a bioinformatic approach to study the facts relating to the virus and its precursors.
We had analyzed the evolution of coronaviruses from the first SARS genomes (2003) to the first genomes of COVID-19, when it was still called 2019-nCoV [3]. We were also aware of the online article by J. Lyons-Weiler [4], according to which a region of around 1 kb is totally new in the genome of COVID-19.
Using our proprietary bio-mathematical approach, which evaluates the level of cohesion and organization of a genome, we discovered that the deletion (by mutation) of this new 1 kb region [4] would increase the level of «structural harmonization» of the genome.
This suggests a possible exogenous «addition» to the genome. After studying the publication of Pradhan et al. [15], we searched this genome for possible traces of HIV or even SIV. A first publication [5] reports the discovery of 6 HIV/SIV RNA pieces related to crucial retroviral genes such as Envelope and RT Pol. The present article confirms and extends these initial results.
Note: «§» indicates the location of each HIV/SIV EIE within the COVID_19 genome (gene identification).

Evidence for 4 other HIV/SIV EIE sequences in other areas of the COVID-19 genome:
We also found 4 other, non-contiguous HIV/SIV regions, summarized in Table 2 below. Details of these searches, and of these 4 HIV/SIV EIE, are given in the supplementary materials ("d"; ref 2).
First, it is important to note that all the regions found here are included in one of the two main genes of COVID-19, so they are «Exogenous Informative Elements». A synthetic chart is given in Fig 1. Some significant results relating to this analyzed region of 930 base pairs (600 + 330) are as follows. The entire genome has 29903 bases. At least 12 regions are located between bases 21225 and 21969, a span of exactly 744 bases.
This represents an average space of 744/12 = 62 bases per EIE or, as a fraction of the whole genome, 744/29903 = 2.49%.
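The span, spacing and genome-fraction figures above are simple ratios. The short Python sketch below reproduces them from the coordinates quoted in the text, together with the 200/275 density given in the abstract; the helper name `density` is illustrative only. Note that the text measures the span as end minus start (744); counting both endpoints inclusively would give 745 bases.

```python
# Reproduces the cluster arithmetic quoted in the text (coordinates copied from it).
GENOME_LENGTH = 29903   # bases in the reference COVID-19 genome
CLUSTER_START = 21225   # first base of the 12-EIE cluster
CLUSTER_END = 21969     # last base of the 12-EIE cluster
N_EIE = 12              # number of EIE in the cluster


def density(covered, region_length):
    """Fraction of a region covered by EIE bases (hypothetical helper)."""
    return covered / region_length


span = CLUSTER_END - CLUSTER_START          # the text's convention: end - start
avg_spacing = span / N_EIE                  # average space per EIE
genome_fraction = density(span, GENOME_LENGTH)

print(span, avg_spacing, f"{genome_fraction:.2%}")   # 744 62.0 2.49%
print(f"{density(200, 275):.2%}")                    # 72.73%
```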
The cumulative length of the 12 EIE makes up 40.99% of this region.
Figure: Comparative trends in HIV/SIV EIE densities (blue) and average cumulative homologies (red) for 3 clusters: 3 region B EIE side by side, joined by 5 more to complete the 8 EIE of region B, plus the final six, to integrate all 14 EIE (regions A and B cumulated).

Concatenations of HIV/SIV regions "placed" in sequence and side by side.
Table 2 shows that two very different EIE follow each other side by side in the RNA sequence of COVID-19:
The first, at positions 20373 to 20401, comes from an HIV1 Integrase of a USA virus from 2004 (Homo sapiens clone HIV1-H9-106 HIV-1 integration site, AY516986.1), while the second, at positions 20400 to 20430, comes from the Envelope of another HIV1 virus from the USA, from 2011 (HIV-1 isolate JACH1853_A5 from USA envelope glycoprotein (env) gene, complete cds, HQ217329.1).
Even more surprisingly, in Table 1, we note the same phenomenon, this time not with 2 but with 3 EIE from radically different HIV/SIV viruses, concatenated with seemingly perfect "watchmaker's precision". The cumulative length in COVID_19 of these 3 EIE is 126 bases, of which 120 are occupied by HIV sequences. The HIV/COVID_19 density is therefore 120/126 > 95%, which is remarkable and, in our view, suggestive of an artificial arrangement.
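Because the two EIE quoted from Table 2 overlap by two bases (20400 and 20401), a plain sum of their lengths slightly overstates the bases they cover together. The minimal Python sketch below assumes inclusive 1-based coordinates, as quoted, and merges overlapping or touching intervals before counting; the function names are hypothetical. The last line reproduces the 120/126 ratio stated for the three concatenated EIE.

```python
def merge_intervals(intervals):
    """Merge inclusive (start, end) intervals that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)   # extend the current block
        else:
            merged.append([start, end])               # start a new block
    return [tuple(block) for block in merged]


def covered_bases(intervals):
    """Number of distinct bases covered by the union of the intervals."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


# The two side-by-side EIE quoted from Table 2 (inclusive coordinates).
pair = [(20373, 20401), (20400, 20430)]
print(merge_intervals(pair))                    # [(20373, 20430)]
print(covered_bases(pair))                      # 58
print(sum(end - start + 1 for start, end in pair))   # 60 (overlap counted twice)
print(f"{120 / 126:.2%}")                       # 95.24%
```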
Part II
Within this part, we show that a 225-nucleotide region is unique to COVID_19 and Bat RaTG13, and that it can also discriminate between these 2 genomes (§3, 4, 5, 6 and 7).
In this second part of the RESULTS and DISCUSSION, we present two types of facts. On the one hand, we show that the COVID_19 and Bat RaTG13 genomes are jointly distinguished from all the other SARS, MERS and bat genomes.
On the other hand, we analyze several specific facts suggesting that COVID_19 does not originate from Bat RaTG13. A novel region of around 225 nucleotides, less than 1% of the genome, appears to us to have been inserted: this region is completely absent in all SARS genomes, whereas it is present and 100% homologous in all COVID-19 genomes listed in NCBI.
Note: the HIV-1 USA 2011 sequence is self-contained within the HIV-1 2016 Netherlands variant in the 225-base area (85-108 and 85-112); the boundary of the 225-base area lies in relative region "B".

Evidence of the absence of 4 HIV/SIV «Exogenous Informative Elements» in SARS genomes
Here we wanted to find out whether the 16 EIE discovered in the COVID-19 genome already existed in the human SARS genomes that appeared in 2003. Table 4 summarizes this search. In particular, it appears that 14 of the 18 HIV/SIV EIE have existed since the first human SARS genomes that appeared in China around 2003.
However, a novel region of around 225 nucleotides appears to us to be totally new: it is completely absent in ALL SARS genomes, whereas it is present and 100% homologous in all COVID-19 genomes listed in the NCBI or GISAID genomic databases.
In the COVID-19 genome used as reference, this region is located between addresses 21550 and 21772. It therefore lies between the end of region "A" (bases 475 to 600 of that region) and the start of region "B" (bases 1 to 99 of that region).
A remarkable fact is also observed: the HIV/SIV EIE that already existed in SARS have evolved considerably through numerous mutations. Thus, four EIE have very weak homologies (near 30%) between their SARS version and their COVID-19 version. These homologies gradually improve in more recent SARS genomes (2015 or 2017, for example; right column in Table 4). The MERS genome (…3?report=genbank) shows that FOUR crucial regions of our article, namely the end of our "A" region, the whole key 225-base region, the "B" region and the "Lyons-Weiler" region, are totally ABSENT in MERS.

Evidence for HIV/SIV sequences in this region, and their compaction in the 225-base portion of both COVID_19 and Bat coronavirus RaTG13 genomes.
We now analyze the level of homology of the four HIV/SIV EIE that are always present in COVID-19 but always absent in SARS. The remarkable point is as follows: it is strange that the most significant bat genome, the Bat coronavirus RaTG13 genome [12], was sequenced in 2020, just like COVID-19. In particular, for the HIV1 Kenya 2008 sequence [9], [10], Bat RaTG13 is the only strain found in the bat population to contain it, while for the three other EIE, the bat strains are very numerous but with non-significant HIV/SIV homologies.
Zooming in on the first HIV1 Kenya homologies.
Synthesis data: comparison of the 3 key regions «A», «B» and «Lyons-Weiler» [4] in COVID-19, the Bat RaTG13 coronavirus [12] and the best homologies for other bat and SARS coronaviruses. In ALL cases, the 225-base region is reduced to small contiguous regions between 17 and 96 bases long. In ALL cases, the Kenya 2008 EIE is totally absent.
We also note in Table 6 that the bat genomes closest to COVID_19 were collected between 2013 and 2017 but only sequenced in 2020: Bat RaTG13 (2013), Bat SARS-like coronavirus isolate Bat-SL-CoVZXC21 (2015) and Bat SARS-like coronavirus isolate bat-SL-CoVZC45 (2017). Alina Chan found that RaTG13 is the same as the "4991" strain with which Shi Zhengli was working in 2017-18 (https://archive.vn/4Ot2j). The non-functional HIV1 Kenya EIE region of the COVID-19 genome overlaps the end of the "Orf1ab" gene and the start of the "S" (spike) gene. The location of the Spike gene within Bat RaTG13 is:
     21545..25354
     /gene="S"
     /codon_start=1
     /product="spike glycoprotein"
     /protein_id="QHR63300.2"
So, in terms of amino acids, the START address of HIV1 Kenya lies 6 amino acids after the beginning of SPIKE, and its END address 36 amino acids after the beginning of SPIKE.
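Converting a genome address into an amino-acid position inside a gene is integer arithmetic on codons. The sketch below assumes a forward-strand gene read from its first base (`/codon_start=1`, as in the RaTG13 feature above) and 1-based coordinates; `codon_number` is a hypothetical helper. The Kenya EIE coordinates are not restated in the text, so the positions in the example are illustrative only. It also assumes, as in GenBank CDS features, that the range includes the stop codon.

```python
def codon_number(position, gene_start):
    """1-based codon (amino-acid) number of a 1-based genome position inside a
    forward-strand gene whose first codon starts at gene_start (codon_start=1)."""
    if position < gene_start:
        raise ValueError("position lies upstream of the gene start")
    return (position - gene_start) // 3 + 1


RATG13_SPIKE_START = 21545   # from the feature 21545..25354 quoted above
RATG13_SPIKE_END = 25354

spike_length = RATG13_SPIKE_END - RATG13_SPIKE_START + 1   # inclusive range
print(spike_length, spike_length // 3)   # 3810 1270 (last codon = stop, if included)

# Illustrative positions only, not the Kenya EIE coordinates.
print(codon_number(21545, RATG13_SPIKE_START))   # 1 (first base of codon 1)
print(codon_number(21547, RATG13_SPIKE_START))   # 1 (third base of codon 1)
print(codon_number(21563, RATG13_SPIKE_START))   # 7, i.e. 6 codons after the first
```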

Location of the EIE HIV1 Kenya in Bat RaTG13
Notably, unlike COVID-19, where HIV-1 Kenya starts before the beginning of the SPIKE gene, in Bat RaTG13 HIV1 Kenya is entirely contained within the SPIKE gene.

The discovery of a new EIE from the HIV1 group «O» differentiating COVID-19 from the Bat RaTG13 genome.
The HIV-1 group «O» is a subgroup of HIV retroviruses that is very different from the other HIV/SIV subgroups; it is found particularly in Cameroon. However, little is known about group O and why this highly divergent retrovirus has not become pandemic [21].
We wanted to look for hypothetical traces of EIE from HIV group "O", more particularly in COVID_19 and in Bat RaTG13.
We then discovered a POL (Integrase) homology with the HIV1 group "O" strain referenced as AF422215.1, located around base 23800 of COVID_19.
As of May 4, 2020, BLASTn returns 1578 COVID_19 sequences. All of them, except 3 that are highly deleted at the whole-genome scale (for example Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/CA-CZB-IX00017/2020, ID: MT385497), …
It is very interesting to note the following points:
• It is well known that bats have been studied, particularly in China, in recent years (https://en.wikipedia.org/wiki/Shi_Zhengli). We will therefore be particularly interested in the Bat RaTG13 strain, whose genomic proximity to COVID-19 will be analyzed with the greatest attention and precision.
The theoretical method used here makes it possible to evaluate the overall level of cohesion (and hence also of heterogeneity) of a nucleotide sequence, independently of scale, owing to the fractal nature of this numerical method.
Full details of this numerical method are given in [6], [7], [8], and the Methods are recalled in supplementary materials ref 9.
Here we analyze the Master Code of 3 characteristic genomes: COVID_19, Bat RaTG13 and SARS Urbani. For each of these 3 genomes, we study 5 successive amplitude scales, according to the 3 codon reading frames and on the 2 main complementary strands:
• whole genomes.
In other words, Table 5 above shows that, apart from HIV1 Kenya, the HIV EIE of the 225-base area are more homologous in the Wuhan market genome (ID: LR757998.1) than in Bat RaTG13.

Figure 4:
High level of HETEROGENEITY within the 225-base area of the Wuhan market reference genome. In this COVID_19 Wuhan market reference genome (ID: LR757998.1), the coupling between the Genomics pattern (red) and the Proteomics pattern (blue) appears highly disturbed, unstable and "chaotic"; their correlation is poor (69.47%).
Figure 5:
In Bat RaTG13, by contrast, both the Genomics pattern (red) and the Proteomics pattern (blue) of the same 225-base area appear highly "harmonic" and correlated (92.13%).
We draw the reader's attention to Figs 4 and 5 above. The first concerns the 225-base area of COVID-19 (Fig 4): it appears chaotic and poorly organized. By contrast, the same analysis of the same 225-base region in Bat RaTG13 (Fig 5) shows a smoother and more regular profile. Let us not forget that this sequence, although filed in 2020, was collected in 2013, that is, 7 years earlier.
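The captions report the coupling between the Genomics and Proteomics patterns as correlation percentages (69.47% and 92.13%). The text does not specify the correlation measure or the form of the Master Code patterns, so the sketch below simply assumes two equal-length numeric profiles and a Pearson coefficient, read as a percentage by multiplying by 100. It is a generic illustration, not the authors' Master Code, and the profiles are made-up values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sd_x * sd_y)


# Toy profiles (hypothetical values, not data from Figs 4 and 5).
genomic = [1.0, 2.0, 3.0, 4.0]
print(round(100 * pearson(genomic, [2.0, 4.0, 6.0, 8.0]), 2))   # 100.0
print(round(100 * pearson(genomic, [4.0, 3.0, 2.0, 1.0]), 2))   # -100.0
```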
Here is how we explain this difference. The "DNA Master Code" (see supplementary materials ref 9) allows us to measure a certain level of cohesion and homogeneity between the genomic pattern (double-stranded DNA) and its corresponding proteomic image (translation into amino acids). As we pointed out in the article, the 3 EIE Cape Verde, Côte d'Ivoire and Afrika were probably integrated through the natural evolution of Bat RaTG13, whereas we assume that the Kenya EIE was integrated very recently (red line in Fig 5). By contrast (Fig 4), in COVID_19 all 4 EIE would have been inserted very recently, which would result in the chaotic image of Fig 4.
Part III
During the declining phase of the epidemic, this 225-base area exhibits an abnormally high rate of mutations and deletions, particularly in Seattle, WA, USA (§8, 9 and 10).

First encouraging mutations in the 225-base, «A» and «B» regions, particularly in WA state, USA.
We must recall here that the BLASTn analysis of April 10, 2020 (option "SARS coronaviruses") reports 386 occurrences: 16 bats, 2 Rhinolophus and 368 COVID_19. The same search run on April 16, 2020 returns 523 sequences. The number of available COVID_19 sequences is therefore constantly changing, mainly because of new deposits of USA sequences.
We were interested in the first cases of significant COVID_19 mutations in this key 225-base region (homologies of the order of 96%). We find 5 of them, located in the BLASTn results just before and near RaTG13; all come from the USA, were sampled and sequenced in April 2020, and are pathogenic.
A BLASTn analysis dated April 11, 2020 produces the following results, 386 sequences in total, of which:
• 351 strains with full (100%) homology over the 225-base area;
• 17 strains with mutations in the 225-base area;
• 18 bat strains.
Now let us look at these 17 cases of mutations in the 225-base region. The majority of them (13/17) concern the USA, with dates later than the Chinese origin of the pandemic; only 3 relate to China and one to Finland. This is probably the beginning of a mutation strategy by which the genome balances and integrates the exogenous HIV EIE. 9 of these 17 mutations directly affect an HIV/SIV region; the others affect the intermediate region separating the two pairs of HIV/SIV EIE.
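Sorting strains into "100% homology" and "mutated" over the 225-base area amounts to comparing each sample window with the reference window. The minimal sketch below assumes the windows have already been aligned to the same length (for example from BLASTn output), with '-' marking a deletion and 'N' an undetermined base that is ignored; these conventions, the function names and the toy sequences are assumptions for illustration.

```python
def window_differences(reference, sample):
    """Count substitutions and deletions between two pre-aligned windows.

    Assumptions (not from the source): '-' marks a deleted base and 'N' an
    undetermined base, counted neither as a match nor as a mutation.
    """
    if len(reference) != len(sample):
        raise ValueError("windows must be aligned to the same length")
    substitutions = deletions = 0
    for ref_base, sample_base in zip(reference.upper(), sample.upper()):
        if sample_base == "N":
            continue
        if sample_base == "-":
            deletions += 1
        elif sample_base != ref_base:
            substitutions += 1
    return substitutions, deletions


def classify(reference, samples):
    """Split named sample windows into 'identical' and 'mutated' lists."""
    groups = {"identical": [], "mutated": []}
    for name, window in samples.items():
        substitutions, deletions = window_differences(reference, window)
        groups["mutated" if substitutions or deletions else "identical"].append(name)
    return groups


# Toy 8-base windows standing in for the 225-base area (hypothetical data).
REFERENCE = "ACGTACGT"
SAMPLES = {"s1": "ACGTACGT", "s2": "ACGAACGT", "s3": "ACG--CGT"}
print(window_differences(REFERENCE, SAMPLES["s3"]))   # (0, 2)
print(classify(REFERENCE, SAMPLES))   # {'identical': ['s1'], 'mutated': ['s2', 's3']}
```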
It will also be noted that most of these strains come from recent samples (12/17 have collection dates in or after March 2020). These dates would therefore correspond to a "mature" period of the COVID_19 genomes, which have now entered a phase of diversified mutations.
Finally, we observe the repetition of several mutations, proof of a robust mutation strategy which eliminates the hypothesis of sequencing errors.
We note that 5 different HIV/SIV EIE and 5 mutation regions match within the 17 different COVID_19 strains.
We now consider Table 9, which compares the percentage of significant mutations and deletions in the 225-base area with that of the whole genomes. In Table 9, the results for 6 significant genomes show a much greater average mutation level in the 225-base regions (13.5687%) than in the corresponding whole genomes (0.3496%). The ratio between the average mutation rate of the 225-base region and that of the whole genome is therefore 38.813, due principally to the hyper-deleted Wuhan market genome LR757997.1.
Note: the last line (ref 17, China) has many deleted or «N» regions: 19263 TCAG nucleotides out of a length of 29470, hence 10207 deleted or undetermined nucleotides.
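The Table 9 ratio is a ratio of averages. The sketch below checks it from the two quoted averages (it agrees with the quoted 38.813 to two decimals; the last digit presumably reflects rounding of the published averages) and, using made-up per-genome numbers, shows why a ratio of averages can differ from an average of per-genome ratios, which matters when one genome dominates the average, as LR757997.1 does here. The helper names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def ratio_of_means(region_rates, genome_rates):
    """Average region rate divided by average whole-genome rate (Table 9 style)."""
    return mean(region_rates) / mean(genome_rates)


def mean_of_ratios(region_rates, genome_rates):
    """Average of the per-genome region/whole-genome ratios."""
    return mean([r / g for r, g in zip(region_rates, genome_rates)])


# Check of the quoted averages (percent).
print(round(13.5687 / 0.3496, 2))   # 38.81

# Hypothetical per-genome rates (percent), only to show that the two summaries differ.
region = [10.0, 2.0]
genome = [1.0, 0.1]
print(round(ratio_of_means(region, genome), 2))   # 10.91
print(round(mean_of_ratios(region, genome), 2))   # 15.0
```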
Fig 6 illustrates these results. This chart shows, for 5 COVID_19 USA strains collected from the NCBI databanks in April 2020, the mutation rate of the 225-base region and of the whole genome. In all cases, the mutation rate is greater in the 225-base region than at the whole-genome scale. We then carry out the same study for the high-density EIE regions «A» and «B» (see the 2 tables, Table ref 6).
In Italy and in France, we find no remarkable mutation relative to the Chinese reference genome. It is in Spain and the USA that we detect the most significant traces of a notable evolution of the genome. In Spain, recent sequences (March 2020) show significant deletions and mutations in regions containing EIE. According to the first results of analyses [13], this genome would not have increased its pathogenicity and would seem to use new modes of transmission.
In the USA, the analysis of multiple sequences from the Seattle region (WA) and Minnesota shows a clearly growing progression from mutations to successive deletions in regions "A", "B" and the 225-base region. Thus, in Table 8 (refs 1 to 7, then 11 to 13), we progress from simple mutations to longer mutations over 3 codons, which affect HIV/SIV EIE. Here we illustrate a sort of "shedding" of the EIE regions through which these genomes progress: thus (refs 3, 5, 6, 7), the mutations affect 2 or 3, then 8 consecutive bases.
Then (refs 9, 10, 11, 12), in addition to other new mutations, whole pieces of the genome, several tens of bases long, are deleted. The most remarkable point is that in all these cases, it is indeed EIE regions that are targeted.
As of the most recent date, April 23, 2020, we can check that other COVID_19 strains from Seattle, WA have new deletions located in regions "A" and "B" of our article. These deletions "shed" part of the HIV/SIV EIE located in region "A" and also in region "B", particularly the "side by side" EIE (see in …).
In order to formally demonstrate the specificity of this 225-base region, which starts at base 21542, we explore regions of the same size every 5000 bases throughout the COVID_19 genome, that is, starting at bases 1542, 6542, 11542, 16542 and 26542. We can then confirm or refute that this 225-base region does indeed tend to mutate, or even to be partially deleted, as seems to be the case for certain Seattle, WA strains reported here (Fig 8). Table 10 below shows that the mutation rate of the 225-base area is always much higher than that of the 5 other 225-base regions explored every 5000 bases (34.82 times).
Figure: Horizontally: 5 patients from WA state with mutations in the 225-base area. Vertically: amount of mutations/deletions (proportional). The red surface corresponds to the actual 225-base area; the four other coloured areas correspond to the average mutation/deletion rates of the 5 other 225-base regions and of the whole genome. The ratio (32.86 times) is the ratio between the red 225-base area and the average mutation/deletion rate of the other regions.
To summarize, these results (red areas) demonstrate the exclusive specificity of the 225-base area, which clearly appears to mutate in priority, probably in order to get rid of the exogenous EIE regions that characterize it.
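The control-window procedure described above (same-size windows every 5000 bases, aligned on the window starting at base 21542) can be sketched as follows. The sketch assumes the sample is already aligned base-for-base to the reference in 1-based coordinates, counts '-' as a deletion and ignores 'N'; the function names are hypothetical, and the ratio it returns is one plausible reading of "how many times higher", not necessarily the exact statistic behind the 34.82 and 32.86 figures.

```python
import math


def control_window_starts(target_start, step, genome_length, window):
    """1-based starts of same-size windows spaced `step` bases apart, aligned on
    the target window, kept inside the genome and excluding the target itself."""
    first = target_start % step or step
    last_valid = genome_length - window + 1
    return [s for s in range(first, last_valid + 1, step) if s != target_start]


def window_rate(reference, sample, start, window):
    """Fraction of positions in a 1-based window where the aligned sample differs
    from the reference ('-' counts as a deletion, 'N' is ignored)."""
    ref = reference[start - 1:start - 1 + window]
    smp = sample[start - 1:start - 1 + window]
    return sum(1 for r, s in zip(ref, smp) if s != "N" and s != r) / window


def specificity_ratio(reference, sample, target_start, window, step):
    """Target-window rate divided by the mean rate of the control windows."""
    starts = control_window_starts(target_start, step, len(reference), window)
    target = window_rate(reference, sample, target_start, window)
    mean_control = sum(window_rate(reference, sample, s, window) for s in starts) / len(starts)
    if mean_control == 0:
        return math.inf if target else math.nan
    return target / mean_control


print(control_window_starts(21542, 5000, 29903, 225))
# [1542, 6542, 11542, 16542, 26542]

# Toy 30-base genome (hypothetical): 2 changes in the target window, 1 deletion elsewhere.
reference = "A" * 30
sample = list(reference)
sample[11] = sample[12] = "C"   # 1-based positions 12 and 13
sample[22] = "-"                # 1-based position 23
print(round(specificity_ratio(reference, "".join(sample), 12, 3, 10), 6))   # 4.0
```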

New evidence of increased deletions in the 225-base region in WA State, USA.
As of May 2, 2020, we wanted to assess whether the 225-base area of COVID-19 strains continued to mutate, in particular in WA state. Out of the 1578 COVID_19 strains accessible to date, 32 presented significant mutations (more than 2 bases out of 225). Of these, 30 came from the USA (see Table 12 below and Fig 9); the last 2, from Wuhan and the Czech Republic, are not considered here. Among these 30 USA strains, 22 came from WA state, 5 from CA, 2 from Utah and 1 from New York state.
The 3 most remarkable facts are as follows. First, there is a great diversity of locations and types of mutations and deletions in the 225-base region; it will be interesting to locate these mutations relative to the positions of the 4 EIE in this region.
Second, new types of mutations are also appearing in states other than WA, in California in particular. We conclude from this that the virus COVID_19 continues to shed this key 225-base region from its genome.
Third, there is a high variety and diversity of mutations and deletions: of these 30 USA cases, 20 show totally different mutation/deletion configurations.
Notes 1 to 5: these COVID_19 USA strains, selected in our April BLASTn scan (Table 9 and Fig 6), are re-used here in Table 11 and Fig 9. We can thus compare the evolution of the 225-base region and the increasing mutation rate between the April and May BLASTn scans, particularly for the COVID_19 strains from WA state, USA.
Remark: considering patients WA2 to WA12, we note 2 sets of common deletions (3 cases from base 188, collected 18 to 24 March 2020, and 8 other cases from base 189, collected 15 to 24 March 2020). Table 11 demonstrates the expansion and diversity of mutations in the 225-base area on 2 May 2020, particularly in Seattle, WA. We compare the evolution of patients with mutations/deletions between 2 NCBI GenBank genome sets collected about 3 weeks apart. In "red" are the 5 "old" (11 April 2020) deletions from Table 10; in "blue" are the 25 "new" (2 May 2020) deletions from Table 11. We conclude that the COVID_19 genomes with deletions available on 2 May 2020 have increased significantly in number, and also in length of deletions. We can therefore conclude (blue colors) that USA COVID_19 genomes continue to undergo large deletions and mutations in the critical 225-base area, while both the amount and the diversity of these mutations are increasing and evolving.

The 1770-base region of the 2 SPIKE proteins of COVID_19 and Bat RaTG13.
We are interested in the sequences of the 2 respective SPIKE proteins of COVID_19 (the reference genome used in the article) and Bat RaTG13. Their addresses are, respectively:
SPIKBAT: in Bat RaTG13, from address 21545, over 3810 bases.
SPIKCOV: in COVID_19 (ref 998), from address 21538, over 3822 bases.
The comparative analysis of these 2 SPIKE sequences highlights the following partition:
1- A first region, between bases 1 and 2040, common to COVID_19 and Bat RaTG13.
2- Then comes a second common region of 1770 bases, located from base 2041 (over 1770 bases) in Bat RaTG13 and from base 2053 (over 1770 bases) in COVID_19.
We are then confronted with two "anomalies" that are difficult to explain under natural biological conditions: 1) a short insert of 4 amino acids, PRRA, which is UNIQUE to COVID_19 and does not exist in Bat RaTG13; 2) when the synonymous and non-synonymous mutations are compared for these 2 pairs of regions, an abnormal fact emerges for the second region.
The first region of 2040 bases (680 amino acids) common to the SPIKES of COVID_19 and Bat RaTG13: the 2 sequences differ by 172 nucleotide mutations.
The second region of 1770 bases (590 amino acids) common to the SPIKES of COVID_19 and Bat RaTG13: this region shows an "abnormal" level, because its ratio of synonymous to non-synonymous codons, 41, is completely ABNORMAL. This suggests a possible manipulation of this region of the COVID_19 genome.
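Classifying codon differences as synonymous or non-synonymous only needs the standard genetic code. The sketch below builds that table in TCAG order and counts differing codons between two in-frame, equal-length coding sequences assumed to contain only A, C, G, T (or U). It is a raw codon count, not a dN/dS estimate, which would additionally normalize by the numbers of synonymous and non-synonymous sites (for example with the Nei-Gojobori method); the function names and toy sequences are illustrative.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def compare_codons(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons."""
    seq_a = seq_a.upper().replace("U", "T")
    seq_b = seq_b.upper().replace("U", "T")
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


def syn_nonsyn_ratio(seq_a, seq_b):
    """Synonymous/non-synonymous codon ratio; inf if no non-synonymous change."""
    syn, nonsyn = compare_codons(seq_a, seq_b)
    if nonsyn == 0:
        return math.inf if syn else math.nan
    return syn / nonsyn


# Toy example: TTT->TTC (Phe, synonymous), CTT->CCT (Leu->Pro), AAA->AAG (Lys, synonymous)
print(compare_codons("TTTCTTAAA", "TTCCCTAAG"))     # (2, 1)
print(syn_nonsyn_ratio("TTTCTTAAA", "TTCCCTAAG"))   # 2.0
```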

Fig 10 below illustrates these "abnormal" results.
Figure 10: Comparison of all the codon mutations differentiating the Spikes of COVID_19 and Bat RaTG13.
On the left, the 2040b Spike region upstream of the 4-amino-acid insert; on the right, the 1770b region downstream of the insert. In red, the synonymous codons; in blue, the non-synonymous codons. The right chart appears "unnatural".
Suppose, as is commonly assumed, that COVID_19 derives from Bat RaTG13. In that case, the codons of COVID_19 would have been modified from those of Bat RaTG13.
Most of these mutations would then have produced synonymous codons, whereas only 6 of the 590 amino acids in the 1770-base region would have changed, i.e. about 1%, which is very low. A question then remains open: why are there so few mutations in non-synonymous codons? What follows will bring us an unexpected answer to this question.
Let us try to explain this abnormal phenomenon. When mutations are natural, the ratio of synonymous to non-synonymous codon mutations is close to 5. This is the case for the 2040-base region located upstream of PRRA (left image in Fig 10). What is abnormal in the right part of Fig 10 (region 1770b) is the very low number of non-synonymous codons (blue), because the rate of change of synonymous codons is normal: the slopes of the 2 red straight lines are similar. Paradoxically, however, it is in the variation of synonymous codons that an explanation of the anomaly must be sought. In Fig 11 (next section, §12), we show that almost all of the nucleotide mutations of this 1770b region concern the third base of codons, precisely the base whose change generally does not alter the amino acid and produces a synonymous codon. The only question we will not be able to answer is this, a question of … Whoever observes the structure of the universal genetic code table, organized in TCAG order, will notice that the 60 codons of classes 1 and 2 are found in 2 adjoining vertical boxes, and therefore code for the same amino acid. Likewise, certain amino acids, such as GLY, VAL, PRO, LEU, SER, ALA, THR and ARG, occupy 4 contiguous vertical cells, where the 17 class-3 TA/AT mutations produce the same amino acid. This is how we show that 77 of the 84 mutations on the 3rd base of codons did not produce amino acid changes.
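Two observations in this paragraph can be checked with a few lines of code: where nucleotide differences fall within codons, and which amino acids occupy four contiguous cells of the TCAG-ordered code table (the eight listed above). In this sketch, a third-position change is judged synonymous in the context of the first two bases of the original codon, ignoring any other change in the same codon; the function names and toy sequences are illustrative, and the sequences are assumed to contain only A, C, G, T.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def position_profile(seq_a, seq_b):
    """Count differing bases by codon position (1, 2, 3) in two in-frame sequences,
    and how many third-position changes are synonymous on their own."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    counts = {1: 0, 2: 0, 3: 0}
    third_synonymous = 0
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == b:
            continue
        position = i % 3 + 1
        counts[position] += 1
        if position == 3:
            codon = seq_a[i - 2:i + 1]          # original codon of seq_a
            if CODON_TABLE[codon] == CODON_TABLE[codon[:2] + b]:
                third_synonymous += 1
    return counts, third_synonymous


def fourfold_degenerate_prefixes():
    """First-two-base prefixes whose four third-base variants encode one amino acid."""
    prefixes = ("".join(p) for p in product(BASES, repeat=2))
    return sorted(p for p in prefixes if len({CODON_TABLE[p + b] for b in BASES}) == 1)


print(position_profile("TTTCTTAAA", "TTCCCTAAG"))   # ({1: 0, 2: 1, 3: 2}, 2)
prefixes = fourfold_degenerate_prefixes()
print(prefixes)   # ['AC', 'CC', 'CG', 'CT', 'GC', 'GG', 'GT', 'TC']
print("".join(CODON_TABLE[p + "T"] for p in prefixes))   # TPRLAGVS
```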

Evidence of a significant SPIKE EIE from Plasmodium yoelii and of a possible HIV1 EIE with a crucial Spike mutation.
The search for possible EIEs in COVID_19 and Bat RaTG13, at the level of the whole genomes, of the Spike protein and of the critical 1770-base region, highlights different candidate EIEs (see supplementary materials ref 7). The analysis of the 1770-base region more particularly reveals an EIE with a high BLASTn probability; moreover, the analysis via the Master Code points to a very probably precise functional site in this same region, located around relative address 300 (…). This EIE appears in several chromosomes of Plasmodium yoelii. In particular, it was quickly identified as a protein named "Fam a": Plasmodium yoelii "fam-a" protein (PY17X_0018000), partial mRNA, Sequence ID: XM_022956016.1. We should remember here that Plasmodium yoelii is studied in mice in malaria vaccine strategies [29]. An analysis of amino acid homologies confirms the very probable insertion of this EIE in COVID_19: 10 amino acids concentrated in a short sequence are homologous between COVID_19 and the Plasmodium yoelii "Fam a" protein (supplementary materials ref 7b). The remarkable fact is the following: the amino acid homology between the COVID_19 region and yoelii "Fam a" (10/14) is greater than that between Bat RaTG13 and yoelii "Fam a" (6/14), and equivalent to the homology between Bat RaTG13 and COVID_19 (10/14).

Analysis of the region in SPIKE
In Bat RaTG13, the corresponding homology is much less obvious (6 amino acids instead of 10). One question: did this Plasmodium yoelii EIE already exist in SARS? We analyze the SARS Exon1 sequence (Sequence ID: …); in this array, the amino acid homologies are underlined. It can be seen in this … Another question: does the homology between COVID_19 and "Fam a" continue further? Indeed, an apparent continuation of this protein downstream would extend the homology over a length of more than 60 bases. In [27], we had already demonstrated the presence of several Plasmodium yoelii EIEs in the "Lyons-Weiler" region of COVID_19. Using a method that detects heterogeneous, and therefore possibly exogenous, sequences, we had suspected the possible presence of such sequences in the "Lyons-Weiler" region (§7 and Figs 2 and 3 in [27]). Revisiting this region, we show the existence of at least 4 EIEs in the "Lyons-Weiler" region of the COVID_19 Spike, at addresses 219, 464, 689 and 1132 (see supplementary materials ref 7f). In June 2020, a Korean team confirmed our results by publishing a PREPRINT demonstrating the presence of sequences homologous to Plasmodium in this same region [28].
Finally, we aligned the nucleotides of these 3 respective sequences: COVID_19, Bat RaTG13 and yoelii "Fam a". It is therefore clear that this second yoelii region does not coincide with the downstream extension of the "Fam a" sequence: although concatenated with the yoelii "Fam a" fragment in COVID_19, this region would come from another (functional?) region of Plasmodium yoelii.

Evidence that the majority of the 90 nucleotide mutations between the COVID_19 and Bat RaTG13 SPIKE 1770-base regions are located on the third bases of the codons.
It is interesting to note this major fact: in [26] (Fig 1), Petrovski et al. show a whole region where the amino acids are massively changed between SARS and COVID_19. This region is precisely the 1770-base region of the COVID_19 SPIKE, where the amino acids are almost ALL IDENTICAL between COVID_19 and Bat RaTG13 whereas, at the same time, almost all the codons are "changed" into synonymous codons.
The major conclusion of this demonstration of a Plasmodium yoelii EIE in COVID_19 is as follows: the very high amino acid homology score of 10/14 between COVID_19 and yoelii "Fam a" results from a shift in the reading frame of the Spike codons. This explains why bat RaTG13 scores lower against yoelii, even though RaTG13 is homologous in amino acids to COVID_19 in this region, which has very few amino acid mutations! So it is the small base-level mutations between COVID_19 and bat RaTG13 that made the difference here! This yoelii proof also explains the anomalous ratio of synonymous to non-synonymous codons in the 1770-base region highlighted previously. Indeed, as shown in Fig 11 above, the minor mutations do not change the amino acids between COVID_19 and bat RaTG13 (they almost always affect the 3rd base of synonymous codons).
We believe that this strategy of shifting the codon reading frame was probably used throughout this 1770-base region. One example is this location (relative to the 1770-base region): 1464 TAATGCTTCAGTTGTAAA-CATTCAAAAA 1491. It shows 93% nucleotide homology, and good amino acid homology once the shift of the codon reading frame is taken into account. This other Plasmodium yoelii EIE also corresponds to a position shifted from the reading frame of the Spike codons (see supplementary materials).
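To see what shifting the codon reading frame means for the fragment quoted above, you can translate it in its three forward reading frames. Below is a minimal Python sketch. It assumes the standard genetic code and treats the "-" in the quoted fragment as an alignment gap to be removed before translation. Frame 0 starts at relative address 1464; how each frame relates to the Spike codons is not restated here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), '*' = stop codon
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of a DNA sequence starting at offset `frame`.

    Alignment gaps ('-') are removed first, which assumes the gap is an
    alignment artefact rather than a real deletion in the sequence.
    """
    seq = seq.replace("-", "").upper()
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(frame, len(seq) - 2, 3))


fragment = "TAATGCTTCAGTTGTAAA-CATTCAAAAA"  # relative addresses 1464-1491
for frame in range(3):
    print(frame, translate(fragment, frame))
# 0 *CFSCKHSK
# 1 NASVVNIQK
# 2 MLQL*TFK
```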
But when the codon reading frame changes, a mutation that is "synonymous" in the Spike frame can become "non-synonymous" in a second reading frame. This is precisely what has just been demonstrated here. It is blatant proof that an EIE of the Plasmodium yoelii "Fam a" gene would have been inserted here using this "intelligent strategy". The 2 Spike genes of COVID-19 and bat RaTG13 are almost identical in their normal reading frame, yet a second reading frame radically differentiates the expression of the "Fam a" EIE between the respective Spikes of COVID_19 and bat RaTG13.
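The mechanism described in this paragraph can be shown on a toy example: a single third-base change can leave the protein unchanged in one frame but alter it in the shifted frames. The minimal Python sketch below assumes the standard genetic code. The six-base sequences are hypothetical illustrations, not Spike or Plasmodium data.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), '*' = stop codon
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of `seq` starting at offset `frame`."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(frame, len(seq) - 2, 3))


def effect_by_frame(seq_a: str, seq_b: str) -> dict[int, bool]:
    """For each forward frame, report whether the protein translation differs."""
    return {frame: translate(seq_a, frame) != translate(seq_b, frame)
            for frame in range(3)}


# Toy data: a single T->C change at the third base of the first codon
original = "GCTTCA"  # frame 0: GCT TCA -> "AS"; frame 1: CTT -> "L"; frame 2: TTC -> "F"
mutated = "GCCTCA"   # frame 0: GCC TCA -> "AS"; frame 1: CCT -> "P"; frame 2: CTC -> "L"
print(effect_by_frame(original, mutated))  # {0: False, 1: True, 2: True}
```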
A possible HIV1 EIE contains a crucial Spike mutation
Besides this Plasmodium yoelii EIE, it seems important to note another, smaller and hypothetical, EIE in the 2040b region (S1) of the Spike.
We analyze bases 1801 to 1899 of the Spike; the 33 amino acids encoded by this region include an important Spike mutation.
At the end of April 2020, Bette Korber of the Los Alamos National Laboratory in New Mexico reported that a strain carrying a mutation called S-D614G seemed to take precedence over the others when it competed with them in a given geographic territory.
In vitro studies at the Scripps Research Department of Immunology and Microbiology in Florida have just confirmed this theory: viruses carrying this mutation infected human cells more easily in vitro [32].
This mutation, identified in early March in Europe, Mexico, Brazil and China (Wuhan), modifies the structure of the Spike protein. In S-D614G, a glycine (Gly) replaces an aspartic acid (Asp) at codon 614 of the Spike protein.
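The coordinates involved in this mutation follow from simple codon arithmetic. Codon 614 occupies bases 1840-1842 of the Spike coding sequence. The GAT -> GGT change affects its second base, 1841. The analyzed region 1801-1899 spans codons 601-633, the 33 amino acids mentioned above. The minimal Python sketch below assumes the S gene starts at genome base 21563, as annotated in the Wuhan-Hu-1 reference (GenBank NC_045512.2). That start position is an assumption of this sketch. With it, the mutated base maps to genome position 23403, the commonly cited A23403G.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: first base of the S gene in the NC_045512.2 genome annotation
SPIKE_START_IN_GENOME = 21563


def codon_span(codon_number: int) -> tuple[int, int]:
    """First and last base (1-based, within the Spike CDS) of a codon."""
    return 3 * codon_number - 2, 3 * codon_number


def codons_in_region(first_base: int, last_base: int) -> tuple[int, int]:
    """First and last codon numbers overlapping a 1-based, inclusive base range."""
    return (first_base - 1) // 3 + 1, (last_base - 1) // 3 + 1


ref_codon, alt_codon = "GAT", "GGT"  # Asp -> Gly, as described in the text
changed = [i for i in range(3) if ref_codon[i] != alt_codon[i]]  # [1]: 2nd base
first, _ = codon_span(614)
cds_base = first + changed[0]
genome_base = SPIKE_START_IN_GENOME + cds_base - 1

print(codons_in_region(1801, 1899))                     # (601, 633): 33 codons
print(CODON_TABLE[ref_codon], CODON_TABLE[alt_codon])   # D G
print(cds_base, genome_base)                            # 1841 23403
```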
HIV-1 M:08GQ267 partial pol gene for gag-pol fusion polyprotein precursor, isolate 08GQ267, Sequence ID: FN557340.1, Length: 1751, Number of Matches: 1
If the mutation GAT (Asp) ==> GGT (Gly) is applied, the homology of this EIE with HIV1 is lost. COVID_19 becomes active when protein S is cleaved by an enzyme into S1 and S2, which then become functional without completely detaching from each other. This is where the mutation acts: it seems to make the bond linking S1 and S2 more "stable" after this enzyme has acted. The mutation "stabilizes" the virus in its most effective form, which would explain the predominance of this mutated strain. The mutation was present in 70% of the samples posted on GenBank in May 2020, and it now represents 60% of the strains in GenBank. This strain has circulated widely in France, Italy and now the USA, but hardly at all in Washington State (WA), which is studied in our article. Although we did not find deletions of this region in WA, GenBank contains strains from other places in which this area is deleted: Australia, India and, in the USA, Massachusetts, California, Utah and especially Florida.
As we have shown for other areas of the genome (Seattle, WA), here too the genome seems to be trying to delete this region of the Spike.

13. The analysis of deletions in the critical 1770-base Spike region in Washington State, USA (Seattle)
As we did above for the 225-base region of COVID_19, we ask the same question here: do the 1770-base region of the Spike, and more particularly the Plasmodium yoelii EIE, undergo strong deletions in genomes from patients in Seattle, Washington State (WA), USA? It appears very clearly that these WA genomes seem to be trying to "rid" themselves of these EIE regions. Of the 23 genomes analyzed, almost half (11) have eliminated, partially (6) or totally (5), this region suspected of containing a Plasmodium yoelii EIE.
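The partial/total classification used here can be sketched as a check on an alignment window. The minimal Python sketch below is illustrative only. It assumes deletions appear as "-" in a multiple alignment against a reference, and it does not count other missing-data symbols such as "N". The function name `classify_deletion` and the toy genomes are hypothetical, not the WA patient data. The last lines only restate the reported counts (6 partial + 5 total out of 23) as a share.

```python
from collections import Counter


def classify_deletion(aligned: str, start: int, end: int) -> str:
    """Classify a 1-based inclusive alignment window as 'none', 'partial' or 'total'.

    Assumes deletions are written as '-' in an alignment against a reference;
    other missing-data symbols (such as 'N') are not counted as deletions.
    """
    window = aligned[start - 1:end]
    gaps = window.count("-")
    if gaps == 0:
        return "none"
    if gaps == len(window):
        return "total"
    return "partial"


# Hypothetical toy alignment over a 10-column region, not real genomes
genomes = {
    "g1": "ACGTACGTAC",
    "g2": "ACG----TAC",
    "g3": "----------",
}
tally = Counter(classify_deletion(seq, 1, 10) for seq in genomes.values())
print(tally)

# Share of genomes with a partial or total deletion, using the counts in the text
share = (6 + 5) / 23
print(f"{share:.1%}")  # 47.8%
```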
This second proof, together with the one relating to the 225-base area, demonstrates that the COVID_19 genome tends to eliminate exogenous regions first. It can therefore be suggested that the infectivity and pathogenicity of the virus gradually decrease over time ... The "DNA Master Code" biomathematical method makes it possible to assess the level of integrity and coherence of a genome at the whole-genome scale. Therefore, for the 23 USA WA patients of Table 12 who underwent deletions in the 1770-base region of the Spike, we thought this mathematical tool could assess the possible impact of these deletions on each genome as a whole.
The right-hand column of Table 12 illustrates these results. We selected 2 reference genomes: the Wuhan reference genome and the non-mutated genome usually encountered in WA state. The results demonstrate that in ALL cases the global coupling is affected by deletions. Note, however, that while this results in part from deletions in the 1770-base region of the Spike, deletions in other regions of the genome can also contribute.
Fig 12 (related to Table 11 data). Note that the further to the right in both charts, the greater the volume of deletions.
The LINK demonstrated here between DELETIONS and degradation of the DNA Master Code coupling is a FACT. It remains to be demonstrated whether it is linked to the contagiousness of the virus and perhaps to a reduction in its pathogenicity.
14. Is the insertion site of the four-amino-acid cleavage sequence PRRA in the COVID_19 Spike the result of chance?
F. Castro-Chavez observed that the PRRA sequence is very rich in CG (10 of 12 bases) [30]. This prompted us to analyze the Spike region where PRRA is inserted using the "DNA Master Code" biomathematical method, which relies in particular on a binary (-1,0) recoding of sequences that distinguishes C/G from T/A [31]. One of its properties is that it highlights active sites, breakpoints and cleavage sites. The question such an analysis addresses is: "Was the PRRA insertion site chosen at random, or did it already have FAVORABLE properties for such an insertion?" Here is the result of this proof obtained by "induction": 1) Even before this insertion, the precise address of the PRRA insert was a PRIVILEGED cleavage site of the Spike protein, both for bat RaTG13 and for COVID_19. It would therefore not have been chosen at random. 2) Inserting the PRRA fragment, very rich in CG (10/12), at this site must accentuate and AMPLIFY this cleavage property. 3) The analysis by progressive integration of increasingly large parts of the Spike downstream of the PRRA insert PRESERVES the calculated address of the cleavage point ("DNA Master Code"). It can be suggested that the numerous synonymous-codon changes differentiating RaTG13 from COVID_19 could have contributed to this invariance of the active site.
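The 10/12 CG figure and the idea of a two-valued C/G vs T/A recoding can be illustrated in a few lines. This is only a minimal Python sketch, not the "DNA Master Code" method itself, whose details are given in [31]. It assumes the 12-nucleotide in-frame insert commonly reported for PRRA (CCT CGG CGG GCA). Which base class receives -1 and which receives 0 is also an assumption here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: the 12-nt in-frame insert commonly reported as encoding P-R-R-A
PRRA_INSERT = "CCTCGGCGGGCA"


def gc_count(seq: str) -> int:
    """Number of C or G bases."""
    return sum(base in "CG" for base in seq.upper())


def binary_recode(seq: str, cg_value: int = -1, at_value: int = 0) -> list[int]:
    """Recode C/G and A/T(U) bases into two values.

    Which class receives -1 and which 0 is an assumption of this sketch,
    not a specification of the "DNA Master Code" method.
    """
    values = {"C": cg_value, "G": cg_value, "A": at_value, "T": at_value, "U": at_value}
    return [values[base] for base in seq.upper()]


protein = "".join(CODON_TABLE[PRRA_INSERT[i:i + 3]] for i in range(0, 12, 3))
print(protein, gc_count(PRRA_INSERT), len(PRRA_INSERT))  # PRRA 10 12
print(binary_recode(PRRA_INSERT))
```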
We successively analyze 3 cases for various regions framing the PRRA insert address, i.e. base 2040 of the respective Spikes of bat RaTG13 and COVID_19:
• Bat RaTG13.
• COVID_19 without PRRA.
• COVID_19 real, with PRRA.
The "DNA Master Code" "classifies" each codon with regard to the entire sequence studied. We successively study regions of 600, 900, 1200, 1500 and 1800 bases, progressively integrating larger parts of the 1770-base region located downstream of the PRRA insert. In all cases analyzed, we are interested in the top 10 codons most likely to constitute an active cleavage site. The first part of Table 13 demonstrates the optimality of the 2040-base site as a "shear" point (codon 80 in relative address, with base 1800 as the reference). This remains true for both the bat RaTG13 and COVID_19 Spike sequences without PRRA, and for various lengths downstream of the PRRA point. The second part studies the effect of the PRRA insertion in the COVID_19 Spike (codons 81-84). The upper graph shows the optimality of relative codon 80 (base 2040 of the Spike) as a theoretical optimal cleavage site, for bat RaTG13 as well as for COVID_19 without PRRA. It would seem that the synonymous codons within the 1770b region downstream of this site contribute to the conservation of this theoretical site all along the Spike. The lower graph shows the very slight offset from this theoretical site when we insert the PRRA (codons 81-84) to constitute the real COVID_19 genome. (The blue 1200b and red 1800b curves of COVID_19 with PRRA are superimposed.)
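The relative addresses used in this analysis follow from counting codons from base 1800. Base 2040 closes relative codon 80, and the inserted PRRA codons 81-84 occupy bases 2041-2052 in COVID_19 numbering. Those are absolute codons 681-684, in agreement with the usual numbering of the PRRA residues. The minimal Python sketch below only performs this address arithmetic; it does not implement the codon ranking of the "DNA Master Code". The function names are hypothetical. Because RaTG13 lacks the 12 inserted bases, its downstream positions are offset by 12 from COVID_19 numbering.

```python
REFERENCE_BASE = 1800  # origin of the relative codon addresses used in the text


def relative_codon(last_base: int) -> int:
    """Relative codon number whose last base is `last_base` (1-based)."""
    offset = last_base - REFERENCE_BASE
    if offset <= 0 or offset % 3:
        raise ValueError("base is not the last base of a codon after the reference")
    return offset // 3


def relative_codon_bases(codon: int) -> tuple[int, int]:
    """First and last base (1-based Spike numbering) of a relative codon."""
    return REFERENCE_BASE + 3 * codon - 2, REFERENCE_BASE + 3 * codon


print(relative_codon(2040))  # 80: the candidate cleavage site
for codon in range(81, 85):  # the PRRA codons in COVID_19
    first, last = relative_codon_bases(codon)
    print(codon, first, last, "absolute codon", REFERENCE_BASE // 3 + codon)
```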
Note that PRRA-like inserts could be introduced using CRISPR RNA-type technologies [23].

CONCLUSIONS
1) 18 RNA fragments with 80% or higher homology to human or simian retroviruses have been found in the COVID_19 genome.
2) These fragments are 18 to 30 nucleotides long and therefore have the potential to modify the gene expression of COVID_19. We have named them Exogenous Informative Elements, or EIE.
3) These EIE are not dispersed randomly, but are concentrated in a small part of the COVID_19 genome.
4) Within this part, a 225-nucleotide region is unique to COVID_19 and bat RaTG13 and can discriminate and formally distinguish these 2 genomes from other SARS and bat genomes.
5) During the declining phase of the epidemic, this 225-base area and the 1770-base Spike region exhibit an abnormally high rate of mutations/deletions (44 patients from Seattle, WA, the original epicenter in the USA).
6) In the comparative analysis of the Spike genes of COVID_19 and bat RaTG13, we note two abnormal facts:
• The insertion of 4 contiguous amino acids, PRRA, in the middle of the Spike (we show that this site was already an optimal cleavage site BEFORE this insertion).
• An abnormal ratio of synonymous to non-synonymous codons in the second half of the Spike.
Finally, we show the insertion in this 1770-base Spike region of a significant EIE from Plasmodium yoelii, and of a possible HIV1 EIE containing a crucial Spike mutation.
The 14 facts presented in the 14 sections of this article all converge towards possible laboratory manipulations (see End Note below). These would have contributed to modifying the genome of COVID_19 and also, very probably, the much older SARS genome. Their double objective was perhaps vaccine design and "gain of function" in terms of the virus's penetration into the cell.
This analysis, made in silico, is dedicated to the real authors of Coronavirus COVID_19. It is up to them alone to describe their own experiments and why they turned into a world disaster: 650 000 lives lost (as of 26 July 2020), more than those taken by the two atomic bombs of Hiroshima and Nagasaki. We, the survivors, should take lessons from this serious alert for the future of humanity. We urge our fellow scientists and medical doctors to respect ethical rules as expressed by the Hippocratic oath: do no harm, never, ever!
End Note: Why could COVID-19 come from laboratory manipulations?
The following 4 proofs concern differences with respect to SARS. These are either features common to COVID-19 and bat RaTG13, or facts radically differentiating these 2 sequences, although the first (COVID-19) is claimed to come from a natural evolution of the second (bat RaTG13). We have ranked these 4 proofs in ascending order of importance, according to our point of view.
1) Four EIEs formally distinguish the COVID-19 and bat RaTG13 genomes from all other SARS or bat genomes. However, their level of HIV/SIV homology appears much stronger for COVID-19 than for bat RaTG13, as if these EIE fragments had recently been "re-injected" into the COVID-19 genome. ==> see §7 (Figures 4 and 5).
2) Natural deletions (Seattle, WA, USA) apply first to EIE inserts (HIV Kenya, etc.). ==> see all of Part III and Figure 12 in §13.
3) Synonymous codon mutations within the 1770-base region of the Spike simulate a natural evolution of bat RaTG13 towards COVID-19. At the same time, they maintain the optimality obtained in amino acid values, probably from "gain of function" laboratory experiments (optimality common to both RNA sequences, COVID-19 and bat RaTG13). ==> see Figure 10 in §11 and Figure 11 in §12.
4) The "PRRA" amino acids were inserted exactly at the Spike location that was already theoretically optimal in both COVID-19 and RaTG13 (this insertion constitutes the main difference between them). ==> see Figure 13 in §14.