WOW
The overlords don’t want billions of people using up the earth's resources. Satan wants to modify the natural God-given genome into his own image.
The intercession will occur in time to preserve a remnant of the natural genome.
My question is: what exactly is the point? Is this going to do anything? Save a life? This whole mass democide shit show is exactly that: kill off the population.
Unfortunately, it’s working. Yet I’ll remain hopeful that just maybe we will be able to find a way to reverse the damage while there are still humans walking the earth. But I am not going to hold my breath.
They engineered the viruses - a virus cannot be eradicated from the earth.
There's no 'reference' genome - in the sense that we mere mortals could ever be the arbiters of what constitutes a genome. A virus (RNA) forges its path forward through life to get to its next host.
Mere mortals like us can only watch in wonder.
Nothing was engineered. Nothing had to be engineered as fear was enough to get things in motion. Humans just complied as they are so scared of dying.
https://mikestone.substack.com/p/fear-is-the-real-virus
And that PCR test was a perfect tool that was NEVER supposed to be used, as it's 100% unable to diagnose disease and certainly cannot prove contagion (and contagion has never been proven for any "virus").
They flipped the script and inverted the truth over 40 years ago - to condition people into believing that ‘viruses kill’ and ‘vaccines save’ their lives.
They’ve been engineering viruses in labs for decades - to create vaccines and manipulate genetics.
The PCR test had nothing to do with anyone’s health - it drove the fear narrative and created a false data set that, frustratingly, people still use as a metric to this day.
They also more than likely created a DNA database from everyone who got the PCR swabbed up the nose.
A virus spreads - but it doesn’t kill us and rarely makes us sick. The very fact that it is contagious means it can’t cause us much harm. Life is contagious - death and disease are not.
You've got heart.
Why did they have to discover what they had made?
They discovered nothing! They created something so they could make people be-LIE-ve that a PCR test can diagnose a disease because it was able to amplify a very, very short sequence which they say belonged to a viral genome. It's all been fabricated and people keep buying the narrative.
This whole charade is just wild! I am NO biologist. But if you used your own eyes you could plainly see the flu was having a moment, again, as it does every few years. Beyond that, isn't everything else they've been shrieking a lie...? Faux vaccines, hiding useful medicines, firing people for non-compliance, threatening internment camps and withholding medical treatment. I could go on, but it all sounds insane when you see it on paper. Reads like a poorly written novel. When they chose to fake people dropping dead in the street, they had already overplayed their hand. But the world was duped and I don't see that changing. I appreciate the sane voices out here, and it's good to know some people see their lies. Fauci and anyone else involved should be jailed. But that's another substack.😊
PCR doesn't amplify, it replicates.
The inventor of the polymerase chain reaction, the late Dr. Kary Mullis, shared the 1993 Nobel Prize in Chemistry with Michael Smith for his work.
Dr. Mullis was very clear that PCR had no diagnostic functionality, saying multiple times that it should not be used for diagnosing any disease.
In the patent for PCR that was issued in 1986 to Kary Mullis et al., they wrote that one application for PCR would be to diagnose the presence of pathogenic micro-organisms including viruses (https://patents.google.com/patent/US4683195). Mullis was also one of the authors of a patent from 1989 titled "Detection of viruses by amplification and hybridization" (https://patents.google.com/patent/US5176995A/en).
When Kary Mullis said that "quantitative PCR is an oxymoron", I think he meant that PCR was not an accurate way to quantify viral load, and not that PCR was not an accurate way to determine whether or not a sample contains a virus. An article by John Lauritsen said: "With regard to the viral load tests, which attempt to use PCR for counting viruses, Mullis has stated: 'Quantitative PCR is an oxymoron.' PCR is intended to identify substances qualitatively, but by its very nature is unsuited for estimating numbers." (https://www.virusmyth.com/aids/hiv/jlprotease.htm)
In an interview with Gary Null, Mullis also said: "(9:05) PCR came along right about the same time that HIV did. And it was in that [unintelligible] that people started looking with PCR for HIV. That was the only way to see it, except for culture. Which was a long protracted procedure, which a lot of times didn't turn right. [...] The culture - the whole method - cell biology is a bunch of magic half of the time. And people who say that they can do quantitative estimations of HIV from culture, they're just - they're fooling themselves." (https://www.bitchute.com/video/8SjzUDxBZL9t/) But by "quantitative estimations", I think Mullis was again talking about estimating viral load, and you don't even have to do cell culture to do PCR.
Three years ago people were making a big deal about how in a document published by the CDC, they wrote: "Since no quantified virus isolates of the 2019-nCoV are currently available, assays designed for detection of the 2019-nCoV RNA were tested with characterized stocks of in vitro transcribed full length RNA (N gene; GenBank accession: MN908947.2) of known titer (RNA copies/μL) spiked into a diluent consisting of a suspension of human A549 cells and viral transport medium (VTM) to mimic clinical specimen." (http://web.archive.org/web/20200205171727/https://www.fda.gov/media/134922/download)
However by "quantified virus isolates", they may have meant samples of the virus which would've been normalized for the number of copies of the virus that were included in each sample, so just because they said that "quantified virus isolates" were not available, it didn't even mean that they didn't have any kind of a sample of the virus available (https://www.reuters.com/article/uk-factcheck-cdc-idUSKBN27633R):
> Dr Thushan de Silva, from the University of Sheffield's Department of Infection, Immunity and Cardiovascular Disease, told Reuters that this was not correct.
> De Silva said that the document is describing what was used to determine the lowest amount of viral genetic material the RT-PCR assay could detect.
> "They describe a very common process during assay set up, where the limit of detection of the RT-PCR assay was determined", he said.
> In this case, the CDC have used 'transcribed' RNA as the positive control - which means they used synthetically produced genetic material identical to that carried by the virus.
> "To calculate the limit of detection of an RT-PCR assay, you need to have a known quantity of virus to extract genetic material (RNA) from, or alternatively a known quantity of RNA identical to that carried by the virus", de Silva said.
> According to de Silva, one reason for using transcribed RNA would have been that at the time of set up, not many standardised and quantified viral stocks would have been available to extract viral RNA from.
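To make the limit-of-detection idea concrete: the transcribed RNA of known titer is serially diluted, each dilution is tested in replicates, and the lowest concentration that is still reliably detected is reported as the LoD. Here is a toy sketch of the dilution arithmetic only (the starting titer and template volume are made-up numbers, not taken from the CDC document): `awk 'BEGIN{c=100000;for(i=0;i<=5;i++){printf "10^-%d dilution: %g copies/uL, %g copies per 5 uL template\n",i,c,c*5;c/=10}}'`.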
Patent applications are written for that specific purpose and can serve no other efficaciously.
Am I correct in observing that all the viruses that Megahit struggled with are coronaviruses?
Mostly, but there's also Adenovirus and Parainfluenza where it seemed to struggle...
It's because they are all viruses we shed (cold viruses) - ssRNA.
As opposed to viruses we host for life (like Varicella/chickenpox) - dsRNA.
A cold virus (strain) will break out and not stop until it reaches the herd immunity threshold, which is 67% of the human race.
The virus will mutate its transmission only - to go through or around anything in its path to get to its next host.
For a cold virus - once 2 in 3 from our 'human herd' are infected & immune (threshold) - the virus will disappear by spillover to another susceptible host species.
E.g. a coronavirus can host in any vertebrate mammal.
At the start of an outbreak, the virus is using proofreading to find the next susceptible host species. An ssRNA genome has a higher mutation rate - these characteristics ensure its chances of adaptation to new host species.
Nothing stops a virus - it'll continue to spill over through different host species - until it finds its host reserve.
The poly-A tail is the section of the genome that mutates (RNA processing/modifications) to improve its transmission to reach herd immunity --- and to find its next host species (spillover).
Us mere mortals - or our fancy computers - would never know what this ever-changing part of the genome could be.
Hence the saying - there is no cure for the common cold. A cold virus infects us all in a season and then disappears (spillover); it can't come back because our herd is immune.
This is why they've lied to us about natural (herd) immunity and spillover.
There is no SARS-CoV-2 variant - the one and only variant was created to jump to another host species.
SARS2 found an ACE2 match and the spillover was to cats (felines), FeCoV.
The outbreak was one and done.
The BS lie about 'variants' was to falsely imply we are never immune - and what does it mean if we're never immune....you guessed it --- vaccines of course!
In contrast --- dsRNA viruses like HSV1 or Varicella, we host for life.
As the host reserve - the dsRNA virus will replicate and infect the exact same cell in every human being until the end of time. So the genome should never change.
I reposted my comments here: https://mongol-fi.github.io/hamburgmath.html#USMortalitys_Substack_posts.
Basically the reason why you didn't get a complete contig for HKU1 is that the reference genome of HKU1 contains a region where the same 30-base segment is repeated 14 times in a row. And in HIV-1 and HIV-2 there's a long terminal repeat where a long segment at the 5' end of the genome is repeated at the 3' end of the genome. And porcine adenovirus contains a tandem repeat where the same 724-base segment is repeated twice in a row.
When you mixed together reads from multiple different viruses, you failed to get complete contigs for SARS2 and SARS1 because the contigs were split at a spot where there's a 74-base segment that is identical in the reference genomes of SARS2 and SARS1. But I was able to get complete contigs for SARS2 and SARS1 by increasing the maximum k-value of MEGAHIT from 141 to 161.
Because you asked a metagenomics tool to assemble three close genomes at the same time. Rookie move.
One of the reasons why the no-virus people were saying that the genome of SARS2 was fake was that in the Wu et al. paper where the Wuhan-Hu-1 reference genome was described, they wrote that the longest contig they got with MEGAHIT was 30,474 bases long but the longest contig they got with Trinity was only 11,760 bases long, so the no-virus people thought that Trinity produced a completely different genome for the virus. And they didn't realize that Trinity just split the genome into a couple of incomplete contigs which likely had only small gaps in between, even though with different settings Trinity may have also produced a complete contig.
However, your experiments show that even when you generate sets of reads which cover the whole genome of a virus, de-novo assemblers like MEGAHIT still occasionally fail to produce a single complete contig for the whole virus. So it also indicates that there's nothing particularly anomalous about Trinity failing to generate a complete contig from Wu et al.'s reads, even though the reads actually covered the entire genome of Wuhan-Hu-1 apart from the last couple of bases of the poly(A) tail.
BTW the genomes of influenza viruses are about 15,000 bases long, but the reason why your influenza A, B, C, and D references are so short is because they don't include all genes.
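That's because influenza genomes are split across eight segments in influenza A and B (seven in C and D), so a single RefSeq accession covers only one segment. If you want the full segmented genome length, you can fetch all segments and sum them; here's a sketch assuming the A/Puerto Rico/8/1934 segment accessions NC_002016 through NC_002023 (double-check the accessions for whichever reference you actually used): `curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id=NC_002016,NC_002017,NC_002018,NC_002019,NC_002020,NC_002021,NC_002022,NC_002023' >fluA.fa;seqkit stat fluA.fa`.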
The first sentence of the post says: "The Wu et al. 2020 paper is the first to discover the genetic sequence of the novel pathogen SARS-CoV-2." However, I think the team of "Little Mountain Dog" at Vision Medicals sequenced it earlier, in late December 2019: https://www.researchgate.net/profile/Gilles-Demaneuf/publication/360313016_Sequencing_and_early_analysis_of_SARS-CoV-2_27_Dec_2019_-_The_crushed_hopes_of_Little_Mountain_Dog_of_Vision_Medicals_China/links/626fa7afb1ad9f66c89a1d13/Sequencing-and-early-analysis-of-SARS-CoV-2-27-Dec-2019-The-crushed-hopes-of-Little-Mountain-Dog-of-Vision-Medicals-China.pdf.
I don’t think anyone is saying the genome is fake. The question is its validity. We have both failed to assemble the perfect reference contig and/or genome.
Someone in Christine Massey's Substack comments linked to the first post in your series as evidence that the genome of SARS2 is a fake in-silico creation.
I have already solved the mystery of why Wu et al.'s original 30,474-base contig cannot be reproduced, which is because the 3' end of the contig accidentally included a 618-base segment of human DNA or RNA, and human reads were masked with N bases in the reads that Wu et al. uploaded to the SRA: https://output.jsbin.com/suwuxoy#Why_is_the_longest_MEGAHIT_contig_not_reproducible_. And the reason why de-novo assemblers like MEGAHIT cannot recreate the full poly(A) tail of MN908947.3 is because the raw reads don't contain the full poly(A) tail, and Wu et al. described using RACE to sequence the poly(A) tail. Neither of those issues means that the genome of SARS2 is not valid.
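If you want to see the masking yourself, here's a rough sketch assuming the raw run has been downloaded as FASTQ (e.g. with fasterq-dump; I'm assuming SRR10971381 is the accession of Wu et al.'s run), which counts how many reads consist entirely of N bases: `awk 'NR%4==2&&/^N+$/{n++}END{print n" fully N-masked reads out of "NR/4}' SRR10971381_1.fastq`.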
Usually whenever you come up with a new reason why the genome of SARS2 is not valid, you end up being corrected by McKernan or me and you have to move your goalposts, but if your followers don't read your Substack comments or your Discord or your Twitter discussions with McKernan, then they don't necessarily realize that basically all of your arguments in this Substack series have turned out to be duds.
Obviously, I am not responsible for how other people interpret my articles.
I am also not coming up with a new reason; let's simply list the facts:
1) No one, to this date, has been able to perfectly reproduce MN908947.3 with MEGAHIT or any other assembler; even when trimming the alleged adapters, we still end up with two or three leading G's and a missing tail.
2) There are no reads in the sample which match perfectly. At best, there are 4 reads that map well to the head - if you assume that trimming 3 bases is accurate, which you don't know for a fact. There are none for the tail.
My initial requirements, laid out in https://usmortality.substack.com/p/is-the-sars-cov-2-genome-valid, have not been fulfilled. They were:
1) At least 10–20 reads that show a perfect alignment to the head & the tail.
2) A complete genetic sequence (RNA) of ~30kb length exists in the sample.
If you can show this, then I'm happy to announce that I would consider the sequence itself valid. Obviously, a valid sequence assembly would still need to be shown to originate from the virus particle, and then causation of illness would need to be demonstrated. The latter should be fairly easy, if you can present a study, e.g. a double-blinded RCT in an animal model, where a synthesized sequence leads to production of the same particle and to uncontrolled spread and disease via subsequent natural exposure.
How many virions do you think were in the original sample? Surely, if the patient was supposed to have died from it, there must've been thousands or even more?
I've already shown that 100 virions, randomly fragmented, lead to about 26 matches, of which maybe half are perfect, in a simulated environment. Hence, so far there's no evidence there were such virions in the sample.
I don't know if the 3-5 base inserts at the start of the reads are part of the adapters or not, but if they are, then you could technically say that if you remove the first 5 bases of each read with `fastp -f 5 -F 5`, then it's one aspect of adapter trimming.
I don't know if the patient in the Wu et al. paper died, because at least the paper didn't say that he died, and he was only 41 years old.
As evidence that some of the reads originate from a virion, it's sufficient that you have a single 150-base read which has a low number of mismatches to the SARS2 genome, which has been sequenced tens of millions times already.
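To make that concrete, here's a sketch of the check, assuming the run has been downloaded as SRR10971381_1.fastq and SRR10971381_2.fastq (again, I'm taking that accession to be Wu et al.'s run) and that Wuhan-Hu-1 is saved as sars2.fa; it aligns the reads with Bowtie2 and counts how many aligned reads have an edit distance (NM tag) of at most 2: `bowtie2-build sars2.fa sars2.fa;bowtie2 -p4 -x sars2.fa -1 SRR10971381_1.fastq -2 SRR10971381_2.fastq --no-unal|grep -o 'NM:i:[0-9]*'|awk -F: '$3<=2'|wc -l`.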
There's a paper from last year which I believe satisfies Koch's third postulate, because they were able to induce disease by infecting human volunteers with SARS2: https://www.nature.com/articles/s41591-022-01780-9. The isolate they used was SARS-CoV-2/human/GBR/484861/2020, which has an identical sequence to Wuhan-Hu-1 at GenBank. The authors of the paper didn't synthesize the virus but cultured it, though they sequenced the cultured sample to verify that the virus was identical to the original isolate. But in the paper where Baric's team inserted the spike protein from the SARS1-like bat virus SHC014 into a mouse-adapted strain of SARS1, they were able to cause disease in mice by inoculating the mice with the synthetic virus: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4797993/.
I think the simulated reads are not that realistic, because they have a completely different coverage pattern than the real reads and a completely different distribution of template lengths: https://i.ibb.co/dP3xc0t/coverage-depth-and-template-length-real-vs-wgsim-vs-genome-ts.png.
As a 40-year veteran of Silicon Valley, I can strongly assure you that you should turn off comments.
Why?
Because they will track and destroy, likely because everyone knows the comments are where the gold is? You can't control what people say, and their comments have all the terms they're looking for to identify dissidents?
In your assembly experiment where you simulated the reads with wgsim, your longest MEGAHIT contig for HKU1 was only about 89% of the length of the HKU1 reference genome.
I also tried using wgsim to generate 100,000 reads for the reference genome of HKU1, and when I ran MEGAHIT to assemble the reads, my longest contig was only 26,535 bases even though the HKU1 reference genome is 29,926 bases: `brew install megahit -s;brew install seqkit samtools;curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id=NC_006577' >hku1.fa;wgsim -N100000 hku1.fa hku1_{1,2}.fq;megahit -1 hku1_1.fq -2 hku1_2.fq -o megahku;seqkit stat hku1.fa megahku/final.contigs.fa`.
When I aligned the contigs with Bowtie2, I noticed that my three contigs covered the positions 54-3136, 3397-29933, and 3397-3727, so there was a short gap from position 3137 to 3396: `bowtie2-build hku1{,}.fa;bowtie2 -p4 -x hku1.fa -f -U megahku/final.contigs.fa|samtools sort ->hku1.bam;samtools view hku1.bam|awk -F\\t '{l=length($10);print$4,$6,l,$4+l}'|column -t`.
When I ran `seqkit subseq -r 3137:3396 hku1.fa`, I noticed that the gap which was not covered by any contig fell within a region where the 30-base segment AATGACGATGAAGATGTTGTTACTGGTGAC was repeated 15 times. You can also see the repeats from here: https://www.ncbi.nlm.nih.gov/nuccore/NC_006577.2.
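You can also double-check the repeat count directly from the flattened sequence (each tandem copy is counted as one non-overlapping match): `grep -v ^\> hku1.fa|tr -d \\n|grep -o AATGACGATGAAGATGTTGTTACTGGTGAC|wc -l`.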
So if MEGAHIT had to assemble the contigs from unpaired reads that were 150 bases long, how could it know how many times the 30-base segment was repeated, if it can only see one 150-base window of the genome at a time? Actually, one of the main reasons why the paired read layout is used is that it helps sequence regions with repeats, because there are variable-length gaps between the forward and reverse reads, so that the read pair covers a region that is longer than an individual read. But even though wgsim generated paired reads, I guess the region with repeats was so long that the paired reads didn't help MEGAHIT assemble the region correctly. And actually the default read length used by wgsim is only 70 bases.
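If you want to redo the simulation with longer reads, wgsim takes the read lengths with its `-1` and `-2` options (length of the first and second read in each pair), so 150-base paired reads would be something like: `wgsim -1 150 -2 150 -N100000 hku1.fa hku1_1.fq hku1_2.fq`.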
A paper about HKU1 said: "Genome analysis also revealed various numbers of tandem copies of a perfect 30-base acidic tandem repeat (ATR) which encodes NDDEDVVTGD and various numbers and sequences of imperfect repeats in the N terminus of nsp3 inside the acidic domain upstream of papain-like protease 1 among the 22 genomes." (https://pubmed.ncbi.nlm.nih.gov/16809319/) And NDDEDVVTGD is the translation of the 30-base repeat: `seqkit translate<<<$'>a\nAATGACGATGAAGATGTTGTTACTGGTGAC'`.
You also failed to assemble a complete contig for the porcine adenovirus genome, but it also contains a 723-base segment that is repeated twice between positions 29481 and 30929. This code finds repeats that are 100 bases or longer: `curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id=NC_044935' >adeno.fa;grep -v ^\> adeno.fa|tr -d \\n|awk '{x=100;for(i=1;i<=length-x;i++){s=substr($0,i,x);if(s in a)print i,s;a[s]}}'`.
Good details, thx.
Even if the reference genome is scattered across multiple contigs, how would you ever identify such a phenomenon in a novel virus assembly? I'd have a hard time accepting that we just reassemble the contigs over and over until it fits...
Also, did you ever find out how the sequencer knows the distance between the paired reads?
If you get multiple contigs, you can do a BLAST search for the longest contig to see what virus it's the closest to. Then you can use a short read aligner like Bowtie2 to align the other contigs against the closest virus and see which contigs have a match. So for example when Wu et al. got multiple contigs for SARS2 with Trinity, they could've aligned all of their Trinity contigs against the Zhoushan bat virus ZC45, which used to be the best match for SARS2 on BLAST in January 2020, and then they would've gotten a couple of different contigs which aligned against different parts of ZC45. And if there would've been only small gaps between the contigs, they could've used the contigs to design PCR primers in order to fill the gaps.
Or if your longest contig is only about 12,000 bases long, and you search for the contig on BLAST and find that it is similar to a family of viruses which are generally around 30,000 bases long, then you can tell that the contig probably doesn't cover the whole genome of the virus.
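If you want to script that first step, here's a minimal sketch assuming BLAST+ is installed (the web interface works just as well, and a remote search of nt can be slow): take the longest contig from the earlier HKU1 assembly and query it remotely against nt: `seqkit sort -l -r megahku/final.contigs.fa|seqkit head -n 1 >longest.fa;blastn -query longest.fa -db nt -remote -outfmt '6 qseqid sacc pident length evalue stitle'|head`.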
I don't think the sequencer knows the distance between the paired reads, or at least the distance is not reported in the FASTQ files even though it would help de-novo assemblers to assemble the reads more accurately.
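The insert size is a property of the library prep rather than something the sequencer reads out, and it's normally estimated after alignment: for properly paired reads the aligner fills in the template length in the TLEN field (column 9 of the SAM record), so you can get the distribution from a BAM afterwards. A sketch with a placeholder file name (reads.bam stands for a BAM made by aligning the paired FASTQ files, not the contig alignments from above): `samtools view -f 0x2 reads.bam|awk '$9>0{s+=$9;n++}END{print "mean template length:",s/n}'`. samtools stats also reports an "insert size average".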
On Twitter you showed that you also failed to get complete contigs for HIV-1 and HIV-2, but that's because HIV has a long terminal repeat where a long segment at the 5' end is repeated at the 3' end (https://en.wikipedia.org/wiki/Long_terminal_repeat):
> The HIV-1 LTR is 634 bp[5] in length and, like other retroviral LTRs, is segmented into the U3, R, and U5 regions. U3 and U5 has been further subdivided according to transcription factor sites and their impact on LTR activity and viral gene expression. The multi-step process of reverse transcription results in the placement of two identical LTRs, each consisting of a U3, R, and U5 region, at either end of the proviral DNA. The ends of the LTRs subsequently participate in integration of the provirus into the host genome. Once the provirus has been integrated, the LTR on the 5′ end serves as the promoter for the entire retroviral genome, while the LTR at the 3′ end provides for nascent viral RNA polyadenylation and, in HIV-1, HIV-2, and SIV, encodes the accessory protein, Nef.[6]
The reference genome of HIV-2 has an even longer 855-base long terminal repeat, where positions 1-855 are identical to positions 9505-10359: `curl -s 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id=NC_001722' >hiv2.fa;seqkit subseq -r 1:855 hiv2.fa|seqkit locate -f- hiv2.fa|cut -f1,4-6|column -t`.
When I tried running MEGAHIT on a file that contained simulated reads from both SARS1 and SARS2, I got two contigs for SARS1 which covered positions 2-15045 and 14992-29714, and I got two contigs for SARS2 which covered positions 1-15114 and 15062-29859, so the contigs had about 50 bases of overlap in both cases but for some reason the contigs were not merged into a single contig:
# fetch the SARS1 reference (sars2.fa is assumed to already exist from the earlier steps)
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id=NC_004718.3'>sars1.fa
# simulate 100,000 read pairs from each genome and pool them into a single paired-end read set
for i in 1 2;do wgsim -N100000 sars$i.fa sars$i\_{1,2}.fq;done;for i in 1 2;do cat sars[12]_$i.fq>sars3_$i.fq;done
# co-assemble the pooled reads
megahit -1 sars3_1.fq -2 sars3_2.fq -o megasars3
# align the resulting contigs against a combined SARS1+SARS2 reference and list where each contig maps
cat sars[12].fa>sars12.fa
bowtie2-build sars12.fa{,};bowtie2 -x sars12.fa -f -U megasars3/final.contigs.fa|samtools sort ->sars3.bam
samtools view sars3.bam|awk -F\\t '{print$1,$3,$4,$6,length($10)}'|column -t
Then I tried comparing parts of the SARS1 and SARS2 genomes around the region where the contigs were split: `(seqkit subseq -r 14950:15100 sars2.fa;seqkit subseq -r 15000:15200 sars1.fa)|mafft --clustalout -`. I noticed that SARS2 and SARS1 had an identical 74-base segment at that region (TAGACTTTATTATGATTCAATGAGTTATGAGGATCAAGATGCACTTTTCGCATATACAAAACGTAATGTCATCC).
The default read length used by wgsim is 70 bases, which is a few bases shorter than the length of the shared segment, but I failed to get single contigs for SARS1 and SARS2 even after I tried increasing the read length to 150 or 300. However when I ran MEGAHIT with the flag `--k-list 21,29,39,59,79,99,119,141,161,181,201,221,241,255`, I was able to get single contigs for both SARS1 and SARS2 even with the default read length of 70. And even `--k-list 21,29,39,59,79,99,119,141,161` was enough to get single contigs.
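For reference, the rerun is just the same MEGAHIT call with `--k-list` overridden (MEGAHIT wants a fresh output directory, so I gave it a new name): `megahit -1 sars3_1.fq -2 sars3_2.fq --k-list 21,29,39,59,79,99,119,141,161 -o megasars3_k161`.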