26 Comments

Thank you for this very important work! Can you lay out, perhaps here or perhaps in an article of its own, a walk-through of this issue for lay people? Have you done that in a past piece? Similarly, can you lay out what the assembly actually assembled, if it was NOT a new virus? Another question: the CEO of Illumina is on record saying "his team" was flown into China to do the first sequencing. Thoughts on that?

Thanks. I have not broken it down in layman's terms yet, because I don't have a final conclusion; it's a work in progress to get to the truth ;)

But I can summarize a couple of possible scenarios regarding the assembled sequence:

1) It's from a virus. The inconsistencies we see stem from the protocol or methods.

2) It's from an artificially created/circulated RNA sequence (JJ Couey's hypothesis)

3) It's an in-silico creation, based on genetic material from (degenerated) human DNA/RNA or other organisms (bacteria, etc.)

Do you have a link to the video you mention?

Yes, that's it. I think there might be another one as well where another figure says the same thing. Either way, yes. I think it's interesting that he says "we were called in to China to help...." I always thought Prof. Zhang had done it himself with his team, but apparently the Illumina team was "called in." I find the timing very odd, of course. At that point, we had some people with pneumonia but no new virus, and they were calling in an international team to find a new virus? Wuhan ALWAYS has people with pneumonia because it's a big, polluted city... it seems a bit odd to be picking up the bat-phone already. I suppose a good fear of SARS could do it; nevertheless, it's just one more oddity in a story of oddities....

Thanks. And what do you think could prove one of these scenarios over the others? Even if you could reproduce the very same alignment from the experiment each time, would that prove 1 and not 2 or 3? Isn't it logically beyond the scope of the sequencing itself to say whether 1 or 3 is true?

Yes. Even if the sequence were found to be valid and complete, it would still be an open question where the sequence originates from, because the patient sample consists of all sorts of genetic material.

Agreed. It would be great if you wrote up an explanation of that for lay people, and also of what you are trying to accomplish with the work you are doing here.

I have seen some folks claim that they can filter out all non-human RNA before running the alignment, and others claim that this is not fully possible. Thoughts on that? My understanding is that it is not actually possible, as we don't truly know the extent of any single individual's RNA at any given time.

I also read about a fellow who said he filtered out all non-human RNA, then ran the alignment and got a 100% match for SARS-CoV-2 on BLAST. Thoughts on that?
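As a rough illustration of what such host filtering involves, and of the worry that degraded or unknown host sequence slips through, here is a minimal Python sketch. It is a toy, assuming that a read counts as "host" if it shares at least one exact k-mer with a host reference; the sequences and the choice of k = 8 are made up, and real host-removal tools are far more elaborate. Note how a host-derived read with a single error in the middle shares no exact 8-mer with the reference and so ends up on the non-host side:

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_reads(reads: list[str], host_ref: str, k: int = 8) -> tuple[list[str], list[str]]:
    """Toy host filter: a read is 'host' if it shares >= 1 exact k-mer with host_ref."""
    host_index = kmers(host_ref, k)
    host: list[str] = []
    non_host: list[str] = []
    for read in reads:
        (host if kmers(read, k) & host_index else non_host).append(read)
    return host, non_host


# Hypothetical toy data, not real human or viral sequence.
HOST_REF = "ATGGCGTACGTTAGCCTAGGCTAACGTTGCA"
READS = [
    "GCGTACGTTAGC",      # exact copy of a stretch of HOST_REF
    "GCGTACCTTAGC",      # same stretch with one substitution (a "degenerated" host read)
    "TTTTCCCCAAAAGGGG",  # unrelated sequence
]

if __name__ == "__main__":
    host, non_host = split_reads(READS, HOST_REF)
    print("host:", host)
    print("non-host:", non_host)
```

Raising k makes matches more specific but even less tolerant of errors; lowering it catches more degraded reads at the cost of more chance hits. That trade-off is at the heart of the question of whether host reads can be removed completely.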

Yes. Besides the problem of knowing all human DNA, mRNA, etc., it's unclear to me how one could account for the degeneration of that material (due to the methods, or illness) ;)

To Australians in the know, Prof. Ed Holmes is known as 'sweaty Eddie.'

He was on TV multiple times in 2020 discussing the origins of COVID, sweating profusely and looking like someone off screen was holding a gun to his head. It's not about your emails, and it is definitely not about science. It's about the US DOD, the ADF, the WHO and the CCP. He has been given his instructions. Tread lightly.

Edit: Hope that answers your question about why he told you to FO and why no one in China will contact you.

What if this sequence can be created in people by an external delivery vector, for example via air, food, water, or EMF?

Combine this with using lower or higher PCR cycle counts, and it would be trivial to dial the number of positive PCR cases up or down as needed.

I recently looked at the mouse study they did with genetically modified mice that express ACE2. They used stock COVID to infect them. I guess you'd like to see that repeated with a synthetic version? I'd like to know what was in the 'stock COVID' they used. What is depressing about that study is that they mention these modified mice could be used to assess early antiviral treatment. I'm not sure such studies were ever done... not that I'd know how to search for them.

AFAIK, the viral isolates one can typically order (e.g. stock COVID) are synthetically produced agents based on the sequence in GenBank.

If this sequence only exists in silico, does it have to make sense? I have yet to see any compelling evidence that SARS-CoV-2 actually exists, and having just a genomic sequence really proves nothing IMO. These liars can take any old snippets from GenBank or wherever, combine them in silico, and claim it is this or that, but without an actual sample from an infected patient, there is no proof of this alleged virus.

I'm not sure who the man is, or when this event was, but I have a copy of a video clip from some WEF conference and the exact quote from this man is:

'Moderna has never had a live virus in their sight, it was all a software problem'.

link to video:

https://video.twimg.com/amplify_video/1634540155016019971/vid/504x276/Uv3k8zoOZrtQ8Wfd.mp4?tag=16

And many others have publicly stated similar things. I'm not sure how many of these people can keep saying they never had an actual virus before people start believing that they never had an actual virus.

1. You wrote: "Even when trimming off the leading three or four nucleotides from the reads, Megahit still assembles de-novo the head with two extra G bases at the head!"

However, I was able to get rid of the two extra G bases when I trimmed the reads with `fastp -f 5 -t 5`, which removes the first 5 and last 5 bases from both forward and reverse reads. My longest MEGAHIT contig was then 29,870 bases long, and it was otherwise identical to Wuhan-Hu-1 except that it was missing the entire poly(A) tail.

There are 25 forward reads that match the first 20 bases of Wuhan-Hu-1 but with two extra G bases added to the start: `seqkit locate -p GGATTAAAGGTTTATACCTTCC SRR10971381_1.fastq`. But in all of these reads except one, the match is on the minus strand, so the extra G bases at the start are actually extra C bases at the end of the read. (The forward reads are sense for cDNA and therefore antisense for RNA, so they're what people would intuitively think of as the reverse reads.)

2. You wrote: "No one, to this date, was able to perfectly reproduce MN908947.3 with Megahit or any other assembler, even when trimming the alleged adaptors, we still end up with two (or three) leading G's and a missing tail." But I figured out how to get rid of the leading G bases. And the full poly(A) tail is not included in the metagenomic reads, and Wu et al. wrote that they sequenced the ends of the genome with RACE (rapid amplification of cDNA ends).

Three years ago, when the WIV published the second version of RaTG13, which had an extra 15 bases inserted at the 5' end, Steven Quay was saying that the genome was somehow fake because the extra 15 bases were not included in the metagenomic reads at the SRA (https://twitter.com/quay_dr/status/1318041151211884552). However, the next year the RACE sequences of RaTG13 were published at the SRA, and they included the extra 15 bases that were missing from the metagenomic reads (https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=606165). So the reason why the genome of RaTG13 was updated may have been that the ends of the genome were sequenced with RACE, and the current version of RaTG13 cannot be reproduced by assembling the original metagenomic reads either.

3. You wrote: "Also, one can tell that the adaptors were probably trimmed, from the way that there are several reads which match the Illumina TruSeq single index adaptors apart from one or two errors. So the errors probably prevented the adaptors from getting trimmed."

More specifically, you can find a couple of those reads like this: `printf %s\\n \>adapter1 AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \>adapter2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT>adapter.fa;bowtie2-build adapter.fa{,};bowtie2 -p4 --local --no-unal -x adapter.fa <(seqkit head -n100000 SRR10971381_1.fastq)|grep -v ^@|cut -f1,10|seqkit tab2fx|cat - adapter.fa|mafft --clustalout -`.

4. In your screenshot of the nanopore reads, the reason why the reads have a bunch of N bases and no T bases is that you didn't replace U bases with T bases (for example by running `seqkit seq --rna2dna`). And I don't know why, but a bunch of your reads seem to match the reference but their alignment is just off by about 1-20 bases. The same thing didn't happen to me when I aligned the reads with minimap2: `minimap2 -a --sam-hit-only sars2.fa <(seqkit seq --rna2dna VeroInf24h.all.fastq.zst)|samtools sort -@2 ->nano.bam`.

5. Fast Eddie wrote: "As for the publishing unfiltered reads, there are important ethical issues about realising human DNA. Not my call to make." And in another email he wrote: "Can I also say that I am completely against the raw reads being published as this would be totally unethical. The majority of these reads would be human DNA and it would be a serious breach of ethics for these to be published without the consent of the patient in question, particularly the identity of that patient is now widely known."

Last year some no-virus people were wondering why half of the reads consisted of only N bases, but it's probably because the human reads were masked, as Eddie said. The SRA's website says: "Human metagenomic studies may contain human sequences and require that the donor provide consent to archive their data in an unprotected database. If you would like to archive human metagenomic sequences in the public SRA database please contact the SRA and we will screen and remove human sequence contaminants from your submission." (https://www.ncbi.nlm.nih.gov/sra/docs/submit/) The NCBI has a tool called sra-human-scrubber, which is used to mask human reads with N bases in SRA submissions (https://ncbiinsights.ncbi.nlm.nih.gov/2023/02/02/scrubbing-human-sequences-sra-submissions/).

Wu et al. also wrote that they had a total of 56,565,928 reads out of which 23,712,657 were non-human reads.

6. You wrote: "How many virions were in the original sample? Surely, if the patient should have died from it, it must've been thousands or even more?" But I don't think the patient in the Wu et al. paper died from COVID, or at least the paper didn't say that he died, and he was only 41 years old.
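Several of the points above boil down to simple string operations on reads. The Python sketch below illustrates them on toy inputs; the only sequences taken from this thread are the 22-base `seqkit locate` pattern and the first TruSeq adapter from the `bowtie2` command, and everything else is made up. It is not a substitute for fastp, seqkit or a real trimmer: `trim_fixed` mimics only the fixed-length front/tail trimming of `fastp -f 5 -t 5`, `reverse_complement` shows why a minus-strand match with two leading G's shows up as two trailing C's in the read itself, `mismatches` counts how far a read fragment is from the adapter, `rna_to_dna` does the U-to-T replacement that `seqkit seq --rna2dna` performs, and the last constants do the arithmetic on Wu et al.'s read counts.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# From this thread: the pattern used with `seqkit locate`
# and the first Illumina TruSeq adapter from the bowtie2 command.
LOCATE_PATTERN = "GGATTAAAGGTTTATACCTTCC"
ADAPTER1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def trim_fixed(read: str, front: int, tail: int) -> str:
    """Drop `front` bases from the start and `tail` bases from the end."""
    return read[front:max(len(read) - tail, 0)]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rna_to_dna(seq: str) -> str:
    """Replace U with T, as `seqkit seq --rna2dna` does for RNA reads."""
    return seq.replace("U", "T").replace("u", "t")


def mismatches(a: str, b: str) -> int:
    """Number of differing positions over the overlap of two strings (Hamming)."""
    return sum(x != y for x, y in zip(a, b))


def fraction_all_n(reads: list[str]) -> float:
    """Fraction of reads made up entirely of N bases (e.g. masked host reads)."""
    return sum(1 for r in reads if r and set(r) == {"N"}) / len(reads)


# Wu et al.'s read counts as quoted in point 5.
TOTAL_READS = 56_565_928
NON_HUMAN_READS = 23_712_657
HUMAN_READS = TOTAL_READS - NON_HUMAN_READS          # 32,853,271
PCT_NON_HUMAN = 100 * NON_HUMAN_READS / TOTAL_READS  # about 41.92 %

if __name__ == "__main__":
    # Point 1: hypothetical read with 5 junk bases on each end.
    print(trim_fixed("TTGCA" + "ATTAAAGGTTTATACC" + "GTACC", 5, 5))
    # A read reported as a minus-strand match carries the reverse complement:
    # the two leading G's of the pattern become two trailing C's.
    print(reverse_complement(LOCATE_PATTERN))
    # Point 3: hypothetical adapter read-through with one sequencing error.
    noisy = ADAPTER1[:5] + "T" + ADAPTER1[6:]
    print(mismatches(ADAPTER1, noisy))
    # Point 4: a toy RNA read.
    print(rna_to_dna("ACUUGGAU"))
    # Point 5: toy reads, two of them fully masked.
    print(fraction_all_n(["NNNN", "ACGT", "NNNN", "ACNN"]))
    print(HUMAN_READS, f"{PCT_NON_HUMAN:.2f}")
```

By these counts, roughly 58% of Wu et al.'s reads were human, which is in the same ballpark as the "half of the reads are all N" observation in point 5.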

1. Sure, but why 5? According to Takara it's 3!

2. Not sure what you mean here... I'm aware that the end was found via RACE, but why is it not found in the original reads?

3. That was actually your own statement.

4. No, I've already shown you how it looks when I ran the command you provided; see Discord for the screenshot. Please produce your own version if you think the outcome is different. Commands provided here: https://twitter.com/USMortality/status/1695584679565590528

5. Yes

6. Agreed; I said that in the first installment of my series: the outcome of the 41-year-old patient is unknown. I will correct this article.

Thx.

I was wrong in my previous comment when I said that the reason why a bunch of the nanopore reads have segments that are off by 1-20 bases is that the aligner didn't insert enough indels into the reads. Actually, the reason why the reads appear to be full of errors is that IGV is displaying soft-clipped parts of reads, and in soft-clipped segments even bases that are identical to the reference are not displayed as blanks. So you have to uncheck "Show soft-clipped bases" in the "Alignments" tab of the preferences.

In order to accurately gauge the number of mismatches in IGV, you also have to uncheck "Quick consensus mode" from the context menu of the alignment view, because otherwise IGV will display mismatches as blanks unless the percentage of mismatches at the position is above a threshold (https://media.discordapp.net/attachments/1125149422867714048/1146507584719831191/a.gif, https://www.pacb.com/blog/igv-3-improves-support-pacbio-long-reads/).

In order to save memory, IGV also hides reads at positions with more than 100 reads unless you uncheck "Preferences > Alignments > Downsample reads" (https://software.broadinstitute.org/software/igv/book/export/html/37). But if you disable the downsampling option, you'll see that there are 9 reads with starting position 1, and all except one of them have zero mismatches in the first 10 bases of Wuhan-Hu-1. All of these reads are hard-clipped, though, and in some reads there are over a thousand hard-clipped bases before the part of the read that matches the 5' end of Wuhan-Hu-1.
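For anyone following along, the difference between the two kinds of clipping comes from the SAM format itself. In a CIGAR string, `S` (soft clip) marks read bases that are not aligned but are still stored in the SEQ field, which is why IGV can display them, while `H` (hard clip) marks bases that have been removed from SEQ entirely. (minimap2, for example, hard-clips supplementary alignments unless run with `-Y`.) The Python sketch below parses a CIGAR string and tallies clipped versus aligned bases; the CIGAR strings in it are made-up examples, not taken from the nanopore BAM discussed here.

```python
from __future__ import annotations

import re

CIGAR_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # ops whose bases are stored in SEQ
REF_CONSUMING = set("MDN=X")    # ops that advance along the reference


def clip_summary(cigar: str) -> dict[str, int]:
    """Count soft-clipped, hard-clipped, SEQ-length and reference-span bases."""
    if not CIGAR_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    summary = {"soft_clipped": 0, "hard_clipped": 0, "seq_length": 0, "ref_span": 0}
    for length, op in OP_RE.findall(cigar):
        n = int(length)
        if op == "S":
            summary["soft_clipped"] += n
        elif op == "H":
            summary["hard_clipped"] += n  # these bases are not in SEQ at all
        if op in QUERY_CONSUMING:
            summary["seq_length"] += n
        if op in REF_CONSUMING:
            summary["ref_span"] += n
    return summary


if __name__ == "__main__":
    # Hypothetical CIGAR strings for illustration.
    for cigar in ("10S50M2I30M5S", "1200H64M", "30M2D20M"):
        print(cigar, clip_summary(cigar))
```

For a read like the hypothetical `1200H64M`, only the 64 aligned bases are present in the record; the 1,200 hard-clipped bases can only be seen in the primary alignment of the same read, if at all.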

Replies to other points:

1. I trimmed 5 bases because the extra piece of crap at the start of the reads was always 3-5 bases long, and I was trying to remove it so I wouldn't get the two extra G bases in my MEGAHIT contig.

2. Since you already know the reason why the full 29,903-base sequence cannot be reproduced, you should've mentioned the reason in your article instead of making it seem like it's some kind of an unsolved mystery (which would appear to strengthen your argument that the genome has somehow not been "validated"). And I don't know why it's common for metagenomic reads to be missing around 10 or 20 bases from either end of the genome of a virus, but the same thing is also true of the metagenomic reads for RaTG13.

Yes, but hard-clipped reads are obviously not good evidence of validity. On the contrary, they actually tend to show that the short reads are just broken-down segments of bigger sequences that got aligned. The longest one we've found so far was 64 bases long; the theoretical chance of that happening by chance is extremely low, isn't it?
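The "extremely low" chance can be put into numbers with a back-of-the-envelope estimate. Under the strong simplifying assumption that bases are independent and uniformly random, one fixed k-mer is expected to occur about 2 × (G − k + 1) × 4^−k times by chance in a reference of length G, counting both strands. A minimal Python sketch of that estimate:

```python
def expected_random_hits(k: int, ref_length: int, strands: int = 2) -> float:
    """Expected exact occurrences of one fixed k-mer in a random reference.

    Simplifying assumption: every base is independent and equally likely
    (A, C, G, T each 1/4), which real genomes are not.
    """
    positions = max(ref_length - k + 1, 0) * strands
    return positions * 4.0 ** -k


if __name__ == "__main__":
    # 29,903 bases: length of the full Wuhan-Hu-1 sequence mentioned above.
    print(f"{expected_random_hits(64, 29_903):.2e}")
    # ~3e9 bases: a round-number stand-in for a human-genome-sized reference.
    print(f"{expected_random_hits(10, 3_000_000_000):.0f}")
```

For a 64-mer against a 29,903-base genome this gives about 1.75e-34, whereas a 10-mer would be expected to turn up thousands of times in a human-genome-sized reference by chance alone. Real sequences are far from random, so treat these only as orders of magnitude.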

1. Sure, but unless we have definitive confirmation of why any trimming (be it 3 or 5 bases) is required, it cannot serve as definitive evidence.

2. What's the reason? RACE?

I occasionally see references to "no-virus people" but I don't understand. What does "no-virus people" mean?

If Dr. Sam Bailey is correct, and rabies is caused by bacteria, she should be able to use an animal model of rabies and cure the animal with antibiotics. I'm waiting for her to prove her hypothesis.

There are people who do not believe viruses are real. No viruses. At all. See Dr. Sam Bailey, Dr. Mark Bailey, Dr. Tom Cowan, Dr. Andrew Kaufman, Dr. Stefan Lanka and Dr. Lee Merritt.

Dr. Sam Bailey said she believed rabies was probably a bacterium instead of a virus but didn't offer up any proof of her hypothesis. Dr. Stefan Lanka told a German newspaper that measles is a psychosomatic illness and that people become ill after traumatic separations.

So what do these people attribute the obvious sickness that is COVID to?

They are claiming no one was actually sick with anything resembling COVID-19 until after the roll out of the COVID-19 vaccine. They simply say they were "poisoned".

Crazy. I knew two people who died of COVID before the vaccines. What do these people say happened to my mother-in-law and my friend?

Some people, not even no-virus folks, speculate that a lot of the severe cases were mistreated or untreated pneumonia. If they had a positive COVID test and happened to have bacterial pneumonia...

If they were in a hospital, they would probably say they were poisoned with Remdesivir which I am sure some of them were. I don't follow them very closely because they make my head spin.

I tried assembling contigs from the reads and aligning them against NCBI's file of virus reference sequences:

parallel-fastq-dump --gzip --threads 10 --split-files --sra-id SRR10571724

megahit -1 SRR10571724_1.fastq.gz -2 SRR10571724_2.fastq.gz -o megabalf

wget https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.1.genomic.fna.gz

bowtie2-build --threads 3 viral.1.1.genomic.fna.gz{,}

bowtie2 -x viral.1.1.genomic.fna.gz -fU megabalf/final.contigs.fa --no-unal -p3|samtools sort ->balf.bam

x=balf.bam;samtools coverage $x|awk \$4|cut -f1,3-6|(gsed -u '1s/$/\terr%\tname/;q';sort -rnk5|awk -F\\t -v OFS=\\t 'NR==FNR{a[$1]=$2;next}{print$0,sprintf("%.2f",a[$1])}' <(samtools view $x|awk -F\\t '{x=$3;n[x]++;len[x]+=length($10);sub(/.*NM:i:/,"");mis[x]+=$1}END{for(i in n)print i"\t"100*mis[i]/len[i]}') -|awk -F\\t 'NR==FNR{a[$1]=$2;next}{print$0"\t"a[$1]}' <(seqkit seq -n viral.1.1.genomic.fna.gz|gsed 's/ /\t/;s/,.*//') -)|column -ts$'\t'

However I got only 8 aligned contigs. One long contig aligned against "Escherichia phage phiX174" with about 94% coverage. Four contigs aligned against "Human endogenous retrovirus K113", but they may have been human reads because the genomes of HERVs are incorporated into the human genome, and a lot of other human lung samples I have checked have also matched the K113 HERV: https://mongol-fi.github.io/hamburgmath.html#A_different_lung_metagenome_can_be_used_as_a_control_for_MEGAHIT_assembly. I got one short contig which aligned against "Enterobacteria phage P7". And I got two short contigs which aligned against "BeAn 58058 virus", but its sequence contains a part of the human genome so the contigs which matched it also consisted of short segments of the human genome. So basically the only virus I got a long contig for was the Escherichia bacteriophage.
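The `err%` column in the one-liner above is computed per reference as 100 × (sum of the `NM` edit distances) / (sum of the `SEQ` lengths) of the contigs aligned to it. For readers who find the awk hard to follow, here is the same arithmetic as a Python sketch operating on plain SAM text lines (for example, what `samtools view` prints); it does not read BAM directly, and the records, reference names and numbers in the demo are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable


def error_percent_by_reference(sam_lines: Iterable[str]) -> dict[str, float]:
    """Per reference: 100 * sum(NM) / sum(len(SEQ)) over its aligned records."""
    seq_len: dict[str, int] = defaultdict(int)
    edits: dict[str, int] = defaultdict(int)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        rname, seq = fields[2], fields[9]
        if rname == "*":
            continue  # unmapped record
        # NM:i:<n> is the edit distance (mismatches plus inserted and deleted bases).
        nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), 0)
        seq_len[rname] += len(seq)
        edits[rname] += nm
    return {ref: 100.0 * edits[ref] / seq_len[ref] for ref in seq_len if seq_len[ref]}


def sam_record(qname: str, rname: str, seq: str, nm: int) -> str:
    """Build a minimal, hypothetical SAM line (only RNAME, SEQ and NM matter here)."""
    cigar = f"{len(seq)}M"
    return "\t".join([qname, "0", rname, "1", "42", cigar, "*", "0", "0",
                      seq, "I" * len(seq), f"NM:i:{nm}"])


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.6",
        sam_record("contig1", "ref_A", "A" * 100, 6),
        sam_record("contig2", "ref_A", "C" * 50, 3),
        sam_record("contig3", "ref_B", "G" * 40, 0),
    ]
    print(error_percent_by_reference(demo))
```

In the demo, `ref_A` gets 100 × 9 / 150 = 6.0 and `ref_B` gets 0.0, mirroring how the awk accumulates `mis[x]` and `len[x]` per reference name.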
