Real-time phylogenomics

Phylogenomics, Evolution and Bioinformatics by Lonely Joe Parker

 

10 Feb 2017:

Read our real-time phylogenomics paper on bioRxiv...

13 July 2016:

Read more about our real-time phylogenomics experiment in Snowdonia

Introduction

Biology is changing. In the space of a decade the mechanics of reading living things' DNA codes has moved from a specialised job taking weeks and hundreds of thousands of pounds, to a simple procedure many people could carry out in their own home. The implications for evolutionary biology, genetics, medicine, agriculture and conservation are profound. The challenge is to analyse this torrent of data. I build bioinformatics apps to automatically process DNA data immediately, as it is generated: I want to examine and analyse living organisms’ DNA sequences in the field - as simply, quickly, and cheaply as we measure their height, weight or any other aspect of their physical appearance.

Over time, organisms’ DNA sequences evolve in response to their changing environments, competitors and predators. By comparing similar gene sequences between species and individuals, we can use the numbers and patterns of these changes to infer their evolutionary history – for instance, when did two species diverge? Which genes were the most important for their survival? How many individuals were there in each population, and how have they spread across the globe?

These ‘phylogenetic’ studies have now shifted into a whole new gear as both computing power and sequencing ability (the speed and cost to read letters of DNA from a genome) have expanded by several orders of magnitude. We’re discovering that, although the basic principles of molecular evolution hold true, the variety and detail by which these patterns are realised in DNA sequences reflect the infinite multiplicity of physical forms seen in the natural world.

I am an Early Career Research Fellow in Phylogenomics at the Royal Botanic Gardens, Kew and a Fellow of the Software Sustainability Institute.

Questions

Research at the interface of evolution, informatics, and ecology

Projects

I pursue several research questions across linked areas, in collaboration with colleagues and students:

Real-time phylogenomics

The coming ubiquity of both portable DNA sequencers and cloud computation means scenarios formerly found in sci-fi films (instant DNA analysis) are coming, soon. I'm developing methods to streamline DNA sequence analysis using cloud computation.
View details »

Trees of Life

Although terabases of DNA data are available in public genomic databases, analyses to infer the Tree of Life - a longstanding goal of evolutionary biology - use only a fraction of this data. I'm exploring techniques to use much more of these available datasets.
View details »

Metagenomics

Modern DNA sequencers are highly portable, compared to lab-bound models of a decade ago. I'm trialling field-based sequencing using the MinION USB sequencer - a palm-size device with potential to revolutionise environmental metagenomics and turbotaxonomy.
View details »

Convergent evolution

Phylogenomic big-data allows us to detect statistical patterns with weak effects, such as adaptive convergent molecular evolution. I'm also interested in patterns of gene family evolution, homology, and divergent adaptive selection.
View details »


Other research

Other research interests and collaborations include:

Metrics on tree space

Phylogenomic models accounting for uncertainty require useful metrics on tree space - the 'distance' between two or more phylogenetic trees. However, few such useful measures exist and I'm hunting for more...
View details »
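
As a minimal illustration of what a tree-space metric measures, the sketch below computes the Robinson-Foulds (symmetric difference) distance between rooted trees. This is a standard textbook metric chosen purely as an example, not necessarily one of the measures being investigated here; the nested-tuple encoding of trees, the function names and the example taxa are all assumptions made for the sketch.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf sets of all internal nodes except the root."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    # The root clade contains every taxon, so it carries no information.
    found.discard(all_leaves)
    return found


def rf_distance(a: Tree, b: Tree) -> int:
    """Count the clades present in exactly one of the two trees."""
    return len(clades(a) ^ clades(b))


tree_a = ((("human", "chimp"), "gorilla"), "orangutan")
tree_b = ((("human", "gorilla"), "chimp"), "orangutan")
tree_c = ((("chimp", "human"), "gorilla"), "orangutan")  # tree_a with one clade written in reverse order

print(rf_distance(tree_a, tree_b))  # the trees disagree on two clades
print(rf_distance(tree_a, tree_c))  # same topology, so no disagreement
```

The distance only counts topological disagreements and ignores branch lengths, which is one reason richer metrics on tree space are of interest.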

Asynchronous bioinformatics

The vast scale of bioinformatics datasets currently being assembled requires models of asynchronous computation: meta-algorithms where model areas are updated asynchronously on separate machines.
View details »

Software sustainability & open research

Development of sustainable software and open research norms is a priority for big-data empirical bioscience in the 21st century, to avoid the 'reproducibility crisis'. I'm a Fellow of the SSI.
View details »

Sociology and biology

I'm interested in the parallels and divergences between the natural world (in a systems biology context) and organisation of human societies. Maybe I'll get to take a sabbatical one day!
View details »

Outputs

Publications, talks and teaching

Publications

See Google Scholar for the most recent...

Why aren’t we benchmarking bioinformatics?

Talk presented at the #bench16 (benchmarking) symposium at KCL, London, Wed 20th April 2016. Funded by the SSI. Slides (Slideshare – cc-by-nd)

Application note: ‘Befi-BaTS’ version 0.10.1 – Error rate and statistical power of distance-based measures of phylogeny-trait association.

In prep. SUMMARY Building on work presented previously (Parker et al., 2008), we study a number of more complex measures of phylogeny-trait association (implemented in the program Befi-BaTS / BaTS v0.10.1) which take into account the branch lengths of a … Continue reading

Molecular convergence and adaptive evolution in constant-frequency echolocating Chiroptera detected in phylogenomic datasets

In prep. Manuscripts in progress (all rights reserved – you may not copy or distribute these files; content and conclusions subject to change; strictly embargoed until publication in a peer-reviewed journal/book): v1: .doc … Continue reading

Application note: CONTEXT, a Phylogenomic Dataset Browser

In prep. (v3 – 14 Jun 2017) Summary. The CONTEXT (COmparative Nucleotides and Trees Exploration Tool) is a phylogenomics dataset browser that consists of a Java API and an executable binary jarfile with graphical user interface (GUI) for the high-throughput analysis … Continue reading

Detection of molecular convergence – literature review

In prep. (v2 – 21 April 2015) Abstract Convergent evolution is a process by which neutral evolutionary processes and adaptive natural selection in response to niche specialisation lead to similar forms arising in unrelated taxa. Phenotypic convergence has been appreciated … Continue reading

Application note: the Genomic Convergence Detection Pipeline

In prep. (v0 – 24 February 2015) Summary. Genome Convergence Pipeline consists of a Java API and an executable binary jarfile with graphical user interface (GUI) for the high-throughput analysis of phylogenomic datasets to detect convergent molecular evolution. Motivation. Although … Continue reading

Phylogenomic convergence detection: lessons and perspectives

Talk presented at the 18th Evolutionary Biology Meeting at Marseille (programme), 16th-19th September 2014. (Powerpoint – note this is a draft, not the final talk, pending authorisation): EBMdraft

MSc Seminar at UCD, Dublin

High-throughput comparative genomics. Research seminar presented for MSc students at University College Dublin, 24th October 2013. Invited by Prof. Emma Teeling’s lab at UCD. Powerpoint: UCD_MSc_phylogenomics_joeParker_edit

Our Nature paper! Genome-wide molecular convergence in echolocating mammals

Exciting news from the lab this week… we’ve published in one of the leading journals, Nature!!! Much of my work in the Rossiter BatLab for the last couple of years has centred around the search for genomic signatures of molecular … Continue reading

High-throughput computing and phylogenomics

Seminar presented at the Tropical Biodiversity in the 21st Century symposium, held at the Natural History Museum, London on the 3rd & 4th June 2013 (programme). Powerpoint: High-throughput computing and phylogenomics

The mode and tempo of hepatitis C virus evolution within and among hosts.

BMC Evol Biol. 2011 May 19;11(1):131. [Epub ahead of print] Gray RR*, Parker J*, Lemey P, Salemi M, Katzourakis A, Pybus OG. *These authors contributed equally to this article. BACKGROUND: Hepatitis C virus (HCV) is a rapidly-evolving RNA virus that … Continue reading

Molecular epidemiology and phylogeny reveals complex spatial dynamics of endemic canine parvovirus.

J Virol. 2011 May 18. [Epub ahead of print] Clegg SR, Coyne KP, Parker J, Dawson S, Godsall SA, Pinchbeck G, Cripps PJ, Gaskell RM, Radford AD. Canine parvovirus 2 (CPV-2) is a severe enteric pathogen of dogs, causing high … Continue reading

Generation of neutralizing antibodies and divergence of SIVmac239 in cynomolgus macaques following short-term early antiretroviral therapy.

PLoS Pathog. 2010 Sep 2;6(9):e1001084. Ozkaya Sahin G, Bowles EJ, Parker J, Uchtenhagen H, Sheik-Khalil E, Taylor S, Pybus OG, Mäkitalo B, Walther-Jallow L, Spångberg M, Thorstensson R, Achour A, Fenyö EM, Stewart-Jones GB, Spetz AL. Neutralizing antibodies (NAb) able … Continue reading

Safety and immunogenicity of novel recombinant BCG and modified vaccinia virus Ankara vaccines in neonate rhesus macaques.

J Virol. 2010 Aug;84(15):7815-21. Epub 2010 May 19. Rosario M, Fulkerson J, Soneji S, Parker J, Im EJ, Borthwick N, Bridgeman A, Bourne C, Joseph J, Sadoff JC, Hanke T. Although major inroads into making antiretroviral therapy available in resource-poor … Continue reading

Full-Length Characterization of Hepatitis C Virus Subtype 3a Reveals Novel Hypervariable Regions under Positive Selection during Acute Infection

Humphreys I, Fleming V, Fabris P, Parker J, Schulenberg B, Brown A, Demetriou C, Gaudieri S, Pfafferott K, Lucas M, Collier J, Huang KH, Pybus OG, Klenerman P, Barnes E. J Virol. 2009 Nov;83(22):11456-66. Epub 2009 Sep 9. Hepatitis C … Continue reading

The within- and among-host evolution of chronically-infecting human RNA viruses

A research thesis submitted for the degree of Doctor of Philosophy at the University of Oxford. J Parker Funded by: Natural Environment Research Council (UK) with support from Linacre College, Oxford. Abstract: This thesis examines the evolutionary biology of the … Continue reading

Estimating the Date of Origin of An HIV-1 Circulating Recombinant Form

Virology. 2009 Apr 25;387(1):229-34. Epub 2009 Mar 9. Tee KK, Pybus OG, Parker J, Ng KP, Kamarulzaman A, Takebe Y. HIV is capable of frequent genetic exchange through recombination. Despite the pandemic spread of HIV-1 recombinants, their times of origin … Continue reading

Correlating Viral Phenotypes With Phylogeny: Accounting for Phylogenetic Uncertainty

Infect Genet Evol. 2008 May;8(3):239-46. Epub 2007 Aug 21. Parker J, Rambaut A, Pybus OG. Many recent studies have sought to quantify the degree to which viral phenotypic characters (such as epidemiological risk group, geographic location, cell tropism, drug resistance … Continue reading


Talks and presentations

Invited talks and lectures

Conference Presentations and Posters


Teaching

Code

Phylogenetics and bioinformatics software (with just a smattering of web bits)

Note on previous versions - many of my software projects have (June 2015) migrated from Google Code to GitHub (see more details). If you have a query about a piece of software please include information about when and where you downloaded it to help troubleshooting.

Active software


Legacy software


In development

Open Data

Data and scripts for reproducible science

Datasets

I'm a public-funded scientist and an advocate of Open Data and Reproducible Research. My previous work as a postdoc has been funded via a variety of means and published under multiple licenses, but source data for most of my publications is available. If you want workflow scripts and software please email me and I'll try to help where I can.

For my own work I now use GitHub extensively to document and version-control my analyses; I also use Endnote a lot and complete notebooks will be published with each publication.


Research objects

Obviously, truly reproducible research is quite a large step on from 'give us your short reads and executables' - a complete bioinformatics analysis might include several people on multiple machines - and documenting all these steps is a key challenge for reproducibility, a cornerstone of empirical research. I'm exploring the use of Docker containers, IPython notebooks and Research Objects to make it simpler for me to document, reproduce and communicate my research.


Machine images

I am moving a substantial proportion of my compute load to cloud resources, in particular Amazon's EC2. At present one machine image is available, from my 'lightweight bioinformatics' project. Search AWS AMIs for 'ami-90296be7'.

Credits

Funding, collaborators & students, employers and contact info.

Funding

My work with field-based MinION DNA sequencing is supported by a Pilot Study Fund grant from the Kew Foundation.

The Phylo-Hackathon project is supported by a Fellowship grant from the Software Sustainability Institute.

Previous work as a postdoc and PhD student has been funded by NERC, the BBSRC, the MRC, the European Research Council, the Royal Society and the Daiwa Foundation.


Employer

I am currently employed in the Biodiversity Informatics & Spatial Analysis department of the Science Directorate at the Royal Botanic Gardens, Kew in London.


Collaborators

Current collaborators include:


Students & mentors

MSc and project students:

Mentors and advisors:


Contacts / consultancy

Address

Dr. Joe Parker
Biodiversity Informatics & Spatial Analysis
The Jodrell Laboratory
Royal Botanic Gardens, Kew
Surrey, UK TW9 3AB
p: +44(0)20 8332 5063
e: joe.parker@kew.org
P: _WCn7AYAAAAJ
O: 0000-0003-3777-2269
g: @lonelyjoeparker
T: @lonelyjoeparker

Consulting

I'm also available to provide consultancy services to private partners on big-data projects in genomics, phylogenomics, bioinformatics/informatics and statistics. This work is delivered via Kitson Consulting.


About this site

I was inspired to give my sad-looking Wordpress site the boot (and a kick up my own arse) by the very, very excellent Bedford Lab website. However, although that site uses loads of cool technologies (like CMS/source code control managed directly on GitHub, compiled to static HTML via Jekyll, all served up on a Heroku instance...) I reckoned it was overkill for me.

Instead I've nicked some ideas from that site (layouts in Bootstrap, fonts from Typekit) but haven't completely jettisoned the old (Wordpress) CMS running on LAMP yet - partly because I haven't got the time to write a good parser for all that legacy content, and partly because I still blog there about non-science things.

So the site you see is generated simply from a (largely) static HTML file with a couple of bits of PHP pulling in Wordpress posts to populate the blog and publications. Parallax effects use Aen Tan's Parallax-Scroll code and I figured out the integration with Bootstrap (actually pretty simple) with a lot of help from this tutorial. The whole site probably took less than 20 hours to put together including design, parsing WP, and deployment and testing - I think that's pretty good, on balance.

At some point I might add a couple of sub-pages for projects, etc, as well as some server-side stuff to update publication counts and GitHub commits on the fly. But that's tomorrow, and tomorrow's a long way away. Lastly, my existing web host is a pretty good deal so I'll probably only move the big bandwidth stuff to S3, and then only if the server logs show I really need to! Suggestions welcome.

Blog

Posts on ongoing research

Blog

What is ‘real-time’ phylogenomics?

Over the past few years I’ve been developing research, which I collectively refer to as ‘real-time phylogenomics’ – and this is the name of our mini-site for MinION-based rapid identification-by-sequencing. Since our paper on this will hopefully be published soon, … Continue reading

Some aspects of BLASTing long-read data

Quick note to explain some of the differences we’ve observed working with long-read data (MinION, PacBio) for sample ID via BLAST. I’ll publish a proper paper on this, but for now: Long reads aren’t just a bit longer than Illumina data, … Continue reading

Science and (small) business

Over the last 10-20 years there’s been a revolution in academic science (or should that be ‘coup’?) where many aspects of the job have been professionalised and formalised, especially project management but management in general. This generally includes tools like … Continue reading

More MinION – the ‘1D rapid’ prep

My last MinION post described our first experiments with this really cool new technology. I mentioned then that their standard library prep was fairly involved, and we heard that the manufacturers, Oxford Nanopore, were working on a faster, simpler library prep. We … Continue reading

Copying LOADS of files from a folder of LOADS *AND* LOADS more in OSX

Quick one this, as it’s a tricky problem I keep having to Google/SO. So I’m posting here for others but mainly myself too! Here’s the situation: you have a folder (with, ooh, let’s say 140,000 separate MinION reads, for instance…) … Continue reading

How to fake an OSX theme appearance in Linux Ubuntu MATE

I’ve recently been fiddling about and trying to fake an OSX-style GUI appearance in Linux Ubuntu MATE (15.04). This is partly because I prefer the OSX GUI (let’s be honest) and partly because most of my colleagues are also Mac … Continue reading

Messing about with the MinION

Molecular phylogenetics – uncovering the history of evolution using signals in organisms’ genetic sequences – is a powerful science, the latest expression of the human desire to understand our common origins. But for all its achievements, I’d always felt something … Continue reading

Cheat on your exams

Had a heated discussion with a friend the other day. I went to a school, where ‘exam techniques’ were part of the standard toolkit given to students to get them the best possible grades at GCSE, A-levels, and beyond. She … Continue reading

BaTS (and Befi-BaTS), SHiAT, and Genome Convergence Pipeline have moved!

Important – please take note! Headline: All my phylogenetics software is now on GitHub, not websites or Google Code Please use the new FAQ pages and issue/bug tracker forms, rather than emailing me directly in the first instance Until now, … Continue reading


See more in the blog...

Biography

A bit about me.

'Lonely' Joe Parker..?

I've recorded and toured as 'Lonely Joe Parker' since 2006, hence my username on Twitter - and various other social platforms - and this website's URL. For related enquiries please contact Sotones Records. You can hear music and whatnot over on the music bits of this site.


Background

I'm a big fan of Bombay mix, Red Stripe and cycling. Not so crazy about early mornings.

I was born and raised in Southampton, one of the biggest commercial ports in the world, next to the New Forest National Park. That, and some great teachers, gave me an interest in evolution, ecology, and exploring the wide goddamn world.

I studied general biology at Imperial College, University of London (2001-2004; tutors including Andy Purvis, Mike Tristem, Tim Barraclough and Alfried Vogler), gaining first-class honours. Subsequently I completed a D.Phil at Linacre College, University of Oxford, based in the Zoology department under Andrew Rambaut and Oliver G. Pybus (2004-2008). I developed novel phylogenetic strategies and bioinformatics pipelines for the analysis of viral pathogens' evolution, including hepatitis C virus (HCV) and human immunodeficiency virus (HIV).

From 2009 to 2011 I worked at the Weatherall Institute of Molecular Medicine (John Radcliffe Hospital: Univ. Oxford / Medical Research Council (UK)), developing phylogenetic and machine-learning methods for the detection of correlates between antigenicity and sequence evolution. This integrated clinical, structural, evolutionary and population genetic data, leading to immunogen design and assessment in silico, trialled in vivo for the EU-funded NGIN vaccine consortium.

Between 2011 and 2015 I worked with Stephen Rossiter at Queen Mary, University of London on large-scale phylogenomics projects looking for signals of molecular natural selection (particularly adaptive convergent evolution), with mammals as the focal taxonomic group. I developed a rich Java API for phylogenomic analyses and authored publications including Nature (2013) and Current Biology (2013).


Present

I currently hold an Early-Career Research Fellowship in Phylogenomics at the Royal Botanic Gardens, Kew. This post allows me wide freedom to pursue my research into real-time phylogenomics, field-based DNA sequencing for turbotaxonomy and metagenomics, and integrated alignment & phylogeny models of molecular evolution.

I also lead the Informatics workpackage for the Plant & Fungal Trees Of Life project, a Kew-led initiative to reconstruct a genus-level supertree for 80% of all plant and fungal genera by 2020.

I'm available to provide consultancy services to private partners on big-data projects in genomics, phylogenomics, bioinformatics/informatics and statistics through Kitson Consulting...

...and I'm always excited by new collaborations, particularly across other domains. So if you have an idea for a project or would like to explore a PhD or master's degree at Kew, get in touch!


Quick CV

Appointments

Qualifications

Selected publications

*These authors contributed equally to this article.

Selected software

My full academic CV is also available.