Second Week

Major Goals

This week focuses on DNA sequences: how to work with them, where to obtain them if you prefer not to sequence organisms yourself, and how to infer phylogenetic trees.

Lectures and videos provide a step-by-step guide on handling sequence data and conducting phylogenetic analyses. Each day, tutorials will cover a different method, and you must complete each tutorial and its exercises to progress to the next day’s topic.

The week begins with assembling, checking, and exporting your raw sequences (from the PCR products sequenced last week) to generate high-quality consensus sequences. We start with your own DNA sequences to help you become familiar with (1) Sanger DNA sequencing, (2) sequence evaluation (what is the difference between a bad and a good DNA sequence?), (3) ambiguous (wobble) DNA positions and where they come from, and (4) DNA sequences derived from public data repositories.
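As a small illustration of point (3): ambiguous positions are usually written with IUPAC ambiguity codes, where a single letter stands for a set of possible bases (for example, R means A or G). The sketch below is only an illustration in Python; the function name `ambiguous_positions` and the example sequence are made up for this purpose and are not part of Geneious Prime.

```python
# IUPAC nucleotide ambiguity codes: each letter stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def ambiguous_positions(seq):
    """Return (1-based position, code, possible bases) for each ambiguous site.

    Hypothetical helper for illustration only.
    """
    hits = []
    for i, base in enumerate(seq.upper(), start=1):
        options = IUPAC.get(base)
        if options is None:
            raise ValueError(f"unexpected character {base!r} at position {i}")
        if len(options) > 1:
            hits.append((i, base, options))
    return hits


# Made-up example sequence with three ambiguous sites.
print(ambiguous_positions("ACGTRACYTN"))
# [(5, 'R', 'AG'), (8, 'Y', 'CT'), (10, 'N', 'ACGT')]
```

A consensus sequence with many such positions is usually a sign that the underlying reads should be checked again.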

Note

At the end of the week, you will know…

  • Different kinds of sequence file types.

  • How to use public databases.

  • How to edit sequences.

  • How to check if sequencing results are correct.

  • What a multiple sequence alignment is.

  • What a model of sequence evolution is and why it is important for phylogenetic analysis.

  • What the difference between clustering algorithms and search algorithms is when constructing phylogenetic trees.

  • What ML (Maximum Likelihood) and BI (Bayesian Inference) mean.

Monday

Today we start by recapitulating what you learned last week and by discussing the method of Sanger sequencing. After that, you will process your sequencing results in Geneious Prime, i.e., you will assemble, check, and correct the raw reads that have been assigned to you (see the sequence assignment list) and export the respective consensus sequences. Then you can start reading the sections about Geneious Prime and GenBank (see Database and Search Strategies), which introduce you to the handling of sequence data and to GenBank, a public sequence data repository.

By doing the exercises in this tutorial, you will generate a toy dataset, which you will be using for the whole week. All following tutorials and exercises are based on this toy dataset.

The basic idea is that all of you work with the same toy dataset, which makes it easier to compare results. However, it is also fine if you add some of your own sequences (those you checked and exported earlier today).

Tasks of the Day

  1. Read section Geneious Prime and check out the Geneious Prime User Manual.

  2. See the sequence assignment list.

  3. Check which raw reads have been assigned to you.

Tip

Just in case, you can read about Geneious Prime again under Sections.

Tuesday

Today, we focus on sequence alignments and their significance in analyzing genetic data. In this tutorial, you will perform sequence alignments using your toy datasets with Geneious Prime.

Remember, sequence files, whether aligned or not, can be saved in various file formats, and the required input format depends on the software you use. If the format is incorrect, the software will not function as expected. Understanding the correct input file format is essential to overcoming initial challenges when working with phylogenetic software.
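To make the point about file formats concrete, here is a minimal Python sketch that reads FASTA-formatted text and checks one necessary property of an alignment: all sequences have the same length. The function names `parse_fasta` and `looks_aligned` and the taxon labels are hypothetical. Equal length is a necessary condition, not a proof that the sequences were aligned sensibly.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict {header: sequence}.

    Hypothetical helper; duplicate headers would overwrite each other.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name] += line
    return records


def looks_aligned(records):
    """True if there are at least two sequences and all have the same length."""
    lengths = {len(seq) for seq in records.values()}
    return len(records) > 1 and len(lengths) == 1


# Made-up examples: a plain multi-FASTA file and an aligned FASTA file.
unaligned = ">taxon_A\nACGTACGT\n>taxon_B\nACGACGT\n"
aligned = ">taxon_A\nACGTACGT\n>taxon_B\nACG-ACGT\n"

print(looks_aligned(parse_fasta(unaligned)))  # False
print(looks_aligned(parse_fasta(aligned)))    # True
```

In the aligned example, the gap character `-` pads the shorter sequence so that homologous positions end up in the same column.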

Note

At the end of the day, you will know…

  • How an alignment is generated by the Needleman-Wunsch algorithm.

  • How computer algorithms work (in basic terms).

  • The meaning of penalty values and their effects on alignments.

  • How to find criteria that will help you to decide if an alignment is good or not.

  • The differences between sequence file formats, in particular between multi-FASTA and alignment files, and how to recognize them.
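The Needleman-Wunsch algorithm and the role of penalty values can be sketched in a few lines of Python. This is a minimal illustration with a simple linear gap penalty, not the implementation used by Geneious Prime; the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for placing a[i-1] and b[j-1] in the same column.
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
# (1, 'ACGT', 'A-GT')
```

Re-running the function with different `gap` or `mismatch` values is a quick way to see how penalties change the score and, for longer sequences, the alignment itself.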

Important

The different properties of coding and non-coding sequences will not be explained explicitly and we assume that you already know what reading frames are. However, if you are lost, do not hesitate to ask one of the tutors or me.

Tasks of the Day

  1. Read section Alignment.

Wednesday

Today, we have three learning modules:

  1. Models of Sequence Evolution.

  2. How to Infer Phylogenetic Trees.

  3. How To Draw Phylogenetic Trees.

Note

By the end of the day, you will:

  • Understand how phylogenetics accounts for evolutionary changes in DNA sequences, including past changes that are not immediately visible.

  • Grasp the concept of clustering algorithms, their limitations, and their advantages over search algorithms.

  • Have constructed four phylogenetic trees using your toy dataset.

  • Have experienced the process of a clustering algorithm by manually calculating and drawing a UPGMA tree.

  • Have practiced drawing phylogenetic trees by hand.
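The manual UPGMA calculation can be checked against a short Python sketch. It assumes a complete table of pairwise distances; the function name `upgma`, the taxon labels, and the distance values are invented for illustration. In each round the two closest clusters are merged, the new node is placed at half their distance, and distances to the merged cluster are averaged, weighted by cluster size.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: list of taxon labels
    dist:  dict mapping (taxon_a, taxon_b) tuples to pairwise distances
    Returns a list of (newick_like_label, node_height) in merge order.
    """
    size = {name: 1 for name in names}  # cluster label -> number of taxa
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(size) > 1:
        # Find the pair of clusters with the smallest distance.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        new = f"({a},{b})"
        # Distance to the merged cluster: average weighted by cluster size.
        for c in size:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))]
                    + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[new] = size[a] + size[b]
        del size[a], size[b]
        merges.append((new, height))
    return merges


# Made-up distance table for four taxa.
distances = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}
for label, height in upgma(["A", "B", "C", "D"], distances):
    print(label, height)
# (A,B) 1.0
# (C,(A,B)) 3.0
# (D,(C,(A,B))) 5.0
```

Ties between equally close pairs are resolved here by input order, which is one of the reasons why clustering results can depend on how the data are arranged.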

Tasks of the Day

  1. Download and install jModelTest2 on your PC (you may use this direct download link: jmodeltest-2.1.10).

  2. Read section Models of Sequence Evolution.
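To get a feeling for what a model of sequence evolution does before reading the section, consider the simplest one, Jukes-Cantor (JC69). It converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -3/4 · ln(1 - 4p/3), which is larger than p because some changes are hidden by later changes at the same site. The Python sketch below is only an illustration; the function names and the two short sequences are made up, and jModelTest itself compares many more complex models.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites with gaps or ambiguity codes in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (same length)")
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: JC69 distance is undefined (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Made-up aligned sequences differing at 2 of 10 sites.
p = p_distance("ACGTACGTAC", "ACGTACGTTT")
print(p)                 # 0.2
print(jc69_distance(p))  # slightly larger than 0.2 (about 0.233)
```

The gap between the observed and the corrected distance grows as sequences become more divergent, which is why the choice of model matters most for distantly related taxa.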

Thursday

Today, it’s all about search algorithms. You will learn the basics of the two most common methods for calculating phylogenetic trees – Maximum Likelihood in the morning and Bayesian Inference in the afternoon.

Both methods are widely used because they are more thorough than clustering algorithms (such as UPGMA or NJ), and they approach the mathematical part of inferring phylogenetic trees from different angles. You will hear more about this in the lectures that accompany the two sections.

The programs used for these analyses, RAxML (Maximum Likelihood) and MrBayes (Bayesian Inference), can be installed as plugins in Geneious Prime. See Tutorial 1 and Tutorial 2 for instructions.

Note

Both programs can also be controlled via the command line – you may use this approach during the third week to improve the computing performance of both RAxML (download here) and MrBayes (download here).

While working through the exercises, many topics you dealt with earlier this week will come up again, such as input file formats or Models of Sequence Evolution.

Note

At the end of the day you will…

  • Know the difference between cluster and search algorithms.

  • Know why search algorithms take so much longer for analysing genetic data than cluster algorithms.

  • Know that ML uses likelihoods, whereas BI (as implemented in MrBayes) uses posterior probabilities.

  • Know what MCMC is and for which type of analysis it is used.

  • Be able to interpret the different statistics MrBayes provides.

  • Understand the meaning of prior and posterior probabilities.

  • Understand the difference between bootstrap support and posterior probabilities and why they are not directly comparable.
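One way to see why bootstrap support and posterior probabilities are different quantities is to look at where bootstrap values come from: the alignment columns are resampled with replacement, a tree is inferred from each pseudo-replicate, and the support of a branch is the fraction of replicate trees that contain it. The Python sketch below shows only the resampling step. The function name `bootstrap_alignment` and the tiny alignment are made up, and this is not how RAxML is invoked. Posterior probabilities, in contrast, come from MCMC sampling of trees under a model and priors.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict {taxon: aligned sequence}
    rng:       a random.Random instance (seeded for reproducibility)
    Returns a pseudo-replicate alignment of the same dimensions.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    # Draw column indices with replacement; the same columns are used
    # for every taxon so that each column stays intact.
    cols = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in cols)
        for name, seq in alignment.items()
    }


# Made-up toy alignment.
toy = {"taxon_A": "ACGTAC", "taxon_B": "ACGTTC", "taxon_C": "AGGTTC"}
replicate = bootstrap_alignment(toy, random.Random(1))
for name, seq in replicate.items():
    print(name, seq)
```

Some columns appear several times in a replicate and others not at all, so each replicate weights the sites differently.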

Tasks of the Day

  1. Read section RAxML. Don’t be confused: this section primarily focuses on the command-line version of RAxML. However, all the settings explained there are also available in the Geneious Prime plugin.

  2. Install the RAxML plugin in Geneious Prime via Tools -> Plugins -> Available Plugins.

Friday

Now you know all the essential steps and methods for calculating a phylogenetic tree from sequence data. You may have realized that you had to use different file formats for different programs and different programs for different analyses.

You should also know that you can work with sequence data and build phylogenetic trees in R. One big advantage of using R is that you can do all analyses in one software environment, without reformatting the input files.

The other big advantage of R is that you can do awesome downstream analyses with your phylogenetic tree, such as analysing trait evolution when you have trait data for your taxa, or analysing community data. But this is another story.

This day is dedicated to introducing you to the basic R commands that enable you to calculate a phylogenetic tree. Of course, R walks the analytical path from sequence to tree in its very own way. However, this may help you to better remember, or even understand, the individual steps involved in building a phylogenetic tree from scratch.

Depending on your current R skills, you may only need to skim some of the sections. You will see which ones are relevant for you.

Note

At the end of the day, you will…

  • Be more versatile and confident when working with genetic data in R.

Tasks of the Day

  1. Read section Ape package.

  2. Read section Getting Started with R.

  3. Install R and RStudio.

  4. Download the R script and the example files here.