After MITOS/MITOS2

You'll want to open the genome that MITOS annotated in Artemis. Do not use the output files from MITOS, as they will probably have been edited. You have to use the original sequence file. However, you'll want to use the annotation from MITOS to put the genes in the right places. I usually take a screenshot of the MITOS annotation and keep it open while I add features to the Artemis file.

Starting in Artemis, right-click in the bottom area (it should be blank) and go to Create -> New Feature. A box will pop up, and you'll use the Key box to select what you are creating: tRNA, rRNA, or gene. Then use the annotation from MITOS for the location. Make sure to keep the two periods between the numbers: 1..56 selects the base range 1-56. If the feature is on the (-) strand, hit Complement, and it will be placed on the bottom (complementary) strand.

Then, in the box, add some notes. I like to add a note with the species and the gene name; the label is also the gene name, and the colour can be whatever you want. I like to colour-code the genes: 4 = dark blue = protein-coding gene, 2 = red = tRNA, and 3 = light green = rRNA. I also use 1 for the OH.

Example:
/note="A.halanyhi; -; NAD2"
/label=nad2;
/colour=4

For each protein-coding gene, you can also create a CDS feature. This will tell you whether there are internal stop codons in the amino acid sequence. It will also be useful when you export all of the protein-coding genes to compare them or to put them in files to run a tree. The only difference between the CDS and gene features is that, when creating it, you pick CDS instead of gene in the Key box. It will sit below your gene feature and should have the same location and everything else.

I like to put all the PCGs, tRNAs, rRNAs, the origin of replication, and any introns over 60 bp long into Artemis first, and then go back and edit the start and stop codons of the PCGs.

Sometimes MITOS misses the ATP8 gene. It's only 156 bp long.
Most of the time ATP8 is right after the trnD gene and will run about 3-4 bp into the atp6 gene. This is the ONLY time the genes will overlap. NCBI is aware of this, and all of mine overlapped a little.

Inverts are weird and awesome, but because they are who they are, they have multiple start and stop codons for their mitochondrial PCGs. Sometimes MITOS gets it right, sometimes it doesn't. They can start with:

ATG ATC ATT TTG ATA GTG

It's like a fun puzzle. There are a couple of stop codons as well:

TAA TAG

(TGA is a stop codon in the standard code, but under translation table 5, the one used in the .tbl file, it codes for tryptophan, so don't treat it as a stop here.)

Most of the time, the genes will be very close to what MITOS says, with just a few base pairs' difference. I usually try what MITOS says first, and then go from there by looking at NCBI.

NOTE: the complementary strand is backwards (Artemis knows this), so you have to read it right to left instead of left to right.

Also, sometimes the full TAA stop codon will not be there. The gene will end with just the T, and you'll have to specify in the .txt file you upload to NCBI that the TAA stop codon is completed by added A residues. cox3 in A pycnogondi is like this, and this is how you define it in the txt file for NCBI:

5230    6011    CDS
                product          cytochrome c oxidase subunit 3
                product          cox3
                transl_except    (pos:6011,aa:TERM)
                transl_table     5
                note             TAA stop codon is completed by the addition of 3' A residues to the mRNA

I have only found this out through trial and error. Most of the time it's after I try to submit the file to NCBI that they tell me there's an issue with the start and stop codons. Like I said, fun little puzzle. It's easier after going through it a few times.

Once you've done this, you will want to fill out a .tbl file for NCBI (I attached an Ammothea clausi.tbl file). If the first gene is trnI and it starts at about 300 bp, that means the control region actually runs from the "end" of the FASTA file to bp 299. The genome is circular, so this makes sense. It's just hard to explain, but the .tbl file does a good job of showing how you would write it.
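The start/stop codon puzzle above can be checked with a few lines of Python before going to NCBI. This is only a rough sketch, not something MITOS or Artemis provides: the function names (extract_cds, check_cds_ends) and the toy sequences are made up for illustration, coordinates are assumed to be 1-based and inclusive as in Artemis, and the stop set is assumed to be TAA and TAG only, because under translation table 5 TGA codes for tryptophan.

```python
# Start and stop codons assumed for translation table 5 (invertebrate mitochondrial).
START_CODONS = {"ATG", "ATC", "ATT", "TTG", "ATA", "GTG"}
STOP_CODONS = {"TAA", "TAG"}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement, i.e. the bottom strand read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(genome, start, end, complement=False):
    """Cut out a feature using 1-based inclusive coordinates (start < end)."""
    piece = genome[start - 1:end].upper()
    return reverse_complement(piece) if complement else piece


def check_cds_ends(cds):
    """Report whether the CDS has a valid start codon and how it ends."""
    report = {"start_ok": cds[:3] in START_CODONS}
    remainder = len(cds) % 3
    if remainder == 0:
        report["stop"] = "complete" if cds[-3:] in STOP_CODONS else "missing"
    elif cds[-remainder:] in ("T", "TA"):
        # Incomplete stop: finished as TAA by 3' A residues added to the mRNA.
        report["stop"] = "incomplete"
    else:
        report["stop"] = "missing"
    return report


# Toy examples (hypothetical sequences, not real genes).
forward = extract_cds("CCATTGCTGGTTAAGG", 3, 14)
reverse = extract_cds("GGTTAACCAGCAATCC", 3, 14, complement=True)
print(forward, check_cds_ends(forward))
print(reverse, check_cds_ends(reverse))
print(check_cds_ends("ATGGCTT"))  # ends on a lone T
```

A result of "incomplete" corresponds to the cox3 case above, where the feature needs the transl_except and note lines. A result of "missing" usually means the end coordinate needs to move by a base or two.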
Either way, you will want to create the .tbl file. The complementary (reverse) genes will "start" at what would be the end, but it's the start because the gene runs in the opposite direction. See trnF (tRNA-Phe) in the .tbl file, and the NAD5 gene as well. It's that way for all of the genes on the reverse strand.

Then go to NCBI, go to Submit, and type "mitochondrial" in the search bar; BankIt will pop up. Click that and scroll to Submit. You have to log in through a third party. Once you do, you will be taken to a page that says "Welcome BankIt User!" (personal, I know). Then click Start BankIt Submission, and it will direct you to a contact information page. Then work through the rest of the pages:

GenBank Submissions
    Contact
    Reference
    Sequencing Technology
    Nucleotide
    Submission Category
    Source Modifiers
    Features
    Review and Correct

Ask me all the questions if you need to!

For the reference authors, add yourself first, and then the rest of your authors. They ask for a publication status, which will be "unpublished", and then the title. Take your best shot. It's okay for the title to change later on; once the accession numbers are published, the genome will be published too. I also attached my submission text so you can see what I wrote.

Work through these, and then on the Features tab, upload your features .tbl file. Review and Correct will then highlight any genes that are wrong and need to be edited. Most of the time it detects internal stop codons in the PCGs. I edit the genes and view them in Artemis at this step. Ask me if you have questions, and we can go through this together on Webex if needed.

Once the genes are good to go, submit. It takes a few days for NCBI to accept them. Sometimes you will have to resubmit. But your protein-coding genes will be good if they don't have any internal stops in them.
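The internal stop check can be roughly reproduced on your own machine before uploading, which saves a resubmission or two. The sketch below is a stand-in under stated assumptions, not NCBI's actual validation: find_internal_stops is a made-up helper, it expects a CDS that is already read 5' to 3' (reverse-strand genes must be reverse-complemented first), and it treats only TAA and TAG as stops, which is the case under translation table 5.

```python
STOP_CODONS = {"TAA", "TAG"}  # assumed: translation table 5, where TGA codes for Trp


def find_internal_stops(cds):
    """Return 1-based positions (within the CDS) of in-frame internal stops."""
    cds = cds.upper()
    n_codons = len(cds) // 3  # a trailing partial codon (T or TA) is ignored
    last_is_full = len(cds) % 3 == 0
    # The final codon may be a stop only when no partial codon trails it.
    n_internal = n_codons - 1 if last_is_full else n_codons
    hits = []
    for i in range(n_internal):
        codon = cds[3 * i:3 * i + 3]
        if codon in STOP_CODONS:
            hits.append(3 * i + 1)
    return hits


# Toy examples (hypothetical sequences, not real genes).
print(find_internal_stops("ATGGCTTAA"))     # clean CDS ending in TAA
print(find_internal_stops("ATGTAAGCTTAA"))  # in-frame TAA at position 4
print(find_internal_stops("ATGTGAGCTT"))    # TGA is not a stop in table 5
```

The positions are relative to the start of the CDS. For a forward-strand gene, adding the feature's start coordinate minus 1 gives the genome position to look at in Artemis.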
While you wait for NCBI, you can use the PCG data to create a tree or run pairwise distances. To retrieve the PCG data, go to the top of Artemis -> Select -> All CDS Features. Then hit View -> Bases of Selection as FASTA. This gives you all of the PCG sequences in FASTA format. You can copy and paste that into BBEdit.

For trees, you will want to create separate files for each PCG (you can reference one of the tree folders I have on the laptop). If you do this, make sure the name of each species is spelled exactly the same in every file. If you spell a name differently for one gene, that will create a new branch. It's annoying, but it is what it is.

If you're looking for pairwise distances, create a .fasta file of whatever you are looking at (COI of all the Ammothea species), then open MEGA. MEGA11 is way better than the version on the laptop. Go to Align -> Edit/Build Alignment -> Retrieve Sequences from File (the FASTA you created), double-click it, and it will open. Then go to the top and hit Edit -> Select All, and then Alignment -> Align by ClustalW or MUSCLE (I've done both; they are very similar in output). Sometimes you won't see the "OK" button, but if you click the bottom right corner of the box, it will pop up. After it aligns, go to the top and click Data -> Export Alignment -> MEGA Format and save it as a .meg file. Then you can close out of that screen and go back into MEGA.

Then click Distances -> Compute Pairwise Distances and select the .meg file. It will take you to the analysis preferences. I use the p-distance model; the other ones also work and are very similar in output (in my experience). Hit OK and it will run. That will give you the p-distances between the genes of interest.

For phylogenetic trees, see my notes for that ;) I will send you the .fasta files for the PCGs, and you can just add the genes for your 14 guys in there.

Hope this helps!
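In case it helps to know what the p-distance number means: it is just the proportion of compared sites at which two aligned sequences differ (differences divided by sites compared). A minimal sketch of that arithmetic is below. It is not MEGA's code; p_distance is a made-up helper name, the sequences are assumed to be already aligned to the same length, and sites with a gap, N, or ? in either sequence are skipped, which is an assumption on my part (in MEGA the gap handling depends on the option you pick in the analysis preferences).

```python
def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N?" or b in "-N?":
            continue  # assumed: skip sites with a gap or ambiguous base
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy example (hypothetical sequences): 1 difference over 8 compared sites.
print(p_distance("ATGCATGC", "ATGCATGA"))
```

So a p-distance of 0.125 between two COI sequences means they differ at 12.5% of the sites that were compared.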
Let me know if you have any questions! GOOD LUCK AND HAVE FUN