Comparing Protein Sequences
Proteins are typically described by the sequence of amino acids they are composed of. One of the
goals of bioinformatics is to develop software tools for the analysis of protein sequences. An
important task is to compare sequences of similar proteins to identify mutations and help
establish the relationships between different organisms as they evolve.
An important step in the comparison of two protein sequences is the identification of a Maximal
Unique Match (MUM), defined as a contiguous subsequence satisfying the following conditions:
a) it is found in both proteins; b) it occurs exactly once in each protein; c) it is maximal in the
sense that it is not contained in a longer sequence satisfying a) and b). Finding a MUM helps in
the process of sequence alignment. To avoid generating many small MUM sequences, it is
conventional to search for MUM sequences having at least a given size, typically 20.
Another useful step is to identify the first mutation (or mismatch between the two sequences)
that follows a given position in the sequence.
In this assignment, you will use two programs mum.cpp and nextMutation.cpp
(provided). The first compares two protein sequences and searches for a MUM sequence of size
at least 20. The second compares two protein sequences and searches for the first mutation
following a given position. Both programs use a class Sequence that represents a protein’s
amino acid sequence, and use functions that perform the search for a MUM and for a mutation.
You will implement the Sequence class so that the programs mum.cpp and
nextMutation.cpp reproduce the example output files provided. The files Makefile,
Sequence.h, mum.cpp and nextMutation.cpp are provided and must not be modified.
You must implement the file Sequence.cpp.
Representation of protein sequence data
Protein sequences are provided in the form of text files obtained from the National Center for
Biotechnology Information (NCBI). The files conform to the FASTA format in which each
amino acid is represented by a capital letter in the range [A-Z]. The first line of a FASTA file
contains information identifying the sequence and starts with the character ‘>’, for example
>QWE88920.1 surface glycoprotein [Severe acute respiratory syndrome coronavirus 2]
for the spike protein appearing on the surface of a certain variant of the SARS-CoV-2 virus. The
first line is followed by the amino acid sequence itself, for example
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAISG
TNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLG
VYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLV
RDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENG
TITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAW
NRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNY
