Create a script that does the following:
- Get the full sequence of human GAPDH transcript 'NM_002046.7' from NCBI.
- Find the two most common 5-mers in this sequence.
- For each of those 5-mers, find every occurrence in the original sequence and extract a subsequence made up of the 5-mer plus the 10 bases before it and the 10 bases after it.
- Save these subsequences to a FASTA file.
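
One possible shape for the first script is sketched below, using only the Python standard library. It is not a reference solution. The function names, the FASTA header format, and the default output file name are hypothetical. The sketch also makes assumptions the task leaves open: overlapping occurrences are counted, ties between equally common 5-mers are broken by first appearance, and flanks are truncated when a 5-mer lies within 10 bases of either end of the transcript. The sequence is fetched through the NCBI E-utilities `efetch` endpoint.

```python
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_sequence(accession):
    """Download a nucleotide record as FASTA from NCBI and return the bare sequence."""
    query = urlencode({"db": "nuccore", "id": accession,
                       "rettype": "fasta", "retmode": "text"})
    with urlopen(f"{EFETCH_URL}?{query}") as response:
        fasta_text = response.read().decode("utf-8")
    # Drop the '>' header line and join the wrapped sequence lines.
    lines = fasta_text.strip().splitlines()
    return "".join(l.strip() for l in lines if not l.startswith(">")).upper()


def most_common_kmers(sequence, k=5, n=2):
    """Return the n most frequent k-mers, counting overlapping windows."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [kmer for kmer, _ in counts.most_common(n)]


def flanked_occurrences(sequence, kmer, flank=10):
    """Yield (0-based start, subsequence) for every occurrence, overlaps included."""
    start = sequence.find(kmer)
    while start != -1:
        left = max(0, start - flank)        # assumption: truncate at the 5' end
        right = start + len(kmer) + flank   # slicing truncates at the 3' end
        yield start, sequence[left:right]
        start = sequence.find(kmer, start + 1)


def write_fasta(records, path):
    """Write (header, sequence) pairs to a FASTA file."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")


def main(accession="NM_002046.7", output="subsequences.fa"):
    sequence = fetch_sequence(accession)
    records = []
    for kmer in most_common_kmers(sequence):
        for start, subseq in flanked_occurrences(sequence, kmer):
            # Hypothetical header: accession, k-mer, 1-based start position.
            records.append((f"{accession}_{kmer}_{start + 1}", subseq))
    write_fasta(records, output)
```

Calling `main()` would perform the download and write the file; the helper functions can be tried on short strings without network access.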
Write a second script that does the same as above, but also:
- Takes the accession number as a command-line argument.
- Optionally writes the FASTA file to a user-supplied destination.
- Performs a pairwise alignment between the first two subsequences.
For example, the user should be able to run: python script.py --output subsequences.fa NM_anumber
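
The command-line interface and the alignment step could look roughly like the sketch below. The names `build_parser` and `global_align` are hypothetical. The task does not say which alignment method to use, so this sketch assumes a simple Needleman-Wunsch global alignment with illustrative scores (match 1, mismatch -1, gap -1); a library aligner such as Biopython's `PairwiseAligner` would be a reasonable substitute.

```python
import argparse


def build_parser():
    """Parser for: python script.py [--output PATH] ACCESSION"""
    parser = argparse.ArgumentParser(
        description="Extract flanked occurrences of the most common 5-mers.")
    parser.add_argument("accession", help="NCBI accession number")
    parser.add_argument("--output", default=None,
                        help="optional destination for the FASTA file")
    return parser


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]
```

In a full script, `build_parser().parse_args()` would supply `args.accession` and `args.output`; when `args.output` is `None`, the script can skip writing the file and only report the alignment of the first two subsequences.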