Monday, 26 March 2018

BLAST web interface and Python script for automatic updates


BLAST ("Basic Local Alignment Search Tool") finds regions of similarity between biological sequences. The task was to run such searches against a local collection of oligos and to set up a web interface for them.

The general NCBI web interface is available here:
https://blast.ncbi.nlm.nih.gov/Blast.cgi

A standalone version of BLAST, called blast+, can be downloaded here:
https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

I found a nice web interface, Viroblast, by Deng et al. (2007):
https://indra.mullins.microbiol.washington.edu/viroblast/docs/aboutviroblast.html
You can download the software from:
James Mullins
Department of Microbiology
University of Washington
https://els.comotion.uw.edu/express_license_technologies/viroblast

The most important step is to set the path to your blast+ programs in the viroblast.ini file, for example:

 blast+: /usr/bin/
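
Before starting Viroblast, it can help to check that the configured directory really contains the blast+ programs. The following is only a minimal sketch: the helper name `missing_blast_programs` and the selection of program names are assumptions made for illustration and are not part of Viroblast.

```python
import os

# Programs from blast+ that this sketch looks for (an assumed selection).
BLAST_PROGRAMS = ["blastn", "blastp", "makeblastdb"]


def missing_blast_programs(blast_dir, programs=BLAST_PROGRAMS):
    """Return the names of programs that are not executable files in blast_dir."""
    missing = []
    for name in programs:
        candidate = os.path.join(blast_dir, name)
        # A program counts as present only if it is a file and executable.
        if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # /usr/bin/ is the example path from above; adjust it to your installation.
    print(missing_blast_programs("/usr/bin/"))
```

An empty list means that all programs checked were found in the directory.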

Create a new database for Viroblast

To create a new database from a FASTA file, you have to install
blast+ from NCBI:
https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

You need the program makeblastdb.

1. Type in the terminal:
makeblastdb -in <input> -dbtype 'nucl' -out <output>

makeblastdb - program from NCBI blast+
-in - input (the FASTA file)
-out - output (the name of the new database)
-dbtype - 'nucl' for nucleotides, 'prot' for proteins
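
The same call can be made from Python. This is a minimal sketch, assuming makeblastdb is on the PATH. The function names `makeblastdb_command` and `run_makeblastdb` and the file names in the usage note are hypothetical. Only the options -in, -dbtype and -out described above are used.

```python
import subprocess


def makeblastdb_command(fasta_path, db_name, dbtype="nucl"):
    """Build the makeblastdb argument list for a FASTA file."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


def run_makeblastdb(fasta_path, db_name, dbtype="nucl"):
    """Run makeblastdb; raises CalledProcessError if the program fails."""
    subprocess.run(makeblastdb_command(fasta_path, db_name, dbtype), check=True)
```

For example, `run_makeblastdb("oligos.fa", "oligos")` would build a nucleotide database named oligos. Passing the arguments as a list avoids shell quoting problems with unusual file names.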

2. Copy the new database files to
viroblast/db, into the protein or nucleotide subfolder.

3. To make Viroblast aware of the new database, you have to edit the viroblast.ini file.

You can use several databases.
For a nucleotide database, write (all on one line!):

blastn: database_name1 => name_to_display1,database_name2 => name_to_display2,and so on
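
If there are many databases, the line can be generated instead of typed. A minimal sketch, assuming the format shown above; the function name `blastn_ini_line` and the database names are made up for illustration.

```python
def blastn_ini_line(databases):
    """Format a dict {database_name: name_to_display} as one viroblast.ini line."""
    entries = [f"{name} => {label}" for name, label in databases.items()]
    # All entries go on a single line, separated by commas.
    return "blastn: " + ",".join(entries)


print(blastn_ini_line({"oligos": "Lab oligos", "vectors": "Cloning vectors"}))
# prints: blastn: oligos => Lab oligos,vectors => Cloning vectors
```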


Python script to automate the tasks

Here is a Python script that automates these steps.
It is started once a day by a cron job.


import pandas as pd
import re
import os
import time

# paths (fill in for your installation)
path_excel=''
path_viroblast_db=''
path_viroblast_ini=''

file = path_excel + 'AG_XX.xlsx' 



# Read the Excel file
xl = pd.ExcelFile(file)


# Read the sheet names in the excel file
sheets_excel=xl.sheet_names

# Load the first sheet into a DataFrame: df1
df1 = xl.parse(sheets_excel[0])

# keep only the first three columns
df1=df1.iloc[:,0:3]
# rename the column titles
df1.columns=['primer_index','primer_name','sequence']

# create a fasta format string for the fasta file
primer_number=''
fasta=''

# find empty cells in Excel (NaN, "not a number")
def isNaN(num):
    # NaN is the only value that is not equal to itself
    return num != num

# build the FASTA records
for index,row in df1.iterrows():
    if not isNaN(row['sequence']):
        # delete all whitespace in the sequence
        sequence = ''.join(str(row['sequence']).split())
        # remember the index of the last primer; it names the FASTA file and the database
        primer_number=str(row['primer_index'])

        fasta=fasta+'>'+str(row['primer_index'])+"|"+str(row['primer_name'])+'\n'+sequence+'\n'


#write to fasta file
with open(path_viroblast_db+ primer_number+'.fa','w') as my_file:
    my_file.write(fasta)



# run makeblastdb on the FASTA file; the database gets the same base name
os.system('makeblastdb'+' -in '+ path_viroblast_db+primer_number +'.fa'+" -dbtype 'nucl' -out "+path_viroblast_db+primer_number)

# update viroblast.ini: replace the old database name (XXX_ plus five digits) with the new one
ini_txt=''
with open(path_viroblast_ini+'viroblast.ini','r') as ini_file:
    for line in ini_file:
        m=re.search(r"XXX_\d{5}",line)
        if m:
            line=line.replace(m.group(),primer_number)
        ini_txt=ini_txt + line

# write the new viroblast.ini    
with open(path_viroblast_ini+'viroblast.ini','w') as ini_file_new:
    ini_file_new.write(ini_txt) 


time_now=time.asctime()
# write log-file
with open(path_viroblast_ini+'lui.log','a') as log_file:
    log_file.write(primer_number+';'+time_now+'\n')
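
The FASTA-building loop of the script can also be written as a small function, which makes it testable without an Excel file. This is only a sketch under assumptions: the function name `rows_to_fasta` is hypothetical, and the rows are expected as (primer_index, primer_name, sequence) triples, for example from `df1.itertuples(index=False)`.

```python
def rows_to_fasta(rows):
    """Turn (primer_index, primer_name, sequence) triples into FASTA text."""
    records = []
    for primer_index, primer_name, sequence in rows:
        # Skip empty Excel cells: NaN is the only value not equal to itself.
        if sequence != sequence:
            continue
        # Remove all whitespace inside the sequence.
        cleaned = "".join(str(sequence).split())
        records.append(f">{primer_index}|{primer_name}\n{cleaned}\n")
    return "".join(records)


example_rows = [
    ("XXX_00001", "fwd_primer", "ACG TAC GT"),
    ("XXX_00002", "empty_cell", float("nan")),
]
print(rows_to_fasta(example_rows))
```

Collecting the records in a list and joining them once avoids the repeated string concatenation of the original loop, which matters only for very large primer lists.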