BLAST web interface and Python script for automatic updates
BLAST ("Basic Local Alignment Search Tool") finds regions of similarity between biological sequences. The task was to run such searches against a local collection of oligos and to set up a web interface for them.
The general web interface is available at
https://blast.ncbi.nlm.nih.gov/Blast.cgi
A stand-alone version of BLAST, called blast+, can be downloaded from
https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
I found the nice web interface Viroblast by Deng et al. (2007):
https://indra.mullins.microbiol.washington.edu/viroblast/docs/aboutviroblast.html
You can download the software from
James Mullins
Department of Microbiology
University of Washington
https://els.comotion.uw.edu/express_license_technologies/viroblast
The most important step is to set the path to your blast+ programs in the viroblast.ini file, for example:
blast+: /usr/bin/
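Before editing viroblast.ini, it can help to confirm that the chosen directory really contains the blast+ executables. The following is a minimal sketch; the helper name `missing_blast_tools` and the list of tools it checks are assumptions made for illustration and are not part of Viroblast.

```python
import shutil

# Tools assumed to be needed here; adjust to the programs you actually use.
BLAST_TOOLS = ("makeblastdb", "blastn", "blastp")


def missing_blast_tools(bin_dir, tools=BLAST_TOOLS):
    """Return the names in `tools` that are not executable files in bin_dir."""
    return [name for name in tools if shutil.which(name, path=bin_dir) is None]


if __name__ == "__main__":
    # /usr/bin/ is the example path from the text above.
    missing = missing_blast_tools("/usr/bin/")
    if missing:
        print("Not found in /usr/bin/:", ", ".join(missing))
    else:
        print("All listed blast+ tools were found.")
```

If the list is not empty, the `blast+:` entry probably points to the wrong directory.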
Create a new database for your Viroblast
To create a new database from a FASTA file, you have to install blast+ from NCBI:
https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
You need the program makeblastdb.
1. Type in the terminal:
makeblastdb -in
makeblastdb: program from NCBI blast+
-in: input (the FASTA file)
-out: output (the name of the new database)
-dbtype: 'nucl' for nucleotides, 'prot' for proteins
2. Copy the new database files to
viroblast/db, into the protein or nucleotide subfolder
3. So that Viroblast knows about the new database, you have to configure the viroblast.ini file.
You can use several databases.
For a nucleotide database, write (all on one line!):
blastn: database_name1 => name_to_display1,database_name2 => name_to_display2,and so on
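Steps 1 and 3 above can be sketched in Python as follows. The helper names, the file name `oligos.fa`, and the database names are hypothetical; only the `-in`, `-out`, and `-dbtype` options and the layout of the `blastn:` line are taken from the text above.

```python
def makeblastdb_command(fasta_path, db_path, dbtype="nucl"):
    """Build the makeblastdb argument list (this does not run anything)."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_path]


def blastn_ini_line(databases):
    """Format the single-line blastn entry for viroblast.ini.

    `databases` is a list of (database_name, name_to_display) pairs.
    """
    entries = [f"{name} => {label}" for name, label in databases]
    return "blastn: " + ",".join(entries)


# Hypothetical file and database names, for illustration only.
cmd = makeblastdb_command("oligos.fa", "oligos")
line = blastn_ini_line([("oligos", "Oligos"), ("vectors", "Vectors")])
print(" ".join(cmd))
print(line)
```

Once blast+ is installed, the argument list could be passed to `subprocess.run(cmd, check=True)` to actually build the database.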
Python script to automate the tasks
Here is a Python script to automate the task. The script is started daily by a cron job.
import pandas as pd
import re
import os
import time
path_excel='
path_viroblast_db='
path_viroblast_ini='
file = path_excel + 'AG_XX.xlsx'
# Read the Excel file
xl = pd.ExcelFile(file)
# Read the sheet names in the excel file
sheets_excel=xl.sheet_names
# Load a sheet into a DataFrame by name: df1
df1 = xl.parse(sheets_excel[0])
# slice columns 1- 3
df1=df1.iloc[:,0:3]
# rename column title
df1.columns=['primer_index','primer_name','sequence']
# create a fasta format string for the fasta file
primer_number=''
fasta=''
# find empty cells in Excel (NaN, "not a number"); NaN is the only value not equal to itself
def isNaN(num):
    return num != num
# fasta format
for index,row in df1.iterrows():
    if not isNaN(row['sequence']):
        # delete all white spaces in sequence
        sequence = ''.join(row['sequence'].split())
        # after the loop, primer_number holds the index of the last primer with a sequence
        primer_number=row['primer_index']
        fasta=fasta+'>'+str(row['primer_index'])+"|"+str(row['primer_name'])+'\n'+str(sequence)+'\n'
# write to fasta file
with open(path_viroblast_db+ primer_number+'.fa','w') as my_file:
    my_file.write(fasta)
# execute makeblastdb with the fasta file and the output name
os.system('makeblastdb'+' -in '+ path_viroblast_db+primer_number +'.fa'+" -dbtype 'nucl' -out "+path_viroblast_db+primer_number)
# viroblast.ini update: replace the old database name with the new one
ini_txt=''
with open(path_viroblast_ini+'viroblast.ini','r') as ini_file:
    for line in ini_file:
        m=re.search(r"XXX_\d{5}",line)
        if m:
            old=m.group()
            #print(m)
            line=line.replace(old,primer_number)
            ini_txt=ini_txt + line
        else:
            ini_txt=ini_txt + line
# write the new viroblast.ini
with open(path_viroblast_ini+'viroblast.ini','w') as ini_file_new:
    ini_file_new.write(ini_txt)
time_now=time.asctime()
# write log-file
with open(path_viroblast_ini+'lui.log','a') as log_file:
    log_file.write(primer_number+';'+time_now+'\n')
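The two text-processing steps of the script (building the FASTA string and swapping the database name in viroblast.ini) can also be written as small functions that are easy to test without Excel or blast+. This is only a sketch and not the script above: the function names and the sample data are hypothetical, the rows are plain dictionaries (such as those returned by `df1.to_dict("records")`), and `math.isnan` and `re.sub` are swapped in for the `isNaN` helper and the line-by-line `replace`.

```python
import math
import re


def is_empty(value):
    """True for None and for NaN, which is how pandas marks empty cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def rows_to_fasta(rows):
    """Return a FASTA string for all rows that have a sequence."""
    records = []
    for row in rows:
        if is_empty(row["sequence"]):
            continue  # skip rows without a sequence
        # remove all whitespace inside the sequence
        sequence = "".join(str(row["sequence"]).split())
        records.append(f">{row['primer_index']}|{row['primer_name']}\n{sequence}\n")
    return "".join(records)


def update_ini_text(ini_text, new_name, pattern=r"XXX_\d{5}"):
    """Replace every database name matching `pattern` with new_name."""
    # A function is used as replacement so new_name is inserted literally.
    return re.sub(pattern, lambda _match: new_name, ini_text)


# Made-up sample data, for illustration only.
sample_rows = [
    {"primer_index": "XXX_00001", "primer_name": "fwd", "sequence": "ACG TAC"},
    {"primer_index": "XXX_00002", "primer_name": "empty", "sequence": float("nan")},
    {"primer_index": "XXX_00003", "primer_name": "rev", "sequence": "ttg ca"},
]
fasta_text = rows_to_fasta(sample_rows)
new_ini = update_ini_text("blastn: XXX_00001 => Oligos\n", "XXX_00003")
```

Keeping these steps free of file access means they can be checked on their own before the cron job touches the real viroblast.ini.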