多种方法用于转换gff和gtf文件
第一种:gffread
gffread my.gff3 -T -o my.gtf gffread my.gtf -o- >my.gff3
第二种: NBISweden/AGAT(安装perl模块有点繁琐)
conda install -c bioconda agat conda install -c bioconda perl-sort-naturally cpanm Bio::Tools::GFF # 需要sudo权限 agat_convert_sp_gff2gtf.pl -gff augustus_out.gff3 -o augustus_out.gtf
第三种:R包 rtracklayer (有待验证)
BiocManager::install(“rtracklayer”) library(rtracklayer) gff_file<- import(“/workcenters/workcenter3/chenhy/BY2_singlecell/Tobacco_genome_K326_2014/Ntab-K326_AWOJ-SS_K326_rnaseq.gff3”) export(gff_file,”/workcenters/workcenter3/chenhy/BY2_singlecell/Tobacco_genome_K326_2014/Ntab-K326_AWOJ-SS_K326_rnaseq_kb.gtf”,”gtf”)
第四种:Python脚本
import sys
inFile = open(sys.argv[1], 'r')
ID_gene = ''
for line in inFile:
# skip comment lines that start with the '#' character
if line[0] != '#':
# split line into columns by tab
data = line.strip().split('\t')
ID_gene = ID_gene
ID_mRNA = ''
# if the feature is a gene
if data[2] == "gene":
# get the id
ID_gene = data[-1].split('ID=')[-1].split(';')[0]
data[-1] = 'gene_id "' + ID_gene + '"; '
elif data[2] == "mRNA":
# get two id
ID_mRNA = data[-1].split('ID=')[-1].split(';')[0]
ID_gene = data[-1].split('Parent=')[-1].split(';')[0]
data[-1] = 'gene_id "' + ID_gene + '"; transcript_id "' + ID_mRNA + '"; '
# if the feature is anything else
else:
# get the parent as the ID
ID_mRNA = data[-1].split('Parent=')[-1].split(';')[0]
data[-1] = 'gene_id "' + ID_gene + '"; transcript_id "' + ID_mRNA + '"; '
# print out this new GTF line
print('\t'.join(data))