RSAT - retrieve-seq-bed manual

NAME

retrieve-seq-bed

VERSION

$program_version

DESCRIPTION

Retrieve sequences for a set of genomic coordinates provided in bed, gff or vcf format.

This script is a wrapper around bedtools getfasta, an efficient tool that retrieves sequences from a FASTA-formatted sequence file (e.g. all the sequences of a genome), given a file of coordinates defined on the sequences of that FASTA file. Note that BED coordinates are zero-based: the start position is counted from 0 and the end position is not included in the interval.
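To illustrate the zero-based convention, the following Python sketch converts a BED interval to the one-based, inclusive coordinates used by formats such as GFF. The helper name bed_to_one_based is hypothetical and is not part of RSAT or bedtools.

```python
def bed_to_one_based(start, end):
    """Convert a zero-based, half-open BED interval to one-based, inclusive coordinates."""
    if start < 0 or end < start:
        raise ValueError("invalid BED interval")
    # Only the start shifts: the exclusive BED end equals the inclusive one-based end.
    return start + 1, end


# The first 10 bases of a sequence are BED 0-10, i.e. positions 1 to 10.
print(bed_to_one_based(0, 10))
```

The length of a BED interval is simply end minus start, which is one reason the half-open convention is convenient.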

The wrapper generates the bedtools getfasta command needed to retrieve the sequences at the given genomic coordinates from one of the locally supported genomes.
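As a rough idea of what such a generated command may look like, the Python sketch below assembles a basic bedtools getfasta call. The function name and the file names are hypothetical, and the command actually produced by the wrapper may contain additional options.

```python
import shlex


def build_getfasta_command(genome_fasta, coord_file, output_fasta):
    """Return the argument list for a basic bedtools getfasta call."""
    return [
        "bedtools", "getfasta",
        "-fi", genome_fasta,   # genome sequences (FASTA)
        "-bed", coord_file,    # coordinates (BED, GFF or VCF)
        "-fo", output_fasta,   # retrieved sequences (FASTA)
    ]


# Hypothetical file names, for illustration only.
cmd = build_getfasta_command("genome.fa", "peaks.bed", "peaks.fasta")
print(shlex.join(cmd))
```

Keeping the command as a list of arguments avoids shell quoting problems if it is later passed to subprocess.run.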

AUTHORS

Bruno Contreras Moreira <bcontreras@eead.csic.es>

Jacques van Helden <Jacques.van-Helden@univ-amu.fr>

CATEGORY

genome

USAGE

retrieve-seq-bed -org organism_name -i inputfile -o outputfile [-v #] [...]

INPUT FORMAT

The genomic coordinate file will be used as input by bedtools getfasta, and must be compliant with the supported formats: BED/GFF/VCF.

OUTPUT FORMAT

A sequence file in fasta format (produced by bedtools getfasta).
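The Python sketch below shows one way to read such an output file. It assumes the default bedtools getfasta header layout (chrom:start-end, optionally followed by the strand in parentheses); headers differ when other naming options are used. The function name parse_getfasta_output and the example records are made up for illustration.

```python
import re

# Assumed header layout: ">chrom:start-end", optionally followed by "(+)" or "(-)".
HEADER_RE = re.compile(
    r"^>(?P<chrom>.+):(?P<start>\d+)-(?P<end>\d+)(?:\((?P<strand>[+-])\))?$"
)


def parse_getfasta_output(lines):
    """Parse FASTA lines into a list of records with coordinates and sequence."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError(f"unexpected header: {line}")
            records.append({
                "chrom": match.group("chrom"),
                "start": int(match.group("start")),
                "end": int(match.group("end")),
                "strand": match.group("strand"),  # None when the header has no strand
                "seq": "",
            })
        elif records:
            # Sequence lines are appended to the most recent record.
            records[-1]["seq"] += line
        else:
            raise ValueError("sequence line found before any header")
    return records


# Made-up records, for illustration only.
example = [">chr1:5-10", "AAACC", ">chr2:0-4(-)", "GGTT"]
for record in parse_getfasta_output(example):
    print(record["chrom"], record["start"], record["end"], record["seq"])
```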

SEE ALSO

WISH LIST

-server http://some.rsat.server/rsat/

Send the request to a remote RSAT server via the Web services. This option makes it possible to get fasta sequences from any RSAT server without having to install the genomes locally.

-extend length

-extend_up up_length

-extend_down down_length

Extend the peaks by a given length on the upstream side (-extend_up), on the downstream side (-extend_down) or on both sides (-extend). Upstream and downstream are interpreted according to the strand of each feature.

Flank extension is done via bedtools flank.

The extended coordinates are exported with the same name as the output file, supplemented with the suffix _flanks.bed.
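The strand-dependent arithmetic behind such an extension can be sketched in Python as follows. This is only an illustration of the coordinate logic, not the way bedtools or RSAT implement it; the function name extend_interval is hypothetical, and the sketch clips at position 0 but does not check the chromosome length.

```python
def extend_interval(start, end, strand="+", up=0, down=0):
    """Extend a BED interval upstream and downstream, relative to its strand."""
    if strand == "-":
        # On the reverse strand, upstream lies towards higher coordinates.
        new_start = start - down
        new_end = end + up
    else:
        new_start = start - up
        new_end = end + down
    # Clip at the beginning of the sequence.
    return max(new_start, 0), new_end


print(extend_interval(100, 200, "+", up=10, down=5))
print(extend_interval(100, 200, "-", up=10, down=5))
```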

OPTIONS

-v #

Level of verbosity (amount of detail in the warning messages printed during execution).

-h

Display full help message

-help

Same as -h

-i coordinate_file

Genomic coordinates, in one of the formats supported by bedtools getfasta: BED, GFF, VCF.

-o outputfile

Output file (in fasta format), where the sequences will be saved. This argument is mandatory, since it is required by bedtools getfasta.

-org organism_name

Organism name, which must correspond to one organism supported on the local RSAT instance.

-rm

Use the repeat-masked version of the genome.
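The options above can be combined as in the following Python sketch, which only assembles the argument list for a retrieve-seq-bed call. The function name, the organism name and the file names are placeholders; the organism must be replaced by one actually supported on the local RSAT instance.

```python
def build_retrieve_seq_bed_args(organism, coord_file, output_file,
                                verbosity=None, repeat_masked=False):
    """Assemble the argument list for a retrieve-seq-bed call."""
    args = ["retrieve-seq-bed", "-org", organism, "-i", coord_file, "-o", output_file]
    if verbosity is not None:
        args += ["-v", str(verbosity)]
    if repeat_masked:
        args.append("-rm")
    return args


# Placeholder values, for illustration only.
args = build_retrieve_seq_bed_args(
    "Some_organism_name", "peaks.bed", "peaks.fasta", verbosity=1, repeat_masked=True
)
print(" ".join(args))
```

The resulting list can be passed to subprocess.run(args, check=True) on a machine where RSAT is installed.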