RSAT - matrix-distrib manual

NAME
DESCRIPTION
AUTHORS
CATEGORY
USAGE
INPUT FORMATS

Matrix file
Background model file

OUTPUT FORMAT
THEORETICAL DISTRIBUTION COMPUTATION
OPTIONS

NAME

matrix-distrib

DESCRIPTION

Computes the theoretical distribution of score probabilities for a given PSSM. The computation is not limited to a Bernoulli assumption: it can take into account background models of any Markov order.

AUTHORS

Jacques van Helden Jacques.van-Helden@univ-amu.fr
Jean Valery Turatsinze jturatsi@bigre.ulb.ac.be
Morgane Thomas-Chollier morgane@bigre.ulb.ac.be
With the help of Matthieu de France defrance@bigre.ulb.ac.be

USAGE

matrix-distrib [-m matrixfile] [-bgfile bgfile] [-o outputfile] [-v]

INPUT FORMATS

Matrix file

The matrix format is specified with the option -matrix_format. Supported: tab, cb, consensus, gibbs, meme, assembly. Default: tab.

For a description of these formats, see convert-matrix -h

Background model file

The background model format is specified with the option -bg_format. Supported: oligo-analysis, MotifSampler, meme. Default: oligo-analysis.

For a description of the available formats, see convert-background-model -h

OUTPUT FORMAT

The output is a tab-delimited file with the following columns:

weight: log-likelihood ratio score: w = log[P(S|M)/P(S|B)]
proba: probability density function: P(W=w)
cum: cumulative distribution function: P(W <= w)
Pval: P-value = inverse cumulative distribution function (upper tail): Pval = P(W >= w)
ln_Pval: natural logarithm of the P-value
log_P: base 10 logarithm of the P-value
sig: significance: sig = -log10(Pval)
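
The relations between these columns can be illustrated with a short Python sketch. It is not the program's own code: the function name distribution_columns and the dictionary input are assumptions made for the example. Starting from the probability of each weight, it derives the other columns as defined above.

```python
import math

def distribution_columns(proba_by_weight):
    """Derive cum, Pval, ln_Pval, log_P and sig from P(W=w).

    proba_by_weight: dict mapping each (discretised) weight to its probability.
    Returns a list of row dicts sorted by increasing weight.
    """
    rows = []
    cum = 0.0
    # Cumulative distribution: sum from the lowest weight upwards.
    for w in sorted(proba_by_weight):
        cum += proba_by_weight[w]
        rows.append({"weight": w, "proba": proba_by_weight[w], "cum": cum})
    # P-value: sum from the highest weight downwards, P(W >= w).
    tail = 0.0
    for row in reversed(rows):
        tail += row["proba"]
        row["Pval"] = tail
        row["ln_Pval"] = math.log(tail) if tail > 0 else float("-inf")
        row["log_P"] = math.log10(tail) if tail > 0 else float("-inf")
        row["sig"] = -row["log_P"]
    return rows

# Toy distribution over three weights (made-up values).
for row in distribution_columns({-1.0: 0.5, 0.0: 0.25, 1.0: 0.25}):
    print("\t".join(f"{row[k]:.4g}" for k in
                    ("weight", "proba", "cum", "Pval", "ln_Pval", "log_P", "sig")))
```

Note that cum and Pval both include the probability of the current weight, so for a given row cum + Pval = 1 + proba.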

THEORETICAL DISTRIBUTION COMPUTATION

The scoring scheme is the weight (see matrix-scan -h for more details). We exhaustively compute the probability associated with each score (weight) that can be obtained from a given PSSM.

For Bernoulli (Markov order 0) background models, the distribution of scores is computed with the algorithm described by Bailey (Bioinformatics, 1999).
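
As an illustration of the general idea, here is a minimal Python sketch of a column-by-column computation for a Bernoulli background. It is a simplified reading of this kind of algorithm, not the code used by matrix-distrib: the function name, the input layout (one dictionary of residue weights per matrix column) and the precision parameter are assumptions made for the example.

```python
from collections import defaultdict

def score_distribution_bernoulli(weight_columns, bg_freq, precision=2):
    """Exact distribution of the sum of column weights, order-0 background.

    weight_columns: list of dicts, one per matrix column, residue -> weight.
    bg_freq: dict residue -> background probability (independent residues).
    precision: number of decimals used to discretise the weights (assumption).
    Returns a dict score -> probability, sorted by increasing score.
    """
    scale = 10 ** precision
    # Scores are stored as integers (units of 10**-precision) so that
    # equal scores reached by different paths merge exactly.
    dist = {0: 1.0}
    for column in weight_columns:
        new_dist = defaultdict(float)
        for score, p in dist.items():
            for residue, w in column.items():
                new_dist[score + round(w * scale)] += p * bg_freq[residue]
        dist = new_dist
    return {s / scale: p for s, p in sorted(dist.items())}

# Toy two-column matrix with made-up weights and a uniform background.
toy_weights = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": -1.0},
    {"A": 0.0, "C": 1.0, "G": -1.0, "T": 0.0},
]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
for score, p in score_distribution_bernoulli(toy_weights, uniform).items():
    print(score, p)
```

The number of distinct scores grows at most linearly with the matrix width once the weights are discretised, which is what makes the exhaustive computation tractable.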

For Markov background models of higher order, we extended this algorithm to take into account the dependencies between residues. At each iteration of the algorithm, the weights associated with all possible transitions are tagged with a prefix, and each residue weight is calculated according to its prefix tag. The prefix is the word, of length equal to the Markov order, that precedes the position of the iteration.
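
The prefix tagging can be sketched in Python as follows. This is an illustrative reading of the description above, not the program's implementation. The function name, the input layout and the precision parameter are assumptions, as is the choice to draw the word preceding the first matrix column from the background prefix probabilities. All probabilities are assumed to be strictly positive.

```python
import math
from collections import defaultdict

def score_distribution_markov(pssm, prefix_proba, transitions, order, precision=2):
    """Score distribution under a Markov background, with prefix-tagged states.

    pssm: list of dicts, one per column, residue -> P(residue | matrix column).
    prefix_proba: dict prefix word (length = order) -> background probability.
    transitions: dict prefix -> dict residue -> P(residue | prefix).
    Returns a dict score -> probability, sorted by increasing score.
    """
    scale = 10 ** precision
    # Each state is tagged with the prefix that precedes the next position.
    dist = {(prefix, 0): p for prefix, p in prefix_proba.items()}
    for column in pssm:
        new_dist = defaultdict(float)
        for (prefix, score), p in dist.items():
            for residue, p_motif in column.items():
                p_bg = transitions[prefix][residue]
                # The residue weight depends on the prefix tag.
                w = math.log(p_motif / p_bg)
                new_prefix = (prefix + residue)[-order:] if order > 0 else ""
                new_dist[(new_prefix, score + round(w * scale))] += p * p_bg
        dist = new_dist
    # Merge the states that share a score, whatever their prefix.
    final = defaultdict(float)
    for (_, score), p in dist.items():
        final[score / scale] += p
    return dict(sorted(final.items()))

# Toy order-1 model on a two-letter alphabet (made-up values).
toy_transitions = {"A": {"A": 0.9, "C": 0.1}, "C": {"A": 0.2, "C": 0.8}}
toy_prefixes = {"A": 2 / 3, "C": 1 / 3}
toy_pssm = [{"A": 0.75, "C": 0.25}, {"A": 0.1, "C": 0.9}]
print(score_distribution_markov(toy_pssm, toy_prefixes, toy_transitions, order=1))
```

With order=0, a single empty prefix and one transition table, the same function reduces to the Bernoulli case.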

OPTIONS

-v #

Level of verbosity (detail in the warning messages during execution)

-h

Display full help message

-help

Same as -h

-m matrixfile

Matrix file.

This argument can be used iteratively to scan the sequence with multiple matrices.

-mlist matrix_list

Matrix list.

Indicate a file containing a list of matrices to be used for scanning the region. This facilitates the scanning of a sequence with a library of matrices (e.g. all the matrices from RegulonDB, or TRANSFAC).

Format: the matrix list file is a text file. The first word of each row is supposed to indicate a file name. Any further information on the same row is ignored.
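
As an illustration of this format, the following Python sketch extracts the file names from the rows of a matrix list. The function name read_matrix_list is hypothetical, and skipping empty rows is an assumption that the description above does not state.

```python
def read_matrix_list(lines):
    """Return the first word of each non-empty row (the matrix file name)."""
    files = []
    for line in lines:
        fields = line.split()
        if fields:  # assumption: empty rows are skipped
            files.append(fields[0])
    return files

# Made-up example rows: anything after the first word is ignored.
example_rows = ["matrices/m1.tab  first matrix\n", "\n", "matrices/m2.tab\n"]
print(read_matrix_list(example_rows))
```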

-o outputfile

If no output file is specified, the standard output is used. This allows the command to be used within a pipe.

-matrix_format matrix_format

Matrix format. Default is tab.

-pseudo #

Pseudo-count for the matrix (default: 1). See matrix-scan for details.

-bgfile background_file

Background model file.

-bg_format bg_format

Supported formats: all the input formats supported by convert-background-model.

-bg_pseudo #

Pseudo-frequency for the background model. The value must be a real number between 0 and 1 (default: 0). If the training sequence length (L) is known, the value can be set to sqrt(L) / (L + sqrt(L)).
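
For convenience, the suggested value can be computed with a few lines of Python. The function name suggested_bg_pseudo is hypothetical and is not part of the program.

```python
import math

def suggested_bg_pseudo(training_length):
    """Return sqrt(L) / (L + sqrt(L)) for a training sequence of length L > 0."""
    root = math.sqrt(training_length)
    return root / (training_length + root)

# For L = 10000: sqrt(L) = 100, so the value is 100 / 10100.
print(suggested_bg_pseudo(10000))
```

The expression is equivalent to 1 / (sqrt(L) + 1), so the suggested pseudo-frequency decreases as the training sequence gets longer.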

-decimals #

Number of decimals to print for the transition probabilities.