RSAT - matrix-distrib manual



NAME

matrix-distrib


DESCRIPTION

Computes the theoretical distribution of score probabilities for a given PSSM. The computation is not limited to a Bernoulli assumption: it supports background models of any Markov order.


AUTHORS

Jacques van Helden Jacques.van-Helden@univ-amu.fr
Jean Valery Turatsinze jturatsi@bigre.ulb.ac.be
Morgane Thomas-Chollier morgane@bigre.ulb.ac.be
With the help of Matthieu de France defrance@bigre.ulb.ac.be


CATEGORY

util
PSSM


USAGE

matrix-distrib [-m matrixfile] [-bgfile bgfile] [-o outputfile] [-v]


INPUT FORMATS

Matrix file

The matrix format is specified with the option -matrix_format. Supported: tab, cb, consensus, gibbs, meme, assembly. Default: tab.

For a description of these formats, see convert-matrix -h

Background model file

The background model format is specified with the option -bg_format. Supported: oligo-analysis, MotifSampler, meme. Default: oligo-analysis.

For a description of the available formats, see convert-background-model -h


OUTPUT FORMAT

The output is a tab-delimited file with the following columns:

  1. weight: log-likelihood ratio score: w = log[P(S|M)/P(S|B)]
  2. proba: probability density function: P(W = w)
  3. cum: cumulative distribution function: P(W <= w)
  4. Pval: P-value = inverse cumulative distribution function: Pval = P(W >= w)
  5. ln_Pval: natural logarithm of the P-value
  6. log_P: base 10 logarithm of the P-value
  7. sig: significance: sig = -log10(Pval)
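
As an illustration of how the columns relate to each other, the following minimal Python sketch derives cum, Pval, ln_Pval, log_P and sig from a table of weight probabilities. The function name distrib_columns and the toy distribution are hypothetical, and the sketch assumes that every listed weight has a strictly positive probability. It is not the code used by matrix-distrib.

```python
import math

def distrib_columns(score_probs):
    """Derive the output columns from a {weight: proba} mapping (illustrative only)."""
    weights = sorted(score_probs)

    # Pval = P(W >= w): accumulate the probability mass from the highest weight down.
    pval = {}
    tail = 0.0
    for w in reversed(weights):
        tail += score_probs[w]
        pval[w] = tail

    rows = []
    cum = 0.0
    for w in weights:
        cum += score_probs[w]              # cum = P(W <= w)
        p = pval[w]
        rows.append({
            "weight": w,
            "proba": score_probs[w],
            "cum": cum,
            "Pval": p,
            "ln_Pval": math.log(p),        # natural logarithm
            "log_P": math.log10(p),        # base 10 logarithm
            "sig": -math.log10(p),         # significance
        })
    return rows

# Toy distribution (hypothetical values, not produced by a real matrix).
rows = distrib_columns({-1.0: 0.5, 0.0: 0.25, 1.0: 0.25})
```

Note that cum and Pval both include the probability of the weight w itself, so cum + Pval exceeds 1 by P(W = w).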


THEORETICAL DISTRIBUTION COMPUTATION

The scoring scheme is the weight (see matrix-scan -h for more details). We exhaustively compute the probability associated with each score (weight) that can be obtained from a given PSSM.

For Bernoulli (Markov order 0) background models, the distribution of scores is computed with the algorithm described by Bailey (Bioinformatics, 1999).
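
The Bernoulli case can be sketched in Python as a position-by-position dynamic programming over discretised scores. This is a simplified illustration, not the code of matrix-distrib: the function name weight_distribution, the use of natural logarithms, the rounding of weights to a fixed number of decimals and the toy matrix are all assumptions. The matrix is assumed to contain strictly positive frequencies (pseudo-counts already applied).

```python
import math
from collections import defaultdict

def weight_distribution(freq_matrix, bg, decimals=2):
    """Score distribution of a PSSM under a Bernoulli background (illustrative).

    freq_matrix: list of columns, each a {residue: frequency} dict (all > 0).
    bg:          {residue: background probability}.
    """
    dist = {0.0: 1.0}  # before the first column, score 0 has probability 1
    for column in freq_matrix:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for residue, freq in column.items():
                # Weight of this residue at this position, discretised.
                w = round(math.log(freq / bg[residue]), decimals)
                # The residue occurs with its background probability.
                nxt[round(score + w, decimals)] += prob * bg[residue]
        dist = dict(nxt)
    return dist

# Toy two-column matrix with a uniform background (hypothetical values).
bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
column = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
dist = weight_distribution([column, column], bg)
```

Because scores are rounded, the number of distinct scores stays small, and the cost grows with the number of columns rather than with the number of possible sequences.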

For Markov background models of higher order, we have extended this algorithm to take into account the dependencies between residues. At each iteration of the algorithm, the weights associated with all possible transitions are tagged with a prefix, and each residue weight is calculated according to this prefix tag. The prefix is the word, of length equal to the Markov order, that precedes the position of the iteration.
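
The prefix tagging can be sketched in Python by indexing the partial distributions by (prefix, score) pairs instead of by score alone. This is an illustration under several assumptions, not the implementation of matrix-distrib: the names markov_weight_distribution and cond_prob are hypothetical, weights use natural logarithms and are rounded to a fixed number of decimals, and the first positions (where the prefix is shorter than the Markov order) are scored with whatever probability cond_prob returns for a shorter prefix.

```python
import math
from collections import defaultdict

def markov_weight_distribution(freq_matrix, cond_prob, order, decimals=2):
    """Score distribution of a PSSM under a Markov background (illustrative).

    cond_prob(prefix, residue) returns P(residue | prefix) under the background;
    the prefix may be shorter than `order` at the first positions.
    """
    states = {("", 0.0): 1.0}  # (prefix, score) -> probability
    for column in freq_matrix:
        nxt = defaultdict(float)
        for (prefix, score), prob in states.items():
            for residue, freq in column.items():
                p_bg = cond_prob(prefix, residue)
                # The weight depends on the prefix through the background term.
                w = round(math.log(freq / p_bg), decimals)
                new_prefix = (prefix + residue)[-order:] if order > 0 else ""
                nxt[(new_prefix, round(score + w, decimals))] += prob * p_bg
        states = dict(nxt)
    # Merge the prefixes to obtain the final score distribution.
    dist = defaultdict(float)
    for (_, score), prob in states.items():
        dist[score] += prob
    return dict(dist)

# Toy order-1 background (hypothetical values; each row sums to 1).
marginal = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
trans = {
    "A": {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
    "C": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "G": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}

def cond_prob(prefix, residue):
    # First position: no prefix yet, so fall back on the marginal frequencies.
    return trans[prefix][residue] if prefix else marginal[residue]

column = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
markov_dist = markov_weight_distribution([column, column, column], cond_prob, order=1)
```

With order 0 the prefix is always empty and the sketch reduces to the Bernoulli case.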


OPTIONS

-v #
Level of verbosity (detail in the warning messages during execution)

-h
Display full help message

-help
Same as -h

-m matrixfile
Matrix file.

This argument can be used repeatedly to specify multiple matrices.

-mlist matrix_list
Matrix list.

Indicate a file containing a list of matrices to be used for scanning the region. This facilitates the scanning of a sequence with a library of matrices (e.g. all the matrices from RegulonDB, or TRANSFAC).

Format: the matrix list file is a text file. The first word of each row is supposed to indicate a file name. Any further information on the same row is ignored.
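
The row format described above can be illustrated with a short Python sketch. The function name read_matrix_list and the file names are hypothetical. The sketch skips empty rows, which is an assumption, and it does not handle comment lines, since their treatment is not specified here.

```python
def read_matrix_list(lines):
    """Return the first word of each non-empty row (illustrative only)."""
    files = []
    for line in lines:
        fields = line.split()
        if fields:                  # skip empty rows (assumption)
            files.append(fields[0]) # anything after the first word is ignored
    return files

# Hypothetical list content: a file name, optionally followed by free text.
example = ["LexA.tab  matrix for LexA", "", "CRP.tab"]
matrix_files = read_matrix_list(example)
```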

-o outputfile
If no output file is specified, the standard output is used. This allows the command to be used within a pipe.

-matrix_format matrix_format
Matrix format. Default is tab.

-pseudo #
Pseudo-count for the matrix (default: 1). See matrix-scan for details.

-bgfile background_file
Background model file.

-bg_format bg_format
Background model format. Supported formats: all the input formats supported by convert-background-model.

-bg_pseudo #
Pseudo-frequency for the background model. The value must be a real number between 0 and 1 (default: 0). If the training sequence length (L) is known, the value can be set to sqrt(L)/(L + sqrt(L)).
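
The suggested value can be computed as in the following Python sketch. The function name suggested_bg_pseudo is hypothetical and the training length is an arbitrary example.

```python
import math

def suggested_bg_pseudo(training_length):
    """Square root of L divided by (L + square root of L)."""
    root = math.sqrt(training_length)
    return root / (training_length + root)

# Example: a training sequence of 10,000 residues gives 100 / 10100.
pseudo = suggested_bg_pseudo(10000)
```

The value decreases as the training sequence gets longer, so larger training sets receive a smaller correction.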

-decimals #
Number of decimals to print for the transition probabilities.