IDFilter

Filters protein identification engine results by different criteria.

potential predecessor tools $ \longrightarrow $ IDFilter $ \longrightarrow $ potential successor tools
Predecessor tools: MascotAdapter (or other ID engines), IDFileConverter, FalseDiscoveryRate, ConsensusID
Successor tools: PeptideIndexer, ProteinInference, IDMapper

This tool filters the identifications found by a peptide/protein identification tool such as Mascot. Several filters can be applied; each one is described with the command line parameters listed further down this page.

To enable any of the filters, just change their default value. All active filters will be applied in order.
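
The "change the default to enable" rule can be pictured with a small Python sketch. It is not the tool's implementation: the `PeptideHit` class, the `active_filters` and `apply_filters` helpers, the choice of three parameters, and the order in which they run are all assumptions made for illustration, and scores are assumed to be "higher is better".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class PeptideHit:
    """Hypothetical stand-in for a peptide hit (not an OpenMS class)."""
    sequence: str
    score: float


HitFilter = Callable[[List[PeptideHit]], List[PeptideHit]]

# Default values as listed on this page; a parameter left at its default
# means the corresponding filter is inactive.
DEFAULTS: Dict[str, float] = {"score:pep": 0, "min_length": 0, "best:n_peptide_hits": 0}


def active_filters(params: Dict[str, float]) -> List[Tuple[str, HitFilter]]:
    """Collect, in a fixed order, the filters whose parameter differs from its default."""
    filters: List[Tuple[str, HitFilter]] = []

    cutoff = params["score:pep"]
    if cutoff != DEFAULTS["score:pep"]:
        # Assumption: higher scores are better.
        filters.append(("score:pep", lambda hits: [h for h in hits if h.score >= cutoff]))

    min_len = int(params["min_length"])
    if min_len != DEFAULTS["min_length"]:
        filters.append(("min_length", lambda hits: [h for h in hits if len(h.sequence) >= min_len]))

    top_n = int(params["best:n_peptide_hits"])
    if top_n != DEFAULTS["best:n_peptide_hits"]:
        filters.append(
            ("best:n_peptide_hits",
             lambda hits: sorted(hits, key=lambda h: h.score, reverse=True)[:top_n])
        )
    return filters


def apply_filters(hits: List[PeptideHit], params: Dict[str, float]) -> List[PeptideHit]:
    """Run every active filter, one after the other, on the list of hits."""
    for _name, hit_filter in active_filters(params):
        hits = hit_filter(hits)
    return hits


example_hits = [PeptideHit("PEPTIDEK", 30.0), PeptideHit("PEK", 50.0), PeptideHit("LONGPEPTIDER", 10.0)]
print(apply_filters(example_hits, {**DEFAULTS, "score:pep": 20, "min_length": 6}))
```

With all parameters at their defaults the list passes through unchanged, which mirrors the statement that only filters with a changed value are applied.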

Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
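
The conversion round trip mentioned in the note could be scripted roughly as follows. This is a sketch under assumptions: the helper name `mzid_filter_commands` and the file names are hypothetical, and IDFileConverter is assumed to accept `-in`/`-out` and to infer the format from the file extension. The code only builds the command lines; it does not run them.

```python
from pathlib import Path
from typing import List, Sequence


def mzid_filter_commands(mzid_in: str, mzid_out: str,
                         filter_options: Sequence[str]) -> List[List[str]]:
    """Build three command lines: mzid -> idXML, IDFilter, idXML -> mzid."""
    idxml_in = str(Path(mzid_in).with_suffix(".idXML"))
    idxml_out = str(Path(mzid_out).with_suffix(".idXML"))
    return [
        # Assumption: IDFileConverter picks the formats from the extensions.
        ["IDFileConverter", "-in", mzid_in, "-out", idxml_in],
        ["IDFilter", "-in", idxml_in, "-out", idxml_out, *filter_options],
        ["IDFileConverter", "-in", idxml_out, "-out", mzid_out],
    ]


for command in mzid_filter_commands("run1.mzid", "run1_filtered.mzid", ["-min_length", "6"]):
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` on a machine where the TOPP tools are installed.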

The command line parameters of this tool are:

IDFilter -- Filters results from protein or peptide identification engines based on different criteria.
Version: 2.0.0 Mar 30 2016, 12:52:33, Revision: GIT-NOTFOUND

Usage:
  IDFilter <options>

Options (mandatory options marked with '*'):
  -in <file>*                         Input file  (valid formats: 'idXML')
  -out <file>*                        Output file  (valid formats: 'idXML')

Filtering by precursor RT or m/z:
  -precursor:rt [min]:[max]           Retention time range to extract. (default: ':')
  -precursor:mz [min]:[max]           Mass-to-charge range to extract. (default: ':')
  -precursor:allow_missing            When filtering by precursor RT or m/z, keep peptide IDs with missing 
                                      precursor information ('RT'/'MZ' meta values)?

Filtering by peptide/protein score. To enable any of the filters below, just change their default value. All 
active filters will be applied in order.:
  -score:pep <score>                  The score which should be reached by a peptide hit to be kept. The
                                      score is dependent on the most recent(!) preprocessing - it could be
                                      Mascot scores (if a MascotAdapter was applied before), or an FDR (if
                                      FalseDiscoveryRate was applied before), etc. (default: '0')
  -score:prot <score>                 The score which should be reached by a protein hit to be kept. Use in 
                                      combination with 'delete_unreferenced_peptide_hits' to remove affected
                                      peptides. (default: '0')

Filtering by significance threshold:
  -thresh:pep <fraction>              Keep a peptide hit only if its score is above this fraction of the
                                      peptide significance threshold. (default: '0')
  -thresh:prot <fraction>             Keep a protein hit only if its score is above this fraction of the
                                      protein significance threshold. Use in combination with
                                      'delete_unreferenced_peptide_hits' to remove affected peptides.
                                      (default: '0')

Filtering by whitelisting (only instances also present in a whitelist file can pass):
  -whitelist:proteins <file>          Filename of a FASTA file containing protein sequences.
                                      All peptides that are not a substring of a sequence in this file are
                                      removed.
                                      All proteins whose accession is not present in this file are removed.
                                      (valid formats: 'fasta')
  -whitelist:by_seq_only              Match peptides with FASTA file by sequence instead of accession and 
                                      disable protein filtering.

Filtering by blacklisting (only instances not present in a blacklist file can pass):
  -blacklist:peptides <file>          Peptides having the same sequence and modification assignment as any
                                      peptide in this file will be filtered out. Use with the
                                      'blacklist:ignore_modifications' flag to only compare by sequence.
                                      (valid formats: 'idXML')
  -blacklist:ignore_modifications     Compare blacklisted peptides by sequence only.
                                      

Filtering by RT predicted by 'RTPredict':
  -rt:p_value <float>                 Retention time filtering by the p-value predicted by RTPredict.
                                      (default: '0' min: '0' max: '1')
  -rt:p_value_1st_dim <float>         Retention time filtering by the p-value predicted by RTPredict for
                                      first dimension. (default: '0' min: '0' max: '1')

Filtering by mz:
  -mz:error <float>                   Filtering by deviation to theoretical mass (disabled for negative
                                      values). (default: '-1')
  -mz:unit <String>                   Absolute or relative error. (default: 'ppm' valid: 'Da', 'ppm')

Filtering best hits per spectrum (for peptides) or from proteins:
  -best:n_peptide_hits <integer>      Keep only the 'n' highest scoring peptide hits per spectrum (for n>0). 
                                      (default: '0' min: '0')
  -best:n_protein_hits <integer>      Keep only the 'n' highest scoring protein hits (for n>0). (default: 
                                      '0' min: '0')
  -best:strict                        Keep only the highest scoring peptide hit.
                                      Similar to n_peptide_hits=1, but if there are two or more highest
                                      scoring hits, none are kept.

  -min_length <integer>               Keep only peptide hits with a length greater or equal this value. Value
                                      0 will have no filter effect. (default: '0' min: '0')
  -max_length <integer>               Keep only peptide hits with a length less or equal this value. Value 0 
                                      will have no filter effect. Value is overridden by min_length, i.e. if
                                      max_length < min_length, max_length will be ignored. (default: '0' min:
                                      '0')
  -min_charge <integer>               Keep only peptide hits for tandem spectra with charge greater or equal 
                                      this value. (default: '1' min: '1')
  -var_mods                           Keep only peptide hits with variable modifications (fixed modifications
                                      from SearchParameters will be ignored).
  -unique                             If a peptide hit occurs more than once per PSM, only one instance is 
                                      kept.
  -unique_per_protein                 Only peptides matching exactly one protein are kept. Remember that
                                      isoforms count as different proteins!
  -keep_unreferenced_protein_hits     Proteins not referenced by a peptide are retained in the ids.
  -remove_decoys                      Remove proteins according to the information in the user parameters. 
                                      Usually used in combination with 'delete_unreferenced_peptide_hits'.
  -delete_unreferenced_peptide_hits   Peptides not referenced by any protein are deleted in the ids. Usually 
                                      used in combination with 'score:prot' or 'thresh:prot'.
                                      
Common TOPP options:
  -ini <file>                         Use the given TOPP INI file
  -threads <n>                        Sets the number of threads allowed to be used by the TOPP tool
                                      (default: '1')
  -write_ini <file>                   Writes the default configuration file
  --help                              Shows options
  --helphelp                          Shows all options (including advanced)
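
As an illustration of how the options above combine on one command line, the following Python sketch assembles an IDFilter call from a dictionary. The helper `idfilter_argv`, the file names and the chosen values are hypothetical; only the option names are taken from the listing above. Boolean entries are treated as flags (emitted without a value when true), all other entries as `-name value` pairs.

```python
import shlex
from typing import Dict, List, Union

OptionValue = Union[bool, int, float, str]


def idfilter_argv(in_file: str, out_file: str,
                  options: Dict[str, OptionValue]) -> List[str]:
    """Turn a dict of option names and values into an IDFilter argument list."""
    argv = ["IDFilter", "-in", in_file, "-out", out_file]
    for name, value in options.items():
        if isinstance(value, bool):
            if value:                      # flags carry no value
                argv.append(f"-{name}")
        else:
            argv += [f"-{name}", str(value)]
    return argv


argv = idfilter_argv(
    "ids.idXML",
    "filtered.idXML",
    {
        "precursor:rt": "200:350",                 # RT window, [min]:[max]
        "best:n_peptide_hits": 1,                  # best hit per spectrum
        "remove_decoys": True,
        "delete_unreferenced_peptide_hits": True,  # suggested with remove_decoys
        "unique": False,                           # left off: not emitted
    },
)
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` where the tool is installed.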

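The `[min]:[max]` notation used by `-precursor:rt` and `-precursor:mz`, together with `-precursor:allow_missing`, can be read as in the sketch below. The function names are hypothetical, and two points are assumptions rather than documented behavior: an empty side is treated as unbounded, and the bounds are treated as inclusive.

```python
import math
from typing import Optional, Tuple


def parse_range(text: str) -> Tuple[float, float]:
    """Parse '[min]:[max]'; an empty side means 'no limit' (assumption)."""
    low, sep, high = text.partition(":")
    if sep != ":":
        raise ValueError(f"expected '[min]:[max]', got {text!r}")
    return (float(low) if low.strip() else -math.inf,
            float(high) if high.strip() else math.inf)


def keep_by_precursor(value: Optional[float], range_text: str = ":",
                      allow_missing: bool = False) -> bool:
    """Decide whether a peptide ID passes a precursor RT or m/z range filter."""
    low, high = parse_range(range_text)
    if low == -math.inf and high == math.inf:
        return True                 # default ':' means the filter is inactive
    if value is None:
        return allow_missing        # missing 'RT'/'MZ' meta value
    return low <= value <= high     # inclusive bounds (assumption)


print(keep_by_precursor(250.0, "200:350"))
print(keep_by_precursor(None, "200:350", allow_missing=True))
```
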
INI file documentation of this tool:

Legend:
[required] required parameter
[advanced] advanced parameter

+ IDFilter  Filters results from protein or peptide identification engines based on different criteria.
  version  [default: '2.0.0']  Version of the tool that generated this parameters file.
  ++ 1  Instance '1' section for 'IDFilter'
    in  [required; input file; valid: *.idXML]  Input file
    out  [required; output file; valid: *.idXML]  Output file
    min_length  [default: '0'; range: 0:∞]  Keep only peptide hits with a length greater or equal this value. Value 0 will have no filter effect.
    max_length  [default: '0'; range: 0:∞]  Keep only peptide hits with a length less or equal this value. Value 0 will have no filter effect. Value is overridden by min_length, i.e. if max_length < min_length, max_length will be ignored.
    min_charge  [default: '1'; range: 1:∞]  Keep only peptide hits for tandem spectra with charge greater or equal this value.
    var_mods  [default: 'false'; valid: true,false]  Keep only peptide hits with variable modifications (fixed modifications from SearchParameters will be ignored).
    unique  [default: 'false'; valid: true,false]  If a peptide hit occurs more than once per PSM, only one instance is kept.
    unique_per_protein  [default: 'false'; valid: true,false]  Only peptides matching exactly one protein are kept. Remember that isoforms count as different proteins!
    keep_unreferenced_protein_hits  [default: 'false'; valid: true,false]  Proteins not referenced by a peptide are retained in the ids.
    remove_decoys  [default: 'false'; valid: true,false]  Remove proteins according to the information in the user parameters. Usually used in combination with 'delete_unreferenced_peptide_hits'.
    delete_unreferenced_peptide_hits  [default: 'false'; valid: true,false]  Peptides not referenced by any protein are deleted in the ids. Usually used in combination with 'score:prot' or 'thresh:prot'.
    log  [advanced]  Name of log file (created only when specified)
    debug  [advanced; default: '0']  Sets the debug level
    threads  [default: '1']  Sets the number of threads allowed to be used by the TOPP tool
    no_progress  [advanced; default: 'false'; valid: true,false]  Disables progress logging to command line
    force  [advanced; default: 'false'; valid: true,false]  Overwrite tool specific checks.
    test  [advanced; default: 'false'; valid: true,false]  Enables the test mode (needed for internal use only)
    +++ precursor  Filtering by precursor RT or m/z
      rt  [default: ':']  Retention time range to extract.
      mz  [default: ':']  Mass-to-charge range to extract.
      allow_missing  [default: 'false'; valid: true,false]  When filtering by precursor RT or m/z, keep peptide IDs with missing precursor information ('RT'/'MZ' meta values)?
    +++ score  Filtering by peptide/protein score. To enable any of the filters below, just change their default value. All active filters will be applied in order.
      pep  [default: '0']  The score which should be reached by a peptide hit to be kept. The score is dependent on the most recent(!) preprocessing - it could be Mascot scores (if a MascotAdapter was applied before), or an FDR (if FalseDiscoveryRate was applied before), etc.
      prot  [default: '0']  The score which should be reached by a protein hit to be kept. Use in combination with 'delete_unreferenced_peptide_hits' to remove affected peptides.
    +++ thresh  Filtering by significance threshold
      pep  [default: '0']  Keep a peptide hit only if its score is above this fraction of the peptide significance threshold.
      prot  [default: '0']  Keep a protein hit only if its score is above this fraction of the protein significance threshold. Use in combination with 'delete_unreferenced_peptide_hits' to remove affected peptides.
    +++ whitelist  Filtering by whitelisting (only instances also present in a whitelist file can pass)
      proteins  [input file; valid: *.fasta]  Filename of a FASTA file containing protein sequences. All peptides that are not a substring of a sequence in this file are removed. All proteins whose accession is not present in this file are removed.
      by_seq_only  [default: 'false'; valid: true,false]  Match peptides with FASTA file by sequence instead of accession and disable protein filtering.
    +++ blacklist  Filtering by blacklisting (only instances not present in a blacklist file can pass)
      peptides  [input file; valid: *.idXML]  Peptides having the same sequence and modification assignment as any peptide in this file will be filtered out. Use with the 'blacklist:ignore_modifications' flag to only compare by sequence.
      ignore_modifications  [default: 'false'; valid: true,false]  Compare blacklisted peptides by sequence only.
    +++ rt  Filtering by RT predicted by 'RTPredict'
      p_value  [default: '0'; range: 0:1]  Retention time filtering by the p-value predicted by RTPredict.
      p_value_1st_dim  [default: '0'; range: 0:1]  Retention time filtering by the p-value predicted by RTPredict for first dimension.
    +++ mz  Filtering by mz
      error  [default: '-1']  Filtering by deviation to theoretical mass (disabled for negative values).
      unit  [default: 'ppm'; valid: Da,ppm]  Absolute or relative error.
    +++ best  Filtering best hits per spectrum (for peptides) or from proteins
      n_peptide_hits  [default: '0'; range: 0:∞]  Keep only the 'n' highest scoring peptide hits per spectrum (for n>0).
      n_protein_hits  [default: '0'; range: 0:∞]  Keep only the 'n' highest scoring protein hits (for n>0).
      strict  [default: 'false'; valid: true,false]  Keep only the highest scoring peptide hit. Similar to n_peptide_hits=1, but if there are two or more highest scoring hits, none are kept.
      n_to_m_peptide_hits  [advanced; default: ':']  Peptide hit rank range to extract.
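
Two of the rules in the parameter descriptions are easy to misread: the tie handling of `best:strict` and the way `min_length` overrides `max_length`. The sketch below restates both in Python. The function names and the `(sequence, score)` tuple representation are hypothetical, and scores are assumed to be "higher is better"; it mirrors the wording of the descriptions, not the tool's source code.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (peptide sequence, score); hypothetical representation


def keep_best_strict(hits: List[Hit]) -> List[Hit]:
    """Keep the single best hit; if the top score is tied, keep nothing."""
    if not hits:
        return []
    best_score = max(score for _seq, score in hits)
    top = [hit for hit in hits if hit[1] == best_score]
    return top if len(top) == 1 else []


def length_ok(sequence: str, min_length: int = 0, max_length: int = 0) -> bool:
    """Apply min_length/max_length; 0 disables a bound, min_length wins over max_length."""
    if min_length > 0 and len(sequence) < min_length:
        return False
    # max_length is ignored when it is 0 or smaller than min_length.
    if max_length > 0 and max_length >= min_length and len(sequence) > max_length:
        return False
    return True


print(keep_best_strict([("AAK", 10.0), ("CCK", 12.0)]))
print(keep_best_strict([("AAK", 12.0), ("CCK", 12.0)]))
print(length_ok("PEPTIDEPEPTIDE", min_length=8, max_length=5))
```

In the last call `max_length` is smaller than `min_length`, so only the lower bound is checked.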

OpenMS / TOPP release 2.0.0 Documentation generated on Wed Mar 30 2016 16:18:43 using doxygen 1.8.5