OpenMS
OpenSwathWorkflow

Complete workflow to run OpenSWATH.

This implements the OpenSWATH workflow as described in Rost and Rosenberger et al. (Nature Biotechnology, 2014) and provides a complete, integrated analysis tool without the need to run multiple tools consecutively. See also http://openswath.org/ for additional documentation.

It executes the following steps in order, as implemented in OpenSwathWorkflow:

  • Reading of input files, which can be provided as one single mzML or multiple "split" mzMLs (one per SWATH)
  • Computing the retention time transformation using RT-normalization peptides
  • Reading of the transition list
  • Extracting the specified transitions
  • Scoring the peak groups in the extracted ion chromatograms (XIC)
  • Reporting the peak groups and the chromatograms

See below or have a look at the INI file (via "OpenSwathWorkflow -write_ini myini.ini") for available parameters and more functionality.

Input: SWATH maps and assay library (transition list)

SWATH maps can be provided as mzML files. A single file taken directly from the instrument can be used, provided that the SWATH method has 1 MS1 scan followed by n MS2 spectra that are ordered the same way in each cycle. For example, a valid method would be MS1, MS2 [400-425], MS2 [425-450], MS1, MS2 [400-425], MS2 [425-450], while an invalid method would be MS1, MS2 [400-425], MS2 [425-450], MS1, MS2 [425-450], MS2 [400-425], where MS2 [xx-yy] indicates an MS2 scan with an isolation window starting at xx and ending at yy. OpenSwathWorkflow will try to read the SWATH windows from the data; if this is not possible, provide a tab-separated list with the correct windows using the -swath_windows_file parameter (this is recommended). Note that the software expects extraction windows (i.e. which peptides to extract from which window) that do not overlap; otherwise peptides will be extracted from two different windows.
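
The two requirements above (identical MS2 order in every cycle, and non-overlapping extraction windows) can be checked before running the tool. The following is a minimal sketch, not part of OpenSwathWorkflow: the scan representation (None for MS1, a (start, end) tuple for MS2) and all function names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Window = Tuple[float, float]   # (isolation start, isolation end) in m/z
Scan = Optional[Window]        # None stands for an MS1 scan (illustrative convention)


def split_cycles(scans: List[Scan]) -> List[List[Window]]:
    """Group MS2 windows into cycles, starting a new cycle at every MS1 scan."""
    cycles: List[List[Window]] = []
    for scan in scans:
        if scan is None:
            cycles.append([])
        elif cycles:
            cycles[-1].append(scan)
        else:
            raise ValueError("method must start with an MS1 scan")
    return cycles


def has_consistent_order(scans: List[Scan]) -> bool:
    """True if every cycle lists its MS2 windows in the same order."""
    cycles = split_cycles(scans)
    return all(cycle == cycles[0] for cycle in cycles)


def overlapping_windows(windows: List[Window]) -> List[Tuple[Window, Window]]:
    """Return neighbouring window pairs (sorted by start) whose ranges overlap."""
    ordered = sorted(windows)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] < a[1]]


# The valid and invalid methods quoted in the text above.
valid = [None, (400, 425), (425, 450), None, (400, 425), (425, 450)]
invalid = [None, (400, 425), (425, 450), None, (425, 450), (400, 425)]
```

Here has_consistent_order(valid) holds while has_consistent_order(invalid) does not, and overlapping_windows returns an empty list for windows that merely touch at a boundary, such as [400-425] and [425-450].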

Alternatively, a set of split files (n+1 mzML files) can be provided, each containing one SWATH map (or MS1 map).

Since the files can become rather large, it is recommended not to load the whole file into memory but rather to cache it on disk in a fast-access data format. This can be specified using the -readOptions cache parameter.

The assay library (transition list) is provided through the -tr parameter and can be in one of the following formats:

  • TraML
  • OpenSWATH TSV transition lists
  • OpenSWATH PQP SQLite files
  • SpectraST MRM transition lists
  • Skyline transition lists
  • Spectronaut transition lists

Parameters

The current parameters are optimized for 2 hour gradients on SCIEX 5600 / 6600 TripleTOF instruments with a peak width of around 30 seconds using iRT peptides. If your chromatography differs, please consider adjusting -Scoring:TransitionGroupPicker:min_peak_width to allow for smaller or larger peaks and adjust the -rt_extraction_window to use a different extraction window for the retention time. In the m/z domain, consider adjusting -mz_extraction_window, which can be given in Th or ppm, to match your instrument resolution.
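
When choosing between Th and ppm for the m/z extraction window, it can help to see how the two units relate at a given m/z. The sketch below is plain arithmetic only; the function names are illustrative, and it makes no claim about how OpenSwathWorkflow applies the window internally (for example, whether the value is treated as a full width or a half width).

```python
def ppm_to_th(mz: float, ppm: float) -> float:
    """Width in Th that corresponds to `ppm` parts per million at `mz`."""
    return mz * ppm / 1e6


def th_to_ppm(mz: float, th: float) -> float:
    """Width in ppm that corresponds to `th` Thomson at `mz`."""
    return th / mz * 1e6


# A fixed ppm width grows with m/z, while a fixed Th width does not.
for mz in (400.0, 800.0, 1200.0):
    print(f"m/z {mz:7.1f}: 50 ppm is {ppm_to_th(mz, 50.0):.4f} Th")
```

A ppm window therefore scales with the fragment m/z, which tends to match instruments whose mass error is roughly proportional to m/z, whereas a Th window stays constant across the mass range.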

Furthermore, if you wish to use MS1 information, use the -use_ms1_traces flag and provide an MS1 map in addition to the SWATH data.

If you encounter issues with peak picking, try disabling peak filtering by setting -Scoring:TransitionGroupPicker:compute_peak_quality false, which disables the filtering of peaks by chromatographic quality. You can also adjust the smoothing used for peak picking, either by changing -Scoring:TransitionGroupPicker:PeakPickerMRM:sgolay_frame_length or by using Gaussian smoothing based on your estimated peak width. Adjusting the signal-to-noise threshold will make the peaks wider or narrower.

Output: Feature list and chromatograms

The output of OpenSwathWorkflow is a feature list, written either as FeatureXML (-out_features) or as TSV (-out_tsv). The TSV output is more memory friendly and can be used directly as input to other tools such as mProphet or pyProphet. If you analyze large datasets, it is recommended to use only -out_tsv and not -out_features. The -out_tsv format is also recommended for downstream analysis (e.g. using mProphet or pyProphet).

The feature list generated by -out_tsv is a tab-separated file. It can be used directly as input to mProphet (see Reiter et al., 2011, Nature Methods) or pyProphet (a Python re-implementation of mProphet).

In addition, the extracted chromatograms can be written out using the -out_chrom parameter.
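
The input and output options described so far can be combined into a single invocation. The following sketch only assembles the argument list and does not run anything. The -tr, -readOptions, -swath_windows_file, -out_tsv and -out_chrom parameters are taken from the text above; the -in parameter for the mzML input is an assumption based on the usual OpenMS tool convention, and all file names are hypothetical.

```python
import shlex
from typing import List, Optional


def build_command(
    mzml: str,
    library: str,
    out_tsv: str,
    out_chrom: Optional[str] = None,
    swath_windows_file: Optional[str] = None,
    cache: bool = True,
) -> List[str]:
    """Assemble an OpenSwathWorkflow argument list (illustrative helper)."""
    # "-in" is assumed here; check the tool's parameter listing before use.
    cmd = ["OpenSwathWorkflow", "-in", mzml, "-tr", library, "-out_tsv", out_tsv]
    if cache:
        # Cache the data on disk instead of loading the whole file into memory.
        cmd += ["-readOptions", "cache"]
    if swath_windows_file is not None:
        cmd += ["-swath_windows_file", swath_windows_file]
    if out_chrom is not None:
        cmd += ["-out_chrom", out_chrom]
    return cmd


# Hypothetical file names, for illustration only.
command = build_command(
    "run1.mzML",
    "assay_library.tsv",
    "run1_features.tsv",
    swath_windows_file="swath_windows.txt",
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run once the parameters have been checked against the INI file written by -write_ini.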

Feature list output format

The tab-separated feature output contains the following information:

Header row                      Format      Description
transition_group_id             String      A unique id for the transition group (all chromatographic traces that are analyzed together)
peptide_group_label             String      A unique id for the peptide group (will be the same for each charge state and heavy/light status)
run_id                          String      An identifier for the run (currently always 0)
filename                        String      The input filename
RT                              Float       Peak group retention time
id                              String      A unique identifier for the peak group
Sequence                        String      Peptide sequence (no modifications)
MC                              Int         Missed cleavages of the sequence (assuming Trypsin as protease)
FullPeptideName                 String      Full peptide sequence including modifications in Unimod format
Charge                          Int         Assumed charge state
m/z                             Float       Precursor m/z
masserror_ppm                   Float List  Pairs of fragment masses (m/z) and their associated error in ppm for all transitions
Intensity                       Float       Peak group intensity (sum of all transitions)
ProteinName                     String      Name of the associated protein
decoy                           String      Whether the transition is decoy or not (0 = false, 1 = true)
assay_rt                        Float       The expected RT in seconds (based on normalized iRT value)
delta_rt                        Float       The difference between the expected RT and the peak group RT in seconds
leftWidth                       Float       The start of the peak group (left side) in seconds
main_var_xx_swath_prelim_score  Float       Initial score
norm_RT                         Float       The peak group retention time in normalized (iRT) space
nr_peaks                        Int         The number of transitions used
peak_apices_sum                 Float       The sum of all peak apices (may be used as alternative intensity)
potentialOutlier                String      Potential outlier transitions (or "none" if none was detected)
rightWidth                      Float       The end of the peak group (right side) in seconds
rt_score                        Float       The raw RT score (unnormalized)
sn_ratio                        Float       The raw S/N ratio
total_xic                       Float       The total XIC of the chromatogram
var_...                         Float       One of multiple sub-scores used by OpenSWATH to describe the peak group
aggr_prec_Peak_Area             String      Intensity (peak area) of MS1 traces separated by semicolon
aggr_prec_Peak_Apex             String      Intensity (peak apex) of MS1 traces separated by semicolon
aggr_prec_Fragment_Annotation   String      Annotation of MS1 traces separated by semicolon
aggr_Peak_Area                  String      Intensity (peak area) of fragment ion traces separated by semicolon
aggr_Peak_Apex                  String      Intensity (peak apex) of fragment ion traces separated by semicolon
aggr_Fragment_Annotation        String      Annotation of fragment ion traces separated by semicolon
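
As an illustration of how the columns above might be consumed, the sketch below reads a feature TSV with Python's csv module, drops decoys, groups the remaining peak groups by transition_group_id, and splits the semicolon-separated aggr_* fields. The column names come from the table; the helper names are hypothetical, and pairing aggr_Fragment_Annotation with aggr_Peak_Area by position is an assumption.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO

Row = Dict[str, str]


def read_features(handle: TextIO) -> List[Row]:
    """Read the tab-separated feature list into one dict per peak group."""
    return list(csv.DictReader(handle, delimiter="\t"))


def target_groups(rows: Iterable[Row]) -> Dict[str, List[Row]]:
    """Group non-decoy peak groups (decoy == 0) by transition_group_id."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        if row["decoy"] == "0":
            groups[row["transition_group_id"]].append(row)
    return dict(groups)


def fragment_areas(row: Row) -> Dict[str, float]:
    """Map each fragment annotation to its peak area.

    Assumes both semicolon-separated fields list the traces in the same order.
    """
    names = row["aggr_Fragment_Annotation"].split(";")
    areas = row["aggr_Peak_Area"].split(";")
    return {name: float(area) for name, area in zip(names, areas)}
```

Typical use would be to open the file written by -out_tsv, pass the handle to read_features, and then inspect individual peak groups with fragment_areas.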

Execution flow:

The overall execution flow for this tool is implemented in OpenSwathWorkflow.

The command line parameters of this tool are:

INI file documentation of this tool: