Format

Several classes and functions to deal with common mass spectrometric formats (mostly dealing with file I/O).

Transformation Collection Module

TransformationCollection

LightTransformationData

File Reader Module

SWATHScoringReader

class msproteomicstoolslib.format.SWATHScoringReader.ReadFilter

Bases: object

A callable class that pre-filters a row and determines whether the row can be skipped.

If the call returns True, the row is examined; if it returns False, the row is skipped.
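
As a minimal sketch of the pre-filter contract described above, the stand-alone class below keeps only rows whose score is under a cutoff. The call signature (row, header), the header-as-dict layout, the class name ScoreCutoffFilter and the column name m_score are assumptions made for illustration; they are not confirmed parts of the library API.

```python
class ScoreCutoffFilter(object):
    """Hypothetical pre-filter: examine a row only if its score is below a cutoff."""

    def __init__(self, cutoff, score_column="m_score"):
        self.cutoff = cutoff
        self.score_column = score_column

    def __call__(self, row, header):
        # Assumption: header maps column names to positions in the row.
        # Returning True means "examine this row", False means "skip it".
        return float(row[header[self.score_column]]) < self.cutoff


header = {"transition_group_id": 0, "m_score": 1}
rows = [["pep_A", "0.001"], ["pep_B", "0.2"]]

keep = ScoreCutoffFilter(0.01)
examined = [row for row in rows if keep(row, header)]
```

Filtering before a row is fully parsed keeps the decision cheap, which matters when most rows of a large result file are going to be discarded anyway.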

class msproteomicstoolslib.format.SWATHScoringReader.SWATHScoringReader
static newReader(infiles, filetype, readmethod="minimal", readfilter=ReadFilter(), errorHandling="strict", enable_isotopic_grouping=False)

Factory to create a new reader

parse_files(read_exp_RT=True, verbosity=10, useCython=False)

Parse the input file(s) (CSV).

Parameters: read_exp_RT (bool) – whether to read the real, experimental retention time (default behavior) or to use the delta iRT instead.
Returns: runs (list(SWATHScoringReader.Run))

A single CSV file might contain more than one run. To create unique run ids, the runs are numbered as xx_yy, where xx is the current file number and yy is the number of the run found in the current file. However, if an alignment has already been performed and each run has already obtained a unique run id, the previous alignment id is used directly.
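
The xx_yy numbering can be sketched as follows. This is an illustration only: the helper name make_run_ids, the zero-based counters and the input layout (one list of run names per file) are assumptions, not the library's implementation.

```python
def make_run_ids(runs_per_file):
    """Number runs as xx_yy: xx = file number, yy = run number within that file."""
    run_ids = {}
    for file_nr, run_names in enumerate(runs_per_file):
        for run_nr, run_name in enumerate(run_names):
            # The same run name may occur in several files, so the key
            # includes the file number to keep entries distinct.
            run_ids[(file_nr, run_name)] = "%s_%s" % (file_nr, run_nr)
    return run_ids


# Two input files: the first holds two runs, the second holds one.
ids = make_run_ids([["run_a", "run_b"], ["run_a"]])
```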

parse_row(run, this_row, read_exp_RT)
class msproteomicstoolslib.format.SWATHScoringReader.OpenSWATH_SWATHScoringReader(infiles, readmethod='minimal', readfilter=<msproteomicstoolslib.format.SWATHScoringReader.ReadFilter object>, errorHandling='strict', enable_isotopic_grouping=False, read_cluster_id=True)

Bases: msproteomicstoolslib.format.SWATHScoringReader.SWATHScoringReader

Parser for OpenSWATH output

parse_row(run, this_row, read_exp_RT)
class msproteomicstoolslib.format.SWATHScoringReader.mProphet_SWATHScoringReader(infiles, readmethod='minimal', readfilter=<msproteomicstoolslib.format.SWATHScoringReader.ReadFilter object>, enable_isotopic_grouping=False)

Bases: msproteomicstoolslib.format.SWATHScoringReader.SWATHScoringReader

Parser for mProphet output

parse_row(run, this_row, read_exp_RT)
class msproteomicstoolslib.format.SWATHScoringReader.Peakview_SWATHScoringReader(infiles, readmethod='minimal', readfilter=<msproteomicstoolslib.format.SWATHScoringReader.ReadFilter object>, enable_isotopic_grouping=False)

Bases: msproteomicstoolslib.format.SWATHScoringReader.SWATHScoringReader

Parser for Peakview output

parse_row(run, this_row, read_exp_RT)
msproteomicstoolslib.format.SWATHScoringReader.inferMapping(rawdata_files, aligned_pg_files, mapping, precursors_mapping, sequences_mapping, protein_mapping, verbose=False, throwOnMismatch=False, fileType=None)

Infers a mapping between raw chromatogram files (mzML) and processed feature TSV files

Usually one feature file can contain multiple aligned runs and maps to multiple chromatogram files (mzML). This function will try to guess the original name of the mzML based on the align_origfilename column in the TSV. Note that both files have some typical endings that are _not_ shared; these are generally removed before comparison.

Only an exact match is allowed.
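
The matching idea (strip the endings that are not shared, then require identical names) can be sketched as below. The helper names and the lists of endings are hypothetical; the endings the library actually removes may differ.

```python
import os

# Hypothetical endings, chosen for illustration only.
RAW_ENDINGS = (".chrom.mzML", ".mzML")
FEATURE_ENDINGS = ("_with_dscore.csv", ".tsv", ".csv")


def strip_ending(path, endings):
    """Return the base name of path with the first matching ending removed."""
    name = os.path.basename(path)
    for ending in endings:
        if name.endswith(ending):
            return name[: -len(ending)]
    return name


def match_files(rawdata_files, orig_filenames):
    """Pair each raw file with the original feature file name sharing its stem."""
    stems = {strip_ending(f, FEATURE_ENDINGS): f for f in orig_filenames}
    mapping = {}
    for raw in rawdata_files:
        stem = strip_ending(raw, RAW_ENDINGS)
        if stem in stems:  # exact match only, no fuzzy comparison
            mapping[raw] = stems[stem]
    return mapping


mapping = match_files(
    ["/data/run1.chrom.mzML", "/data/run2.chrom.mzML"],
    ["/results/run1_with_dscore.csv", "/results/run3_with_dscore.csv"],
)
```

In this sketch run2 stays unmapped because no feature file name reduces to the same stem.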

Data Matrix Module

Functions for handling the output data matrix

MatrixWriters

Spectral Library Module

Functions for handling SpectraST spectral library format

Spectral library handler

class msproteomicstoolslib.format.speclib_db_lib.Library(lkey=None)

This class contains one spectral library. It provides a read/write interface to the database and to the SpectraST *.splib and *.pepidx files. Spectra can easily be added or retrieved.

add_spectra(s)
all_spectra()

Iterate over all spectra in the library

Yields:spectrum(Spectra) – current spectrum
annotate_with_libkey()

Annotate spectra with the key of the current library

count_modifications()
delete_library_from_DB(library_key, db)

Delete current library from SQL database

delete_reverse_spectra()
find_by_sequence(sequence, db)

This function can be used to access spectra using a sequence search

find_by_sql(query_in, db)

This function can be used to access spectra using an SQL query. The query should produce a single column with spectra_keys. This can be very slow; use find_by_sql_fast instead (~400x faster).

find_by_sql_fast(subQuery, db, tmp_db)

This function can be used to access spectra using an SQL query. The query should produce a single column with spectra_keys (ids), which MUST be called tmp_spectra_keys. You need create-table privileges in the database tmp_db for this, but it can be 400x faster than plain find_by_sql.
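
One plausible reading of the create-table requirement is that the keys from the sub-query are first stored in a temporary table and the spectra are then fetched with a single join. The sketch below illustrates that pattern only. It uses an in-memory sqlite3 database as a stand-in so the example is self-contained, and the spectra table layout and the tmp_keys table name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE spectra (id INTEGER PRIMARY KEY, sequence TEXT)")
cur.executemany(
    "INSERT INTO spectra (id, sequence) VALUES (?, ?)",
    [(1, "PEPTIDEK"), (2, "ANOTHERK"), (3, "PEPTIDEK")],
)

# The sub-query yields a single column, named tmp_spectra_keys as required.
sub_query = "SELECT id AS tmp_spectra_keys FROM spectra WHERE sequence = 'PEPTIDEK'"

# Store the keys once in a temporary table, then fetch every matching
# spectrum with one join instead of issuing one lookup per key.
cur.execute("CREATE TEMP TABLE tmp_keys AS " + sub_query)
cur.execute(
    "SELECT s.id, s.sequence FROM spectra s "
    "JOIN tmp_keys t ON s.id = t.tmp_spectra_keys ORDER BY s.id"
)
fast_result = cur.fetchall()

cur.execute("DROP TABLE tmp_keys")
conn.close()
```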

get_all_spectra()
get_fileheader(splibFileName)

Get the header preceding the first spectrum in a SpectraST file.

get_first_offset(splibFileName)
get_rawspectrum_with_offset(splibFileName, offset)

Get a raw spectrum as it is from a SpectraST file by using an offset to locate it.

get_spectra_by_sequence(sequence)

Get all spectra that match a specific sequence

init_with_self(library)

Initialize with another library. Does not perform a very deep copy.

measure_nr_spectra()
nr_unique_peptides()
read_fromDB(library_key, db)

This function can be used to access one complete library from the DB.

static read_from_db_to_file(db, filePrefix)

This function can be used to access one complete library from the DB directly to a file.

static read_library_to_db(pepidxFileName, db, library_key)

Read directly from a spectral library into the database.

read_pepidx(filename)
read_spectrum_sptxt_idx(splibFileName, idx, library_key)

Fetch a spectrum from the spectral library by using the binary index

read_sptxt(filename)
read_sptxt_pepidx(splibFileName, pepidxFileName, library_key)

Read directly from a spectral library into memory.

read_sptxt_with_offset(splibFileName, offset)

Read an sptxt spectral library file by using an offset to keep memory free

remove_duplicate_entries()
set_library_key(lkey)
write(filePrefix, append=False)

Write the current library to a file.

write_sorted(filePrefix)
write_toDB(db, cursor)

Write all spectra into a SQL database

class msproteomicstoolslib.format.speclib_db_lib.SequenceHandler

Container class for all spectra that map to the same sequence inside a spectral library

add_meta(meta)
add_spectra(spectra)
add_spectra_no_duplicates(spectra)
empty()
init_with_self(handler)
remove(s)
remove_duplicate_entries()
class msproteomicstoolslib.format.speclib_db_lib.Spectra

A single spectrum inside a spectral library

acetyl_len()
add_meta(sequence, modifications, library_key)
analyse_mod()
carbamido_len()
escape_string(string)
find(id, db)
get_known_modifications()
get_meta_headers()
get_peaks()
get_spectra_headers()
icat_len()
initialize()

Initialize spectrum

is_tryptic()
methyl_len()
other_known_len()
other_len()
oxidations_len()
parse_SearchEngineInfo(searchEngineInfo)
parse_comments(comment)
parse_sptxt(stack)

Parse an sptxt entry and initialize spectrum

phospho_len()
phosphos_len()
save(db)
to_pepidx_str()

Convert spectrum object to pepidx format

to_splib_str()

Convert spectrum object to splib format

validate()