Format¶
Several classes and functions to deal with common mass spectrometric formats (mostly dealing with file I/O).
File Reader Module¶
SWATHScoringReader¶
-
class
msproteomicstoolslib.format.SWATHScoringReader.ReadFilter¶ Bases:
object
A callable class that pre-filters a row and determines whether the row can be skipped.
If the call returns True, the row is examined; if it returns False, the row is skipped.
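The calling convention described above can be sketched without the library. This is only an illustration: the class name `ScoreCutoffFilter`, the `m_score` column, and the `(row, header)` argument layout are assumptions made for this example and are not taken from the documented API.

```python
class ScoreCutoffFilter(object):
    """Hypothetical pre-filter: keep a row only if its score is below a cutoff."""

    def __init__(self, cutoff, column="m_score"):
        self.cutoff = cutoff
        self.column = column  # assumed column name, adjust to the input file

    def __call__(self, row, header):
        # `header` is assumed to map column names to positions in `row`.
        # Returning True means "examine this row", False means "skip it".
        return float(row[header[self.column]]) < self.cutoff


header = {"peptide": 0, "m_score": 1}
rows = [["PEPTIDEK", "0.001"], ["ANOTHERK", "0.2"]]
keep = ScoreCutoffFilter(0.01)
kept_rows = [r for r in rows if keep(r, header)]
```

Filtering at this stage means rejected rows never have to be turned into peakgroup objects, which is presumably why the filter is applied before a row is examined.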
-
class
msproteomicstoolslib.format.SWATHScoringReader.SWATHScoringReader¶ -
static
newReader(infiles, filetype, readmethod="minimal", readfilter=ReadFilter(), errorHandling="strict", enable_isotopic_grouping=False)¶ Factory to create a new reader
-
parse_files(read_exp_RT=True, verbosity=10, useCython=False)¶ Parse the input file(s) (CSV).
Parameters: read_exp_RT (bool) – whether to read the real, experimental retention time (default behavior) or to use the delta iRT instead.
Returns: runs (list(SWATHScoringReader.Run))
A single CSV file might contain more than one run. To create unique run ids, we therefore number the runs as xx_yy, where xx is the current file number and yy is the run found in the current file. However, if an alignment has already been performed and each run has already obtained a unique run id, we can directly use the previous alignment id.
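The xx_yy numbering scheme can be illustrated with a small stand-alone sketch. The helper `make_run_ids` is hypothetical, and zero-based counting is an assumption; the text above only states that xx is the file number and yy the run within that file.

```python
def make_run_ids(runs_per_file):
    """Number runs as xx_yy: xx = file number, yy = run number within that file."""
    run_ids = []
    for file_nr, n_runs in enumerate(runs_per_file):
        for run_nr in range(n_runs):
            run_ids.append("%s_%s" % (file_nr, run_nr))
    return run_ids


# Two input files: the first contains two runs, the second contains one.
run_ids = make_run_ids([2, 1])
```

Combining both counters keeps the ids unique even when every file starts counting its own runs from the beginning.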
-
parse_row(run, this_row, read_exp_RT)¶
-
static
-
class
msproteomicstoolslib.format.SWATHScoringReader.OpenSWATH_SWATHScoringReader(infiles, readmethod='minimal', readfilter=<msproteomicstoolslib.format.SWATHScoringReader.ReadFilter object>, errorHandling='strict', enable_isotopic_grouping=False, read_cluster_id=True)¶ Bases:
msproteomicstoolslib.format.SWATHScoringReader.SWATHScoringReader
Parser for OpenSWATH output
-
parse_row(run, this_row, read_exp_RT)¶
-
-
class
msproteomicstoolslib.format.SWATHScoringReader.mProphet_SWATHScoringReader(infiles, readmethod='minimal', readfilter=<msproteomicstoolslib.format.SWATHScoringReader.ReadFilter object>, enable_isotopic_grouping=False)¶ Bases:
msproteomicstoolslib.format.SWATHScoringReader.SWATHScoringReader
Parser for mProphet output
-
parse_row(run, this_row, read_exp_RT)¶
-
-
class
msproteomicstoolslib.format.SWATHScoringReader.Peakview_SWATHScoringReader(infiles, readmethod='minimal', readfilter=<msproteomicstoolslib.format.SWATHScoringReader.ReadFilter object>, enable_isotopic_grouping=False)¶ Bases:
msproteomicstoolslib.format.SWATHScoringReader.SWATHScoringReader
Parser for Peakview output
-
parse_row(run, this_row, read_exp_RT)¶
-
-
msproteomicstoolslib.format.SWATHScoringReader.inferMapping(rawdata_files, aligned_pg_files, mapping, precursors_mapping, sequences_mapping, protein_mapping, verbose=False, throwOnMismatch=False, fileType=None)¶ Infers a mapping between raw chromatogram files (mzML) and processed feature TSV files
Usually one feature file can contain multiple aligned runs and maps to multiple chromatogram files (mzML). This function will try to guess the original name of the mzML based on the align_origfilename column in the TSV. Note that both files have some typical endings that are not shared; these are generally removed before comparison.
Only an exact match is allowed.
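The matching idea (strip the typical endings, then require identical names) can be sketched as follows. This is not the library's implementation: the ending lists and the helpers `strip_endings` and `match_files` are hypothetical and chosen only for illustration.

```python
import os

RAW_ENDINGS = [".chrom.mzML", ".mzML"]          # assumed list, longest first
FEATURE_ENDINGS = ["_with_dscore.csv", ".tsv"]  # assumed list, longest first


def strip_endings(path, endings):
    """Return the base name of `path` with the first matching ending removed."""
    name = os.path.basename(path)
    for ending in endings:
        if name.endswith(ending):
            return name[:-len(ending)]
    return name


def match_files(rawdata_files, orig_filenames):
    """Map each raw file to the original feature file name with an identical stem."""
    stems = {strip_endings(f, FEATURE_ENDINGS): f for f in orig_filenames}
    mapping = {}
    for raw in rawdata_files:
        stem = strip_endings(raw, RAW_ENDINGS)
        if stem in stems:  # exact match only, no substring or fuzzy matching
            mapping[raw] = stems[stem]
    return mapping


mapping = match_files(
    ["/data/run1.chrom.mzML", "/data/run10.chrom.mzML"],
    ["/results/run1_with_dscore.csv"],
)
```

Requiring an exact match keeps `run10` from being paired with `run1`, which a prefix or substring comparison would allow.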
Spectral library Module¶
Functions for handling SpectraST spectral library format
Spectral library handler¶
-
class
msproteomicstoolslib.format.speclib_db_lib.Library(lkey=None)¶ This class contains one spectral library, whatever that means. It provides a read/write interface to the database and to the SpectraST *.splib and *.pepidx files. One can easily add or retrieve spectra.
-
add_spectra(s)¶
-
annotate_with_libkey()¶ Annotate spectra with the key of the current library
-
count_modifications()¶
-
delete_library_from_DB(library_key, db)¶ Delete current library from SQL database
-
delete_reverse_spectra()¶
-
find_by_sequence(sequence, db)¶ This function can be used to access spectra using a sequence search
-
find_by_sql(query_in, db)¶ This function can be used to access spectra using an SQL query. The query should produce a single column with spectra_keys. This can be very slow; use find_by_sql_fast instead (~400x faster).
-
find_by_sql_fast(subQuery, db, tmp_db)¶ This function can be used to access spectra using an SQL query. The query should produce a single column with spectra_keys (ids), which MUST be called tmp_spectra_keys. You need create-table privileges in the database tmp_db for this, but it can be 400x faster than plain find_by_sql.
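One plausible reading of the tmp_spectra_keys requirement is that the keys are first materialized in a temporary table and then joined against the spectra in a single query. The sketch below illustrates that pattern only. It uses SQLite from the Python standard library as a stand-in for the actual database, and the table names, columns, and data are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Hypothetical, simplified schema; the real database layout is not shown here.
cur.execute("CREATE TABLE spectra (id INTEGER PRIMARY KEY, sequence TEXT)")
cur.executemany(
    "INSERT INTO spectra (id, sequence) VALUES (?, ?)",
    [(1, "PEPTIDEK"), (2, "ANOTHERK"), (3, "PEPTIDEK")],
)

# The sub-query must expose its single column under the name tmp_spectra_keys.
sub_query = "SELECT id AS tmp_spectra_keys FROM spectra WHERE sequence = 'PEPTIDEK'"

# Materialize the keys once (this is the step that needs create-table rights) ...
cur.execute("CREATE TEMP TABLE tmp_keys AS " + sub_query)
# ... then fetch all matching spectra with one join instead of one query per key.
cur.execute(
    "SELECT s.id, s.sequence FROM spectra s "
    "JOIN tmp_keys t ON s.id = t.tmp_spectra_keys ORDER BY s.id"
)
matches = cur.fetchall()
con.close()
```

Replacing many single-key lookups with one join is a common source of speedups of this magnitude, although the documentation above does not state how the library achieves it.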
-
get_all_spectra()¶
-
get_fileheader(splibFileName)¶ Get the header preceding the first spectrum in a SpectraST file.
-
get_first_offset(splibFileName)¶
-
get_rawspectrum_with_offset(splibFileName, offset)¶ Get a raw spectrum, exactly as it appears in a SpectraST file, by using an offset to locate it.
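Offset-based access amounts to seeking to a byte position and reading until the entry ends. The following is a minimal sketch under stated assumptions: entries are taken to be separated by a blank line, the two entries are simplified stand-ins rather than real library content, and `read_raw_entry` is a hypothetical helper, not the method documented above.

```python
import os
import tempfile

entries = [b"Name: PEPTIDEK/2\nNumPeaks: 1\n100.0\t50.0\n\n",
           b"Name: ANOTHERK/3\nNumPeaks: 1\n200.0\t75.0\n\n"]

# Write a tiny stand-in file and remember the byte offset where each entry starts.
offsets = []
with tempfile.NamedTemporaryFile(delete=False, suffix=".sptxt") as fh:
    for entry in entries:
        offsets.append(fh.tell())
        fh.write(entry)
    path = fh.name


def read_raw_entry(filename, offset):
    """Return the raw lines of the entry starting at `offset`, up to a blank line."""
    lines = []
    with open(filename, "rb") as handle:
        handle.seek(offset)  # jump directly to the entry, skipping everything before
        for line in handle:
            if not line.strip():
                break
            lines.append(line.decode("ascii").rstrip("\n"))
    return lines


second = read_raw_entry(path, offsets[1])
os.remove(path)
```

Because only the requested entry is read, memory use stays independent of the size of the library file.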
-
get_spectra_by_sequence(sequence)¶ Get all spectra that match a specific sequence
-
init_with_self(library)¶ Initialize with another library. Does not do a very deep copy.
-
measure_nr_spectra()¶
-
nr_unique_peptides()¶
-
read_fromDB(library_key, db)¶ This function can be used to access one complete library from the DB.
-
static
read_from_db_to_file(db, filePrefix)¶ This function can be used to access one complete library from the DB directly to a file.
-
static
read_library_to_db(pepidxFileName, db, library_key)¶ Read directly from a spectral library into the database.
-
read_pepidx(filename)¶
-
read_spectrum_sptxt_idx(splibFileName, idx, library_key)¶ Fetch a spectrum from the spectral library by using the binary index.
-
read_sptxt(filename)¶
-
read_sptxt_pepidx(splibFileName, pepidxFileName, library_key)¶ Read directly from a spectral library into memory.
-
read_sptxt_with_offset(splibFileName, offset)¶ Read an sptxt spectral library file by using an offset to keep memory free
-
remove_duplicate_entries()¶
-
set_library_key(lkey)¶
-
write(filePrefix, append=False)¶ Write the current library to a file.
-
write_sorted(filePrefix)¶
-
write_toDB(db, cursor)¶ Write all spectra into a SQL database
-
-
class
msproteomicstoolslib.format.speclib_db_lib.SequenceHandler¶ Container class of spectra with the same sequence in a spectral library
Acts as a container of all spectra mapping to the same sequence inside a spectral library
-
add_meta(meta)¶
-
add_spectra(spectra)¶
-
add_spectra_no_duplicates(spectra)¶
-
empty()¶
-
init_with_self(handler)¶
-
remove(s)¶
-
remove_duplicate_entries()¶
-
-
class
msproteomicstoolslib.format.speclib_db_lib.Spectra¶ A single spectrum inside a spectral library
-
acetyl_len()¶
-
add_meta(sequence, modifications, library_key)¶
-
analyse_mod()¶
-
carbamido_len()¶
-
escape_string(string)¶
-
find(id, db)¶
-
get_known_modifications()¶
-
get_meta_headers()¶
-
get_peaks()¶
-
get_spectra_headers()¶
-
icat_len()¶
-
initialize()¶ Initialize spectrum
-
is_tryptic()¶
-
methyl_len()¶
-
other_known_len()¶
-
other_len()¶
-
oxidations_len()¶
-
parse_SearchEngineInfo(searchEngineInfo)¶
-
parse_comments(comment)¶
-
parse_sptxt(stack)¶ Parse an sptxt entry and initialize spectrum
-
phospho_len()¶
-
phosphos_len()¶
-
save(db)¶
-
to_pepidx_str()¶ Convert spectrum object to pepidx format
-
to_splib_str()¶ Convert spectrum object to splib format
-
validate()¶
-