DataStructures - Alignment
Run Module
Run
-
class msproteomicstoolslib.data_structures.Run.Run(header, header_dict, runid, orig_input_filename=None, filename=None, aligned_filename=None)
A run contains references to identified precursor groups and precursors.
The run stores a reference to the precursor groups (heavy/light pairs) identified in it. It has a unique id and stores the headers from the CSV file.
- A run has the following attributes:
- an identifier that is unique to this run
- a filename where it originally came from
- a dictionary of precursor groups, accessible through the following functions:
  - getPrecursorGroup
  - hasPrecursor
  - getPrecursor
  - addPrecursor
-
addPrecursor(precursor, peptide_group_label)
-
getPrecursor(peptide_group_label, trgr_id)
Return the precursor corresponding to the given peptide group label and transition group id.
-
getPrecursorGroup(curr_id)
-
get_aligned_filename()
-
get_best_peaks()
-
get_best_peaks_with_cutoff(cutoff)
-
get_id()
-
get_openswath_filename()
-
get_original_filename()
-
hasPrecursor(peptide_group_label, trgr_id)
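The two-level lookup described for Run (peptide group label first, then transition group id) can be sketched with plain dictionaries. This is a minimal stand-in under stated assumptions, not the library's implementation: SketchRun and SketchPrecursor are hypothetical classes, and only the method names mirror the listing above.

```python
class SketchPrecursor:
    """Hypothetical stand-in for a precursor; it only carries its id."""

    def __init__(self, trgr_id):
        self.trgr_id = trgr_id

    def get_id(self):
        return self.trgr_id


class SketchRun:
    """Hypothetical stand-in: peptide group label -> {transition group id: precursor}."""

    def __init__(self, runid):
        self.runid = runid
        self._groups = {}

    def get_id(self):
        return self.runid

    def addPrecursor(self, precursor, peptide_group_label):
        # Create the group on first use, then register the precursor by its id.
        self._groups.setdefault(peptide_group_label, {})[precursor.get_id()] = precursor

    def hasPrecursor(self, peptide_group_label, trgr_id):
        return trgr_id in self._groups.get(peptide_group_label, {})

    def getPrecursor(self, peptide_group_label, trgr_id):
        # Returns None when either the group or the precursor is unknown.
        return self._groups.get(peptide_group_label, {}).get(trgr_id)

    def getPrecursorGroup(self, curr_id):
        return self._groups[curr_id]


run = SketchRun("run_0")
run.addPrecursor(SketchPrecursor("PEPTIDEK_2_light"), "PEPTIDEK_2")
run.addPrecursor(SketchPrecursor("PEPTIDEK_2_heavy"), "PEPTIDEK_2")
```

Keying the outer dictionary by group label keeps the heavy and light forms of one peptide together, which is what the alignment code needs when it treats them as a unit.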
PrecursorGroup Module
PrecursorGroup
-
class msproteomicstoolslib.data_structures.PrecursorGroup.PrecursorGroup(peptide_group_label, run)
A set of precursors that are isotopically modified versions of each other.
A collection of precursors that are isotopically modified versions of the same underlying peptide sequence. Generally these are heavy/light forms.
-
addPrecursor(self, precursor)
Add a precursor to the peptide group.
-
getAllPeakgroups(self)
Generator of all peakgroups attached to the precursors in this group.
-
getAllPrecursors(self)
Return a list of all precursors in this precursor group.
-
getOverallBestPeakgroup(self)
Get the best peakgroup (by fdr score) of all precursors contained in this precursor group.
-
getPeptideGroupLabel(self)
Get the peptide group label.
-
getPrecursor(self, curr_id)
Get the precursor for the given transition group id.
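As a rough illustration of what getAllPeakgroups and getOverallBestPeakgroup describe, the sketch below flattens the peakgroups of all precursors in a group and picks the one with the best score. It assumes that a lower fdr score is better; the PeakGroup tuple and both function names are hypothetical and not part of the library.

```python
from collections import namedtuple

# Hypothetical record type used only for this sketch.
PeakGroup = namedtuple("PeakGroup", ["feature_id", "fdr_score", "rt"])


def all_peakgroups(precursors):
    """Yield every peakgroup of every precursor in the group."""
    for peakgroups in precursors.values():
        yield from peakgroups


def overall_best_peakgroup(precursors):
    """Return the peakgroup with the lowest fdr score, or None if there are none."""
    return min(all_peakgroups(precursors), key=lambda pg: pg.fdr_score, default=None)


group = {
    "light": [PeakGroup("f1", 0.20, 100.0), PeakGroup("f2", 0.01, 105.0)],
    "heavy": [PeakGroup("f3", 0.05, 105.2)],
}
best = overall_best_peakgroup(group)
```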
Precursor Module
PrecursorBase
GeneralPrecursor
-
class msproteomicstoolslib.data_structures.Precursor.GeneralPrecursor(this_id, run)
Bases: msproteomicstoolslib.data_structures.Precursor.PrecursorBase
A set of peakgroups that belong to the same precursor in a single run.
== Implementation details ==
This is a plain implementation where all peakgroup objects are stored in a simple list. This is not very efficient, since many objects need to be created, which takes a lot of memory in Python.
-
add_peakgroup(peakgroup)
-
append(transitiongroup)
-
find_closest_in_iRT(delta_assay_rt)
-
get_all_peakgroups()
-
get_best_peakgroup()
Return the best peakgroup according to fdr score.
-
get_run_id()
-
get_selected_peakgroup()
-
id
-
peakgroups
-
precursor_group
-
protein_name
-
run
-
sequence
Precursor
-
class msproteomicstoolslib.data_structures.Precursor.Precursor(this_id, run)
Bases: msproteomicstoolslib.data_structures.Precursor.PrecursorBase
A set of peakgroups that belong to the same precursor in a single run.
Each precursor has a back-reference to the precursor group (heavy/light pair) it belongs to and to the run it belongs to, and it stores its amino acid sequence. Furthermore, a unique id for the precursor and the protein name are stored.
A precursor can return its best transition group, its selected peakgroup, or the transition group that is closest to a given iRT time. Its id is the transition_group_id (e.g. the id of the chromatogram).
The “selected” peakgroup is the peakgroup that belongs to cluster number 1 (cluster_id == 1), which is treated as special.
== Implementation details ==
For memory reasons, all information about a peakgroup is stored in a tuple (immutable). This tuple contains a unique feature id, a score and a retention time. Additionally, the cluster to which the peakgroup belongs is stored (if the user sets it).
- A precursor has the following attributes:
- an identifier that is unique among all other precursors
- a set of peakgroups
- a back-reference to the run it belongs to
-
add_peakgroup_tpl(pg_tuple, tpl_id, cluster_id=-1)
Adds a peakgroup to this precursor.
- The peakgroup should be a tuple of length 4 (length 5 with the optional d_score) with the following components, numbered by tuple index:
  0. id
  1. quality score (FDR)
  2. retention time (normalized)
  3. intensity
  4. d_score (optional)
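The tuple-based storage described above can be sketched as follows. This is a simplified stand-in and not the library's code: TuplePrecursor and its add_peakgroup and select methods are hypothetical, a lower fdr score is assumed to be better, and "selected" is modelled as cluster id 1 as stated in the class description.

```python
class TuplePrecursor:
    """Hypothetical stand-in that keeps peakgroups as tuples instead of objects."""

    def __init__(self, this_id):
        self.id = this_id
        self.peakgroups_ = []   # tuples: (id, fdr score, normalized RT, intensity[, d_score])
        self.cluster_ids_ = []  # parallel list; -1 means "no cluster assigned"

    def add_peakgroup(self, pg_tuple, cluster_id=-1):
        if len(pg_tuple) not in (4, 5):
            raise ValueError("expected (id, fdr, rt, intensity[, d_score])")
        self.peakgroups_.append(tuple(pg_tuple))
        self.cluster_ids_.append(cluster_id)

    def select(self, feature_id):
        # Selecting a peakgroup means assigning it to the special cluster 1.
        for i, pg in enumerate(self.peakgroups_):
            if pg[0] == feature_id:
                self.cluster_ids_[i] = 1

    def unselect_all(self):
        self.cluster_ids_ = [-1] * len(self.cluster_ids_)

    def get_selected_peakgroup(self):
        for pg, cluster_id in zip(self.peakgroups_, self.cluster_ids_):
            if cluster_id == 1:
                return pg
        return None

    def get_best_peakgroup(self):
        return min(self.peakgroups_, key=lambda pg: pg[1], default=None)

    def find_closest_in_iRT(self, rt):
        return min(self.peakgroups_, key=lambda pg: abs(pg[2] - rt), default=None)


p = TuplePrecursor("tg_1")
p.add_peakgroup(("f1", 0.20, 100.0, 5e4))
p.add_peakgroup(("f2", 0.01, 105.0, 9e4))
p.add_peakgroup(("f3", 0.05, 131.5, 1e4, 2.3))
p.select("f3")
```

Keeping the cluster ids in a separate list is one way to leave the peakgroup tuples themselves untouched when the selection changes.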
-
cluster_ids_
-
find_closest_in_iRT(delta_assay_rt)
-
getAllPeakgroups()
-
getClusteredPeakgroups()
-
getPrecursorGroup()
-
get_all_peakgroups()
-
get_best_peakgroup()
-
get_id()
-
get_run_id()
-
get_selected_peakgroup()
-
id
-
peakgroups_
-
precursor_group
-
protein_name
-
run
-
select_pg(this_id)
-
sequence
-
setClusterID(this_id, cl_id)
-
unselect_all()
-
unselect_pg(this_id)
PeakGroup Module
PeakGroupBase
-
class msproteomicstoolslib.data_structures.PeakGroup.PeakGroupBase
Bases: object
-
cluster_id_
-
fdr_score
-
get_cluster_id()
-
get_fdr_score()
-
get_feature_id()
-
get_intensity()
-
get_normalized_retentiontime()
-
get_value(value)
-
id_
-
intensity_
-
is_selected()
-
normalized_retentiontime
-
select_this_peakgroup()
-
set_fdr_score(fdr_score)
-
set_feature_id(id_)
-
set_intensity(intensity)
-
set_normalized_retentiontime(normalized_retentiontime)
-
set_value(key, value)
MinimalPeakGroup
-
class msproteomicstoolslib.data_structures.PeakGroup.MinimalPeakGroup(unique_id, fdr_score, assay_rt, selected, cluster_id, peptide, intensity=None, dscore=None)
Bases: msproteomicstoolslib.data_structures.PeakGroup.PeakGroupBase
A single peakgroup that is defined by a retention time in a chromatogram of multiple transitions. Additionally, it has an fdr_score and an aligned RT (e.g. retention time in normalized space). A peakgroup can be selected for quantification or not (this is stored as having cluster_id == 1).
Note that for performance reasons, the peakgroups are created on-the-fly and are not stored as objects but rather as tuples in “Peptide”.
Each peak group has a unique id, a score (usually the fdr score), a retention time and a back-reference to the precursor that generated the peakgroup. In this case, the peak group can also be assigned a cluster id (where cluster 1 is special as the one used for quantification).
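The on-the-fly creation mentioned above can be pictured as a generator that wraps each stored tuple in a short-lived object only when a caller asks for it. The following is a sketch under assumptions: PeakGroupView and iter_peakgroups are hypothetical names, and the tuple layout (id, fdr score, retention time) is taken from the description rather than from the library's source.

```python
class PeakGroupView:
    """Hypothetical lightweight wrapper built from a stored tuple on demand."""

    __slots__ = ("feature_id", "fdr_score", "rt", "cluster_id", "precursor_id")

    def __init__(self, feature_id, fdr_score, rt, cluster_id, precursor_id):
        self.feature_id = feature_id
        self.fdr_score = fdr_score
        self.rt = rt
        self.cluster_id = cluster_id
        self.precursor_id = precursor_id  # back-reference, kept as an id in this sketch

    def is_selected(self):
        # Cluster 1 is the special cluster used for quantification.
        return self.cluster_id == 1


def iter_peakgroups(precursor_id, stored_tuples, cluster_ids):
    """Yield one view per stored tuple; nothing is kept after iteration ends."""
    for pg, cluster_id in zip(stored_tuples, cluster_ids):
        yield PeakGroupView(pg[0], pg[1], pg[2], cluster_id, precursor_id)


stored = [("f1", 0.20, 100.0), ("f2", 0.01, 105.0)]
clusters = [-1, 1]
selected = [v.feature_id for v in iter_peakgroups("tg_1", stored, clusters) if v.is_selected()]
```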
-
get_cluster_id()
-
get_dscore()
-
print_out()
-
select_this_peakgroup()
-
setClusterID(id_)
-
set_fdr_score(fdr_score)
-
set_feature_id(id_)
-
set_intensity(intensity)
-
set_normalized_retentiontime(normalized_retentiontime)
GuiPeakGroup
-
class msproteomicstoolslib.data_structures.PeakGroup.GuiPeakGroup(fdr_score, intensity, leftWidth, rightWidth, peptide)
Bases: msproteomicstoolslib.data_structures.PeakGroup.PeakGroupBase
A single peakgroup that is defined by a retention time in a chromatogram of multiple transitions.
-
get_value(value)
DataStructures - Basic
Aminoacides Module
Aminoacid
-
class msproteomicstoolslib.data_structures.aminoacides.Aminoacid(name, code, code3, composition)
Class to hold information about a single amino acid (AA).
-
code = None
One-letter code
-
code3 = None
Three-letter code
-
composition = None
Elemental composition
-
elementsLib = None
Library of elements
-
name = None
Full name of the AA
Modifications Module
Modification
Modifications
-
class msproteomicstoolslib.data_structures.modifications.Modifications(default_mod_file=None)
A collection of modifications
-
appendModification(modification)
-
is_bool(expression)
-
printModifications()
-
readModificationsFile(modificationsfile)
Reads a TSV file with additional modifications. The modifications are appended to the default modifications of this class.
TSV file headers and an example row:
modified-AA    TPP-nomenclature    Unimod-Accession    ProteinPilot-nomenclature    is_a_labeling    composition-dictionary
S              S[167]              21                  [Pho]                        False            {'H' : 1,'O' : 3, 'P' : 1}
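To make the file layout concrete, the sketch below parses one such row with the standard library only. It is not the library's reader: parse_modifications and the output key names are hypothetical, and the assumption is that the columns are tab-separated and that the composition column holds a Python dictionary literal, as in the example row.

```python
import ast
import csv
import io

# The example row from the description, written out with real tab separators.
EXAMPLE_TSV = (
    "modified-AA\tTPP-nomenclature\tUnimod-Accession\tProteinPilot-nomenclature\t"
    "is_a_labeling\tcomposition-dictionary\n"
    "S\tS[167]\t21\t[Pho]\tFalse\t{'H' : 1,'O' : 3, 'P' : 1}\n"
)


def parse_modifications(handle):
    """Parse a modifications TSV into a list of plain dictionaries."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "aa": row["modified-AA"],
            "tpp": row["TPP-nomenclature"],
            "unimod": int(row["Unimod-Accession"]),
            "proteinpilot": row["ProteinPilot-nomenclature"],
            "is_labeling": row["is_a_labeling"].strip().lower() == "true",
            # literal_eval only accepts Python literals, so no code is executed.
            "composition": ast.literal_eval(row["composition-dictionary"]),
        })
    return rows


mods = parse_modifications(io.StringIO(EXAMPLE_TSV))
```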
-
translateModificationsFromSequence(sequence, code, aaLib=None)
Returns a Peptide object, given a sequence with modifications in any of the available codes. The code (TPP, Unimod,…) to be translated must be given.
Peak Module
Peptide Module
Peptide
-
class msproteomicstoolslib.data_structures.peptide.Peptide(sequence, modifications={}, protein='', aminoacidLib=None)
-
addSpectrum(spectrum)
Deprecated definition
-
all_ions(ionseries=None, frg_z_list=[1, 2], fragmentlossgains=[0], mass_limits=None, label='')
Returns all the fragment ions of the peptide as a tuple of two objects: (annotated, ionmasses_only).
- annotated is a list of tuples of the form (ion_type, ion_number, ion_charge, lossgain, fragment_mz)
- ionmasses_only is a list of fragment masses
When ionseries is not provided, all existing ion series (see: Peptide.iontypes) will be calculated. When frg_z_list is not provided, fragment ion charge states +1 and +2 will be used.
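The shape of the (annotated, ionmasses_only) return value can be illustrated with a small stand-alone calculation for unmodified peptides. This is only a sketch: all_ions_sketch is a hypothetical function, it covers b and y ions without neutral losses, and it uses standard monoisotopic residue masses rather than the library's Residues tables.

```python
# Standard monoisotopic residue masses (Da) for a few amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
PROTON = 1.007276
WATER = 18.010565


def all_ions_sketch(sequence, ionseries=("b", "y"), frg_z_list=(1, 2)):
    """Return (annotated, ionmasses_only) for b/y ions of an unmodified peptide."""
    annotated = []
    for ion_type in ionseries:
        for ion_number in range(1, len(sequence)):
            if ion_type == "b":
                residues, offset = sequence[:ion_number], 0.0
            elif ion_type == "y":
                residues, offset = sequence[-ion_number:], WATER
            else:
                raise ValueError("only b and y ions are covered in this sketch")
            neutral = sum(MONO[aa] for aa in residues) + offset
            for z in frg_z_list:
                mz = (neutral + z * PROTON) / z
                # Same tuple layout as described: the loss/gain is fixed to 0 here.
                annotated.append((ion_type, ion_number, z, 0, mz))
    ionmasses_only = [entry[4] for entry in annotated]
    return annotated, ionmasses_only


annotated, ionmasses_only = all_ions_sketch("PEPTIDE")
```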
-
calIsoforms(switchingModification, modLibrary)
Returns the full list of peptide species of the same peptide family (isobaric, same composition, different modification site). The list is given as a list of Peptide objects. switchingModification must be given as a Modification object.
-
cal_UIS(otherPeptidesList, UISorder=2, ionseries=None, fragmentlossgains=[0], precision=1e-08, frg_z_list=[1, 2], mass_limits=None)
Calculates the UIS (unique ion signatures) for a given peptide relative to a given list of other peptides. Returns a tuple of two objects: all_UIS and all_UIS_annotated. all_UIS contains only a mass list.
-
comparePeptideFragments(otherPeptidesList, ionseries=None, fragmentlossgains=[0], precision=1e-08, frg_z_list=[1, 2])
Returns a tuple of lists: (CommonFragments, differentialFragments). The differential fragments are those fragments of this peptide (self) that are not shared with any of the peptides in otherPeptidesList. otherPeptidesList must be a list of Peptide objects. The fragments are reported as tuples of the form (ionseries, ion_number, ion_charge, fragmentlossgain, mass).
-
fragmentSequence(ion_type, frg_number)
-
getDeltaMassFromSequence(sequence)
-
getMZ(charge, label='')
-
getMZfragment(ion_type, ion_number, ion_charge, label='', fragmentlossgain=0.0)
-
getSequenceWithMods(code)
-
get_decoy_Q3(frg_serie, frg_nr, frg_z, blackList=[], max_tries=1000)
-
pseudoreverse(sequence='None')
-
shuffle_sequence()
Residues Module
Residues
-
class msproteomicstoolslib.data_structures.Residues.Residues(type='mono')
A class that contains information about elements, amino acids and modifications. It mainly stores the masses of these, but also their chemical formulas.
- The most commonly used properties are:
- Residues.average_elments : average element weights
- Residues.monoisotopic_elments : monoisotopic element weights
- Residues.aa_codes : three- and one-letter amino acid codes
- Residues.aa_names : English names of the amino acids
- Residues.aa_sum_formulas_text : chemical formulas of all amino acids
- Residues.aa_sum_formulas : chemical formulas of all amino acids as a hash
- Residues.mass_xxx : monoisotopic masses of different compounds (NH3, H2O, CO, HPO4 etc.)
- Residues.average_data : average weight of amino acids
- Residues.monoisotopic_data : monoisotopic weight of amino acids
- Residues.monoisotopic_mod : monoisotopic modification data
- Residues.mod_mapping : mapping of + notation to absolute weight notation (K[+8] to K[136])
- Residues.Hydropathy : hydropathy of amino acids (GRAVY scores)
- TODO hydrophobicity of amino acids
- TODO basicity of amino acids
- TODO helicity of amino acids
- Residues.pI : pI of amino acids
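The K[+8] to K[136] example for mod_mapping amounts to adding the mass shift to the residue mass and rounding to a nominal value. The sketch below reproduces that arithmetic with a regular expression; delta_to_absolute is a hypothetical helper and the small mass table is an assumption, whereas the library itself is described as using a stored mapping.

```python
import re

# Monoisotopic residue masses (Da) for the residues used in this sketch.
MONO = {"K": 128.09496, "R": 156.10111, "C": 103.00919, "M": 131.04049}

DELTA_MOD = re.compile(r"([A-Z])\[([+-]\d+)\]")


def delta_to_absolute(sequence):
    """Rewrite X[+d] / X[-d] as X[nominal residue mass + d]."""

    def repl(match):
        aa, delta = match.group(1), int(match.group(2))
        return "%s[%d]" % (aa, round(MONO[aa] + delta))

    return DELTA_MOD.sub(repl, sequence)


converted = delta_to_absolute("PEPTIDEK[+8]")
```

Residues already written in absolute notation, such as K[136], carry no sign inside the brackets and are left unchanged by the pattern.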