Fetching hits from Fuzzle

class database.data.Hit(id: int, query: str, q_scop_id: str, no: int, sbjct: str, s_scop_id: str, s_desc: str, prob: float, eval: float, pval: float, score: float, ss: float, cols: int, q_start: int, q_end: int, s_start: int, s_end: int, hmm: int, ident: float, q_sufam_id: str, s_sufam_id: str, q_fold_id: str, s_fold_id: str, rmsd_pair: float, ca_pair: int, rmsd_tm_pair: float, score_tm_pair: float, ca_tm_pair: int, rmsd_tm: float, score_tm: float, ca_tm: int, q_tm_start: int, q_tm_end: int, s_tm_start: int, s_tm_end: int, q_cluster: str, s_cluster: str)

Some of the documentation of this class was taken from the HH-suite documentation (https://github.com/soedinglab/hh-suite/wiki), because the sequence information for the Fuzzle hits comes from HHsearch. The structural superimpositions were performed with TMalign: https://zhanglab.ccmb.med.umich.edu/TM-align/

property ca_pair: The number of alpha carbon pairs used for the rmsd_pair calculation.

property ca_tm: The number of alpha carbon pairs used for the rmsd_tm calculation.

property ca_tm_pair: The number of alpha carbon pairs used for the rmsd_tm_pair calculation.

property cols: The number of aligned match columns in the HMM-HMM alignment.

property eval: The HHsearch E-value.

property hmm: int

property id: The database ID for this hit.

property ident: Percentage identity of the sequence alignment.

property no: The HHsearch hit number for this query.

property prob: The HHsearch probability.

property pval: The HHsearch p-value.

property q_cluster: The query-side cluster to which this fragment belongs.

property q_end: Residue at which the alignment ends in the query domain (sequence position).

property q_fold_id: The fold the query belongs to.

property q_scop_id: The SCOP family the query belongs to.

property q_start: Residue at which the alignment starts in the query domain (sequence position).

property q_sufam_id: The superfamily the query belongs to.

property q_tm_end: Residue in the query structure at which the rmsd_tm_pair alignment ends.

property q_tm_start: Residue in the query structure at which the rmsd_tm_pair alignment starts.

property query: The 7-character SCOP95 code for the query domain.

property rmsd_pair: RMSD for the alignment between the two domains, computed strictly from the alpha carbons of the residues that appear in the HHsearch sequence alignment.

property rmsd_tm: RMSD for the TMalign alignment between the two domains, computed without a seed alignment.

property rmsd_tm_pair: RMSD for the TMalign alignment between the two domains, with the sequence alignment passed as the seed.

property s_cluster: The subject-side cluster to which this fragment belongs.

property s_desc: Description of the subject domain.

property s_end: Residue at which the alignment ends in the subject domain (sequence position).

property s_fold_id: The fold the subject belongs to.

property s_scop_id: The SCOP family the subject belongs to.

property s_start: Residue at which the alignment starts in the subject domain (sequence position).

property s_sufam_id: The superfamily the subject belongs to.

property s_tm_end: Residue in the subject structure at which the rmsd_tm_pair alignment ends.

property s_tm_start: Residue in the subject structure at which the rmsd_tm_pair alignment starts.

property sbjct: The 7-character SCOP95 code for the subject domain.

property score: The raw score computed by the Viterbi HMM-HMM alignment, excluding the secondary structure score. It is the sum of the similarities of the aligned profile columns minus the position-specific gap penalties, in bits.

property score_tm: TM-score for the rmsd_tm superposition.

property score_tm_pair: TM-score for the rmsd_tm_pair superposition.

property ss: The secondary structure score. This score tells you how well the PSIPRED-predicted (3-state) or actual DSSP-determined (8-state) secondary structure sequences agree with each other.
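The RMSD and TM-score properties above describe structural superpositions. As a rough guide to what these numbers measure, the sketch below implements the textbook definitions for one fixed superposition: RMSD is the square root of the mean squared distance between paired alpha carbons, and the TM-score sums 1 / (1 + (d_i / d0)^2) over the aligned pairs, divides by a normalization length L, and uses d0 = 1.24 (L - 15)^(1/3) - 1.8. It assumes the coordinates are already paired and superimposed; TMalign itself also searches for the optimal superposition, which this sketch does not do. The function names are hypothetical, the floor of 0.5 on d0 for short chains is an assumption, and which length was used to normalize score_tm and score_tm_pair is not stated in this documentation.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD of paired CA coordinates that are already superimposed (no fitting here)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared Euclidean distances over the paired atoms
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


def tm_d0(norm_len: int) -> float:
    """TM-score distance scale d0; the 0.5 floor for short chains is an assumption."""
    if norm_len > 21:
        return max(1.24 * (norm_len - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    return 0.5


def tm_score(distances: Sequence[float], norm_len: int) -> float:
    """TM-score of one fixed superposition, given CA-CA distances of the aligned pairs."""
    d0 = tm_d0(norm_len)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / norm_len


# Two CA pairs, each displaced by 1 Å along x
print(ca_rmsd([(0, 0, 0), (1, 0, 0)], [(1, 0, 0), (2, 0, 0)]))  # 1.0
# Perfect superposition of half of a 100-residue domain
print(tm_score([0.0] * 50, 100))  # 0.5
```

Because the TM-score is divided by the normalization length, a short fragment that superimposes perfectly onto a much longer domain still scores low, as the second example shows.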

class database.data.Result(ahits: List[database.data.Hit])

Container for the hits obtained from Fuzzle.

property avg_len: The average length, in amino acids, of the returned hits.

property ids: The IDs of all the hits.

property list_fams: The list of unique families in the hits list.

property list_folds: The list of unique folds in the hits list.

property list_sufams: The list of unique superfamilies in the hits list.

property std_len: The standard deviation of the length, in amino acids, of the hits in the list.

property unique_clusters: A list of the unique clusters in the hits.

property unique_domains: A list of the unique domains in the hits.
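The Result properties are aggregates over the hits. A minimal sketch of two of them, assuming that both the query and the subject of every hit are counted and that the order of the returned list does not matter (the sketch sorts it). `MiniHit`, the helper functions, and the identifiers in the example are made up for illustration and do not come from Fuzzle.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MiniHit:
    # Hypothetical stand-in holding a few of the Hit fields
    query: str
    sbjct: str
    q_fold_id: str
    s_fold_id: str


def unique_domains(hits: Iterable[MiniHit]) -> List[str]:
    # Collect both parents of every hit; sort for a stable order
    return sorted({d for h in hits for d in (h.query, h.sbjct)})


def list_folds(hits: Iterable[MiniHit]) -> List[str]:
    # Same idea for the folds on either side of each hit
    return sorted({f for h in hits for f in (h.q_fold_id, h.s_fold_id)})


hits = [
    MiniHit("d1aaaa1", "d2bbba_", "c.2", "c.23"),
    MiniHit("d1aaaa1", "d3ccc_1", "c.2", "d.58"),
]
print(unique_domains(hits))  # ['d1aaaa1', 'd2bbba_', 'd3ccc_1']
print(list_folds(hits))      # ['c.2', 'c.23', 'd.58']
```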

database.data.fetch_byPDB(pdb: str, prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, diff_folds: bool = True)

Returns the entries in Fuzzle that contain the representative domains corresponding to the given PDB entry.

Parameters

pdb – The PDB entry whose representative domains are searched
prob – Lower cutoff for the HHsearch probability
rmsd – Upper cutoff for rmsd_tm_pair
ca_min – Lower cutoff for the number of aligned residues (ca_tm_pair)
ca_max – Upper cutoff for the number of aligned residues (ca_tm_pair)
score_tm_pair – Lower cutoff for score_tm_pair
ratio – Upper cutoff for the ratio cols / ca_tm_pair
diff_folds – Whether to exclude hits from the same fold (True) or not (False)

Returns

A Result object
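Each SCOP domain identifier embeds the 4-character PDB ID of its file of origin in characters 2 to 5 (see validate_scopid). A minimal sketch of matching hits to a PDB entry on that basis; the helper names are hypothetical, and whether fetch_byPDB matches the query, the subject, or both is an assumption (here, either). The pairs are made up from the example identifiers given for validate_scopid and are not real Fuzzle hits.

```python
from typing import Iterable, List, Tuple


def pdb_of(sid: str) -> str:
    # "d" + 4-character PDB ID + chain + domain character
    return sid[1:5]


def hits_touching_pdb(
    pairs: Iterable[Tuple[str, str]], pdb: str
) -> List[Tuple[str, str]]:
    """Keep (query, sbjct) pairs where either domain comes from `pdb`."""
    pdb = pdb.lower()  # sids are always lower case
    return [(q, s) for q, s in pairs if pdb in (pdb_of(q), pdb_of(s))]


pairs = [("d4akea1", "d1reqa2"), ("d1cph.1", "d1reqa2"), ("d1cph.1", "d4akea1")]
print(hits_touching_pdb(pairs, "4AKE"))
# [('d4akea1', 'd1reqa2'), ('d1cph.1', 'd4akea1')]
```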

database.data.fetch_byPDBs(pdb1: str, pdb2: str, prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, diff_folds: bool = True)

Fetch all hits between the domains that belong to a pair of PDB entries.

Parameters

pdb1 – The first PDB entry to check
pdb2 – The second PDB entry to check
prob – The minimum allowed HHsearch probability
rmsd – The maximum allowed RMSD (rmsd_tm_pair: “RMSD for the TMalign alignment between the two domains, passing the sequence alignment as seed”)
ca_min – The minimum allowed fragment length (for the TMalign alignment)
ca_max – The maximum allowed fragment length (for the TMalign alignment)
score_tm_pair – The minimum allowed TM-score (for the TMalign alignment)
ratio – The maximum ratio between the sequence and structural alignment lengths (cols / ca_tm_pair)
diff_folds – Whether to exclude hits from the same fold (True) or not (False)

Returns

A Result object containing the hits that fulfill these criteria

database.data.fetch_by_domain(domain: str, prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, diff_folds: bool = True)

Fetch all the hits that contain a specific domain.

Parameters

domain – The 7-character SCOP code of the domain; every returned hit has it as one of its parents
prob – The minimum allowed HHsearch probability
rmsd – The maximum allowed RMSD (rmsd_tm_pair: “RMSD for the TMalign alignment between the two domains, passing the sequence alignment as seed”)
ca_min – The minimum allowed fragment length (for the TMalign alignment)
ca_max – The maximum allowed fragment length (for the TMalign alignment)
score_tm_pair – The minimum allowed TM-score (for the TMalign alignment)
ratio – The maximum ratio between the sequence and structural alignment lengths (cols / ca_tm_pair)
diff_folds – Whether to exclude hits from the same fold (True) or not (False)

Returns

A Result object containing the hits that fulfill these criteria

database.data.fetch_by_domains(domain1: str, domain2: str, prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, diff_folds: bool = True)

Fetch all the hits between two parent domains.

Parameters

domain1 – The 7-character SCOP code for one of the parents
domain2 – The 7-character SCOP code for the other parent
prob – The minimum allowed HHsearch probability
rmsd – The maximum allowed RMSD (rmsd_tm_pair: “RMSD for the TMalign alignment between the two domains, passing the sequence alignment as seed”)
ca_min – The minimum allowed fragment length (for the TMalign alignment)
ca_max – The maximum allowed fragment length (for the TMalign alignment)
score_tm_pair – The minimum allowed TM-score (for the TMalign alignment)
ratio – The maximum ratio between the sequence and structural alignment lengths (cols / ca_tm_pair)
diff_folds – Whether to exclude hits from the same fold (True) or not (False)

Returns

A Result object containing the hits that fulfill these criteria
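A hit links a query domain to a subject domain. A minimal sketch of selecting the hits between two parents regardless of which one is the query; treating the pair as unordered is an assumption (this documentation does not say), `hits_between` is a hypothetical helper, and the pairs are illustrative rather than real Fuzzle hits.

```python
from typing import Iterable, List, Tuple


def hits_between(
    pairs: Iterable[Tuple[str, str]], domain1: str, domain2: str
) -> List[Tuple[str, str]]:
    """Keep (query, sbjct) pairs that link domain1 and domain2 in either orientation."""
    wanted = {domain1, domain2}
    # Comparing sets ignores which parent is the query and which the subject
    return [(q, s) for q, s in pairs if {q, s} == wanted]


pairs = [("d4akea1", "d1reqa2"), ("d1reqa2", "d4akea1"), ("d4akea1", "d1cph.1")]
print(hits_between(pairs, "d4akea1", "d1reqa2"))
# [('d4akea1', 'd1reqa2'), ('d1reqa2', 'd4akea1')]
```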

database.data.fetch_group(group1, group2=None, prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, diff_folds: bool = True) → database.data.Result

Fetch all hits between two specific groups (folds, superfamilies, or families), or within a single group (group1) when group2 is not given.

Parameters

group1 – The first group to search, e.g. ‘c.2’
group2 – (optional) The second group to search, e.g. ‘c.2’
prob – The minimum allowed HHsearch probability
rmsd – The maximum allowed RMSD (rmsd_tm_pair: “RMSD for the TMalign alignment between the two domains, passing the sequence alignment as seed”)
ca_min – The minimum allowed fragment length (for the TMalign alignment)
ca_max – The maximum allowed fragment length (for the TMalign alignment)
score_tm_pair – The minimum allowed TM-score (for the TMalign alignment)
ratio – The maximum ratio between the sequence and structural alignment lengths (cols / ca_tm_pair)
diff_folds – Whether to exclude hits from the same fold (True) or not (False)

Returns

A Result object containing the hits that fulfill these criteria
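Groups are given as SCOP classification strings such as ‘c.2’ (a fold); superfamilies and families add further dot-separated levels (for example a family string such as the q_scop_id / s_scop_id values). A minimal sketch of testing a classification against one or two groups, assuming prefix matching on whole dot-separated levels and assuming that, without group2, both parents must lie inside group1; neither detail is confirmed by this documentation, and `in_group` and `hit_in_groups` are hypothetical names. Matching whole levels keeps ‘c.23’ from being counted as part of ‘c.2’.

```python
from typing import Optional


def in_group(sccs: str, group: str) -> bool:
    """True if the SCOP classification `sccs` lies within `group`.

    Compares whole dot-separated levels, so 'c.23.1' is not inside 'c.2'.
    """
    levels = sccs.split(".")
    wanted = group.split(".")
    return levels[: len(wanted)] == wanted


def hit_in_groups(q_sccs: str, s_sccs: str, group1: str, group2: Optional[str] = None) -> bool:
    if group2 is None:
        # Assumed: both parents must lie inside group1
        return in_group(q_sccs, group1) and in_group(s_sccs, group1)
    # Either orientation counts when two groups are given
    return (in_group(q_sccs, group1) and in_group(s_sccs, group2)) or (
        in_group(q_sccs, group2) and in_group(s_sccs, group1)
    )


print(in_group("c.2.1.1", "c.2"))  # True
print(in_group("c.23.1", "c.2"))   # False
```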

database.data.fetch_id(fuzzle_id: int) → database.data.Hit

Returns the hit in Fuzzle with the given ID.

Parameters

fuzzle_id – The Fuzzle hit ID to retrieve from hh207clusters

Returns

A Hit object

database.data.fetch_subspace(prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, scop_q: Optional[str] = None, diff_folds: bool = True) → database.data.Result

Returns the entries in Fuzzle that satisfy the following conditions.

Parameters

prob – Lower cutoff for the HHsearch probability
rmsd – Upper cutoff for rmsd_tm_pair
ca_min – Lower cutoff for the number of aligned residues (ca_tm_pair)
ca_max – Upper cutoff for the number of aligned residues (ca_tm_pair)
score_tm_pair – Lower cutoff for score_tm_pair
ratio – Upper cutoff for the ratio cols / ca_tm_pair
scop_q – A SCOP class. Only hits that contain domains from this class are retrieved
diff_folds – Whether to exclude hits from the same fold (True) or not (False)

Returns

A Result object
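All the fetch functions share the same cutoffs. A minimal pure-Python sketch of the filter they describe, applied to one hit: probability and TM-score bounded from below, rmsd_tm_pair and the cols / ca_tm_pair ratio bounded from above, ca_tm_pair kept within [ca_min, ca_max], and, with diff_folds, query and subject required to come from different folds. Treating every bound as inclusive and reading diff_folds as a comparison of q_fold_id with s_fold_id are assumptions; the scop_q filter is left out. `Cutoffs` and `passes` are hypothetical names, the hit is a plain dict, and its values are made up.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Cutoffs:
    # Defaults copied from the fetch_* signatures
    prob: float = 70
    rmsd: float = 3.0
    ca_min: int = 10
    ca_max: int = 200
    score_tm_pair: float = 0.3
    ratio: float = 1.25
    diff_folds: bool = True


def passes(hit: Mapping[str, Any], c: Cutoffs = Cutoffs()) -> bool:
    """Apply the documented cutoffs to one hit (all bounds assumed inclusive)."""
    if hit["prob"] < c.prob or hit["score_tm_pair"] < c.score_tm_pair:
        return False
    if hit["rmsd_tm_pair"] > c.rmsd:
        return False
    if not (c.ca_min <= hit["ca_tm_pair"] <= c.ca_max):
        return False
    # Sequence (cols) vs structural (ca_tm_pair) alignment length;
    # ca_tm_pair >= ca_min here, so the division is safe whenever ca_min >= 1
    if hit["cols"] / hit["ca_tm_pair"] > c.ratio:
        return False
    if c.diff_folds and hit["q_fold_id"] == hit["s_fold_id"]:
        return False
    return True


hit = {"prob": 95.0, "score_tm_pair": 0.55, "rmsd_tm_pair": 1.8,
       "ca_tm_pair": 40, "cols": 44, "q_fold_id": "c.2", "s_fold_id": "c.23"}
print(passes(hit))                                                # True
print(passes({**hit, "s_fold_id": "c.2"}))                        # False
print(passes({**hit, "s_fold_id": "c.2"}, Cutoffs(diff_folds=False)))  # True
```

Keeping the cutoffs in one frozen object mirrors the fact that every fetch function repeats the same keyword arguments with the same defaults.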

database.data.filter_hits_domain(ahits, domain)

Searches a Result object for all the hits in which a given domain appears.

Parameters

ahits – A Result object
domain – A SCOPe domain identifier

Returns

np.array: the start and end positions of the domain in every hit in which it appears.
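A minimal sketch of the same idea over plain records: for each hit, take q_start/q_end when the domain is the query and s_start/s_end when it is the subject, and stack the pairs into an (n, 2) array. The column layout and the use of the sequence positions (rather than the q_tm_*/s_tm_* positions) are assumptions; `domain_ranges` is a hypothetical name and the records are made up.

```python
from typing import Any, Iterable, Mapping

import numpy as np


def domain_ranges(hits: Iterable[Mapping[str, Any]], domain: str) -> np.ndarray:
    """Return an (n, 2) array of (start, end) for `domain` in each hit where it appears."""
    rows = []
    for h in hits:
        if h["query"] == domain:
            rows.append((h["q_start"], h["q_end"]))
        if h["sbjct"] == domain:
            rows.append((h["s_start"], h["s_end"]))
    # reshape keeps a (0, 2) shape when nothing matches
    return np.array(rows, dtype=int).reshape(-1, 2)


hits = [
    {"query": "d4akea1", "q_start": 5, "q_end": 60,
     "sbjct": "d1reqa2", "s_start": 12, "s_end": 70},
    {"query": "d1cph.1", "q_start": 1, "q_end": 30,
     "sbjct": "d4akea1", "s_start": 80, "s_end": 110},
]
print(domain_ranges(hits, "d4akea1"))
```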

database.data.parse_hit(line: List[str]) → database.data.Hit

Parameters

line – The fields of one hit, as a list of strings

Returns

A Hit object
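The signature indicates that parse_hit turns a row of string fields into a typed Hit. A minimal sketch of that conversion, using a hypothetical dataclass that holds only the first eight Hit fields and casting each string with the field's annotated type. The field subset, the assumption that the row follows the order of the Hit signature, and the name `HitHead` are all illustrative; the row values are invented.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class HitHead:
    # Hypothetical: the first eight fields of database.data.Hit, in signature order
    id: int
    query: str
    q_scop_id: str
    no: int
    sbjct: str
    s_scop_id: str
    s_desc: str
    prob: float


def parse_row(line: List[str]) -> HitHead:
    """Cast each string with the type annotated on the matching field."""
    flds = fields(HitHead)
    if len(line) != len(flds):
        raise ValueError(f"expected {len(flds)} fields, got {len(line)}")
    return HitHead(*(f.type(value) for f, value in zip(flds, line)))


hit = parse_row(["42", "d4akea1", "a.1.1.1", "3", "d1reqa2", "b.1.1.1",
                 "example description", "97.5"])
print(hit.id + 1, hit.prob)  # 43 97.5
```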

database.data.validate_scopid(query: str) → bool

Checks whether query is a valid SCOP domain identifier. A SCOP domain identifier is a 7-character sid that consists of “d” followed by the 4-character PDB ID of the file of origin, the PDB chain ID (‘_’ if none, ‘.’ if multiple, as is the case in genetic domains), and a single character (usually an integer) if needed to specify the domain uniquely (‘_’ if not). Sids are currently all lower case, even when the chain letter is upper case. Examples include d4akea1, d1reqa2, and d1cph.1.

Parameters

query – The 7-character domain identifier for the query

Returns

True if query is a valid SCOP domain identifier, otherwise False
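A minimal regular-expression sketch of the format described above. `looks_like_scop_sid` is a hypothetical name, not the library's implementation, and it assumes the PDB ID is a digit followed by three lower-case alphanumerics and that the chain and domain characters are lower-case alphanumerics or the documented ‘_’ and ‘.’ placeholders.

```python
import re

# "d" + PDB ID (digit + 3 alphanumerics) + chain ('_' none, '.' multiple)
# + domain character ('_' if the domain is unique)
_SID = re.compile(r"d[0-9][a-z0-9]{3}[a-z0-9_.][a-z0-9_]")


def looks_like_scop_sid(query: str) -> bool:
    # fullmatch also enforces the 7-character length
    return _SID.fullmatch(query) is not None


for sid in ("d4akea1", "d1reqa2", "d1cph.1", "D4akea1", "d4akea"):
    print(sid, looks_like_scop_sid(sid))
```

No lower-casing is applied before matching, since sids are described as always lower case; an upper-case chain letter is therefore rejected.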