Fetching hits from Fuzzle
-
class
database.data.
Hit
(id: int, query: str, q_scop_id: str, no: int, sbjct: str, s_scop_id: str, s_desc: str, prob: float, eval: float, pval: float, score: float, ss: float, cols: int, q_start: int, q_end: int, s_start: int, s_end: int, hmm: int, ident: float, q_sufam_id: str, s_sufam_id: str, q_fold_id: str, s_fold_id: str, rmsd_pair: float, ca_pair: int, rmsd_tm_pair: float, score_tm_pair: float, ca_tm_pair: int, rmsd_tm: float, score_tm: float, ca_tm: int, q_tm_start: int, q_tm_end: int, s_tm_start: int, s_tm_end: int, q_cluster: str, s_cluster: str)[source]¶ Some of the documentation of this class was taken from the HH-suite documentation (https://github.com/soedinglab/hh-suite/wiki), because the sequence information for the Fuzzle hits comes from HHsearch. The structural superpositions were performed with TMalign (https://zhanglab.ccmb.med.umich.edu/TM-align/).
-
property
ca_pair
¶ The number of alpha carbon pairs that were used for the rmsd_pair calculation.
-
property
ca_tm
¶ The number of alpha carbon pairs that were used for the rmsd_tm calculation
-
property
ca_tm_pair
¶ The number of alpha carbon pairs that were used for the rmsd_tm_pair calculation
-
property
cols
¶ The number of aligned Match columns in the HMM-HMM alignment.
-
property
eval
¶ E-value
-
property
hmm
¶ int
-
property
id
¶ The database id for this hit
-
property
ident
¶ Identity % for the sequence alignment
-
property
no
¶ The HHsearch hit number for this query
-
property
prob
¶ HHsearch probability
-
property
pval
¶ p-value
-
property
q_cluster
¶ The cluster that the query fragment belongs to
-
property
q_end
¶ Residue where the alignment ends for the query domain (sequence position)
-
property
q_fold_id
¶ The fold the query belongs to
-
property
q_scop_id
¶ The SCOP family the query belongs to
-
property
q_start
¶ Residue where the alignment starts for the query domain (sequence position)
-
property
q_sufam_id
¶ The superfamily the query belongs to
-
property
q_tm_end
¶ Residue in the query structure where the rmsd_tm_pair alignment ends
-
property
q_tm_start
¶ Residue in the query structure where the rmsd_tm_pair alignment starts
-
property
query
¶ The 7-letter SCOP95 code for the query domain
-
property
rmsd_pair
¶ RMSD for the alignment between the two domains, strictly taking the alpha carbons from the structures that exactly appear in the HHsearch sequence alignment
-
property
rmsd_tm
¶ RMSD for the TMalign alignment between the two domains without seed
-
property
rmsd_tm_pair
¶ RMSD for the TMalign alignment between the two domains, passing the sequence alignment as seed
-
property
s_cluster
¶ The cluster that the subject fragment belongs to
-
property
s_desc
¶ Description of the subject domain.
-
property
s_end
¶ Residue where the alignment ends for the subject domain (sequence position)
-
property
s_fold_id
¶ The fold the subject belongs to
-
property
s_scop_id
¶ The SCOP family the subject belongs to
-
property
s_start
¶ Residue where the alignment starts for the subject domain (sequence position)
-
property
s_sufam_id
¶ The superfamily the subject belongs to
-
property
s_tm_end
¶ Residue in the subject structure where the rmsd_tm_pair alignment ends
-
property
s_tm_start
¶ Residue in the subject structure where the rmsd_tm_pair alignment starts
-
property
sbjct
¶ A 7-letter SCOP95 code for the subject domain
-
property
score
¶ The raw score is computed by the Viterbi HMM-HMM alignment excluding the secondary structure score. It is the sum of similarities of the aligned profile columns minus the position-specific gap penalties in bits.
-
property
score_tm
¶ TM-score for the rmsd_tm superposition
-
property
score_tm_pair
¶ TM-score for the rmsd_tm_pair superposition
-
property
ss
¶ The secondary structure score. This score tells you how well the PSIPRED-predicted (3-state) or actual DSSP-determined (8-state) secondary structure sequences agree with each other.
-
class
database.data.
Result
(ahits: List[database.data.Hit])[source]¶ Class handling the data obtained from Fuzzle
-
property
avg_len
¶ It returns the average length, in amino acids, of the returned hits
-
property
ids
¶ It returns the IDs of all the hits
-
property
list_fams
¶ It returns the list of unique families in the hits list
-
property
list_folds
¶ It returns the list of unique folds in the hits list
-
property
list_sufams
¶ It returns the list of unique superfamilies in the hits list
-
property
std_len
¶ It returns the standard deviation of the length, in amino acids, over the hits list
-
property
unique_clusters
¶ It returns a list of unique clusters
-
property
unique_domains
¶ It returns a list of unique domains
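The summary properties of Result can be pictured with a small stand-alone sketch. It does not use the Fuzzle package: `MiniHit` and `summarize` are hypothetical names, the records are toy values rather than actual Fuzzle entries, and two assumptions are made that the documentation does not confirm, namely that a hit's length is its `ca_tm_pair` value and that the standard deviation is the population one.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class MiniHit:
    """Hypothetical stand-in carrying only the Hit fields used below."""
    id: int
    query: str
    sbjct: str
    q_fold_id: str
    s_fold_id: str
    ca_tm_pair: int


def summarize(hits: List[MiniHit]) -> dict:
    """Compute Result-like summaries over a list of hits."""
    # Assumption: the length of a hit is its ca_tm_pair value.
    lengths = [h.ca_tm_pair for h in hits]
    return {
        "ids": [h.id for h in hits],
        "avg_len": mean(lengths),
        # Assumption: population (not sample) standard deviation.
        "std_len": pstdev(lengths),
        # Folds and domains are collected from both the query and subject side.
        "list_folds": sorted({f for h in hits for f in (h.q_fold_id, h.s_fold_id)}),
        "unique_domains": sorted({d for h in hits for d in (h.query, h.sbjct)}),
    }


# Toy records for illustration only; these are not actual Fuzzle entries.
hits = [
    MiniHit(1, "d0aaaa_", "d0bbbb_", "c.2", "c.37", 40),
    MiniHit(2, "d0aaaa_", "d0cccc_", "c.2", "c.23", 60),
]
summary = summarize(hits)
```

With the real package, the same quantities would presumably be read directly from the properties of the Result object returned by one of the fetch functions.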
-
database.data.
fetch_byPDB
(pdb: str, prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, diff_folds: bool = True)[source]¶ Returns the entries in Fuzzle that contain the representative domains that correspond to that PDB
- Parameters
pdb – The PDB code whose representative domains are searched for
prob – Lower cutoff for the HHsearch hit probability
rmsd – Upper cutoff for rmsd_tm_pair
ca_min – Lower cutoff for the number of aligned residues (ca_tm_pair)
ca_max – Upper cutoff for the number of aligned residues (ca_tm_pair)
score_tm_pair – Lower cutoff for the TM-score (score_tm_pair)
ratio – Upper cutoff for the ratio cols / ca_tm_pair
diff_folds – Whether to exclude hits from the same fold (True) or not (False)
- Returns
A Result object
-
database.data.
fetch_byPDBs
(pdb1: str, pdb2: str, prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, diff_folds: bool = True)[source]¶ Includes all hits among the domains that belong to a pair of PDBs
- Parameters
pdb1 – The first PDB to check
pdb2 – The second PDB to check
prob – the minimum allowed HHsearch probability
rmsd – The maximum allowed RMSD (rmsd_tm_pair: RMSD for the TMalign alignment between the two domains, passing the sequence alignment as seed)
ca_min – The minimum allowed fragment length (for the TMalign alignment)
ca_max – The maximum allowed fragment length (for the TMalign alignment)
score_tm_pair – The minimum allowed TM-score (for the TMalign alignment)
ratio – the maximum ratio for the sequence and structural alignment lengths (cols / ca_tm_pair)
diff_folds – Whether to exclude hits from the same fold (True) or not (False)
- Returns
A Result object containing the hits that fulfill these criteria
-
database.data.
fetch_by_domain
(domain: str, prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, diff_folds: bool = True)[source]¶ Fetch all the hits that contain a specific domain
- Parameters
domain – The 7 letter code for one of the parents
prob – the minimum allowed HHsearch probability
rmsd – The maximum allowed RMSD (rmsd_tm_pair: RMSD for the TMalign alignment between the two domains, passing the sequence alignment as seed)
ca_min – The minimum allowed fragment length (for the TMalign alignment)
ca_max – The maximum allowed fragment length (for the TMalign alignment)
score_tm_pair – The minimum allowed TM-score (for the TMalign alignment)
ratio – the maximum ratio for the sequence and structural alignment lengths (cols / ca_tm_pair)
diff_folds – Whether to exclude hits from the same fold (True) or not (False)
- Returns
A Result object containing the hits that fulfill these criteria
-
database.data.
fetch_by_domains
(domain1: str, domain2: str, prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, diff_folds: bool = True)[source]¶ Fetch all the hits between two parent domains
- Parameters
domain1 – The 7 letter code for one of the parents
domain2 – The 7 letter code for one of the parents
prob – the minimum allowed HHsearch probability
rmsd – The maximum allowed RMSD (rmsd_tm_pair: RMSD for the TMalign alignment between the two domains, passing the sequence alignment as seed)
ca_min – The minimum allowed fragment length (for the TMalign alignment)
ca_max – The maximum allowed fragment length (for the TMalign alignment)
score_tm_pair – The minimum allowed TM-score (for the TMalign alignment)
ratio – the maximum ratio for the sequence and structural alignment lengths (cols / ca_tm_pair)
diff_folds – Whether to exclude hits from the same fold (True) or not (False)
- Returns
A Result object containing the hits that fulfill these criteria
-
database.data.
fetch_group
(group1, group2=None, prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, diff_folds: bool = True) → database.data.Result[source]¶ Fetching all hits between two specific groups (folds, superfamilies and families) or inside one specific group (group1)
- Parameters
group1 – The first group from where to search. E.g ‘c.2’
group2 (optional) – The second group from where to search. E.g. ‘c.2’
prob – the minimum allowed HHsearch probability
rmsd – The maximum allowed RMSD (rmsd_tm_pair: RMSD for the TMalign alignment between the two domains, passing the sequence alignment as seed)
ca_min – The minimum allowed fragment length (for the TMalign alignment)
ca_max – The maximum allowed fragment length (for the TMalign alignment)
score_tm_pair – The minimum allowed TM-score (for the TMalign alignment)
ratio – the maximum ratio for the sequence and structural alignment lengths (cols / ca_tm_pair)
diff_folds – Whether to exclude hits from the same fold (True) or not (False)
- Returns
A Result class with the hits that fulfill these criteria
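Group identifiers such as ‘c.2’ follow the dotted SCOP classification string, in which each additional field descends one level (class, fold, superfamily, family). The sketch below only illustrates how the level can be read off such a string; `scop_level` is a hypothetical helper, not part of database.data, and how fetch_group itself interprets its arguments is not documented here.

```python
def scop_level(group: str) -> str:
    """Name the SCOP hierarchy level of a dotted identifier such as 'c.2'."""
    levels = {1: "class", 2: "fold", 3: "superfamily", 4: "family"}
    parts = group.split(".")
    # Reject identifiers with too many fields or with an empty field ('c..1').
    if len(parts) not in levels or not all(parts):
        raise ValueError(f"not a SCOP class/fold/superfamily/family id: {group!r}")
    return levels[len(parts)]


fold_level = scop_level("c.2")
superfamily_level = scop_level("c.2.1")
family_level = scop_level("c.2.1.1")
```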
-
database.data.
fetch_id
(fuzzle_id: int) → database.data.Hit[source]¶ Returns the hit in Fuzzle with that ID
- Parameters
fuzzle_id – The Fuzzle hit ID to retrieve from hh207clusters
- Returns
A Hit object
-
database.data.
fetch_subspace
(prob: int = 70, rmsd: float = 3.0, ca_min: int = 10, ca_max: int = 200, score_tm_pair: float = 0.3, ratio: float = 1.25, scop_q: Optional[str] = None, diff_folds: bool = True) → database.data.Result[source]¶ Returns the entries in Fuzzle that satisfy the conditions:
- Parameters
prob – Lower cutoff for the HHsearch hit probability
rmsd – Upper cutoff for rmsd_tm_pair
ca_min – Lower cutoff for the number of aligned residues (ca_tm_pair)
ca_max – Upper cutoff for the number of aligned residues (ca_tm_pair)
score_tm_pair – Lower cutoff for the TM-score (score_tm_pair)
ratio – Upper cutoff for the ratio cols / ca_tm_pair
scop_q – A SCOP class. Only hits that contain domains from this class are retrieved
diff_folds – Whether to exclude hits from the same fold (True) or not (False)
- Returns
A Result object
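The cutoffs shared by the fetch functions can be read as a single predicate over one hit. The sketch below is one possible reading using the documented defaults, not the library's implementation: `MiniHit` and `passes_cutoffs` are hypothetical names, all bounds are assumed to be inclusive, and the actual selection may well happen inside a database query instead.

```python
from dataclasses import dataclass


@dataclass
class MiniHit:
    """Hypothetical stand-in carrying only the Hit fields the cutoffs use."""
    prob: float
    rmsd_tm_pair: float
    ca_tm_pair: int
    score_tm_pair: float
    cols: int
    q_fold_id: str
    s_fold_id: str


def passes_cutoffs(hit: MiniHit, prob: int = 70, rmsd: float = 3.0,
                   ca_min: int = 10, ca_max: int = 200,
                   score_tm_pair: float = 0.3, ratio: float = 1.25,
                   diff_folds: bool = True) -> bool:
    """Return True if the hit satisfies every cutoff (assumed inclusive)."""
    # diff_folds=True excludes hits whose query and subject share a fold.
    if diff_folds and hit.q_fold_id == hit.s_fold_id:
        return False
    return (
        hit.prob >= prob
        and hit.rmsd_tm_pair <= rmsd
        and ca_min <= hit.ca_tm_pair <= ca_max
        and hit.score_tm_pair >= score_tm_pair
        # cols / ca_tm_pair <= ratio, written without a division.
        and hit.cols <= ratio * hit.ca_tm_pair
    )


# Toy values for illustration only.
good = MiniHit(95.0, 2.1, 50, 0.45, 55, "c.2", "c.37")
same_fold = MiniHit(95.0, 2.1, 50, 0.45, 55, "c.2", "c.2")
too_gappy = MiniHit(95.0, 2.1, 50, 0.45, 70, "c.2", "c.37")
```

Writing the ratio test as a multiplication avoids a division by zero should a record ever carry `ca_tm_pair == 0`.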
-
database.data.
filter_hits_domain
(ahits, domain)[source]¶ Search all hits from a Result class where a certain domain appears
- Parameters
ahits – A Result object
domain – A SCOPe domain identifier
- Returns
np.array. The start and end positions of the domain in every hit where it appears.
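As a rough illustration of what such a lookup involves, the sketch below collects the aligned start and end positions of one domain across a list of hits. It is an assumption-laden stand-in rather than the library's code: `MiniHit` and `domain_ranges` are hypothetical, the sequence positions (`q_start`/`q_end`, `s_start`/`s_end`) are assumed to be the ones reported, and a plain list of tuples is returned instead of an np.array to keep the example dependency-free.

```python
from typing import List, NamedTuple, Tuple


class MiniHit(NamedTuple):
    """Hypothetical stand-in carrying only the Hit fields used below."""
    query: str
    sbjct: str
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def domain_ranges(hits: List[MiniHit], domain: str) -> List[Tuple[int, int]]:
    """Collect (start, end) of `domain` in every hit where it appears."""
    ranges = []
    for hit in hits:
        # The domain may be on the query side, the subject side, or both.
        if hit.query == domain:
            ranges.append((hit.q_start, hit.q_end))
        if hit.sbjct == domain:
            ranges.append((hit.s_start, hit.s_end))
    return ranges


# Toy records for illustration only; these are not actual Fuzzle entries.
hits = [
    MiniHit("d0aaaa_", "d0bbbb_", 5, 60, 12, 70),
    MiniHit("d0cccc_", "d0aaaa_", 1, 40, 30, 72),
    MiniHit("d0cccc_", "d0bbbb_", 3, 33, 8, 41),
]
ranges = domain_ranges(hits, "d0aaaa_")
```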
-
database.data.
parse_hit
(line: List[str]) → database.data.Hit[source]¶ - Parameters
line –
- Returns
-
database.data.
validate_scopid
(query: str) → bool[source]¶ A SCOP domain identifier (sid) is a 7-character string that consists of “d” followed by the 4-character PDB ID of the file of origin, the PDB chain ID (‘_’ if none, ‘.’ if multiple, as is the case in genetic domains), and a single character (usually an integer) if needed to specify the domain uniquely (‘_’ if not). Sids are currently all lower case, even when the chain letter is upper case. Examples include d4akea1, d1reqa2, and d1cph.1.
- Parameters
query – The seven-letter domain identifier for the query
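The sid format described above maps naturally onto a regular expression. The following is a minimal sketch of such a syntactic check, not the actual body of validate_scopid, which is not shown in this documentation: `looks_like_scop_sid` is a hypothetical name, and the pattern tests only the shape of the string, not whether the domain exists in SCOP or Fuzzle.

```python
import re

# "d" + 4-character PDB ID + chain ID ('_' if none, '.' if multiple)
# + one domain character ('_' if not needed); all lower case.
_SID_PATTERN = re.compile(r"d[0-9a-z]{4}[0-9a-z_.][0-9a-z_]")


def looks_like_scop_sid(sid: str) -> bool:
    """Return True if `sid` has the 7-character shape of a SCOP sid."""
    return _SID_PATTERN.fullmatch(sid) is not None


# The three examples given in the description above.
examples_ok = all(looks_like_scop_sid(s) for s in ("d4akea1", "d1reqa2", "d1cph.1"))
```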