Description of feature

Currently the only way to get genomic positions is by reading a GTF file. This is (a) slow and (b) gtfparse repeatedly causes problems.
It would be more convenient to retrieve this information from online sources such as Biomart or Bioconductor AnnotationHub.
Then gtfparse could become an optional dependency.
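A minimal sketch of how gtfparse could be made optional: import it lazily inside the one function that reads GTF files, and raise a clear error only when that function is called without gtfparse installed. `optional_import` and `read_gtf_positions` are hypothetical names, not part of gtfparse, scanpy, or any existing package. The only gtfparse call assumed is `gtfparse.read_gtf`.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Import `name` on first use, with an informative error if it is missing.

    Hypothetical helper, shown for illustration only.
    """
    try:
        return importlib.import_module(name)
    except ImportError as err:
        # Re-raise with an install hint; `from err` keeps the original traceback.
        raise ImportError(
            f"The optional dependency `{name}` is required for this function. "
            f"Install it with `pip install {pip_name or name}`."
        ) from err


def read_gtf_positions(path: str):
    # Only code paths that actually read a GTF file need gtfparse installed.
    gtfparse = optional_import("gtfparse")
    return gtfparse.read_gtf(path)
```

Users who fetch annotations from an online source would then never import gtfparse at all.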
This works very well for me too: https://scanpy.readthedocs.io/en/stable/generated/scanpy.queries.biomart_annotations.html
```python
import pandas as pd
import scanpy as sc
from anndata import AnnData


def query_biomart() -> pd.DataFrame:
    """
    Extract human gene annotations from Biomart.

    Returns
    -------
    pd.DataFrame
        DataFrame with gene annotations from Biomart.
    """
    annot = sc.queries.biomart_annotations(
        "hsapiens",
        [
            "ensembl_gene_id",
            "hgnc_symbol",
            "start_position",
            "end_position",
            "chromosome_name",
        ],
        use_cache=True,
    ).rename(
        columns={
            "ensembl_gene_id": "gene_ids",
            "hgnc_symbol": "gene_symbol",
            "start_position": "start",
            "end_position": "end",
            "chromosome_name": "chromosome",
        }
    )
    return annot


def annotate_var(
    adata: AnnData, annotation: pd.DataFrame, index_key: str = "gene_ids"
) -> None:
    """
    Annotate the features within an AnnData object.

    Parameters
    ----------
    adata : AnnData
        Input AnnData object; ``adata.var[index_key]`` must hold the gene IDs.
    annotation : pd.DataFrame
        Gene annotation DataFrame.
    index_key : str, optional
        Column holding the gene IDs in both ``annotation`` and ``adata.var``.
    """
    for col in ["start", "end", "chromosome", index_key]:
        assert (
            col in annotation.columns
        ), f"Annotation DataFrame must contain the column named `{col}`."
    # Key the annotation by gene ID (the Biomart result has a plain RangeIndex);
    # Biomart can return several rows per Ensembl ID, so keep the first one.
    lookup = annotation.drop_duplicates(subset=index_key).set_index(index_key)
    for col in lookup.columns:
        var_dict = lookup[col].to_dict()
        # Genes missing from the annotation get None.
        adata.var[col] = [var_dict.get(x) for x in adata.var[index_key]]
```
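The gene-ID lookup in `annotate_var` can be tried offline on a toy table, without querying Biomart. A minimal sketch, assuming only pandas. `lookup_annotation` is a hypothetical helper, and the IDs and coordinates below are made-up placeholders, not real annotation data. Instead of a dict lookup it uses `DataFrame.reindex`, so genes absent from the annotation come back as NaN rows rather than None.

```python
import pandas as pd


def lookup_annotation(
    gene_ids, annotation: pd.DataFrame, index_key: str = "gene_ids"
) -> pd.DataFrame:
    """Return one annotation row per entry of `gene_ids` (hypothetical helper)."""
    # Key the table by gene ID and keep the first row for duplicated IDs,
    # because reindex requires a unique index.
    lookup = annotation.drop_duplicates(subset=index_key).set_index(index_key)
    # Align to the requested IDs; IDs missing from the annotation become NaN rows.
    return lookup.reindex(pd.Index(gene_ids, name=index_key))


# Toy annotation with a duplicated ID (placeholder values, not real coordinates).
annotation = pd.DataFrame(
    {
        "gene_ids": ["ENSG_A", "ENSG_B", "ENSG_B"],
        "start": [100, 500, 900],
        "end": [200, 700, 950],
        "chromosome": ["1", "X", "X"],
    }
)
result = lookup_annotation(["ENSG_B", "ENSG_C", "ENSG_A"], annotation)
print(result)
```

With an AnnData object, each column could then be copied over with `adata.var[col] = result[col].to_numpy()`, since `result` has the same row order as the requested IDs.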
very nice 🤩