lamindb.Curator
- class lamindb.Curator
Bases: BaseCurator
Dataset curator.
A Curator object makes it easy to save validated & annotated artifacts.
Example:
>>> import lamindb as ln
>>> curator = ln.Curator.from_df(
...     df,
...     # define validation criteria as mappings
...     columns=ln.Feature.name,  # map column names
...     categoricals={"perturbation": ln.ULabel.name},  # map categories
... )
>>> curator.validate()  # validate the data in df
>>> artifact = curator.save_artifact(description="my RNA-seq")
>>> artifact.describe()  # see annotations
curator.validate() maps values within df according to the mapping criteria and logs validated & problematic values.
If you find non-validated values, you have several options:
- validated values not yet in the registry can be automatically registered using add_validated_from()
- new values found in the data can be registered using add_new_from()
- non-validated values can be accessed using non_validated() and addressed manually
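The split between validated and non-validated values can be pictured with a small, library-free sketch. It does not call lamindb and is not how the library implements validation: REGISTRY and validate_columns are hypothetical names, and a plain dict of lists stands in for the DataFrame.

```python
# Hypothetical stand-in for a registry: allowed values per categorical column.
REGISTRY = {"perturbation": {"DMSO", "IFNG"}}


def validate_columns(data, registry):
    """Return (is_valid, non_validated) for a dict mapping column -> list of values."""
    non_validated = {}
    for column, allowed in registry.items():
        # Deduplicate while keeping first-seen order, then keep unknown values only.
        unique_values = dict.fromkeys(data.get(column, []))
        unknown = [value for value in unique_values if value not in allowed]
        if unknown:
            non_validated[column] = unknown
    return (not non_validated, non_validated)


data = {"perturbation": ["DMSO", "IFNG", "drug_x", "DMSO"]}
is_valid, non_validated = validate_columns(data, REGISTRY)
print(is_valid, non_validated)
```

In this sketch, the values left in non_validated are the ones that would either be registered or corrected by hand before validating again.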
Class methods
- classmethod from_anndata(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), using_key='default', verbosity='hint', organism=None, sources=None)
Curation flow for AnnData.
See also Curator.
Note that if genes are removed from the AnnData object, the curator should be recreated using from_anndata().
See Curate AnnData based on the CELLxGENE schema for instructions on how to curate against a specific cellxgene schema version.
- Parameters:
data (ad.AnnData | UPathStr) – The AnnData object or an AnnData-like path.
var_index (FieldAttr) – The registry field for mapping the .var index.
categoricals (dict[str, FieldAttr] | None, default: None) – A dictionary mapping .obs.columns to a registry field.
using_key (str, default: 'default') – A reference LaminDB instance.
verbosity (str, default: 'hint') – The verbosity level.
organism (str | None, default: None) – The organism name.
sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs.columns to Source records.
exclude – A dictionary mapping column names to values to exclude.
- Return type:
AnnDataCurator
Examples
>>> import bionty as bt
>>> curate = ln.Curator.from_anndata(
...     adata,
...     var_index=bt.Gene.ensembl_gene_id,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     },
...     organism="human",
... )
- classmethod from_df(df, categoricals=None, columns=FieldAttr(Feature.name), using_key=None, verbosity='hint', organism=None)
Curation flow for a DataFrame object.
See also Curator.
- Parameters:
df (DataFrame) – The DataFrame object to curate.
columns (DeferredAttribute, default: FieldAttr(Feature.name)) – The field attribute for the feature column.
categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping column names to a registry field.
using_key (str | None, default: None) – The reference instance containing registries to validate against.
verbosity (str, default: 'hint') – The verbosity level.
organism (str | None, default: None) – The organism name.
sources – A dictionary mapping column names to Source records.
exclude – A dictionary mapping column names to values to exclude.
- Return type:
Examples
>>> import bionty as bt
>>> curate = ln.Curator.from_df(
...     df,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     }
... )
- classmethod from_mudata(mdata, var_index, categoricals=None, using_key='default', verbosity='hint', organism=None)
Curation flow for a MuData object.
See also Curator.
Note that if genes or other measurements are removed from the MuData object, the curator should be recreated using from_mudata().
- Parameters:
mdata (MuData) – The MuData object to curate.
var_index (dict[str, dict[str, DeferredAttribute]]) – The registry field for mapping the .var index for each modality. For example: {"modality_1": bt.Gene.ensembl_gene_id, "modality_2": bt.CellMarker.name}
categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping .obs.columns to a registry field. Use modality keys to specify categoricals for MuData slots, such as "rna:cell_type": bt.CellType.name.
using_key (str, default: 'default') – A reference LaminDB instance.
verbosity (str, default: 'hint') – The verbosity level.
organism (str | None, default: None) – The organism name.
sources – A dictionary mapping .obs.columns to Source records.
exclude – A dictionary mapping column names to values to exclude.
- Return type:
Examples
>>> import bionty as bt
>>> curate = ln.Curator.from_mudata(
...     mdata,
...     var_index={
...         "rna": bt.Gene.ensembl_gene_id,
...         "adt": bt.CellMarker.name
...     },
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     },
...     organism="human",
... )
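The "rna:cell_type" convention for categoricals keys can be read as a modality prefix followed by an .obs column name. The sketch below shows one way such keys could be split. It is an illustration only, not lamindb's own parsing: split_modality_key is a hypothetical helper, and treating an unprefixed key as a top-level .obs column is an assumption.

```python
def split_modality_key(key):
    """Split "modality:column" into (modality, column).

    Assumption: a key without a ":" refers to a top-level .obs column,
    which is represented here by a modality of None.
    """
    modality, separator, column = key.partition(":")
    if not separator:
        return None, key
    return modality, column


categorical_keys = ["rna:cell_type", "donor_id"]
parsed = [split_modality_key(key) for key in categorical_keys]
print(parsed)
```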
Methods
- save_artifact(description=None, key=None, revises=None, run=None)
Save the dataset as an artifact.
- Parameters:
description (str | None, default: None) – A description of the DataFrame object.
key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a revision family.
revises (Artifact | None, default: None) – Previous version of the artifact. Triggers a revision.
run (Run | None, default: None) – The run that creates the artifact.
- Return type:
- Returns:
A saved artifact record.
- validate()
Validate dataset.
- Return type:
bool
- Returns:
Boolean indicating whether the dataset is validated.
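Because validate() returns a boolean, a common pattern is to save only when validation passes. The sketch below shows that gate with a duck-typed helper. save_if_valid and StubCurator are hypothetical names introduced for illustration; the stub only mimics the two method signatures documented above so that the example runs without a LaminDB instance.

```python
def save_if_valid(curator, **save_kwargs):
    """Call save_artifact() only if validate() returns True; otherwise return None."""
    if curator.validate():
        return curator.save_artifact(**save_kwargs)
    return None


class StubCurator:
    """Hypothetical stand-in exposing validate() and save_artifact()."""

    def __init__(self, valid):
        self._valid = valid

    def validate(self):
        return self._valid

    def save_artifact(self, description=None, key=None, revises=None, run=None):
        # A real curator returns a saved artifact record; a dict stands in here.
        return {"description": description, "key": key}


saved = save_if_valid(StubCurator(valid=True), description="my RNA-seq")
skipped = save_if_valid(StubCurator(valid=False), description="my RNA-seq")
print(saved, skipped)
```

With a real curator, the same helper would be called as save_if_valid(curator, description="my RNA-seq"), and a None result would signal that non-validated values still need attention.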