lamindb.Curator

class lamindb.Curator

Bases: BaseCurator

Dataset curator.

A Curator object makes it easy to save validated & annotated artifacts.

Example:

>>> import lamindb as ln
>>> curator = ln.Curator.from_df(
...     df,
...     # define validation criteria as mappings
...     columns=ln.Feature.name,  # map column names
...     categoricals={"perturbation": ln.ULabel.name},  # map categories
... )
>>> curator.validate()  # validate the data in df
>>> artifact = curator.save_artifact(description="my RNA-seq")
>>> artifact.describe()  # see annotations

curator.validate() maps values within df according to the mapping criteria and logs validated & problematic values.

If you find non-validated values, you have several options:

validated values not yet in the registry can be automatically registered using add_validated_from()
new values found in the data can be registered using add_new_from()
non-validated values can be accessed using non_validated() and addressed manually
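
To make the split between validated and non-validated values concrete, here is a toy, plain-Python sketch of the kind of check described above. It is not lamindb's implementation: the `registry` dictionary and the `split_validated` helper are hypothetical stand-ins for a registry lookup, and the column values are made up for illustration.

```python
# Toy model only: `registry` and `split_validated` are hypothetical names,
# not part of lamindb.

def split_validated(values, known):
    """Partition unique values into (validated, non_validated), keeping first-seen order."""
    validated, non_validated = [], []
    for value in dict.fromkeys(values):  # de-duplicate while preserving order
        (validated if value in known else non_validated).append(value)
    return validated, non_validated


# Stand-in for the names registered under ln.ULabel.name
registry = {"perturbation": {"DMSO", "IFNG"}}

# Stand-in for the "perturbation" column of df
column = ["DMSO", "IFNG", "DMSO", "ifng-typo"]

validated, non_validated = split_validated(column, registry["perturbation"])
print(validated)      # ['DMSO', 'IFNG']
print(non_validated)  # ['ifng-typo']
```

In this picture, the values left in `non_validated` are the ones you would either register or fix by hand.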

Class methods

classmethod from_anndata(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), using_key='default', verbosity='hint', organism=None, sources=None)

Curation flow for AnnData.

See also Curator.

Note that if genes are removed from the AnnData object, the object should be recreated using from_anndata().

See "Curate AnnData based on the CELLxGENE schema" for instructions on how to curate against a specific CELLxGENE schema version.

Parameters:

data (ad.AnnData | UPathStr) – The AnnData object or an AnnData-like path.
var_index (FieldAttr) – The registry field for mapping the .var index.
categoricals (dict[str, FieldAttr] | None, default: None) – A dictionary mapping .obs.columns to a registry field.
using_key (str, default: 'default') – A reference LaminDB instance.
verbosity (str, default: 'hint') – The verbosity level.
organism (str | None, default: None) – The organism name.
sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs.columns to Source records.
exclude – A dictionary mapping column names to values to exclude.

Return type:

AnnDataCurator

Examples

>>> import bionty as bt
>>> import lamindb as ln
>>> curator = ln.Curator.from_anndata(
...     adata,
...     var_index=bt.Gene.ensembl_gene_id,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     },
...     organism="human",
... )

classmethod from_df(df, categoricals=None, columns=FieldAttr(Feature.name), using_key=None, verbosity='hint', organism=None)

Curation flow for a DataFrame object.

See also Curator.

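classmethod from_mudata(mdata, var_index, categoricals=None, using_key='default', verbosity='hint', organism=None)

Curation flow for a MuData object.

See also Curator.

Note that if genes or other measurements are removed from the MuData object, the object should be recreated using from_mudata().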

Parameters:

mdata (MuData) – The MuData object to curate.
var_index (dict[str, dict[str, DeferredAttribute]]) – The registry field for mapping the .var index for each modality. For example: {"modality_1": bt.Gene.ensembl_gene_id, "modality_2": bt.CellMarker.name}
categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping .obs.columns to a registry field. Use modality keys to specify categoricals for MuData slots, for example {"rna:cell_type": bt.CellType.name}.
using_key (str, default: 'default') – A reference LaminDB instance.
verbosity (str, default: 'hint') – The verbosity level.
organism (str | None, default: None) – The organism name.
sources – A dictionary mapping .obs.columns to Source records.
exclude – A dictionary mapping column names to values to exclude.
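
The "rna:cell_type" key convention can be read as a modality prefix followed by an .obs column name. The helper below is a hypothetical plain-Python illustration of that convention, not lamindb's own parser. Treating a key without a prefix as a top-level .obs column is an assumption made for this sketch.

```python
def split_modality_key(key):
    """Split "modality:column" into (modality, column).

    Hypothetical helper, not part of lamindb. A key without a ":" prefix
    is returned with modality set to None (an assumption of this sketch).
    """
    modality, sep, column = key.partition(":")
    if not sep:
        return None, key
    return modality, column


print(split_modality_key("rna:cell_type"))  # ('rna', 'cell_type')
print(split_modality_key("donor_id"))       # (None, 'donor_id')
```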

Return type:

MuDataCurator

Examples

>>> import bionty as bt
>>> import lamindb as ln
>>> curator = ln.Curator.from_mudata(
...     mdata,
...     var_index={
...         "rna": bt.Gene.ensembl_gene_id,
...         "adt": bt.CellMarker.name
...     },
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     },
...     organism="human",
... )

Methods

save_artifact(description=None, key=None, revises=None, run=None)

Save the dataset as an artifact.

Parameters:

description (str | None, default: None) – A description of the DataFrame object.
key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a revision family.
revises (Artifact | None, default: None) – Previous version of the artifact. Triggers a revision.
run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.

validate()

Validate dataset.

Return type:

bool

Returns:

Boolean indicating whether the dataset is validated.
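
Because validate() returns a boolean, one possible pattern is to gate save_artifact() on its result. The sketch below assumes only the two methods documented above. `save_if_valid` and `StubCurator` are hypothetical names introduced for illustration, and the stub stands in for a real curator so the snippet runs without a LaminDB instance.

```python
def save_if_valid(curator, description):
    """Call save_artifact() only when validate() returns True; otherwise return None."""
    if curator.validate():
        return curator.save_artifact(description=description)
    return None


class StubCurator:
    """Hypothetical stand-in for a curator; not a lamindb class."""

    def __init__(self, is_valid):
        self.is_valid = is_valid

    def validate(self):
        return self.is_valid

    def save_artifact(self, description=None):
        # Placeholder for the Artifact record a real curator would return
        return {"description": description}


print(save_if_valid(StubCurator(True), "my RNA-seq"))   # {'description': 'my RNA-seq'}
print(save_if_valid(StubCurator(False), "my RNA-seq"))  # None
```

With a real curator, a None result would be the cue to inspect the non-validated values before trying again.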