lamindb.core.DataFrameCurator

class lamindb.core.DataFrameCurator(df, columns=FieldAttr(Feature.name), categoricals=None, using_key=None, verbosity='hint', organism=None, sources=None, exclude=None, check_valid_keys=True)

Bases: BaseCurator

Curation flow for a DataFrame object.

See also Curator.

Parameters:
  • df (DataFrame) – The DataFrame object to curate.

  • columns (DeferredAttribute, default: FieldAttr(Feature.name)) – The field attribute for the feature column.

  • categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping column names to the registry fields used to validate their values.

  • using_key (str | None, default: None) – The reference instance containing registries to validate against.

  • verbosity (str, default: 'hint') – The verbosity level.

  • organism (str | None, default: None) – The organism name.

  • sources (dict[str, Record] | None, default: None) – A dictionary mapping column names to Source records.

  • exclude (dict | None, default: None) – A dictionary mapping column names to values to exclude.

Examples

>>> import lamindb as ln
>>> import bionty as bt
>>> curate = ln.Curator.from_df(
...     df,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     }
... )

Attributes

property fields: dict

Return the column fields to validate against.

property non_validated: list

Return the non-validated features and labels.

Methods

add_new_from(key, organism=None, **kwargs)

Add validated & new categories.

Parameters:
  • key (str) – The key referencing the slot in the DataFrame from which to draw terms.

  • organism (str | None, default: None) – The organism name.

  • **kwargs – Additional keyword arguments to pass to the registry model.

add_new_from_columns(organism=None, **kwargs)

Add validated and new column names to the corresponding registry.

Parameters:
  • organism (str | None, default: None) – The organism name.

  • **kwargs – Additional keyword arguments to pass to the registry model.

add_validated_from(key, organism=None)

Add validated categories.

Parameters:
  • key (str) – The key referencing the slot in the DataFrame.

  • organism (str | None, default: None) – The organism name.

clean_up_failed_runs()

Clean up previous failed runs that did not save any outputs.

lookup(using_key=None, public=False)

Lookup categories.

Parameters:

using_key (str | None, default: None) – The instance where the lookup is performed. If None (default), the lookup is performed on the instance specified in the "using_key" parameter of the validator. If "public", the lookup is performed on the public reference.

Return type:

CurateLookup

save_artifact(description=None, key=None, revises=None, run=None)

Save the validated DataFrame and metadata.

Parameters:
  • description (str | None, default: None) – Description of the DataFrame object.

  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a revision family.

  • revises (Artifact | None, default: None) – Previous version of the artifact. Triggers a revision.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.
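
The methods on this page can be combined into a validate-then-save flow. The following is a minimal sketch, not a definitive recipe: the helper name curate_and_save is hypothetical, the column names are taken from the Examples section above, and the choice to register unknown donor_id values with add_new_from is an assumption about what the caller wants. The imports are placed inside the function because lamindb requires a configured instance.

```python
def curate_and_save(df, description):
    """Validate a DataFrame and save it as an artifact (hypothetical helper)."""
    # Imported lazily: both packages need a configured LaminDB instance.
    import lamindb as ln
    import bionty as bt

    curate = ln.Curator.from_df(
        df,
        categoricals={
            "cell_type_ontology_id": bt.CellType.ontology_id,
            "donor_id": ln.ULabel.name,
        },
    )

    if not curate.validate():
        # Assumption: new donor labels are acceptable and should be registered.
        curate.add_new_from("donor_id")
        # Validate again; anything still failing is reported to the caller.
        if not curate.validate():
            raise ValueError(f"Non-validated terms remain: {curate.non_validated}")

    return curate.save_artifact(description=description)
```

Whether to register new terms automatically or to review non_validated by hand first is a judgment call; the sketch raises an error instead of saving when validation still fails.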

validate(organism=None)

Validate variables and categorical observations.

Parameters:

organism (str | None, default: None) – The organism name.

Return type:

bool

Returns:

Whether the DataFrame is validated.
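
Conceptually, validating a categorical column amounts to checking each distinct value against the terms known to a registry. The sketch below illustrates that idea in plain Python. It does not use lamindb and makes no claim about how the library implements validation; the function find_non_validated and the example identifiers are hypothetical.

```python
def find_non_validated(values, known_terms):
    """Return distinct values absent from known_terms, in first-seen order."""
    known = set(known_terms)
    seen = set()
    missing = []
    for value in values:
        if value not in known and value not in seen:
            seen.add(value)
            missing.append(value)
    return missing


# Hypothetical registry contents and column values, for illustration only.
known_cell_types = ["CL:0000084", "CL:0000236"]
column = ["CL:0000084", "CL:9999999", "CL:0000236", "CL:9999999"]

missing = find_non_validated(column, known_cell_types)
# The column counts as validated only when no value is missing.
is_validated = not missing
```

In this picture, validate() corresponds to is_validated and the non_validated property corresponds to the collected missing values.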