lamindb.core.DataFrameCurator
- class lamindb.core.DataFrameCurator(df, columns=FieldAttr(Feature.name), categoricals=None, using_key=None, verbosity='hint', organism=None, sources=None, exclude=None, check_valid_keys=True)
Bases: BaseCurator
Curation flow for a DataFrame object.
See also
Curator.
- Parameters:
df (DataFrame) – The DataFrame object to curate.
columns (DeferredAttribute, default: FieldAttr(Feature.name)) – The field attribute for the feature column.
categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping column names to registry fields.
using_key (str | None, default: None) – The reference instance containing registries to validate against.
verbosity (str, default: 'hint') – The verbosity level.
organism (str | None, default: None) – The organism name.
sources (dict[str, Record] | None, default: None) – A dictionary mapping column names to Source records.
exclude (dict | None, default: None) – A dictionary mapping column names to values to exclude.
Examples
>>> import lamindb as ln
>>> import bionty as bt
>>> curate = ln.Curator.from_df(
...     df,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     }
... )
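The example above only constructs the curator. A fuller flow might look like the sketch below, which uses only methods documented on this page (`validate`, `add_new_from`, `non_validated`, `save_artifact`). It assumes a LaminDB instance is already configured and that `df` contains both columns. The helper name `curate_and_save`, the default description string, and the choice to register new values only for `donor_id` are hypothetical.

```python
def curate_and_save(df, description="curated dataframe"):
    """Hypothetical helper: validate df, register new donor IDs, then save."""
    # Deferred imports: defining this helper does not require a configured instance.
    import bionty as bt
    import lamindb as ln

    curate = ln.Curator.from_df(
        df,
        categoricals={
            "cell_type_ontology_id": bt.CellType.ontology_id,
            "donor_id": ln.ULabel.name,
        },
    )
    if not curate.validate():
        # Assumption: unknown donor IDs are legitimate new labels worth registering.
        curate.add_new_from("donor_id")
        if not curate.validate():
            raise ValueError(f"still not validated: {curate.non_validated}")
    return curate.save_artifact(description=description)
```

Ontology-backed columns such as `cell_type_ontology_id` are deliberately not auto-registered here, since an unknown ontology ID is more likely a typo than a new term.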
Attributes
- property fields: dict
Return the column fields to validate against.
- property non_validated: list
Return the non-validated features and labels.
Methods
- add_new_from(key, organism=None, **kwargs)
Add validated & new categories.
- Parameters:
key (str) – The key referencing the slot in the DataFrame from which to draw terms.
organism (str | None, default: None) – The organism name.
**kwargs – Additional keyword arguments to pass to the registry model.
- add_new_from_columns(organism=None, **kwargs)
Add validated & new column names to its registry.
- Parameters:
organism (str | None, default: None) – The organism name.
**kwargs – Additional keyword arguments to pass to the registry model.
- add_validated_from(key, organism=None)
Add validated categories.
- Parameters:
key (str) – The key referencing the slot in the DataFrame.
organism (str | None, default: None) – The organism name.
- clean_up_failed_runs()
Clean up previous failed runs that did not save any outputs.
- lookup(using_key=None, public=False)
Look up categories.
- Parameters:
using_key (str | None, default: None) – The instance where the lookup is performed. If None (default), the lookup is performed on the instance specified in the "using_key" parameter of the validator. If "public", the lookup is performed on the public reference.
- Return type:
- save_artifact(description=None, key=None, revises=None, run=None)
Save the validated DataFrame and metadata.
- Parameters:
description (str | None, default: None) – Description of the DataFrame object.
key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a revision family.
revises (Artifact | None, default: None) – Previous version of the artifact. Triggers a revision.
run (Run | None, default: None) – The run that creates the artifact.
- Return type:
- Returns:
A saved artifact record.
- validate(organism=None)
Validate variables and categorical observations.
- Parameters:
organism (str | None, default: None) – The organism name.
- Return type:
bool
- Returns:
Whether the DataFrame is validated.
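Conceptually, validating categorical observations means checking each column's values against the values a registry already knows, optionally skipping values listed in `exclude`. The pure-Python sketch below illustrates that idea only. It is not LaminDB's implementation, and `validate_columns`, `known`, and the sample values are hypothetical stand-ins for the DataFrame and the registries.

```python
def validate_columns(table, known, exclude=None):
    """Return (is_valid, non_validated) for a column-oriented table.

    table:   dict mapping column name -> list of values
    known:   dict mapping column name -> set of accepted values
    exclude: optional dict mapping column name -> values to skip
    """
    exclude = exclude or {}
    non_validated = {}
    for column, accepted in known.items():
        skipped = set(exclude.get(column, ()))
        # Keep first-seen order and report each unknown value once.
        unknown = []
        for value in table.get(column, []):
            if value in skipped or value in accepted or value in unknown:
                continue
            unknown.append(value)
        if unknown:
            non_validated[column] = unknown
    return not non_validated, non_validated


# Hypothetical sample data: "B cel" is a typo, "unknown" is excluded on purpose.
table = {"donor_id": ["D1", "D2", "unknown"], "cell_type": ["T cell", "B cel"]}
known = {"donor_id": {"D1", "D2"}, "cell_type": {"T cell", "B cell"}}
ok, missing = validate_columns(table, known, exclude={"donor_id": ["unknown"]})
```

In this sketch `ok` plays the role of the boolean returned by `validate`, and `missing` plays the role of `non_validated`.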