lamindb.Transform¶
- class lamindb.Transform(name: str, key: str | None = None, type: Literal['pipeline', 'notebook', 'upload', 'script', 'function', 'glue'] | None = None, revises: Transform | None = None)¶
Bases:
Record
,IsVersioned
Data transformations.
A “transform” can refer to a Python function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run (
Run
). A run has inputs and outputs.A pipeline is typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, MetaFlow, redun, Airflow, …) and stored in a versioned repository.
Transforms are versioned so that a given transform version maps on a given source code version.
Can I sync transforms to git?
If you switch on
sync_git_repo
a script-like transform is synched to its hashed state in a git repository upon callingln.track()
.>>> ln.settings.sync_git_repo = "https://github.com/laminlabs/lamindb" >>> ln.track()
The definition of transforms and runs is consistent the OpenLineage specification where a
Transform
record would be called a “job” and aRun
record a “run”.- Parameters:
name –
str
A name or title.key –
str | None = None
A short name or path-like semantic key.type –
TransformType | None = "pipeline"
SeeTransformType
.revises –
Transform | None = None
An old version of the transform.
Notes
Examples
Create a transform for a pipeline:
>>> transform = ln.Transform(name="Cell Ranger", version="7.2.0", type="pipeline").save()
Create a transform from a notebook:
>>> ln.track()
View predecessors of a transform:
>>> transform.view_lineage()
Attributes¶
Simple fields¶
-
uid:
str
¶ Universal id.
-
name:
str
¶ A name or title. For instance, a pipeline name, notebook title, etc.
-
key:
str
¶ A key for concise reference & versioning (optional).
-
description:
str
¶ A description (optional).
-
type:
Literal
['pipeline'
,'notebook'
,'upload'
,'script'
,'function'
,'glue'
]¶ TransformType
(default"pipeline"
).
-
source_code:
str
|None
¶ Source code of the transform.
Changed in version 0.75: The
source_code
field is no longer an artifact, but a text field.
-
hash:
str
|None
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
reference:
str
¶ Reference for the transform, e.g.. URL.
-
reference_type:
str
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
created_at:
datetime
¶ Time of creation of record.
-
updated_at:
datetime
¶ Time of last update to record.
-
version:
str
¶ Version (default
None
).Defines version of a family of records characterized by the same
stem_uid
.Consider using semantic versioning with Python versioning.
-
is_latest:
bool
¶ Boolean flag that indicates whether a record is the latest in its version family.
Relational fields¶
-
predecessors:
Transform
¶ Preceding transforms.
These are auto-populated whenever an artifact or collection serves as a run input, e.g.,
artifact.run
andartifact.transform
get populated & saved.The table provides a convenience method to query for the predecessors that bypassed querying the
Run
.
-
successors:
Transform
¶ Subsequent transforms.
See
predecessors
.
-
output_artifacts:
Artifact
¶ The artifacts generated by all runs of this transform.
If you’re looking for the outputs of a single run, see
lamindb.Run.output_artifacts
.
-
output_collections:
Collection
¶ The collections generated by all runs of this transform.
If you’re looking for the outputs of a single run, see
lamindb.Run.output_collections
.
Class methods¶
- classmethod df(include=None, join='inner', limit=100)¶
Convert to
pd.DataFrame
.By default, shows all direct fields, except
created_at
.If you’d like to include related fields, use parameter
include
.- Parameters:
include (
str
|list
[str
] |None
, default:None
) – Related fields to include as columns. Takes strings of form"labels__name"
,"cell_types__name"
, etc. or a list of such strings.join (
str
, default:'inner'
) – Thejoin
parameter ofpandas
.
- Return type:
DataFrame
Examples
>>> labels = [ln.ULabel(name="Label {i}") for i in range(3)] >>> ln.save(labels) >>> ln.ULabel.filter().df(include=["created_by__name"])
- classmethod filter(*queries, **expressions)¶
Query records.
- Parameters:
queries – One or multiple
Q
objects.expressions – Fields and values passed as Django query expressions.
- Return type:
QuerySet
- Returns:
A
QuerySet
.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ln.ULabel(name="my ulabel").save() >>> ulabel = ln.ULabel.get(name="my ulabel")
- classmethod get(idlike=None, **expressions)¶
Get a single record.
- Parameters:
idlike (
int
|str
|None
, default:None
) – Either a uid stub, uid or an integer id.expressions – Fields and values passed as Django query expressions.
- Return type:
- Returns:
A record.
- Raises:
lamindb.core.exceptions.DoesNotExist – In case no matching record is found.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ulabel = ln.ULabel.get("2riu039") >>> ulabel = ln.ULabel.get(name="my-label")
- classmethod lookup(field=None, return_field=None)¶
Return an auto-complete object for a field.
- Parameters:
field (
str
|DeferredAttribute
|None
, default:None
) – The field to look up the values for. Defaults to first string field.return_field (
str
|DeferredAttribute
|None
, default:None
) – The field to return. IfNone
, returns the whole record.
- Return type:
NamedTuple
- Returns:
A
NamedTuple
of lookup information of the field values with a dictionary converter.
See also
Examples
>>> import bionty as bt >>> bt.settings.organism = "human" >>> bt.Gene.from_source(symbol="ADGB-DT").save() >>> lookup = bt.Gene.lookup() >>> lookup.adgb_dt >>> lookup_dict = lookup.dict() >>> lookup_dict['ADGB-DT'] >>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id") >>> genes.ensg00000002745 >>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)¶
Search.
- Parameters:
string (
str
) – The input string to match against the field ontology values.field (
str
|DeferredAttribute
|None
, default:None
) – The field or fields to search. Search all string fields by default.limit (
int
|None
, default:20
) – Maximum amount of top results to return.case_sensitive (
bool
, default:False
) – Whether the match is case sensitive.
- Return type:
QuerySet
- Returns:
A sorted
DataFrame
of search results with a score in columnscore
. Ifreturn_queryset
isTrue
.QuerySet
.
Examples
>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name") >>> ln.save(ulabels) >>> ln.ULabel.search("ULabel2")
- classmethod using(instance)¶
Use a non-default LaminDB instance.
- Parameters:
instance (
str
|None
) – An instance identifier of form “account_handle/instance_name”.- Return type:
QuerySet
Examples
>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name") uid score name ULabel7 g7Hk9b2v 100.0 ULabel5 t4Jm6s0q 75.0 ULabel6 r2Xw8p1z 75.0
Methods¶
- delete()¶
- Return type:
None
- view_lineage(with_successors=False, distance=5)¶
- async adelete(using=None, keep_parents=False)¶
- async arefresh_from_db(using=None, fields=None, from_queryset=None)¶
- async asave(*args, force_insert=False, force_update=False, using=None, update_fields=None)¶
- clean()¶
Hook for doing any extra model-wide validation after clean() has been called on every field by self.clean_fields. Any ValidationError raised by this method will not be associated with a particular field; it will have a special-case association with the field defined by NON_FIELD_ERRORS.
- clean_fields(exclude=None)¶
Clean all fields and raise a ValidationError containing a dict of all validation errors if any occur.
- date_error_message(lookup_type, field_name, unique_for)¶
- get_constraints()¶
- get_deferred_fields()¶
Return a set containing names of deferred fields for this instance.
- prepare_database_save(field)¶
- refresh_from_db(using=None, fields=None, from_queryset=None)¶
Reload field values from the database.
By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.
Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.
When accessing deferred fields of an instance, the deferred loading of the field will call this method.
- save_base(raw=False, force_insert=False, force_update=False, using=None, update_fields=None)¶
Handle the parts of saving which should be done only once per save, yet need to be done in raw saves, too. This includes some sanity checks and signal sending.
The ‘raw’ argument is telling save_base not to save any parent models and not to do any changes to the values before save. This is used by fixture loading.
- serializable_value(field_name)¶
Return the value of the field name for this instance. If the field is a foreign key, return the id value instead of the object. If there’s no Field object with this name on the model, return the model attribute’s value.
Used to serialize a field’s value (in the serializer, or form output, for example). Normally, you would just access the attribute directly and not use this method.
- unique_error_message(model_class, unique_check)¶
- validate_constraints(exclude=None)¶
- validate_unique(exclude=None)¶
Check unique constraints on the model and raise ValidationError if any failed.