leaf_engine.etl.cluster.cluster_schema
Attributes
Classes
BiDict | Bi-directional dict that allows fast lookup of both keys and values.
ClusterSchema |
Functions
_get_clusters(locations_df) | Computes cluster geometries given a locations DataFrame, together with the number of shipments per cluster.
Module Contents
- class leaf_engine.etl.cluster.cluster_schema.BiDict(*args, **kwargs)
Bases: dict
Bi-directional dict that allows fast lookup of both keys and values.
NOTE: not all dict methods are available on instances of this class, and some do not behave as they do on a regular dict (e.g., pop and update do not work).
- __delitem__(key)
Delete self[key].
- __setitem__(key, value)
Set self[key] to value.
- inverse
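To illustrate why a dict subclass with an inverse mapping can break under pop and update, here is a minimal sketch. SketchBiDict is a hypothetical stand-in written for this example, not the leaf_engine implementation; in particular, storing a list of keys per value in inverse is an assumption.

```python
class SketchBiDict(dict):
    """Hypothetical stand-in for BiDict; the real class may differ."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Assumption: inverse maps each value to the list of keys holding it.
        self.inverse = {}
        for key, value in self.items():
            self.inverse.setdefault(value, []).append(key)

    def __setitem__(self, key, value):
        if key in self:
            self._forget(key)  # drop the old value -> key link first
        super().__setitem__(key, value)
        self.inverse.setdefault(value, []).append(key)

    def __delitem__(self, key):
        self._forget(key)  # raises KeyError if key is missing
        super().__delitem__(key)

    def _forget(self, key):
        old_value = self[key]
        self.inverse[old_value].remove(key)
        if not self.inverse[old_value]:
            del self.inverse[old_value]


d = SketchBiDict(a=1, b=1)
d["c"] = 2
forward = d["a"]                # key -> value lookup
backward = list(d.inverse[1])   # value -> keys lookup

d["a"] = 2                      # reassigning moves "a" in the inverse
del d["b"]                      # deleting removes the now-empty entry for 1
after_edits = {value: list(keys) for value, keys in d.inverse.items()}

# dict.update does not call the overridden __setitem__ in CPython,
# so the inverse mapping silently goes stale.
d.update({"z": 9})
stale = 9 not in d.inverse
```

In this sketch only item assignment and del keep inverse in sync, which is one plausible reason for the warning about pop and update above.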
- class leaf_engine.etl.cluster.cluster_schema.ClusterSchema
- get_cluster_geometry(cluster_uuid: str, as_string: bool = True) → str | shapely.geometry.Polygon | None
- init_from_df(df: pandas.DataFrame, company_id: int, record_type: str, name: str = 'default') → None
- Parameters:
df (pandas.DataFrame) –
company_id (int) –
record_type (str) –
name (str) –
- Return type:
None
- is_empty()
- reset(df: pandas.DataFrame) → None
Removes all entries from the cluster schema mapping and repopulates it using values from the shipments DataFrame.
- Parameters:
df (pd.DataFrame) – Shipments DataFrame.
- Return type:
None
- update(df: pandas.DataFrame) → None
Updates the cluster schema using values from the shipments DataFrame. Only clusters that are not already present in the existing mapping are added. New clusters have lower priority than existing ones: overlaps are decided in favour of existing clusters, and existing clusters are not modified in any way.
- Parameters:
df (pd.DataFrame) – Shipments DataFrame.
- Return type:
None
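The difference between reset and update can be sketched with a plain dict. This is an illustration of the documented precedence only: reset_mapping, update_mapping, and the placeholder geometry strings are hypothetical, and the sketch works at the level of cluster IDs, so it does not model how geometric overlaps are resolved.

```python
def reset_mapping(mapping, new_clusters):
    """Drop every existing entry, then load the new clusters."""
    mapping.clear()
    mapping.update(new_clusters)


def update_mapping(mapping, new_clusters):
    """Add only clusters that are absent; existing entries always win."""
    for cluster_id, geometry in new_clusters.items():
        mapping.setdefault(cluster_id, geometry)


# Placeholder strings stand in for real cluster geometries.
existing = {"c1": "GEOMETRY A", "c2": "GEOMETRY B"}
incoming = {"c2": "GEOMETRY B2", "c3": "GEOMETRY C"}

updated = dict(existing)
update_mapping(updated, incoming)   # keeps c1 and c2 untouched, adds c3

replaced = dict(existing)
reset_mapping(replaced, incoming)   # only the incoming clusters remain
```

Here updated still holds the original entry for c2, while replaced contains only the incoming clusters.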
- created_at: datetime.datetime | None = None
- updated_at: datetime.datetime | None = None
- leaf_engine.etl.cluster.cluster_schema._get_clusters(locations_df: pandas.DataFrame) → pandas.DataFrame
Computes cluster geometries given a locations DataFrame, together with the number of shipments per cluster. Note that the input locations DataFrame should not be deduplicated (i.e., it should be the output of to_locations), so that row counts reflect the number of shipments associated with each location.
- Parameters:
locations_df (pd.DataFrame) – Non-unique locations returned by to_locations.
- Returns:
Clusters DataFrame with cluster_geometry_string and total_shipments columns (and cluster_id as index).
- Return type:
pd.DataFrame
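The requirement for non-unique rows can be illustrated with a small counting sketch. It uses a plain list in place of the locations DataFrame, and the cluster labels are made up for the example; it shows only why deduplicating first would lose the shipment counts, not how _get_clusters computes geometries.

```python
from collections import Counter

# One entry per shipment row, labelled with a hypothetical cluster ID.
location_rows = ["cluster-a", "cluster-a", "cluster-b", "cluster-a"]

# Counting the raw rows gives the number of shipments per cluster.
total_shipments = Counter(location_rows)

# Deduplicating first collapses every count to 1 and loses that information.
deduplicated = Counter(set(location_rows))
```

With the raw rows, cluster-a is counted three times; after deduplication every cluster appears exactly once.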
- leaf_engine.etl.cluster.cluster_schema.ClusterSchemaException
- leaf_engine.etl.cluster.cluster_schema.schema