leaf_engine.etl.cluster.cluster_schema

Attributes

ClusterSchemaException

schema

Classes

BiDict

Bi-directional dict that allows fast lookup of both keys and values.

ClusterSchema

Functions

_get_clusters(→ pandas.DataFrame)

Computes cluster geometries given a locations DataFrame, together with the number of shipments per cluster.

Module Contents

class leaf_engine.etl.cluster.cluster_schema.BiDict(*args, **kwargs)

Bases: dict

Bi-directional dict that allows fast lookup of both keys and values.

Note that not all dict methods are available on instances of this class, and some do not behave as they would on a regular dict (e.g., pop and update do not work).

__delitem__(key)

Delete self[key].

__setitem__(key, value)

Set self[key] to value.

inverse
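
The idea behind a bi-directional dict can be sketched as follows. This is a minimal illustration, not the BiDict implementation: the class name TwoWayDict, the helper _unlink, and the choice to store the inverse as a plain dict of value to list of keys are all assumptions made for the example.

```python
class TwoWayDict(dict):
    """Hypothetical sketch: a dict that also maintains a value -> [keys] index."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Build the inverse index from whatever the dict was initialised with.
        self.inverse = {}
        for key, value in self.items():
            self.inverse.setdefault(value, []).append(key)

    def __setitem__(self, key, value):
        # Re-assigning a key must first remove it from its old value's list.
        if key in self:
            self._unlink(key)
        super().__setitem__(key, value)
        self.inverse.setdefault(value, []).append(key)

    def __delitem__(self, key):
        self._unlink(key)
        super().__delitem__(key)

    def _unlink(self, key):
        # Drop `key` from the inverse index, removing empty entries.
        old_value = self[key]
        self.inverse[old_value].remove(key)
        if not self.inverse[old_value]:
            del self.inverse[old_value]


# Illustrative data only: cell names and cluster ids are made up.
mapping = TwoWayDict({"h3_a": "cluster_1", "h3_b": "cluster_1"})
mapping["h3_c"] = "cluster_2"
mapping["h3_b"] = "cluster_2"  # moves h3_b from cluster_1 to cluster_2
del mapping["h3_a"]            # cluster_1 has no keys left and is dropped
```

In a sketch like this, the built-in dict.update and dict.pop do not go through the overridden __setitem__ and __delitem__, so they would leave the inverse index stale. That is one plausible reason for the caveat about pop and update above.
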
class leaf_engine.etl.cluster.cluster_schema.ClusterSchema
get_cluster_geometry(cluster_uuid: str, as_string: bool = True) str | shapely.geometry.Polygon | None
Parameters:
  • cluster_uuid (str) –

  • as_string (bool) –

Return type:

Optional[Union[str, shapely.geometry.Polygon]]

get_cluster_h3_indices(cluster_uuid: str) Iterable[str] | None
Parameters:

cluster_uuid (str) –

Return type:

Optional[Iterable[str]]

get_cluster_uuid(point_h3_index: str) str | None
Parameters:

point_h3_index (str) –

Return type:

Optional[str]

init_from_df(df: pandas.DataFrame, company_id: int, record_type: str, name: str = 'default') None
Parameters:
  • df (pd.DataFrame) –

  • company_id (int) –

  • record_type (str) –

  • name (str) –

Return type:

None

init_local(company_id: int, record_type: str, name: str = 'default') None
Parameters:
  • company_id (int) –

  • record_type (str) –

  • name (str) –

Return type:

None

init_remote(company_id: int, record_type: str, name: str = 'default') None
Parameters:
  • company_id (int) –

  • record_type (str) –

  • name (str) –

Return type:

None

is_empty()
plot(color: str = 'blue')
Parameters:

color (str) –

reset(df: pandas.DataFrame) None

Removes all entries from the cluster schema mapping and rebuilds the mapping using values from the shipments DataFrame.

Parameters:

df (pd.DataFrame) – Shipments DataFrame.

Return type:

None

save(batch_date: str) None
Parameters:

batch_date (str) –

Return type:

None

update(df: pandas.DataFrame) None

Updates the cluster schema using values from the shipments DataFrame. Only clusters that are not present in the existing mapping are added, and they have lower priority than existing clusters: overlaps are decided in favour of existing clusters, and existing clusters are not modified in any way.

Parameters:

df (pd.DataFrame) – Shipments DataFrame.

Return type:

None
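
The priority rule described for update can be illustrated with plain dicts. This is a sketch under stated assumptions, not the method's implementation: the function name merge_new_clusters is hypothetical, the mapping is assumed to go from an H3 cell index to a cluster id, and an "overlap" is assumed to mean a cell that an existing cluster already claims.

```python
def merge_new_clusters(existing, candidates):
    """Hypothetical sketch of 'existing clusters win' merging.

    Both arguments map a cell index to a cluster id. Returns a new dict;
    `existing` itself is not mutated.
    """
    merged = dict(existing)
    known_clusters = set(existing.values())
    for cell, cluster_id in candidates.items():
        if cluster_id in known_clusters:
            continue  # cluster already present: leave it untouched
        if cell in existing:
            continue  # overlap: the existing cluster keeps the cell
        merged[cell] = cluster_id
    return merged


# Illustrative data only: cell names and cluster ids are made up.
existing = {"cell_a": "c1", "cell_b": "c1"}
candidates = {"cell_b": "c2", "cell_c": "c2", "cell_d": "c1"}
merged = merge_new_clusters(existing, candidates)
```

Here the new cluster c2 gains only cell_c, because cell_b already belongs to c1, and the candidate row for c1 is ignored because c1 already exists.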

cluster_schema: BiDict
company_id: int | None = None
created_at: datetime.datetime | None = None
id: str | None = None
name: str | None = None
priority: int = 0
record_type: str | None = None
updated_at: datetime.datetime | None = None
leaf_engine.etl.cluster.cluster_schema._get_clusters(locations_df: pandas.DataFrame) pandas.DataFrame

Computes cluster geometries given a locations DataFrame, together with the number of shipments per cluster. Note that the input locations DataFrame should not be deduplicated (i.e., it should be the output of to_locations, with one row per shipment location), so that row counts reflect the number of shipments associated with a location.

Parameters:

locations_df (pd.DataFrame) – Non-unique locations returned by to_locations.

Returns:

Clusters DataFrame with cluster_geometry_string and total_shipments columns (and cluster_id as index).

Return type:

pd.DataFrame
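
Why the input must keep duplicate locations can be shown with a small count. This is a sketch, not the function's logic: plain Python rows stand in for DataFrame rows, the cluster_id column is assumed to be already assigned (the real function computes clusters itself), and both helper names are hypothetical.

```python
from collections import Counter


def shipments_per_cluster(rows):
    # One row per shipment location, so the row count is the shipment count.
    return Counter(row["cluster_id"] for row in rows)


def drop_duplicate_rows(rows):
    # Keeps the first occurrence of each (location, cluster_id) pair.
    seen = set()
    unique = []
    for row in rows:
        key = (row["location"], row["cluster_id"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Illustrative data only: two shipments went to warehouse_1.
location_rows = [
    {"location": "warehouse_1", "cluster_id": "c1"},
    {"location": "warehouse_1", "cluster_id": "c1"},
    {"location": "store_7", "cluster_id": "c1"},
    {"location": "store_9", "cluster_id": "c2"},
]

totals = shipments_per_cluster(location_rows)
deduped = shipments_per_cluster(drop_duplicate_rows(location_rows))
```

With the non-unique rows, c1 counts three shipments. After deduplication it counts only two, which undercounts the shipments for warehouse_1.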

leaf_engine.etl.cluster.cluster_schema.ClusterSchemaException
leaf_engine.etl.cluster.cluster_schema.schema