leaf_engine.etl.cluster.cluster_schema
======================================

.. py:module:: leaf_engine.etl.cluster.cluster_schema


Attributes
----------

.. autoapisummary::

   leaf_engine.etl.cluster.cluster_schema.ClusterSchemaException
   leaf_engine.etl.cluster.cluster_schema.schema


Classes
-------

.. autoapisummary::

   leaf_engine.etl.cluster.cluster_schema.BiDict
   leaf_engine.etl.cluster.cluster_schema.ClusterSchema


Functions
---------

.. autoapisummary::

   leaf_engine.etl.cluster.cluster_schema._get_clusters


Module Contents
---------------

.. py:class:: BiDict(*args, **kwargs)

   Bases: :py:obj:`dict`

   Bi-directional dict that allows fast lookup of both keys and values.

   Note that not all dict methods are available on instances of this class,
   and some do not behave as they do on a regular dict (e.g., pop and update
   do not work).

   .. py:method:: __delitem__(key)

      Delete self[key].

   .. py:method:: __setitem__(key, value)

      Set self[key] to value.

   .. py:attribute:: inverse


.. py:class:: ClusterSchema

   .. py:method:: get_cluster_geometry(cluster_uuid: str, as_string: bool = True) -> Optional[Union[str, shapely.geometry.Polygon]]

   .. py:method:: get_cluster_h3_indices(cluster_uuid: str) -> Optional[Iterable[str]]

   .. py:method:: get_cluster_uuid(point_h3_index: str) -> Optional[str]

   .. py:method:: init_from_df(df: pandas.DataFrame, company_id: int, record_type: str, name: str = 'default') -> None

   .. py:method:: init_local(company_id: int, record_type: str, name: str = 'default') -> None

   .. py:method:: init_remote(company_id: int, record_type: str, name: str = 'default') -> None

   .. py:method:: is_empty()

   .. py:method:: plot(color: str = 'blue')

   .. py:method:: reset(df: pandas.DataFrame) -> None

      Removes all entries from the cluster schema mapping and resets the
      mapping using values from the shipments DataFrame.

      :param df: Shipments DataFrame.
      :type df: pd.DataFrame

   .. py:method:: save(batch_date: str) -> None

   .. py:method:: update(df: pandas.DataFrame) -> None

      Updates the cluster schema using values from the shipments DataFrame.

      Only clusters that are not present in the existing mapping are added,
      and they have lower priority than existing clusters: overlaps with
      existing clusters are decided in favour of the existing clusters, which
      are not modified in any way.

      :param df: Shipments DataFrame.
      :type df: pd.DataFrame

   .. py:attribute:: cluster_schema
      :type: BiDict

   .. py:attribute:: company_id
      :type: Optional[int]
      :value: None

   .. py:attribute:: created_at
      :type: Optional[datetime.datetime]
      :value: None

   .. py:attribute:: id
      :type: Optional[str]
      :value: None

   .. py:attribute:: name
      :type: Optional[str]
      :value: None

   .. py:attribute:: priority
      :type: int
      :value: 0

   .. py:attribute:: record_type
      :type: Optional[str]
      :value: None

   .. py:attribute:: updated_at
      :type: Optional[datetime.datetime]
      :value: None


.. py:function:: _get_clusters(locations_df: pandas.DataFrame) -> pandas.DataFrame

   Computes cluster geometries given a locations DataFrame, together with the
   number of shipments per cluster.

   Note that the input locations DataFrame should not be de-duplicated (i.e.,
   it should be the output of `to_locations`), so that row counts reflect the
   number of shipments associated with a location.

   :param locations_df: Non-unique locations returned by `to_locations`.
   :type locations_df: pd.DataFrame

   :returns: Clusters DataFrame with `cluster_geometry_string` and
             `total_shipments` columns (and `cluster_id` as index).
   :rtype: pd.DataFrame


.. py:data:: ClusterSchemaException

.. py:data:: schema
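
The ``BiDict`` behaviour documented above can be illustrated with a minimal sketch. This is not the module's implementation: ``SketchBiDict`` is a hypothetical stand-in, and the assumption that ``inverse`` maps each value to the list of keys holding it is made only for illustration (the string keys and values are likewise made up). It shows why overriding only ``__setitem__`` and ``__delitem__`` leaves methods such as ``update`` unreliable.

```python
class SketchBiDict(dict):
    """Hypothetical stand-in for BiDict; not the module's actual code."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Assumption: inverse maps each value to the list of keys holding it.
        self.inverse = {}
        for key, value in self.items():
            self.inverse.setdefault(value, []).append(key)

    def __setitem__(self, key, value):
        # Drop the stale reverse entry before overwriting an existing key.
        if key in self:
            self._unlink(key)
        super().__setitem__(key, value)
        self.inverse.setdefault(value, []).append(key)

    def __delitem__(self, key):
        self._unlink(key)
        super().__delitem__(key)

    def _unlink(self, key):
        # Remove key from the reverse entry of its current value.
        old_value = self[key]
        keys = self.inverse[old_value]
        keys.remove(key)
        if not keys:
            del self.inverse[old_value]


mapping = SketchBiDict()
mapping["h3-a"] = "cluster-1"
mapping["h3-b"] = "cluster-1"
mapping["h3-c"] = "cluster-2"

forward = mapping["h3-a"]                       # key -> value lookup
reverse = sorted(mapping.inverse["cluster-1"])  # value -> keys lookup

del mapping["h3-c"]
remaining_clusters = sorted(mapping.inverse)

# dict.update does not call the overridden __setitem__ in CPython,
# so the reverse mapping silently goes stale.
mapping.update({"h3-d": "cluster-3"})
stale = "cluster-3" not in mapping.inverse
```

In this sketch, item assignment and ``del`` keep both directions consistent, while ``update`` writes to the forward mapping only. That is consistent with the note that ``pop`` and ``update`` should be avoided on ``BiDict`` instances.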