leaf_engine.etl.cluster.cluster_schema
======================================

.. py:module:: leaf_engine.etl.cluster.cluster_schema


Attributes
----------

.. autoapisummary::

   leaf_engine.etl.cluster.cluster_schema.ClusterSchemaException
   leaf_engine.etl.cluster.cluster_schema.schema


Classes
-------

.. autoapisummary::

   leaf_engine.etl.cluster.cluster_schema.BiDict
   leaf_engine.etl.cluster.cluster_schema.ClusterSchema


Functions
---------

.. autoapisummary::

   leaf_engine.etl.cluster.cluster_schema._get_clusters


Module Contents
---------------

.. py:class:: BiDict(*args, **kwargs)

   Bases: :py:obj:`dict`


   Bi-directional dict that allows fast lookup of both keys and values.

   NOTE: instances of this class do *not* support every ``dict`` method, and
   some methods do not behave as they would on a regular ``dict`` (e.g.,
   ``pop`` and ``update`` do not work).
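
   The idea can be illustrated with a minimal sketch. It is not the actual
   implementation: ``BiDictSketch`` and its ``_unlink`` helper are
   hypothetical, and the choice to store a list of keys per value in
   ``inverse`` is an assumption. The sketch overrides only ``__setitem__``
   and ``__delitem__``, which shows why methods such as ``update`` can leave
   the reverse index out of date.

   .. code-block:: python

      class BiDictSketch(dict):
          """Hypothetical stand-in: a dict plus a value -> keys index."""

          def __init__(self, *args, **kwargs):
              super().__init__(*args, **kwargs)
              # Assumption: several keys may share one value, so the
              # reverse index stores a list of keys per value.
              self.inverse = {}
              for key, value in self.items():
                  self.inverse.setdefault(value, []).append(key)

          def __setitem__(self, key, value):
              if key in self:
                  # Drop the stale reverse entry before overwriting.
                  self._unlink(key)
              super().__setitem__(key, value)
              self.inverse.setdefault(value, []).append(key)

          def __delitem__(self, key):
              self._unlink(key)
              super().__delitem__(key)

          def _unlink(self, key):
              # Remove `key` from the reverse index of its current value.
              old_value = self[key]
              keys = self.inverse[old_value]
              keys.remove(key)
              if not keys:
                  del self.inverse[old_value]


      mapping = BiDictSketch({"cell_a": "cluster_1", "cell_b": "cluster_1"})
      mapping["cell_c"] = "cluster_2"
      del mapping["cell_a"]

      # dict.update does not go through the overridden __setitem__, so the
      # reverse index is not updated for "cell_d".
      mapping.update({"cell_d": "cluster_3"})
      stale = "cluster_3" not in mapping.inverse

   In this sketch, lookups by value go through ``mapping.inverse`` and stay
   consistent only as long as writes use item assignment and ``del``.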


   .. py:method:: __delitem__(key)

      Delete self[key].


   .. py:method:: __setitem__(key, value)

      Set self[key] to value.


   .. py:attribute:: inverse


.. py:class:: ClusterSchema

   .. py:method:: get_cluster_geometry(cluster_uuid: str, as_string: bool = True) -> Optional[Union[str, shapely.geometry.Polygon]]


   .. py:method:: get_cluster_h3_indices(cluster_uuid: str) -> Optional[Iterable[str]]


   .. py:method:: get_cluster_uuid(point_h3_index: str) -> Optional[str]


   .. py:method:: init_from_df(df: pandas.DataFrame, company_id: int, record_type: str, name: str = 'default') -> None


   .. py:method:: init_local(company_id: int, record_type: str, name: str = 'default') -> None


   .. py:method:: init_remote(company_id: int, record_type: str, name: str = 'default') -> None


   .. py:method:: is_empty()


   .. py:method:: plot(color: str = 'blue')


   .. py:method:: reset(df: pandas.DataFrame) -> None

      Removes all entries from the cluster schema mapping and rebuilds the
      mapping using values from the shipments DataFrame.

      :param df: Shipments DataFrame.
      :type df: pd.DataFrame


   .. py:method:: save(batch_date: str) -> None


   .. py:method:: update(df: pandas.DataFrame) -> None

      Updates the cluster schema using values from the shipments DataFrame.
      Only clusters that are not present in the existing mapping are set,
      and they have lower priority than existing clusters: overlaps are
      decided in favour of existing clusters, and existing clusters are not
      modified in any way.

      :param df: Shipments DataFrame.
      :type df: pd.DataFrame
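
      The priority rule can be sketched with plain dictionaries. This is an
      illustration only, not the method's implementation:
      ``merge_lower_priority`` is a hypothetical helper, and representing
      the mapping as H3 index to cluster UUID is an assumption.

      .. code-block:: python

         def merge_lower_priority(existing, candidates):
             """Merge candidate clusters without touching existing ones.

             existing:   dict of H3 index -> cluster UUID (assumed layout)
             candidates: dict of cluster UUID -> list of H3 indices
             """
             merged = dict(existing)
             known_clusters = set(existing.values())
             for cluster_uuid, cells in candidates.items():
                 if cluster_uuid in known_clusters:
                     # Existing clusters are never modified.
                     continue
                 for cell in cells:
                     # setdefault keeps the existing owner on overlap.
                     merged.setdefault(cell, cluster_uuid)
             return merged


         existing = {"h3_a": "c1", "h3_b": "c1"}
         candidates = {"c1": ["h3_a", "h3_z"], "c2": ["h3_b", "h3_c"]}
         merged = merge_lower_priority(existing, candidates)

      Here ``c2`` gains only ``h3_c``: the overlapping cell ``h3_b`` stays
      with ``c1``, and ``c1`` itself is left unchanged.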


   .. py:attribute:: cluster_schema
      :type:  BiDict


   .. py:attribute:: company_id
      :type:  Optional[int]
      :value: None


   .. py:attribute:: created_at
      :type:  Optional[datetime.datetime]
      :value: None


   .. py:attribute:: id
      :type:  Optional[str]
      :value: None


   .. py:attribute:: name
      :type:  Optional[str]
      :value: None


   .. py:attribute:: priority
      :type:  int
      :value: 0


   .. py:attribute:: record_type
      :type:  Optional[str]
      :value: None


   .. py:attribute:: updated_at
      :type:  Optional[datetime.datetime]
      :value: None


.. py:function:: _get_clusters(locations_df: pandas.DataFrame) -> pandas.DataFrame

   Computes cluster geometries from a locations DataFrame, together with the
   number of shipments per cluster. Note that the input locations DataFrame
   must not be deduplicated (i.e., it should be the output of
   `to_locations`, with one row per shipment), so that row counts reflect
   the number of shipments associated with a location.

   :param locations_df: Non-unique locations returned by `to_locations`.
   :type locations_df: pd.DataFrame

   :returns: Clusters DataFrame with `cluster_geometry_string` and `total_shipments`
             columns (and `cluster_id` as index).
   :rtype: pd.DataFrame
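
   The effect of deduplication on the counts can be seen in a small pandas
   sketch. It assumes the cluster assignment is already available as a
   ``cluster_id`` column; the ``location_id`` column and the sample values
   are hypothetical, and the geometry computation is left out.

   .. code-block:: python

      import pandas as pd

      # One row per shipment: location "a" appears twice.
      locations_df = pd.DataFrame(
          {
              "location_id": ["a", "a", "b", "c"],
              "cluster_id": [1, 1, 1, 2],
          }
      )

      # Row counts per cluster reflect the number of shipments.
      total_shipments = (
          locations_df.groupby("cluster_id").size().rename("total_shipments")
      )

      # After deduplication the same count only reflects distinct locations.
      deduplicated = locations_df.drop_duplicates().groupby("cluster_id").size()

   With this input, cluster ``1`` has three shipments but only two distinct
   locations, so deduplicating first would undercount it.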


.. py:data:: ClusterSchemaException

.. py:data:: schema